Photovoltaic array fault risk diagnosis method and device based on multi-mode data fusion, equipment and medium

By employing a multi-modal data fusion method, combining electrical operation data and inspection image data, and using a temporal convolutional network with multi-layer causal dilated convolution and residual connection modules, accurate prediction and diagnosis of photovoltaic array fault risks were achieved. This solved the problem of insufficient multi-source information fusion in existing technologies and improved the ability to identify and warn of fault risks.

CN122241041A · Pending · Publication Date: 2026-06-19 · CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY
Filing Date: 2026-02-02
Publication Date: 2026-06-19

IPC: G06F18/20; G06F18/10; G06F18/25; G06F18/213; G06N3/042; G06N3/0464; G06N3/082; G06N3/0455; G08B31/00; G08B21/18; H02J3/001; H02J3/38; H02J101/24

AI Tagging

Application Domain

Single network parallel feeding arrangements; Biological models

AI Technical Summary

Technical Problem

Existing fault risk diagnosis technologies for photovoltaic power plants are unable to effectively integrate multi-source information such as electrical operation data and inspection images, and cannot characterize the coupled evolution and multi-scale features of risks. This results in insufficient interpretability and traceability of latent risks, and makes it impossible to achieve stable identification and graded early warning of photovoltaic array fault risks.

Method used

A multi-modal data fusion method is adopted. By preprocessing the time-series data of electrical parameters, environmental monitoring data and inspection image data of the target photovoltaic power station, a basic data sample set is constructed. Then, the time-series feature vector and image feature vector are extracted using a multi-modal data feature extraction model. The effective information contribution of each modality feature is calculated and weighted fusion is performed. Finally, the data is input into a risk prediction model for fault risk diagnosis. The risk prediction model is a time-series convolutional network composed of multi-layer causal dilated convolution and residual connection modules for modeling and analysis.

Benefits of technology

It enables accurate prediction and diagnosis of photovoltaic array failure risks, improves the interpretability and traceability of latent risks, reduces the probability of missed or misdiagnosed faults, reduces the cost of blind inspections, and improves the operational stability and power generation reliability of photovoltaic arrays.

Abstract

This invention discloses a method, device, equipment, and medium for photovoltaic array fault risk diagnosis based on multi-modal data fusion. The method includes: preprocessing multi-source monitoring data of the target photovoltaic power station to construct a basic data sample set; inputting the sample set into a multi-modal feature extraction model to output multi-modal features; calculating the effective information contribution of each modal feature and performing weighted fusion to obtain a fused feature vector; and inputting the fused feature vector into a temporal convolutional network model composed of causal dilated convolution and residual connections, which models its time dependence and evolution mode and outputs risk prediction results, thereby completing the photovoltaic array fault risk diagnosis. Because this invention fully utilizes the effective contribution of multi-source information through multi-modal data fusion and temporal modeling, it accurately characterizes the coupled evolution and multi-scale features of risks, improves the interpretability and traceability of latent risks, and realizes stable identification and graded early warning of photovoltaic array fault risks.

Description

Technical Field

[0001] This invention relates to the field of photovoltaic array fault risk diagnosis based on multi-mode data fusion, and particularly to a method, apparatus, equipment, medium, and program product for photovoltaic array fault risk diagnosis based on multi-mode data fusion.

Background Technology

[0002] Photovoltaic power plants are generally characterized by remote geographical locations and variable operating environments, so the faults of their photovoltaic modules are multi-source, heterogeneous, and evolve in a concealed manner. Existing photovoltaic module fault risk diagnosis technologies typically rely on operational data and inspection information for identification and alarm functions. One common approach collects operational parameters such as voltage, current, power, temperature, and insulation status at the module or string level, combines them with inverter-side monitoring records and external conditions such as irradiance and ambient temperature, and determines anomalies using threshold criteria, power deviation comparisons, or parameter identification. Another approach relies on inspection methods to obtain information on module appearance and thermal anomalies. For example, infrared thermography images are used to identify hot spots and localized heating areas, and electroluminescence (EL) images are used to identify problems such as microcracks and soldering defects. Image processing or data-driven models are then used to identify and locate defects and to output fault types or alarm conclusions.

[0003] In actual power plant operation, a single diagnostic approach cannot simultaneously address the multi-scale evolution of risks and the real-time nature of diagnosis. Furthermore, current methods struggle to effectively integrate multi-source information such as electrical operation data and inspection images, and fail to characterize the coupled evolution and multi-scale features of risks. This results in insufficient interpretability and traceability of latent risks, hindering the stable identification and graded early warning of photovoltaic array fault risks.

Summary of the Invention

[0004] The main objective of this invention is to provide a photovoltaic array fault risk diagnosis method, device, equipment, medium, and program product based on multi-mode data fusion. It aims to solve the technical problems of existing technologies, which are unable to effectively integrate multi-source information such as electrical operation data and inspection images, and are unable to characterize the risk coupling evolution and multi-scale features, resulting in insufficient interpretability and traceability of latent risks, and thus failing to achieve stable identification and graded early warning of photovoltaic array fault risks.

[0005] To achieve the above objective, this invention provides a photovoltaic array fault risk diagnosis method based on multi-mode data fusion, the method comprising the following steps: monitoring the target photovoltaic power station and preprocessing the resulting multi-source monitoring data to construct a basic data sample set, where the multi-source monitoring data includes the electrical parameter time-series data, the environmental monitoring time-series data, and the inspection image data of the target photovoltaic power station; inputting the basic data sample set into a pre-constructed multimodal data feature extraction model and outputting multimodal features, including temporal feature vectors and image feature vectors; calculating the effective information contribution of each modality feature, and performing weighted fusion of the multimodal features based on the effective information contribution to obtain a fused feature vector; and inputting the fused feature vector into a pre-built risk prediction model, which outputs risk prediction results, and diagnosing the fault risk of the photovoltaic array components of the target photovoltaic power station based on the risk prediction results. The risk prediction model is a temporal convolutional network composed of multi-layer causal dilated convolution and residual connection modules, and is configured to model and analyze the time dependence and evolution pattern of the fused feature vector.

[0006] Optionally, the step of monitoring the target photovoltaic power station and preprocessing the multi-source monitoring data to construct a basic data sample set includes: monitoring the electrical parameter time-series data in the operation and management system of the target photovoltaic power station, structuring the electrical parameter time-series data, and cleaning the structured data to obtain electrical parameter sequence samples; monitoring the environmental monitoring time-series data of the environmental monitoring points in the operation scenario of the target photovoltaic power station, structuring the environmental monitoring time-series data, and cleaning the structured data to obtain environmental parameter sequence samples; obtaining inspection image data of the target photovoltaic power station, performing feature analysis on the inspection image data, and extracting from it candidate image samples related to defects in the photovoltaic array components; binding the candidate image samples to the component identifiers and timestamp identifiers of the associated photovoltaic array components to generate initial image modal samples, and cleaning the initial image modal samples to obtain target image modal samples; and constructing a basic data sample set based on the electrical parameter sequence samples, the environmental parameter sequence samples, the target image modal samples, and the time information of each sample.
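The binding and sample-set construction described in [0006] can be sketched as follows. This is a minimal illustration only: the names `ImageSample` and `build_sample_set`, the fixed-length windowing, and the rule "attach the most recent image of the component captured at or before the window end" are assumptions introduced here, not details given in the patent text.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class ImageSample:
    """Hypothetical image modal sample bound to a component and a timestamp."""
    component_id: str   # identifier of the associated PV array component
    timestamp: float    # capture time, in the same unit as the series timestamps
    path: str           # hypothetical reference to the stored image


def build_sample_set(timestamps, electrical, environmental, images,
                     component_id, window):
    """Group time-aligned readings into fixed-length windows and attach to each
    window the latest image of `component_id` taken at or before the window end
    (assumed alignment rule)."""
    own = sorted((s for s in images if s.component_id == component_id),
                 key=lambda s: s.timestamp)
    image_times = [s.timestamp for s in own]
    samples = []
    for start in range(0, len(timestamps) - window + 1, window):
        end = start + window
        idx = bisect_right(image_times, timestamps[end - 1]) - 1
        samples.append({
            "component_id": component_id,
            "t_start": timestamps[start],
            "t_end": timestamps[end - 1],
            "electrical": electrical[start:end],
            "environmental": environmental[start:end],
            "image": own[idx] if idx >= 0 else None,  # None: no image yet
        })
    return samples
```

Each returned dictionary carries the time information of its window, so later stages can arrange windows in chronological order.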

[0007] Optionally, the multimodal data feature extraction model includes a temporal feature extraction module and a visual feature extraction module. The temporal feature extraction module is constructed based on a temporal encoder of the Informer model, and the visual feature extraction module is constructed based on an image encoder of the visual Transformer model. The time-series feature extraction module is configured to encode the electrical parameter sequence samples and environmental parameter sequence samples in the basic data sample set using a sparse attention mechanism, and output a time-series feature vector. The visual feature extraction module is configured to encode the target image modality samples in the basic data sample set using a self-attention mechanism and output an image feature vector.

[0008] Optionally, the time-series feature extraction module is further configured to concatenate the input electrical parameter sequence samples and environmental parameter sequence samples along the time dimension to obtain joint time-series data samples at the same time scale, and to perform a linear transformation on the joint time-series data samples to obtain a query vector matrix, a key vector matrix, and a value vector matrix.

The temporal feature extraction module is further configured to perform local attention calculation based on the query vector matrix, key vector matrix, and value vector matrix using a sparse attention mechanism to obtain the local attention probability distribution at each time step, and to generate global attention features based on the local attention probability distribution, as shown in the following formula:

$$p_i = \mathrm{softmax}\left(\frac{q_i K^{\top}}{\sqrt{d}}\right), \qquad A = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V$$

where $A$ represents the global attention features; $Q$, $K$, and $V$ represent the query vector matrix, key vector matrix, and value vector matrix, respectively; $p_i$ represents the local attention probability distribution over all key vectors at the i-th time step; $\top$ indicates transpose; $q_i$ represents the row vector of the query matrix $Q$ corresponding to the i-th time step; $d$ is the dimension of the attention feature space, used to scale the query-key product; and $\mathrm{softmax}$ represents the activation function, which is used to transform its input into a probability distribution representation.

The temporal feature extraction module is further configured to calculate the information entropy of the attention probability distribution at each time step based on the global attention features, and to filter the sparse location set according to the information entropy, as shown in the following formula:

$$H_i = -\sum_{j=1}^{N} p_{ij} \log p_{ij}, \qquad S = \underset{S' :\, |S'| = u}{\arg\min} \sum_{i \in S'} H_i, \qquad u = \lfloor c \ln N \rfloor$$

where $H_i$ represents the information entropy of the attention probability distribution at the i-th time position; $p_i$ represents the attention probability distribution at the i-th time position; $p_{ij}$ is the probability value in $p_i$ corresponding to the j-th key vector; $N$ represents the total number of key vectors in the key matrix; $S$ represents the set of sparse locations obtained by filtering based on information entropy; $\min$ represents the function that takes the minimum value of the corresponding variable; $|S|$ represents the number of elements of $S$, and $u$ represents the size of the sparse position set; $c$ represents the sampling factor constant; and $\lfloor \cdot \rfloor$ represents the floor function.

The temporal feature extraction module is further configured to determine the attention weight vector of the sparse positions based on the sparse position set and the global attention features, and to determine the temporal feature vector of the sparse positions based on the attention weight vector. The temporal feature vector of the non-sparse positions is filled with the mean value, as shown in the following formula:

$$a_i = \mathrm{softmax}\left(\frac{q_i K^{\top}}{\sqrt{d}}\right) \ (i \in S), \qquad o_i = \begin{cases} \sum_{j=1}^{N} a_{ij} v_j, & i \in S \\ \frac{1}{N} \sum_{j=1}^{N} v_j, & i \notin S \end{cases}$$

where $a_i$ indicates the attention weight vector at the i-th time position; $a_{ij}$ is the weight value in $a_i$ corresponding to the j-th key; $v_j$ is the row vector of the value matrix $V$ corresponding to the j-th time position; and $o_i$ represents the temporal feature vector output at the i-th time position.
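The entropy-based selection in [0008] can be illustrated with a small pure-Python sketch. It computes a softmax attention distribution for every time step, keeps the time steps whose distributions have the lowest entropy, and fills the remaining positions with the mean of the value vectors. The function names, the set size `u = floor(c * ln N)` (borrowed from the usual Informer convention), and the clamping of `u` to at least 1 are assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy of a probability vector (natural logarithm)."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def sparse_attention(Q, K, V, c=1.0):
    """Return (outputs, sparse_set, entropies) for query/key/value row lists."""
    n_q, n_k, d = len(Q), len(K), len(Q[0])
    scale = math.sqrt(d)
    # Attention probability distribution of each query over all keys.
    P = [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
         for q in Q]
    H = [entropy(p) for p in P]
    # Assumed set size: u = floor(c * ln N), clamped to the range [1, n_q].
    u = max(1, min(n_q, math.floor(c * math.log(n_k))))
    S = set(sorted(range(n_q), key=lambda i: H[i])[:u])  # lowest entropy first
    dim_v = len(V[0])
    mean_v = [sum(v[t] for v in V) / n_k for t in range(dim_v)]
    out = []
    for i in range(n_q):
        if i in S:   # sparse position: attention-weighted sum of value rows
            out.append([sum(P[i][j] * V[j][t] for j in range(n_k))
                        for t in range(dim_v)])
        else:        # non-sparse position: filled with the mean value vector
            out.append(list(mean_v))
    return out, S, H
```

A low entropy means the attention at that time step is concentrated on a few keys, which is why those positions are treated as the informative ones in this sketch.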

[0009] Optionally, the visual feature extraction module is further configured to divide the input target image modality sample into multiple non-overlapping image blocks according to a preset size, flatten the image blocks into image vectors, and concatenate the image vectors in sequence to obtain an image block sequence matrix.

The visual feature extraction module is further configured to define a linear embedding matrix and a bias based on the preset size and the number of channels of the target image modality samples, and to map the image block sequence matrix to the image feature space of the visual Transformer model based on the linear embedding matrix and the bias to generate embedding representation vectors, as shown in the following formula:

$$z_k = x_k E + b, \qquad x_k \in \mathbb{R}^{P^2 C}, \quad E \in \mathbb{R}^{(P^2 C) \times D}, \quad b \in \mathbb{R}^{D}, \quad k = 1, \dots, N$$

where $z_k$ represents the k-th embedding vector; $E$ represents the linear embedding matrix; $b$ indicates the bias; $D$ indicates the embedding dimension; $x_k$ represents the image vector obtained by flattening the k-th image block; $C$ represents the number of channels of the target image modality samples; $P$ indicates the preset size, so that $P^2 C$ is the length of the image vector; $(P^2 C) \times D$ represents the dimension of the linear embedding matrix; and $N$ indicates the total number of image blocks.

The visual feature extraction module is further configured to initialize and generate position encoding vectors and a category label vector based on the embedding dimension, and to generate sequence features based on the position encoding vectors, the category label vector, and the embedding representation vectors, as shown in the following formula:

$$Z_0 = \left[\, x_{\mathrm{cls}};\ z_1 + e_1;\ \dots;\ z_N + e_N \,\right] \in \mathbb{R}^{(N+1) \times D}$$

where $Z_0$ represents the sequence features; $e_k$ represents the encoding vector at the k-th position; $x_{\mathrm{cls}}$ represents the category label vector; $N + 1$ represents the total length of the category label vector plus the image block sequence; and $(N+1) \times D$ is the dimension of the sequence features.

The visual feature extraction module is further configured to input the sequence features into a multilayer image encoder for feature encoding and to output an image feature vector. The interlayer mapping of the multilayer image encoder is defined as follows:

$$Z_i = \mathrm{Enc}\left(Z_{i-1}\right), \qquad i = 1, \dots, L$$

where $Z_i$ represents the image feature vector output by the i-th layer of the image encoder; $L$ is the number of encoder layers; and $\mathrm{Enc}$ represents an encoder block composed of a multi-head self-attention module and a feedforward network, used to model global correlation features between different image blocks and to extract image representation features.
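The patch splitting and linear embedding in [0009] can be sketched in pure Python as below. The nested-list image layout (height x width x channels), the function names, and the choice to add position encodings only to the patch embeddings (not to the category label vector) are assumptions for illustration; the encoder layers themselves are omitted.

```python
def patchify(image, patch):
    """Split an H x W x C image (nested lists) into non-overlapping
    patch x patch blocks and flatten each block into one vector."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    vectors = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vec = []
            for r in range(top, top + patch):
                for col in range(left, left + patch):
                    vec.extend(image[r][col])  # all channel values of a pixel
            vectors.append(vec)
    return vectors


def embed(vectors, E, b):
    """Linear embedding of each flattened patch x: z = x E + b."""
    return [[sum(x[i] * E[i][j] for i in range(len(x))) + b[j]
             for j in range(len(b))]
            for x in vectors]


def build_sequence(embeddings, cls_token, positions):
    """Prepend the category label (class) vector and add one position
    encoding vector to each patch embedding (assumed arrangement)."""
    seq = [list(cls_token)]
    for z, e in zip(embeddings, positions):
        seq.append([zi + ei for zi, ei in zip(z, e)])
    return seq
```

For a 4 x 4 single-channel image and a patch size of 2, `patchify` yields four vectors of length 4, and the resulting sequence has five rows (one class vector plus four patches).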

[0010] Optionally, the step of calculating the effective information contribution of each modal feature and performing weighted fusion of the multimodal features based on the effective information contribution to obtain a fused feature vector includes:

The probability vector of each modality feature over the different risk levels is calculated using the modality classification head, as shown in the following formula:

$$p^{(m)} = g\left(f^{(m)}\right) = \left[\, p^{(m)}_1, \dots, p^{(m)}_R \,\right]$$

where $g$ represents the modality classification head function; $f^{(m)}$ indicates the feature vector of the m-th modality; $R$ indicates the number of risk levels; $p^{(m)}$ indicates the probability vector of the m-th modality over the risk levels; and $p^{(m)}_j$ indicates the probability value of the m-th modality for the j-th risk level.

The information contribution supervision quantity of each modality feature is determined based on the probability vector, as shown in the following formula:

$$s^{(m)} = y^{\top} p^{(m)}$$

where $s^{(m)}$ indicates the information contribution supervision quantity of the m-th modal feature; $y$ represents the one-hot vector corresponding to the true label of the input sample; and $y^{\top}$ represents the transpose of the one-hot vector.

A contribution mapping function and a contribution analysis network are constructed. The modal features are input into the contribution mapping function to obtain a discriminant representation related to information contribution, and this discriminant representation is input into the contribution analysis network to output an estimated contribution value, as shown in the following formula:

$$\hat{c}^{(m)} = \mathcal{A}\left(\phi\left(f^{(m)}\right)\right)$$

where $\hat{c}^{(m)}$ represents the contribution estimate of the m-th modal feature; $\phi$ represents the contribution mapping function, used to extract discriminant representations related to information contribution; and $\mathcal{A}$ indicates the contribution analysis network.

The contribution estimate and the information contribution supervision quantity are iterated through a regression loss so that the contribution estimate approaches the information contribution supervision quantity. At the end of the iteration, the effective information contribution is output.

The effective information contributions of the modal features are normalized to generate the normalized weight coefficient of each modal feature, and the multimodal features are weighted and fused based on the normalized weight coefficients to obtain a fused feature vector.
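The weighting logic of [0010] can be sketched as follows. The supervision quantity is taken as the probability that a modality's classification head assigns to the true risk level (the inner product of the one-hot label with the probability vector). The sum-to-one normalization, the weighted element-wise sum as the fusion operator, and the squared-error form of the regression loss are assumptions for this example, since the text does not fix them; the trainable contribution network is not modelled here.

```python
def supervision(prob_vector, true_label):
    """Inner product of the one-hot label with the probability vector,
    i.e. the probability assigned to the true risk level."""
    return prob_vector[true_label]


def regression_loss(estimate, target):
    """Squared error between a contribution estimate and its supervision
    quantity (assumed loss form)."""
    return (estimate - target) ** 2


def normalize(contributions):
    """Scale non-negative contributions so that they sum to 1 (assumed form)."""
    total = sum(contributions)
    if total <= 0:
        raise ValueError("contributions must have a positive sum")
    return [c / total for c in contributions]


def fuse(features, weights):
    """Weighted element-wise sum of equally sized modality feature vectors."""
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all modality features must share one dimension")
    return [sum(w * f[t] for w, f in zip(weights, features))
            for t in range(dim)]
```

For instance, if the time-series head gives the true level a probability of 0.7 and the image head gives it 0.3, the two modalities receive weights 0.7 and 0.3 before fusion.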

[0011] Optionally, the step of inputting the fused feature vector into a pre-built risk prediction model and outputting a risk prediction result includes: The fused feature vectors corresponding to multiple consecutive time windows are arranged in chronological order to obtain a fused feature vector sequence, and the fused feature vector sequence is input into the pre-constructed risk prediction model. The risk prediction model is configured to construct a causal dilated convolution operation on the input fused feature vector sequence based on a preset convolution kernel length and dilation coefficient, in which the feature vectors at the corresponding positions are weighted by the weight matrix of each convolution kernel and a bias term is added to obtain the convolution output features at each time step. The risk prediction model is further configured to combine the convolution output features at each time step in chronological order to obtain the initial input feature sequence of the residual connection module, and to input the initial input feature sequence into the residual connection module to perform multi-layer residual processing and output the target time feature. The risk prediction model is further configured to input the target time feature into the classification layer, and to process the target time feature and the classification layer parameters through the softmax function to output a risk category probability vector. The risk prediction model is further configured to determine a risk level category from the risk category probability vector using the maximum a posteriori principle, and to output the risk level category as the risk prediction result.
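The causal dilated convolution, residual connection, and softmax classification steps in [0011] can be sketched in pure Python as below. Zero padding on the left, the ReLU activation inside the residual block, and equal input and output widths are assumptions for this example; the patent text does not specify them.

```python
import math


def causal_dilated_conv(seq, kernels, bias, dilation):
    """seq: list of T feature vectors. kernels[j] is a (D_out x D_in) weight
    matrix applied to the input at time t - j * dilation. Inputs before the
    start of the sequence count as zero, so output t never sees the future."""
    d_out = len(bias)
    out = []
    for t in range(len(seq)):
        y = list(bias)
        for j, weights in enumerate(kernels):
            src = t - j * dilation
            if src < 0:
                continue  # implicit zero padding on the left
            x = seq[src]
            for o in range(d_out):
                y[o] += sum(weights[o][i] * x[i] for i in range(len(x)))
        out.append(y)
    return out


def residual_block(seq, kernels, bias, dilation):
    """ReLU(conv(x)) + x; requires D_out == D_in (assumed block layout)."""
    conv = causal_dilated_conv(seq, kernels, bias, dilation)
    return [[max(0.0, c) + x for c, x in zip(c_row, x_row)]
            for c_row, x_row in zip(conv, seq)]


def classify(feature, weights, bias):
    """Linear layer + softmax; returns (probabilities, most probable class)."""
    logits = [sum(w * f for w, f in zip(row, feature)) + b
              for row, b in zip(weights, bias)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs, max(range(len(probs)), key=probs.__getitem__)
```

Stacking several such blocks with dilation coefficients 1, 2, 4, and so on is the usual way a temporal convolutional network widens its receptive field; picking the class with the highest probability corresponds to the maximum a posteriori decision.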

[0012] Furthermore, to achieve the above objective, the present invention also proposes a photovoltaic array fault risk diagnosis device based on multi-mode data fusion. The device is configured to implement the steps of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion described above, and includes: a data processing module, used to monitor the target photovoltaic power station and preprocess the multi-source monitoring data to construct a basic data sample set, where the multi-source monitoring data includes the electrical parameter time-series data, the environmental monitoring time-series data, and the inspection image data of the target photovoltaic power station; a feature extraction module, used to input the basic data sample set into a pre-constructed multimodal data feature extraction model and output multimodal features, including temporal feature vectors and image feature vectors; a feature fusion module, used to calculate the effective information contribution of each modality feature and to perform weighted fusion of the multimodal features based on the effective information contribution to obtain a fused feature vector; and a risk prediction module, used to input the fused feature vector into a pre-built risk prediction model, output the risk prediction result, and perform fault risk diagnosis on the photovoltaic array components of the target photovoltaic power station based on the risk prediction result. The risk prediction model is a temporal convolutional network composed of multi-layer causal dilated convolution and residual connection modules, and is configured to model and analyze the time dependence and evolution mode of the fused feature vector.

[0013] Furthermore, to achieve the above objective, this application also proposes a photovoltaic array fault risk diagnosis device based on multi-mode data fusion. The device includes: a memory, a processor, and a photovoltaic array fault risk diagnosis program based on multi-mode data fusion that is stored in the memory and runs on the processor. When run by the processor, the program implements the steps of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described above.

[0014] In addition, to achieve the above objectives, this application also proposes a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described above.

[0015] This invention ensures the reliability and comprehensiveness of diagnostic data through comprehensive monitoring and preprocessing of multi-source data. By extracting multimodal features, it effectively mines core features from different modalities, achieving full extraction of multi-dimensional information. Through weighted feature fusion, it achieves complementary advantages of various modal features, solving the problem of one-sided information from a single data source and improving the representativeness and effectiveness of features. Through targeted risk prediction models and fault diagnosis processes, it achieves accurate prediction and diagnosis of photovoltaic array fault risks. Because this invention utilizes multi-modal data fusion and time-series modeling, it fully leverages the effective contributions of multi-source information to accurately characterize the coupled evolution of risks and multi-scale features, improving the interpretability and traceability of latent risks. It achieves stable identification and graded early warning of photovoltaic array fault risks, effectively improving the accuracy and efficiency of photovoltaic array fault risk diagnosis, reducing the probability of missed or misdiagnosed faults, and reducing the manpower and material costs of blind inspections. At the same time, it can predict fault risks in advance, guiding maintenance personnel to carry out timely maintenance work, improving the operational stability and power generation reliability of photovoltaic arrays, and effectively mining potential fault characteristics during photovoltaic array operation, providing technical support for the efficient operation and maintenance of photovoltaic power plants.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0017] Figure 1 is a schematic diagram of the structure of a photovoltaic array fault risk diagnosis device based on multi-mode data fusion in the hardware operating environment of the embodiment of the present invention;
Figure 2 is a flowchart illustrating the first embodiment of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion of the present invention;
Figure 3 is a flowchart illustrating the second embodiment of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion of the present invention;
Figure 4 is a flowchart illustrating the third embodiment of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion of the present invention;
Figure 5 is a structural block diagram of the first embodiment of the photovoltaic array fault risk diagnosis device based on multi-mode data fusion of the present invention.

[0018] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0019] It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the invention.

[0020] Referring to Figure 1, Figure 1 is a schematic diagram of the structure of a photovoltaic array fault risk diagnosis device based on multi-mode data fusion in the hardware operating environment of the embodiment of the present invention.

[0021] As shown in Figure 1, the photovoltaic array fault risk diagnosis device based on multi-mode data fusion may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display screen or an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface or a wireless interface. The network interface 1004 may optionally include a standard wired interface or a wireless interface (such as a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be high-speed random access memory (RAM) or stable non-volatile memory (NVM), such as a disk drive. The memory 1005 may also optionally be a storage device independent of the aforementioned processor 1001.

[0022] Those skilled in the art will understand that the structure shown in Figure 1 does not constitute a limitation on the photovoltaic array fault risk diagnosis device based on multi-mode data fusion, which may include more or fewer components than shown, combine certain components, or have a different component arrangement.

[0023] As shown in Figure 1, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a photovoltaic array fault risk diagnosis program based on multi-mode data fusion.

[0024] In the photovoltaic array fault risk diagnosis device based on multi-mode data fusion shown in Figure 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 can be arranged in the device. Through the processor 1001, the device calls the photovoltaic array fault risk diagnosis program based on multi-mode data fusion stored in the memory 1005 and executes the photovoltaic array fault risk diagnosis method based on multi-mode data fusion provided in the embodiment of the present invention.

[0025] This invention provides a method for diagnosing photovoltaic array fault risks based on multi-mode data fusion. Referring to Figure 2, Figure 2 is a flowchart illustrating the first embodiment of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion of the present invention.

[0026] In this embodiment, the photovoltaic array fault risk diagnosis method based on multi-mode data fusion includes the following steps: Step S10: Monitor the target photovoltaic power station and preprocess the multi-source monitoring data to construct a basic data sample set.

[0027] It should be understood that the executing entity of this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a tablet computer, personal computer, or mobile phone, or a terminal electronic device capable of performing the above functions. The following description uses a photovoltaic array fault risk diagnosis device based on multi-mode data fusion (hereinafter referred to as the diagnosis device) as an example to illustrate this embodiment and the following embodiments.

[0028] It should be noted that the multi-source monitoring data includes time-series data of electrical parameters, time-series data of environmental monitoring, and inspection image data of the target photovoltaic power station. The target photovoltaic power station refers to a specific photovoltaic power station (such as a single-plant photovoltaic power station, a centralized photovoltaic power station in a certain area, etc.) that requires photovoltaic array fault risk diagnosis.

[0029] It should be noted that electrical parameter time-series data can be electrical parameter data that is continuously collected in chronological order and reflects the power generation performance of the photovoltaic array, covering key indicators such as voltage, current, power, and power generation of the photovoltaic modules. Environmental monitoring time-series data can be external environmental parameter data that affects the operation of the photovoltaic array and is collected continuously in chronological order, including indicators such as light intensity, ambient temperature, wind speed, and humidity. Inspection image data can be image data that reflects the appearance status of the photovoltaic array modules, collected through manual inspection or intelligent inspection equipment (such as drones and high-definition cameras), and can capture appearance abnormalities such as module damage, dust accumulation, and aging.

[0030] Understandably, data preprocessing can involve targeted processing of the collected multi-source monitoring data, such as eliminating data noise, filling in missing data, and standardizing data formats. The basic data sample set can be a standardized dataset formed by integrating all multi-source monitoring data after preprocessing.
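
The preprocessing operations named above (filling in missing data and standardizing values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment: linear interpolation and z-score scaling are choices made for the example, and the function names `fill_missing` and `standardize` are hypothetical.

```python
from statistics import mean, pstdev


def fill_missing(series):
    """Fill None gaps by linear interpolation (assumed strategy).

    Gaps at either edge copy the nearest valid value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid samples")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] * (1 - w) + series[right] * w
    return out


def standardize(series):
    """Scale a series to zero mean and unit variance (z-score)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(v - mu) / sigma for v in series]


# Hypothetical module voltage readings with two missing samples.
voltage = [30.0, None, 32.0, 31.0, None]
filled = fill_missing(voltage)
scaled = standardize(filled)
```

In practice the same two steps would be applied per channel (voltage, current, power, irradiance, and so on) before the samples are merged into the basic data sample set.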

[0031] Step S20: Input the basic data sample set into the pre-constructed multimodal data feature extraction model and output multimodal features.

[0032] It should be noted that the multimodal features include time-series feature vectors and image feature vectors. Time-series feature vectors can be extracted from electrical parameter time-series data and environmental monitoring time-series data; they are vector-based data that reflects the temporal variation patterns and trends of the data, and can capture the evolution and abnormal fluctuations of photovoltaic array operating parameters over time. Image feature vectors can be extracted from inspection image data; they are vector-based data that reflects the appearance status and potential defect characteristics of photovoltaic array components, and can capture feature information corresponding to component appearance anomalies (such as damage or dust accumulation).

[0033] It should be noted that the pre-built multimodal data feature extraction model can be a model that has been optimized in advance through sample training. It can simultaneously process multiple types of modal data (time series data, image data) and extract features. It integrates the core functions of time series feature extraction and image feature extraction and can adapt to the feature extraction needs of multi-source data.

[0034] In some embodiments, the multimodal feature extraction model includes a temporal feature extraction module and an image feature extraction module. The temporal feature extraction module can use temporal analysis algorithms (such as LSTM, temporal convolutional networks) to extract trend features, fluctuation features, correlation features, etc. from temporal data and convert them into standardized temporal feature vectors. The image feature extraction module can use deep learning image processing algorithms (such as CNN, ResNet) to extract texture features, contour features, defect features, etc. from preprocessed inspection images and convert them into standardized image feature vectors. The model performs preliminary integration of the two types of feature vectors and outputs complete multimodal features.

[0035] Furthermore, in order to accurately extract temporal and image features from multimodal data, in one embodiment, the multimodal data feature extraction model includes a temporal feature extraction module and a visual feature extraction module. The temporal feature extraction module is constructed based on a temporal encoder of the Informer model, and the visual feature extraction module is constructed based on an image encoder of the visual Transformer model. The time-series feature extraction module is configured to encode the electrical parameter sequence samples and environmental parameter sequence samples in the basic data sample set using a sparse attention mechanism, and output a time-series feature vector. The visual feature extraction module is configured to encode the target image modality samples in the basic data sample set using a self-attention mechanism and output an image feature vector.

[0036] It is understood that the time-series feature extraction module built on the Informer model time-series encoder in this embodiment, combined with the sparse attention mechanism, can effectively reduce the computational load of long time-series data processing and improve the efficiency of time-series feature encoding, while avoiding the problem that traditional time-series models are unable to capture long-period dependencies. The addition of position encoding can ensure that the model accurately identifies the temporal order of time-series data and accurately captures the evolution of photovoltaic array operating parameters over time. The sparse attention mechanism can focus on the feature information of key time steps, effectively mine abnormal features in electrical and environmental parameters, highlight time-series signals related to fault risks, and reduce redundant information interference. The standardized time-series feature vector has uniform dimensions and distribution, which can accurately reflect the time-series changes of photovoltaic array power generation performance and environmental influences, providing high-quality time-series feature support for subsequent multimodal feature fusion and risk prediction.

[0037] It should be understood that the visual feature extraction module built on the visual Transformer model image encoder in this embodiment, combined with the self-attention mechanism, can effectively capture the global features and local details of the image, solving the problem that traditional image processing algorithms are difficult to accurately identify subtle defects in components. Image block segmentation and embedding processing can transform the image into a sequence form that the model can process, while preserving the spatial structure information of the image, ensuring that the model can accurately locate the component defect. The self-attention mechanism can focus on abnormal areas in the image, accurately mine image features related to component defects, reduce the interference of background redundant information, and improve the representativeness of image features. The standardized image feature vector has the same dimension and distribution as the temporal feature vector, which can achieve efficient fusion of the two types of features, and accurately reflect the appearance status of the photovoltaic array component, providing high-quality visual feature support for subsequent multimodal feature fusion and risk prediction.

[0038] Furthermore, in order to accurately capture the feature information of key time steps and effectively mine abnormal features in electrical parameters and environmental parameters, in one embodiment, the time-series feature extraction module is also configured to concatenate the input electrical parameter sequence sample and environmental parameter sequence sample according to the time dimension to obtain a joint time-series data sample at the same time scale, and perform a linear transformation on the joint time-series data sample to obtain a query vector matrix, a key vector matrix and a value vector matrix; The temporal feature extraction module is further configured to perform local attention calculation based on the query vector matrix, key vector matrix, and value vector matrix through a sparse attention mechanism, obtain the local attention probability distribution at each time step, and generate global attention features based on the local attention probability distribution.

[0039] It is understood that this embodiment uses the Informer model encoder as the temporal feature extraction model. Its sparse attention mechanism reduces the computational complexity of long sequences while maintaining the ability to model long dependencies, making it more suitable for efficient encoding of multi-channel long sequences in power plants. The sparse attention calculation inside the encoder is as follows:

$$p_i = \mathrm{softmax}\!\left(\frac{\mathbf{q}_i \mathbf{K}^{\top}}{\sqrt{d}}\right), \qquad \mathbf{A} = \mathrm{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d}}\right)\mathbf{V}$$

where $\mathbf{A}$ represents the global attention features; $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ represent the query vector matrix, key vector matrix, and value vector matrix, respectively; $p_i$ represents the local attention probability distribution over all key vectors at the i-th time step; $\top$ indicates transpose; $\mathbf{q}_i$ represents the row vector of the query matrix $\mathbf{Q}$ corresponding to the i-th time step; $d$ is the dimension of the attention feature space, used to scale the query-key product; and $\mathrm{softmax}$ represents the activation function, applied row by row, which transforms the scaled scores of the input time-series data samples into a probability distribution.

[0040] The temporal feature extraction module is further configured to calculate the information entropy of the attention probability distribution at each time step based on the global attention features, and to filter the sparse position set according to the information entropy, as shown in the following formulas:

$$H_i = -\sum_{j=1}^{L_K} p_{i,j}\,\log p_{i,j}$$

$$\mathcal{S} = \underset{\mathcal{S}':\ |\mathcal{S}'| = u}{\arg\min}\ \sum_{i \in \mathcal{S}'} H_i, \qquad u = \lfloor c \cdot \ln L_K \rfloor$$

where $H_i$ represents the information entropy of the attention probability distribution $p_i$ at the i-th time position; $p_{i,j}$ is the probability value corresponding to the j-th key vector in $p_i$; $L_K$ represents the total number of key vectors in the key matrix; $\mathcal{S}$ represents the set of sparse positions obtained by filtering based on information entropy; $\arg\min$ selects the positions for which the corresponding variable takes its minimum value; $|\cdot|$ represents the number of elements of a set; $u$ represents the size of the sparse position set; $c$ represents the sampling factor constant; and $\lfloor\cdot\rfloor$ represents the floor function.

The temporal feature extraction module is further configured to determine the attention weight vector of the sparse positions based on the sparse position set and the global attention features, and to determine the temporal feature vector of the sparse positions based on the attention weight vector. The temporal feature vector of the non-sparse positions is then filled with the mean value, as shown in the following formulas:

$$a_i = p_i, \quad i \in \mathcal{S}$$

$$z_i = \begin{cases} \sum_{j=1}^{L_K} a_{i,j}\,\mathbf{v}_j, & i \in \mathcal{S} \\[4pt] \dfrac{1}{L_K}\sum_{j=1}^{L_K} \mathbf{v}_j, & i \notin \mathcal{S} \end{cases}$$

where $a_i$ indicates the attention weight vector at the i-th time position; $a_{i,j}$ is the weight value corresponding to the j-th key in $a_i$; $\mathbf{v}_j$ is the row vector of the value matrix $\mathbf{V}$ corresponding to the j-th time position; and $z_i$ represents the temporal feature vector output at the i-th time position, which is used as the temporal feature input for subsequent multimodal fusion and fault risk identification.
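
The entropy-based selection described in the two preceding paragraphs can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation of the embodiment: the scores are scaled by the square root of the feature dimension, the size of the sparse set is taken as the floor of `c * ln(L_K)` following the usual Informer convention, that size is clamped to at least 1, and non-sparse positions are filled with the mean of the value rows. The function name `sparse_attention` and the toy matrices are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def sparse_attention(Q, K, V, c=1.0):
    """Return (Z, sparse_positions, H) for query/key/value row lists."""
    d = len(Q[0])
    # Attention probability distribution p_i for every time step i.
    P = [
        softmax([sum(q[t] * k[t] for t in range(d)) / math.sqrt(d) for k in K])
        for q in Q
    ]
    H = [entropy(p) for p in P]
    # Assumed sparse-set size; the clamp to [1, len(Q)] is an added safeguard.
    u = max(1, min(len(Q), math.floor(c * math.log(len(K)))))
    # Keep the u positions with the lowest entropy (most peaked attention).
    sparse = set(sorted(range(len(Q)), key=lambda i: H[i])[:u])
    dv = len(V[0])
    mean_v = [sum(v[t] for v in V) / len(V) for t in range(dv)]
    Z = []
    for i in range(len(Q)):
        if i in sparse:
            Z.append([sum(P[i][j] * V[j][t] for j in range(len(V))) for t in range(dv)])
        else:
            Z.append(list(mean_v))  # mean fill for non-sparse positions
    return Z, sorted(sparse), H


# Toy example: only the first query attends sharply to one key.
Q = [[5.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 1.0]]
Z, sparse_positions, H = sparse_attention(Q, K, V)
```

Low entropy indicates that a time step concentrates its attention on a few keys, which is why those positions are kept and the remaining ones receive the averaged value vector.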

[0041] Furthermore, in order to effectively capture the global features and local detail features of the photovoltaic array component image, in one embodiment, the visual feature extraction module is further configured to divide the input target image modal sample into multiple non-overlapping image blocks according to a preset size, flatten the multiple non-overlapping image blocks into image vectors, and sequentially concatenate the image vectors to obtain an image block sequence matrix; The visual feature extraction module is further configured to define a linear embedding matrix and a bias based on the preset size and the number of channels of the target image modal sample, and to map the image block sequence matrix to the image feature space of the visual Transformer model based on the linear embedding matrix and the bias to generate an embedding representation vector; The visual feature extraction module is further configured to initialize and generate a position encoding vector and a category label vector based on the embedding dimension, and generate sequence features based on the position encoding vector, the category label vector and the embedding representation vector; The visual feature extraction module is further configured to input the sequence features into a multilayer image encoder for feature encoding and output an image feature vector.

[0042] Understandably, after acquiring visible light inspection image data and associating it with objects, the visible light images are preprocessed and sampled in a unified manner. An image feature extraction network is then constructed to learn the representation of component appearance defects and abnormal features, resulting in image feature vectors for subsequent multimodal fusion and fault risk identification.

[0043] Defects such as contamination, occlusion, and hidden cracks in inspection images are often scattered across the module surface, weakly textured, and varied in scale. Feature extraction methods that rely solely on local receptive fields therefore have difficulty stably characterizing the relationship between a defect and its spatial context. For this reason, this embodiment uses a visual Transformer (ViT) based on a self-attention mechanism to encode visible light images in order to obtain robust image representation vectors for subsequent multimodal fusion and risk identification.

[0044] The preprocessed single-frame visible light image is represented as $R \in \mathbb{R}^{A \times B \times G}$, where A, B, and G represent the image height, width, and number of channels, respectively. The image is divided into non-overlapping patches of size $Q \times Q$, which yields $K = AB/Q^2$ patches, and the k-th patch (i.e., the image block) is flattened into a vector $x_k \in \mathbb{R}^{Q^2 G}$. All patch vectors are concatenated in order to obtain the patch sequence matrix:

$$X = [x_1;\ x_2;\ \dots;\ x_K] \in \mathbb{R}^{K \times Q^2 G}$$

In the formula, R is the input visible light image; A, B, and G are the image size and number of channels; Q is the patch side length; K is the number of patches; $x_k$ is the vector flattened from the k-th patch; and X is the patch sequence matrix.
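
The patch division and flattening step can be sketched as follows. The sketch assumes the image is a nested list indexed as `image[row][column][channel]` and that both image sides are exact multiples of the patch side length; the function name `patchify` is hypothetical.

```python
def patchify(image, q):
    """Split an A x B x G image into non-overlapping q x q patches.

    Each patch is flattened row by row into a vector of length q*q*G.
    Patches are returned in row-major order (left to right, top to bottom).
    """
    height, width = len(image), len(image[0])
    if height % q or width % q:
        raise ValueError("image sides must be multiples of the patch side")
    patches = []
    for top in range(0, height, q):
        for left in range(0, width, q):
            vec = [
                channel_value
                for a in range(top, top + q)
                for b in range(left, left + q)
                for channel_value in image[a][b]
            ]
            patches.append(vec)
    return patches


# Toy 4 x 4 single-channel image whose pixel value encodes its position.
image = [[[a * 4 + b] for b in range(4)] for a in range(4)]
patches = patchify(image, 2)  # K = 4 patches, each of length 2*2*1 = 4
```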

[0045] To map the patch sequence to the feature space of ViT, a linear embedding matrix $E$ and a bias $b_E$ are defined, and each patch vector is mapped to a D-dimensional embedding vector:

$$e_k = x_k E + b_E, \qquad E \in \mathbb{R}^{(Q^2 G) \times D},\quad k = 1, \dots, K$$

where $e_k$ represents the k-th embedding vector; $E$ represents the linear embedding matrix; $b_E \in \mathbb{R}^{D}$ indicates the bias; $D$ indicates the embedding dimension; $x_k$ represents the image vector obtained by flattening the k-th image patch; $G$ represents the number of channels in the target image modal sample; $Q$ indicates the preset size, which together with $G$ determines the length $Q^2 G$ of the image vector; $(Q^2 G) \times D$ is the dimension of the linear embedding matrix; and $K$ indicates the total number of image patches.

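[0046] Furthermore, a positional encoding vector is introduced to inject spatial location information into each patch, and a category label vector $x_{\mathrm{cls}}$ is set. The input sequence of the ViT encoder can then be constructed as follows:

$$Z^{(0)} = [\,x_{\mathrm{cls}} + \pi_0;\ e_1 + \pi_1;\ \dots;\ e_K + \pi_K\,] \in \mathbb{R}^{(K+1) \times D}$$

where $Z^{(0)}$ represents the sequence features; $\pi_k$ represents the encoding vector at the k-th position; $x_{\mathrm{cls}}$ represents the category label vector; $e_k$ is the embedding vector of the k-th image patch; $K+1$ represents the total length of the category label vector plus the image patch sequence; and $(K+1) \times D$ is the dimension of the sequence features.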
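
The construction of the encoder input sequence from flattened patches can be sketched as follows. In a trained ViT the embedding matrix, bias, category label vector, and positional encodings are learned parameters; the small integer values below are hypothetical placeholders chosen so that the arithmetic is easy to follow, and the function name `build_sequence` is an assumption of this sketch.

```python
def build_sequence(X, E, bias, cls, pos):
    """Build the (K+1) x D encoder input from K flattened patches.

    X:    K patch vectors of length P
    E:    P x D linear embedding matrix
    bias: length-D bias vector
    cls:  length-D category label (class token) vector
    pos:  K+1 positional encoding vectors of length D
    """
    if len(pos) != len(X) + 1:
        raise ValueError("need one positional vector per token, class token included")
    dim = len(bias)
    tokens = [list(cls)]  # the category label vector occupies position 0
    for x in X:
        tokens.append(
            [sum(x[r] * E[r][c] for r in range(len(x))) + bias[c] for c in range(dim)]
        )
    # Add the positional encoding of each position to its token.
    return [[t[c] + p[c] for c in range(dim)] for t, p in zip(tokens, pos)]


X = [[1, 2], [3, 4]]                       # K = 2 patches, P = 2
E = [[1, 0, 1], [0, 1, 1]]                 # P x D with D = 3
bias = [0, 0, 1]
cls = [0, 0, 0]
pos = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
Z0 = build_sequence(X, E, bias, cls, pos)
```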

[0047] After the sequence features $Z^{(0)}$ are input into a multi-layer Transformer encoder, the output vector corresponding to the category label is taken as the representation vector of the visible light image for that frame and is used for subsequent multimodal fusion and fault risk identification. This approach can explicitly model the correlation between distant regions in an image, thereby improving the characterization ability under distributed defects and complex background interference.

[0048] To describe the feature update process between encoder layers, the inter-layer mapping of the ViT encoder is defined as follows:

$$Z^{(i)} = \mathrm{Enc}_i\!\left(Z^{(i-1)}\right), \qquad i = 1, \dots, N$$

where $Z^{(i)}$ represents the image feature vector output by the i-th layer image encoder, with $Z^{(0)}$ being the input sequence features; $N$ is the number of encoder layers; and $\mathrm{Enc}_i$ represents an encoder block composed of a multi-head self-attention module and a feedforward network, used to model global correlation features between different image blocks and extract image representation features.

[0049] Step S30: Calculate the effective information contribution of each modal feature, and perform weighted fusion of the multimodal features based on the effective information contribution to obtain a fused feature vector.

[0050] It should be noted that the effective information contribution refers to the proportion of effective diagnostic information that different modal features (time series feature vectors, image feature vectors) can provide in the process of photovoltaic array fault risk diagnosis, reflecting the degree of influence of each modal feature on the fault risk diagnosis results.

[0051] A fused feature vector can be a single feature vector that integrates the core information of time series features and image features after weighted fusion. It combines the time evolution law of time series data with the appearance defect features of image data, and can comprehensively reflect the operating status and fault risk of photovoltaic array.

[0052] In some embodiments, the diagnostic device may employ algorithms such as information entropy and variance analysis to calculate the effective information contribution of time-series feature vectors and image feature vectors. By analyzing the correlation between each modal feature and the fault risk, the weight allocation basis for different modal features is determined. Modal features that are highly correlated with the fault risk and have rich effective information are assigned a higher effective information contribution.
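
One way to turn a contribution measure into fusion weights is sketched below. The text names information entropy and variance analysis only as examples and does not give a formula, so this sketch makes its own assumptions: the population variance of each feature vector is used as a stand-in contribution score, the scores are normalized to sum to 1, and the two feature vectors share the same dimension. The function names `contribution_weights` and `fuse` are hypothetical.

```python
from statistics import pvariance


def contribution_weights(features):
    """Normalize per-modality scores (assumed: variance) into weights."""
    scores = {name: pvariance(vec) for name, vec in features.items()}
    total = sum(scores.values())
    if total == 0:
        # Fall back to equal weights when no modality carries variation.
        return {name: 1.0 / len(features) for name in features}
    return {name: score / total for name, score in scores.items()}


def fuse(features, weights):
    """Weighted element-wise sum of equally sized feature vectors."""
    dim = len(next(iter(features.values())))
    return [
        sum(weights[name] * features[name][i] for name in features)
        for i in range(dim)
    ]


# Hypothetical feature vectors for one time window.
features = {
    "time_series": [0.0, 2.0, 4.0, 2.0],
    "image": [1.0, 1.0, 3.0, 3.0],
}
weights = contribution_weights(features)
fused = fuse(features, weights)
```

A score tied more directly to fault labels, such as a correlation or mutual-information estimate, could replace the variance without changing the normalization and fusion steps.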

[0053] Step S40: Input the fused feature vector into the pre-constructed risk prediction model, output the risk prediction result, and perform fault risk diagnosis on the photovoltaic array components of the target photovoltaic power station based on the risk prediction result.

[0054] It should be noted that the risk prediction model is a temporal convolutional network composed of multi-layer causal dilated convolutions and residual connection modules, and the risk prediction model is configured to model and analyze the time dependence and evolution pattern of the fused feature vector.

[0055] In some embodiments, the diagnostic device can input the fused feature vector into a pre-built risk prediction model. The model performs convolution operations on the fused feature vector through multi-layer causal dilated convolution to expand the receptive field and capture the temporal dependence of the features. At the same time, it alleviates the gradient vanishing problem through residual connections to ensure the stable operation of the model. The model models and analyzes the temporal dependence and evolution pattern of the fused feature vector to explore the core patterns related to fault risk in the fused features, and finally outputs the risk prediction result (risk level, etc.).

[0056] In some embodiments, the diagnostic equipment can combine risk prediction results with multi-source monitoring data and pre-processed data to comprehensively assess the failure risk of photovoltaic array modules. If the prediction result is high failure risk, the equipment can preliminarily locate potential faulty modules and possible failure types (such as module damage, reduced power generation efficiency due to dust accumulation, etc.) by combining inspection image features and time-series parameter fluctuations. If the prediction result is low failure risk, the equipment can focus on the changing trend of the corresponding module's operating parameters and conduct continuous monitoring. If there is no failure risk, the equipment can confirm that the module is operating normally and generate a complete failure risk diagnosis report to provide guidance for subsequent maintenance work.

[0057] Furthermore, in order to fully exploit the temporal correlation information and fault risk characteristics in the fusion features and improve the accuracy of risk prediction, in one embodiment, the above step S40 may include: Step S401: Arrange the fused feature vectors corresponding to multiple consecutive time windows in chronological order to obtain a fused feature vector sequence, and input the fused feature vector sequence into the pre-constructed risk prediction model.

[0058] It should be noted that the risk prediction model is configured to construct a causal dilated convolution operation over the input fused feature vector sequence based on a preset convolution kernel length and dilation coefficient. The weight matrix of each convolution kernel is multiplied by the feature vector at the corresponding position, the products are summed, and a bias term is added to obtain the convolution output feature at each time step. The risk prediction model is further configured to combine the convolution output features at each time step in chronological order to obtain the initial input feature sequence of the residual connection module, and to input the initial input feature sequence into the residual connection module to perform multi-layer residual processing and output the target time feature. The risk prediction model is further configured to input the target time feature into the classification layer, and to process the target time feature and the classification layer parameters through the softmax function to output a risk category probability vector. The risk prediction model is further configured to output a risk level category based on the risk category probability vector using the maximum a posteriori principle, and to output the risk level category as the risk prediction result.

[0059] In practical implementation, the diagnostic equipment uses the obtained fusion feature representation as the input for risk prediction, constructs a TCN network composed of multi-layer causal dilated convolution and residual connections, models the time dependence and evolution pattern of the fusion features, outputs the corresponding risk score and generates risk level results, and realizes the prediction and graded early warning of photovoltaic module operation risks.

[0060] Suppose the fused feature sequence obtained within a continuous time window is:

$$F = [f_1, f_2, \dots, f_T] \in \mathbb{R}^{T \times d_f}$$

In the formula, $f_t$ is the fused feature vector corresponding to the t-th time window; $T$ is the length of the input sequence; $d_f$ is the fused feature dimension; and $F$ is the fused feature sequence input to the TCN.

[0061] For the input sequence $F$, the output of the causal dilated convolution at time n can be represented as:

$$y_n = \sum_{m=0}^{\kappa - 1} W_m\, f_{n - m r} + b$$

In the formula, $f_{n - m r}$ is the feature vector of the input sequence at time $n - m r$; $y_n$ is the convolution output feature; $\kappa$ is the kernel length; $r$ is the dilation coefficient, used to expand the receptive field; $W_m$ is the m-th convolution kernel weight matrix; and $b$ is a bias term. Causality is guaranteed by $n - m r \le n$, which ensures that no future information is used.
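
The causal dilated convolution can be sketched directly from its definition: each output at time n is a weighted sum of the inputs at n, n - r, n - 2r, and so on, plus a bias. The sketch below assumes zero padding for positions before the start of the sequence, which the text does not specify; the function name `causal_dilated_conv` and the toy weights are hypothetical.

```python
def causal_dilated_conv(F, W, b, r):
    """Causal dilated convolution over a sequence of feature vectors.

    F: T input vectors of length d_in
    W: list of kernel taps, each a d_out x d_in weight matrix
    b: length-d_out bias vector
    r: dilation coefficient
    Inputs before the start of the sequence are treated as zero (assumed).
    """
    d_out = len(b)
    Y = []
    for n in range(len(F)):
        y = list(b)
        for m, Wm in enumerate(W):
            src = n - m * r  # never exceeds n, so no future input is read
            if src < 0:
                continue
            for o in range(d_out):
                y[o] += sum(Wm[o][i] * F[src][i] for i in range(len(F[src])))
        Y.append(y)
    return Y


# Kernel length 2, dilation 2: y_n = f_n + f_{n-2}.
F = [[1.0], [2.0], [3.0], [4.0], [5.0]]
W = [[[1.0]], [[1.0]]]
b = [0.0]
Y = causal_dilated_conv(F, W, b, 2)
```

Stacking such layers with growing dilation coefficients (for example 1, 2, 4) widens the receptive field exponentially while keeping the kernel short.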

[0062] Let the input of the l-th TCN residual block be $H^{(l-1)}$. Then its output is defined as:

$$H^{(l)} = H^{(l-1)} + \mathcal{F}_l\!\left(H^{(l-1)}\right), \qquad l = 1, \dots, N_r$$

In the formula, $H^{(l)}$ and $H^{(l-1)}$ are the output feature sequences of the l-th and (l-1)-th residual blocks; $N_r$ is the number of residual block layers; and $\mathcal{F}_l$ represents the l-th TCN mapping, composed of causal dilated convolution, nonlinear activation, and regularization, which is used to extract multi-scale temporal patterns. The residual term is used to stabilize training and enhance deep modeling capabilities.
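
The residual update itself is independent of what the inner mapping computes, which the following sketch makes explicit. The inner mapping used here, a ReLU applied to a one-step causal difference, is a deliberately simple hypothetical stand-in for the causal dilated convolution, activation, and regularization stack; the function names are assumptions of this sketch.

```python
def residual_block(H, mapping):
    """Return H + mapping(H), element by element."""
    G = mapping(H)
    if len(G) != len(H) or any(len(g) != len(h) for g, h in zip(G, H)):
        raise ValueError("mapping must preserve the sequence shape")
    return [[h + g for h, g in zip(h_row, g_row)] for h_row, g_row in zip(H, G)]


def toy_mapping(H):
    """Hypothetical stand-in: ReLU of the difference to the previous step."""
    out = []
    for n, row in enumerate(H):
        prev = H[n - 1] if n > 0 else [0.0] * len(row)
        out.append([max(0.0, x - p) for x, p in zip(row, prev)])
    return out


H0 = [[1.0], [3.0], [2.0]]
H1 = residual_block(H0, toy_mapping)

# Several layers are stacked by feeding each output into the next block.
H_deep = H0
for _ in range(3):
    H_deep = residual_block(H_deep, toy_mapping)
```

Because the input is added back unchanged, each block only has to learn a correction to its input, which is what eases gradient flow in deeper stacks.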

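[0063] Extract the feature at the last time step of the TCN output sequence, and output a risk probability vector through a classification layer:

$$\hat{y} = \mathrm{softmax}\!\left(W_c\, h_T + b_c\right)$$

In the formula, $h_T$ is the feature at the last time step of the TCN output sequence; $W_c$ and $b_c$ are the classification layer parameters; and $\hat{y}$ is the risk category probability vector. According to $\hat{y}$, the risk level is output by the maximum-probability category or by a threshold rule.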

[0064] Based on the obtained risk category probability vector, the risk level category is output using the maximum a posteriori principle:

$$\hat{c} = \underset{j \in \{1, \dots, C\}}{\arg\max}\ \hat{y}_j$$

In the formula, $C$ represents the total number of risk level categories; $\hat{y}_j$ is the predicted probability that the sample belongs to the j-th risk level; and $\hat{c}$ is the predicted risk level category, which is used to guide the subsequent issuance of early warnings and the prioritization of operation and maintenance.
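
The classification layer and the maximum a posteriori decision can be sketched as follows. The weights, the bias, and the three level names are hypothetical values chosen for illustration; in the embodiment the classification layer parameters are learned, and the set of risk levels is defined by the application.

```python
import math


def risk_level(h_last, W_c, b_c, labels):
    """Map the last-step feature to (predicted label, probability vector)."""
    logits = [
        sum(w * x for w, x in zip(row, h_last)) + bias
        for row, bias in zip(W_c, b_c)
    ]
    # Numerically stable softmax over the class logits.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Maximum a posteriori decision: pick the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


labels = ["no risk", "low risk", "high risk"]   # hypothetical level names
h_last = [1.0, 2.0]
W_c = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_c = [0.0, 0.0, 0.0]
level, probs = risk_level(h_last, W_c, b_c, labels)
```

A threshold rule, as mentioned above, could replace the arg max when a high-risk warning should be raised even if that class is not the single most probable one.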

[0065] This embodiment ensures the reliability and comprehensiveness of diagnostic data through comprehensive monitoring and preprocessing of multi-source data. Multimodal feature extraction effectively uncovers core features from different modalities, achieving full extraction of multi-dimensional information. Weighted feature fusion enables complementary advantages of various modal features, solving the problem of one-sided information from a single data source and improving the representativeness and effectiveness of features. Targeted risk prediction models and fault diagnosis processes enable accurate prediction and diagnosis of photovoltaic array fault risks. Because this invention utilizes multimodal data fusion and time-series modeling, it fully leverages the effective contributions of multi-source information to accurately characterize risk coupling evolution and multi-scale features, improving the interpretability and traceability of latent risks. This achieves stable identification and graded early warning of photovoltaic array fault risks, effectively improving the accuracy and efficiency of photovoltaic array fault risk diagnosis, reducing the probability of missed or misdiagnosed faults, and minimizing the manpower and material costs of blind inspections. Simultaneously, it can predict fault risks in advance, guiding maintenance personnel to carry out timely maintenance work, improving the operational stability and power generation reliability of photovoltaic arrays, and effectively uncovering potential fault characteristics during photovoltaic array operation, providing technical support for the efficient operation and maintenance of photovoltaic power plants.

[0066] Refer to Figure 3, which is a flowchart illustrating the second embodiment of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion of the present invention.

[0067] Based on the first embodiment described above, in this embodiment, step S10 further includes: Step S101: Monitor the time series data of electrical parameters in the operation and management system of the target photovoltaic power station, perform structured processing on the time series data of electrical parameters, and perform data cleaning processing on the structured time series data of electrical parameters to obtain electrical parameter sequence samples.

[0068] In practical implementation, the long-term time-series monitoring data recorded in the operation and management system of the target photovoltaic power station is used as the data source. A portion of the photovoltaic modules within the power station are selected as the analysis objects. Operating electrical parameters are extracted and organized according to the "equipment identifier—timestamp—parameter sequence" method to form the electrical parameter time-series sample set required for subsequent multimodal feature extraction and fusion modeling. The electrical parameter data originates from the online monitoring records of the operation and management system. By mapping the monitoring point number or module number, the electrical parameter curves within the corresponding time period are read, and the sequences are indexed and stored using a unified timestamp. Simultaneously, quality control processing is performed on the extracted data, removing missing values, incomplete records, and data with abnormal formats. The cleaned electrical parameter sequences are organized into samples according to a unified time window to facilitate time correlation with subsequent environmental monitoring data and inspection image data.
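
The structuring and cleaning of electrical records can be sketched as follows. The record layout (dictionary keys `device_id`, `ts`, `voltage`, `current`, `power`), the use of integer epoch seconds, and the 600-second window are assumptions made for this example, and the function name `build_windows` is hypothetical.

```python
from collections import defaultdict


def build_windows(records, window_s):
    """Group clean records by (device_id, window start).

    Records with a missing identifier, timestamp, or parameter are dropped.
    Each window holds its rows sorted by timestamp.
    """
    groups = defaultdict(list)
    for rec in records:
        values = (rec.get("voltage"), rec.get("current"), rec.get("power"))
        if rec.get("device_id") is None or rec.get("ts") is None or None in values:
            continue  # incomplete record: removed during cleaning
        start = rec["ts"] - rec["ts"] % window_s
        groups[(rec["device_id"], start)].append((rec["ts"],) + values)
    return {key: sorted(rows) for key, rows in groups.items()}


# Hypothetical monitoring records; the second one is incomplete.
records = [
    {"device_id": "PV-01", "ts": 1200, "voltage": 30.1, "current": 8.0, "power": 240.8},
    {"device_id": "PV-01", "ts": 1500, "voltage": 30.0, "current": None, "power": None},
    {"device_id": "PV-01", "ts": 1260, "voltage": 30.2, "current": 8.1, "power": 244.6},
    {"device_id": "PV-01", "ts": 1800, "voltage": 29.9, "current": 7.9, "power": 236.2},
]
windows = build_windows(records, 600)
```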

[0069] In terms of electrical parameter types, diagnostic equipment can extract the following electrical parameters that are directly related to the output status of components and strings: voltage, current, and power. Voltage and current are used to characterize changes in output characteristics, while power is used to characterize output capability and abnormal degradation trends, providing basic inputs for subsequent risk assessment and early warning modeling.

[0070] Step S102: Monitor the environmental monitoring time series data of the environmental monitoring points in the operation scenario of the target photovoltaic power station, perform structured processing on the environmental monitoring time series data, and perform data cleaning processing on the structured environmental monitoring time series data to obtain environmental parameter sequence samples.

[0071] In practical implementation, environmental monitoring records in the operation scenario of the target photovoltaic power station are used as the data source. Environmental monitoring data closely related to component output fluctuations are extracted from the time-series monitoring data recorded by the power station operation and management system and organized in the format "measuring point identifier, timestamp, environmental parameter sequence". This data is then linked with the electrical parameter data from step S101 using a unified time index, forming an environmental time-series sample set for subsequent multimodal feature extraction and fusion modeling. Based on the association between environmental monitoring points and component or string objects, the environmental parameter curves within the corresponding time period are read, and the sequences are indexed and stored using a unified timestamp. During the sample processing stage, quality control is also performed on the environmental time-series data: missing values, incomplete records, and data with abnormal formats are removed to ensure the validity and consistency of subsequent modeling inputs.

[0072] Regarding the types of environmental monitoring parameters, diagnostic equipment can extract the following environmental time-series quantities: temperature, irradiance, wind speed, and wind direction. Temperature and irradiance are used to characterize changes in the heating and light conditions of components, while wind speed and wind direction are used to characterize the intensity of environmental disturbances and their impact on fluctuations in operating status, thereby providing external driving information related to changes in operating conditions for risk assessment.

[0073] Step S103: Obtain inspection image data of the target photovoltaic power station, perform feature analysis on the inspection image data, and extract candidate image samples related to defects in the photovoltaic array components from the inspection image data.

[0074] In practical implementation, the inspection image data of the target photovoltaic power station is used as the data source. Image samples related to the characterization of component defects are extracted, and the image samples are bound with component object and time information to form an image modality sample set that can be associated with electrical parameter sequence samples and environmental parameter sequence samples. The inspection image data consists of visible light inspection images, which are used to capture appearance features such as surface defects, contamination, and shading.

[0075] Step S104: Bind the candidate image sample with the component identifier and timestamp identifier of the associated photovoltaic array component to generate an initial image modal sample, and perform data cleaning processing on the initial image modal sample to obtain the target image modal sample.

[0076] Understandably, during image sample processing, the diagnostic equipment can attach or extract the corresponding timestamp and component identification information for each image. Based on the timestamps and identification information, it can establish a correspondence between each image and the electrical parameter time-series segments and environmental time-series segments within the same time window, thereby ensuring that data from different modalities can be consistently correlated within the same event window. Quality control is also performed on the image data: missing, damaged, or abnormally formatted image records are removed to ensure that the image samples can be used for subsequent feature extraction and fusion evaluation.

[0077] Step S105: Construct a basic data sample set based on electrical parameter sequence samples, environmental parameter sequence samples, target image modal samples, and the time information of each sample.

[0078] In practical implementation, the diagnostic equipment can extract the time information (time stamps) of electrical parameter sequence samples, environmental parameter sequence samples, and target image modal samples, and align the three types of samples according to the timestamps to ensure that there are complete electrical parameters, environmental parameters, and image samples at the same timestamp (if there is no image sample at a certain timestamp, it is marked as "no corresponding image", which does not affect the integration of other samples); the time alignment accuracy is controlled within 10 minutes (consistent with the data acquisition frequency) to avoid sample association errors caused by time deviation.
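
The timestamp alignment with a 10-minute tolerance and the "no corresponding image" marker can be sketched as follows. The sketch assumes timestamps are integer epoch seconds and that the image index is sorted by time; the function name `align_images` and the identifiers in the example are hypothetical.

```python
from bisect import bisect_left

TOLERANCE_S = 600                      # 10 minutes, as stated in the text
NO_IMAGE = "no corresponding image"


def align_images(sample_ts, image_index):
    """Map each sample timestamp to the nearest image within tolerance.

    image_index: list of (timestamp, image_id), sorted by timestamp.
    """
    stamps = [ts for ts, _ in image_index]
    aligned = {}
    for ts in sample_ts:
        pos = bisect_left(stamps, ts)
        # Only the neighbours on either side of ts can be the nearest image.
        candidates = image_index[max(0, pos - 1): pos + 1]
        best = min(candidates, key=lambda item: abs(item[0] - ts), default=None)
        if best is not None and abs(best[0] - ts) <= TOLERANCE_S:
            aligned[ts] = best[1]
        else:
            aligned[ts] = NO_IMAGE
    return aligned


image_index = [(1000, "img_a"), (5000, "img_b")]
aligned = align_images([900, 3000, 5400], image_index)
```

Samples marked with the placeholder keep their electrical and environmental parameters, so the absence of an image does not block the integration of the other modalities.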

[0079] Using timestamps as the core association basis, electrical parameter sequence samples and environmental parameter sequence samples under the same timestamp are associated with target image modal samples of the corresponding component identifiers; at the same time, sample association information (such as component identifier, inspection area, monitoring point number) is supplemented to ensure that each integrated sample contains electrical parameters, environmental parameters and appearance image information of a certain component (or area) of the photovoltaic array at a certain time.

[0080] All the integrated samples are summarized, and invalid sample entries with association errors or serious data missingness are removed; the samples are classified and labeled (electrical parameters, environmental parameters, images), and a unified sample storage standard is established to form a complete set of basic data samples; the sample set is verified to ensure the integrity, accuracy and correlation of the samples, and the sample collection period and coverage are labeled to facilitate subsequent model calling and maintenance.
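The alignment and integration described in paragraphs [0078] to [0080] can be illustrated with a minimal sketch. It assumes each record is a dictionary carrying a timestamp, and it pairs records by nearest timestamp within the 10-minute tolerance stated above. The function names (`nearest_within`, `build_sample_set`), the field names (`voltage`, `irradiance`, and so on), and the component identifier `PV-01` are hypothetical and are not taken from the application.

```python
from datetime import datetime, timedelta

# Tolerance taken from the text: alignment accuracy is controlled within 10 minutes.
TOLERANCE = timedelta(minutes=10)
NO_IMAGE = "no corresponding image"


def nearest_within(target, candidates, tolerance=TOLERANCE):
    """Return the record closest in time to `target`, or None if none is within tolerance."""
    best = None
    for record in candidates:
        gap = abs(record["timestamp"] - target)
        if gap <= tolerance and (best is None or gap < abs(best["timestamp"] - target)):
            best = record
    return best


def build_sample_set(electrical, environmental, images):
    """Join the three modalities on timestamp, and on component identifier for images."""
    samples = []
    for elec in electrical:
        env = nearest_within(elec["timestamp"], environmental)
        if env is None:
            continue  # assumption: an entry without environmental data counts as seriously incomplete
        same_component = [im for im in images if im["component_id"] == elec["component_id"]]
        img = nearest_within(elec["timestamp"], same_component)
        samples.append({
            "timestamp": elec["timestamp"],
            "component_id": elec["component_id"],
            "electrical": elec["values"],
            "environmental": env["values"],
            # A missing image is marked and does not block integration of the other samples.
            "image": img["path"] if img is not None else NO_IMAGE,
        })
    return samples


# Hypothetical records for illustration only.
t0 = datetime(2026, 1, 1, 12, 0)
electrical = [
    {"timestamp": t0, "component_id": "PV-01", "values": {"voltage": 31.2, "current": 8.4}},
    {"timestamp": t0 + timedelta(minutes=20), "component_id": "PV-01", "values": {"voltage": 30.9, "current": 8.1}},
]
environmental = [
    {"timestamp": t0 + timedelta(minutes=2), "values": {"irradiance": 820.0, "temperature": 24.5}},
    {"timestamp": t0 + timedelta(minutes=21), "values": {"irradiance": 805.0, "temperature": 24.7}},
]
images = [
    {"timestamp": t0 + timedelta(minutes=3), "component_id": "PV-01", "path": "pv01_1203.jpg"},
]
samples = build_sample_set(electrical, environmental, images)
```

In this sketch the second electrical record has no image within tolerance, so it is kept and marked "no corresponding image", matching the behavior described in paragraph [0078].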

[0081] This embodiment achieves accurate acquisition, standardized processing, targeted cleaning, and data integration of multi-source data from three dimensions: electrical parameters, environmental parameters, and inspection images. This ensures that different modal samples can correspond to the operating status of the photovoltaic array at the same time, providing a foundation for subsequent analysis of the correlation between electrical parameters, environmental parameters, and component appearance. Sample association and integration can organically combine multi-dimensional data, break down the information barriers of single-modal data, realize the collaborative use of data, form a unified, standardized, and high-quality data source, avoid the interference of inferior samples and invalid data on subsequent feature extraction and risk prediction, ensure the accuracy and reliability of subsequent model inputs, and lay a solid data foundation for the entire photovoltaic array fault risk diagnosis process.

[0082] Refer to Figure 4, which is a flowchart illustrating the third embodiment of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion of the present invention.

[0083] Based on the above embodiments, in this embodiment, step S30 further includes: Step S301: Calculate the probability vector of each modality feature for different risk levels using the modality classification head.

[0084] Understandably, based on the obtained temporal and image features, the effective information contribution of each modal feature to fault risk judgment is calculated, and an adaptive fusion weight is generated accordingly. The multimodal features are then weighted and fused to form a unified fusion representation.

[0085] To address the issue of varying information utility of different modalities on different samples during multimodal feature fusion, this scheme quantifies the proportion of effective information of each modality based on its information contribution, and adaptively allocates fusion weights accordingly to form a unified fusion feature representation for subsequent fault risk identification and graded early warning.

[0086] Assume there are $M$ modalities in total. For any input sample, the feature of the $m$-th modality is denoted $f_m$. On this basis, in order to characterize the effective information contribution of each modal feature to fault risk assessment, a modal classification head $h_m(\cdot)$ is set for the $m$-th modality, which outputs that modality's probability vector over the $R$ risk levels. Refer to the following formula:

$$p^{(m)} = h_m(f_m) = \left[p^{(m)}_1, p^{(m)}_2, \dots, p^{(m)}_R\right]$$

where $h_m(\cdot)$ represents the modality classification head function, $f_m$ represents the feature vector of the $m$-th modality, $R$ represents the number of risk levels, $p^{(m)}$ represents the probability vector of the $m$-th modality over the $R$ risk levels, and $p^{(m)}_j$ represents the probability value of the $m$-th modality for the $j$-th risk level.

[0087] Step S302: Determine the information contribution supervision quantity of each modality feature based on the probability vector.

[0088] It can be understood that, during the training phase, the one-hot vector of the true label of the sample is denoted $y$, and the "information contribution" supervision quantity of the $m$-th modality is defined as its support probability for the true category, as shown in the following formula:

$$s_m = y^{\top} p^{(m)}$$

where $s_m$ represents the information contribution supervision quantity of the $m$-th modal feature, $p^{(m)}$ represents the probability vector over the risk levels output by the classification head of the $m$-th modality, $y$ represents the one-hot vector corresponding to the true label of the input sample, and $y^{\top}$ represents the transpose of the one-hot vector. The larger $s_m$ is, the stronger the modality's support for the true risk category in the current sample, reflecting more complete and reliable information content; conversely, the smaller $s_m$ is, the more uncertain the modality is and the lower its contribution of effective information.
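The supervision quantity of Step S302 can be illustrated with a short worked sketch: for a one-hot label, the inner product of the label vector and a modality's probability vector is simply the probability that modality assigns to the true risk level. The sketch assumes the classification head ends in a softmax, which the text does not state, and the logits and the three risk levels are hypothetical values.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability vector (assumed form of the head output)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contribution_supervision(prob_vector, true_level):
    """Inner product of the one-hot label vector and the probability vector."""
    one_hot = [1.0 if j == true_level else 0.0 for j in range(len(prob_vector))]
    return sum(y * p for y, p in zip(one_hot, prob_vector))


# Hypothetical head outputs for two modalities over three risk levels.
temporal_probs = softmax([2.0, 0.5, 0.1])  # confident in level 0
image_probs = softmax([0.2, 0.3, 0.1])     # nearly uniform, uncertain
true_level = 0

s_temporal = contribution_supervision(temporal_probs, true_level)
s_image = contribution_supervision(image_probs, true_level)
```

Here the temporal modality supports the true level more strongly than the image modality, so it receives the larger supervision quantity for this sample.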

[0089] Step S303: Construct a contribution mapping function and a contribution analysis network. Input each modal feature into the contribution mapping function for mapping to obtain a discriminant representation related to the information contribution. Then input the discriminant representation into the contribution analysis network to output the contribution estimate.

[0090] It should be understood that, because the true label $y$ cannot be obtained during the inference stage, a network $g(\cdot)$ is constructed so that contributions can be calculated online. It takes $\phi(f_m)$ as input and outputs the contribution estimate $\hat{s}_m$. Refer to the following formula:

$$\hat{s}_m = g\left(\phi(f_m)\right)$$

where $\hat{s}_m$ represents the contribution estimate of the $m$-th modal feature $f_m$; $\phi(\cdot)$ represents the contribution mapping function, used to map the $m$-th modal feature and extract a discriminant representation related to its information contribution; and $g(\cdot)$ represents the contribution analysis network, which outputs the contribution of the modality in the current sample and provides a basis for the subsequent normalization that generates adaptive fusion weights.

[0091] Step S304: Iterate the contribution estimate and the information contribution supervision quantity through regression loss so that the contribution estimate approaches the information contribution supervision quantity. At the end of the iteration, output the effective information contribution.

[0092] In the specific implementation, a regression loss is used during the training phase to make the contribution estimate of each modality approach its information contribution supervision quantity.

Step S305: Normalize the contribution based on the effective information contribution of each modal feature to generate normalized weight coefficients for each modal feature.

[0093] In practical implementation, to ensure that the effects of different modalities are comparable during fusion, the contribution of each modality is normalized into a weighting coefficient.

Step S306: Based on the normalized weight coefficients, the multimodal features are weighted and fused to obtain a fused feature vector.

[0094] In the specific implementation, after the weight allocation is completed, the deep features of each modality are weighted and fused to obtain a unified fused feature representation. This weighted fusion method highlights high-contribution modalities at the feature level while suppressing interference from low-contribution modalities, which helps improve the discrimination ability of subsequent models in complex scenarios.
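Steps S304 to S306 can be sketched as follows. The formulas for the regression loss, the normalization, and the fusion are not reproduced in this text, so the sketch makes three labeled assumptions: a mean squared error loss, normalization by dividing each contribution by the sum of contributions (a softmax would be another option), and fusion as a weighted sum of feature vectors that already share one dimension. All function names and numeric values are hypothetical.

```python
def regression_loss(estimates, targets):
    """Assumed loss: mean squared error between contribution estimates and supervision quantities."""
    return sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(estimates)


def normalize_contributions(contributions):
    """Assumed normalization: divide each contribution by the total so the weights sum to 1."""
    total = sum(contributions)
    if total <= 0:
        # Fall back to equal weights when no modality carries usable information.
        return [1.0 / len(contributions)] * len(contributions)
    return [c / total for c in contributions]


def weighted_fusion(features, weights):
    """Assumed fusion: element-wise weighted sum of same-length feature vectors."""
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all modal features must share one dimension before fusion")
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


# Hypothetical values: contribution estimates, their supervision targets, and modal features.
estimates = [0.75, 0.25]          # temporal, image
targets = [0.7, 0.3]
temporal_feature = [1.0, 2.0]
image_feature = [3.0, 4.0]

loss = regression_loss(estimates, targets)
weights = normalize_contributions(estimates)
fused = weighted_fusion([temporal_feature, image_feature], weights)
```

With these values the temporal modality receives three times the weight of the image modality, so the fused vector lies closer to the temporal feature. In practice the temporal and image encoders may emit vectors of different lengths, in which case a projection to a common dimension would be needed first; that step is outside this sketch.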

[0095] This embodiment achieves accurate quantification of effective information contribution and efficient fusion of multimodal features. Specifically, it includes: combining the synergistic effects of modality classification heads, contribution mapping functions, and contribution analysis networks to transform the abstract "effective information contribution" into a quantifiable indicator, solving the core problems of blind weight allocation and difficulty in measuring information contribution in traditional multimodal fusion; significantly improving the accuracy of effective information contribution through iterative optimization of regression loss, ensuring that the allocation of weight coefficients can truly reflect the information value of various modal features, and avoiding the decline in fusion effect caused by weight bias; and achieving deep fusion of temporal and image modal features through standardized normalization processing and weighted fusion calculation, highlighting effective information, suppressing redundant information, realizing the complementary advantages of the two types of features, and solving the problem of one-sided information from a single modality.

[0096] Furthermore, this embodiment of the invention also proposes a computer-readable storage medium storing a photovoltaic array fault risk diagnosis program based on multi-mode data fusion. When the photovoltaic array fault risk diagnosis program based on multi-mode data fusion is executed by a processor, it implements the steps of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described above.

[0097] The computer-readable storage medium provided in this application may be, for example, a USB flash drive, but is not limited to this; it may be an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.

[0098] The aforementioned computer-readable storage medium may be included in a photovoltaic array fault risk diagnosis device based on multi-mode data fusion; or it may exist independently and not be assembled into a photovoltaic array fault risk diagnosis device based on multi-mode data fusion.

[0099] Furthermore, this invention also proposes a computer program product, including a photovoltaic array fault risk diagnosis program based on multi-mode data fusion. When the photovoltaic array fault risk diagnosis program based on multi-mode data fusion is executed by a processor, it implements the steps of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described above.

[0100] The specific implementation of the computer program product of the present invention is basically the same as the embodiments of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion described above, and will not be repeated here.

[0101] Refer to Figure 5, which is a structural block diagram of the first embodiment of the photovoltaic array fault risk diagnosis device based on multi-mode data fusion of the present invention.

[0102] As shown in Figure 5, the photovoltaic array fault risk diagnosis device based on multi-mode data fusion proposed in this embodiment of the invention includes: the data processing module 10, used to monitor the target photovoltaic power station and preprocess the multi-source monitoring data to construct a basic data sample set, where the multi-source monitoring data includes the electrical parameter time series data, environmental monitoring time series data, and inspection image data of the target photovoltaic power station; the feature extraction module 20, used to input the basic data sample set into a pre-constructed multimodal data feature extraction model and output multimodal features, the multimodal features including temporal feature vectors and image feature vectors; the feature fusion module 30, used to calculate the effective information contribution of each modality feature, and to perform weighted fusion of the multimodal features based on the effective information contribution to obtain a fused feature vector; and the risk prediction module 40, used to input the fused feature vector into a pre-built risk prediction model, output the risk prediction result, and perform fault risk diagnosis on the photovoltaic array components of the target photovoltaic power station based on the risk prediction result, where the risk prediction model is a temporal convolutional network composed of multi-layer causal dilated convolution and residual connection modules and is configured to model and analyze the time dependence and evolution pattern of the fused feature vector.
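The structure of the risk prediction model used by module 40 (causal dilated convolution followed by residual connections) can be illustrated with a deliberately simplified sketch. It operates on a sequence of scalars rather than fused feature vectors, so each kernel tap is a single weight instead of a weight matrix. The ReLU activation, the doubling of the dilation coefficient per layer, and the kernel values are assumptions made for illustration and are not specified in this text.

```python
def causal_dilated_conv(sequence, kernel, dilation, bias=0.0):
    """y[t] = bias + sum_k kernel[k] * x[t - k * dilation]; inputs before t = 0 count as zero."""
    out = []
    for t in range(len(sequence)):
        acc = bias
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # causal: only the current and earlier time steps are read
                acc += weight * sequence[idx]
        out.append(acc)
    return out


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(sequence, kernel, dilation):
    """Add the block input back onto the activated convolution output."""
    conv = relu(causal_dilated_conv(sequence, kernel, dilation))
    return [x + c for x, c in zip(sequence, conv)]


def tcn(sequence, kernel, num_layers):
    """Stack residual blocks; the dilation doubles per layer (assumed schedule)."""
    out = sequence
    for layer in range(num_layers):
        out = residual_block(out, kernel, dilation=2 ** layer)
    return out


# Hypothetical fused feature sequence over four consecutive time windows.
fused_sequence = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5]
temporal_features = tcn(fused_sequence, kernel, num_layers=2)
```

Because every output at time step t depends only on inputs at or before t, changing the last window of the sequence leaves the earlier outputs unchanged, which is the property that makes the model suitable for forward-looking risk warning.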

[0103] This embodiment ensures the reliability and comprehensiveness of diagnostic data through comprehensive monitoring and preprocessing of multi-source data. Multimodal feature extraction effectively uncovers core features from different modalities, achieving full extraction of multi-dimensional information. Weighted feature fusion enables complementary advantages of various modal features, solving the problem of one-sided information from a single data source and improving the representativeness and effectiveness of features. Targeted risk prediction models and fault diagnosis processes enable accurate prediction and diagnosis of photovoltaic array fault risks. Because this invention utilizes multimodal data fusion and time-series modeling, it fully leverages the effective contributions of multi-source information to accurately characterize risk coupling evolution and multi-scale features, improving the interpretability and traceability of latent risks. This achieves stable identification and graded early warning of photovoltaic array fault risks, effectively improving the accuracy and efficiency of photovoltaic array fault risk diagnosis, reducing the probability of missed or misdiagnosed faults, and minimizing the manpower and material costs of blind inspections. Simultaneously, it can predict fault risks in advance, guiding maintenance personnel to carry out timely maintenance work, improving the operational stability and power generation reliability of photovoltaic arrays, and effectively uncovering potential fault characteristics during photovoltaic array operation, providing technical support for the efficient operation and maintenance of photovoltaic power plants.

[0104] The photovoltaic array fault risk diagnosis device based on multi-mode data fusion provided in this application adopts the photovoltaic array fault risk diagnosis method based on multi-mode data fusion in the above embodiments, and can solve the technical problems of photovoltaic array fault risk diagnosis based on multi-mode data fusion. Compared with the prior art, the beneficial effects of the photovoltaic array fault risk diagnosis device based on multi-mode data fusion provided in this application are the same as the beneficial effects of the photovoltaic array fault risk diagnosis method based on multi-mode data fusion provided in the above embodiments, and other technical features in the photovoltaic array fault risk diagnosis device based on multi-mode data fusion are the same as the features disclosed in the method of the above embodiments, and will not be repeated here.

[0105] It should be understood that the above are merely illustrative examples and do not constitute any limitation on the technical solutions of the present invention. In specific applications, those skilled in the art can make settings as needed, and the present invention does not impose any restrictions on this.

[0106] It should be noted that the workflow described above is merely illustrative and does not limit the scope of protection of this invention. In practical applications, those skilled in the art can select some or all of the workflow to achieve the purpose of this embodiment according to actual needs, and no restrictions are imposed here.

[0107] In addition, for technical details not described in detail in this embodiment, please refer to the photovoltaic array fault risk diagnosis method based on multi-mode data fusion provided in any embodiment of the present invention, which will not be repeated here.

[0108] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.

[0109] It should be noted that the user information (including but not limited to user device information, user personal information, user location information, user behavior information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0110] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0111] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as read-only memory / random access memory, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of the present invention.

[0112] The above are merely preferred embodiments of the present invention and do not limit the scope of the patent. Any equivalent structural or procedural transformations made based on the description and drawings of the present invention, or direct or indirect applications in other related technical fields, are similarly included within the scope of patent protection of the present invention.

Claims

1. A method for diagnosing photovoltaic array fault risks based on multi-mode data fusion, characterized in that, The method includes: Data monitoring is performed on the target photovoltaic power station, and the multi-source monitoring data is preprocessed to construct a basic data sample set. The multi-source monitoring data includes the time series data of the electrical parameters of the target photovoltaic power station, the time series data of environmental monitoring, and the inspection image data. The basic data sample set is input into a pre-constructed multimodal data feature extraction model, and multimodal features are output, including temporal feature vectors and image feature vectors. Calculate the effective information contribution of each modality feature, and perform weighted fusion of the multimodal features based on the effective information contribution to obtain a fused feature vector; The fused feature vector is input into a pre-built risk prediction model, which outputs risk prediction results. Based on the risk prediction results, the photovoltaic array components of the target photovoltaic power station are diagnosed for fault risk. The risk prediction model is a temporal convolutional network composed of multi-layer causal dilated convolution and residual connection modules. The risk prediction model is configured to model and analyze the time dependence and evolution pattern of the fused feature vector.

2. The photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described in claim 1, characterized in that, The process involves data monitoring of the target photovoltaic power station, preprocessing multi-source monitoring data, and constructing a basic data sample set, including: The system monitors the time-series data of electrical parameters in the operation and management system of the target photovoltaic power station, performs structured processing on the time-series data of electrical parameters, and performs data cleaning processing on the structured time-series data of electrical parameters to obtain electrical parameter sequence samples. Environmental monitoring time-series data of environmental monitoring points in the operation scenario of the target photovoltaic power station are monitored, the environmental monitoring time-series data are structured, and the structured environmental monitoring time-series data are cleaned to obtain environmental parameter sequence samples. Obtain inspection image data of the target photovoltaic power station, perform feature analysis on the inspection image data, and extract candidate image samples related to defects in the photovoltaic array components from the inspection image data; The candidate image samples are bound to the component identifiers and timestamp identifiers of the associated photovoltaic array components to generate initial image modal samples. The initial image modal samples are then cleaned to obtain target image modal samples. A basic data sample set is constructed based on electrical parameter sequence samples, environmental parameter sequence samples, target image modal samples, and the time information of each sample.

3. The photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described in claim 2, characterized in that, The multimodal data feature extraction model includes a temporal feature extraction module and a visual feature extraction module. The temporal feature extraction module is constructed based on the temporal encoder of the Informer model, and the visual feature extraction module is constructed based on the image encoder of the visual Transformer model. The time-series feature extraction module is configured to encode the electrical parameter sequence samples and environmental parameter sequence samples in the basic data sample set using a sparse attention mechanism, and output a time-series feature vector. The visual feature extraction module is configured to encode the target image modality samples in the basic data sample set using a self-attention mechanism and output an image feature vector.

4. The photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described in claim 3, characterized in that the time-series feature extraction module is further configured to concatenate the input electrical parameter sequence sample and environmental parameter sequence sample according to the time dimension to obtain a joint time-series data sample at the same time scale, and to perform a linear transformation on the joint time-series data sample to obtain a query vector matrix, a key vector matrix and a value vector matrix;

the temporal feature extraction module is further configured to perform local attention calculation based on the query vector matrix, key vector matrix, and value vector matrix using a sparse attention mechanism to obtain the local attention probability distribution at each time step, and generate global attention features based on the local attention probability distribution, as shown in the following formula:

$$p_i = \mathrm{softmax}\left(\frac{q_i K^{\top}}{\sqrt{d}}\right), \qquad A = \left[p_1; p_2; \dots; p_L\right]$$

where $A$ represents the global attention features; $Q$, $K$ and $V$ represent the query vector matrix, key vector matrix, and value vector matrix, respectively; $p_i$ represents the local attention probability distribution over all key vectors at the $i$-th time step; $\top$ indicates transpose; $q_i$ represents the row vector of the query matrix $Q$ corresponding to the $i$-th time step; $d$ represents the dimension of the attention feature space, used to scale the query-key product; and $\mathrm{softmax}$ represents the activation function, used to transform its input into a probability distribution representation;

the temporal feature extraction module is further configured to calculate the information entropy of the attention probability distribution at each time step based on the global attention features, and to filter the sparse position set according to the information entropy, as shown in the following formula:

$$H(p_i) = -\sum_{j=1}^{L} p_{i,j} \log p_{i,j}, \qquad S = \underset{S' :\, |S'| = u}{\arg\min} \sum_{i \in S'} H(p_i), \qquad u = \lfloor c \ln L \rfloor$$

where $H(p_i)$ represents the information entropy of the attention probability distribution $p_i$ at the $i$-th time position; $p_{i,j}$ represents the probability value in $p_i$ corresponding to the $j$-th key vector; $L$ represents the total number of key vectors in the key matrix; $S$ represents the set of sparse positions obtained by filtering based on information entropy; $\arg\min$ selects the positions for which the corresponding variable takes its minimum value; $|S|$ represents the number of elements of $S$, and $u$ represents the size of the sparse position set; $c$ represents the sampling factor constant; and $\lfloor \cdot \rfloor$ represents the floor function;

the temporal feature extraction module is further configured to determine the attention weight vector of the sparse positions based on the sparse position set and the global attention features, and to determine the temporal feature vector of the sparse positions based on the attention weight vector, the temporal feature vector of the non-sparse positions being filled with the mean value, as shown in the following formula:

$$a_i = p_i \ (i \in S), \qquad o_i = \begin{cases} \sum_{j=1}^{L} a_{i,j}\, v_j, & i \in S \\ \frac{1}{L} \sum_{j=1}^{L} v_j, & i \notin S \end{cases}$$

where $a_i$ represents the attention weight vector at the $i$-th time position; $a_{i,j}$ represents the weight value in $a_i$ corresponding to the $j$-th key; $v_j$ represents the row vector of the value matrix $V$ corresponding to the $j$-th time position; and $o_i$ represents the temporal feature vector output at the $i$-th time position.

5. The photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described in claim 4, characterized in that the visual feature extraction module is further configured to divide the input target image modal sample into multiple non-overlapping image blocks according to a preset size, flatten the multiple non-overlapping image blocks into image vectors, and concatenate the image vectors in sequence to obtain an image block sequence matrix;

the visual feature extraction module is further configured to define a linear embedding matrix and a bias based on the preset size and the number of channels of the target image modality samples, and to map the image block sequence matrix to the image feature space of the visual Transformer model based on the linear embedding matrix and the bias to generate embedding representation vectors, as shown in the following formula:

$$e_k = x_k E + b, \qquad E \in \mathbb{R}^{(P^2 C) \times D}, \qquad k = 1, \dots, N$$

where $e_k$ represents the $k$-th embedding vector; $E$ represents the linear embedding matrix; $b$ represents the bias; $D$ represents the embedding dimension; $x_k$ represents the image vector obtained by flattening the $k$-th image block; $C$ represents the number of channels of the target image modal samples; $P$ represents the preset size, so that $P^2 C$ is the length of the image vector; $(P^2 C) \times D$ represents the dimension of the linear embedding matrix; and $N$ represents the total number of image blocks;

the visual feature extraction module is further configured to initialize and generate position encoding vectors and a category label vector based on the embedding dimension, and generate sequence features based on the position encoding vectors, the category label vector, and the embedding representation vectors, as shown in the following formula:

$$Z_0 = \left[x_{\mathrm{cls}};\; e_1 + \rho_1;\; \dots;\; e_N + \rho_N\right] \in \mathbb{R}^{(N+1) \times D}$$

where $Z_0$ represents the sequence features; $\rho_k$ represents the encoding vector at the $k$-th position; $x_{\mathrm{cls}}$ represents the category label vector; $N+1$ represents the total length of the category label vector plus the image block sequence; and $(N+1) \times D$ represents the dimension of the sequence features;

the visual feature extraction module is further configured to input the sequence features into a multilayer image encoder for feature encoding and output an image feature vector, the interlayer mapping of the multilayer image encoder being defined as follows:

$$Z_i = \mathrm{Enc}(Z_{i-1}), \qquad i = 1, \dots, N_L$$

where $Z_i$ represents the image feature vector output by the $i$-th layer of the image encoder; $N_L$ represents the number of encoder layers; and $\mathrm{Enc}(\cdot)$ represents an encoder block composed of a multi-head self-attention module and a feedforward network, used to model global correlation features between different image blocks and extract image representation features.

6. The photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described in any one of claims 1 to 5, characterized in that the calculation of the effective information contribution of each modal feature, and the weighted fusion of the multimodal features based on the effective information contribution to obtain a fused feature vector, includes:

the probability vectors of each modality feature for different risk levels are calculated using the modality classification head, referring to the following formula:

$$p^{(m)} = h_m(f_m) = \left[p^{(m)}_1, p^{(m)}_2, \dots, p^{(m)}_R\right]$$

where $h_m(\cdot)$ represents the modality classification head function; $f_m$ represents the feature vector of the $m$-th modality; $R$ represents the number of risk levels; $p^{(m)}$ represents the probability vector of the $m$-th modality over the $R$ risk levels; and $p^{(m)}_j$ represents the probability value of the $m$-th modality for the $j$-th risk level;

the information contribution supervision quantity of each modality feature is determined based on the probability vector, referring to the following formula:

$$s_m = y^{\top} p^{(m)}$$

where $s_m$ represents the information contribution supervision quantity of the $m$-th modal feature; $y$ represents the one-hot vector corresponding to the true label of the input sample; and $y^{\top}$ represents the transpose of the one-hot vector;

a contribution mapping function and a contribution analysis network are constructed, the modal features are input into the contribution mapping function for mapping to obtain a discriminant representation related to information contribution, and this discriminant representation is then input into the contribution analysis network to output a contribution estimate, as shown in the following formula:

$$\hat{s}_m = g\left(\phi(f_m)\right)$$

where $\hat{s}_m$ represents the contribution estimate of the $m$-th modal feature; $\phi(\cdot)$ represents the contribution mapping function, used to extract discriminant representations related to information contribution; and $g(\cdot)$ represents the contribution analysis network;

the contribution estimate and the information contribution supervision quantity are iterated through regression loss to make the contribution estimate approach the information contribution supervision quantity, and at the end of the iteration the effective information contribution is output;

based on the effective information contribution of each modal feature, the contribution is normalized to generate the normalized weight coefficient of each modal feature;

the multimodal features are weighted and fused based on the normalized weight coefficients to obtain a fused feature vector.

7. The photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described in any one of claims 1 to 5, characterized in that, The step of inputting the fused feature vector into a pre-built risk prediction model and outputting risk prediction results includes: The fused feature vectors corresponding to multiple consecutive time windows are arranged in chronological order to obtain a fused feature vector sequence, and the fused feature vector sequence is input into a pre-constructed risk prediction model; The risk prediction model is configured to construct a causal dilated convolution operation based on a preset convolution kernel length and dilation coefficient for the input fused feature vector sequence. The model performs weighted calculations by weighting the feature vectors at corresponding positions through the weight matrix of each convolution kernel and adding a bias term to obtain the convolution output features at each time step. The risk prediction model is further configured to combine the convolutional output features at each time step in chronological order to obtain the initial input feature sequence of the residual connection module, and input the initial input feature sequence into the residual connection module to perform multi-layer residual processing and output the target time feature. The risk prediction model is further configured to input the target time feature into the classification layer, and process the target time feature and the classification layer parameters through the softmax function to output a risk category probability vector. The risk prediction model is further configured to output a risk level category based on the risk category probability vector using the maximum a posteriori principle, and output the risk level category as a risk prediction result.

8. A photovoltaic array fault risk diagnosis device based on multi-mode data fusion, characterized in that the device is configured to implement the photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described in any one of claims 1 to 7, and the device includes: A data processing module, used to monitor the target photovoltaic power station and preprocess the multi-source monitoring data to construct a basic data sample set, wherein the multi-source monitoring data includes electrical parameter time series data, environmental monitoring time series data, and inspection image data of the target photovoltaic power station; A feature extraction module, used to input the basic data sample set into a pre-constructed multimodal data feature extraction model and output multimodal features, the multimodal features including temporal feature vectors and image feature vectors; A feature fusion module, used to calculate the effective information contribution of each modal feature, and to perform weighted fusion of the multimodal features based on the effective information contribution to obtain a fused feature vector; A risk prediction module, used to input the fused feature vector into a pre-built risk prediction model, output the risk prediction result, and perform fault risk diagnosis on the photovoltaic array components of the target photovoltaic power station based on the risk prediction result, wherein the risk prediction model is a temporal convolutional network composed of multi-layer causal dilated convolution and residual connection modules, and is configured to model and analyze the temporal dependence and evolution pattern of the fused feature vector.

9. Photovoltaic array fault risk diagnosis equipment based on multi-mode data fusion, characterized in that the equipment includes: a memory, a processor, and a photovoltaic array fault risk diagnosis program based on multi-mode data fusion stored in the memory. The processor is used to run the program, and the program is configured to implement the photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a photovoltaic array fault risk diagnosis program based on multi-mode data fusion which, when executed by a processor, implements the photovoltaic array fault risk diagnosis method based on multi-mode data fusion as described in any one of claims 1 to 7.