A multi-modal irradiance prediction method and system deeply fused with a physical model
By deeply fusing a clear-sky physical model with multimodal data through cross-modal attention and gated fusion mechanisms, the invention addresses the insufficient physical consistency of multimodal prediction methods under complex meteorological conditions and achieves high-precision, reliable irradiance prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- STATE GRID JIANGSU ELECTRIC POWER CO LTD SUZHOU BRANCH
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
Abstract
Description
Technical Field
[0001] This invention belongs to the field of irradiance prediction technology and, more specifically, relates to a multimodal irradiance prediction method and system that deeply integrates a physical model.
Background Technology
[0002] Solar irradiance is the core driving factor of photovoltaic systems, and its short-term changes are mainly affected by cloud movement and atmospheric radiation processes.
[0003] In recent years, with the rapid development of ground-based cloud image monitoring, meteorological sensing, and multimodal data acquisition technologies, researchers have begun to explore solar irradiance prediction methods based on multi-source data fusion. Multimodal deep learning models, by combining image modalities and meteorological modalities, have significantly improved the model's representational capabilities. However, these existing models are prone to insufficient physical consistency under complex meteorological conditions, leading to prediction results that deviate from the laws of radiation physics. Although some studies have corrected predictions using physical models after deep learning, the interaction between physical models and other modalities is relatively shallow, making it difficult to effectively fuse information on physical laws.
[0004] Traditional time series models (such as ARIMA), statistical learning methods (such as SVR), and single-modal deep learning models (such as LSTM and CNN) can predict the trend of irradiance changes to some extent. Under complex and changeable weather conditions, however, they often fail to capture the dynamic evolution of cloud systems accurately because of their limitations in spatiotemporal feature representation and physical consistency modeling.
Summary of the Invention
[0005] To address the shortcomings of existing technologies, this invention provides a multimodal irradiance prediction method and system that deeply integrates physical models.
[0006] The present invention adopts the following technical solution.
[0007] A first aspect of the present invention provides a multimodal irradiance prediction method based on a deep fusion physical model, comprising:
Collecting continuous ground-based cloud image sequences and meteorological observation data within a set time period, and calculating clear-sky irradiance using a clear-sky physical model to form clear-sky physical modal data;
Performing optical flow calculations on the ground-based cloud image sequences to extract the spatiotemporal evolution characteristics of cloud movement and construct dynamic temporal features of the cloud images;
Aligning the meteorological observation data, the clear-sky physical modal data, and the dynamic temporal features of the cloud images by timestamp, and using them as the temporal inputs of the meteorological modality, the physical modality, and the image modality, respectively;
Interacting and fusing the temporal features of the meteorological, physical, and image modalities through a cross-modal attention mechanism to generate multimodal fusion features;
Inputting the multimodal fusion features into a Transformer encoder for global temporal modeling, and outputting the predicted values of solar irradiance at future times.
[0008] Optionally, calculating clear-sky irradiance using the clear-sky physical model includes:
Inputting the meteorological observation data into the clear-sky physical model;
Calculating, by the clear-sky physical model, the solar altitude angle and air mass from the time and geographical location information in the input data, and the water vapor optical thickness and aerosol optical thickness from the temperature and humidity information in the input data;
Calculating the clear-sky irradiance from the solar altitude angle, air mass, water vapor optical thickness, aerosol optical thickness, and the optical thickness of a clean, dry atmosphere.
[0009] Optionally, the optical flow calculation on the ground-based cloud image sequence includes:
Using a dense optical flow method to calculate the pixel displacement field between consecutive cloud image frames, obtaining an optical flow field that reflects the speed and direction of cloud movement;
Extracting spatiotemporal evolution features from each cloud image frame and its corresponding optical flow field, the features including cloud image brightness statistics, optical flow amplitude statistics, optical flow direction statistics, and cloud coverage;
Concatenating the spatiotemporal evolution features of a preset number of consecutive frames in chronological order to form the dynamic temporal features of the cloud images.
[0010] Optionally, interacting and fusing the temporal features of the meteorological, physical, and image modalities through the cross-modal attention mechanism includes:
Performing a first cross-modal attention computation between the image modality and the meteorological modality to generate cloud image-meteorological interaction features;
Performing a second cross-modal attention computation between the meteorological modality and the physical modality to generate meteorological-clear sky interaction features;
Performing a third cross-modal attention computation between the image modality and the physical modality to generate cloud image-clear sky interaction features;
Fusing the cloud image-meteorological interaction features, the meteorological-clear sky interaction features, and the cloud image-clear sky interaction features through a phased gating fusion mechanism to generate multimodal fusion features.
[0011] Optionally, the phased gating fusion mechanism is executed by a gating fusion unit, including two-level gating fusion: The first level of gating fusion involves weighted fusion of meteorological-clear sky interaction features and cloud image-clear sky interaction features to generate core physical features; The second level of gated fusion involves weighted fusion of core physical features and cloud image-meteorological interaction features to generate multimodal fusion features. In this process, the gating factor of each level of gating fusion is calculated through a learnable weight matrix and a bias term, and then constrained within a preset range using an activation function to achieve adaptive weight allocation for different modal features.
[0012] A second aspect of the present invention provides a multimodal irradiance prediction system based on a deep fusion physical model, used to implement the multimodal irradiance prediction method based on a deep fusion physical model described in the first aspect of the present invention. The system comprises a data acquisition and processing module, a multimodal fusion module, and a time series modeling and prediction module, wherein:
The data acquisition and processing module is used to collect continuous ground-based cloud image sequences and meteorological observation data, to calculate clear-sky irradiance through a clear-sky physical model to form clear-sky physical modal data, and to form the dynamic temporal features of the cloud images;
The multimodal fusion module is used to align the meteorological observation data, the clear-sky physical modal data, and the dynamic temporal features of the cloud images, and to interact and fuse them through the cross-modal attention mechanism and the phased gating fusion mechanism to generate multimodal fusion features;
The time series modeling and prediction module is used to input the multimodal fusion features into the Transformer encoder for global temporal modeling and to output the predicted value of solar irradiance at future times.
[0013] Optionally, the data acquisition and processing module includes a ground-based cloud image acquisition unit, a meteorological sensor unit, and a clear-sky physical model calculation unit, wherein:
The ground-based cloud image acquisition unit is used to acquire a continuous sequence of ground-based cloud images within a set time period;
The meteorological sensor unit is used to collect meteorological observation data;
The clear-sky physical model calculation unit is used to calculate the theoretical clear-sky irradiance in real time from the data collected by the meteorological sensor unit, forming the clear-sky physical modal data.
[0014] Optionally, the multimodal fusion module includes a cross-modal attention unit and a gated fusion unit, wherein: The cross-modal attention unit is used to perform bidirectional attention interaction computation between image modality, meteorological modality and physical modality to generate cloud image-meteorological interaction features, meteorological-clear sky interaction features and cloud image-clear sky interaction features; The gated fusion unit is used to perform two-level weighted fusion on the cloud image-meteorological interaction features, the meteorological-clear sky interaction features, and the cloud image-clear sky interaction features to generate the multimodal fusion features.
[0015] A third aspect of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded onto the processor, implements a multimodal irradiance prediction method based on a deep fusion physical model according to a first aspect of the present invention.
[0016] A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements a multimodal irradiance prediction method based on a deep fusion physical model according to a first aspect of the present invention.
[0017] Compared with the prior art, the beneficial effects of the present invention include at least the following: 1. This invention solves the problem of insufficient physical consistency of existing multimodal prediction methods under complex meteorological conditions by deeply integrating clear-sky physical models as physical constraints, and achieves a high degree of consistency between prediction results and actual radiation physics laws, significantly improving the accuracy and reliability of irradiance prediction.
[0018] 2. This invention achieves deep interaction and adaptive feature fusion between image, meteorological and physical modalities by designing a cross-modal attention mechanism and a multi-gated fusion mechanism. It solves the problems of shallow interaction levels and fixed fusion methods in traditional multimodal methods, and enhances the model's adaptability to different weather conditions and its feature expression ability.
[0019] 3. This invention treats the output of the clear-sky physical model as an independent physical modality and aligns and deeply fuses it with the image and meteorological modalities. This overcomes the limitation of existing methods, which do not fully utilize physical information and use it only for post-processing. Physical laws are deeply embedded in the prediction process and guide it dynamically, ensuring the physical rationality and interpretability of the prediction results.
Attached Figure Description
[0020] Figure 1 is a flowchart of the method provided according to an embodiment of the present invention; Figure 2 is a comparison chart of the model prediction results on a thin-cloud day according to an embodiment of the present invention; Figure 3 is a comparison chart of the model prediction results on a thick-cloud day according to an embodiment of the present invention; Figure 4 is a comparison chart of the model prediction results on a day with abrupt cloud changes according to an embodiment of the present invention.
Detailed Implementation
[0021] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of this invention. The described embodiments are merely some embodiments of this invention, and not all embodiments. Based on the spirit of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the protection scope of this invention.
[0022] In Embodiment 1, this invention provides a multimodal irradiance prediction method based on a deep fusion physical model which, as shown in Figure 1, includes the following steps:
Step 1: Collect continuous ground-based cloud image sequences and meteorological observation data within a set time period, and calculate clear-sky irradiance using a clear-sky physical model to form clear-sky physical modal data.
[0023] Preferably, calculating clear-sky irradiance using the clear-sky physical model includes:
Inputting the meteorological observation data into the clear-sky physical model;
Calculating, by the clear-sky physical model, the solar altitude angle and air mass from the time and geographical location information in the input data, and the water vapor optical thickness and aerosol optical thickness from the temperature and humidity information in the input data;
Calculating the clear-sky irradiance from the solar altitude angle, air mass, water vapor optical thickness, aerosol optical thickness, and the optical thickness of a clean, dry atmosphere.
[0024] Specifically, the clear-sky physical model of this invention is a parametric calculation model. Its core is to quantify the physical processes behind the atmospheric water vapor optical thickness and aerosol optical thickness and to use them to dynamically correct the base optical thickness of clean, dry air. This more accurately reflects the attenuation of solar radiation by actual atmospheric turbidity and yields the surface global horizontal irradiance (GHI) under theoretical clear-sky conditions. The calculation process of this clear-sky physical model is as follows:
(1) Calculate astronomical and basic geographic parameters
The astronomical and basic geographic parameters include solar declination, solar hour angle, solar altitude angle, and air mass, which are calculated as follows:
① Solar declination: Solar declination determines the latitude of the subsolar point, reflects the influence of the tilt of the Earth's axis on the distribution of solar radiation, and affects the intensity of solar radiation and the duration of sunshine in different seasons.
[0025] In the formula: δ—solar declination angle; n—day of the year (ordinal day number counted from January 1).
[0026] ② Solar hour angle: The solar hour angle describes the position of the sun relative to the local meridian. It reflects the sun's trajectory throughout the day and is used to calculate the solar altitude angle. The formula for calculating the solar hour angle from time in the East 8 (UTC+8) time zone is:
[0027] In the formula: t—time in the East 8 (UTC+8) time zone; ω—solar hour angle; λ—longitude.
[0028] ③ Solar altitude angle: The solar altitude angle determines the angle between the sun's rays and the Earth's surface. The calculation method is as follows:
[0029] In the formula: φ—latitude; h—solar altitude angle.
[0030] ④ Air mass: Air mass is a key parameter describing the degree of attenuation of solar radiation in the atmosphere. Its corrected calculation method is as follows:
[0031] In the formula: —air mass.
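The four parameters of step (1) can be sketched in code. The formulas of the original are not reproduced in this text, so the sketch below assumes commonly used forms: Cooper's approximation for the declination, an hour angle referenced to the 120°E meridian of the UTC+8 zone (ignoring the equation of time), the standard altitude relation, and the Kasten-Young corrected air mass. All function names are illustrative.

```python
import math

def solar_declination_deg(n: int) -> float:
    """Declination in degrees for day-of-year n (Cooper's approximation, assumed)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0))

def hour_angle_deg(t_hours: float, longitude_deg: float) -> float:
    """Hour angle in degrees from UTC+8 clock time t and east longitude.

    Assumes the zone's reference meridian is 120 E and ignores the
    equation of time.
    """
    local_solar_time = t_hours + (longitude_deg - 120.0) / 15.0
    return 15.0 * (local_solar_time - 12.0)

def solar_altitude_deg(lat_deg: float, decl_deg: float, hour_deg: float) -> float:
    """Altitude h from sin h = sin(phi) sin(delta) + cos(phi) cos(delta) cos(omega)."""
    phi, delta, omega = (math.radians(a) for a in (lat_deg, decl_deg, hour_deg))
    sin_h = (math.sin(phi) * math.sin(delta)
             + math.cos(phi) * math.cos(delta) * math.cos(omega))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_h))))

def air_mass(h_deg: float) -> float:
    """Kasten-Young corrected relative air mass (assumed); infinite when the sun is down."""
    if h_deg <= 0.0:
        return float("inf")
    return 1.0 / (math.sin(math.radians(h_deg))
                  + 0.50572 * (h_deg + 6.07995) ** -1.6364)
```

Which corrected air-mass formula the invention actually uses is not stated here; any alternative with the same inputs could be substituted without changing the later steps.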
[0032] (2) Calculate the optical thickness of each atmospheric component. Calculate the optical thickness of a clean, dry atmosphere:
[0033] The optical thickness of water vapor is calculated using temperature T and relative humidity RH:
[0034] The optical thickness of the aerosol is calculated using temperature, humidity, and solar altitude angle.
[0035] (3) Determine the Linke turbidity factor and the actual total atmospheric optical thickness. The Linke turbidity factor, based on physical processes, measures the degree of turbidity of actual clear-sky air relative to clean, dry air. The specific formula is as follows:
[0036] In the formula: —total optical thickness of the atmosphere under clear skies; —optical thickness of a clean, dry atmosphere; TL—Linke turbidity factor.
[0037] The total atmospheric optical thickness can be expressed as the sum of the optical thicknesses contributed by different components, that is:
[0038] From the above two equations, the Linke turbidity factor can be calculated from physical processes as:
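As a worked form of step (3), assuming the usual definition of the Linke turbidity factor as the ratio of the total clear-sky optical thickness to that of a clean, dry atmosphere, the two relations combine as follows. The symbols τ (total), τ_c (clean, dry atmosphere), τ_w (water vapor), and τ_a (aerosol) are chosen for illustration and may differ from the original notation.

```latex
T_L = \frac{\tau}{\tau_c},
\qquad
\tau = \tau_c + \tau_w + \tau_a
\quad\Longrightarrow\quad
T_L = 1 + \frac{\tau_w + \tau_a}{\tau_c}
```

Under this reading, T_L = 1 corresponds to a clean, dry atmosphere, and any water vapor or aerosol raises it above 1.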
[0039] (4) Calculate the theoretical clear-sky irradiance (GHI)
The solar irradiance at a given moment at the upper boundary of the atmosphere is calculated as follows:
[0040] In the formula: —the intensity of solar radiation at a given moment at the upper boundary of the atmosphere; —solar constant, equal to 1367 W/m²; ε—deviation correction factor, used to correct for deviations caused by changes in the Sun-Earth distance.
[0041] The formula for calculating the deviation correction factor ε is as follows:
[0042] A general-purpose clear-sky irradiance model based on the Linke turbidity factor is as follows:
[0043] Substituting the Linke turbidity factor calculated based on meteorological data into the model, we obtain the new model:
[0044] where: —clear-sky irradiance; —altitude-dependent coefficients, where a1 = 0.0002545 × altitude + 0.868, a2 = 0.000196 × altitude + 0.0387, fh1 = e^(−altitude/8000), and fh2 = e^(−altitude/1250); altitude—elevation, in m; —the intensity of solar radiation at a given moment at the upper boundary of the atmosphere; —solar altitude angle; —air mass; —optical thickness of a clean, dry atmosphere; —water vapor optical thickness; —aerosol optical thickness.
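Step (4) can be sketched as follows. The formulas themselves are not reproduced in this text, so the sketch assumes the widely used Linke-turbidity clear-sky form GHI = a1 · I0 · sin(h) · exp(−a2 · m · (fh1 + fh2 · (TL − 1))), with TL − 1 replaced by the ratio of water vapor plus aerosol optical thickness to clean-dry optical thickness, and a common cosine approximation for the Sun-Earth distance correction. The coefficient expressions are taken from the text; the function names and the functional form are assumptions.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2, as stated in the text

def eccentricity_factor(n: int) -> float:
    """Sun-Earth distance correction for day-of-year n (common approximation, assumed)."""
    return 1.0 + 0.033 * math.cos(2.0 * math.pi * n / 365.0)

def extraterrestrial_irradiance(n: int) -> float:
    """Normal-incidence irradiance at the top of the atmosphere, W/m^2."""
    return SOLAR_CONSTANT * eccentricity_factor(n)

def clear_sky_ghi(i0: float, h_deg: float, m: float,
                  tau_c: float, tau_w: float, tau_a: float,
                  altitude: float) -> float:
    """Clear-sky GHI in W/m^2 under the assumed functional form.

    i0: extraterrestrial irradiance; h_deg: solar altitude angle; m: air mass;
    tau_c, tau_w, tau_a: clean-dry, water vapor, aerosol optical thickness;
    altitude: site elevation in metres.
    """
    if h_deg <= 0.0:
        return 0.0  # sun below the horizon
    a1 = 0.0002545 * altitude + 0.868   # coefficients as given in the text
    a2 = 0.000196 * altitude + 0.0387
    fh1 = math.exp(-altitude / 8000.0)
    fh2 = math.exp(-altitude / 1250.0)
    tl_minus_one = (tau_w + tau_a) / tau_c  # Linke factor minus 1
    attenuation = math.exp(-a2 * m * (fh1 + fh2 * tl_minus_one))
    return a1 * i0 * math.sin(math.radians(h_deg)) * attenuation
```

With this form, larger water vapor or aerosol optical thickness always lowers the computed irradiance, which matches the role the text gives to atmospheric turbidity.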
[0045] Step 2: Perform optical flow calculations on the ground-based cloud image sequence to extract the spatiotemporal evolution characteristics of cloud movement and construct dynamic temporal features of the cloud image.
[0046] Preferably, the optical flow calculation on the ground-based cloud image sequence includes:
Using a dense optical flow method to calculate the pixel displacement field between consecutive cloud image frames, obtaining an optical flow field that reflects the speed and direction of cloud movement;
Extracting spatiotemporal evolution features from each cloud image frame and its corresponding optical flow field, the features including cloud image brightness statistics, optical flow amplitude statistics, optical flow direction statistics, and cloud coverage;
Concatenating the spatiotemporal evolution features of a preset number of consecutive frames in chronological order to form the dynamic temporal features of the cloud images.
[0047] It should be noted that ground-based cloud images are the most direct and richest source of visual data on sky conditions. Their spatiotemporal variations capture the direction, speed, and thickness of clouds and the dynamic process by which moving clouds occlude the sun. Optical flow provides information on cloud speed and direction, enabling the model to perceive cloud movement trends. Abstracting the images into feature statistics reduces image noise and computational burden and, combined with the clear-sky constraints, lets the model map changes in optical flow speed to trends in irradiance attenuation, achieving dynamic prediction. In this invention, this module provides stable visual temporal feature input for the subsequent multimodal Transformer, forming the physical perception foundation of the entire system.
[0048] This invention employs the Farneback dense optical flow method to calculate the optical flow field between consecutive frames. Optical flow describes the motion vector field corresponding to the change of pixel grayscale in an image sequence over time, and reflects the velocity and direction of an object's motion on the image plane. Given two consecutive grayscale images ..., the brightness constancy assumption is taken to hold within a short time interval:
[0049] where (u, v) is the displacement of the pixel within the time interval ...
[0050] Performing a first-order Taylor expansion on this equation yields the optical flow constraint equation:
[0051] The Farneback algorithm solves for a smooth displacement field using multi-scale pyramids and polynomial approximation.
[0052] The amplitude and direction of optical flow are:
[0053] In the formula: M—optical flow amplitude, reflecting the speed of cloud movement; —direction of movement.
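For reference, the relations described in this passage are usually written in the following textbook forms. They are given here as an assumption about what the omitted formulas contain; I_x, I_y, and I_t denote the partial derivatives of image brightness, Δt the frame interval, and θ the flow direction, none of which are named in the text.

```latex
% Brightness constancy between two consecutive frames
I(x, y, t) = I(x + u,\; y + v,\; t + \Delta t)

% First-order Taylor expansion: the optical flow constraint equation
I_x u + I_y v + I_t = 0

% Amplitude and direction of the flow vector (u, v)
M = \sqrt{u^2 + v^2}, \qquad \theta = \operatorname{atan2}(v, u)
```

The constraint equation is a single equation in the two unknowns u and v, which is why a dense method such as Farneback's adds a smoothness model over a neighbourhood.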
[0054] This invention selects a continuous sequence of ground-based cloud images with a resolution of 128×128, with a sampling interval of 5 minutes. To construct the temporal input, six consecutive frames of images form a sample sequence:
[0055] Dense optical flow is calculated for each pair of adjacent frames to obtain the velocity field.
[0056] To reduce computational complexity and noise interference while preserving key spatiotemporal dynamic information, the following features are extracted from each cloud image frame and its corresponding optical flow: the mean and standard deviation of cloud image brightness (reflecting the overall brightness of the sky and the thickness of the cloud layer); the mean and variance of the optical flow amplitude and the mean and variance of the optical flow direction (reflecting the speed and diffusion trend of cloud clusters); and a cloud coverage index (obtained by fixed brightness-threshold segmentation: with a grayscale value of 180 as the threshold, pixels with brightness ≥ 180 are identified as cloud, and the proportion of cloud pixels among all pixels is calculated).
[0057] These statistical features are then concatenated along the time dimension to form a time-series feature vector:
[0058] This allows for the construction of a compressed but physically relevant spatiotemporal feature sequence, providing structured data for subsequent multimodal Transformer inputs.
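The per-frame statistics listed above can be sketched with NumPy. The sketch assumes the flow fields have already been computed (for example with OpenCV's `calcOpticalFlowFarneback`) and are passed in as arrays of shape (H, W, 2). Pairing each flow field with the later frame of its pair, the plain arithmetic mean of the direction angle, and the function names are illustrative assumptions.

```python
import numpy as np

CLOUD_THRESHOLD = 180  # grayscale threshold stated in the text

def frame_features(gray: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Seven statistics for one frame.

    gray: (H, W) grayscale image; flow: (H, W, 2) displacement field (u, v).
    """
    u, v = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(u, v)     # optical flow amplitude
    direction = np.arctan2(v, u)   # optical flow direction in radians
    return np.array([
        gray.mean(), gray.std(),                # brightness mean and std
        magnitude.mean(), magnitude.var(),      # amplitude mean and variance
        direction.mean(), direction.var(),      # direction mean and variance (non-circular)
        (gray >= CLOUD_THRESHOLD).mean(),       # cloud coverage ratio
    ], dtype=np.float64)

def sequence_features(frames, flows) -> np.ndarray:
    """Stack per-frame features in time order; flows[i] pairs with frames[i + 1]."""
    return np.stack([frame_features(f, fl) for f, fl in zip(frames[1:], flows)])
```

For the six-frame samples described in the text, five adjacent pairs yield a feature sequence of shape (5, 7) under this pairing.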
[0059] Step 3: Align meteorological observation data, clear-sky physical mode data, and cloud image dynamic temporal features by timestamp, and use them as temporal inputs for meteorological mode, physical mode, and image mode, respectively.
[0060] Step 4: Interact and fuse the temporal features of meteorological, physical, and image modalities through a cross-modal attention mechanism to generate fused multimodal features.
[0061] Interacting and fusing the temporal features of the meteorological, physical, and image modalities through the cross-modal attention mechanism includes:
Performing a first cross-modal attention computation between the image modality and the meteorological modality to generate cloud image-meteorological interaction features;
Performing a second cross-modal attention computation between the meteorological modality and the physical modality to generate meteorological-clear sky interaction features;
Performing a third cross-modal attention computation between the image modality and the physical modality to generate cloud image-clear sky interaction features;
Fusing the cloud image-meteorological interaction features, the meteorological-clear sky interaction features, and the cloud image-clear sky interaction features through a phased gating fusion mechanism to generate multimodal fusion features.
[0062] Specifically, this invention designs a multi-scale temporal convolution module at the Transformer front end. It applies convolutions with different kernel scales to the input sequence in parallel, extracting short-term fluctuations, medium-term trends, and relatively stable change features. For an input sequence x and a set of convolution kernel scales, the features of each branch are computed as:
[0063] In the formula: —temporal features extracted with convolution kernel k; —convolution over the time dimension of sequence x; —batch normalization operation; —rectified linear unit (ReLU) activation function.
[0064] The convolutional branches learn short-term fluctuations, medium-term changes, and longer-term trends, respectively. The branch outputs are concatenated and then fused using a linear layer:
[0065] In the formula: —Global temporal features obtained after fusing multi-scale convolutional features; —Weight matrix of the linear fusion layer; — Bias term.
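The multi-scale block described above (parallel temporal convolutions, normalization, ReLU, concatenation, linear fusion) can be sketched in NumPy. The kernel sizes, channel counts, and the per-sequence standardization used as a stand-in for batch normalization are assumptions for illustration; a trained model would learn all weights.

```python
import numpy as np

def conv1d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 'same' convolution over time.

    x: (T, C_in); kernel: (k, C_in, C_out) with odd k. Returns (T, C_out).
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # windows[i] holds the input shifted by tap i, shape (k, T, C_in)
    windows = np.stack([xp[i:i + x.shape[0]] for i in range(k)])
    return np.einsum("ktc,kco->to", windows, kernel)

def normalize(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Stand-in for batch normalization: per-channel standardization over time."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def multi_scale_block(x, kernels, w_fuse, b_fuse) -> np.ndarray:
    """ReLU(norm(conv_k(x))) per branch, concatenated and fused by a linear layer."""
    branches = [np.maximum(normalize(conv1d_same(x, k)), 0.0) for k in kernels]
    return np.concatenate(branches, axis=1) @ w_fuse + b_fuse
```

Small kernels respond to frame-to-frame fluctuations while wider kernels average over more time steps, which is the division of labour the text assigns to the branches.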
[0066] In multimodal prediction tasks, complex nonlinear dependencies exist between different modalities (cloud imagery, weather, clear sky models). This invention proposes a cross-modal attention mechanism, which establishes a query-key-value relationship between modalities and achieves directed interaction of modal features through adaptive weight calculation.
[0067] Given the feature sequences of the cloud image modality, the meteorological modality, and the clear-sky modality, the cross-modal attention mechanism can be expressed as:
[0068] In the formula: —scaling factor; —the query, key, and value drawn from different modalities; —weight normalization.
[0069] Furthermore, in the cloud image-meteorological interaction:
[0070] In the meteorological-clear sky interaction:
[0071] In the cloud image-clear sky interaction:
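A query-key-value interaction between two modalities can be sketched as scaled dot-product attention in NumPy. Which modality supplies the query in each of the three pairings is not recoverable from this text, so the sketch takes the query and context sequences as arguments; the projection matrices and function names are illustrative.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Attend from one modality (query) to another (context).

    query_seq: (T_q, d_in); context_seq: (T_c, d_in);
    w_q, w_k, w_v: (d_in, d) projection matrices.
    Returns the interaction features (T_q, d) and the attention weights (T_q, T_c).
    """
    q = query_seq @ w_q
    k = context_seq @ w_k
    v = context_seq @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot product
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ v, weights
```

Calling the function three times, once per modality pair, would produce the three interaction features that feed the gating stage.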
[0072] The generated three-way interaction features are high-order association representations extracted through fine-grained interactions between different modal pairs using an attention mechanism. These features dynamically capture the complementarity and correlation between modalities and suppress intra-modal noise. They serve as the input basis for subsequent staged gating fusion, enabling the gating mechanism to perform adaptive weighted fusion on the aligned high-quality interaction features.
[0073] Preferably, the phased gating fusion mechanism is executed by a gating fusion unit, including two-level gating fusion: The first level of gating fusion involves weighted fusion of meteorological-clear sky interaction features and cloud image-clear sky interaction features to generate core physical features; The second level of gated fusion involves weighted fusion of core physical features and cloud image-meteorological interaction features to generate multimodal fusion features. In this process, the gating factor of each level of gating fusion is calculated through a learnable weight matrix and a bias term, and then constrained within a preset range using an activation function to achieve adaptive weight allocation for different modal features.
[0074] Specifically, the importance of multimodal information is not constant under different weather conditions. Clear-sky models are more accurate in clear weather, while cloud image dynamics become the dominant information in cloudy or convective weather. Therefore, this invention introduces a cascaded gating fusion mechanism driven by a clear-sky physical model in the fusion stage. This mechanism automatically adjusts the weights of each modal feature through a gating factor, achieving adaptive modality selection.
[0075] The first-level gated fusion takes as input the two interaction features that involve the clear-sky physical model, H_ms and H_cs, and outputs the core physical feature H_phys as:
[0076] where the product is element-wise, and the gating factor g1 is calculated as follows:
[0077] In the formula: —the sigmoid activation function, which constrains the fusion ratio to between 0 and 1; —learnable gating layer weights; —bias term.
[0078] The second-level gated fusion takes the core physical feature H_phys and the cloud image-meteorological interaction feature as input and outputs the gated fusion feature as:
[0079] The gating factor g2 is defined as:
[0080] In the formula: —Learnable gating layer weights; — Bias term.
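The two-level gating can be sketched in NumPy. The exact fusion formulas are not reproduced in this text, so the sketch assumes a common form: the gate is a sigmoid of a linear map of the concatenated inputs, and the output is the convex combination g ⊙ a + (1 − g) ⊙ b. The name `h_cm` for the cloud image-meteorological interaction feature and the parameter dictionary are hypothetical.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(a, b, w, bias):
    """Fuse a and b with a learned gate in (0, 1).

    a, b: (T, d); w: (2d, d); bias: (d,). Returns the fused features and the gate.
    """
    g = sigmoid(np.concatenate([a, b], axis=-1) @ w + bias)
    return g * a + (1.0 - g) * b, g

def two_level_fusion(h_ms, h_cs, h_cm, params):
    """Level 1 builds the physical core; level 2 mixes it with image-weather features."""
    h_phys, g1 = gated_fusion(h_ms, h_cs, params["w1"], params["b1"])
    h_fused, g2 = gated_fusion(h_phys, h_cm, params["w2"], params["b2"])
    return h_fused, h_phys, (g1, g2)
```

With all-zero gate weights both gates equal 0.5 and the fusion reduces to plain averaging; training moves the gates away from 0.5 depending on the input, which is the adaptive weighting the text describes.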
[0081] This design lets the model adapt the influence weights of the different modalities during fusion. The first-level gated fusion constructs the physical kernel of the model: using the gating factor g1, it dynamically fuses the two interaction features that involve the clear-sky physical model to generate the core physical feature H_phys, which grounds the model's predictions in physical laws. The second-level gated fusion takes the core physical feature as input and calculates the gating factor g2, so that the physical kernel guides the final prediction while the observation data supplement and correct it, achieving physical-model-driven correction.
[0082] This gated fusion mechanism solves the fixed weight problem in multimodal fusion, achieves adaptive modality scheduling, and effectively suppresses redundant features. When used in conjunction with cross-modal attention, the model will have the ability to perceive and adapt across modalities.
[0083] Step 5: Input the multimodal fusion features into the Transformer encoder for global temporal modeling and output the predicted solar irradiance values for future times.
[0084] Specifically, the fused multimodal features are fed into an improved Transformer encoder. Each encoder layer consists of a multi-head self-attention (MHA) network and a feedforward network (FFN):
[0085]
[0086] In the formula: — Layer normalization operation; — Output of the self-attention layer; —The final output of the Transformer encoding layer.
[0087] Multi-head attention mechanisms allow the model to capture global dependencies in parallel from multiple subspaces. Compared to traditional temporal models, the non-recursive structure of the Transformer significantly improves training efficiency and temporal awareness depth. This invention employs a 4-layer encoding stack, with each layer containing 4 attention heads, a hidden layer dimension of 128, and a feedforward layer dimension of 256.
[0088] Positional encoding is added to the input sequence to preserve temporal order information:
[0089] In the formula: —Transformer input feature sequence.
[0090] The vector at the last time step of the Transformer output sequence is taken as the global temporal feature and mapped to the predicted irradiance value through layer normalization and a multilayer perceptron (MLP):
[0091] In the formula: —features of the last time step of the output sequence; , , —weight matrices; , , —bias terms; —output mapping function.
[0092] The Transformer architecture leverages global dependency capture capabilities to achieve holistic modeling of irradiation time series, and performs joint optimization with physical model constraints to achieve physical consistency correction for data-driven predictions.
[0093] Specifically, in this method, the clear-sky irradiance model first calculates the theoretical upper limit of irradiance from real-time meteorological data and maps it to a feature vector of the same dimension as the cloud image and meteorological features; the three modalities remain independent at the input end so that the physical information stays distinct. In the feature fusion stage, a directional cross-modal attention mechanism that uses the clear-sky features as the query source physically corrects the cloud image and meteorological features, and a gating fusion unit then dynamically allocates the weight of each modality: the model relies primarily on the theoretical clear-sky upper limit in clear weather and adaptively increases the share of cloud image dynamic features when the cloud layer changes drastically. In addition, a physical boundary constraint term in the training loss function constrains the predicted value to lie between zero and the clear-sky irradiance at the corresponding time.
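One way to express a physical boundary constraint in the loss is a penalty on predictions that fall below zero or exceed the clear-sky irradiance at the same time step. The form below (mean squared error plus a weighted squared violation) is an assumption for illustration; the text does not give the actual constraint term, and `penalty_weight` is a hypothetical hyperparameter.

```python
import numpy as np

def bounded_loss(pred, target, ghi_clear, penalty_weight: float = 1.0) -> float:
    """MSE plus a penalty for predictions outside [0, clear-sky GHI].

    pred, target, ghi_clear: arrays of the same shape, in W/m^2.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    ghi_clear = np.asarray(ghi_clear, dtype=float)
    mse = np.mean((pred - target) ** 2)
    below = np.maximum(-pred, 0.0)             # amount below zero
    above = np.maximum(pred - ghi_clear, 0.0)  # amount above the clear-sky limit
    penalty = np.mean(below ** 2 + above ** 2)
    return float(mse + penalty_weight * penalty)
```

A soft penalty of this kind discourages out-of-range predictions during training but does not strictly forbid them; a hard bound would need an output clamp or a bounded output activation instead.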
[0094] This invention introduces clear-sky physical constraints into a multimodal deep learning network, combining data-driven prediction with prior physical laws. The design of this method ensures the physical rationality of the prediction results and provides interpretable upper bound constraints for the Transformer architecture, thereby avoiding prediction results that contradict physical processes (such as excessively high nighttime irradiance or outliers under thick cloud cover).
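One possible form of a loss with a physical boundary term is sketched below: a mean squared error plus a quadratic penalty on predictions that fall below zero or above the clear-sky irradiance at the same time step. The quadratic penalty, the weight `penalty_weight`, and the sample values are assumptions for illustration; the description does not give the exact expression.

```python
def bounded_loss(pred, target, clear_sky, penalty_weight=1.0):
    """MSE plus a penalty for predictions outside [0, clear-sky irradiance] (assumed form)."""
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    # Penalize negative irradiance.
    below = sum(max(0.0, -p) ** 2 for p in pred) / n
    # Penalize predictions above the clear-sky upper bound.
    above = sum(max(0.0, p - c) ** 2 for p, c in zip(pred, clear_sky)) / n
    return mse + penalty_weight * (below + above)


# Illustrative values in W/m^2 (not experimental data).
pred = [-10.0, 500.0, 900.0]
target = [0.0, 480.0, 800.0]
clear_sky = [0.0, 600.0, 850.0]
loss = bounded_loss(pred, target, clear_sky)
```

A soft penalty of this kind discourages out-of-range predictions during training; it vanishes whenever every prediction lies within the physical bounds.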
[0095] In Embodiment 2, this invention provides a multimodal irradiance prediction system based on a deep fusion physical model, used to implement the multimodal irradiance prediction method based on a deep fusion physical model described in Embodiment 1. The system comprises a data acquisition and processing module, a multimodal fusion module, and a time series modeling and prediction module, wherein: the data acquisition and processing module is used to collect continuous ground-based cloud image sequences and meteorological observation data, and to calculate clear-sky irradiance through a clear-sky physical model, forming clear-sky physical modal data and dynamic temporal features of cloud images; the multimodal fusion module is used to align the meteorological observation data, the clear-sky physical modal data, and the dynamic temporal features of cloud images, and to interact and fuse them through a cross-modal attention mechanism and a phased gating fusion mechanism to generate multimodal fusion features; and the time series modeling and prediction module is used to input the multimodal fusion features into the Transformer encoder for global temporal modeling and to output the predicted value of solar irradiance at future times.
[0096] Preferably, the data acquisition and processing module includes a ground-based cloud image acquisition unit, a meteorological sensor unit, and a clear-sky physical model calculation unit, wherein: the ground-based cloud image acquisition unit is used to acquire a continuous sequence of ground-based cloud images within a set time period; the meteorological sensor unit is used to collect meteorological observation data; and the clear-sky physical model calculation unit is used to calculate the theoretical clear-sky irradiance in real time based on the data collected by the meteorological sensor unit, forming the clear-sky physical modal data.
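A clear-sky calculation unit of this kind could be sketched as follows. The sketch assumes a simple Beer-Lambert attenuation of the extraterrestrial irradiance along the air-mass path, with the Kasten-Young approximation for relative air mass; the optical thickness values passed in are placeholders, and the actual clear-sky model of this invention, including how optical thicknesses are derived from temperature and humidity, may differ. Earth-Sun distance correction and diffuse radiation are ignored.

```python
import math

SOLAR_CONSTANT = 1361.0  # W/m^2, nominal extraterrestrial irradiance


def air_mass(elevation_deg):
    """Relative optical air mass (Kasten-Young approximation), elevation in degrees."""
    return 1.0 / (math.sin(math.radians(elevation_deg))
                  + 0.50572 * (elevation_deg + 6.07995) ** -1.6364)


def clear_sky_irradiance(elevation_deg, tau_dry, tau_water, tau_aerosol):
    """Horizontal clear-sky irradiance under an assumed Beer-Lambert attenuation."""
    if elevation_deg <= 0.0:
        return 0.0  # sun at or below the horizon
    m = air_mass(elevation_deg)
    transmittance = math.exp(-m * (tau_dry + tau_water + tau_aerosol))
    return SOLAR_CONSTANT * math.sin(math.radians(elevation_deg)) * transmittance


# Placeholder optical thicknesses (illustrative only).
noon = clear_sky_irradiance(60.0, 0.10, 0.05, 0.10)
morning = clear_sky_irradiance(15.0, 0.10, 0.05, 0.10)
night = clear_sky_irradiance(-5.0, 0.10, 0.05, 0.10)
```

Returning zero for a sun at or below the horizon gives the downstream model a physically meaningful upper bound at night.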
[0097] Preferably, the multimodal fusion module includes a cross-modal attention unit and a gated fusion unit, wherein: The cross-modal attention unit is used to perform bidirectional attention interaction computation between image modality, meteorological modality and physical modality to generate cloud image-meteorological interaction features, meteorological-clear sky interaction features and cloud image-clear sky interaction features; The gated fusion unit is used to perform two-level weighted fusion on the cloud image-meteorological interaction features, the meteorological-clear sky interaction features, and the cloud image-clear sky interaction features to generate the multimodal fusion features.
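The two-level weighted fusion can be sketched as two successive gates. Following the order stated in the claims, the first level fuses the meteorological-clear sky and cloud image-clear sky interaction features into core physical features, and the second level fuses those with the cloud image-meteorological interaction features. The sigmoid gate, the concatenation of both inputs as the gate input, and the toy parameters are assumptions; `gated_fusion` and `two_level_fusion` are hypothetical names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(a, b, weight, bias):
    """Gate g = sigmoid(W [a; b] + bias); output is g * a + (1 - g) * b, element-wise."""
    concat = a + b  # list concatenation of the two feature vectors
    gate = [sigmoid(sum(w * v for w, v in zip(row, concat)) + c)
            for row, c in zip(weight, bias)]
    fused = [g * x + (1.0 - g) * y for g, x, y in zip(gate, a, b)]
    return fused, gate


def two_level_fusion(met_clear, img_clear, img_met, level1, level2):
    """Level 1 builds core physical features; level 2 adds cloud image-meteorological features."""
    core, _ = gated_fusion(met_clear, img_clear, *level1)
    fused, _ = gated_fusion(core, img_met, *level2)
    return fused


# Toy parameters: zero weights and biases make every gate equal to 0.5.
zero_gate = ([[0.0] * 4, [0.0] * 4], [0.0, 0.0])
fused = two_level_fusion([1.0, 3.0], [3.0, 5.0], [0.0, 0.0], zero_gate, zero_gate)
print(fused)  # [1.0, 2.0]
```

Because the sigmoid keeps each gate in (0, 1), every fused element is a convex combination of its two inputs, which is one way to realize the adaptive weight allocation described above.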
[0098] In Embodiment 3, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the computer program is loaded onto the processor, it implements a multimodal irradiance prediction method based on a deep fusion physical model as described in Embodiment 1.
[0099] In Embodiment 4, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements a multimodal irradiance prediction method based on a deep fusion physical model as described in Embodiment 1.
[0100] To more clearly illustrate the outstanding substantive features of this invention and the significant progress it brings to the prior art, an application example of implementing this invention is described below.
[0101] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings. A specific application example is as follows. To verify the effectiveness of the irradiance prediction method based on the deep fusion clear-sky physical model proposed in Embodiment 1 of this invention, model training, parameter optimization, and prediction verification were conducted on a unified experimental platform. Experimental data were obtained from multiple distributed photovoltaic power stations. Meteorological data and ground-based cloud images were collected every five minutes using a power meteorological monitoring device, covering several typical months. The data were divided into training, validation, and test sets in a ratio of approximately 6:2:2. The meteorological data included temperature, humidity, wind speed, wind direction, air pressure, and measured irradiance. Clear-sky irradiance was calculated from the meteorological data using the clear-sky irradiance physical model, while the actual irradiance was measured simultaneously by a ground-based radiometer.
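A 6:2:2 division of five-minute samples can be sketched as follows. The sketch assumes the split is chronological, which is a common choice for time series because it keeps future samples out of the training set; the description does not state how the division was performed, and `chronological_split` is a hypothetical name.

```python
def chronological_split(samples, ratios=(0.6, 0.2, 0.2)):
    """Split time-ordered samples into train/validation/test without shuffling."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    n = len(samples)
    train_end = int(n * ratios[0])
    val_end = train_end + int(n * ratios[1])
    # The test set takes the remainder so that no sample is dropped.
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


# One day at five-minute resolution: 24 * 60 / 5 = 288 samples.
samples = list(range(288))
train_set, val_set, test_set = chronological_split(samples)
print(len(train_set), len(val_set), len(test_set))  # 172 57 59
```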
[0102] To further verify the adaptability and robustness of the irradiance prediction method based on the deep fusion clear-sky physical model under complex meteorological conditions, this embodiment selects three typical irradiance change scenarios for qualitative and quantitative analysis: thin cloud weather, thick cloud weather, and abrupt cloud cover changes. The three types of samples correspond to Figure 2, Figure 3, and Figure 4, respectively, which show the variation of solar irradiance under different cloud thicknesses and dynamic characteristics. In Figure 2, panel (a) shows the forecast results for August 21, 2025, and panel (b) shows the forecast results for August 24, 2025. In Figure 3, panel (c) shows the forecast results for June 10, 2025, and panel (d) shows the forecast results for August 2, 2025. In Figure 4, panel (e) shows the forecast results for August 10, 2025, and panel (f) shows the forecast results for July 17, 2025.
[0103] Figure 2 shows the predicted irradiance under thin cloud cover. The clear-sky irradiance (green curve) maintains a smooth parabolic shape, peaking at approximately 1000 W/m², and represents ideal cloudless conditions. The actual irradiance (orange curve) generally follows the clear-sky curve, with only minor fluctuations around midday (approximately 11:00–14:00), reflecting the shading effect of localized thin clouds. The model's predicted curve (blue) almost perfectly matches the measured curve, showing no significant shift or lag in either the peak or the fluctuation segments, which indicates a small prediction error. The model performs very well under thin cloud conditions and is sensitive to small-scale irradiance disturbances. The multi-scale temporal convolution module extracts short-term fluctuation features in parallel using convolution kernels of different scales, enabling the model to accurately capture instantaneous irradiance changes caused by sparse cloud bands. At the same time, the directional cross-modal attention mechanism effectively integrates the spatiotemporal texture information of the ground-based cloud images with the continuous trends of the meteorological elements, so that the predicted curve remains synchronized with the measured values in both fluctuation phase and amplitude. Overall, the model exhibits good smoothness, physical consistency, and transient response under thin cloud conditions, indicating highly stable temporal modeling capability under clear weather conditions.
[0104] Figure 3 corresponds to thick cloud weather. Compared with thin cloud conditions, the overall irradiance level is significantly lower: the measured irradiance curve fluctuates continuously between 10:00 and 15:00 and peaks at only 400–500 W/m². The model's predicted curve almost perfectly matches the measured curve through multiple pronounced cycles of rise and fall. The predicted values respond accurately to multiple short-term dip events with no missed detections, the errors in dip depth and duration are both less than 5%, and the overall trend remains highly correlated with the measured values, indicating that the model maintains excellent dynamic tracking capability even under thick cloud cover. This result demonstrates that the model can effectively cope with non-stationary, strongly fluctuating irradiance sequences. Under thick cloud conditions, the model relies on meteorological features (especially relative humidity and wind speed) extracted through the directional cross-modal attention mechanism, which enhances its ability to perceive sudden changes in irradiance. Meanwhile, the gated fusion unit dynamically adjusts the contribution ratios of the different modalities through its weights, automatically strengthening the guidance of the meteorological modality when visual information is not entirely reliable, thus maintaining overall prediction stability.
[0105] Figure 4 shows the results for typical days with abrupt cloud cover changes, on which the irradiance fluctuates dramatically. After a steady rise to 800 W/m², the measured irradiance suddenly plummeted to 100 W/m² at 12:00, before rapidly recovering to 800 W/m² at 13:00. The model's prediction responded promptly to the sharp drop at 12:00, with the predicted drop almost coinciding with the measured values, while its response to the subsequent recovery was slightly early. This slight lead indicates that the model can anticipate upcoming irradiance recovery trends, demonstrating good forward-looking predictive capability. The ability to maintain accuracy and physical plausibility even under complex and variable weather conditions is attributed to the guidance provided by the physical model and its deep interaction with the deep learning network. The Transformer structure models temporal dependencies through multi-layer attention, thereby keeping predictions reasonable and reliable under abrupt weather changes.
[0106] Experimental results show that the model proposed in this invention exhibits excellent predictive performance under various typical weather conditions: in complex meteorological scenarios such as thin clouds, thick clouds and abrupt cloud changes, the model can accurately capture the rapid fluctuation trend of irradiance, the predicted curve is in high agreement with the measured value, and no prediction results that violate physical laws appear.
[0107] In summary, the irradiance prediction method based on a deep fusion of clear-sky physical models provided in this embodiment introduces a clear-sky irradiance model as a physical constraint and combines ground-based cloud image sequences with meteorological observation data to construct a joint modeling mechanism for the image, meteorological, and physical modalities. The method employs a multi-scale temporal convolution module to extract local temporal features, uses a cross-modal attention mechanism to achieve deep interaction between modalities, and introduces a gated fusion module to adaptively adjust the contribution weights of the different modalities in the prediction. Finally, an improved Transformer encoder completes the global modeling and prediction of the irradiance sequence.
[0108] Experiments show that this method exhibits excellent predictive performance under various typical weather conditions, including thin clouds, thick clouds, and abrupt cloud changes. It accurately captures rapid fluctuations in irradiance, with the predicted curves closely matching the measured values, and no predictions violating physical laws are observed. Ablation experiments further validate the effectiveness of each module. In particular, the introduction of clear-sky physical constraints significantly improves prediction accuracy (e.g., reducing the mean absolute error (MAE) to 45.0 W/m² and raising the coefficient of determination (R²) to 0.971), while ensuring the physical rationality and interpretability of the model output.
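For reference, the two reported metrics follow their standard definitions: MAE is the mean of the absolute errors, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes both on made-up values, which are illustrative only and are not the experimental data of this embodiment.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |measured - predicted|, in the same unit as the data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative irradiance values in W/m^2 (not experimental data).
measured = [100.0, 400.0, 800.0, 300.0]
predicted = [120.0, 380.0, 790.0, 330.0]
mae = mean_absolute_error(measured, predicted)
r2 = r_squared(measured, predicted)
```

A lower MAE and an R² closer to 1 both indicate better agreement between the predicted and measured curves.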
[0109] This disclosure can be a system, a method, and/or a computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of this disclosure.
[0110] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the protection scope of the claims of the present invention.
Claims
1. A multimodal irradiance prediction method based on a deep fusion physical model, characterized in that it comprises: collecting continuous ground-based cloud image sequences and meteorological observation data within a set time period, and calculating clear-sky irradiance using a clear-sky physical model to form clear-sky physical modal data; performing optical flow calculation on the ground-based cloud image sequences to extract the spatiotemporal evolution features of cloud movement and construct dynamic temporal features of cloud images; aligning the meteorological observation data, the clear-sky physical modal data, and the dynamic temporal features of cloud images by timestamp, and using them as the temporal inputs of the meteorological modality, the physical modality, and the image modality, respectively; interacting and fusing the temporal features of the meteorological, physical, and image modalities through a cross-modal attention mechanism to generate multimodal fusion features; and inputting the multimodal fusion features into a Transformer encoder for global temporal modeling, and outputting the predicted values of solar irradiance at future times.
2. The multimodal irradiance prediction method based on a deep fusion physical model according to claim 1, characterized in that calculating clear-sky irradiance using a clear-sky physical model comprises: inputting the meteorological observation data into the clear-sky physical model; calculating, by the clear-sky physical model, the solar elevation angle and the air mass based on the time and geographical location information in the input data, and calculating the water vapor optical thickness and the aerosol optical thickness based on the temperature and humidity information in the input data, respectively; and calculating the clear-sky irradiance based on the solar elevation angle, the air mass, the water vapor optical thickness, the aerosol optical thickness, and the optical thickness of the clean, dry atmosphere.
3. The multimodal irradiance prediction method based on a deep fusion physical model according to claim 1, characterized in that the optical flow calculation on the ground-based cloud image sequences comprises: using a dense optical flow method to calculate the pixel displacement field between consecutive cloud image frames, obtaining an optical flow field that reflects the speed and direction of cloud movement; extracting spatiotemporal evolution features from each cloud image frame and its corresponding optical flow field, the spatiotemporal evolution features including cloud image brightness statistics, optical flow magnitude statistics, optical flow direction statistics, and cloud coverage; and concatenating the spatiotemporal evolution features of a preset number of consecutive frames in chronological order to form the dynamic temporal features of cloud images.
4. The multimodal irradiance prediction method based on a deep fusion physical model according to claim 1, characterized in that interacting and fusing the temporal features of the meteorological, physical, and image modalities through a cross-modal attention mechanism comprises: performing a first cross-modal attention calculation between the image modality and the meteorological modality to generate cloud image-meteorological interaction features; performing a second cross-modal attention calculation between the meteorological modality and the physical modality to generate meteorological-clear sky interaction features; performing a third cross-modal attention calculation between the image modality and the physical modality to generate cloud image-clear sky interaction features; and fusing the cloud image-meteorological interaction features, the meteorological-clear sky interaction features, and the cloud image-clear sky interaction features through a phased gating fusion mechanism to generate the multimodal fusion features.
5. The multimodal irradiance prediction method of a deep fusion physical model according to claim 4, characterized in that: The phased gating fusion mechanism is executed by the gating fusion unit and includes two levels of gating fusion: The first level of gating fusion involves weighted fusion of meteorological-clear sky interaction features and cloud image-clear sky interaction features to generate core physical features; The second level of gated fusion involves weighted fusion of core physical features and cloud image-meteorological interaction features to generate multimodal fusion features. In this process, the gating factor of each level of gating fusion is calculated through a learnable weight matrix and a bias term, and then constrained within a preset range using an activation function to achieve adaptive weight allocation for different modal features.
6. A multimodal irradiance prediction system based on a deep fusion physical model, used to implement the multimodal irradiance prediction method based on a deep fusion physical model as described in any one of claims 1-5, characterized in that it comprises a data acquisition and processing module, a multimodal fusion module, and a time series modeling and prediction module, wherein: the data acquisition and processing module is used to collect continuous ground-based cloud image sequences and meteorological observation data, and to calculate clear-sky irradiance through a clear-sky physical model, forming clear-sky physical modal data and dynamic temporal features of cloud images; the multimodal fusion module is used to align the meteorological observation data, the clear-sky physical modal data, and the dynamic temporal features of cloud images, and to interact and fuse them through a cross-modal attention mechanism and a phased gating fusion mechanism to generate multimodal fusion features; and the time series modeling and prediction module is used to input the multimodal fusion features into the Transformer encoder for global temporal modeling and to output the predicted value of solar irradiance at future times.
7. The multimodal irradiance prediction system based on a deep fusion physical model according to claim 6, characterized in that the data acquisition and processing module includes a ground-based cloud image acquisition unit, a meteorological sensor unit, and a clear-sky physical model calculation unit, wherein: the ground-based cloud image acquisition unit is used to acquire a continuous sequence of ground-based cloud images within a set time period; the meteorological sensor unit is used to collect meteorological observation data; and the clear-sky physical model calculation unit is used to calculate the theoretical clear-sky irradiance in real time based on the data collected by the meteorological sensor unit, forming the clear-sky physical modal data.
8. The multimodal irradiance prediction system based on a deep fusion physical model according to claim 6, characterized in that: The multimodal fusion module includes a cross-modal attention unit and a gated fusion unit, wherein: The cross-modal attention unit is used to perform bidirectional attention interaction computation between image modality, meteorological modality and physical modality to generate cloud image-meteorological interaction features, meteorological-clear sky interaction features and cloud image-clear sky interaction features; The gated fusion unit is used to perform two-level weighted fusion on the cloud image-meteorological interaction features, the meteorological-clear sky interaction features, and the cloud image-clear sky interaction features to generate the multimodal fusion features.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the computer program is loaded into the processor, it implements the multimodal irradiance prediction method based on a deep fusion physical model according to any one of claims 1-5.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the multimodal irradiance prediction method based on a deep fusion physical model according to any one of claims 1-5.