A method for ground verification and future prediction correction of satellite data of solar irradiance

By combining a physics-guided machine learning hybrid correction method with a quantile mapping function, the method breaks the error cycle in satellite data correction for Gobi and plateau regions, generates a high-precision historical benchmark dataset, and improves the reliability and accuracy of future solar energy resource predictions.

CN122220701A · Pending · Publication Date: 2026-06-16 · DATANG HYDROPOWER SCI & TECH RES INST CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
DATANG HYDROPOWER SCI & TECH RES INST CO LTD
Filing Date
2026-03-19
Publication Date
2026-06-16


Abstract

A method for ground verification and future prediction correction of solar irradiance satellite data, belonging to the field of meteorological data assimilation. It addresses the problem that, in Gobi and plateau regions, the accuracy of future prediction correction is limited by the quality of the historical reference data. The method includes: generating a historical reference grid dataset through machine learning hybrid correction, which comprises obtaining paired time series of ground truth data and satellite grid data, constructing a comprehensive feature vector enhanced by physical information, training an interpretable machine learning model and performing feature importance analysis, and applying the trained model to all grid pixels in the target region to generate the corrected historical reference dataset; and performing quantile mapping bias correction on future climate prediction data, which comprises matching the historical reference dataset with the climate model's historical simulation data to construct a quantile mapping function, and applying the mapping function to the future prediction data to obtain a bias-corrected future estimation dataset. The method is used for correcting climate prediction data.

Description

Technical Field

[0001] This invention belongs to the field of meteorological data assimilation, and in particular relates to a method for ground verification and future prediction correction of solar irradiance satellite data.

Background Technology

[0002] Accurate data on solar energy, particularly surface solar irradiance, is crucial for planning and constructing large-scale photovoltaic (PV) bases in remote areas such as the Gobi Desert and plateaus. These regions are rich in solar energy resources and are key areas for energy development, but they face the challenge of extremely sparse ground-based meteorological observation stations. While ground-based station data is highly accurate, it cannot provide the finely detailed grid data with continuous spatial distribution required for planning. This makes it difficult to accurately assess the true distribution and potential of solar energy resources within a region based solely on sparse stations, potentially leading to biased investment decisions.

[0003] To obtain continuous spatial data, satellite remote sensing inversion has become the primary data source. However, satellite inversion data is affected by various factors such as the atmosphere, clouds, and sensors, resulting in significant spatiotemporal heterogeneous systematic errors. Ground-based observation correction is necessary before it can be used for high-precision applications. In regions with sparse stations, such as the Gobi Desert, traditional correction methods (such as global linear regression or Kriging based on spatial interpolation) face severe challenges. For example, linear regression assumes a fixed error relationship, but satellite errors actually vary drastically with solar altitude angle, season, and weather. In the Gobi Desert, with its intense sunlight and complex surface albedo, simple linear models struggle to capture this nonlinear error structure. Kriging interpolation, on the other hand, heavily relies on station density and spatial autocorrelation assumptions. In the vast Gobi Desert with its extremely limited number of stations, the interpolation results are highly uncertain and lack representativeness, failing to generate reliable correction fields.

[0004] In recent years, machine learning has provided new approaches to error correction, but its "black box" nature raises questions about its reliability when extrapolating to areas not covered by stations or to unseen weather scenarios. Furthermore, it cannot explain the physical source of the error, making it difficult for users to fully rely on it in major engineering decisions. More importantly, the long-term operation and expansion planning of photovoltaic bases requires predicting changes in solar energy resources over the next few decades, which relies on simulation data from global climate models (such as CMIP6). However, climate model outputs exhibit regional biases, so validated, high-quality historical observation data are needed as a benchmark for "bias correction," which is commonly performed with methods such as quantile mapping.

[0005] This leads to a fundamental technological gap and a circular paradox: the correction of historical data and the correction of future forecasts are usually conducted independently. If raw satellite data containing the unverified errors described above, or correction results of questionable reliability obtained with traditional methods in station-sparse areas, are used directly as the "historical observation benchmark" for bias-correcting future climate forecasts, then errors in the historical data will be propagated and amplified in future projections. For example, in the Gobi Desert region, if sparse historical benchmark data lead to a systematic overestimation or underestimation of irradiance, future forecasts corrected against this benchmark will inherit the bias, resulting in misleading conclusions in long-term energy planning. This creates a vicious cycle: insufficient quality of the historical benchmark data makes the benchmark for future forecast correction unreliable, which in turn leaves errors in the corrected future forecasts.

Summary of the Invention

[0006] In view of this, the present invention aims to propose a method for ground verification and future prediction correction of solar irradiance satellite data, in order to solve the problem that the accuracy of future prediction correction in Gobi and plateau regions is limited by the quality of historical reference data, resulting in an error cycle.

[0007] To achieve the above objective, the present invention adopts the following technical solution: a method for ground verification and future prediction correction of solar irradiance satellite data, comprising the following two-stage sequential process. The first stage uses a physics-guided machine learning hybrid correction method, with historical data from a limited number of ground meteorological stations as the benchmark, to process the satellite historical grid dataset and generate a high-precision historical benchmark grid dataset. It includes the following steps: obtaining paired time series of ground truth data and satellite grid data; constructing a comprehensive feature vector enhanced with physical information, including solar geometric features, temporal features, satellite-to-ground ratio features, moving average features, lag features, and periodic features; training an interpretable machine learning model and performing feature importance analysis based on the comprehensive feature vector; and applying the trained model to all grid cells in the target region to generate the corrected historical benchmark dataset. The second stage performs quantile mapping bias correction on future climate prediction data based on the historical benchmark dataset. It includes the following steps: constructing a quantile mapping function by matching the historical benchmark dataset against the climate model's historical simulation data; and applying the mapping function to the future prediction data to obtain a bias-corrected future prediction dataset.

[0008] Furthermore, in a preferred embodiment, the comprehensive feature vector enhanced with physical information includes the cosine of the solar zenith angle, $\cos(\theta_{SZA})$, calculated using the astronomical formula:

$$\cos(\theta_{SZA}) = \sin\varphi \, \sin\delta + \cos\varphi \, \cos\delta \, \cos\omega$$

[0009] where $\varphi$ is the latitude of the station, $\delta$ is the solar declination, and $\omega$ is the solar hour angle.

[0010] Furthermore, in a preferred embodiment, the satellite-to-ground ratio features include the instantaneous ratio $R_t$ and the long-term average ratio $\bar{R}$. The instantaneous ratio is:

$$R_t = \frac{S_{sat,t}}{G_{ground,t} + \varepsilon}$$

[0011] The long-term average ratio is:

$$\bar{R} = \frac{1}{T} \sum_{t=1}^{T} R_t$$

[0012] where $S_{sat,t}$ is the solar irradiance value retrieved from the satellite at time $t$; $G_{ground,t}$ is the solar irradiance value measured by the ground weather station at time $t$; $T$ is the total length of the time series; and $\varepsilon$ is a very small value that prevents division by zero.

[0013] Furthermore, in a preferred embodiment, the moving average features include:

$$MA_t^{\alpha} = \frac{1}{\alpha} \sum_{i=0}^{\alpha-1} S_{sat,t-i}$$

[0014] where $MA_t^{\alpha}$ is the $\alpha$-step moving average at time $t$, $S_{sat,t-i}$ is the satellite irradiance value at time $(t-i)$, and $\alpha$ is the order of the moving average.

[0015] Furthermore, in a preferred embodiment, the periodic features are implemented through cyclic encoding, including hourly cyclic encoding and monthly cyclic encoding. The hourly cyclic encoding is:

[0016] $$H_{sin} = \sin\left(\frac{2\pi H_t}{24}\right), \quad H_{cos} = \cos\left(\frac{2\pi H_t}{24}\right)$$

[0017] where $H_{sin}$ is the sine encoding of the hour, $H_{cos}$ is the cosine encoding of the hour, and $H_t$ is the hour of time $t$. The monthly cyclic encoding is:

[0018] $$M_{sin} = \sin\left(\frac{2\pi M_t}{12}\right), \quad M_{cos} = \cos\left(\frac{2\pi M_t}{12}\right)$$

[0019] where $M_{sin}$ is the sine encoding of the month, $M_{cos}$ is the cosine encoding of the month, and $M_t$ is the month of time $t$.

[0020] Furthermore, a preferred approach is proposed, wherein the interpretable machine learning model is a gradient boosting decision tree model, which employs a time series cross-validation strategy during training and utilizes feature importance analysis tools to quantify the contribution weights of solar geometric features, ratio features, and periodic features to the correction results.

[0021] Furthermore, in a preferred embodiment, the construction of the quantile mapping function includes: for each grid point and each hour, calculating the empirical cumulative distribution functions of the historical benchmark dataset and of the model's historical simulation data; and, on the preset quantile set $q = [q_1, q_2, \dots, q_n]$, establishing the mapping relationship through piecewise linear or monotonic spline interpolation.

[0022] Furthermore, in a preferred embodiment, applying the mapping function to correct future prediction data includes:

$$\hat{x}_{fut} = F_{i,j,h}(x_{fut})$$

[0023] where $\hat{x}_{fut}$ is the corrected predicted value of future irradiance, $F_{i,j,h}$ is the mapping function for grid point $(i, j)$ and hour $h$, and $x_{fut}$ is the original simulated value at that grid point.

[0024] Based on the same inventive concept, the present invention also proposes a computer device, including a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes a method for ground verification and future prediction correction of solar irradiance satellite data according to any one of the preceding claims.

[0025] Based on the same inventive concept, the present invention also proposes a computer-readable storage medium storing a computer program, which, when executed by a processor, performs the steps of a method for ground verification and future prediction correction of solar irradiance satellite data as described in any one of the above.

[0026] Compared with the prior art, the beneficial effects of the present invention are as follows. First, through its physics-guided machine learning hybrid correction stage, the proposed method solves the problem that reliable gridded historical benchmark data cannot be obtained directly in Gobi regions with sparse stations. Unlike existing technologies that rely on mathematical interpolation alone or on purely data-driven black-box models, this invention treats deterministic physical laws, such as the solar zenith angle and temporal periodicity, as strong constraints and integrates them into feature engineering, for example by constructing features such as the cosine of the solar zenith angle, the instantaneous satellite-to-ground ratio, and periodic cyclic encodings. This enables the model, even when trained on limited ground station data, to capture the complex nonlinear variation of errors with solar geometry, season, and weather persistence, and thus to reliably extrapolate the learned correction relationships to vast grid areas without ground stations. For example, feature importance analysis after model training can reveal the main drivers of errors in a given area. This gives the model white-box or gray-box interpretability, enhances the credibility of the results in scientific decision-making, and yields conclusions that can guide the improvement of satellite inversion algorithms. The high-precision, spatially continuous historical benchmark grid dataset produced at this stage has been independently verified to have significantly improved accuracy: in the embodiment, the root mean square error was reduced from 48.2 W/m² in the original satellite data to 21.5 W/m².

[0027] Second, the proposed method ties the high-quality historical benchmark grid dataset generated above directly to the correction of future climate predictions. In existing technologies, historical data correction and future model prediction correction are usually two independent or even disconnected processes. The observational benchmarks relied upon for future correction are often unverified or limited-precision raw data, which leads to the propagation and amplification of historical errors. This invention instead uses the high-precision corrected grid data generated in the first stage as the sole and authoritative observational benchmark, and constructs a quantile mapping function for each grid point and each hour to correct the systematic biases of the climate model's historical simulation data. This function is then applied to the predictions for future scenarios. This design, in principle, completely cuts off the path by which historical data errors propagate into future predictions, ensuring that the baseline of future predictions rests on data rigorously calibrated against ground measurements. At the same time, while correcting the absolute bias of the model predictions, the method fully preserves the relative change signals simulated by the climate model, so that the corrected future predictions reflect both a reliable current climate state and a credible future trend.

Attached Figure Description

[0028] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an undue limitation of the invention. In the drawings:
Figure 1 is a flowchart of the method for ground verification and future prediction correction of solar irradiance satellite data described in this invention;
Figure 2 is a schematic diagram of the first-stage physics-guided feature engineering described in this invention;
Figure 3 is a schematic diagram of the interpretable machine learning model training and feature importance analysis described in this invention;
Figure 4 is an example diagram showing the results of the regional feature importance analysis described in this invention;
Figure 5 is a schematic diagram of the principle of the global gridded correction application described in this invention;
Figure 6 is a schematic diagram of the principle of the second-stage quantile mapping method described in this invention;
Figure 7 is a time series comparison chart of the original satellite data, the corrected data, and the ground-measured data at a certain site during a certain period, as described in this invention;
Figure 8 is a comparison chart of the spatial distribution of the multi-year mean satellite GHI data before and after correction, as described in this invention;
Figure 9 is a comparison of the spatial distribution of the original climate model prediction for the future period (2050) and the prediction corrected by the method of this invention, in which (a) is the spatial distribution of the original climate model prediction for the future period (2050) and (b) is the spatial distribution of the prediction corrected by the method of this invention;
Figure 10 is a schematic diagram of the modular architecture of the system described in this invention.

Detailed Implementation

[0029] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the embodiments and features in the embodiments of the present invention can be combined with each other, and the described embodiments are only some embodiments of the present invention, not all embodiments.

[0030] Embodiment 1 (see Figure 1): To address the problem that the accuracy of future prediction correction is limited by the quality of historical reference data, leading to an error cycle, this embodiment proposes a method for ground verification and future prediction correction of solar irradiance satellite data, comprising the following two-stage sequential process. In the first stage, a physics-guided machine learning hybrid correction method processes the satellite historical grid dataset, using historical data from a limited number of ground meteorological stations as the benchmark, to generate a high-precision historical benchmark grid dataset. It includes the following steps: obtaining paired time series of ground truth data and satellite grid data; constructing a comprehensive feature vector enhanced with physical information, including solar geometric features, temporal features, satellite-to-ground ratio features, moving average features, lag features, and periodic features; training an interpretable machine learning model and performing feature importance analysis based on the comprehensive feature vector; and applying the trained model to all grid cells in the target region to generate the corrected historical benchmark dataset. The second stage performs quantile mapping bias correction on future climate prediction data based on the historical benchmark dataset. It includes the following steps: constructing a quantile mapping function by matching the historical benchmark dataset against the climate model's historical simulation data; and applying the mapping function to the future prediction data to obtain a bias-corrected future prediction dataset.

[0031] The comprehensive feature vector enhanced with physical information includes the cosine of the solar zenith angle, $\cos(\theta_{SZA})$, calculated using the astronomical formula:

$$\cos(\theta_{SZA}) = \sin\varphi \, \sin\delta + \cos\varphi \, \cos\delta \, \cos\omega$$

[0032] where $\varphi$ is the latitude of the station, $\delta$ is the solar declination, and $\omega$ is the solar hour angle.

[0033] Furthermore, the satellite-to-ground ratio features described in this embodiment include the instantaneous ratio $R_t$ and the long-term average ratio $\bar{R}$. The instantaneous ratio is:

$$R_t = \frac{S_{sat,t}}{G_{ground,t} + \varepsilon}$$

[0034] The long-term average ratio is:

$$\bar{R} = \frac{1}{T} \sum_{t=1}^{T} R_t$$

[0035] where $S_{sat,t}$ is the solar irradiance value retrieved from the satellite at time $t$; $G_{ground,t}$ is the solar irradiance value measured by the ground weather station at time $t$; $T$ is the total length of the time series; and $\varepsilon$ is a very small value that prevents division by zero.

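[0036] Furthermore, the moving average features described in this embodiment include:

$$MA_t^{\alpha} = \frac{1}{\alpha} \sum_{i=0}^{\alpha-1} S_{sat,t-i}$$

[0037] where $MA_t^{\alpha}$ is the $\alpha$-step moving average at time $t$, $S_{sat,t-i}$ is the satellite irradiance value at time $(t-i)$, and $\alpha$ is the order of the moving average.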

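[0038] Furthermore, the periodic features described in this embodiment are implemented through cyclic encoding, including hourly cyclic encoding and monthly cyclic encoding. The hourly cyclic encoding is:

[0039] $$H_{sin} = \sin\left(\frac{2\pi H_t}{24}\right), \quad H_{cos} = \cos\left(\frac{2\pi H_t}{24}\right)$$

[0040] where $H_{sin}$ is the sine encoding of the hour, $H_{cos}$ is the cosine encoding of the hour, and $H_t$ is the hour of time $t$. The monthly cyclic encoding is:

[0041] $$M_{sin} = \sin\left(\frac{2\pi M_t}{12}\right), \quad M_{cos} = \cos\left(\frac{2\pi M_t}{12}\right)$$

[0042] where $M_{sin}$ is the sine encoding of the month, $M_{cos}$ is the cosine encoding of the month, and $M_t$ is the month of time $t$.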

[0043] Furthermore, the interpretable machine learning model described in this embodiment is a gradient boosting decision tree model. During training, a time series cross-validation strategy is adopted, and feature importance analysis tools are used to quantify the contribution weights of solar geometric features, ratio features, and periodic features to the correction results.

[0044] Furthermore, the construction of the quantile mapping function described in this embodiment includes: for each grid point and each hour, calculating the empirical cumulative distribution functions of the historical benchmark dataset and of the model's historical simulation data; and, on the preset quantile set $q = [q_1, q_2, \dots, q_n]$, establishing the mapping relationship through piecewise linear or monotonic spline interpolation.

[0045] Furthermore, applying the mapping function to correct future prediction data as described in this embodiment includes:

$$\hat{x}_{fut} = F_{i,j,h}(x_{fut})$$

[0046] where $\hat{x}_{fut}$ is the corrected predicted value of future irradiance, $F_{i,j,h}$ is the mapping function for grid point $(i, j)$ and hour $h$, and $x_{fut}$ is the original simulated value at that grid point.
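The two quantile mapping steps (building the mapping from empirical quantiles, then applying it to future values) can be sketched for a single grid point and hour as follows. This is a minimal illustration and not the patented implementation: the function names, the linear-interpolation quantile estimator, and the constant-offset extrapolation beyond the historical range are all assumptions made for the example.

```python
def empirical_quantiles(values, probs):
    """Empirical quantiles with linear interpolation between order statistics."""
    xs = sorted(values)
    n = len(xs)
    out = []
    for p in probs:
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        out.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))
    return out


def build_mapping(model_hist, reference_hist, probs):
    """Return (model quantiles, reference quantiles) for one grid point and hour."""
    return empirical_quantiles(model_hist, probs), empirical_quantiles(reference_hist, probs)


def apply_mapping(x, model_q, ref_q):
    """Piecewise-linear quantile mapping of one simulated value x.

    Assumption: values outside the historical model range are shifted by the
    bias of the nearest end quantile (one common choice among several).
    """
    if x <= model_q[0]:
        return x + (ref_q[0] - model_q[0])
    if x >= model_q[-1]:
        return x + (ref_q[-1] - model_q[-1])
    for k in range(1, len(model_q)):
        if x <= model_q[k]:
            x0, x1 = model_q[k - 1], model_q[k]
            y0, y1 = ref_q[k - 1], ref_q[k]
            if x1 == x0:  # flat segment in the model distribution
                return y1
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    return x  # not reached; kept for safety
```

In a full workflow, one such pair of quantile arrays would be stored per key $(i, j, h)$, for example in a dictionary, and `apply_mapping` would be called with the arrays that belong to the grid point and hour of each future value.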

[0047] Embodiment 2 (see Figures 2 and 3): This embodiment describes a complete implementation process for the method for ground verification and future prediction correction of solar irradiance satellite data described in Embodiment 1. It adopts a two-stage sequential process: the first stage generates a high-precision historical benchmark through physics-guided machine learning; the second stage performs quantile mapping correction on future predictions based on this benchmark.

[0048] Phase 1: Physics-guided machine learning hybrid correction – generating the high-precision historical benchmark grid. This stage is the core of this embodiment. Its central idea is to integrate known deterministic physical laws, as strong prior knowledge, into the feature space and training framework of a data-driven machine learning model, constructing a "physical-information-augmented" machine learning model. This model not only achieves the high prediction accuracy of a black-box model but also exposes the role of physical laws through its own interpretability tools, thereby achieving gray-box or even white-box correction.

[0049] The specific technical solution steps are as follows: S101: Collaborative preparation and refined preprocessing of multi-source data.

[0050] Ground truth data $G_{ground}$: Collect data at hourly or higher temporal resolution from all available ground-based solar radiation stations within the target area that have undergone quality control by an authoritative institution. Perform secondary quality control on the data, including: removing anomalous nighttime values (irradiance should theoretically be zero at night) and negative values; checking and flagging abnormally high values, which may be caused by instrument malfunction or snow reflection, against the physical upper limit (the top-of-atmosphere solar constant combined with atmospheric transmittance); and ensuring that timestamps are accurate and uniformly converted to UTC.
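The secondary quality-control checks described above can be sketched for a single observation as follows. The solar constant value of 1361 W/m², the nighttime tolerance `night_tol`, the `max_transmittance` parameter, and the flag names are assumptions chosen for illustration; the source does not specify them.

```python
import math

SOLAR_CONSTANT = 1361.0  # W/m^2, approximate top-of-atmosphere value (assumed)


def quality_control(ghi, cos_sza, max_transmittance=1.0, night_tol=1.0):
    """Return (value, flag) for one ground irradiance observation.

    ghi: measured irradiance in W/m^2.
    cos_sza: cosine of the solar zenith angle at the observation time.
    max_transmittance and night_tol are hypothetical tuning parameters.
    """
    if ghi is None or math.isnan(ghi):
        return None, "missing"
    if cos_sza <= 0.0:
        # Sun below the horizon: irradiance should be zero.
        if abs(ghi) <= night_tol:
            return 0.0, "ok"
        return None, "night_anomaly"
    if ghi < 0.0:
        return None, "negative"
    # Physical upper limit on a horizontal surface.
    upper = SOLAR_CONSTANT * cos_sza * max_transmittance
    if ghi > upper:
        # Flagged rather than removed, matching "checking and flagging".
        return ghi, "suspect_high"
    return ghi, "ok"
```

Flagging high values instead of deleting them leaves the final decision to a later review step, which is useful because snow reflection can produce values that are unusual but real.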

[0051] Satellite grid data $S_{sat}$: Acquire satellite-derived solar irradiance products whose time span completely overlaps the ground station observation period. Preferred options are reanalysis products or climate data records that have undergone long-term calibration and use mature algorithms. Depending on application requirements, multiple radiation components may be processed together, such as global horizontal irradiance ($GHI_{sat}$), direct normal irradiance ($DNI_{sat}$), and diffuse horizontal irradiance ($DHI_{sat}$).

[0052] Spatiotemporal matching: The geographic coordinates of each ground station are matched to the satellite data grid, and the satellite data of the cell containing the station (or a representative value obtained by bilinear interpolation) are extracted over the entire time series to form the paired time series dataset $\{(G_{ground,t}, S_{sat,t})\}$, where $t$ is the time index.
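A minimal sketch of the matching step, assuming a regular latitude/longitude grid described by an origin and a cell size, and using the nearest cell rather than bilinear interpolation. The function names and the dictionary-based series layout are hypothetical.

```python
def nearest_cell(lat, lon, lat0, lon0, dlat, dlon, n_lat, n_lon):
    """Index (i, j) of the grid cell whose center is nearest to the station.

    Assumes cell centers at lat0 + i * dlat and lon0 + j * dlon.
    """
    i = round((lat - lat0) / dlat)
    j = round((lon - lon0) / dlon)
    if not (0 <= i < n_lat and 0 <= j < n_lon):
        raise ValueError("station lies outside the satellite grid")
    return i, j


def pair_series(ground, sat_cell):
    """Build the paired series {(G_ground,t, S_sat,t)} on common timestamps.

    ground and sat_cell map a timestamp to an irradiance value.
    Returns a chronologically sorted list of (t, G, S) tuples.
    """
    common = sorted(set(ground) & set(sat_cell))
    return [(t, ground[t], sat_cell[t]) for t in common]
```

Only timestamps present in both sources are kept, so gaps in either record drop out of the training pairs instead of being filled.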

[0053] S102: Construct a comprehensive feature engineering system to enhance physical information.

[0054] For each time point $t$, construct a feature vector $X_t$ containing multiple types of information, including: 1. Solar geometric features: These are physical quantities that directly describe the relative position of the Sun and the Earth, and they are the most fundamental factors affecting the solar radiation received at the Earth's surface. They include: the solar zenith angle $\theta_{SZA,t}$, the angle between the Sun's rays and the local zenith direction; the cosine of the solar zenith angle, $\cos(\theta_{SZA})$; and the solar azimuth angle at time $t$, the direction of the Sun's projection onto the horizontal plane. Of these, $\cos(\theta_{SZA})$ is the most important because it directly determines: ① the top-of-atmosphere solar irradiance received per unit horizontal area, which is proportional to $\cos(\theta_{SZA})$; and ② the path length of sunlight through the atmosphere, which is approximately inversely proportional to $\cos(\theta_{SZA})$.

[0055] These features provide the most fundamental physical constraints for the model. Satellite inversion algorithms exhibit different error characteristics at different solar altitudes; explicitly inputting this feature into the model allows it to learn how the error varies with the sun's geometric position.

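[0056] Calculate the cosine of the solar zenith angle, $\cos(\theta_{SZA})$, using astronomical formulas (such as the SPA algorithm):

$$\cos(\theta_{SZA}) = \sin\varphi \, \sin\delta + \cos\varphi \, \cos\delta \, \cos\omega$$

[0057] where $\varphi$ is the latitude of the station, $\delta$ is the solar declination, and $\omega$ is the solar hour angle.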
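As a rough illustration, the standard relation $\cos(\theta_{SZA}) = \sin\varphi \sin\delta + \cos\varphi \cos\delta \cos\omega$ can be evaluated with simple approximations for the declination and hour angle. The sketch below uses Cooper's declination formula and a 15° per hour angle measured from solar noon; both are simplifications assumed for the example and are much less precise than the SPA algorithm named in the text.

```python
import math


def solar_declination(day_of_year):
    """Solar declination in radians (Cooper's approximation, an assumption)."""
    return math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)


def hour_angle(solar_hour):
    """Solar hour angle in radians: 15 degrees per hour from solar noon."""
    return math.radians(15.0 * (solar_hour - 12.0))


def cos_sza(lat_deg, day_of_year, solar_hour):
    """Cosine of the solar zenith angle from latitude, day of year, and solar time."""
    phi = math.radians(lat_deg)
    delta = solar_declination(day_of_year)
    omega = hour_angle(solar_hour)
    return math.sin(phi) * math.sin(delta) + math.cos(phi) * math.cos(delta) * math.cos(omega)
```

Note that `solar_hour` is local apparent solar time, so timestamps stored in UTC would first need a longitude and equation-of-time correction, which is omitted here.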

[0058] 2. Temporal characteristics: Solar irradiance exhibits distinct diurnal and annual cycles. Irradiance is significantly higher during the day than at night, and summer irradiance is typically higher than winter irradiance. Temporal features are fundamental information describing the temporal location of data, used to capture the diurnal, seasonal, and long-term trends of solar irradiance. These features provide the model with temporal contextual information, enabling it to understand the patterns of radiation variation across different time scales.

[0059] For time point $t$: the hour feature $H_t \in \{0, 1, \dots, 23\}$ represents the hour of the day; the month feature $M_t \in \{1, 2, \dots, 12\}$ represents the month of the year; the day-of-year feature $D_t \in \{1, 2, \dots, 365/366\}$ represents the day of the year; and the season feature $Q_t \in \{1, 2, 3, 4\}$ is assigned according to the meteorological definition: spring (March–May) is 1, summer (June–August) is 2, autumn (September–November) is 3, and winter (December–February) is 4.
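The four temporal features can be derived directly from a timestamp. The sketch below uses only the Python standard library; the function names and the dictionary keys are hypothetical.

```python
from datetime import datetime, timezone


def season_code(month):
    """Meteorological season: spring=1, summer=2, autumn=3, winter=4."""
    if month in (3, 4, 5):
        return 1
    if month in (6, 7, 8):
        return 2
    if month in (9, 10, 11):
        return 3
    return 4  # December, January, February


def temporal_features(ts):
    """Hour, month, day of year, and season code for a datetime."""
    return {
        "hour": ts.hour,                 # H_t in 0..23
        "month": ts.month,               # M_t in 1..12
        "doy": ts.timetuple().tm_yday,   # D_t in 1..365/366
        "season": season_code(ts.month), # Q_t in 1..4
    }
```

For example, `temporal_features(datetime(2024, 12, 31, 23, tzinfo=timezone.utc))` yields hour 23, month 12, day of year 366 (2024 is a leap year), and season 4.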

[0060] 3. Satellite-to-ground ratio features: These features quantify the relationship between satellite observations and ground-based measurements, reflecting the systematic bias patterns of the satellite data.

[0061] Instantaneous ratio:

$$R_t = \frac{S_{sat,t}}{G_{ground,t} + \varepsilon}$$

[0062] Long-term average ratio:

$$\bar{R} = \frac{1}{T} \sum_{t=1}^{T} R_t$$

[0063] where $\varepsilon$ is a very small value that prevents division by zero.

[0064] This ratio directly reflects the degree of deviation between the satellite data and the ground truth. When $R_t \approx 1$, satellite observations and ground measurements are basically consistent; when $R_t > 1$, the satellite data may overestimate the ground value; when $R_t < 1$, the satellite data may underestimate it. The long-term average ratio $\bar{R}$ provides prior information on the overall deviation level of the region, which helps the model quickly establish a calibration benchmark.
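A small sketch of the two ratio features follows. The instantaneous ratio divides the satellite value by the ground value plus a small constant. The long-term ratio is taken here as the arithmetic mean of the instantaneous ratios over the series, which is an assumption about a formula that is not legible in the source; the value of `EPS` is also assumed.

```python
EPS = 1e-6  # small constant preventing division by zero (assumed value)


def instantaneous_ratio(s_sat, g_ground, eps=EPS):
    """R_t: satellite irradiance over ground irradiance at one time step."""
    return s_sat / (g_ground + eps)


def long_term_mean_ratio(sat_series, ground_series, eps=EPS):
    """Mean of R_t over the whole paired series (assumed definition)."""
    if len(sat_series) != len(ground_series) or not sat_series:
        raise ValueError("series must be non-empty and of equal length")
    ratios = [instantaneous_ratio(s, g, eps) for s, g in zip(sat_series, ground_series)]
    return sum(ratios) / len(ratios)
```

With a very small `eps`, time steps where the ground value is zero but the satellite value is not produce extremely large ratios, so in practice nighttime samples would likely be excluded before averaging.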

[0065] 4. Moving average characteristics: Cloud cover variations and atmospheric turbulence can introduce high-frequency noise into satellite observations. The moving average process is equivalent to a low-pass filter, preserving the persistent information of the weather system while removing random disturbances, enabling the model to make predictions based on more stable signals.

[0066] For a time window of width $\alpha$, the moving average at time point $t$ is:

$$MA_t^{\alpha} = \frac{1}{\alpha} \sum_{i=0}^{\alpha-1} S_{sat,t-i}$$

[0067] Common window settings: a 3-hour moving average ($\alpha = 3$) captures rapid changes in weather systems; a 6-hour moving average ($\alpha = 6$) reflects trends on a half-day timescale; a 12-hour moving average ($\alpha = 12$) depicts the background field after the diurnal variation is smoothed.
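A trailing moving average over the satellite series can be sketched as below. The choice to return `None` until a full window is available is an assumption; partial windows or padding are equally possible.

```python
def moving_average(series, alpha):
    """Trailing alpha-step moving average of a list of numbers.

    Position t holds the mean of series[t - alpha + 1 .. t], or None
    while fewer than alpha samples are available (an assumed convention).
    """
    if alpha < 1:
        raise ValueError("alpha must be at least 1")
    out = []
    window_sum = 0.0
    for t, value in enumerate(series):
        window_sum += value
        if t >= alpha:
            window_sum -= series[t - alpha]  # drop the sample leaving the window
        out.append(window_sum / alpha if t >= alpha - 1 else None)
    return out
```

For hourly data, `moving_average(s, 3)`, `moving_average(s, 6)`, and `moving_average(s, 12)` correspond to the three window settings listed above. Because the window only looks backward, no future information enters the feature.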

[0068] 5. Lag features: Variations in solar irradiance are continuous in time, for three reasons. Continuity of cloud systems: the movement and development of cloud clusters are temporally correlated, so the cloud conditions at the previous moment affect the radiation at the current moment. Inertia of atmospheric conditions: atmospheric components such as aerosols and water vapor change relatively slowly. Regularity of diurnal variation: the diurnal pattern of solar irradiance is continuous.

[0069] The lag feature incorporates satellite observations from historical moments to capture the persistence and memory effect of weather systems, enabling the model to learn the autocorrelation structure of time series and improve its ability to predict rapid weather changes.

[0070] The lag feature with lag $k$:

$$L_t^{k} = S_{sat,t-k}$$

[0071] where $k = 0.5, 1, 24$ represent lags of 0.5 hours, 1 hour, and 24 hours, respectively.
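Lag features shift the satellite series backward by a fixed time offset. The sketch below assumes a regularly sampled series with a known step (half-hourly by default, so that the 0.5-hour lag is representable); the function name and the `None` padding are illustrative choices.

```python
def lag_feature(series, lag_hours, step_hours=0.5):
    """Value of the series lag_hours earlier, aligned to each time step.

    series: regularly sampled values with spacing step_hours (assumed).
    Positions without enough history are filled with None.
    """
    steps = lag_hours / step_hours
    if steps < 1 or steps != int(steps):
        raise ValueError("lag must be a positive multiple of the sampling step")
    steps = int(steps)
    return [series[t - steps] if t >= steps else None for t in range(len(series))]
```

On half-hourly data, `lag_feature(s, 0.5)`, `lag_feature(s, 1)`, and `lag_feature(s, 24)` give the three lags named in the text; on hourly data the 0.5-hour lag is rejected because it is not a whole number of steps.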

[0072] 6. Periodicity: Traditional time encoding (such as directly encoding hours as 0-23) suffers from discontinuous boundaries: 23 o'clock and 0 o'clock differ by 23 numerically, but in reality, they only differ by 1 hour. Periodic features, by cyclically encoding time variables, transform linear time information into positional information in a cyclic space, enabling the model to correctly understand periodic boundaries.

[0073] Cyclic encoding maps time onto the unit circle using trigonometric functions. The pair $(H_{sin}, H_{cos})$ represents the hour as a point on a two-dimensional plane, corresponding to a moment in the day. Adjacent hours are adjacent on the unit circle, which maintains temporal continuity: 23:00 and 00:00 are close on the circle, accurately reflecting the cyclical nature of time. This encoding allows the model to better learn the continuous patterns of daily and annual cycles. Specifically, sine and cosine transformations are applied to the hour ($H$) and the month ($M$). Hourly cyclic encoding:

[0074] $$H_{sin} = \sin\left(\frac{2\pi H_t}{24}\right), \quad H_{cos} = \cos\left(\frac{2\pi H_t}{24}\right)$$

[0075] Monthly cyclic encoding:

[0076] $$M_{sin} = \sin\left(\frac{2\pi M_t}{12}\right), \quad M_{cos} = \cos\left(\frac{2\pi M_t}{12}\right)$$
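The boundary argument can be checked numerically. The sketch below encodes hours and months as points on the unit circle, using periods of 24 and 12, and compares distances between neighboring hours; the function names are hypothetical.

```python
import math


def encode_hour(hour):
    """(sin, cos) encoding of the hour of day with period 24."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


def encode_month(month):
    """(sin, cos) encoding of the month with period 12."""
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)


# Distance across the midnight boundary (23:00 to 00:00)...
d_boundary = math.dist(encode_hour(23), encode_hour(0))
# ...and between two ordinary neighboring hours (00:00 to 01:00).
d_regular = math.dist(encode_hour(0), encode_hour(1))
```

Here `d_boundary` and `d_regular` agree to within floating-point error, whereas the raw hour values 23 and 0 differ by 23. The same holds for December and January under `encode_month`.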

[0077] Explicit variables: day of year (DOY) and season code.

[0078] All of the above numerical features are combined into a single vector $X_t$.

[0079] To make model training more stable and efficient, Z-score standardization is performed:

$$X'_{t,i} = \frac{X_{t,i} - \mu_i}{\sigma_i}$$

[0080] where $X_{t,i}$ is the original value of the $i$-th feature, $\mu_i$ is the mean of the $i$-th feature over the training set, and $\sigma_i$ is the standard deviation of the $i$-th feature over the training set.

[0081] The standardized features have zero mean and unit variance, which is beneficial for the convergence of the gradient descent algorithm and the stability of the model performance.
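Z-score standardization can be sketched in two steps: estimate the per-feature mean and standard deviation on the training set only, then apply those fixed statistics to any other data. Using the population standard deviation and replacing a zero deviation with 1.0 are assumptions made for this example.

```python
import math


def fit_scaler(rows):
    """Per-feature mean and standard deviation from training rows.

    rows: list of equal-length feature lists. Constant features get a
    standard deviation of 1.0 to avoid division by zero (assumed choice).
    """
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(d)]
    stds = [math.sqrt(sum((r[i] - means[i]) ** 2 for r in rows) / n) for i in range(d)]
    stds = [s if s > 0.0 else 1.0 for s in stds]
    return means, stds


def transform(rows, means, stds):
    """Apply (x - mean) / std to every feature of every row."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]
```

Fitting on the training period and reusing the same `means` and `stds` for validation, test, and grid-wide application keeps later data from influencing the scaling.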

[0082] S103: Training, optimization, and physical insight extraction of the interpretable machine learning model.

[0083] Model selection: Gradient boosting decision tree-based models, such as XGBoost or LightGBM, are preferred. These models perform well in handling tabular data, non-linear relationships, and feature interactions, and are highly efficient in training. More importantly, they naturally possess the ability to evaluate feature importance.

[0084] Training strategy: Time-series cross-validation is employed. Data ordered chronologically is divided into consecutive folds. For example, data from 1980-2000 is used for training, data from 2001-2005 is used for validation (for early stopping and parameter tuning), and data from 2006-2010 is used as an independent test set to evaluate the final generalization ability. This avoids contamination of training results by future information leaks, making the evaluation results more reliable.
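The chronological split in the example above can be sketched as follows. The function name `split_by_year` and the toy array of sample years are assumptions; the year ranges are the ones quoted in the text.

```python
import numpy as np

def split_by_year(years, train=(1980, 2000), valid=(2001, 2005), test=(2006, 2010)):
    """Return boolean masks for chronologically ordered train/valid/test blocks."""
    years = np.asarray(years)

    def mask(span):
        return (years >= span[0]) & (years <= span[1])

    return mask(train), mask(valid), mask(test)

# Toy input: three samples per year, purely for illustration.
years = np.repeat(np.arange(1980, 2011), 3)
train_mask, valid_mask, test_mask = split_by_year(years)
```

Because each block lies strictly later than the previous one, no validation or test sample precedes a training sample.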

[0085] After model training, feature importance analysis is performed. An importance ranking is obtained by computing each feature's contribution to the model's decisions (e.g., Gini importance or SHAP values). The results clarify the relative contributions of the various features, such as $\cos(\theta_{SZA})$, $R_t$, and the periodic features, to the correction, thereby revealing the main physical drivers of satellite error in the region and improving the interpretability of the model.

[0086] For example, in high-altitude regions dominated by clear skies, $\cos(\theta_{SZA})$ and $R_t$ are typically the most critical factors, whereas in frequently cloudy coastal areas the importance of lag features, which reflect cloud persistence, may increase significantly. This analysis makes the data-driven correction process transparent and interpretable, with clear physical meaning, which strengthens the scientific credibility of the results. The conclusions can also feed back into the improvement of satellite retrieval algorithms: for example, if the errors in a certain area are found to correlate strongly with the solar zenith angle $\theta_{SZA}$, this suggests that the atmospheric path radiance term of the retrieval algorithm may have room for optimization.
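One way to produce an importance ranking is sketched below. It uses permutation importance, a model-agnostic technique swapped in here for illustration; it is not the Gini importance or SHAP analysis named in the text. The linear stand-in model, the synthetic data, and all names are assumptions.

```python
import numpy as np

def permutation_importance(predict, X, y, rng, n_repeats=5):
    """Share of the MSE increase caused by shuffling each feature column."""
    base_mse = np.mean((predict(X) - y) ** 2)
    increase = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increase[j] += np.mean((predict(X_perm) - y) ** 2) - base_mse
    # Negative values are sampling noise; clip them before normalizing.
    increase = np.clip(increase / n_repeats, 0.0, None)
    total = increase.sum()
    return increase / total if total > 0 else increase

# Synthetic target driven strongly by column 0, weakly by column 1, not by column 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=2000)

# Least-squares fit as a stand-in for the trained gradient boosting model.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
shares = permutation_importance(lambda A: A @ weights, X, y, rng)
ranking = np.argsort(shares)[::-1]  # most important feature first
```

The normalized shares play the same role as the importance percentages reported for the embodiment, although the numerical values of different importance measures are not directly comparable.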

[0087] S104: Automated batch correction over the global grid.

[0088] The final trained model $M$, together with the feature standardizer (scaler), is frozen into a correction operator. For each grid cell $(i, j)$ within the target region: read the raw satellite time series of the cell; recompute the solar geometric features at each time step from the cell's center coordinates; compute the corresponding moving average, lag, and other features from the satellite series; generate the time and periodic features in combination with the location information; standardize all features with the scaler; and input the standardized features into the model $M$ to obtain the corrected irradiance series. Traversing all cells yields the high-precision historical corrected grid dataset $G_{hist}$.
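The per-cell traversal described above can be sketched as a simple loop. Everything here is a hedged illustration: `correct_grid`, `toy_features`, and `toy_model` are hypothetical names, the two-column feature builder stands in for the full feature engineering, and the linear function stands in for the trained model $M$.

```python
import numpy as np

def correct_grid(sat_cube, build_features, scaler, model):
    """Apply a frozen (scaler, model) operator to every cell of a (time, ny, nx) cube."""
    _, ny, nx = sat_cube.shape
    mu, sigma = scaler
    corrected = np.empty_like(sat_cube, dtype=float)
    for i in range(ny):
        for j in range(nx):
            series = sat_cube[:, i, j]         # raw satellite series of this cell
            X = build_features(series, i, j)   # features rebuilt per cell
            corrected[:, i, j] = model((X - mu) / sigma)
    return corrected

def toy_features(series, i, j):
    """Hypothetical feature builder: raw value plus the sine of the hour."""
    hours = np.arange(series.size) % 24
    return np.column_stack([series, np.sin(2.0 * np.pi * hours / 24.0)])

def toy_model(Z):
    """Stand-in for the trained model: undo the scaling of column 0, reduce by 10%."""
    return 0.9 * (Z[:, 0] * 100.0 + 400.0)

scaler = (np.array([400.0, 0.0]), np.array([100.0, 1.0]))
sat_cube = np.full((48, 2, 3), 500.0)  # 48 hours on a 2 x 3 toy grid
g_hist = correct_grid(sat_cube, toy_features, scaler, toy_model)
```

In practice the loop would be vectorized or distributed across workers; the explicit double loop only mirrors the order of operations in the text.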

[0089] Phase Two: Quantile mapping correction of future predictions based on the high-quality benchmark. This phase uses the first-phase output $G_{hist}$ as the observational benchmark.

[0090] S201: Benchmark matching and mapping function construction.

[0091] CMIP6 (Coupled Model Intercomparison Project Phase 6) is a global climate model intercomparison project initiated by the World Climate Research Programme (WCRP). Historical simulation data are selected from CMIP6 whose spatiotemporal extent is consistent with the benchmark dataset $G_{hist}$ generated in the first phase.

[0092] The benchmark dataset and the model historical simulation data are resampled to the same spatiotemporal resolution (hourly temporal resolution and 1° spatial resolution).

[0093] For each grid point and each hour, compute the empirical cumulative distribution functions (CDFs) of the observational benchmark and of the model simulation.

[0094] On the selected set of quantiles $q = [q_1, q_2, \ldots, q_n]$ (for example 1%, 5%, ..., 95%, 99%), record the values of the two CDFs: the observed quantiles $Q_{obs}(q)$ and the simulated quantiles $Q_{mod}(q)$.

[0095] With $Q_{mod}(q)$ as the independent variable and $Q_{obs}(q)$ as the dependent variable, piecewise linear or monotonic spline interpolation is used to construct the mapping function $F_{i,j,h}(x)$ for each grid cell and each hour. Its physical meaning is to correct the model-simulated value $x$ to the scale of the observational reference.
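For a single grid cell and hour, the construction of the mapping function can be sketched with piecewise linear interpolation. The quantile set below assumes a 0.1 step between 0.2 and 0.8, and the synthetic observation and model samples are invented for illustration. Note that `np.interp` holds the end values constant outside the fitted quantile range; how the tails should be treated is not specified in the text.

```python
import numpy as np

# Assumed quantile set; the 0.1 step between 0.2 and 0.8 is an assumption.
QUANTILES = np.array([0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
                      0.6, 0.7, 0.8, 0.9, 0.95, 0.99])

def build_mapping(obs, mod, q=QUANTILES):
    """Return F(x) mapping model values onto the observed scale for one cell and hour."""
    q_obs = np.quantile(obs, q)  # observed quantiles Q_obs(q)
    q_mod = np.quantile(mod, q)  # simulated quantiles Q_mod(q)

    def F(x):
        # Piecewise linear interpolation through the (Q_mod, Q_obs) pairs.
        return np.interp(x, q_mod, q_obs)

    return F

# Synthetic example: the "model" systematically underestimates the "observations".
rng = np.random.default_rng(1)
obs = rng.uniform(100.0, 900.0, size=5000)
mod = 0.8 * obs - 20.0

F = build_mapping(obs, mod)
corrected = F(380.0)  # 380 = 0.8 * 500 - 20, so the mapping should return about 500
```

If many simulated quantiles coincide (for example a run of zeros at night), the interpolation nodes are no longer strictly increasing and the mapping needs extra care.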

[0096] S202: Bias correction of future prediction data. Obtain the prediction data $M_{future}$ of the same CMIP6 climate model under specific future scenarios (such as SSP2-4.5 and SSP5-8.5).

[0097] For each data point in $M_{future}$, find the corresponding mapping function $F_{i,j,h}$ according to its spatial location $(i, j)$ and the hour $h$ of its timestamp.

[0098] The original simulated value $x_{fut}$ of the data point is input to $F_{i,j,h}$ to obtain the bias-corrected value: $$\hat{x}_{fut} = F_{i,j,h}(x_{fut})$$

[0099] Iterate over all future data points and apply the correction to generate the final bias-corrected future prediction dataset $G_{future}$.
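The lookup and application steps above can be sketched by storing the quantile tables per hour and grid cell. The array layout `(24, ny, nx, n_q)`, the function name `correct_future`, and the toy constant-bias tables are assumptions made for the example.

```python
import numpy as np

def correct_future(m_future, hours, q_mod, q_obs):
    """Bias-correct a (time, ny, nx) cube using per-hour, per-cell quantile tables.

    hours holds the hour of day (0..23) of each time step;
    q_mod and q_obs have the assumed layout (24, ny, nx, n_q).
    """
    n_t, ny, nx = m_future.shape
    g_future = np.empty_like(m_future, dtype=float)
    for t in range(n_t):
        h = hours[t]
        for i in range(ny):
            for j in range(nx):
                # F_{i,j,h} evaluated by linear interpolation between quantile nodes.
                g_future[t, i, j] = np.interp(
                    m_future[t, i, j], q_mod[h, i, j], q_obs[h, i, j]
                )
    return g_future

# Toy tables: the model is 50 units too low at every quantile, hour, and cell.
nodes = np.linspace(0.0, 800.0, 5)
q_mod = np.tile(nodes, (24, 1, 2, 1))  # shape (24, 1, 2, 5)
q_obs = q_mod + 50.0

m_future = np.full((3, 1, 2), 400.0)   # three time steps on a 1 x 2 grid
hours = np.array([22, 23, 0])
g_future = correct_future(m_future, hours, q_mod, q_obs)
```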

[0100] Embodiment 3 is described with reference to Figures 4 to 10. This embodiment describes a specific implementation of the method for ground verification and future prediction correction of solar irradiance satellite data proposed in Embodiment 2. The application example is a large photovoltaic base planning area (approximately 200,000 square kilometers) in southwestern China. A high-precision, hourly average GHI historical dataset at 1-kilometer resolution was obtained for the region for 2000 to 2020. On this basis, the bias in the CMIP6 multi-model ensemble mean GHI change projected for 2021 to 2100 under the SSP2-4.5 scenario was corrected, providing decision support for the province's large-scale energy planning.

[0101] S101. Data Preparation: Ground data: Hourly GHI observation data from 2000 to 2020 were collected from 12 national-level radiation stations in and around the province. Significant outliers were removed, and approximately 95% of the valid data were retained. The stations are distributed across diverse terrains including valleys and plateaus.

[0102] Satellite data: Reanalysis products from the National Solar Radiation Database (NSRDB) of the U.S. Department of Energy were selected, with a temporal resolution of 1 hour and a spatial resolution of 0.1°. Bilinear interpolation was used to resample the data onto an Albers equal-area projection grid with a 1-kilometer resolution to match the high-precision terrain data.

[0103] Matching: Find the corresponding cell of each ground station on a 1-kilometer grid, extract its time series, and form 12 sets of paired data.

[0104] S102, Feature engineering: for each hour of paired data, a feature vector containing the following 17 features is constructed. Solar geometric feature: cosine of the solar zenith angle. Satellite-to-ground ratio features: instantaneous ratio; long-term average ratio. Moving average features: 3-hour moving average; 6-hour moving average; 12-hour moving average. Lag features: satellite value lagged by 0.5 hours; satellite value lagged by 1 hour; satellite value lagged by 24 hours. Time features: hour; month; day of year; season. Periodic features: sine of the hour; cosine of the hour; sine of the month; cosine of the month. All features are Z-score standardized before training.
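The moving average and lag features listed above can be sketched on an hourly series as follows. The sketch assumes a trailing window that includes the current hour, which the text does not state, and it omits the 0.5-hour lag because that lag cannot be expressed as a whole-step shift of an hourly series. The function names and toy series are illustrative.

```python
import numpy as np

def trailing_mean(x, window):
    """Mean of the current and previous (window - 1) values; NaN until the window fills."""
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def lag(x, steps):
    """Shift the series back by a whole number of time steps; the first values are NaN."""
    out = np.full(x.shape, np.nan)
    out[steps:] = x[:-steps]
    return out

# Toy hourly satellite series covering two days.
sat = np.arange(48, dtype=float)

ma3 = trailing_mean(sat, 3)
ma6 = trailing_mean(sat, 6)
ma12 = trailing_mean(sat, 12)
lag1 = lag(sat, 1)
lag24 = lag(sat, 24)
```

The leading NaN values mark time steps where the window or lag reaches before the start of the record; such rows would typically be dropped before training.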

[0105] S103, Model Training and Interpretation: Data from the 12 stations spanning 2000-2020 were merged and sorted chronologically. Time-series cross-validation was employed: training on 2000-2010, validation on 2011-2013 (for early stopping), and a first test block of 2014-2015. A rolling time window was then applied: training on 2003-2013, validation on 2014-2016, testing on 2017-2018, and so on. The final model was trained using all data from 2000-2018, with 2019-2020 held out as a rigorous final independent test set.
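The rolling windows quoted above can be generated programmatically. The sketch below assumes an 11-year training block, a 3-year validation block, a 2-year test block, and a 3-year step, all inferred from the two windows listed in the text; the function name `rolling_folds` is illustrative.

```python
def rolling_folds(first_year, last_year, train_len=11, valid_len=3, test_len=2, step=3):
    """Enumerate rolling (train, valid, test) year ranges, inclusive at both ends."""
    folds = []
    start = first_year
    while start + train_len + valid_len + test_len - 1 <= last_year:
        train = (start, start + train_len - 1)
        valid = (train[1] + 1, train[1] + valid_len)
        test = (valid[1] + 1, valid[1] + test_len)
        folds.append({"train": train, "valid": valid, "test": test})
        start += step
    return folds

# 2019-2020 is reserved as the final test set, so the folds stop at 2018.
folds = rolling_folds(2000, 2018)
```

Under these assumptions only two complete folds fit inside 2000-2018.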

[0106] The LightGBM model was selected and trained using time-series cross-validation. On the test set, the root mean square error (RMSE) was 21.5 W/m², the mean absolute error (MAE) was 15.8 W/m², and the coefficient of determination was $R^2$ = 0.963. Compared with the raw satellite data (RMSE = 48.2 W/m², $R^2$ = 0.882), the correction reduces the RMSE by about 55%.
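The three evaluation metrics above follow their standard definitions, sketched here on a small invented sample. The last line recomputes the relative RMSE reduction from the two RMSE figures reported in the text; the function names and toy values are assumptions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_pred - y_true)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy values in W/m^2, invented for illustration only.
ground = np.array([100.0, 200.0, 300.0, 400.0])
corrected = np.array([110.0, 190.0, 310.0, 390.0])

toy_rmse = rmse(ground, corrected)
toy_mae = mae(ground, corrected)
toy_r2 = r2(ground, corrected)

# Relative RMSE reduction implied by the reported figures (48.2 -> 21.5 W/m^2).
rmse_reduction = 1.0 - 21.5 / 48.2
```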

[0107] As shown in the feature importance bar chart of Figure 4, the top three features are: the cosine of the solar zenith angle (importance share 0.32); the current satellite-to-ground instantaneous ratio (importance share 0.28); and the sine encoding of the month (importance share 0.16). This result has clear physical implications. It confirms that in this region the solar elevation angle is the primary factor affecting satellite retrieval accuracy (possibly because the atmosphere at high altitude is clean and path radiance depends strongly on the solar zenith angle $\theta_{SZA}$); the raw satellite signal itself carries a large amount of information; and the seasonal cycle is the third largest contributor to the error. The low importance of the lag features (<0.03) indicates that the weather systems in this region change rapidly and lack persistence. As a core appendix, this feature importance report provides users with a transparent, interpretable, and traceable basis for evaluating model performance and physical consistency. Its goal is to elucidate the scientific mechanisms and sources of uncertainty behind the irradiance data, so that users understand not only the accuracy of the data but also its physical causes, applicable scenarios, and potential limitations.

[0108] S104, Global grid correction: the hourly corrected dataset $G_{hist}$ is generated and then aggregated into monthly mean products. Figure 7 and Figure 8 show the spatial distribution before and after correction and demonstrate a significant improvement in agreement with the ground stations.

[0109] Phase Two: Future projection correction based on quantile mapping (2021-2100 projections). S201, Benchmark Matching and Function Construction: Five well-performing models (CanESM5, MIROC6, MRI-ESM2-0, IPSL-CM6A-LR, and UKESM1-0-LL) were selected from CMIP6, and the multi-model ensemble mean of their hourly GHI from the historical simulations (2000-2014) was calculated.

[0110] The hourly historical benchmark dataset $G_{hist}$ for 2000-2014 and the hourly GHI multi-model ensemble mean are both interpolated to a common 1°×1° grid to accommodate the coarse resolution of the climate models.

[0111] For each hour of each 1° grid cell, the CDFs of the observations and of the simulations are calculated separately, and a piecewise linear mapping function $F_{i,j,h}$ is established at the quantiles [0.01, 0.05, 0.1, 0.2, ..., 0.8, 0.9, 0.95, 0.99].

[0112] S202, Future Data Correction: Obtain the hourly GHI projection data of the above five models under the SSP2-4.5 scenario for 2021-2100 and calculate their multi-model ensemble mean $M_{future}$.

[0113] For each data value in $M_{future}$, find the mapping function $F_{i,j,h}$ corresponding to its hour and grid cell and apply the transformation.

[0114] The corrected future prediction dataset $G_{future}$ is obtained.

[0115] Figure 9 compares the original model projections with the corrected projections for the 2025-2054 mean. The original model underestimates irradiance in the region (possibly due to simulation biases in aerosols or cloud cover). After quantile mapping correction, the projected radiation values are closer to the currently observed climatological distribution. More importantly, the correction preserves the relative future changes predicted by the model (such as possible dimming or brightening in some areas) and calibrates only the baseline of the absolute values, providing a more reliable basis for assessing future changes in photovoltaic power generation potential.

[0116] Based on the above method, this embodiment also developed an "Integrated Solar Irradiance Correction and Prediction System" (SI-CPSV1.0). As shown in Figure 10, the system adopts a B/S (browser/server) architecture; the backend uses the Python Django framework, and the core algorithms are implemented with the NumPy, Pandas, Scikit-learn, XGBoost, and xarray libraries. The system includes the following layers. User interaction layer: provides a web interface where users can upload data or select preset datasets and configure regions, time periods, model parameters, output formats, and so on through forms.

[0117] Process scheduling layer: Based on Celery's task queue, it parses user-submitted jobs into a directed acyclic graph and schedules the execution of each module sequentially.
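The idea of parsing a job into a directed acyclic graph and running the modules in dependency order can be sketched with the Python standard library. This is not the Celery-based scheduler of the system. The task names echo the modules of the core computing layer, and the dependency chain between them is an assumption.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each task maps to the set of tasks it depends on.
job = {
    "feature_engineering": {"data_io"},
    "model_training": {"feature_engineering"},
    "grid_correction": {"model_training"},
    "quantile_mapping": {"grid_correction"},
}

# static_order() yields every task after all of its dependencies.
execution_order = list(TopologicalSorter(job).static_order())
```

A real scheduler would dispatch each ready task to a worker queue rather than compute the whole order up front, but the ordering constraint is the same.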

[0118] Core computing layer: Includes independent microservices such as data I / O, feature engineering, model training, grid correction, and quantile mapping, each of which can be horizontally scaled.

[0119] Data resource layer: Manages ground station databases, satellite data caches, climate model data warehouses, and generated product libraries.

[0120] Monitoring and logging layer: Records detailed running status, resource consumption and intermediate results of each job, which facilitates troubleshooting and process reproduction.

[0121] After a user submits a combined task of "historical correction + future correction" through a browser, the system will automatically execute the entire process in the background. Once completed, the system will notify the user via email and provide a results visualization page and a data download link.

[0122] Those skilled in the art will understand that embodiments of this disclosure can be provided as methods, systems, or computer program products. Therefore, this disclosure can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this disclosure can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0123] This disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. The computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0124] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0125] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this disclosure and not to limit its protection scope. Although this disclosure has been described in detail with reference to the above embodiments, those skilled in the art should understand that after reading this disclosure, they can still make various changes, modifications or equivalent substitutions to the specific implementation of the invention, but these changes, modifications or equivalent substitutions are all within the protection scope of the published pending claims.

Claims

1. A method for ground verification and future prediction correction of solar irradiance satellite data, characterized in that the method comprises the following two-stage sequential process: The first stage uses a physics-guided machine learning hybrid correction method, based on historical data from a limited number of ground meteorological stations, to process the satellite historical grid dataset, including the following steps: obtaining paired time series of ground truth data and satellite grid data; constructing a comprehensive feature vector enhanced with physical information, including solar geometric features, temporal features, satellite-to-ground ratio features, moving average features, lag features, and periodic features; performing interpretable machine learning model training and feature importance analysis based on the comprehensive feature vector; and applying the trained model to the global grid cells to generate a corrected historical benchmark dataset. The second stage performs quantile mapping bias correction on future climate prediction data based on the historical benchmark dataset, including the following steps: matching the historical benchmark dataset with the historical simulation data of a climate model to construct a quantile mapping function; and applying the mapping function to correct the future prediction data, obtaining a bias-corrected future prediction dataset.

2. The method for ground verification and future prediction correction of solar irradiance satellite data according to claim 1, characterized in that the comprehensive feature vector enhanced with physical information includes the cosine of the solar zenith angle, $\cos(\theta_{SZA})$, which is calculated using the astronomical formula: $$\cos(\theta_{SZA}) = \sin\varphi \sin\delta + \cos\varphi \cos\delta \cos\omega$$ where $\varphi$ is the latitude of the station, $\delta$ is the solar declination, and $\omega$ is the solar hour angle.

3. The method for ground verification and future prediction correction of solar irradiance satellite data according to claim 1, characterized in that the satellite-to-ground ratio features include an instantaneous ratio $R_t$ and a long-term average ratio. Both are defined in terms of the solar irradiance value retrieved from the satellite at time $t$, the solar irradiance value measured by the ground weather station at time $t$, the total length $T$ of the time series, and a very small constant.

4. The method for ground verification and future prediction correction of solar irradiance satellite data according to claim 1, characterized in that the moving average features include the moving average of order $\alpha$ at time $t$, computed from the satellite irradiance values at times $(t-i)$, where $\alpha$ is the order of the moving average.

5. The method for ground verification and future prediction correction of solar irradiance satellite data according to claim 1, characterized in that the periodic features are obtained through cyclic encoding, including hourly cyclic encoding and monthly cyclic encoding. The hourly cyclic encoding is: $$H_{\sin} = \sin\left(\frac{2\pi H}{24}\right), \quad H_{\cos} = \cos\left(\frac{2\pi H}{24}\right)$$ where $H_{\sin}$ is the sine encoding of the hour, $H_{\cos}$ is the cosine encoding of the hour, and $H$ is the hour of time $t$. The monthly cyclic encoding is: $$M_{\sin} = \sin\left(\frac{2\pi M}{12}\right), \quad M_{\cos} = \cos\left(\frac{2\pi M}{12}\right)$$ where $M_{\sin}$ is the sine encoding of the month, $M_{\cos}$ is the cosine encoding of the month, and $M$ is the month of time $t$.

6. The method for ground verification and future prediction correction of solar irradiance satellite data according to claim 1, characterized in that, The interpretable machine learning model is a gradient boosting decision tree model. During training, a time series cross-validation strategy is used, and feature importance analysis tools are used to quantify the contribution weights of solar geometric features, ratio features, and periodic features to the correction results.

7. The method for ground verification and future prediction correction of solar irradiance satellite data according to claim 1, characterized in that the construction of the quantile mapping function includes: for each grid point and each hour, calculating the empirical cumulative distribution functions of the historical benchmark dataset and of the historical simulation data of the model; and, on the preset quantile set $q = [q_1, q_2, \ldots, q_n]$, establishing the mapping relationship through piecewise linear or monotonic spline interpolation.

8. The method for ground verification and future prediction correction of solar irradiance satellite data according to claim 1, characterized in that applying the mapping function to correct the future prediction data includes: $$\hat{x}_{fut} = F_{i,j,h}(x_{fut})$$ where $\hat{x}_{fut}$ is the corrected future irradiance prediction, $F_{i,j,h}$ is the mapping function, and $x_{fut}$ is the original simulated value of the data point.

9. A computer device, characterized in that: It includes a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes a method for ground verification and future prediction correction of solar irradiance satellite data according to any one of claims 1-8.

10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of a method for ground verification and future prediction correction of solar irradiance satellite data as described in any one of claims 1-8.