Cross-site photovoltaic power prediction model optimization method and system based on knowledge and data double-driven AI

By combining physical mechanisms and cross-site historical optimization data with a dual-drive AI approach, the learning rate and hyperparameters are dynamically optimized, solving the problems of training instability and low efficiency in cross-site photovoltaic power generation prediction, and achieving more efficient model transfer and optimization.

CN121920863B · Active · Publication Date: 2026-06-19 · UESTC (SHENZHEN) ADVANCED RES INST +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: UESTC (SHENZHEN) ADVANCED RES INST
Filing Date: 2026-03-25
Publication Date: 2026-06-19

Application Information


IPC: G06Q10/0637; H02J3/00; H02J3/38; G06Q50/06; G06N7/01; H02J103/50; H02J103/30; H02J101/24

AI Tagging

Application Domain

Mathematical models; Data processing applications

Technical Efficacy Phrases

Improve training efficiency; Improve predictive performance


Abstract

This invention discloses a method and system for optimizing a cross-site photovoltaic power generation prediction model based on knowledge- and data-driven AI. The method includes: determining a physical and data-fused photovoltaic power prediction model based on source site data and target site data; performing hyperparameter experiments on at least one source site using the source site data and the photovoltaic power prediction model to obtain historical optimization process data; extracting meta-features from the target site data and retrieving similar source sites from the historical optimization process data based on these meta-features; predicting the optimal peak learning rate and stable critical learning rate for the target site based on the learning rate process knowledge extracted from the initial hyperparameter combination set; training and dynamically optimizing the learning rate on the target site using the optimal peak learning rate and stable critical learning rate; and employing Bayesian optimization search based on the optimized learning rate, cost-sensitive objective function, and target site data. This invention improves cross-site training efficiency and prediction performance.


Description

Technical Field

[0001] This invention relates to the field of photovoltaic power generation prediction and intelligent optimization technology, and in particular to a method and system for optimizing cross-site photovoltaic power generation prediction models based on knowledge and data-driven AI.

Background Technology

[0002] Photovoltaic power generation is significantly affected by irradiance, temperature, weather changes, and equipment differences. Differences exist between different sites due to climate zones, geographical conditions, module aging, and shading, leading to slow convergence, decreased accuracy, and unstable training when the same model is transferred across sites. Existing deep time-series models (such as GRU) are highly sensitive to hyperparameters such as learning rate, network size, and input window length. When the sample size at the target site is limited, tuning hyperparameters from scratch using grid search or random search typically requires numerous experiments, resulting in high training time costs. Furthermore, the existing source sites have generated a large amount of historical "hyperparameter-performance-time" data and training process trajectories during hyperparameter tuning, but existing methods struggle to systematically transfer this process knowledge to new sites, leading to a waste of optimization experience.

[0003] To address the aforementioned issues, there is an urgent need for a hyperparameter transfer optimization scheme that can integrate prior physical mechanisms with historical optimization process data from different sites, thereby improving cross-site training efficiency and prediction performance.

Summary of the Invention

[0004] The purpose of this invention is to provide a method and system for optimizing cross-site photovoltaic power generation prediction models based on knowledge and data-driven AI, so as to solve the above-mentioned technical problems.

[0005] The preferred technical solutions among the many technical solutions provided by this invention can produce a variety of technical effects, which are described in detail below.

[0006] To achieve the above objectives, the present invention provides the following technical solution:

[0007] This invention provides a method for optimizing a cross-site photovoltaic power generation prediction model based on knowledge and data-driven AI, comprising the following steps:

[0008] Based on source site data and target site data, a photovoltaic power prediction model integrating physical and data analysis is determined.

[0009] Based on the source site data and the photovoltaic power prediction model, perform hyperparameter experiments at at least one source site to obtain historical optimization process data;

[0010] Extract the meta-features of the target site data, and retrieve similar source sites from the historical optimization process data based on the meta-features to generate an initial hyperparameter combination set for the target site;

[0011] Learning rate process knowledge is extracted from the historical optimization process data corresponding to the initial hyperparameter combination set. The optimal peak learning rate and stable critical learning rate of the target site are predicted based on the learning rate process knowledge. The learning rate is then dynamically optimized at the target site based on the optimal peak learning rate and the stable critical learning rate.

[0012] Based on the optimized learning rate, cost-sensitive objective function, and target site data, a Bayesian optimization search is used to obtain the optimal hyperparameters and corresponding photovoltaic power prediction model for the target site.

[0013] In one or more embodiments, the physical and data fusion photovoltaic power prediction model includes a physical prediction model and a data correction term corresponding to the physical prediction model;

[0014] The physical prediction model is determined based on irradiance, component temperature, equivalent area, efficiency parameters, and temperature coefficient. The data correction term is determined by the magnitude of the correction term output by the GRU model, and the magnitude of the correction term is constrained by the magnitude control parameter.

[0015] In one or more embodiments, the historical optimization process data includes hyperparameters, validation evaluation metrics, and training time cost, wherein the hyperparameters include learning rate-related fields, and the learning rate-related fields include the peak learning rate.

[0016] In one or more embodiments, generating an initial set of hyperparameter combinations for the target site includes:

[0017] The hyperparameters corresponding to the top K hyperparameter experiments with the best verification evaluation index are selected from the historical optimization process data corresponding to the similar source sites, and the selected hyperparameters are summarized as the initial hyperparameter combination set of the target site.
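The top-K selection described in this paragraph can be sketched as follows. This is a minimal illustration and not the patented implementation: the record layout (plain dictionaries), the function name `initial_hyperparameter_set`, the rule that only successful trials are eligible, and the convention that a lower validation metric is better are all assumptions made for the example.

```python
def initial_hyperparameter_set(records, similar_sites, k):
    """Return the hyperparameters of the K best trials from the similar source sites.

    Assumptions (not from the source): records are plain dicts, a lower
    val_metric is better (e.g. nRMSE), and only successful trials are eligible.
    Trials are pooled across all similar sites before ranking.
    """
    pool = [
        r for r in records
        if r["site_id"] in similar_sites and r["status"] == "success"
    ]
    pool.sort(key=lambda r: r["val_metric"])
    return [r["hyperparams"] for r in pool[:k]]


# Hypothetical trial records for three source sites.
records = [
    {"site_id": "A", "status": "success", "val_metric": 0.12, "hyperparams": {"peak_lr": 1e-3}},
    {"site_id": "A", "status": "diverged", "val_metric": 9.99, "hyperparams": {"peak_lr": 3e-2}},
    {"site_id": "B", "status": "success", "val_metric": 0.09, "hyperparams": {"peak_lr": 3e-3}},
    {"site_id": "C", "status": "success", "val_metric": 0.05, "hyperparams": {"peak_lr": 5e-3}},
]
init_set = initial_hyperparameter_set(records, similar_sites={"A", "B"}, k=2)
```

Site C has the best metric but is excluded because it is not among the similar sites, which is the point of retrieving by site similarity first.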

[0018] In one or more embodiments, extracting learning rate process knowledge from historical optimization process data corresponding to the initial hyperparameter combination set includes:

[0019] Extract the learning rate-related fields, training state labels, and validation evaluation metrics from the historical optimization process data corresponding to the initial hyperparameter combination set. Based on the training state labels and validation evaluation metrics, divide the hyperparameter experiments into a stable experiment set and an unstable experiment set.

[0020] For the source site corresponding to the initial set of hyperparameter combinations, the experiment with the optimal validation evaluation index is selected from the hyperparameter experiments whose training state label is successful, and the peak learning rate corresponding to the experiment is determined as its optimal peak learning rate; for the hyperparameter experiments whose state label is divergent or invalid, the corresponding peak learning rate is extracted, and the upper bound of its stable critical learning rate is estimated based on the peak learning rate using the minimum value or a preset quantile; the maximum peak learning rate that can maintain stable convergence is extracted from the stable experiment set as the lower bound of its stable critical learning rate.

[0021] In one or more embodiments, when neither the stable experiment set nor the unstable experiment set of the source site corresponding to the initial hyperparameter combination set is empty, the stable critical learning rate of that source site is determined based on the maximum peak learning rate in the stable experiment set and the minimum peak learning rate in the unstable experiment set.
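The extraction of learning rate process knowledge for one source site can be sketched as follows. This is a minimal illustration under stated assumptions: stability is decided from the status label alone, the upper bound uses the minimum value (not a quantile), and the geometric mean used to combine the two bounds is an assumption, since the text only says the critical rate is determined "based on" them. The function name `lr_knowledge` and the dictionary keys are hypothetical.

```python
import math


def lr_knowledge(trials):
    """Summarize learning-rate knowledge for one source site.

    trials: list of dicts with keys 'peak_lr', 'status', 'val_metric'.
    Returns the optimal peak LR, the lower and upper bounds of the stable
    critical LR, and a point estimate of the critical LR.
    """
    stable = [t for t in trials if t["status"] == "success"]
    unstable = [t for t in trials if t["status"] in ("diverged", "nan")]

    # Best successful trial (lower metric is better) gives the optimal peak LR.
    best_peak = min(stable, key=lambda t: t["val_metric"])["peak_lr"] if stable else None
    # Largest peak LR that still converged: lower bound of the critical LR.
    lower = max(t["peak_lr"] for t in stable) if stable else None
    # Smallest peak LR that failed: upper bound of the critical LR.
    upper = min(t["peak_lr"] for t in unstable) if unstable else None

    if lower is not None and upper is not None:
        critical = math.sqrt(lower * upper)  # assumption: geometric mean of the bounds
    else:
        critical = lower if lower is not None else upper
    return {"best_peak": best_peak, "lower": lower, "upper": upper, "critical": critical}


site_trials = [
    {"peak_lr": 1e-3, "status": "success", "val_metric": 0.10},
    {"peak_lr": 3e-3, "status": "success", "val_metric": 0.08},
    {"peak_lr": 1e-2, "status": "diverged", "val_metric": float("inf")},
    {"peak_lr": 3e-2, "status": "nan", "val_metric": float("inf")},
]
knowledge = lr_knowledge(site_trials)
```

A geometric mean is a natural choice here only because learning rates are usually explored on a logarithmic scale; any other rule that stays between the two bounds would fit the description equally well.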

[0022] In one or more embodiments, the steps of predicting the optimal peak learning rate and stable critical learning rate of the target site based on the knowledge of the learning rate process, and training a dynamically optimized learning rate on the target site based on the optimal peak learning rate and the stable critical learning rate include:

[0023] Predict the optimal peak learning rate and stable critical learning rate of the target site based on the target site's meta-features and model size characteristics.

[0024] Based on the similarity weights between the similar source sites and the target site, the optimal peak learning rate and the stable critical learning rate of the learning rate process knowledge of the similar source sites are weighted to obtain the prior optimal peak learning rate and the prior stable critical learning rate.

[0025] The predicted optimal peak learning rate is fused with the prior optimal peak learning rate, and the predicted stable critical learning rate is fused with the prior stable critical learning rate. Dynamic learning rate training is then performed with the fused values.
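The weighting and fusion steps can be sketched as follows. The text does not give the fusion formula, so this sketch assumes a similarity-weighted arithmetic mean for the prior and a convex blend with a hypothetical mixing factor `alpha` for the fusion; the function names are illustrative only.

```python
import numpy as np


def similarity_weighted_prior(values, similarities):
    """Similarity-weighted average of per-site learning rates (weights normalized to sum to 1)."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(similarities, dtype=float)
    w = w / w.sum()
    return float(np.sum(w * v))


def fuse(predicted, prior, alpha=0.5):
    """Blend a model prediction with the prior; alpha is an assumed mixing factor in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * predicted + (1.0 - alpha) * prior


# Two similar source sites with equal similarity to the target site.
prior_peak = similarity_weighted_prior([1e-3, 3e-3], similarities=[1.0, 1.0])
fused_peak = fuse(predicted=4e-3, prior=prior_peak, alpha=0.5)
```

The same two functions would be applied a second time to the stable critical learning rates, giving one fused value for each quantity.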

[0026] In one or more embodiments, the cost-sensitive objective function includes the training time cost, which is either the time spent under a fixed training budget or the standardized time obtained by multiplying the average time spent per round by a preset reference number of rounds.

[0027] In one or more embodiments, the Bayesian optimization search employs a multi-fidelity strategy, the steps of which include:

[0028] First, candidate hyperparameters are selected using a first training budget. Then, the selected candidate hyperparameters are trained using a second training budget and evaluated to obtain the final cost-sensitive objective function value and training time cost. The first training budget is less than the second training budget.
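The two-budget procedure can be sketched as follows. This is a minimal illustration of the screening logic only: the Bayesian proposal of candidates is not shown, `evaluate` is a hypothetical callable that trains with a given budget and returns the objective value and the time cost, and `toy_evaluate` is a stand-in objective invented for the example.

```python
import math


def two_stage_search(candidates, evaluate, low_budget, high_budget, keep):
    """Screen candidates with a small budget, then re-train the survivors with a larger one.

    evaluate(hparams, budget) -> (objective, time_cost); lower objective is better.
    Returns (hparams, objective, time_cost) of the best survivor at the high budget.
    """
    if not low_budget < high_budget:
        raise ValueError("the first budget must be smaller than the second")
    # Stage 1: cheap screening with the first (small) training budget.
    screened = sorted(candidates, key=lambda h: evaluate(h, low_budget)[0])[:keep]
    # Stage 2: full evaluation of the survivors with the second (larger) budget.
    finals = [(h,) + tuple(evaluate(h, high_budget)) for h in screened]
    return min(finals, key=lambda item: item[1])


def toy_evaluate(hparams, budget):
    """Hypothetical objective: best near peak_lr = 10**-2.5, improving with budget."""
    objective = (math.log10(hparams["peak_lr"]) + 2.5) ** 2 + 1.0 / budget
    time_cost = 0.1 * budget
    return objective, time_cost


candidates = [{"peak_lr": 1e-4}, {"peak_lr": 3e-3}, {"peak_lr": 1e-1}]
best_hparams, best_objective, best_cost = two_stage_search(
    candidates, toy_evaluate, low_budget=5, high_budget=50, keep=2
)
```

Most of the compute is spent only on candidates that already looked promising at the small budget, which is the saving a multi-fidelity strategy aims for.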

[0029] According to another aspect of the present invention, a cross-site photovoltaic power generation prediction model optimization system based on knowledge and data dual-driven AI is also provided, which is used to implement the cross-site photovoltaic power generation prediction model optimization method based on knowledge and data dual-driven AI described above, including: a data preprocessing module, a physical and data fusion modeling module, a cross-site historical optimization library module, a site similarity retrieval and warm-start module, a learning rate process knowledge extraction and prediction module, a cost-sensitive Bayesian optimization module, and an output module;

[0030] The data preprocessing module, the physical and data fusion modeling module, the cross-site historical optimization library module, the learning rate process knowledge extraction and prediction module, the cost-sensitive Bayesian optimization module, and the output module are connected in sequence. The data preprocessing module, the site similarity retrieval and warm-start module, and the cost-sensitive Bayesian optimization module are connected in sequence. The cross-site historical optimization library module is connected to the site similarity retrieval and warm-start module.

[0031] Implementing one of the above-described technical solutions of the present invention has the following advantages or beneficial effects:

[0032] This invention utilizes source site data to accumulate cross-site transferable hyperparameter optimization knowledge, training stability knowledge, and time cost knowledge. It then combines this knowledge with target site data to complete site similarity calibration, transfer constraints, and local fine-tuning. This avoids the negative transfer risk caused by direct transfer and improves cross-site training efficiency and prediction performance.

Attached Figure Description

[0033] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. In the drawings:

[0034] Figure 1 is a flowchart of a cross-site photovoltaic power generation prediction model optimization method based on knowledge and data-driven AI according to an embodiment of the present invention;

[0035] Figure 2 is a schematic diagram of the framework of a cross-site photovoltaic power generation prediction model optimization system based on knowledge and data-driven AI according to an embodiment of the present invention.

Detailed Implementation

[0036] To make the objectives, technical solutions, and advantages of the present invention clearer, various exemplary embodiments described below will be referenced to the accompanying drawings, which form part of the exemplary embodiments, illustrating various exemplary embodiments that may be used to implement the present invention. Unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure. It should be understood that they are merely examples of processes, methods, and apparatuses consistent with some aspects of the present invention disclosed as detailed in the appended claims, and other embodiments may be used, or structural and functional modifications may be made to the embodiments listed herein without departing from the scope and spirit of the present invention.

[0037] In the description of this invention, it should be understood that the terms "center," "longitudinal," "lateral," etc., indicate the orientation or positional relationship based on the accompanying drawings, and are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the referred element must have a specific orientation, or be constructed and operated in a specific orientation. The terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. The term "a plurality" means two or more. The terms "connected" and "linked" should be interpreted broadly, for example, they can refer to fixed connections, detachable connections, integral connections, mechanical connections, electrical connections, communication connections, direct connections, indirect connections through an intermediate medium, and can refer to the internal communication of two elements or the interaction relationship between two elements. The term "and / or" includes any and all combinations of one or more of the related listed items. Those skilled in the art can understand the specific meaning of the above terms in this invention according to the specific circumstances.

[0038] To illustrate the technical solution described in this invention, specific embodiments are described below, showing only the parts related to the embodiments of this invention.

[0039] Embodiment 1: As shown in Figure 1, this invention provides a method for optimizing a cross-site photovoltaic power generation prediction model based on knowledge and data-driven AI, comprising the following steps:

[0040] S100. Based on the source site data and the target site data, determine the photovoltaic power prediction model that integrates physical and data analysis.

[0041] Prior to this step, data acquisition and preprocessing are also included.

[0042] Data acquisition specifically includes collecting photovoltaic time-series data from both the source and target sites, including at least power, irradiance, and module temperature (or its estimated value); optional variables include ambient temperature, wind speed, humidity, time encoding (hour / day sequence), solar position features, etc.

[0043] After preprocessing the collected data, the corresponding source site data and target site data are obtained. The data preprocessing includes:

[0044] Time alignment: data from multiple sources are resampled to a uniform sampling interval and their timestamps are aligned.

[0045] Missing-value handling: missing segments are imputed or removed according to preset rules. Preferably, short missing segments are linearly interpolated, while sample windows that contain long missing segments are removed.

[0046] Outlier handling: abnormal values are set to zero, and values exceeding the physical limit are clipped or discarded.

[0047] Sample construction: set the sliding window length (corresponding to the hyperparameter seq_len), then construct the input sequences and their prediction labels (single-step or multi-step). The window length preferably ranges from 6 to 288, with the specific value determined by the sampling interval and the rate of meteorological change.

[0048] Data partitioning: Divide the training and validation sets chronologically to avoid information leakage. Rolling validation can be used to cover seasonal variations.
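The sample construction and chronological partitioning steps can be sketched as follows. This is a minimal illustration: the function names, the `horizon` parameter, and the 80/20 split ratio are assumptions made for the example, and only `seq_len` comes from the text.

```python
import numpy as np


def make_windows(features, power, seq_len, horizon=1):
    """Build (X, y) pairs: X holds seq_len consecutive rows, y is the power `horizon` steps later."""
    features = np.asarray(features, dtype=float)
    power = np.asarray(power, dtype=float)
    n = len(power) - seq_len - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window")
    X = np.stack([features[i:i + seq_len] for i in range(n)])
    first_label = seq_len + horizon - 1
    y = power[first_label:first_label + n]
    return X, y


def chronological_split(X, y, train_frac=0.8):
    """Split without shuffling so that validation samples are strictly later in time."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


# Ten time steps with two features each; the label is the power one step after the window.
features = np.arange(20, dtype=float).reshape(10, 2)
power = np.arange(10, dtype=float)
X, y = make_windows(features, power, seq_len=3)
(train_X, train_y), (val_X, val_y) = chronological_split(X, y)
```

Keeping the time order in the split means no validation label precedes a training label, which is the leakage the paragraph warns about. Rolling validation would repeat the same cut at several points in time.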

[0049] In one or more embodiments, the physics-data fusion photovoltaic power prediction model includes a physical prediction model and corresponding data correction terms. Specifically, the physical prediction model is determined based on irradiance, module temperature, equivalent area, efficiency parameters, and temperature coefficient, while the data correction terms are determined using the output of the GRU model.

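[0050] In a specific implementation example, the expression for the photovoltaic power prediction model that integrates physics and data is as follows:

[0051] $$\hat{P}_t = P_{\mathrm{phys},t}\,\exp(r_t), \qquad P_{\mathrm{phys},t} = A\,\eta\,G_t\,\bigl[1 + \gamma\,(T_t - T_{\mathrm{ref}})\bigr] \tag{1}$$

[0053] where $G_t$ is the irradiance, $T_t$ is the module temperature, $A$ is the equivalent area or capacity factor, $\eta$ is the efficiency parameter, $\gamma$ is the temperature coefficient, and $r_t$ is the correction term output by the GRU (Gated Recurrent Unit) model. The reference temperature $T_{\mathrm{ref}}$ is not named in these definitions and is assumed here; 25 °C is the usual standard-test-condition value.

[0054] In the above formula, $P_{\mathrm{phys},t}$ is the physical prediction model, and $\exp(r_t)$ is the data correction term determined by the magnitude of the correction term $r_t$ output by the GRU model, also known as the log-ratio supervision term.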

[0055] Furthermore, the GRU correction term is obtained through supervised learning of the log-ratio:

[0056] $$r_t = \ln\frac{P_t + \varepsilon}{P_{\mathrm{phys},t} + \varepsilon} \tag{2}$$

[0057] where $r_t$ is the correction term output by the GRU at time $t$; $P_t$ is the actual photovoltaic power at time $t$; $P_{\mathrm{phys},t}$ is the physical baseline power calculated from the irradiance at time $t$, module temperature, area / capacity factor, efficiency parameter, and temperature coefficient; and $\varepsilon$ is a numerical stability constant, a positive constant used to prevent numerical instability caused by an excessively small denominator.

[0058] In a specific embodiment, $\varepsilon$ can be a multiple of the rated power $P_{\mathrm{rated}}$ or another positive constant.

[0059] Furthermore, the magnitude of the correction term can be constrained to suppress output explosion caused by the exponential term. For example, if $u_t$ is the unconstrained output of the GRU network, the constrained correction term can be expressed, for instance with a scaled hyperbolic tangent, as:

[0060] $$r_t = c\,\tanh(u_t) \tag{3}$$

[0061] where $c$ is the amplitude control parameter, which can be 2 to 6, so that $|r_t| \le c$.
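The physics baseline, the log-ratio training target, and the bounded correction can be sketched together as follows. This is a sketch under stated assumptions and not the patent's exact formulas: the linear temperature correction around an assumed reference temperature `T_REF`, the multiplicative form `P_phys * exp(r)`, and the `tanh` bound are all choices made for the illustration, and the function names are hypothetical.

```python
import numpy as np

T_REF = 25.0  # assumed reference temperature in deg C (not stated in the source)


def physical_power(G, T, A, eta, gamma):
    """Physics baseline: area/capacity factor * efficiency * irradiance, temperature corrected."""
    p = A * eta * G * (1.0 + gamma * (T - T_REF))
    return np.maximum(p, 0.0)  # power cannot be negative


def log_ratio_target(P_true, P_phys, eps):
    """Supervision target for the GRU: log of measured power over baseline power."""
    return np.log((P_true + eps) / (P_phys + eps))


def constrain(u, c):
    """Bound the raw GRU output u to [-c, c] (assumed tanh form)."""
    return c * np.tanh(u)


def fused_prediction(P_phys, r):
    """Final prediction: physics baseline scaled by the exponential correction."""
    return P_phys * np.exp(r)


# One sample: 800 W/m^2, 45 deg C module temperature, hypothetical site parameters.
p_phys = physical_power(G=800.0, T=45.0, A=10.0, eta=0.2, gamma=-0.004)
r_target = log_ratio_target(P_true=1400.0, P_phys=p_phys, eps=1.0)
p_hat = fused_prediction(p_phys, constrain(r_target, c=4.0))
```

With `c` between 2 and 6 the multiplicative factor `exp(r)` stays within roughly `exp(-6)` to `exp(6)`, so a badly behaved network output cannot blow up the predicted power. Because of `eps`, applying the correction to the baseline recovers the measured power only approximately.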

[0062] To put it simply, the aforementioned data correction terms can shift the focus of model learning from "absolute power magnitude" to "relative physical bias," transforming the absolute prediction task, which is inherently difficult to reuse across sites, into a relative correction task that is more suitable for cross-site sharing, thereby significantly enhancing cross-site transferability.

[0063] Furthermore, by binding the prediction target to the physical baseline, the log-ratio supervision term frees the model from relearning the entire power generation law from a limited sample; the model only needs to learn the relative biases not covered by the physical model. Therefore, when data at the target site are insufficient, modeling is easier to complete using existing physical priors: even with limited target-site samples, bias modeling is more stable, transferable, and physically interpretable.

[0064] It should be noted that this embodiment uses source and target site data, not only to supplement the insufficient target site samples, but more importantly, to use the source site data to accumulate cross-site transferable hyperparameter optimization knowledge, training stability knowledge, and time cost knowledge. Then, combined with the target site data, site similarity calibration, migration constraints, and local fine-tuning are completed, thereby avoiding the negative migration risk caused by direct migration and realizing cross-site modeling that is "reusable in experience, correctable in differences, and feasible in optimization".

[0065] S200. Based on the source site data and the photovoltaic power prediction model, perform hyperparameter tests at at least one source site to obtain historical optimization process data.

[0066] In a specific embodiment, the hyperparameter testing steps include:

[0067] Step 1: Obtain photovoltaic time-series data from at least one source site and preprocess the photovoltaic time-series data.

[0068] In a specific embodiment, the photovoltaic time-series data includes at least photovoltaic power, irradiance, and module temperature (or its estimated value); optionally, it may also include ambient temperature, wind speed, humidity, time encoding, and solar position features.

[0069] Furthermore, the preprocessing includes: time alignment of multi-source data at a uniform sampling interval; imputation or removal of missing values; clipping, zeroing, or removal of outliers; and construction of input sequence samples and prediction labels according to the sliding window length, whose corresponding hyperparameter is seq_len. The samples are then divided into training and validation sets in chronological order to avoid information leakage.

[0070] Step 2: Construct the above-mentioned physical and data fusion photovoltaic power prediction model on the obtained source site.

[0071] Step 3: Perform multiple hyperparameter experiments on the acquired source site. Each experiment corresponds to one set of hyperparameters.

[0072] For example, let $\theta_i$ denote the hyperparameter configuration of the $i$-th experiment. The configuration includes: number of GRU layers, hidden dimension, dropout (the random deactivation ratio, which temporarily disables some neuron connections with a set probability during training to improve generalization and suppress overfitting), window length, batch size, weight decay, gradient clipping threshold, and learning rate scheduler parameters. The learning rate scheduler parameters may include the scheduler's peak learning rate and at least one of the following: warm-up ratio, minimum learning rate ratio, and rise ratio. The configuration $\theta_i$ is loaded into the photovoltaic power prediction model, which is trained on the training set of the corresponding source site and evaluated on the validation set.

[0073] Preferably, to ensure comparability between different experiments, each group of experiments is run under a fixed training budget, which is a fixed number of epochs (training cycle units) or a fixed number of steps (steps or times).

[0074] For the $i$-th experiment, record the hyperparameter configuration $\theta_i$, the corresponding validation evaluation metric $m_i$, the training time cost $\tau_i$, and the training status label $s_i$ to form a single trial record. The validation evaluation metrics are preferably nRMSE (normalized root mean square error) and / or normalized MAE (normalized mean absolute error); the training time cost is the actual time consumed under a fixed training budget, or the standardized time obtained by multiplying the average time consumed per round by the reference number of rounds; the training status labels include success, diverged, and invalid value (nan). Furthermore, in some embodiments, the trial status can be labeled based on whether invalid values appear during training, whether training diverges, whether the training loss fails to decrease within a preset number of rounds while its oscillation amplitude exceeds a threshold, or whether the validation evaluation metrics continuously deteriorate beyond a threshold.

[0075] Step 4: Summarize the multiple test records obtained from the source site to form a cross-site historical optimization library, which serves as historical optimization process data.

[0076] Historical optimization process data can be represented as a set of multiple records, each record containing at least $(\theta_i, m_i, \tau_i, s_i)$, where $\theta_i$ denotes the hyperparameter configuration of the $i$-th experiment, $m_i$ the validation evaluation metric corresponding to that configuration, $\tau_i$ the corresponding training time cost, $s_i$ the training status label of the $i$-th experiment, and $i$ the trial index.
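A trial record and a simple status-labeling rule can be sketched as follows. This is a minimal illustration: the `TrialRecord` fields mirror the four items listed in the text plus hypothetical site and trial identifiers, and the divergence rule (final loss larger than a multiple of the initial loss) is an assumed stand-in for the criteria the text describes.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrialRecord:
    site_id: str                   # hypothetical identifier of the source site
    trial_id: int                  # trial index i
    hyperparams: Dict[str, float]  # hyperparameter configuration
    val_metric: float              # validation metric, e.g. nRMSE
    time_cost: float               # training time cost under the fixed budget
    status: str                    # "success", "diverged", or "nan"


def label_status(train_losses: List[float], divergence_factor: float = 10.0) -> str:
    """Label a finished run from its per-epoch training losses (assumed rule)."""
    if any(math.isnan(x) or math.isinf(x) for x in train_losses):
        return "nan"
    if train_losses[-1] > divergence_factor * train_losses[0]:
        return "diverged"
    return "success"


# The cross-site historical optimization library is then a list of such records.
library = [
    TrialRecord("site_A", 0, {"peak_lr": 1e-3, "seq_len": 96}, 0.11, 42.0,
                label_status([1.0, 0.5, 0.2])),
    TrialRecord("site_A", 1, {"peak_lr": 5e-2, "seq_len": 96}, float("inf"), 40.0,
                label_status([1.0, 50.0])),
]
```

Storing the status next to the metric is what later allows the peak learning rates of failed runs to be used as evidence about the stability boundary.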

[0077] In some embodiments, learning rate-related fields may be further recorded so that learning rate process knowledge can be extracted from historical optimization process data, the peak learning rate and the stable critical learning rate can be estimated, and the results can be used for the initialization of the initial hyperparameter combination of the target site, the screening of similar source sites, and subsequent Bayesian optimization.

[0078] Among the validation evaluation metrics mentioned above, nRMSE measures the overall deviation between the predicted and actual values. Normalized MAE, on the other hand, measures the mean absolute deviation of the prediction error.

[0079] It should be noted that this embodiment uses a normalized error index, the purpose of which is to make the evaluation results comparable between different sites, thereby facilitating the unified comparison of historical optimization process data across sites, the screening of similar sites, and subsequent hyperparameter migration optimization.

[0080] The above training time cost uses the time spent under a fixed training budget (fixed number of epochs or fixed number of steps) to ensure the comparability of different experiments; or uses the average time spent per round × reference round.
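The validation metrics and the standardized time cost can be sketched as follows. This is a minimal illustration: the text does not say which quantity the errors are normalized by, so the normalizer `norm` (for example rated capacity) is an assumption, as are the function names.

```python
import numpy as np


def nrmse(y_true, y_pred, norm):
    """Root mean square error divided by a normalizer such as rated capacity (assumed)."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)) / norm)


def nmae(y_true, y_pred, norm):
    """Mean absolute error divided by the same normalizer."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(err)) / norm)


def standardized_time_cost(epoch_seconds, reference_epochs):
    """Average time per round multiplied by the reference number of rounds."""
    return float(np.mean(epoch_seconds)) * reference_epochs


score_rmse = nrmse([0.0, 0.0], [3.0, 4.0], norm=5.0)
score_mae = nmae([0.0, 0.0], [3.0, 4.0], norm=5.0)
cost = standardized_time_cost([2.0, 4.0], reference_epochs=10)
```

Dividing by a site-level normalizer is what makes a metric from a small site comparable with one from a large site, and multiplying by a shared reference number of rounds does the same for runs of different length.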

[0081] S300. Extract the meta-features of the target site data, and retrieve similar source sites from the historical optimization process data based on the meta-features to generate an initial hyperparameter combination set for the target site.

[0082] In one or more embodiments, the meta-features of the target site data include at least one of the statistical distribution characteristics of irradiance, temperature and power, data quality characteristics, and site geographical or equipment characteristics. The statistical distribution characteristics include at least one of the following: mean, standard deviation, quantiles, diurnal amplitude, or seasonal intensity.

[0083] The aforementioned data quality characteristics are used to characterize "data usability differences" and avoid misjudging transferability based solely on physical distribution similarity. In this embodiment, these include data quality characteristics such as missing rate, outlier rate, effective sample size, and sampling interval.
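The statistical and data quality meta-features can be sketched as follows. This is a minimal illustration: the feature names, the choice of the 90% quantile, and the use of NaN to mark missing samples are assumptions, and diurnal amplitude, seasonal intensity, and outlier rate are omitted for brevity.

```python
import numpy as np


def extract_meta_features(irradiance, temperature, power, sampling_minutes):
    """Compute a small dictionary of site meta-features; NaN marks a missing sample."""
    feats = {}
    for name, series in (("G", irradiance), ("T", temperature), ("P", power)):
        x = np.asarray(series, dtype=float)
        valid = x[~np.isnan(x)]
        feats[name + "_mean"] = float(valid.mean())
        feats[name + "_std"] = float(valid.std())
        feats[name + "_q90"] = float(np.quantile(valid, 0.9))
    p = np.asarray(power, dtype=float)
    feats["missing_rate"] = float(np.isnan(p).mean())         # data quality feature
    feats["effective_samples"] = int((~np.isnan(p)).sum())    # data quality feature
    feats["sampling_minutes"] = float(sampling_minutes)       # data quality feature
    return feats


feats = extract_meta_features(
    irradiance=[0.0, 500.0, 1000.0, 500.0],
    temperature=[10.0, 20.0, 30.0, 20.0],
    power=[1.0, float("nan"), 3.0, 4.0],
    sampling_minutes=15,
)
```

Two sites with similar irradiance statistics but very different missing rates would end up far apart once these quality features are included, which is the misjudgment the paragraph aims to avoid.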

[0084] The aforementioned geographical or device features are used to compensate for long-term sources of difference that statistical features cannot fully represent. Differences exist between different sites due to climate zones, geographical conditions, component aging, and occlusion. These differences can lead to slow convergence, decreased accuracy, and unstable training during cross-site migration. Therefore, including geographical or device features in the meta-feature scope aims to incorporate the "causal information" of site differences into similarity judgments.

[0085] In a specific embodiment, the target site's meta-features are represented as meta-feature vectors. , will the The meta-features of each source site are represented as meta-feature vectors. Preferably, the meta-feature vector is composed of continuous meta-features and categorical meta-features. For continuous meta-features, standardization or normalization is first performed to eliminate the influence of different dimensions on similarity search results. For categorical geographical or device features, one-hot encoding, sequential encoding, or preset discrete encoding methods are used for numerical representation, thereby obtaining a comparable site meta-feature vector under a unified dimension. This processing method is consistent with the overall idea in the specification that "when there are differences in the dimensions of indicators between sites, normalized indicators are used as the screening criteria to ensure cross-site comparability."

[0086] Furthermore, the aforementioned similarity retrieval is achieved by calculating the meta-feature similarity between the target site and each source site, rather than directly calculating the similarity for each individual trial record in the historical optimization process data. That is, the similarity is first calculated at the site level, based on the target site meta-feature vector and each source site meta-feature vector, to identify similar source sites; high-performing hyperparameter trial records are then extracted from the historical optimization process data of those similar source sites to generate the initial hyperparameter combination set for the target site.

[0087] In a preferred embodiment, a weighted Euclidean distance is used to calculate the meta-feature distance $d_s$ between the target site and the $s$-th source site:

[0088] $$d_s = \sqrt{\sum_{j=1}^{J} w_j \left(\tilde{m}_{t,j} - \tilde{m}_{s,j}\right)^2} \quad (4);$$

[0089] where $\tilde{m}_{t,j}$ is the $j$-th dimensional feature of the target site after standardization or normalization, $\tilde{m}_{s,j}$ is the $j$-th dimensional feature of the $s$-th source site after standardization or normalization, $J$ is the total number of meta-feature dimensions, and $w_j$ is the weight of the $j$-th dimensional feature, satisfying $w_j \ge 0$ and $\sum_{j=1}^{J} w_j = 1$.

[0090] Preferably, group weights can be set according to the category of meta-features, so that statistical distribution features, data quality features, and geographical or device features have different contributions in similarity judgment. For example, a first group of weights can be assigned to statistical distribution features, a second group of weights to data quality features, and a third group of weights to geographical or device features, and then distributed to each meta-feature dimension within each group. This can simultaneously take into account both short-term data distribution similarity and the causes of long-term site differences.
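A short sketch of the weighted distance with group weights follows. The group names, the example group weights, and the even split of a group's weight over its dimensions are assumptions for illustration; the text only requires that the per-dimension weights be non-negative and sum to 1.

```python
import math

def expand_group_weights(group_weights, group_sizes):
    """Spread each group's weight evenly over its dimensions.

    group_sizes fixes the order of the groups inside the meta-feature vector.
    The returned per-dimension weights are non-negative and sum to 1.
    """
    total = sum(group_weights.values())
    weights = []
    for name, size in group_sizes.items():
        share = group_weights[name] / total / size
        weights.extend([share] * size)
    return weights

def weighted_distance(target, source, weights):
    """Weighted Euclidean distance between two encoded meta-feature vectors."""
    return math.sqrt(sum(w * (t - s) ** 2 for w, t, s in zip(weights, target, source)))

# Hypothetical layout: 2 statistical, 1 data-quality, 2 geographical/device dimensions.
weights = expand_group_weights(
    {"stat": 0.5, "quality": 0.3, "geo": 0.2},
    {"stat": 2, "quality": 1, "geo": 2},
)
distance = weighted_distance([0.0, 0.0], [3.0, 4.0], [0.5, 0.5])
```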

[0091] Furthermore, based on the aforementioned meta-feature distance $d_s$, the similarity $\mathrm{sim}_s$ between the target site and the $s$-th source site is calculated. Preferably, the similarity function uses a Gaussian kernel function:

[0092] $$\mathrm{sim}_s = \exp\left(-\frac{d_s^{2}}{2\sigma^{2}}\right) \quad (5);$$

[0093] where $\sigma$ is the distance scale parameter. Preferably, $\sigma$ can be determined based on the median, mean, or a preset empirical value of the distance distribution between the candidate source sites and the target site. The advantages of using a Gaussian kernel function are twofold: firstly, it smoothly maps the distance values to the (0, 1] interval, facilitating comparisons of similarity between different source sites; secondly, the calculated similarity can be directly normalized to obtain the similarity weights $\omega_s$ required in step S400.
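The Gaussian kernel mapping can be illustrated with a few lines of code. This is a sketch under assumptions: the kernel is written with a `2 * sigma ** 2` denominator, and the median of the distances is used as the default scale, which is one of the options the text lists.

```python
import math
from statistics import median

def gaussian_similarities(distances, sigma=None):
    """Map meta-feature distances to similarities in (0, 1]."""
    if sigma is None:
        sigma = median(distances)  # median heuristic for the distance scale
    if sigma <= 0:
        sigma = 1.0  # degenerate case (all distances zero): any positive scale works
    return [math.exp(-(d ** 2) / (2.0 * sigma ** 2)) for d in distances]

sims = gaussian_similarities([0.0, 1.0, 2.0])
```

A distance of zero always maps to a similarity of exactly 1, and larger distances decay smoothly toward 0, so similarities from different source sites can be compared against a single threshold.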

[0094] In a preferred embodiment, source sites that meet a preset similarity threshold are identified as similar source sites. Specifically, if the similarity $\mathrm{sim}_s$ of the $s$-th source site satisfies:

[0095] $$\mathrm{sim}_s \ge \tau \quad (6);$$

[0096] then it is identified as a similar source site, where $\tau$ is a preset similarity threshold. Preferably, $\tau$ is 0.7. That is, when the similarity between the target site and a source site reaches 0.7 or higher, the source site can be considered highly consistent with the target site at the meta-feature level and can participate in warm-start initialization as a similar source site.

[0097] Considering that the number of candidate source sites and the similarity distribution may differ among target sites, in another embodiment a "threshold filtering + top-ranked completion" approach can also be used to determine similar source sites. That is, all source sites that meet the similarity threshold are selected first. When the number of source sites meeting the threshold is less than a preset number $N_{\min}$, the source sites are sorted by similarity from high to low and supplemented so that the top $N_{\min}$ source sites are taken as similar source sites. When the number of source sites meeting the threshold exceeds a preset number $N_{\max}$, only the $N_{\max}$ source sites with the highest similarity may be retained, in order to control the complexity of subsequent retrieval and weighting.
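The selection rule above can be sketched as a single function. The parameter names `min_sites` and `max_sites` and their default values are hypothetical; only the threshold of 0.7 comes from the text.

```python
def select_similar_sites(similarities, threshold=0.7, min_sites=3, max_sites=5):
    """Threshold filtering with top-ranked completion.

    similarities: {site_name: similarity}. Returns site names, most similar first.
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    passed = [name for name, sim in ranked if sim >= threshold]
    if len(passed) < min_sites:
        # Too few sites pass the threshold: fill up with the next most similar ones.
        return [name for name, _ in ranked[:min_sites]]
    # Enough sites pass: cap the count to bound later retrieval and weighting cost.
    return passed[:max_sites]

sims = {"a": 0.9, "b": 0.75, "c": 0.6, "d": 0.3}
completed = select_similar_sites(sims, threshold=0.7, min_sites=3, max_sites=5)
capped = select_similar_sites(sims, threshold=0.2, min_sites=1, max_sites=2)
```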

[0098] Furthermore, within the determined set of similar source sites $\mathcal{S}$, the similarity of each source site is normalized to obtain the similarity weight $\omega_s$:

[0099] $$\omega_s = \frac{\mathrm{sim}_s}{\sum_{k \in \mathcal{S}} \mathrm{sim}_k} \quad (7);$$

[0100] where $\omega_s$ is the normalized similarity weight of the $s$-th similar source site, satisfying $\omega_s \ge 0$ and $\sum_{s \in \mathcal{S}} \omega_s = 1$.

[0101] On the one hand, the normalized similarity weights can be used to measure the contribution of different similar source sites to the initialization of the target site. On the other hand, they can be used directly in the subsequent step S400 for the weighted aggregation of the optimal peak learning rates and stable critical learning rates of the similar source sites, thereby forming the empirical prior of the similar source sites.

[0102] When generating the initial set of hyperparameter combinations for the target site, the historical optimization process data corresponding to the similar source sites are sorted according to the validation evaluation index, and the top K trials with the best validation evaluation index are selected. The hyperparameters corresponding to these K hyperparameter experiments are compiled to form the initial hyperparameter combination set for the target site.

[0103] Preferably, the validation evaluation index adopts a normalized error index, including nRMSE and / or normalized MAE, to ensure the comparability of evaluation results between different sites. K preferably ranges from 5 to 20. In this way, the best hyperparameter combinations already validated on similar source sites serve as the warm-start candidate set for the target site. This reduces the blind search and invalid trials caused by random initialization, narrows the hyperparameter search space for the target site, and improves the initial efficiency of cross-site migration optimization.

[0104] In a specific embodiment, a warm-start approach can be used: the similarity between the target site's meta-features and each source site's meta-features is retrieved, and the source sites whose similarity exceeds a set threshold are taken as the aforementioned similar source sites.

[0105] The aforementioned warm-start refers to initializing the hyperparameter set of the neural network from historical experience, that is, starting with prior knowledge. Instead of starting the hyperparameter search from a completely random or blank state, warm-start first uses hyperparameter combinations that performed well in historical experiments at similar sites as the initial candidate set for the target site, and then continues to optimize on this basis, reducing the blind search and invalid experiments caused by random initialization.

[0106] In one or more embodiments, generating an initial set of hyperparameter combinations for the target site includes:

[0107] The hyperparameters corresponding to the top K hyperparameter experiments with the best validation evaluation indicators are selected from the historical optimization process data corresponding to similar source sites. These selected hyperparameters are then compiled as the initial hyperparameter combination set for the target site. The specific steps include:

[0108] The distance or similarity between the extracted target site meta-features and the meta-features of each source site is calculated; the N source sites with the highest similarity are selected as similar source sites; and the hyperparameters corresponding to the top K hyperparameter experiments with the best validation evaluation index are selected from the historical optimization process data of the similar source sites as the initial hyperparameter combination set for the target site. Preferably, K is between 5 and 20.
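The top-K warm-start selection can be sketched as below. The record layout (`site`, `status`, `nrmse`, `hyperparams`) is a hypothetical schema for the historical optimization library, and restricting the ranking to trials labeled `success` is an assumption added so that diverged or NaN trials cannot enter the candidate set.

```python
def warm_start_candidates(history, similar_sites, k=5):
    """Return the hyperparameters of the K best trials from similar source sites.

    history: list of trial records (hypothetical schema, see lead-in).
    Lower nRMSE is better; only successfully converged trials are ranked.
    """
    trials = [
        trial for trial in history
        if trial["site"] in similar_sites and trial["status"] == "success"
    ]
    trials.sort(key=lambda trial: trial["nrmse"])
    return [trial["hyperparams"] for trial in trials[:k]]

history = [
    {"site": "a", "status": "success", "nrmse": 0.08, "hyperparams": {"peak_lr": 1e-3}},
    {"site": "a", "status": "diverged", "nrmse": float("nan"), "hyperparams": {"peak_lr": 1e-1}},
    {"site": "b", "status": "success", "nrmse": 0.05, "hyperparams": {"peak_lr": 3e-3}},
    {"site": "c", "status": "success", "nrmse": 0.01, "hyperparams": {"peak_lr": 5e-3}},
]
candidates = warm_start_candidates(history, {"a", "b"}, k=2)
```

Site `c` has the best trial overall but is excluded because it is not in the similar-site set, which is the point of calibrating similarity before reusing experience.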

[0109] Preferably, when there are differences in the dimensions of indicators between sites, normalized indicators are used as the screening criteria to ensure cross-site comparability.

[0110] S400. Extract learning rate process knowledge from the historical optimization process data corresponding to the initial hyperparameter combination set. Predict the optimal peak learning rate and stable critical learning rate of the target site based on the learning rate process knowledge. Train and dynamically optimize the learning rate at the target site based on the optimal peak learning rate and stable critical learning rate.

[0111] In one or more embodiments, learning rate process knowledge is extracted from the historical optimization process data corresponding to the initial hyperparameter combination set. Based on the learning rate process knowledge, the optimal peak learning rate $\eta_t^{\mathrm{peak}}$ and the stable critical learning rate $\eta_t^{\mathrm{crit}}$ of the target site are predicted, and the learning rate is dynamically optimized during the training process at the target site.

[0112] Extracting the learning rate process knowledge includes:

[0113] Step 1: Extract learning rate-related fields and training state labels from the historical optimization process data corresponding to the initial hyperparameter combination set. Based on the training state labels and validation evaluation metrics, divide the historical experiments (hyperparameter experiments) into a stable experiment set and an unstable experiment set. Unstable experiments satisfy any of the following criteria.

[0114] (1) Invalid values appeared during the experimental training process;

[0115] (2) Divergence occurred during the experimental training process;

[0116] (3) The training loss does not decrease within the preset number of rounds and the oscillation amplitude exceeds the preset threshold;

[0117] (4) The validation evaluation index continues to deteriorate within a preset number of consecutive evaluation rounds, and the degree of deterioration exceeds the preset threshold.

[0118] The preset rounds, preset evaluation rounds, and preset thresholds can be preset based on historical training logs or determined based on validation set statistics.

[0119] All remaining trials, that is, those that converged successfully and are not unstable, form the stable trial set.
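The four instability criteria can be sketched as a single predicate. The trial schema (`status`, `train_loss`, `val_metric`), the default thresholds, and the decision to check criteria (3) and (4) only on the final rounds of a trial are assumptions for illustration; the text leaves the preset rounds and thresholds to be chosen from training logs or validation statistics.

```python
import math

def is_unstable(trial, patience=5, osc_threshold=0.5,
                worsen_rounds=3, worsen_threshold=0.05):
    """Return True if a historical trial meets any of the instability criteria."""
    losses, val = trial["train_loss"], trial["val_metric"]
    # (1) invalid values (NaN or infinity) appeared during training
    if any(math.isnan(x) or math.isinf(x) for x in losses + val):
        return True
    # (2) the trial was labeled as diverged
    if trial.get("status") == "diverged":
        return True
    # (3) no new minimum in the last `patience` rounds while the loss oscillates widely
    if len(losses) > patience:
        recent, earlier = losses[-patience:], losses[:-patience]
        if min(recent) >= min(earlier) and max(recent) - min(recent) > osc_threshold:
            return True
    # (4) validation metric worsens over the last `worsen_rounds` consecutive evaluations
    if len(val) > worsen_rounds:
        tail = val[-(worsen_rounds + 1):]
        rising = all(b > a for a, b in zip(tail, tail[1:]))
        if rising and tail[-1] - tail[0] > worsen_threshold:
            return True
    return False

def split_trials(trials):
    """Partition trials into (stable, unstable) lists."""
    stable = [t for t in trials if not is_unstable(t)]
    unstable = [t for t in trials if is_unstable(t)]
    return stable, unstable
```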

[0120] The above learning rate-related fields include the peak learning rate, and may further include the initial learning rate, minimum learning rate ratio, warmup ratio, rising segment ratio, learning rate scheduler type, and optimizer type.

[0121] Step 2: Determine the optimal peak learning rate $\eta_s^{\mathrm{peak}}$ and the stable critical learning rate $\eta_s^{\mathrm{crit}}$ of each source site corresponding to the initial set of hyperparameter combinations.

[0122] For each source site corresponding to the initial set of hyperparameter combinations, the trial with the best validation metric is selected from the hyperparameter experiments with the training state label "success," and the peak learning rate of this trial is taken as the optimal peak learning rate $\eta_s^{\mathrm{peak}}$ of the corresponding source site.

[0123] Furthermore, for each source site corresponding to the initial set of hyperparameter combinations, the stable critical learning rate is estimated by combining its stable and unstable trial sets. Preferably, the peak learning rates of the trials with state labels diverged or nan are extracted, and their minimum value or a preset quantile is used to estimate the upper bound of the stable critical learning rate; the maximum peak learning rate that still maintains stable convergence is extracted from the stable trial set as the lower bound of the stable critical learning rate.

[0124] In some embodiments, when both the stable and unstable trial sets of the source site corresponding to the initial hyperparameter combination set are non-empty, the maximum value of the peak learning rate in the stable trial set is denoted as $\eta_s^{\mathrm{stab}}$, and the minimum or quantile value of the peak learning rate in the unstable trial set is denoted as $\eta_s^{\mathrm{unst}}$. Then:

[0125] (8);

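[0126] Alternatively, a preset interpolation value between the maximum peak learning rate of the stable trial set and the minimum (or quantile) peak learning rate of the unstable trial set is taken as the stable critical learning rate of the corresponding source site.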

[0127] When only an unstable set of trials exists, the minimum or quantile value of the peak learning rate of the unstable trials can be used as an upper bound estimate of the stable critical learning rate. When only a stable set of trials exists, the stable critical learning rate of the source site can be supplemented by combining the stable critical learning rate statistics of similar source sites.
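The three cases above (both sets non-empty, only unstable trials, only stable trials) can be sketched as follows. The interpolation in log space with a coefficient `alpha`, the conservative handling of overlapping bounds, and the `fallback` argument standing in for statistics from similar source sites are all assumptions for illustration, not the formula of the patent.

```python
import math

def stable_critical_lr(stable_peaks, unstable_peaks, alpha=0.5, fallback=None):
    """Estimate a source site's stable critical learning rate.

    stable_peaks / unstable_peaks: peak learning rates of stable / unstable trials.
    alpha: assumed interpolation position between the two bounds (0 = lower bound).
    fallback: value borrowed from similar source sites when no unstable trial exists.
    """
    if stable_peaks and unstable_peaks:
        lower = max(stable_peaks)    # largest learning rate that still converged
        upper = min(unstable_peaks)  # smallest learning rate that failed
        if upper <= lower:
            return upper  # noisy logs: the bounds overlap, stay on the safe side
        # Interpolate in log space, since learning rates span orders of magnitude.
        return math.exp((1.0 - alpha) * math.log(lower) + alpha * math.log(upper))
    if unstable_peaks:
        return min(unstable_peaks)   # upper-bound estimate only
    return fallback                  # only stable trials: borrow from similar sites

both = stable_critical_lr([1e-4, 1e-3], [1e-1, 1e-2])
only_unstable = stable_critical_lr([], [5e-2, 2e-2])
only_stable = stable_critical_lr([1e-3], [], fallback=4e-3)
```

With `alpha=0.5` the estimate is the geometric mean of the two bounds, which sits halfway between them on a logarithmic axis.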

[0128] Furthermore, predicting the optimal peak learning rate and the stable critical learning rate of the target site based on the learning rate process knowledge, and dynamically optimizing the learning rate during training at the target site, includes the following steps:

[0129] Predict the optimal peak learning rate and stable critical learning rate of the target site based on the target site's meta-features and model size characteristics.

[0130] Based on the similarity weights between similar source sites and target sites, the optimal peak learning rate and stable critical learning rate of the learning rate process knowledge of similar source sites are weighted to obtain the prior optimal peak learning rate and prior stable critical learning rate.

[0131] The predicted optimal peak learning rate is fused with the prior optimal peak learning rate prediction, and the predicted stable critical learning rate is fused with the prior stable critical learning rate. After fusion, dynamic learning rate training is performed separately.

[0132] In a specific embodiment, learning rate prediction samples are constructed on a per-source-site basis. For the $s$-th source site, a sample $(\mathbf{m}_s, \mathbf{z}_s, y_s^{\mathrm{peak}}, y_s^{\mathrm{crit}})$ is constructed, where $\mathbf{m}_s$ denotes the source site meta-features, which include at least one of the following: statistical distribution features of irradiance, temperature and power, data quality features, and geographical or equipment features; and $\mathbf{z}_s$ denotes the model size features, which include at least one of hidden dimension and batch size, and may further include the number of layers, parameter size, or sequence length. The prediction targets are:

[0133] $$y_s^{\mathrm{peak}} = \log \eta_s^{\mathrm{peak}},$$

[0134] $$y_s^{\mathrm{crit}} = \log \eta_s^{\mathrm{crit}} \quad (9);$$

[0135] where $\eta_s^{\mathrm{peak}}$ and $\eta_s^{\mathrm{crit}}$ are the optimal peak learning rate and the stable critical learning rate of the $s$-th source site, respectively, obtained through the learning rate process knowledge extraction described above.

[0136] Based on the constructed samples, a learning rate prediction model is trained. This model can employ Gaussian process regression, random forest regression, gradient boosting tree regression, a multilayer perceptron, or a combination thereof. Using the logarithm of the learning rate as the prediction target helps reduce the numerical instability caused by learning rates varying across orders of magnitude. After training, the model takes the target site's meta-features and the model size features as inputs and outputs the predicted logarithmic values of the optimal peak learning rate and the stable critical learning rate.

[0137] Let the meta-features of the target site be $\mathbf{m}_t$ and the model size features be $\mathbf{z}_t$. First, $\mathbf{m}_t$ and $\mathbf{z}_t$ are input into the trained learning rate prediction model to obtain the model's predicted values:

[0138] $$\hat{y}_t^{\mathrm{peak}} = f_{\mathrm{peak}}(\mathbf{m}_t, \mathbf{z}_t),$$

[0139] $$\hat{y}_t^{\mathrm{crit}} = f_{\mathrm{crit}}(\mathbf{m}_t, \mathbf{z}_t) \quad (10);$$

[0140] where $f_{\mathrm{peak}}$ denotes the optimal peak learning rate prediction model and $f_{\mathrm{crit}}$ denotes the stable critical learning rate prediction model; $\hat{y}_t^{\mathrm{peak}}$ is the logarithmic prediction of the optimal peak learning rate of the target site, output by $f_{\mathrm{peak}}$ from the target site meta-features $\mathbf{m}_t$ and model size features $\mathbf{z}_t$; and $\hat{y}_t^{\mathrm{crit}}$ is the logarithmic prediction of the stable critical learning rate of the target site, output by $f_{\mathrm{crit}}$ from the same inputs.

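[0141] Furthermore, the similarity weights $\omega_s$ between the similar source sites obtained in step S300 and the target site are used to calculate the weighted logarithmic prior of the similar source sites:

[0142] $$\bar{y}_t^{\mathrm{peak}} = \sum_{s=1}^{n} \omega_s \log \eta_s^{\mathrm{peak}},$$

[0143] $$\bar{y}_t^{\mathrm{crit}} = \sum_{s=1}^{n} \omega_s \log \eta_s^{\mathrm{crit}} \quad (11);$$

[0144] where $\omega_s$ are the normalized similarity weights, with $s$ taking values from 1 to $n$ and satisfying $\sum_{s=1}^{n} \omega_s = 1$, and $n$ is the number of similar source sites; $\bar{y}_t^{\mathrm{peak}}$ is the logarithmic prior value obtained by aggregating the optimal peak learning rates $\eta_s^{\mathrm{peak}}$ of the similar source sites, weighted by the similarity weights $\omega_s$; and $\bar{y}_t^{\mathrm{crit}}$ is the logarithmic prior value obtained by aggregating the stable critical learning rates $\eta_s^{\mathrm{crit}}$ of the similar source sites, weighted by $\omega_s$. The outputs of formula (10) are the model prediction terms, whereas $\bar{y}_t^{\mathrm{peak}}$ and $\bar{y}_t^{\mathrm{crit}}$ are the prior knowledge terms of the similar source sites.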
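The similarity-weighted logarithmic prior can be sketched in a few lines. The function names are hypothetical; the computation is simply a normalization of the similarities followed by a weighted mean of log learning rates.

```python
import math

def normalize_similarities(similarities):
    """Turn raw similarities of the similar source sites into weights summing to 1."""
    total = sum(similarities)
    return [sim / total for sim in similarities]

def weighted_log_prior(weights, learning_rates):
    """Similarity-weighted mean of log learning rates over the similar source sites."""
    return sum(w * math.log(lr) for w, lr in zip(weights, learning_rates))

weights = normalize_similarities([0.9, 0.9])
prior_log_peak = weighted_log_prior(weights, [1e-3, 1e-2])
prior_peak = math.exp(prior_log_peak)  # back on the learning-rate scale
```

Averaging in log space yields a weighted geometric mean, so one source site with a learning rate ten times larger does not dominate the prior the way it would in an arithmetic mean.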

[0145] The model predictions are fused with the weighted priors of similar source sites (prior optimal peak learning rate, prior stable critical learning rate) to obtain:

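[0146] $$\tilde{y}_t^{\mathrm{peak}} = \lambda\, \hat{y}_t^{\mathrm{peak}} + (1 - \lambda)\, \bar{y}_t^{\mathrm{peak}},$$

[0147] $$\tilde{y}_t^{\mathrm{crit}} = \lambda\, \hat{y}_t^{\mathrm{crit}} + (1 - \lambda)\, \bar{y}_t^{\mathrm{crit}} \quad (12);$$

[0148] where $\hat{y}_t^{\mathrm{peak}}$ and $\hat{y}_t^{\mathrm{crit}}$ are the logarithmic model predictions of formula (10), $\bar{y}_t^{\mathrm{peak}}$ and $\bar{y}_t^{\mathrm{crit}}$ are the weighted logarithmic priors of formula (11), and $\lambda \in [0, 1]$ is the fusion coefficient.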

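[0149] Furthermore, the optimal peak learning rate and stable critical learning rate for the target site are obtained by exponentiating the fused logarithmic values $\tilde{y}_t^{\mathrm{peak}}$ and $\tilde{y}_t^{\mathrm{crit}}$ of formula (12):

[0150] $$\eta_t^{\mathrm{peak}} = \exp\left(\tilde{y}_t^{\mathrm{peak}}\right),$$

[0151] $$\eta_t^{\mathrm{crit}} = \exp\left(\tilde{y}_t^{\mathrm{crit}}\right) \quad (13);$$

[0152] where $\eta_t^{\mathrm{peak}}$ characterizes the optimal peak learning rate of the target site in terms of prediction accuracy, and $\eta_t^{\mathrm{crit}}$ characterizes the critical learning rate boundary of the target site in terms of training stability.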

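[0153] The peak learning rate $\eta_t^{\mathrm{use}}$ used in actual training at the target site is determined in accordance with the optimal peak learning rate $\eta_t^{\mathrm{peak}}$ and the stable critical learning rate $\eta_t^{\mathrm{crit}}$ of the target site, and dynamic learning rate training is performed. To ensure training stability, $\eta_t^{\mathrm{use}}$ satisfies:

[0154] $$\eta_t^{\mathrm{use}} \le \eta_t^{\mathrm{crit}} \quad (14).$$

[0155] In some embodiments, the actual peak learning rate of the target site during training can be further determined as:

[0156] $$\eta_t^{\mathrm{use}} = \min\left(\eta_t^{\mathrm{peak}},\ \eta_t^{\mathrm{crit}}\right) \quad (15).$$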

[0157] This balances prediction accuracy and training stability.
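The fusion and safety constraint can be sketched end to end as below. This is an illustration under assumptions: the fusion is taken to be a convex combination in log space with the coefficient `lam` applied to the model prediction, and the peak learning rate actually used is taken to be the smaller of the fused optimal and critical values.

```python
import math

def fuse_log(pred_log, prior_log, lam=0.5):
    """Convex combination of a model prediction and a site prior, both in log space."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("fusion coefficient must lie in [0, 1]")
    return lam * pred_log + (1.0 - lam) * prior_log

def target_learning_rates(pred_log_peak, pred_log_crit,
                          prior_log_peak, prior_log_crit, lam=0.5):
    """Return (optimal peak, stable critical, peak actually used) for the target site."""
    peak = math.exp(fuse_log(pred_log_peak, prior_log_peak, lam))
    crit = math.exp(fuse_log(pred_log_crit, prior_log_crit, lam))
    used = min(peak, crit)  # never train above the stability boundary
    return peak, crit, used

# Hypothetical case: the accuracy-optimal rate exceeds the stability boundary.
peak, crit, used = target_learning_rates(
    math.log(1e-2), math.log(1e-3), math.log(1e-2), math.log(1e-3)
)
```

In this example the fused optimal peak is about 1e-2 but the stability boundary is about 1e-3, so training uses the boundary value.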

[0158] The AdamW optimizer is used for dynamic learning rate training. The learning rate scheduler is preferably Warmup+Cosine or OneCycle. Parameters such as warmup ratio, minimum learning rate ratio, and rising phase ratio can be selected within a preset range and can be incorporated into the subsequent fine-tuning process.

[0159] It should be noted that during target site training, a gradient norm clipping threshold for the neural network (preferably 0.5 to 5.0) is combined with the constraint on the correction term magnitude to enhance stability.

[0160] Warmup+Cosine refers to a learning rate scheduling strategy that combines warmup with cosine annealing. Warmup gradually increases the learning rate from a small value to the set peak value at the beginning of training, preventing the instability caused by an excessively large initial learning rate. Cosine annealing then gradually decreases the learning rate along a cosine function after the peak value is reached, giving smoother convergence in the later stages of training. The strategy thus balances stability in the early stages with improved convergence in the later stages.
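A Warmup+Cosine schedule can be written as a pure function of the training step. This is a minimal sketch: the linear warmup shape and the default values of `warmup_ratio` and `min_lr_ratio` are assumptions, corresponding to the warmup ratio and minimum learning rate ratio fields mentioned in the text.

```python
import math

def warmup_cosine_lr(step, total_steps, peak_lr, warmup_ratio=0.1, min_lr_ratio=0.01):
    """Learning rate at a given step (0-based) under linear warmup + cosine annealing."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr toward min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    min_lr = peak_lr * min_lr_ratio
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

schedule = [warmup_cosine_lr(step, 100, 1e-3) for step in range(100)]
```

The `peak_lr` argument is where the constrained peak learning rate of the target site would be supplied, so the schedule never exceeds the stability boundary.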

[0161] OneCycle refers to a single-cycle dynamic learning rate scheduling strategy: within one training cycle, the learning rate is first gradually increased to a peak value and then gradually decreased, balancing convergence efficiency in the early stages of training with stable convergence in the later stages.

[0162] To make this easier to understand: this embodiment does not simply let the learning rate "vary with training." Instead, it extracts the learning rate fields and training state labels from the historical optimization library of the source sites, distinguishes between success, diverged, and NaN trials, and estimates the optimal peak learning rate and stable critical learning rate of each source site. Then, based on the target site's meta-features and model size features, it predicts the learning rate parameters for the target site and applies the safety constraint of formula (14) or formula (15). Finally, dynamic training is performed with AdamW and Warmup+Cosine or OneCycle. That is, the aforementioned dynamic learning rate is set by inference from cross-site historical process knowledge and is constrained by the stability boundary.

[0163] S500. Based on the optimized learning rate, cost-sensitive objective function, and target site data, a Bayesian optimization search is used to obtain the optimal hyperparameters and corresponding photovoltaic power prediction model for the target site.

[0164] In one or more embodiments, the cost-sensitive objective function includes training time cost, which is the time spent under a fixed training budget, or the standardized time spent per round multiplied by a preset reference round.

[0165] In a specific embodiment, the expression for the cost-sensitive objective function is as follows:

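[0166] $$J(\theta) = E_{\mathrm{val}}(\theta) + \lambda_T \cdot \frac{T(\theta)}{T_0} \quad (16);$$

[0167] where $E_{\mathrm{val}}(\theta)$ is the validation metric, $T(\theta)$ is the training time cost, $T_0$ is the training time under the baseline hyperparameter configuration $\theta_0$ of the target site, and $\lambda_T$ is the time cost weight used to control the trade-off between accuracy and time.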

[0168] A baseline hyperparameter configuration is selected for the target site (this can be the default configuration that performs stably in warm-start), and the baseline training time is obtained by training once with this configuration.
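The cost-sensitive objective can be sketched as a one-line function. The additive form with the training time normalized by the baseline time is an assumption consistent with the quantities defined above; the default weight is hypothetical.

```python
def cost_sensitive_objective(val_error, train_time, baseline_time, time_weight=0.1):
    """Validation error plus a weighted, baseline-normalized training time penalty.

    val_error: normalized validation metric such as nRMSE (lower is better).
    train_time / baseline_time: same unit, measured under the same training budget.
    """
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return val_error + time_weight * train_time / baseline_time

# Hypothetical comparison: a slightly more accurate but twice as slow configuration.
fast = cost_sensitive_objective(0.10, train_time=60.0, baseline_time=60.0, time_weight=0.05)
slow = cost_sensitive_objective(0.09, train_time=120.0, baseline_time=60.0, time_weight=0.05)
```

Dividing by the baseline time makes the penalty dimensionless, so the same time weight can be reused across sites with different hardware or data volumes.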

[0169] Taking the above initial hyperparameter combination set as the initial points, a Bayesian optimization fine-tuning search is performed to obtain the optimal hyperparameters.

[0170] In one or more embodiments, the Bayesian optimization search employs a multi-fidelity strategy, the steps of which include:

[0171] First, candidate hyperparameters are selected using a first training budget. Then, the selected candidate hyperparameters are trained and evaluated using a second training budget to obtain the final cost-sensitive objective function value and training time cost. The first training budget is less than the second training budget.

[0172] In some embodiments, the first training budget and the second training budget are preset epoch numbers or step numbers, respectively; preferably, the first training budget can be 10% to 50% of the second training budget, used to quickly screen candidate hyperparameters, and the second training budget is used to obtain the final cost-sensitive objective function value and training time cost.
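The two-budget idea can be sketched as a screening step followed by a refinement step. This sketch covers only the multi-fidelity part: it has no surrogate model or acquisition function, so it is not a full Bayesian optimization loop. The `evaluate` callback, the `keep` parameter, and the toy objective are hypothetical.

```python
def two_stage_screening(candidates, evaluate, low_budget, high_budget, keep=2):
    """Screen candidates cheaply, then re-evaluate the survivors at full budget.

    evaluate(candidate, budget) returns an objective value (lower is better).
    Returns (best_score, best_candidate) from the high-budget stage.
    """
    if not low_budget < high_budget:
        raise ValueError("the first training budget must be smaller than the second")
    survivors = sorted(candidates, key=lambda c: evaluate(c, low_budget))[:keep]
    scored = [(evaluate(c, high_budget), c) for c in survivors]
    return min(scored, key=lambda pair: pair[0])

def toy_evaluate(candidate, budget):
    """Stand-in objective: best at peak_lr = 0.01, improving with more budget."""
    return (candidate["peak_lr"] - 0.01) ** 2 + 1.0 / budget

candidates = [{"peak_lr": 0.001}, {"peak_lr": 0.01}, {"peak_lr": 0.1}]
best_score, best = two_stage_screening(
    candidates, toy_evaluate, low_budget=5, high_budget=20, keep=2
)
```

Here the low budget is 25% of the high budget, within the 10% to 50% range stated above, and only two of the three candidates incur the full training cost.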

[0173] It should be noted that the optimal hyperparameters, model weights, preprocessing parameters (normalized statistics, missing data handling strategies, etc.) and learning rate scheduler parameters output in step S500 are used for online or offline power prediction at the target site.

[0174] In summary, this embodiment utilizes source site data to accumulate cross-site transferable hyperparameter optimization knowledge, training stability knowledge, and time cost knowledge. It then combines this with target site data to complete site similarity calibration, transfer constraints, and local fine-tuning, thereby avoiding the negative transfer risk caused by direct transfer and improving cross-site training efficiency and prediction performance.

[0175] Furthermore, this embodiment utilizes historical optimization process data from the source site to construct a cross-site historical optimization library. This library unifies and manages hyperparameter configurations, validation and evaluation metrics, training time costs, and training status, thereby forming reusable hyperparameter optimization knowledge, training stability knowledge, and time cost knowledge across different sites. By adopting a normalized error metric and a time cost definition under a fixed training budget, experimental results across different sites are more comparable, providing a unified data foundation for subsequent similar site selection, migration initialization, and target site optimization.

[0176] Furthermore, this embodiment extracts meta-features from the target site data and combines them with historical optimization process data to retrieve a set of similar source sites, generating an initial hyperparameter combination set for the target site. This ensures that the migration process is no longer a direct copy of the source site's experience, but rather involves first calibrating site similarity and then performing constrained migration initialization. This effectively reduces the negative migration risk caused by differences between the source and target sites in meteorological distribution, power distribution, data quality, and site geographical or equipment conditions, narrows the target site's hyperparameter search space, reduces the number of invalid trials, and improves the initial efficiency of cross-site optimization.

[0177] Furthermore, this embodiment extracts the learning rate field and training state label from the historical optimization library of the source sites, distinguishes between success, diverged, and NaN experiments, estimates the optimal peak learning rate and stable critical learning rate for each source site, and then predicts the learning rate parameters for the target site by combining the target site's meta-features and model size features. Dynamic training is then performed using AdamW and Warmup+Cosine or OneCycle with appropriate safety constraints. This reduces the probability of training divergence, invalid values, and oscillating non-convergence, enhancing the stability, robustness, and repeatability of the target site training process.

[0178] Furthermore, this embodiment introduces a cost-sensitive objective function that includes training time costs during the target site optimization phase, and employs a multi-fidelity Bayesian optimization strategy. It first uses a smaller training budget to screen candidate hyperparameters, and then uses a larger training budget for refinement. This reduces the number of high-cost training iterations, shortens parameter tuning and convergence time, and improves the efficiency of computing resource utilization while maintaining predictive performance. Compared to a large-scale search starting from scratch, this embodiment can obtain the optimal hyperparameters and corresponding model for the target site at a lower experimental cost.

[0179] Therefore, this embodiment can not only improve the training efficiency and prediction effect of the cross-site photovoltaic power generation prediction model on the target site, but also take into account the training stability, transfer reliability and engineering deployment feasibility, and achieve the cross-site modeling effect of "reusable experience, correctable differences and feasible optimization", which is especially suitable for application scenarios with limited target site samples, obvious site differences or limited computing power budget.

[0180] Embodiment 2: As shown in Figure 2, the present invention also provides a cross-site photovoltaic power generation prediction model optimization system based on knowledge and data dual-driven AI, used to implement the cross-site photovoltaic power generation prediction model optimization method based on knowledge and data dual-driven AI described in Embodiment 1. This system includes: a data preprocessing module, a physical and data fusion modeling module, a cross-site historical optimization library module, a site similarity retrieval and warm-start module, a learning rate process knowledge extraction and prediction module, a cost-sensitive Bayesian optimization module, and an output module. The data preprocessing module, physical and data fusion modeling module, cross-site historical optimization library module, learning rate process knowledge extraction and prediction module, cost-sensitive Bayesian optimization module, and output module are sequentially connected. The data preprocessing module, site similarity retrieval and warm-start module, and cost-sensitive Bayesian optimization module are sequentially connected. The cross-site historical optimization library module is connected to the site similarity retrieval and warm-start module.

[0181] Furthermore, the data preprocessing module is used to acquire source site data and target site data, perform time alignment, missing data handling, and anomaly handling on the site data, and construct sliding window samples.

[0182] The physical and data fusion modeling module is used to construct a physical and data fusion photovoltaic power prediction model based on source site data and target site data.

[0183] The cross-site historical optimization library module is used to perform hyperparameter experiments on at least one source site based on the sliding window samples of the source site and the photovoltaic power prediction model, and obtain historical optimization process data.

[0184] The site similarity retrieval and warm-start module is used to extract meta-features from the target site data and retrieve similar source sites from the historical optimization process data based on the meta-features to generate an initial set of hyperparameter combinations for the target site.

[0185] The learning rate process knowledge extraction and prediction module is used to extract learning rate process knowledge from the historical optimization process data corresponding to the initial hyperparameter combination set, predict the optimal peak learning rate and stable critical learning rate of the target site based on the learning rate process knowledge, and dynamically optimize the learning rate during training at the target site based on the optimal peak learning rate and stable critical learning rate.

[0186] The cost-sensitive Bayesian optimization module is used to obtain the optimal hyperparameters and corresponding photovoltaic power prediction model for the target site by using Bayesian optimization search based on the optimized learning rate, cost-sensitive objective function, and sliding window samples of the target site.

[0187] The output module is used to output the optimal hyperparameters of the target site and the corresponding photovoltaic power prediction model, as well as the photovoltaic power generation prediction results of the target site.

[0188] The technical features not detailed in this embodiment are the same as those in Embodiment 1. Please refer to Embodiment 1 for details, and they will not be repeated here.

[0189] The above description is merely a preferred embodiment of the present invention. Those skilled in the art will understand that various changes or equivalent substitutions can be made to these features and embodiments without departing from the spirit and scope of the present invention. Furthermore, under the teachings of the present invention, these features and embodiments can be modified to adapt to specific situations and materials without departing from the spirit and scope of the present invention. Therefore, the present invention is not limited to the specific embodiments disclosed herein, and all embodiments falling within the scope of the claims of this application are within the protection scope of the present invention.

Claims

1. A knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization method, characterized in that it includes the following steps: determining a physical and data fusion photovoltaic power prediction model based on source site data and target site data; performing hyperparameter experiments at at least one source site, based on the source site data and the photovoltaic power prediction model, to obtain historical optimization process data; extracting meta-features of the target site data, and retrieving similar source sites from the historical optimization process data based on the meta-features to generate an initial hyperparameter combination set for the target site; extracting learning rate process knowledge from the historical optimization process data corresponding to the initial hyperparameter combination set, predicting the optimal peak learning rate and the stable critical learning rate of the target site based on the learning rate process knowledge, and performing dynamic learning rate optimization training at the target site based on the optimal peak learning rate and the stable critical learning rate; and, based on the optimized learning rate, a cost-sensitive objective function, and the target site data, using Bayesian optimization search to obtain the optimal hyperparameters and the corresponding photovoltaic power prediction model for the target site; wherein the step of predicting the optimal peak learning rate and the stable critical learning rate of the target site based on the learning rate process knowledge, and performing dynamic learning rate optimization training at the target site based on them, includes: predicting the optimal peak learning rate and the stable critical learning rate of the target site based on the meta-features of the target site and model size characteristics; weighting the optimal peak learning rates and the stable critical learning rates in the learning rate process knowledge of the similar source sites by the similarity weights between the similar source sites and the target site, to obtain a prior optimal peak learning rate and a prior stable critical learning rate; and fusing the predicted optimal peak learning rate with the prior optimal peak learning rate and the predicted stable critical learning rate with the prior stable critical learning rate, respectively, and then performing dynamic learning rate training with the fused values.
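
As an illustration of the similarity weighting and fusion recited in claim 1, the following Python sketch may help. It is a minimal example under stated assumptions, not the claimed implementation: the function names, the trust weight `beta`, the log-space interpolation, and all numeric values are hypothetical and do not come from the patent text.

```python
import math

def weighted_prior(values, similarities):
    """Similarity-weighted average of per-source-site learning rates."""
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarity weights must sum to a positive value")
    return sum(v * s for v, s in zip(values, similarities)) / total

def fuse(predicted, prior, beta=0.5):
    """Interpolate between a predicted and a prior learning rate in log space.

    Assumption: beta in [0, 1] is a hypothetical trust weight for the
    prediction; the claim does not specify the fusion rule.
    """
    return math.exp(beta * math.log(predicted) + (1.0 - beta) * math.log(prior))

# Hypothetical similar source sites: similarity weight, optimal peak LR, critical LR.
similarities = [0.6, 0.3, 0.1]
source_peak_lrs = [1e-3, 2e-3, 4e-3]
source_critical_lrs = [8e-3, 1e-2, 2e-2]

prior_peak = weighted_prior(source_peak_lrs, similarities)
prior_critical = weighted_prior(source_critical_lrs, similarities)

# Hypothetical predictions from target-site meta-features and model size.
fused_peak = fuse(predicted=2e-3, prior=prior_peak)
fused_critical = fuse(predicted=1e-2, prior=prior_critical)
```

The fused values could then parameterize a learning rate schedule at the target site, for example as its peak value and an upper safety limit.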

2. The knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization method according to claim 1, characterized in that the physical and data fusion photovoltaic power prediction model includes a physical prediction model and a data correction term for the physical prediction model. The physical prediction model is determined based on irradiance, module temperature, equivalent area, an efficiency parameter, and a temperature coefficient. The data correction term is determined from the correction output by a GRU model, and the amplitude of the correction is constrained by an amplitude control parameter.
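
One way to picture the structure in claim 2 is sketched below. This is a hedged illustration: the linear temperature-derated power formula, the 25 °C reference temperature, and the `tanh` bounding are common choices assumed here, not formulas stated in the claim, and `raw_correction` merely stands in for the output of a GRU model.

```python
import math

def physical_power(irradiance, module_temp, area, efficiency, temp_coeff,
                   ref_temp=25.0):
    """Physical estimate: efficiency * area * irradiance, derated by temperature.

    Assumption: linear temperature derating around ref_temp (deg C).
    """
    return efficiency * area * irradiance * (1.0 + temp_coeff * (module_temp - ref_temp))

def bounded_correction(raw_correction, amplitude):
    """Squash an unbounded model output into [-amplitude, +amplitude]."""
    return amplitude * math.tanh(raw_correction)

def fused_power(irradiance, module_temp, area, efficiency, temp_coeff,
                raw_correction, amplitude):
    """Physical prediction plus an amplitude-limited data correction term."""
    base = physical_power(irradiance, module_temp, area, efficiency, temp_coeff)
    return base + bounded_correction(raw_correction, amplitude)

# Hypothetical inputs: 1000 W/m^2, 45 deg C module, 10 m^2, 20 % efficiency,
# -0.4 %/deg C temperature coefficient, correction limited to +/- 50 W.
estimate = fused_power(1000.0, 45.0, 10.0, 0.2, -0.004,
                       raw_correction=0.3, amplitude=50.0)
```

Bounding the correction keeps the data-driven term from overriding the physical estimate, however large the raw model output becomes.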

3. The knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization method according to claim 1, characterized in that the historical optimization process data includes hyperparameters, validation evaluation metrics, and training time cost. The hyperparameters include learning rate-related fields, and the learning rate-related fields include the peak learning rate.

4. The knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization method according to claim 3, characterized in that generating the initial hyperparameter combination set for the target site includes: selecting, from the historical optimization process data corresponding to the similar source sites, the hyperparameters of the top K hyperparameter experiments with the best validation evaluation metric, and collecting the selected hyperparameters as the initial hyperparameter combination set of the target site.
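
The top-K selection in claim 4 can be sketched as follows. This is a minimal example under assumptions: the record fields (`site`, `val_metric`, `hyperparams`) are hypothetical, a lower validation metric is taken to be better (as with an error metric such as RMSE), and the experiments of all similar sites are pooled before ranking, which the claim does not specify.

```python
def warm_start_set(history, similar_sites, k):
    """Return the hyperparameters of the k best experiments from similar sites."""
    rows = [r for r in history if r["site"] in similar_sites]
    # Assumption: lower validation metric is better.
    rows.sort(key=lambda r: r["val_metric"])
    return [r["hyperparams"] for r in rows[:k]]

# Hypothetical history records.
history = [
    {"site": "A", "val_metric": 0.21, "hyperparams": {"peak_lr": 1e-3, "hidden": 64}},
    {"site": "A", "val_metric": 0.35, "hyperparams": {"peak_lr": 1e-2, "hidden": 32}},
    {"site": "B", "val_metric": 0.18, "hyperparams": {"peak_lr": 2e-3, "hidden": 128}},
    {"site": "C", "val_metric": 0.10, "hyperparams": {"peak_lr": 5e-4, "hidden": 64}},
]

initial_set = warm_start_set(history, similar_sites={"A", "B"}, k=2)
```

Site C is excluded here despite its better metric because it is not among the retrieved similar sites.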

5. The knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization method according to claim 3, characterized in that extracting learning rate process knowledge from the historical optimization process data corresponding to the initial hyperparameter combination set includes: extracting the learning rate-related fields, training state labels, and validation evaluation metrics from the historical optimization process data corresponding to the initial hyperparameter combination set; dividing the hyperparameter experiments into a stable experiment set and an unstable experiment set based on the training state labels and the validation evaluation metrics; for the source site corresponding to the initial hyperparameter combination set, selecting the experiment with the best validation evaluation metric from the hyperparameter experiments whose training state label is successful, and taking the peak learning rate of that experiment as the optimal peak learning rate of the source site; for the hyperparameter experiments whose training state label is divergent or invalid, extracting the corresponding peak learning rates, and estimating the upper bound of the stable critical learning rate of the source site as the minimum value or a preset quantile of these peak learning rates; and extracting, from the stable experiment set, the maximum peak learning rate at which stable convergence is maintained as the lower bound of the stable critical learning rate of the source site.

6. The knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization method according to claim 5, characterized in that, when neither the stable experiment set nor the unstable experiment set of the source site corresponding to the initial hyperparameter combination set is empty, the stable critical learning rate of that source site is determined based on the maximum peak learning rate in the stable experiment set and the minimum peak learning rate in the unstable experiment set.
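
The extraction described in claims 5 and 6 can be sketched as below. This is an illustrative example under assumptions: the status strings, the field names, partitioning by label only, and the use of the geometric mean of the two bounds as the stable critical learning rate are choices made here for demonstration. The claims only state that the value is determined from the two bounds.

```python
import math

def lr_knowledge(experiments):
    """Derive the optimal peak LR and critical-LR bounds for one source site."""
    stable = [e for e in experiments if e["status"] == "success"]
    unstable = [e for e in experiments if e["status"] in ("diverged", "invalid")]

    # Optimal peak LR: best (lowest, by assumption) validation metric among stable runs.
    best_peak = min(stable, key=lambda e: e["val_metric"])["peak_lr"] if stable else None
    # Lower bound: largest peak LR that still converged stably.
    lower = max(e["peak_lr"] for e in stable) if stable else None
    # Upper bound: smallest peak LR that diverged or was invalid.
    upper = min(e["peak_lr"] for e in unstable) if unstable else None

    if lower is not None and upper is not None:
        critical = math.sqrt(lower * upper)  # assumption: geometric mean of the bounds
    else:
        critical = lower if lower is not None else upper
    return {"best_peak": best_peak, "lower": lower, "upper": upper, "critical": critical}

# Hypothetical experiments at one source site.
experiments = [
    {"peak_lr": 1e-4, "status": "success", "val_metric": 0.30},
    {"peak_lr": 1e-3, "status": "success", "val_metric": 0.20},
    {"peak_lr": 4e-3, "status": "success", "val_metric": 0.25},
    {"peak_lr": 1.6e-2, "status": "diverged", "val_metric": float("inf")},
    {"peak_lr": 5e-2, "status": "invalid", "val_metric": float("inf")},
]

knowledge = lr_knowledge(experiments)
```

A preset quantile of the unstable peak learning rates, as the claim permits, could replace `min` when a single outlier run should not fix the upper bound.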

7. The knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization method according to claim 1, characterized in that the cost-sensitive objective function includes a training time cost, which is either the time consumed under a fixed training budget, or a standardized time obtained by multiplying the average time consumed per training round by a preset reference number of rounds.
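
The standardized time cost in claim 7 amounts to simple arithmetic, sketched below. The additive form of the objective and the weight `cost_weight` are assumptions for illustration. The claim states only that the objective includes the training time cost.

```python
def standardized_time(total_seconds, rounds_run, reference_rounds):
    """Average time per training round, scaled to a preset reference round count."""
    if rounds_run <= 0:
        raise ValueError("rounds_run must be positive")
    return total_seconds / rounds_run * reference_rounds

def cost_sensitive_objective(val_error, time_cost, cost_weight=1e-4):
    """Assumption: validation error plus a weighted training time penalty."""
    return val_error + cost_weight * time_cost

# Hypothetical run: stopped early after 8 rounds taking 120 s in total,
# normalized to a reference of 50 rounds.
time_cost = standardized_time(total_seconds=120.0, rounds_run=8, reference_rounds=50)
objective = cost_sensitive_objective(val_error=0.2, time_cost=time_cost)
```

Standardizing in this way makes runs that stopped after different numbers of rounds comparable on cost.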

8. The knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization method according to claim 1, characterized in that the Bayesian optimization search employs a multi-fidelity strategy whose steps include: first, screening candidate hyperparameters using a first training budget; then, training and evaluating the selected candidate hyperparameters using a second training budget to obtain the final cost-sensitive objective function value and training time cost. The first training budget is less than the second training budget.
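
The two-budget screening in claim 8 can be sketched as follows. This example shows only the multi-fidelity step, not a Bayesian optimizer: the `evaluate` callback, the `keep` parameter, and the toy objective are hypothetical stand-ins for real training runs.

```python
import math

def multi_fidelity_search(candidates, evaluate, low_budget, high_budget, keep):
    """Screen candidates cheaply, then fully evaluate only the survivors.

    evaluate(candidate, budget) must return (objective, time_cost).
    """
    if low_budget >= high_budget:
        raise ValueError("the first budget must be less than the second budget")
    # Stage 1: rank all candidates using the small training budget.
    screened = sorted(candidates, key=lambda c: evaluate(c, low_budget)[0])[:keep]
    # Stage 2: re-train the survivors with the large budget.
    results = [(c, *evaluate(c, high_budget)) for c in screened]
    return min(results, key=lambda r: r[1])

def toy_evaluate(peak_lr, budget):
    """Toy stand-in for training: best near 1e-3, improves with more budget."""
    objective = (math.log10(peak_lr) + 3.0) ** 2 + 1.0 / budget
    time_cost = 0.5 * budget
    return objective, time_cost

best_lr, best_objective, best_time = multi_fidelity_search(
    candidates=[1e-4, 1e-3, 1e-2, 1e-1],
    evaluate=toy_evaluate,
    low_budget=5,
    high_budget=50,
    keep=2,
)
```

In a fuller implementation the candidates would presumably be proposed by the Bayesian optimizer's acquisition function instead of a fixed list.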

9. A knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization system, characterized in that the system is used to implement the knowledge and data dual-driven AI-based cross-site photovoltaic power prediction model optimization method according to any one of claims 1-8, and includes: a data preprocessing module, a physical and data fusion modeling module, a cross-site historical optimization library module, a site similarity retrieval and warm-start module, a learning rate process knowledge extraction and prediction module, a cost-sensitive Bayesian optimization module, and an output module. The data preprocessing module, the physical and data fusion modeling module, the cross-site historical optimization library module, the learning rate process knowledge extraction and prediction module, the cost-sensitive Bayesian optimization module, and the output module are connected in sequence. The data preprocessing module, the site similarity retrieval and warm-start module, and the cost-sensitive Bayesian optimization module are connected in sequence. The cross-site historical optimization library module is connected to the site similarity retrieval and warm-start module.