Power load prediction method and device
The power load forecasting method combining Transformer and LSTM models utilizes multiple rounds of weighted training and validation set selection to form multiple weak learners, solving the problem in existing technologies that it is difficult to take into account both long-term patterns and local dynamic changes, and achieving higher accuracy in power load forecasting.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- WUHAN UNIV
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-19
AI Technical Summary
Existing power load forecasting methods struggle to simultaneously account for both long-term patterns and local dynamic changes, resulting in insufficient forecast accuracy.
The Transformer model is used for long-term dependency modeling, and the LSTM model is used for residual prediction. Multiple weak learners are formed through multiple rounds of weighted training and validation set selection, and finally weighted ensemble is performed to improve prediction accuracy.
It improves the accuracy and stability of power load forecasting, and can better balance the long-term patterns and local error corrections in the power load sequence.
Figure CN122246684A_ABST
Description
Technical Field
[0001] This invention relates to the field of power load forecasting technology, and in particular to a power load forecasting method and apparatus.
Background Technology
[0002] Electricity load forecasting is a fundamental aspect of power system operation and dispatching. The forecast results provide a basis for generation planning, grid dispatch control, reserve capacity allocation, and energy optimization management. As power system operations become increasingly complex, load data typically exhibits distinct time-series characteristics. On the one hand, it displays regular variations such as daily and weekly cycles; on the other hand, it is susceptible to short-term fluctuations due to factors such as user electricity consumption behavior, holiday arrangements, and changes in the external environment. Therefore, accurate electricity load forecasting is crucial for ensuring the safe and stable operation of the power system, improving resource allocation efficiency, and reducing operating costs.
[0003] Existing power load forecasting methods typically employ a single forecasting model to model the load sequence. However, power load sequences exhibit both long-term dependence and short-term fluctuations, making it difficult for a single model to effectively represent both long-term patterns and local dynamic changes simultaneously, resulting in insufficient forecasting accuracy.
Summary of the Invention
[0004] This invention provides a power load forecasting method and apparatus to solve the problem that existing technologies rely solely on a single model for forecasting, which makes it difficult to take into account both long-term patterns and local dynamic changes in power load, resulting in insufficient forecasting accuracy.
[0005] This invention provides a power load forecasting method, comprising: acquiring historical load data and corresponding time information; constructing a forecast sample based on the historical load data and the time information, wherein the forecast sample includes input features and corresponding actual load values, and the input features include time features determined based on the time information and historical load features determined based on the historical load data; dividing the forecast sample into a training set and a validation set; inputting the input features into a Transformer model based on the training set to obtain a master forecast value; obtaining a residual forecast value using an LSTM model based on the error between the master forecast value and the actual load value, and obtaining a basic weak learner based on the master forecast value and the residual forecast value; performing multiple rounds of weighted training based on the basic weak learner and determining the weights corresponding to each weak learner to obtain multiple weak learners; filtering the multiple weak learners based on the validation set to obtain a target weak learner set; and weighting and integrating the forecast values output by each target weak learner in the target weak learner set according to their corresponding weights to obtain a power load forecasting result.
[0006] According to the power load forecasting method provided by the present invention, the step of obtaining residual predicted values using an LSTM model based on the error between the master predicted value and the actual load value, and obtaining a basic weak learner based on the master predicted value and the residual predicted values, includes: constructing residual samples based on the error between the master predicted value and the actual load value, and inputting the residual samples into the LSTM model to obtain residual predicted values; and combining the master predicted value and the residual predicted values to obtain a basic weak learner.
[0007] According to the power load forecasting method provided by the present invention, the step of performing multi-round weighted training based on the basic weak learner and determining the weights corresponding to each weak learner to obtain multiple weak learners includes: determining the learner weights corresponding to the current weak learner based on the prediction error of the current weak learner, and updating the sample weights corresponding to each training sample based on the prediction error; training the next weak learner based on the updated sample weights; and repeating the training multiple times to obtain multiple weak learners and their corresponding weights.
[0008] According to the power load forecasting method provided by the present invention, the step of filtering the plurality of weak learners based on the validation set to obtain a target weak learner set includes: sequentially adding the weak learners to be filtered from the plurality of weak learners to the current weak learner set, and calculating the prediction error before and after adding the weak learners to be filtered based on the validation set; if the prediction error after adding the weak learners to be filtered is less than the prediction error before adding, the weak learners to be filtered are retained in the target weak learner set.
[0009] According to the power load forecasting method provided by the present invention, the step of constructing a forecast sample based on the historical load data and the time information includes: constructing time features according to the time information; extracting historical time-series features according to the historical load values corresponding to multiple preset historical times; and combining the time features and the historical time-series features to form the input features.
[0010] The present invention also provides a power load forecasting device, comprising: a data acquisition module for acquiring historical load data and corresponding time information; a sample construction module for constructing prediction samples based on the historical load data and the time information, wherein each prediction sample includes input features and a corresponding actual load value, and the input features include time features determined based on the time information and historical load features determined based on the historical load data; a main prediction module for dividing the prediction samples into a training set and a validation set and, based on the training set, inputting the input features into a Transformer model to obtain main prediction values; a residual prediction module for obtaining residual prediction values using an LSTM model based on the error between the main prediction values and the actual load values, and for obtaining a basic weak learner based on the main prediction values and the residual prediction values; a weighted training module for performing multiple rounds of weighted training based on the basic weak learner and determining the weight of each weak learner to obtain multiple weak learners; a screening module for screening the multiple weak learners based on the validation set to obtain a target weak learner set; and a weighted ensemble module for performing a weighted ensemble of the prediction values output by each target weak learner in the target weak learner set according to the corresponding weights to obtain the power load prediction result.
[0011] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the power load forecasting method as described above.
[0012] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the power load forecasting method as described above.
[0013] The power load forecasting method and apparatus provided by this invention acquire historical load data and corresponding time information and construct forecast samples that include time features and historical load features, so that the model input simultaneously characterizes the temporal regularity and the historical variation of the power load sequence. The input features are fed into a Transformer model to obtain the master forecast value, thereby modeling the long-term dependencies and overall variation patterns of the load sequence. Further, based on the error between the master forecast value and the actual load value, an LSTM model is used to obtain residual forecast values that compensate for local fluctuations and short-term errors not fully captured by the master forecast, thus forming a basic weak learner. On this basis, multiple rounds of weighted training are performed on the basic weak learner to obtain multiple weak learners and their corresponding weights, so that later training rounds keep focusing on samples with large prediction errors. The weak learners are then screened on the validation set, and only the target weak learner set that genuinely improves the prediction is retained, which keeps ineffective weak learners out of the final ensemble. Finally, the predicted values output by each target weak learner in the target weak learner set are weighted and integrated according to their corresponding weights to obtain the power load forecasting result. This invention can therefore better balance long-term pattern modeling and local error correction in power load sequences, and improve the effectiveness and stability of multi-model combined prediction.
Attached Figure Description
[0014] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0015] Figure 1 is a schematic flowchart of an example of the power load forecasting method provided by the present invention.
Figure 2 is a schematic flowchart of constructing prediction samples based on historical load data and time information.
Figure 3 is a schematic flowchart of performing multiple rounds of weighted training based on the basic weak learner, determining the weight of each weak learner, and obtaining multiple weak learners.
Figure 4 is a schematic diagram of the curves showing the fit between the predicted and actual power load in an example of the present invention.
Figure 5 is a schematic diagram of the cumulative error curves of each model.
Figure 6 is a schematic structural block diagram of an example of the power load forecasting device provided by the present invention.
Figure 7 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Implementation
[0016] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0017] Electricity load sequences exhibit both long-term dependence and short-term fluctuations. Existing technologies often employ a single model to predict electricity load. While some models can effectively extract periodic changes and long-term correlations in the load sequence, they lack the ability to correct for local errors and sudden fluctuations. Other models, although sensitive to short-term dynamic changes, have limited ability to grasp overall trends and long-term patterns, thus limiting prediction accuracy and stability.
[0018] In view of this, the present invention provides a method for predicting power load.
[0019] Figure 1 schematically shows a flowchart of an example of the power load forecasting method according to the present invention.
[0020] As shown in Figure 1, the method includes operations S110 to S170.
[0021] In operation S110, historical load data and corresponding time information are obtained.
[0022] According to an embodiment of the present invention, historical power load data of the area or object to be predicted within a preset historical time range, as well as time information corresponding to each historical power load data, can be obtained first. The historical power load data can be load sequence data collected according to a preset sampling period, such as historical load values recorded at sampling intervals of 15 minutes, 30 minutes, or 1 hour. The preset historical time range can be set according to the needs of the prediction task, for example, selecting historical load data from several consecutive days, weeks, or months prior to the time to be predicted as the data basis for subsequently constructing prediction samples.
[0023] According to embodiments of the present invention, historical load data can be provided by a power dispatching system, an electricity consumption information collection system, a load monitoring system, or other data sources capable of providing historical load records. Time information may include one or more of date information, time-of-day information, day-of-week information, working day / non-working day information, and holiday information. For example, for a given historical load value, the time information may indicate its specific date, hour, and minute, and whether that date is a working day or a holiday.
[0024] In some implementations, to ensure the continuity and effectiveness of subsequent sample construction, the acquired historical power load data can be preprocessed, such as aligning it according to time sequence, removing duplicate records, filling in missing records, or standardizing the sampling interval. The preprocessed historical load data and corresponding time information can then be used to construct prediction samples.
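The preprocessing steps listed above (time alignment, duplicate removal, a standardized sampling interval, and gap filling) can be illustrated with a minimal standard-library sketch. It is not the method's prescribed procedure: the function name `preprocess_load`, the "first record wins" rule for duplicates, and linear interpolation for missing points are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def preprocess_load(records, step=timedelta(hours=1)):
    """Align, de-duplicate, regrid and gap-fill a list of (timestamp, load) records.

    Assumptions not fixed by the source: the first record wins for duplicate
    timestamps, timestamps already sit on the `step` grid, and gaps are filled by
    linear interpolation between the nearest observed neighbours.
    """
    # 1. Align by time order and drop duplicate timestamps (sorted() is stable).
    first_seen = {}
    for ts, value in sorted(records, key=lambda r: r[0]):
        first_seen.setdefault(ts, value)
    start, end = min(first_seen), max(first_seen)
    # 2. Standardize the sampling interval onto a regular grid.
    grid = []
    t = start
    while t <= end:
        grid.append(t)
        t += step
    values = [first_seen.get(ts) for ts in grid]
    # 3. Fill missing grid points by linear interpolation between known neighbours.
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):
            values[left + k] = values[left] + (values[right] - values[left]) * k / gap
    return list(zip(grid, values))


raw = [
    (datetime(2024, 1, 1, 2), 30.0),
    (datetime(2024, 1, 1, 0), 10.0),
    (datetime(2024, 1, 1, 0), 99.0),  # duplicate timestamp, dropped
    (datetime(2024, 1, 1, 3), 40.0),  # 01:00 is missing and will be interpolated
]
print([v for _, v in preprocess_load(raw)])  # [10.0, 20.0, 30.0, 40.0]
```

In practice a dataframe library would usually handle these steps; plain Python is used here only to keep the sketch self-contained.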
[0025] In operation S120, a prediction sample is constructed based on historical load data and time information. The prediction sample includes input features and corresponding actual load values. The input features include time features determined based on time information and historical load features determined based on historical load data.
[0026] According to an embodiment of the present invention, the length of the historical time window can be preset, and the historical time window can be slid across the historical load sequence composed of historical load data in chronological order to form multiple prediction samples. For each prediction sample, information including historical power load data within the historical time window and the time information corresponding to the prediction sample is used as the input features of the prediction sample, and the actual power load value corresponding to the target time after the end of the historical time window is used as the true load value.
[0027] For example, to achieve feature extraction from time series data and transform the raw time series data into a sample format suitable for deep learning model processing, the historical load data is first subjected to time feature construction, historical time series feature extraction, sample construction, and normalization processing. By explicitly introducing time periodic patterns and historical dependencies, the model's ability to represent intraday cycles, intraweekly cycles, and historical load continuity in the load sequence can be enhanced, thereby reducing the training difficulty caused by relying solely on implicit learning by the network.
[0028] In operation S130, the predicted samples are divided into a training set and a validation set; based on the training set, the input features are input into the Transformer model to obtain the master predicted value.
[0029] According to embodiments of the present invention, the predicted samples can be divided according to time sequence to maintain the temporal continuity of the power load sequence. Specifically, the predicted samples within the earlier time range can be divided into a training set, and the predicted samples within the later time range can be divided into a validation set. By dividing according to time sequence, data leakage from later times to the model training process of earlier times can be avoided, thus better conforming to the actual application scenario of power load forecasting.
[0030] In some implementations, the ratio of training set to validation set can be set according to the total number of samples and training requirements. For example, most of the predicted samples can be used as the training set, and the remaining predicted samples can be used as the validation set, so as to balance the sufficiency of model training and the effectiveness of validation. It should be understood that the specific ratio of training set to validation set does not constitute a limitation of the present invention, as long as it can meet the needs of model training and validation.
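A chronological split as described in [0029] and [0030] can be sketched as follows, assuming the samples are already in time order. The 0.8 ratio is only an illustrative default, since the text deliberately leaves the ratio open.

```python
def chronological_split(samples, train_ratio=0.8):
    """Split time-ordered samples without shuffling: earlier -> training, later -> validation.

    train_ratio = 0.8 is only an illustrative default; the source leaves the ratio open.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


train, val = chronological_split(list(range(10)))
print(train, val)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Because no shuffling takes place, every validation sample lies strictly after every training sample, which is what prevents later data from leaking into training.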
[0031] According to an embodiment of the present invention, after the training set and validation set are divided, the Transformer model can be trained based on the training set. Specifically, the input features from the training set are input into the Transformer model, which models the temporal dependencies in the input features and outputs prediction results corresponding to each training sample. These prediction results serve as the master prediction values. The master prediction values characterize the initial prediction results of the Transformer model for the power load at the target time based on the input features.
[0032] According to embodiments of the present invention, the Transformer model can use a self-attention mechanism to model long-term dependencies in the input features. For the input feature sequence, the query matrix $Q$, key matrix $K$, and value matrix $V$ are obtained through linear mappings, and the attention output is calculated from them as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \qquad (1)$$

where $\mathrm{softmax}(\cdot)$ is the normalization function and $d_k$ is the dimension of the key vectors. In this way, the correlation between different time positions in the input sequence can be measured, enhancing the model's ability to represent long-distance dependencies.
[0033] To extract temporal features from different representation subspaces, a multi-head self-attention mechanism can further be used. The $i$-th attention head can be represented as:

$$\mathrm{head}_i = \mathrm{Attention}\left(QW_i^{Q}, KW_i^{K}, VW_i^{V}\right), \quad i = 1, 2, \dots, h \qquad (2)$$

where $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the linear transformation matrices of the $i$-th attention head and $h$ is the number of attention heads. The outputs of the attention heads are concatenated to obtain the multi-head attention output:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \dots, \mathrm{head}_h\right)W^{O} \qquad (3)$$

where $\mathrm{Concat}(\cdot)$ denotes the concatenation operation and $W^{O}$ is the linear mapping matrix applied to the concatenated head outputs. After the multi-head attention calculation, the head outputs are concatenated as given in equations (4) and (5), where $\mathrm{head}_i$ denotes the output of the $i$-th attention head.
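Equations (1) to (3) can be checked numerically with a small NumPy sketch. It is a self-attention illustration under stated assumptions, not the model's actual configuration: it feeds the same sequence `X` in as queries, keys, and values, uses random projection matrices, and picks 5 time steps, `d_model = 8`, and 2 heads of size 4 purely for demonstration.

```python
import numpy as np


def softmax(z, axis=-1):
    # Shift by the row maximum for numerical stability; the result is unchanged.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention, equation (1): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_query, n_key) similarity between time positions
    return softmax(scores, axis=-1) @ V


def multi_head_self_attention(X, W_q, W_k, W_v, W_o):
    """Equations (2)-(3) in self-attention form (Q = K = V = X, an assumption of this sketch).

    W_q, W_k, W_v: lists of h per-head projections, each of shape (d_model, d_head).
    W_o: output projection applied to the concatenated heads, shape (h * d_head, d_model).
    """
    heads = [attention(X @ wq, X @ wk, X @ wv) for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o


# Usage with random weights: 5 time steps, d_model = 8, h = 2 heads of size 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q = [rng.normal(size=(8, 4)) for _ in range(2)]
W_k = [rng.normal(size=(8, 4)) for _ in range(2)]
W_v = [rng.normal(size=(8, 4)) for _ in range(2)]
W_o = rng.normal(size=(8, 8))
out = multi_head_self_attention(X, W_q, W_k, W_v, W_o)
print(out.shape)  # (5, 8)
```

Each head sees its own projection of the sequence. Concatenating the heads and applying `W_o` brings the result back to the model width, so it can feed the output mapping described next.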
[0034] Furthermore, the intermediate representation of the Transformer model is obtained through an output mapping:

$$Z = HW_{\mathrm{out}} + b_{\mathrm{out}} \qquad (6)$$

where $H$ is the output of the multi-head attention module, $W_{\mathrm{out}}$ is the output mapping matrix, and $b_{\mathrm{out}}$ is the bias term. The intermediate representation $Z$ is further processed by the subsequent network layers of the Transformer model to output the master prediction value of each training sample. The master prediction value characterizes the initial prediction of the power load at the target time made by the Transformer model from the input features, and serves as the basis for the subsequent residual prediction and compensation.
[0035] In operation S140, based on the error between the master forecast and the actual load value, the residual forecast is obtained using the LSTM model, and a basic weak learner is obtained based on the master forecast and the residual forecast.
[0036] According to an embodiment of the present invention, after obtaining the master prediction value corresponding to each training sample, a sample set of residual values can be constructed based on the difference between the master prediction value and the actual load value, and the sample set of residual values can be learned using LSTM to obtain the corresponding residual prediction value. The residual prediction value is combined with the master prediction value to obtain the prediction result after error correction, and the prediction result after error correction is used as the output of the basic weak learner.
[0037] In operation S150, multiple rounds of weighted training are performed based on the basic weak learner, and the weight of each weak learner is determined to obtain multiple weak learners.
[0038] According to an embodiment of the present invention, in different training rounds, the focus of subsequent training is adjusted based on the results of the previous round, so that subsequent rounds can further focus on samples where the prediction effect in the previous round was insufficient. Thus, the weak learners obtained from different training rounds can differ in their focus on sample fitting, thereby forming multiple weak learners.
[0039] For example, multiple weak learners can be gradually obtained based on the training results of the basic weak learner during multiple rounds of training, and corresponding weights can be determined for each weak learner to characterize the degree of contribution of each weak learner to the final prediction result.
[0040] Specifically, in each training round, a weak learner is formed based on the master prediction value output by the Transformer model and the residual prediction value output by the LSTM model. To ensure that subsequent training rounds focus on training samples with larger prediction errors from the previous round, the sample weights corresponding to the training samples are adjusted based on the previous round's training results in different training rounds, and the next weak learner is trained based on the adjusted sample weights. Thus, multiple weak learners can be formed round by round under the same basic structure. Multi-round weighted training is used to ensure that subsequent training processes continuously focus on sample regions with insufficient prediction performance, thereby improving the ability of the weak learner set to represent different load change patterns.
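The text does not give the exact rule for updating sample weights and learner weights, so the sketch below substitutes one common choice, the AdaBoost.R2 rule with a linear loss, to show how a multi-round weighted loop can shift attention toward poorly fitted samples. The callable `fit` stands in for "train Transformer plus LSTM on weighted samples", and `fit_weighted_mean` is a hypothetical toy learner included only so the sketch runs.

```python
import math


def boost_weak_learners(X, y, fit, n_rounds=5, eps=1e-12):
    """Multi-round weighted training in the style of AdaBoost.R2 (linear loss).

    fit(X, y, sample_weights) must return a predictor x -> float. Here it stands in for
    "train Transformer + LSTM on weighted samples"; the update rule is an assumption.
    Returns the learners, their learner weights, and the final sample weights.
    """
    n = len(y)
    w = [1.0 / n] * n  # start from uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        model = fit(X, y, w)
        errors = [abs(yi - model(xi)) for xi, yi in zip(X, y)]
        max_err = max(errors) or 1.0  # avoid 0/0 when every sample is fitted exactly
        losses = [e / max_err for e in errors]  # per-sample loss in [0, 1]
        avg_loss = sum(wi * li for wi, li in zip(w, losses))
        if avg_loss >= 0.5:  # AdaBoost.R2 stopping rule: learner too weak, discard it
            break
        beta = max(avg_loss, eps) / (1.0 - avg_loss)
        learners.append(model)
        alphas.append(math.log(1.0 / beta))  # smaller average loss -> larger learner weight
        # Shrink the weights of well-fitted samples more, so poorly fitted samples gain weight.
        w = [wi * beta ** (1.0 - li) for wi, li in zip(w, losses)]
        total = sum(w)
        w = [wi / total for wi in w]
    return learners, alphas, w


def fit_weighted_mean(X, y, w):
    """Toy weak learner (hypothetical): predicts the weighted mean load for every input."""
    c = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return lambda x: c


learners, alphas, w = boost_weak_learners(
    [0, 1, 2, 3, 4], [1.0, 1.0, 1.0, 1.0, 5.0], fit_weighted_mean, n_rounds=1
)
```

After one round, the outlier sample (load 5.0) carries more weight than the others, so the next learner would be trained with more emphasis on it. Note that the original AdaBoost.R2 combines learners by a weighted median, whereas S170 here uses a weighted ensemble.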
[0041] In some embodiments, weak learners trained in different rounds can respectively have good fitting ability for different variation characteristics in the power load sequence. For example, some weak learners can better characterize load changes during stable periods, while others can better characterize fluctuation periods or local deviation characteristics. By performing multiple rounds of weighted training and forming multiple weak learners, the subsequent prediction process no longer relies on a single basic weak learner, but instead utilizes the combined results of multiple weak learners to participate in the final prediction, thereby improving the stability and adaptability of the prediction results.
[0042] In operation S160, multiple weak learners are filtered based on the validation set to obtain the target weak learner set.
[0043] According to an embodiment of the present invention, after obtaining multiple weak learners, a validation set can be used to filter the weak learners to retain those with good validation effects on the prediction results, thus forming a target weak learner set. The weak learners in the target weak learner set are used to participate in subsequent weighted ensemble output to improve the reliability and stability of the final prediction results.
[0044] The above screening process reduces the adverse effects of ineffective or inefficient weak learners participating in subsequent ensemble outputs, thereby improving the stability and effectiveness of the final ensemble prediction results. The selected target weak learner set can serve as the basis for subsequent weighted ensemble outputs.
[0045] In operation S170, the predicted values output by each target weak learner in the target weak learner set are weighted and integrated according to their corresponding weights to obtain the power load prediction result.
[0046] According to embodiments of the present invention, weighted integration can be implemented by weighted summation or weighted average.
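Operations S160 and S170 can be sketched together: a greedy forward pass over the learners following the criterion in [0008] (keep a candidate only if the validation error decreases once it is added), followed by a weighted average of the retained learners. Three choices here are assumptions, not specified in the text: the set is seeded with the first (basic) learner, mean squared error is the validation metric, and `val_preds[k][i]` holds learner `k`'s prediction for validation sample `i`.

```python
def weighted_average(preds, weights):
    """S170: weighted average of the learners' predictions for one sample.
    With weights pre-normalized to sum to 1, this equals the weighted sum."""
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


def ensemble(val_preds, weights, members):
    """Combine the learners in `members`; val_preds[k][i] is learner k's prediction for sample i."""
    n = len(val_preds[0])
    return [weighted_average([val_preds[k][i] for k in members],
                             [weights[k] for k in members]) for i in range(n)]


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def select_learners(val_preds, weights, y_val):
    """S160: greedy forward selection on the validation set.

    Assumptions: the set is seeded with the first learner, and MSE is the error measure.
    A candidate is kept only if adding it lowers the validation error.
    """
    selected = [0]
    best = mse(y_val, ensemble(val_preds, weights, selected))
    for k in range(1, len(val_preds)):
        err = mse(y_val, ensemble(val_preds, weights, selected + [k]))
        if err < best:
            selected, best = selected + [k], err
    return selected, best


y_val = [1.0, 2.0, 3.0]
val_preds = [[1.5, 2.5, 3.5], [0.5, 1.5, 2.5], [10.0, 10.0, 10.0]]
selected, best = select_learners(val_preds, [1.0, 1.0, 1.0], y_val)
print(selected, best)  # [0, 1] 0.0
```

In this toy case, learner 1 cancels learner 0's bias and is kept. Learner 2 would worsen the ensemble, so it is screened out and never reaches the final weighted integration.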
[0047] With the above arrangement, the Transformer model characterizes the long-term dependence and overall variation of the power load sequence, and the LSTM model corrects its prediction error. Multiple rounds of weighted training, validation-set selection, and weighted ensemble then improve the effectiveness of the weak learner combination, thereby enhancing the accuracy and stability of the power load prediction results.
[0048] Figure 2 schematically shows a flowchart of constructing prediction samples based on historical load data and time information.
[0049] As shown in Figure 2, operation S120 includes operations S210 to S230.
[0050] In operation S210, time features are constructed based on time information.
[0051] According to an embodiment of the present invention, for a historical load sequence sampled hourly, let $t$ denote the time index in hours. To characterize the regular daily and weekly variations in the historical load data, periodic encoding is applied to the time index $t$. Specifically, sine and cosine functions map the discrete time index into a continuous periodic space, preserving cyclic continuity (for example, 23:00 and 00:00 remain adjacent).
[0052] For the daily cycle, the time features are defined as:

$$s^{d}_t = \sin\left(\frac{2\pi t}{24}\right), \qquad c^{d}_t = \cos\left(\frac{2\pi t}{24}\right) \qquad (7)$$

For the weekly cycle, the corresponding periodic time features are:

$$s^{w}_t = \sin\left(\frac{2\pi t}{168}\right), \qquad c^{w}_t = \cos\left(\frac{2\pi t}{168}\right) \qquad (8)$$

where $t$ is the time index in hours; 24 and 168 are the daily and weekly cycle lengths, respectively; $s^{d}_t$ and $c^{d}_t$ are the sine and cosine features of the daily cycle; and $s^{w}_t$ and $c^{w}_t$ are the sine and cosine features of the weekly cycle.
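Equations (7) and (8) translate directly into code. The sketch below also includes a small helper, `daily_distance` (added here for illustration only), which shows why the encoding is used: hour 23 and hour 0 end up close together, whereas their raw indices differ by 23.

```python
import math


def cyclic_time_features(t):
    """Equations (7)-(8): sine/cosine encodings of hour index t for 24 h and 168 h cycles."""
    day = 2 * math.pi * t / 24
    week = 2 * math.pi * t / 168
    return [math.sin(day), math.cos(day), math.sin(week), math.cos(week)]


def daily_distance(t1, t2):
    """Euclidean distance between two hours in the daily (sin, cos) plane."""
    a, b = cyclic_time_features(t1), cyclic_time_features(t2)
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hour 23 lies next to hour 0 in the encoded space, unlike in the raw index.
print(daily_distance(23, 0) < daily_distance(12, 0))  # True
```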
[0053] In operation S220, historical time series features are extracted based on the historical load values corresponding to multiple preset historical times.
[0054] In embodiments of the present invention, in addition to the time features, historical time-series features reflecting historical dependencies are extracted from the historical load data. Let the historical load sequence be $\{y_t\}$ and let the preset set of historical lags be $\mathcal{K} = \{k_1, k_2, \dots, k_m\}$. The lag features can then be defined as $\mathbf{l}_t = \left[y_{t-k_1}, y_{t-k_2}, \dots, y_{t-k_m}\right]$.
[0055] In this embodiment, considering the temporal continuity, daily repetition, and weekly repetition of the power load sequence, representative historical load values are selected to construct the historical time-series features. For example, the load at the previous time step, at the same time on the previous day, and at the same time in the previous week can be used:

$$\mathbf{l}_t = \left[y_{t-1}, y_{t-24}, y_{t-168}\right] \qquad (9)$$

where $y_{t-1}$ characterizes the short-term continuity of the load sequence, $y_{t-24}$ reflects the daily cyclical pattern of electricity consumption, and $y_{t-168}$ reflects the weekly cyclical pattern.
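The lag features of equation (9) can be sketched as follows for an hourly series stored as a Python list. The default lags follow the embodiment; the series used below is placeholder data (value equals hour index) chosen so that the output is easy to verify.

```python
def lag_features(y, t, lags=(1, 24, 168)):
    """Equation (9): [y[t-1], y[t-24], y[t-168]] for an hourly load series y.

    The default lags follow the embodiment (previous hour, same hour yesterday,
    same hour last week); any other lag set K can be passed in.
    """
    if t - max(lags) < 0:
        raise IndexError(f"t={t} has less than {max(lags)} hours of history")
    return [y[t - k] for k in lags]


hourly_load = [float(h) for h in range(200)]  # placeholder series, value = hour index
print(lag_features(hourly_load, 170))  # [169.0, 146.0, 2.0]
```

One consequence is that the first 168 hours of any series cannot produce complete samples, which is why the sample window can only start once a full week of history is available.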
[0056] In operation S230, time features and historical time series features are combined to form input features.
[0057] According to embodiments of the present invention, after the time features and historical time-series features are obtained, they can be further combined with other available features to form the model input feature vector. Specifically, the time features, exogenous features (for example, meteorological features), and historical time-series features can be concatenated to obtain the enhanced input features:

$$\mathbf{x}_t = \left[\mathbf{f}^{\mathrm{time}}_t, \mathbf{f}^{\mathrm{exo}}_t, \mathbf{l}_t\right] \in \mathbb{R}^{d} \qquad (10)$$

where $\mathbf{f}^{\mathrm{time}}_t$ denotes the time features generated by equations (7) and (8), $\mathbf{f}^{\mathrm{exo}}_t$ denotes the exogenous features, $\mathbf{l}_t$ denotes the historical time-series features, and $d$ is the total feature dimension after concatenation.
[0058] After the input feature vectors are constructed, a sliding time window is used to construct the prediction samples. Let the length of the historical time window be $L$ and the prediction step be $\tau$. The input of the $i$-th sample is the sequence of input feature vectors over $L$ consecutive time steps, stacked row by row:

$$X_i = \left[\mathbf{x}_i, \mathbf{x}_{i+1}, \dots, \mathbf{x}_{i+L-1}\right] \in \mathbb{R}^{L \times d} \qquad (11)$$

where $X_i$ is the input matrix of the $i$-th training sample, containing the input feature information at $L$ consecutive time steps. The corresponding actual load value at the target time step can be denoted as $y_{i+L-1+\tau}$; for single-step prediction, $\tau = 1$ is usually used.
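The sliding-window construction of equation (11) can be sketched with plain lists. In this sketch, `features[t]` plays the role of $\mathbf{x}_t$, `window` stands for $L$, and `horizon` stands for $\tau$. The toy one-dimensional features are illustrative only.

```python
def make_windows(features, loads, window, horizon=1):
    """Equation (11): slide a window of `window` steps over the feature sequence.

    features[t] is the input vector x_t from equation (10); loads[t] is the actual load.
    Sample i uses x_i .. x_{i+window-1} and targets the load `horizon` steps after
    the window ends, i.e. loads[i + window - 1 + horizon].
    """
    samples = []
    for i in range(len(features) - window - horizon + 1):
        X_i = features[i:i + window]  # L x d input matrix as a list of rows
        y_i = loads[i + window - 1 + horizon]
        samples.append((X_i, y_i))
    return samples


feats = [[float(t)] for t in range(6)]  # toy 1-dimensional features
loads = [10.0 * t for t in range(6)]
for X_i, y_i in make_windows(feats, loads, window=3):
    print(X_i, y_i)
# [[0.0], [1.0], [2.0]] 30.0
# [[1.0], [2.0], [3.0]] 40.0
# [[2.0], [3.0], [4.0]] 50.0
```

The loop bound ensures that the last sample's target is the final element of `loads`, so no window reaches past the end of the series.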
[0059] In this way, the original historical load sequence can be converted into a set of predicted samples consisting of an input matrix and actual load values, which can then be used for subsequent training and validation set partitioning and prediction model training.
[0060] Since different features typically exhibit significant differences in numerical range and fluctuation amplitude, to avoid the adverse effects of these dimensional differences on model training and to improve the model's training stability and generalization ability, the constructed features can be further normalized. For example, the min-max normalization method can be used to normalize each feature. After normalization, different features can be mapped to a uniform numerical range, thus facilitating subsequent model training. In one illustrative embodiment, the input features also include meteorological features, which include at least one of temperature, humidity, wind speed, rainfall, and light intensity.
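Min-max normalization as mentioned above maps each feature through $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$. The sketch below assumes that the minimum and maximum are fitted on the training set only and then reused for validation data, which avoids leakage but is not stated in the text. The two feature columns are hypothetical.

```python
def fit_min_max(rows):
    """Per-feature minimum and maximum. Fitting on the training set only (then reusing
    the statistics for validation data) is an assumption made here to avoid leakage."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs, eps=1e-12):
    """x' = (x - min) / (max - min); eps guards constant features against division by zero.
    Validation values outside the training range will fall outside [0, 1]."""
    return [[(x - lo) / max(hi - lo, eps) for x, lo, hi in zip(row, mins, maxs)]
            for row in rows]


train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]  # e.g. [temperature, lagged load]
mins, maxs = fit_min_max(train)
print(apply_min_max(train, mins, maxs))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

After training, predictions on the normalized load scale would need the inverse mapping $x = x'(x_{\max} - x_{\min}) + x_{\min}$ to return to physical units.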
[0061] In one illustrative embodiment, operation S140 includes: constructing residual samples based on the error between the master prediction value and the actual load value, and inputting the residual samples into the LSTM model to obtain residual prediction values; and combining the master prediction values and the residual prediction values to obtain the basic weak learner.
[0062] According to an embodiment of the present invention, after the master prediction value of each training sample is obtained, the residual of each training sample can be determined as the difference between its actual load value and its master prediction value:

$$r_i = y_i - \hat{y}^{\mathrm{main}}_i \qquad (12)$$

where $y_i$ is the actual load value of the $i$-th training sample and $\hat{y}^{\mathrm{main}}_i$ is the master prediction value output by the Transformer model. Residual samples are constructed from the residuals and input into the LSTM model, so that the LSTM model learns the variation pattern of the master prediction error and outputs the residual prediction value.
[0063] In one illustrative embodiment, the method further includes: determining a scaling factor for scaling the residual predictions based on the validation set; scaling the residual predictions using the scaling factor; and combining the scaled residual predictions with the master predictions to obtain the base weak learner.
[0064] According to an embodiment of the present invention, after the residual predicted value of each sample is obtained, a scaling factor for the residual predicted values can be further determined based on the validation set. In this embodiment, the scaling factor adjusts how strongly the residual predicted values compensate the master predicted values, avoiding the under- or over-compensation that can occur when the residual predicted values are added directly.
[0065] For example, let $\mathcal{V}$ denote the set of validation samples and, for the $i$-th sample in $\mathcal{V}$, let $y_i$ be the actual load value, $\hat{y}_i^{\mathrm{T}}$ the master predicted value output by the Transformer model, and $\hat{r}_i$ the residual predicted value output by the LSTM model. The scaling factor can then be determined by minimizing the error between the combined predicted values and the actual load values on the validation set, which can be exemplarily represented as:

$$\lambda = \frac{\sum_{i \in \mathcal{V}} \hat{r}_i \left( y_i - \hat{y}_i^{\mathrm{T}} \right)}{\sum_{i \in \mathcal{V}} \hat{r}_i^{2} + \epsilon} \tag{13}$$

where $\lambda$ denotes the scaling factor and $\epsilon$ is a tiny constant that prevents the denominator from being zero. Without $\epsilon$, Equation (13) is exactly the value of $\lambda$ that minimizes $\sum_{i \in \mathcal{V}} \left( y_i - \hat{y}_i^{\mathrm{T}} - \lambda \hat{r}_i \right)^2$.
[0066] After the scaling factor is determined, the residual predicted values are first scaled by it. The scaled residual predicted value can be expressed as:

$$\tilde{r}_i = \lambda \, \hat{r}_i \tag{14}$$

where $\lambda$ is the scaling factor, $\hat{r}_i$ is the residual predicted value of the $i$-th sample, and $\tilde{r}_i$ is the scaled residual predicted value of the $i$-th sample.
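[0067] Subsequently, the scaled residual predicted values are combined with the master predicted values to obtain the output of the basic weak learner. The combined prediction result can be expressed as:

$$\hat{y}_i^{\mathrm{B}} = \hat{y}_i^{\mathrm{T}} + \tilde{r}_i \tag{15}$$

where $\hat{y}_i^{\mathrm{T}}$ is the master predicted value of the $i$-th sample, $\tilde{r}_i$ is its scaled residual predicted value, and $\hat{y}_i^{\mathrm{B}}$ is the prediction output by the basic weak learner for the $i$-th sample.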
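Equations (13) to (15) can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the default `eps=1e-8` is an arbitrary choice, since the source only calls it a tiny constant.

```python
from typing import List, Sequence


def scaling_factor(
    y_val: Sequence[float],
    master_val: Sequence[float],
    resid_pred_val: Sequence[float],
    eps: float = 1e-8,  # arbitrary small constant guarding the denominator
) -> float:
    """Equation (13): least-squares scaling factor fitted on the validation set."""
    num = sum(r * (y - m) for y, m, r in zip(y_val, master_val, resid_pred_val))
    den = sum(r * r for r in resid_pred_val) + eps
    return num / den


def base_learner_output(
    master: Sequence[float], resid_pred: Sequence[float], lam: float
) -> List[float]:
    """Equations (14)-(15): scale residual predictions and add them to the master ones."""
    return [m + lam * r for m, r in zip(master, resid_pred)]
```

For example, if the LSTM consistently predicts residuals of 1.0 while the true residuals are 2.0, the fitted factor is close to 2 and the combined output recovers the actual load.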
[0068] Therefore, the basic weak learner retains the Transformer model's ability to capture the overall variation and long-term dependencies of the power load sequence, while the scaled and calibrated residual predicted values correct local prediction biases, improving the accuracy and stability of the basic weak learner's output.
[0069] According to an embodiment of the present invention, the predictor formed by combining the master predicted value with the scaled residual predicted value is defined as the basic weak learner, which specifies the basic structural form of the weak learners in subsequent rounds. During multi-round weighted training, the weak learner obtained in the $m$-th training round on the basis of this structural form is denoted as $f_m(\cdot)$, representing a specific weak learner instance built on the basic structure.
[0070] Figure 3 is a flowchart illustrating the process of performing multiple rounds of weighted training based on the basic weak learner, determining the weight of each weak learner, and obtaining multiple weak learners.
[0071] As shown in Figure 3, operation S150 includes operations S310 to S330.
[0072] In operation S310, the learner weights corresponding to the current weak learner are determined based on the prediction error of the current weak learner, and the sample weights corresponding to each training sample are updated based on the prediction error.
[0073] According to an embodiment of the present invention, in the $m$-th training round, the $m$-th weak learner $f_m$ is obtained by training under the current-round weight distribution $\{w_i^{(m)}\}_{i=1}^{N}$ of the training samples, as given by Equation (16), where $N$ is the total number of training samples, $f_m(\mathbf{x}_i)$ is the predicted output of the weak learner for the input features $\mathbf{x}_i$ of the $i$-th training sample, and $y_i$ is the actual load value of the $i$-th training sample.
[0074] After the $m$-th weak learner is obtained, the prediction error of each training sample is further calculated.
[0075] The prediction error is given by Equation (17), where $e_i^{(m)}$ denotes the normalized prediction error of the $i$-th training sample in the $m$-th round.
[0076] After the prediction error of each training sample is obtained, the overall loss of the $m$-th weak learner is further calculated:

$$\bar{L}_m = \frac{\sum_{i=1}^{N} w_i^{(m)} e_i^{(m)}}{\sum_{i=1}^{N} w_i^{(m)}} \tag{18}$$

where $w_i^{(m)}$ and $e_i^{(m)}$ are the sample weight and normalized prediction error of the $i$-th training sample in round $m$, and $\bar{L}_m$ is the overall prediction error level of the $m$-th weak learner under the current sample weight distribution. Based on this overall loss, the learner weight $\alpha_m$ of the $m$-th weak learner is determined according to Equation (19). From this formula it can be seen that the smaller the overall loss $\bar{L}_m$, the larger the corresponding learner weight $\alpha_m$, meaning that the weak learner contributes more to the subsequent ensemble output.
[0077] Subsequently, the sample weight of each training sample is updated based on its prediction error according to Equation (18), where $w_i^{(m+1)}$ denotes the sample weight of the $i$-th training sample in round $m+1$. This update gives training samples with larger prediction errors in the current round more attention in subsequent training rounds.
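Operation S310 (normalized error, overall loss, learner weight, and sample weight update) can be sketched as one boosting round. The source names AdaBoost (see [0090]) but does not spell out Equations (17), (19), or the update rule, so this sketch assumes the AdaBoost.R2 variant for regression with a linear loss: $e_i = |y_i - f_m(\mathbf{x}_i)| / \max_k |y_k - f_m(\mathbf{x}_k)|$, $\beta_m = \bar{L}_m / (1 - \bar{L}_m)$, $\alpha_m = \ln(1/\beta_m)$, and $w_i \leftarrow w_i \beta_m^{1 - e_i}$ followed by normalization. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_r2_round(
    y: Sequence[float], pred: Sequence[float], w: Sequence[float]
) -> Tuple[List[float], float, float, List[float]]:
    """One AdaBoost.R2-style round (assumed formulas, see lead-in).

    Returns (normalized errors, overall loss, learner weight, new sample weights).
    """
    abs_err = [abs(a - b) for a, b in zip(y, pred)]
    d_max = max(abs_err)
    # Linear loss normalized to [0, 1]; a perfect fit yields all zeros.
    e = [d / d_max for d in abs_err] if d_max > 0 else [0.0] * len(abs_err)
    # Overall loss: weighted mean error under the current weights (Eq. 18).
    loss = sum(wi * ei for wi, ei in zip(w, e)) / sum(w)
    if loss >= 0.5:
        raise ValueError("loss >= 0.5: AdaBoost.R2 stops boosting at this point")
    loss = max(loss, 1e-10)  # avoid log(1/0) for a perfect fit
    beta = loss / (1.0 - loss)
    alpha = math.log(1.0 / beta)  # smaller loss -> larger learner weight
    # Samples with small error are shrunk by beta < 1, so after normalization
    # the poorly predicted samples gain relative weight.
    new_w = [wi * beta ** (1.0 - ei) for wi, ei in zip(w, e)]
    total = sum(new_w)
    return e, loss, alpha, [wi / total for wi in new_w]
```

In the example used for testing, one of four equally weighted samples carries the whole error; its weight rises from 0.25 to 0.5 after the update.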
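[0078] In some implementations, a weight capping strategy can be further introduced to avoid training instability caused by excessively large weights on a small number of extreme samples. For example, the weight capping can be expressed as:

$$w_i^{(m+1)} \leftarrow \min\left( w_i^{(m+1)},\, w_{\max} \right) \tag{19}$$

where $w_i^{(m+1)}$ is the updated sample weight of the $i$-th training sample and $w_{\max}$ is the preset upper limit of the sample weights.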
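Weight capping can be sketched as below. The source only states that weights are clipped at $w_{\max}$; renormalizing the remaining weights so the distribution still sums to 1, and re-capping any weight that renormalization pushes over the limit, are assumptions of this sketch. The function name is hypothetical.

```python
from typing import List, Sequence


def cap_weights(w: Sequence[float], w_max: float) -> List[float]:
    """Clip sample weights at w_max and renormalize the rest to sum to 1.

    Uncapped weights are rescaled to fill the remaining mass; any weight that
    this rescaling lifts above w_max is capped as well, until none exceeds it.
    """
    n = len(w)
    if w_max * n < 1.0:
        raise ValueError("w_max too small: n * w_max must be >= 1")
    total = sum(w)
    w = [x / total for x in w]
    capped = set()
    while True:
        free = [i for i in range(n) if i not in capped]
        out = [w_max if i in capped else 0.0 for i in range(n)]
        if not free:
            return out
        free_mass = 1.0 - w_max * len(capped)
        free_sum = sum(w[i] for i in free)
        for i in free:
            out[i] = w[i] * free_mass / free_sum if free_sum > 0 else free_mass / len(free)
        over = [i for i in free if out[i] > w_max]
        if not over:
            return out
        capped.update(over)
```

For instance, with weights (0.7, 0.1, 0.1, 0.1) and a cap of 0.4, the extreme sample is held at 0.4 and the other three share the remaining 0.6 equally.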
[0079] In operation S320, the next round of weak learners is trained based on the updated sample weights.
[0080] According to an embodiment of the present invention, after updating the sample weights corresponding to each training sample, the next round of weak learners is trained based on the updated sample weights. Thereafter, the process of calculating the prediction error of the current round of weak learners, determining the learner weights, and updating the sample weights is repeated, thereby obtaining multiple weak learners and their corresponding weights round by round.
[0081] In operation S330, the training rounds are repeated to obtain multiple weak learners and their corresponding weights.
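The loop over operations S310 to S330 can be sketched as follows. As in the single-round case, the AdaBoost.R2-style formulas are an assumption, not taken from the source. `train_weighted` is a hypothetical callable that would fit one basic weak learner (Transformer master model plus scaled LSTM residual model) under the given sample weights; its internals are not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]
# Hypothetical trainer: fits one weak learner to (X, y) under sample weights w.
Trainer = Callable[[List[List[float]], List[float], List[float]], Model]


def boost(
    X: List[List[float]], y: List[float], train_weighted: Trainer, rounds: int = 10
) -> Tuple[List[Model], List[float], List[float]]:
    """Multi-round weighted training in an AdaBoost.R2 style (assumed formulas)."""
    n = len(y)
    w = [1.0 / n] * n  # uniform initial sample weights
    learners: List[Model] = []
    alphas: List[float] = []
    for _ in range(rounds):
        f = train_weighted(X, y, w)  # S320: train under the current weights
        abs_err = [abs(yi - f(xi)) for xi, yi in zip(X, y)]
        d_max = max(abs_err)
        e = [d / d_max for d in abs_err] if d_max > 0 else [0.0] * n
        loss = sum(wi * ei for wi, ei in zip(w, e))  # w is kept normalized
        if loss >= 0.5:
            break  # AdaBoost.R2 stopping rule: discard this learner and stop
        loss = max(loss, 1e-10)
        beta = loss / (1.0 - loss)
        learners.append(f)
        alphas.append(math.log(1.0 / beta))  # S310: learner weight
        w = [wi * beta ** (1.0 - ei) for wi, ei in zip(w, e)]  # S310: sample weights
        total = sum(w)
        w = [wi / total for wi in w]
        # Optional: apply the weight capping of [0078] to w here.
    return learners, alphas, w
```

Because the loop may stop early, the number of weak learners returned can be smaller than `rounds`.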
[0082] After multiple rounds of weighted training, several weak learners and their corresponding weights are obtained. The corresponding weights characterize the relative contribution of each weak learner in the subsequent prediction process and serve as the basis for weighted fusion of the weak learner outputs. To further improve the effectiveness of the ensemble prediction, before the final weighted fusion, the weak learners can be screened based on the validation set to retain the target weak learners that effectively contribute to the prediction results.
[0083] In one illustrative embodiment, operation S160 includes: adding each candidate weak learner among the multiple weak learners to the current weak learner set in turn, and calculating, based on the validation set, the prediction error before and after the candidate weak learner is added.
[0084] If the prediction error after adding the candidate weak learner is smaller than the prediction error before adding it, the candidate weak learner is retained in the target weak learner set.
[0085] According to an embodiment of the present invention, after the multiple weak learners and their corresponding weights are obtained, the weak learners are further filtered based on the validation set to obtain the target weak learner set. Suppose that after the $(j-1)$-th selection step the current weak learner set is $\mathcal{S}_{j-1}$. The sum of the weights of its members is:

$$A_{j-1} = \sum_{k \in \mathcal{S}_{j-1}} \alpha_k \tag{20}$$

The current ensemble model formed by the current weak learner set can be represented as:

$$F_{j-1}(\mathbf{x}) = \frac{1}{A_{j-1}} \sum_{k \in \mathcal{S}_{j-1}} \alpha_k f_k(\mathbf{x}) \tag{21}$$

where $\alpha_k$ denotes the weight of the $k$-th weak learner in the current weak learner set, $f_k(\mathbf{x})$ denotes the prediction output of the $k$-th weak learner for the input sample $\mathbf{x}$, and $F_{j-1}(\mathbf{x})$ denotes the output of the current ensemble model before the candidate weak learner is added.
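The normalized weighted average of Equations (20) and (21), and the incremental form that folds one candidate into an existing average, can be sketched as follows. Function names are hypothetical; weak learners are modelled as plain callables.

```python
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def ensemble_predict(
    learners: Sequence[Model], alphas: Sequence[float], x: Sequence[float]
) -> float:
    """Weighted average of the weak learners' outputs for one input x."""
    total = sum(alphas)  # sum of the weights in the set
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(a * f(x) for a, f in zip(alphas, learners)) / total


def add_candidate(
    current_pred: float, current_total: float, cand_pred: float, cand_alpha: float
) -> float:
    """Fold one candidate into an existing weighted average without recomputing it."""
    return (current_total * current_pred + cand_alpha * cand_pred) / (current_total + cand_alpha)
```

The incremental form lets each selection step reuse the current ensemble's output instead of re-evaluating every retained weak learner.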
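[0086] Subsequently, the $j$-th weak learner $f_j$ among the multiple weak learners is tentatively added to the current weak learner set as the candidate, forming a candidate ensemble model, which can be represented as:

$$\tilde{F}_j(\mathbf{x}) = \frac{A_{j-1} F_{j-1}(\mathbf{x}) + \alpha_j f_j(\mathbf{x})}{A_{j-1} + \alpha_j} \tag{22}$$

where $A_{j-1}$ and $F_{j-1}(\mathbf{x})$ are the total weight and the output of the current ensemble model, $\alpha_j$ is the weight of the candidate weak learner $f_j$, and $\tilde{F}_j(\mathbf{x})$ denotes the output of the candidate ensemble model after the candidate weak learner is added.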
[0087] After the current ensemble model and the candidate ensemble model are constructed, the prediction errors before and after adding the candidate weak learner are calculated on the validation set. In this embodiment, the root mean square error (RMSE) on the validation set can be used as the evaluation metric, which can be expressed as:

$$\mathrm{RMSE}(F) = \sqrt{\frac{1}{N_v} \sum_{i \in \mathcal{V}} \left( y_i - F(\mathbf{x}_i) \right)^2} \tag{23}$$

where $N_v$ is the total number of samples in the validation set, $\mathcal{V}$ is the set of validation samples, $y_i$ is the actual load value of the $i$-th sample in the validation set, $\mathbf{x}_i$ is the input feature vector of that sample, and $F(\mathbf{x}_i)$ is the prediction output of the ensemble model on the validation set.
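Equation (23) translates directly into code; a minimal sketch with a hypothetical function name:

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error over the validation samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))
```

Squaring the errors before averaging penalizes a few large misses more heavily than many small ones, which suits screening out learners that occasionally fail badly.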
[0088] The prediction errors before and after adding the candidate weak learner are then compared. When the following preset condition is met:

$$\mathrm{RMSE}(\tilde{F}_j) < \mathrm{RMSE}(F_{j-1}) \tag{24}$$

where $\mathrm{RMSE}(\cdot)$ is the root mean square error on the validation set and $F_{j-1}$ and $\tilde{F}_j$ are the current and candidate ensemble models, that is, when adding the candidate weak learner to the current weak learner set reduces the prediction error on the validation set, the candidate weak learner is retained in the target weak learner set and the current weak learner set is updated; otherwise, the candidate weak learner is not retained in the target weak learner set and the current weak learner set remains unchanged.
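The screening procedure of [0083] to [0088] amounts to greedy forward selection and can be sketched as below. Two details are assumptions not stated in the source: candidates are visited in training order, and the empty set is treated as having infinite error, so the first candidate is always kept. Names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def _rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def _ensemble(learners: Sequence[Model], alphas: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * f(x) for a, f in zip(alphas, learners)) / sum(alphas)


def select_learners(
    learners: Sequence[Model],
    alphas: Sequence[float],
    X_val: Sequence[Sequence[float]],
    y_val: Sequence[float],
) -> Tuple[List[Model], List[float], float]:
    """Greedy forward selection: keep a candidate only if validation RMSE drops."""
    kept_f: List[Model] = []
    kept_a: List[float] = []
    best = math.inf  # assumption: the empty set has infinite error
    for f, a in zip(learners, alphas):
        cand_f, cand_a = kept_f + [f], kept_a + [a]
        err = _rmse(y_val, [_ensemble(cand_f, cand_a, x) for x in X_val])
        if err < best:  # the condition RMSE(candidate) < RMSE(current)
            kept_f, kept_a, best = cand_f, cand_a, err
    return kept_f, kept_a, best
```

The result depends on the visiting order; a learner rejected early is never reconsidered, which keeps the procedure to a single pass over the candidates.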
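[0089] After all candidate weak learners have been screened, the target weak learners finally retained in the target weak learner set are used for the subsequent weighted ensemble output, which can be represented as:

$$\hat{y}(\mathbf{x}) = \frac{\sum_{k \in \mathcal{S}^{*}} \alpha_k f_k(\mathbf{x})}{\sum_{k \in \mathcal{S}^{*}} \alpha_k} \tag{25}$$

where $\mathcal{S}^{*}$ denotes the final target weak learner set obtained after screening on the validation set, and $\alpha_k$ and $f_k(\mathbf{x})$ are the weight and prediction output of its $k$-th member.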
[0090] In one illustrative embodiment, the power load prediction result of the power load prediction method provided by the present invention is compared with the actual power load value. In this embodiment, operation S150 is implemented using the AdaBoost algorithm.
[0091] Figure 4 schematically shows the fitting curves between the predicted power load and the actual value in an example of the present invention, where Ground Truth is the actual power load value (actual load); Transformer-only is the power load predicted using only the Transformer; LSTM-only is the power load predicted using only the LSTM; Transformer-LSTM is the load predicted directly from the output of the base learner in an embodiment of the present invention; Transformer-LSTM-AdaBoost is the power load prediction output according to operation S170 in an embodiment of the present invention; Load is the power load; and Time index is the time index.
[0092] Figure 5 schematically shows the cumulative error curves of each model, that is, the cumulative error between the predicted and actual values, where Transformer is the absolute cumulative error when only the Transformer is used for prediction; LSTM is the absolute cumulative error when only the LSTM is used for prediction; Stack is the absolute cumulative error between the load predicted directly from the output of the base learner and the actual load value, according to an embodiment of the present invention; and BoostedStacking is the absolute cumulative error between the power load prediction result output according to operation S170 and the actual load value, according to an embodiment of the present invention.
[0093] As shown in Figures 4 and 5, the power load forecasting method provided by the present invention has the smallest absolute cumulative error among the compared methods and follows the actual load curve more closely in each time period.
[0094] The power load forecasting device provided by the present invention is described below. The power load forecasting device described below and the power load forecasting method described above may be referred to correspondingly.
[0095] Figure 6 is a schematic structural block diagram of an example of the power load forecasting device provided by the present invention.
[0096] As shown in Figure 6, the power load forecasting device 600 includes: a data acquisition module 610, configured to acquire historical load data and corresponding time information.
[0097] The sample construction module 620 is used to construct prediction samples based on historical load data and time information. The prediction samples include input features and corresponding actual load values. The input features include time features determined based on time information and historical load features determined based on historical load data.
[0098] The main prediction module 630 is used to divide the prediction samples into a training set and a validation set; based on the training set, the input features are input into the Transformer model to obtain the main prediction value.
[0099] The residual prediction module 640 is used to obtain residual prediction values using an LSTM model based on the error between the master prediction value and the actual load value, and to obtain a basic weak learner based on the master prediction value and the residual prediction value.
[0100] The weighted training module 650 is used to perform multiple rounds of weighted training based on the basic weak learner and determine the weights corresponding to each weak learner to obtain multiple weak learners.
[0101] The filtering module 660 is used to filter multiple weak learners based on the validation set to obtain the target weak learner set.
[0102] The weighted integration module 670 is used to weight and integrate the predicted values output by each target weak learner in the target weak learner set according to the corresponding weights to obtain the power load prediction result.
[0103] Figure 7 is a schematic diagram of the physical structure of an example electronic device. As shown in Figure 7, the electronic device may include a processor 710, a communications interface 720, a memory 730, and a communication bus 740. The processor 710, the communications interface 720, and the memory 730 communicate with each other via the communication bus 740. The processor 710 can call logical instructions in the memory 730 to execute the power load forecasting method.
[0104] Furthermore, the logical instructions in the aforementioned memory 730 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0105] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the power load forecasting method provided by the embodiments described above.
[0106] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0107] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0108] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A power load forecasting method, characterized by comprising: obtaining historical load data and corresponding time information; constructing prediction samples based on the historical load data and the time information, wherein the prediction samples include input features and corresponding actual load values, and the input features include time features determined based on the time information and historical load features determined based on the historical load data; dividing the prediction samples into a training set and a validation set; based on the training set, inputting the input features into a Transformer model to obtain master predicted values; obtaining residual predicted values using an LSTM model based on the error between the master predicted values and the actual load values, and obtaining a basic weak learner based on the master predicted values and the residual predicted values; performing multiple rounds of weighted training based on the basic weak learner and determining the weight corresponding to each weak learner to obtain multiple weak learners; filtering the multiple weak learners based on the validation set to obtain a target weak learner set; and performing weighted integration of the predicted values output by each target weak learner in the target weak learner set according to their corresponding weights to obtain a power load prediction result.
2. The power load forecasting method according to claim 1, characterized in that the step of obtaining residual predicted values using an LSTM model based on the error between the master predicted values and the actual load values, and obtaining a basic weak learner based on the master predicted values and the residual predicted values, comprises: constructing residual samples based on the error between the master predicted values and the actual load values, and inputting the residual samples into the LSTM model to obtain the residual predicted values; and combining the master predicted values with the residual predicted values to obtain the basic weak learner.
3. The power load forecasting method according to claim 2, characterized by further comprising: determining, based on the validation set, a scaling factor for scaling the residual predicted values; and scaling the residual predicted values using the scaling factor, and combining the scaled residual predicted values with the master predicted values to obtain the basic weak learner.
4. The power load forecasting method according to claim 1, characterized in that the step of performing multiple rounds of weighted training based on the basic weak learner and determining the weight corresponding to each weak learner to obtain multiple weak learners comprises: determining the learner weight of the current weak learner based on the prediction error of the current weak learner, and updating the sample weight of each training sample based on the prediction error; training the next weak learner based on the updated sample weights; and repeating the training rounds to obtain the multiple weak learners and their corresponding weights.
5. The power load forecasting method according to claim 1, characterized in that the step of filtering the multiple weak learners based on the validation set to obtain a target weak learner set comprises: adding each candidate weak learner among the multiple weak learners to the current weak learner set in sequence, and calculating, based on the validation set, the prediction error before and after the candidate weak learner is added; and if the prediction error after adding the candidate weak learner is smaller than the prediction error before adding it, retaining the candidate weak learner in the target weak learner set.
6. The power load forecasting method according to claim 1, characterized in that the step of constructing prediction samples based on the historical load data and the time information comprises: constructing time features based on the time information; extracting historical time-series features based on the historical load values corresponding to multiple preset historical moments; and combining the time features and the historical time-series features to form the input features.
7. The power load forecasting method according to claim 6, characterized in that the input features further include meteorological features, the meteorological features including at least one of temperature, humidity, wind speed, rainfall, and light intensity.
8. A power load forecasting device, characterized by comprising: a data acquisition module, configured to acquire historical load data and corresponding time information; a sample construction module, configured to construct prediction samples based on the historical load data and the time information, wherein the prediction samples include input features and corresponding actual load values, and the input features include time features determined based on the time information and historical load features determined based on the historical load data; a main prediction module, configured to divide the prediction samples into a training set and a validation set and, based on the training set, input the input features into a Transformer model to obtain master predicted values; a residual prediction module, configured to obtain residual predicted values using an LSTM model based on the error between the master predicted values and the actual load values, and to obtain a basic weak learner based on the master predicted values and the residual predicted values; a weighted training module, configured to perform multiple rounds of weighted training based on the basic weak learner and determine the weight corresponding to each weak learner to obtain multiple weak learners; a filtering module, configured to filter the multiple weak learners based on the validation set to obtain a target weak learner set; and a weighted integration module, configured to perform weighted integration of the predicted values output by each target weak learner in the target weak learner set according to their corresponding weights to obtain a power load prediction result.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, the power load forecasting method according to any one of claims 1 to 7 is implemented.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the power load forecasting method according to any one of claims 1 to 7 is implemented.