Electricity load forecasting methods, devices, equipment, and media based on hybrid expert architecture
By employing a two-stage training approach with a hybrid expert architecture and a class-aware comparative routing mechanism, a general and partially specialized electrical load prediction model is constructed. This addresses the issues of insufficient accuracy and robustness in existing electrical load prediction technologies, achieving higher accuracy and stronger generalization capabilities in electrical load prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- STATE GRID ZHEJIANG ELECTRIC POWER CO MARKETING SERVICE CENT
- Filing Date
- 2026-04-09
- Publication Date
- 2026-06-30
Figure CN122000888B_ABST (abstract drawing, image not reproduced)
Abstract
Description
Technical Field
[0001] This invention relates to the field of electrical load forecasting technology, and in particular to an electrical load forecasting method, apparatus, device, and medium based on a hybrid expert architecture and a partially specialized expert architecture.
Background Technology
[0002] Electricity load forecasting is a key foundational technology for power system operation and dispatch. Its core objective is to accurately grasp the dynamic characteristics and inherent patterns of electricity load evolution over time through in-depth mining and analysis of historical electricity consumption data, thereby achieving effective prediction of electricity demand for a future period. This technology plays a vital role in ensuring the safety and stability of the power grid, improving energy economic efficiency, and promoting the integration of clean energy, specifically manifested in multiple practical application scenarios such as power grid planning, optimized dispatch of generation resources, demand-side response management, and high-proportion renewable energy consumption.
[0003] However, electricity load time series often exhibit high complexity and uncertainty in real-world environments. Their variations are influenced not only by underlying time-series trends and cyclical patterns but also by the coupling effects of multiple external factors, including meteorological conditions (such as temperature and humidity), seasonal changes, user electricity consumption patterns, time-of-use pricing policies, holidays, and unforeseen events. These factors result in load series generally exhibiting strong non-stationarity, multi-scale volatility, and significant pattern heterogeneity. In other words, electricity load curves corresponding to different regions, different user industries, different electricity consumption types (such as industrial, commercial, and residential), and even different time periods (such as weekdays and weekends, peak and off-peak periods) often show significant differences in statistical distribution, rate of change, and fluctuation patterns. This poses a significant challenge to constructing high-precision prediction models with broad adaptability.
[0004] Currently, existing technologies have made some progress in predictive model architecture and feature engineering. However, existing methods mainly rely on a globally unified, parameter-sharing modeling approach, that is, using the same set of model parameters to train and infer load sequences for all types and scenarios. When faced with significantly different load patterns in reality, such methods are prone to pattern confusion and mutual interference during the model learning process, making it difficult to maintain ideal prediction accuracy in various differentiated scenarios. This limits the model's generalization performance and robustness in complex real-world power environments.
[0005] Therefore, there is an urgent need for a novel time series forecasting method that can accurately identify and characterize the inherent heterogeneity of electrical load sequences and achieve differentiated modeling and adaptive inference for different load patterns. This would improve the overall accuracy, stability, and practical value of electrical load forecasting technology in diverse and ever-changing real-world application scenarios.
Summary of the Invention
[0006] To overcome the shortcomings of existing technologies, one of the objectives of this invention is to provide an electricity load forecasting method based on a hybrid expert architecture. This method employs a two-stage hybrid expert model architecture for electricity load forecasting. First, a general electricity load forecasting expert model is trained to learn the common patterns of electricity load changes in different regions and scenarios. Then, combined with a class-aware comparative routing mechanism, the input electricity load time series samples are intelligently allocated to multiple efficiently fine-tuned electricity load-specific forecasting expert models, thereby achieving refined modeling and forecasting for different electricity load patterns.
[0007] One of the objectives of this invention is achieved through the following technical solution:
[0008] A method for predicting electrical load based on a hybrid expert architecture includes the following steps:
[0009] Obtain a training dataset of electrical load time series and construct a prediction framework composed of multiple electrical load prediction expert models. The electrical load prediction expert models include at least a general electrical load prediction expert model and multiple partially specialized electrical load prediction expert models.
[0010] The general electric load prediction expert model is pre-trained by inputting the electric load time series training dataset into the general electric load prediction expert model to train it so that it learns the general variation law in the electric load time series. The model parameters obtained after training are saved as initialization parameters.
[0011] Based on the initialization parameters, the partial specialization electrical load prediction expert model is fine-tuned using a low-rank adaptation mechanism.
[0012] A route embedding with electrical load pattern discrimination capability is generated through class-aware contrastive learning, and the input electrical load samples are dynamically allocated to the corresponding partial specialization electrical load prediction expert model based on the route embedding.
[0013] The time series of electrical loads are predicted using a pre-assigned, partially specialized electrical load prediction expert model.
[0014] By combining pre-training with low-rank adaptation fine-tuning, and introducing a class-aware comparative routing mechanism, a two-stage expert specialization framework for electricity load prediction is constructed. This ensures the specialization of knowledge in the electricity load prediction expert model while enabling intelligent and accurate allocation of electricity load samples, thereby effectively improving the prediction accuracy and generalization ability of the model for complex electricity load time series patterns.
[0015] Furthermore, the time series samples of electrical load are input into the general electrical load prediction expert model in an end-to-end manner for training, so that the general electrical load prediction expert model learns the common dynamic characteristics of electrical load across regions and time scales. After training is completed, the model parameters are used as the initialization parameters.
[0016] By using an end-to-end pre-training approach, the general power load prediction expert model can fully learn and solidify the universal temporal evolution patterns in the power load data, providing a stable and high-value parameter initialization foundation for the efficient and high-quality specialization of the subsequent power load partial specialization prediction expert model.
[0017] To ensure that the specialization process of the load forecasting expert can effectively focus on the key modules for capturing the temporal dependencies of the load and generating forecast results, and to make the load knowledge transfer process more efficient and targeted, the initialization parameters include the weights and bias parameters of the multi-head self-attention layer, as well as the weights and bias parameters of the load forecasting head.
[0018] Furthermore, a low-rank adaptation mechanism is introduced to reduce computational overhead and storage requirements by performing efficient fine-tuning in a low-dimensional subspace. This balances the conflicting goals of "avoiding catastrophic forgetting" and "enhancing specialization capabilities," achieving efficient diversification of electricity load forecasting experts. Based on the initialization parameters, the partially specialized electricity load forecasting expert model is fine-tuned using the low-rank adaptation mechanism, including:
[0019] Using the parameters of the general electrical load forecasting expert model, several partially specialized electrical load forecasting expert models are fine-tuned to satisfy the following condition: $\theta_i \leftarrow \theta_g$, where $\theta_g$ denotes the parameters of the general electrical load forecasting expert model and $\theta_i$ denotes the parameters of the $i$-th partially specialized electrical load forecasting expert;
[0020] During fine-tuning, a low-rank adaptation approach is adopted, updating only a low-rank parameter subspace to maintain general knowledge and enhance adaptability to specific time-series patterns. During fine-tuning, the output of the $i$-th electricity load forecasting expert satisfies $\hat{y}_i = f(x;\ \theta_g + \Delta W_i)$, where $x$ represents the input time series, $\theta_g$ represents the parameters of the general expert, $\Delta W_i$ represents the low-rank increment matrix, and $\hat{y}_i$ denotes the final output obtained by the $i$-th electricity load forecasting expert after fine-tuning;
[0021] The low-rank increment matrix $\Delta W_i$ is realized as the product of two learnable matrices and can be represented as $\Delta W_i = B_i A_i$, where $A_i \in \mathbb{R}^{r \times d}$, $B_i \in \mathbb{R}^{d \times r}$, and $r$ is the intrinsic rank, which is smaller than the original parameter dimension $d$.
[0022] Furthermore, by introducing contrastive learning to optimize the routing embedding of electrical load samples, the model can learn the similarity and difference relationships between different electrical load samples, thereby improving the routing mechanism's ability to distinguish different electrical load patterns. This allows samples with similar load characteristics to be accurately clustered and routed to the same electrical load partial specialization prediction expert model, improving the accuracy of sample allocation, including:
[0023] Extract the latent representation of the input electrical load time series samples;
[0024] Define a contrastive loss function and map the latent representation to a decision embedding using a routing network;
[0025] By using contrastive learning to optimize the representation capability of the decision embedding, a routing embedding with electrical load pattern discrimination capability is obtained to guide sample allocation;
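[0026] Wherein, the contrastive loss function $\mathcal{L}_i$ for anchor $i$ is calculated as follows:
[0027] $$\mathcal{L}_i = -\log \frac{\exp\big(\mathrm{sim}(r_i, r_i^{+})/\tau\big)}{\exp\big(\mathrm{sim}(r_i, r_i^{+})/\tau\big) + \sum_{j} \exp\big(\mathrm{sim}(r_i, r_j^{-})/\tau\big)},$$
[0028] where $\mathrm{sim}(\cdot,\cdot)$ is the similarity function, $\tau$ is the temperature hyperparameter, $r_i$ denotes the anchor, $r_i^{+}$ denotes the positive sample, $r_j^{-}$ denotes a negative sample, and the sum runs over all negative samples of the anchor.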
[0029] Furthermore, to enhance the model's predictive robustness and stability when handling fuzzy or mixed load feature samples with ambiguous load pattern boundaries, the electric load time series is predicted using pre-assigned partial-specialization prediction expert models. This includes calculating the probability distribution of a sample of electric load time series to be predicted using the Softmax function based on its corresponding routing embedding, assigning the sample to one or more partial-specialization prediction expert models for prediction according to the probability distribution, and weighting and summing the multiple prediction results to obtain the final electric load prediction result.
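The Softmax-based allocation and weighted summation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `top_k` parameter, and the renormalization of the selected probabilities are assumptions that the source does not specify.

```python
import math

def softmax(logits):
    """Numerically stable Softmax over a list of routing scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_and_combine(routing_logits, expert_forecasts, top_k=2):
    """Pick the top_k experts by probability and blend their forecasts.

    Assumption: the selected probabilities are renormalized to sum to 1.
    """
    probs = softmax(routing_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    horizon = len(expert_forecasts[0])
    combined = [0.0] * horizon
    for i in chosen:
        weight = probs[i] / norm
        for t in range(horizon):
            combined[t] += weight * expert_forecasts[i][t]
    return chosen, combined

# Hypothetical routing scores for three experts and their two-step forecasts.
chosen, combined = route_and_combine(
    routing_logits=[2.0, 0.0, 2.0],
    expert_forecasts=[[10.0, 20.0], [0.0, 0.0], [30.0, 40.0]],
    top_k=2,
)
```

Setting `top_k=1` corresponds to hard assignment to a single expert, while larger values give the soft blending intended for samples with ambiguous load patterns.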
[0030] Both the general power load prediction expert model and the partial power load prediction expert model are built on the PatchTST architecture. By building the power load prediction expert model based on the PatchTST architecture, its advantages in jointly modeling local power consumption semantic features and long-term dependencies in power load time series modeling are fully utilized, providing a high-performance and highly scalable underlying model foundation for the entire partial-specialization hybrid expert power load prediction system.
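PatchTST-style models operate on short sub-series (patches) rather than on individual time steps, which is how local consumption patterns are captured. The sketch below shows one simple way to cut such patches from a load series. The values of `patch_len` and `stride` are hypothetical, and end padding is omitted for brevity.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D load series into (possibly overlapping) patches, without end padding."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]

# Hypothetical 24-point hourly load curve (values are placeholders).
hourly_load = [float(v) for v in range(24)]
patches = make_patches(hourly_load, patch_len=8, stride=4)
```

Each patch would then be embedded as one input token of the Transformer backbone.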
[0031] The second objective of this invention is to provide an electrical load prediction device based on a hybrid expert architecture.
[0032] The second objective of this invention is achieved by the following technical solution:
[0033] An electrical load forecasting device based on a hybrid expert architecture, comprising:
[0034] An acquisition module, used to acquire the training dataset of electric load time series and construct a prediction framework composed of multiple electric load prediction expert models. The electric load prediction expert models include at least a general electric load prediction expert model and multiple partially specialized electric load prediction expert models.
[0035] The training module is used to pre-train the general-purpose electricity load prediction expert model. The electricity load time series training dataset is input into the general-purpose electricity load prediction expert model for training, and the model parameters obtained after training are saved as initialization parameters. Based on the initialization parameters, the partial-specialized electricity load prediction expert model is fine-tuned using a low-rank adaptation mechanism. A routing embedding with electricity load pattern discrimination capability is generated through class-aware contrastive learning, and the input electricity load samples are dynamically allocated to the corresponding partial-specialized electricity load prediction expert models according to the routing embedding.
[0036] The prediction module is used to predict the time series of electrical loads using a pre-assigned, partially specialized electrical load prediction expert model.
[0037] A third objective of this invention is to provide an electronic device for carrying out the first objective of the invention, comprising a processor, a storage medium, and a computer program, wherein the computer program is stored in the storage medium and, when executed by the processor, implements the above-described method for predicting electrical load based on a hybrid expert architecture.
[0038] A fourth objective of this invention is to provide a computer-readable storage medium for carrying out the first objective of the invention, on which a computer program is stored that, when executed by a processor, implements the above-described electrical load prediction method based on a hybrid expert architecture.
[0039] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0040] This invention achieves high-precision prediction of electricity load time series by introducing a partial-specialization expert architecture for electricity load forecasting. During model training, a two-stage training approach combined with low-rank adaptive fine-tuning is employed. This retains general time-series knowledge of electricity load while efficiently training multiple partial-specialization prediction experts tailored to different electricity load patterns, power consumption scenarios, or load characteristics. This solves the technical problem that a single prediction model cannot simultaneously accommodate diverse and heterogeneous electricity load data patterns. Furthermore, a class-aware contrastive routing mechanism is introduced. Through contrastive learning, the routing network learns highly discriminative embedding representations of electricity load patterns, achieving accurate and dynamic matching between electricity load time series samples and the most suitable partial-specialization prediction experts. This further improves the accuracy and stability of electricity load forecasting, as well as the model's generalization ability in complex power system scenarios.
Attached Figure Description
[0041] Figure 1 is a general block diagram of the electrical load time series prediction method in Example 1;
[0042] Figure 2 is a flowchart of the time series prediction method using a partially specialized expert architecture in Example 1;
[0043] Figure 3 is a block diagram of the electrical load time series prediction device with a partially specialized expert architecture according to Embodiment 3;
[0044] Figure 4 is a structural block diagram of the electronic device in Embodiment 4.
Detailed Implementation
[0045] The present invention will now be described in more detail with reference to the accompanying drawings. It should be noted that the following description of the present invention with reference to the accompanying drawings is merely illustrative and not restrictive. Various embodiments can be combined with each other to form other embodiments not shown in the following description.
[0046] Example 1
[0047] Example 1 provides a method for predicting electrical load based on a hybrid expert architecture. It aims to train an expert model for electrical load prediction through a two-stage training method that combines pre-training and fine-tuning. By using a class-aware comparative routing mechanism, it enables different electrical load-specific prediction experts to accurately handle different types of electrical load time-series patterns. Finally, the electrical load prediction task is completed through multi-expert collaboration.
[0048] Please refer to Figure 1, which shows the overall block diagram of the electrical load time series prediction method based on class-aware contrastive routing and a partially specialized expert architecture. The method in this embodiment adopts a two-stage training paradigm to combine general electrical load knowledge with pattern-specific electrical load knowledge efficiently. The first stage pre-trains the general electrical load prediction expert model, and the second stage performs specialized training of the partially specialized electrical load prediction expert models.
[0049] First, the general-purpose load forecasting expert learns common load patterns (such as base load characteristics and temperature sensitivity) across regions and time periods from massive amounts of historical data. Then, through a low-rank adaptation (LoRA) mechanism, multiple specialized experts are efficiently derived, each adept at characterizing specific patterns such as "summer air conditioning load," "industrial impact load," and "holiday load." Simultaneously, a class-aware contrastive routing mechanism identifies the inherent pattern of the input load curve and dynamically assigns it to the most suitable specialized expert. This combined mechanism enables the model to capture both global and local structure, significantly outperforming traditional single-model methods and existing hybrid expert methods in predicting nonlinear, abrupt, and multi-pattern hybrid load curves. Furthermore, it demonstrates excellent generalization ability in zero-shot transfer scenarios across regions and seasons.
[0050] Among them, the hybrid expert model learns the feature representation of the input electrical load time series through a multi-scale time-series coding network, automatically extracts high-dimensional feature representations that characterize the frequency distribution, periodic stability and dynamic evolution of the load series, and then explores the similarity relationship between different samples in terms of periodic structure, frequency components and fluctuation patterns, and realizes the adaptive allocation of samples based on the similarity relationship.
[0051] Furthermore, the model does not rely on manually preset seasonal, industry, or regional labels. Instead, it employs an end-to-end joint optimization approach, eliminating the need for separate modeling. Under the combined influence of the prediction task loss function and the structural constraint loss function, the routing network learns to construct an embedding representation space for samples based on the statistical distribution characteristics, temporal correlation properties, and non-stationary variation characteristics of the load data. Within this embedding space, samples with similar temporal evolution patterns are implicitly clustered, forming a stable sample-expert mapping relationship. These samples are then assigned to the corresponding prediction expert models for processing. During training, by continuously adjusting routing weights and expert parameters, load samples with similar periodic structures, spectral characteristics, and fluctuation patterns are gradually concentrated in the same or adjacent expert models. Samples with significantly different temporal structures are assigned to different expert models, thereby enabling each expert model to gradually develop specialized modeling capabilities for specific load patterns over long-term training.
[0052] Based on the above principles, and as shown in Figure 2, an electrical load forecasting method based on a hybrid expert architecture includes the following steps:
[0053] S1. Obtain the training dataset of electric load time series and construct a prediction framework composed of multiple electric load prediction expert models. The electric load prediction expert models include at least a general electric load prediction expert model and multiple partially specialized electric load prediction expert models.
[0054] The aforementioned general load forecasting expert is used to extract general features from the full dataset. The different specialized load forecasting experts are trained on the samples with different characteristics that the router assigns to them, thereby improving the model's adaptability to different load patterns. Both the general expert and the specialized experts are based on the PatchTST forecasting model architecture.
[0055] S2. Pre-train the general electric load prediction expert model by inputting the electric load time series training dataset into the general electric load prediction expert model for training, and saving the model parameters obtained after training as initialization parameters.
[0056] Specifically, in S2, the time series samples of electrical load are input into the general electrical load prediction expert model in an end-to-end manner for training. This allows the general electrical load prediction expert model to learn the common dynamic characteristics of electrical load across regions and time scales. After training, the model parameters are used as the initialization parameters, expressed as follows:
[0057] $$\theta_g = \arg\min_{\theta}\ \mathcal{L}_{\text{pred}}\big(f(x;\theta),\ y\big),$$
[0058] where $\theta_g$ denotes the parameters of the general electrical load forecasting expert model, $\mathcal{L}_{\text{pred}}$ denotes the model prediction loss, $f(x;\theta)$ denotes the model forecast, and $(x, y)$ denotes the input electrical load time series and the target electrical load time series.
[0059] The aforementioned initialization parameters include the weights and bias parameters of the multi-head self-attention layer, used to capture the temporal dependencies and global correlation features in the electrical load time series; and the weights and bias parameters of the electrical load prediction head, wherein the prediction head is composed of a linear mapping layer or a feedforward network, used to generate the final output result of the electrical load prediction. These initialization parameters serve as the initial weights for the second-stage specialized prediction expert model training.
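One way to picture the hand-over from the first stage to the second is to copy the inherited parameter groups into each specialized expert. The sketch below is illustrative only: the parameter names, the key prefixes, and the choice to leave other groups out are assumptions, since the source only states that the initialization parameters include the attention and prediction-head weights and biases.

```python
import copy

def init_specialized_experts(general_params, num_experts,
                             prefixes=("attention.", "head.")):
    """Give each specialized expert its own copy of the inherited parameter groups."""
    inherited = {name: value for name, value in general_params.items()
                 if name.startswith(prefixes)}
    # Deep copies keep later fine-tuning of one expert from touching the others.
    return [copy.deepcopy(inherited) for _ in range(num_experts)]

# Hypothetical pre-trained parameters of the general expert.
general_params = {
    "attention.weight": [[0.1, 0.2], [0.3, 0.4]],
    "attention.bias": [0.0, 0.0],
    "head.weight": [[0.5, 0.6]],
    "head.bias": [0.1],
    "embedding.weight": [[1.0, 0.0]],  # left out here purely for illustration
}
experts = init_specialized_experts(general_params, num_experts=3)
```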
[0060] S3. Based on the initialization parameters, fine-tune the partial specialization electrical load prediction expert model using a low-rank adaptation mechanism;
[0061] S3, based on the initialization parameters, fine-tunes the partially specialized expert model using a low-rank adaptation mechanism, including:
[0062] Using the parameters of the general expert model for electricity load forecasting, several partially specialized expert models for electricity load forecasting are fine-tuned to adapt to the load characteristics of specific categories or patterns, satisfying $\theta_i \leftarrow \theta_g$, where $\theta_g$ denotes the parameters of the general electrical load forecasting expert model and $\theta_i$ denotes the parameters of the $i$-th partially specialized electrical load forecasting expert;
[0063] To avoid catastrophic forgetting during fine-tuning, a low-rank adaptation method (LoRA) is introduced to enhance the adaptability of experts to specific features while maintaining general generalization capabilities.
[0064] Traditional methods often require retraining the entire model to adapt to different scenarios, which is costly. This embodiment introduces a low-rank adaptation mechanism that completes expert specialization by updating only a very small number of parameters (the low-rank increment matrix). This brings several advantages: efficient training, significantly reducing computational and storage overhead, making it possible to quickly customize various specialized prediction models with limited resources; knowledge security, strictly inheriting the basic physical laws and safety constraints of the power system learned by general experts, effectively preventing "catastrophic forgetting" during the specialization process and ensuring the rationality of the prediction results; and flexible deployment, as the resulting model is self-contained, requiring no reliance on a large external feature database, meeting the stringent requirements of power control systems for real-time performance, independence, and high reliability.
[0065] During fine-tuning, a low-rank adaptation approach is adopted, updating only a low-rank parameter subspace to maintain general knowledge and enhance adaptability to specific load time series patterns. During fine-tuning, the output of the $i$-th expert satisfies $\hat{y}_i = f(x;\ \theta_g + \Delta W_i)$, where $x$ represents the input time series, $\Delta W_i$ represents the low-rank increment matrix, and $\hat{y}_i$ denotes the final output obtained by the $i$-th electricity load forecasting expert after fine-tuning. The expression $f(x;\ \theta_g + \Delta W_i)$ explicitly indicates that forward computation is performed with the parameters $(\theta_g + \Delta W_i)$.
[0066] Each expert no longer directly modifies the original parameters $\theta_g$. Instead, efficient fine-tuning is achieved by introducing the low-rank increment matrix $\Delta W_i$. This knowledge transfer mechanism enables the reuse of general information while supporting diversity among experts. The low-rank increment matrix $\Delta W_i$ is realized as the product of two learnable matrices and can be represented as $\Delta W_i = B_i A_i$, where $A_i \in \mathbb{R}^{r \times d}$ and $B_i \in \mathbb{R}^{d \times r}$ are learnable low-rank adaptation parameters. $A_i$ projects the original high-dimensional input into a low-rank subspace, similar to dimensionality reduction. $B_i$ then maps the representation in the low-rank space back to the high-dimensional parameter space, similar to dimensionality recovery. $r$ is the intrinsic rank, which is smaller than the original parameter dimension $d$; $d$ represents the dimension of the weight matrix inherited from the general expert, and $r$ represents the intrinsic rank of the adaptation process, which determines the capacity of the adaptation layer. The value of $r$ can be set according to task complexity, model capacity, and experience; this embodiment does not impose any limitation on it. A low-rank setting significantly reduces the number of trainable parameters while preserving the model's expressive power.
[0067] By decoupling the task-specific adaptation process from general knowledge, LoRA preserves the representational capabilities of pre-trained experts while achieving efficient specialization. Furthermore, LoRA reduces the computational and memory overhead required for expert adaptation, enabling the method to scale to large-scale expert pools.
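The low-rank update can be illustrated on a single weight matrix. The following is a minimal sketch under stated assumptions: the dimensions `d = 8` and `r = 2`, the random initialization of the down-projection, and the zero initialization of the up-projection (a common LoRA convention, so that the increment starts at zero) are illustrative choices and are not taken from the source.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(w_general, a, b, x):
    """Compute W_g x + B (A x); W_g stays frozen, only A and B would be trained."""
    base = matvec(w_general, x)
    delta = matvec(b, matvec(a, x))  # project down to rank r, then back up to d
    return [p + q for p, q in zip(base, delta)]

random.seed(0)
d, r = 8, 2  # hypothetical layer width and adaptation rank
w_general = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d)]
a = [[random.gauss(0.0, 0.01) for _ in range(d)] for _ in range(r)]  # r x d
b = [[0.0] * r for _ in range(d)]                                    # d x r, zeros
x = [1.0] * d

y_start = lora_forward(w_general, a, b, x)  # equals the general expert's output
full_params = d * d                         # parameters in the frozen matrix
lora_params = 2 * d * r                     # trainable parameters per expert
```

With these values each expert trains 32 parameters for this layer instead of 64, and the saving grows as `d` increases relative to `r`.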
[0068] S4. Generate a routing embedding with electrical load pattern discrimination capability through class-aware contrastive learning, and dynamically allocate the input electrical load samples to the corresponding partial specialization electrical load prediction expert model based on the routing embedding.
[0069] In S4, generating routing embeddings with electrical load pattern discrimination capability through class-aware contrastive learning includes:
[0070] Extract the latent representation of the input electrical load time series samples;
[0071] Define a contrastive loss function and map the latent representation to a decision embedding using a routing network;
[0072] By using contrastive learning to optimize the representation capability of decision embeddings, a routing embedding with class discrimination capability is obtained to guide sample allocation;
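[0073] Wherein, the contrastive loss function $\mathcal{L}_i$ for anchor $i$ is calculated as follows:
[0074] $$\mathcal{L}_i = -\log \frac{\exp\big(\mathrm{sim}(r_i, r_i^{+})/\tau\big)}{\exp\big(\mathrm{sim}(r_i, r_i^{+})/\tau\big) + \sum_{j} \exp\big(\mathrm{sim}(r_i, r_j^{-})/\tau\big)},$$
[0075] where $\mathrm{sim}(\cdot,\cdot)$ is the similarity function, $\tau$ is the temperature hyperparameter, $r_i$ denotes the anchor, $r_i^{+}$ denotes the positive sample, $r_j^{-}$ denotes a negative sample, and the sum runs over all negative samples of the anchor.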
[0076] Specifically, to achieve dynamic allocation of electrical load time series samples to optimal electrical load partial specialization prediction experts, this embodiment designs a class-aware contrastive routing module based on autoencoder latent representation. This module includes the following key components:
[0077] Gating functions play a central role in hybrid expert architectures by implementing conditional computation: selecting and weighting experts based on input data. For an input electrical load time series training sample $x_i$, the routing network first computes an unnormalized routing vector: $r_i = g(x_i) \in \mathbb{R}^{N},\quad i = 1, \dots, B$,
[0078] where $B$ represents the number of electrical load time series training samples in a mini-batch, $N$ is the number of experts, and $g(\cdot)$ is a differentiable feature extraction module (e.g., a multilayer perceptron or convolutional neural network). Subsequently, the intermediate routing vector $r_i$ obtained for the sample $x_i$ by the feature extraction module is normalized using the Softmax function to obtain the expert assignment probabilities.
[0079] To enhance the routing signal's perception of electrical load temporal patterns, the model introduces an autoencoder module to learn latent representations from the original input sequence. Specifically, the encoder $E(\cdot)$ maps the input electrical load time series training sample $x_i$ to a latent vector $h_i$:
[0080] $$h_i = E(x_i) \in \mathbb{R}^{d_h},$$
[0081] where $d_h$ is the embedding dimension, and the decoder $D(\cdot)$ reconstructs the original input:
[0082] $$\hat{x}_i = D(h_i).$$
[0083] The model parameters are optimized by minimizing the mean squared error (MSE) between the input electrical load time series training sample $x_i$ and its reconstruction $\hat{x}_i$:
[0084] $$\mathcal{L}_{\text{rec}} = \frac{1}{B}\sum_{i=1}^{B} \lVert x_i - \hat{x}_i \rVert_2^2,$$
[0085] This reconstruction objective prompts the encoder to learn an informative latent representation.
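The reconstruction objective can be illustrated with a toy linear autoencoder. This is a sketch under stated assumptions: the source does not specify the encoder or decoder architecture, so the linear maps, the matrix values, and the function names below are hypothetical. The loss sums squared errors over time steps and averages over the batch.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def reconstruction_loss(batch, w_enc, w_dec):
    """Average over the batch of the squared error between x and decode(encode(x))."""
    total = 0.0
    for x in batch:
        h = matvec(w_enc, x)       # latent representation
        x_hat = matvec(w_dec, h)   # reconstruction of the input
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(batch)

# Hypothetical 3-step samples compressed to a 2-dimensional latent space.
w_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # keeps the first two time steps
w_dec = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # cannot recover the third step
batch = [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]
loss = reconstruction_loss(batch, w_enc, w_dec)
```

The first sample loses its third value in the bottleneck and contributes an error of 9, while the second is reconstructed exactly, so the batch loss is 4.5.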
[0086] Class-aware contrastive learning to improve electrical load timing routing embedding To enhance the discriminative ability of routing signals, this embodiment proposes a class-aware contrastive learning framework. This framework works collaboratively with a gating function and an autoencoder module within a hybrid expert architecture. Specifically, the gating function first generates a soft assignment from the samples to the expert based on the electrical load time-series input samples. The autoencoder further learns the latent dynamic structure from the original sequence to enhance the routing signal's perception of electrical load time patterns and provide a more stable and information-rich embedding representation for class-aware contrastive learning. Class-aware contrastive learning embeds routes for each anchor point. Explicitly construct positive and negative sample pairs to encourage clustering of similar samples while maintaining the distinction from dissimilar samples.
[0087] One component of this method is an auxiliary latent representation space for the electrical load time-series samples, which serves as a proxy for computing the similarity between samples. These representations are obtained by reconstructing the original electrical load time-series samples with the autoencoder and extracting the latent codes from its encoder network, thereby capturing the latent temporal dynamics of the electrical load.
[0088] This process computes the pairwise cosine similarity between all latent representations in the mini-batch to construct a time-series sample similarity matrix $S \in \mathbb{R}^{B \times B}$:
[0089] $S_{ij} = \frac{z_i^{\top} z_j}{\lVert z_i \rVert \, \lVert z_j \rVert}$,
[0090] $\tilde{S} = S - I$,
[0091] where $I$ denotes the identity matrix, used to mask each sample's similarity with itself.
[0092] Subsequently, for each anchor $h_i$, the sample most similar to it is selected as the positive sample:
[0093] $i^{+} = \arg\max_{j} \tilde{S}_{ij}$,
[0094] The remaining routing embeddings $\{h_j\}$ are treated as negative samples, where $j \neq i$, $j \neq i^{+}$, and $j$ denotes the sample index.
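The positive-sample selection can be sketched as below. This is an illustration under stated assumptions: self-similarity is masked by skipping the anchor's own index rather than by a matrix operation, latent vectors are assumed non-zero, and the function names and toy latent codes are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def select_positives(latents):
    """For each anchor i, return the index of the most similar other sample."""
    positives = []
    for i, z_i in enumerate(latents):
        best_j, best_sim = None, -math.inf
        for j, z_j in enumerate(latents):
            if j == i:  # mask the anchor's similarity with itself
                continue
            sim = cosine(z_i, z_j)
            if sim > best_sim:
                best_j, best_sim = j, sim
        positives.append(best_j)
    return positives

# Hypothetical latent codes forming two loose groups.
latents = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
positives = select_positives(latents)
```

Every index other than the anchor and its positive is then treated as a negative for that anchor.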
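[0095] After the anchors, positive samples, and negative samples are determined, a contrastive loss is defined to guide the routing embeddings toward a more class-discriminative representation:
[0096] $\mathcal{L}_{i} = -\log \frac{\exp(\mathrm{sim}(h_i, h_{i^{+}})/\tau)}{\exp(\mathrm{sim}(h_i, h_{i^{+}})/\tau) + \sum_{j \neq i,\, j \neq i^{+}} \exp(\mathrm{sim}(h_i, h_j)/\tau)}$,
[0097] where $\mathrm{sim}(\cdot,\cdot)$ denotes a similarity function (e.g., cosine similarity) and $\tau$ is the temperature hyperparameter. The final contrastive loss is obtained by averaging over the entire mini-batch:
[0098] $\mathcal{L}_{con} = \frac{1}{B} \sum_{i=1}^{B} \mathcal{L}_{i}$,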
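A contrastive loss of this kind can be sketched in an InfoNCE-style form, assuming cosine similarity and a denominator over the positive plus all negatives. The function names, the temperature value, and the toy embeddings are illustrative assumptions rather than values from the source.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_loss(embeddings, positives, tau=0.1):
    """Batch-averaged loss: -log of the positive's share among all non-anchor samples."""
    losses = []
    for i, h_i in enumerate(embeddings):
        # Logits against every other sample (the positive and all negatives).
        logits = [cosine(h_i, h_j) / tau
                  for j, h_j in enumerate(embeddings) if j != i]
        pos_logit = cosine(h_i, embeddings[positives[i]]) / tau
        peak = max(logits)  # log-sum-exp computed stably
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        losses.append(log_denominator - pos_logit)
    return sum(losses) / len(losses)

embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = contrastive_loss(embeddings, positives=[1, 0, 3, 2])
misaligned = contrastive_loss(embeddings, positives=[2, 3, 0, 1])
```

The loss is small when each anchor's positive is its nearest neighbor in the routing space and large when the designated positive lies far away, which is the pressure that pulls similar load patterns toward the same expert.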
[0099] Integrating this contrastive objective into the overall training loss encourages the gating mechanism to learn a more class-aware, load-discriminative routing representation, ultimately improving expert selection in the sparse MoE model (the partially specialized expert model). The overall training loss consists of three parts: the prediction loss $\mathcal{L}_{pred}$ (mean absolute error, MAE), the autoencoder reconstruction loss $\mathcal{L}_{rec}$ (mean squared error, MSE), and the weighted contrastive loss $\mathcal{L}_{con}$ (a cross-entropy loss), where $\lambda$ is the balance coefficient:
[0100] $\mathcal{L} = \mathcal{L}_{pred} + \mathcal{L}_{rec} + \lambda \mathcal{L}_{con}$,
[0101] This joint objective ensures that the model not only captures accurate temporal dynamics for prediction, but also learns an expressive routing space, thereby facilitating effective expert specialization.
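A minimal sketch of how the three loss terms might be combined is given below. The balance value of 0.1 and the stand-in reconstruction and contrastive values are placeholders chosen for illustration; the source does not state them.

```python
def mean_absolute_error(y_true, y_pred):
    """Prediction loss: mean absolute error over one forecast window."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def joint_loss(prediction_loss, reconstruction_loss, contrastive_loss, balance=0.1):
    """Prediction loss + reconstruction loss + balance * contrastive loss."""
    return prediction_loss + reconstruction_loss + balance * contrastive_loss

# Hypothetical values for one training step.
pred = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
loss = joint_loss(pred, reconstruction_loss=0.2, contrastive_loss=1.0, balance=0.1)
```

Only the contrastive term is scaled here, so the balance coefficient controls how strongly routing discriminability is traded against forecast accuracy.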
[0102] S5. Perform time-series prediction of the electrical load using the assigned partially specialized expert models.
[0103] In S5, the electrical load time series is predicted using the assigned partially specialized expert models. This includes computing, with the Softmax function, the probability distribution of the electrical load time-series sample to be predicted from its routing embedding, assigning the sample to one or more experts for prediction according to that probability distribution, and finally aggregating the experts' results to obtain the final prediction.
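The inference step can be sketched as follows, under the assumption that the top-k experts are selected and their probabilities renormalized before a weighted sum. The source says only "one or more experts", so the top-k rule, the renormalization, and all names and numbers here are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one routing embedding."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_experts(routing_embedding, expert_forecasts, top_k=2):
    """Weighted sum of the top_k most probable experts' forecasts.

    expert_forecasts[i] stands in for the output of expert i on this sample.
    """
    probs = softmax(routing_embedding)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    weight_sum = sum(probs[i] for i in chosen)  # renormalize over chosen experts
    horizon = len(expert_forecasts[0])
    return [
        sum(probs[i] / weight_sum * expert_forecasts[i][t] for i in chosen)
        for t in range(horizon)
    ]

# Hypothetical forecasts from three experts over a 2-step horizon.
expert_forecasts = [[10.0, 20.0], [30.0, 40.0], [1000.0, 1000.0]]
blended = predict_with_experts([1.0, 1.0, -5.0], expert_forecasts, top_k=2)
single = predict_with_experts([3.0, 0.0, 0.0], expert_forecasts, top_k=1)
```

With `top_k=1` the sample is handled by a single expert; larger values blend several experts according to their routing probabilities.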
[0104] In summary, applying this method to load forecasting directly supports the core operations of the power system. Its accurate forecasts enable more precise generation planning and unit commitment, reducing spinning reserves and lowering generation costs; they enhance the grid's ability to absorb the volatility of renewable energy sources such as wind and solar power, optimizing energy storage dispatch through more accurate net load forecasting; and they improve demand-side response efficiency, providing a refined data foundation for identifying adjustable load potential. Therefore, this method is not only an upgrade in forecasting technology but also a key enabling technology for the evolution of the power grid towards intelligence and adaptability, with significant engineering application value and socio-economic benefits.
[0105] Example 2
[0106] Example 2 is a comparison of the method and model of Example 1 with existing time series prediction models.
[0107] This embodiment compares the method and model of Embodiment 1 with a variety of state-of-the-art time series prediction models, covering different architectural paradigms, including CNN-based models (TimesNet), MLP-based models (DLinear, TimeMixer), and Transformer-based models (Autoformer, FEDformer, PatchTST, and Pathformer).
[0108] The specific details of the above model / method are as follows:
[0109] TimesNet transforms a one-dimensional time series into multiple two-dimensional tensors according to its multiple periodic patterns. This enables two-dimensional convolutional kernels to capture temporal variations effectively, with intra-period variation encoded along the columns and inter-period variation encoded along the rows.
[0110] DLinear decomposes time series into trend components and residual components, and models them using two single-layer linear networks respectively.
[0111] TimeMixer leverages multi-scale information in both the past-encoding and future-prediction phases through a decomposable mixing architecture and a multi-predictor mixing strategy.
[0112] Autoformer introduces an autocorrelation mechanism to replace self-attention, discovers the similarity of subsequences based on sequence periodicity, and aggregates similar subsequences from the underlying periodicity.
[0113] FEDformer combines the Transformer with a seasonal-trend decomposition, where the decomposition module captures the global profile of the time series. By leveraging frequency domain sparsity, a frequency enhancement mechanism is introduced to improve long-term forecasting performance.
[0114] PatchTST divides time series into subsequence-level patches to extract local semantic information and employs a channel-independent strategy, in which every univariate channel shares the same embedding and Transformer weights.
[0115] Pathformer models time series comprehensively from the perspectives of multi-scale temporal resolution and temporal distance. It employs adaptive pathways, using a router based on time-series decomposition together with an aggregator to dynamically extract and combine multi-scale features according to the input, thereby achieving flexible and adaptive multi-scale prediction.
[0116] All models are trained and evaluated with a fixed input sequence length and tested under several prediction horizons; the average over these horizons is taken as the evaluation result. The evaluation uses two indicators commonly used in time-series forecasting: mean absolute error (MAE) and mean squared error (MSE). The test results are shown in Tables 1.1 and 1.2 below.
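The evaluation protocol (MAE and MSE per prediction horizon, then averaged over horizons) can be sketched as below. The horizon keys and the toy targets and predictions are made up for illustration and are not results from the tables.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def average_over_horizons(results):
    """results maps horizon -> (y_true, y_pred); returns (mean MAE, mean MSE)."""
    maes = [mae(t, p) for t, p in results.values()]
    mses = [mse(t, p) for t, p in results.values()]
    return sum(maes) / len(maes), sum(mses) / len(mses)

# Toy data for two hypothetical horizons (truncated to a few points each).
results = {
    96: ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
    192: ([0.0, 0.0], [1.0, -1.0]),
}
avg_mae, avg_mse = average_over_horizons(results)
```

Averaging per-horizon scores weights each horizon equally, regardless of how many points it contains.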
[0117] Table 1.1 Time Series Prediction Results
[0118]
[0119] Table 1.2 Comparison Results of Electricity Load Datasets
[0120]
[0121] As shown in Table 1.1 above, the method proposed in Example 1 demonstrates superior prediction performance across the benchmark environments, verifying its robustness and wide applicability in different scenarios. Compared with the traditional single modeling paradigm, this method introduces a class-aware contrastive routing strategy that exploits the latent similarity relationships between samples and guides the routing network to learn more discriminative representations in the feature space, thereby achieving a more refined characterization of complex temporal patterns. This mechanism enables samples with different feature distributions to be assigned to appropriate expert models for processing, significantly enhancing the model's specialization ability and adaptability.
[0122] In experimental evaluations across six prediction tasks, our proposed method achieved the best results four times on the MSE metric and five times on the MAE metric, demonstrating significantly superior overall performance compared to various existing mainstream time series prediction models. These comparisons covered different architecture types, including CNN-based TimesNet, MLP-based DLinear and TimeMixer, and Transformer-based Autoformer, FEDformer, PatchTST, and Pathformer. These results clearly demonstrate that the proposed partially specialized hybrid expert framework not only performs exceptionally well under a single paradigm but also exhibits consistent and significant advantages over baseline models with various architectural designs.
[0123] Referring to Table 1.2, this embodiment also compares electrical load forecasting performance. Under the different prediction horizons (96, 192, 336, 720) on the Electricity electrical load dataset, the partially specialized expert architecture electrical load time-series forecasting method proposed in Embodiment 1 achieved excellent forecasting performance, verifying the robustness and wide applicability of the present invention across forecast spans and load change scenarios.
[0124] In summary, the class-aware contrastive routing strategy uncovers the latent similarities between different electrical load time-series samples and guides the routing network to learn load pattern representations with stronger discriminative capability in the feature space, thereby achieving a finer characterization of complex electrical load time-series dynamics. This mechanism enables load samples with different electricity consumption characteristics to be assigned to the most suitable partially specialized prediction expert models for processing, significantly enhancing the model's specialization ability and its adaptability to diverse electrical load patterns.
[0125] Furthermore, in experimental evaluations of multiple electricity load forecasting tasks, our proposed method repeatedly achieved the best results on the mean squared error (MSE) metric and also demonstrated a significant advantage on the mean absolute error (MAE) metric. Overall, its forecasting performance is significantly superior to several existing mainstream electricity load time-series forecasting models. The selected comparison methods cover a variety of typical model architectures, including TimesNet based on convolutional neural networks, DLinear and TimeMixer based on multilayer perceptrons, and Autoformer, FEDformer, PatchTST, and Pathformer based on Transformer architectures. Experimental results show that the proposed partially specialized hybrid expert electricity load forecasting framework not only performs excellently under a single model paradigm but also exhibits consistent and significant performance advantages over baseline models with different structural designs.
[0126] Furthermore, both the general load forecasting expert model and the partially specialized load forecasting expert model in this method are built based on the PatchTST architecture. In direct comparison experiments with the original PatchTST model, this method achieves significant performance improvements on all forecasting tasks. This result not only demonstrates that the partially specialized hybrid expert architecture can effectively tap into and unleash the potential of PatchTST in load time series modeling, but also further verifies the effectiveness and unique advantages of the framework proposed in this invention for load forecasting tasks.
[0127] In summary, the above experimental results demonstrate from multiple perspectives the outstanding performance of this invention in characterizing complex electrical load time-series dynamics, improving the efficiency of collaborative work among prediction experts, and enhancing its generalization ability across time spans and multiple scenarios. This fully showcases its broad prospects as a new generation of electrical load time-series prediction method in practical power system applications.
[0128] To verify the generalization ability of the method of this invention in cross-dataset prediction tasks, this embodiment designed two sets of zero-shot experiments. First, a one-to-one evaluation was performed on the ETTh1, ETTh2, ETTm1, and ETTm2 datasets. As shown in Table 2, each model was trained on the source dataset and then directly tested on different target datasets without any fine-tuning. Second, a many-to-one evaluation was performed on the ETT, Electricity, Traffic, and Weather datasets. As shown in Table 3, one dataset was selected as the test set, and the remaining datasets were merged as the training set. This invention evaluates the cross-domain generalization ability of the model by comparing its performance on different source and target domain datasets. Compared to training and testing on a single dataset, the ability to maintain stable performance under cross-dataset conditions indicates that the model has stronger generalization ability.
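The two zero-shot protocols can be sketched as split generators. This assumes every ordered source/target pair is evaluated in the one-to-one setting and a leave-one-dataset-out scheme in the many-to-one setting; the source does not list the exact pairs, and the function names are hypothetical.

```python
from itertools import permutations

def one_to_one_pairs(datasets):
    """All ordered (source, target) pairs with source != target."""
    return list(permutations(datasets, 2))

def many_to_one_splits(datasets):
    """Leave-one-out: train on all datasets except the held-out target."""
    return [([d for d in datasets if d != target], target) for target in datasets]

pairs = one_to_one_pairs(["ETTh1", "ETTh2", "ETTm1", "ETTm2"])
splits = many_to_one_splits(["ETT", "Electricity", "Traffic", "Weather"])
```

In both settings the model never sees the target dataset during training, and no fine-tuning is performed before testing.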
[0129] Table 2 Zero-shot prediction results 1
[0130]
[0131] As shown in Tables 2 and 3, the method proposed in this invention demonstrates a sustained and significant advantage in both types of evaluation scenarios, fully proving its superior generalization performance in cross-dataset prediction tasks.
[0132] In one-to-one testing scenarios, our method outperforms existing baseline models on all datasets, demonstrating its robustness and adaptability in handling diverse time-series feature distributions. Of particular note, compared with Pathformer, a strong recent baseline, our method achieves a 5.3% reduction in MSE and a 3.6% reduction in MAE; its prediction accuracy also surpasses that of another strong baseline, TimeMixer. These results demonstrate that our method can capture complex dynamic patterns in time-series data without fine-tuning on the target data, thus maintaining high prediction accuracy in zero-shot transfer scenarios.
[0133] In many-to-one test scenarios, our method maintains its leading advantage. As shown in Table 3, when multiple source domain datasets are used together for training and tested on an unseen target dataset, our method still significantly outperforms Pathformer and TimeMixer. This result demonstrates that our method can fully utilize the complementary information in multi-source data, effectively mitigating the performance degradation caused by distribution differences between different datasets, thus exhibiting stronger generalization ability in cross-domain tasks.
[0134] Overall, the method of this invention not only performs excellently in homogeneous scenarios, but also maintains its advantages in complex cross-dataset and cross-domain prediction tasks. This robustness and adaptability make it promising for practical applications, especially for long-term time-series prediction problems in multi-source heterogeneous data environments.
[0135] Table 3 Zero-shot prediction results 2
[0136]
[0137] Example 3
[0138] Example 3 discloses an apparatus corresponding to the electrical load forecasting method based on a hybrid expert architecture in the above examples; it is the virtual apparatus structure of those examples. As shown in Figure 3, it includes:
[0139] Module 310 is used to acquire the electrical load time series training dataset and construct a prediction framework composed of multiple electrical load prediction expert models. The electrical load prediction expert models include at least a general electrical load prediction expert model and multiple partially specialized electrical load prediction expert models.
[0140] Training module 320 is used to pre-train the general electrical load prediction expert model: the electrical load time-series training dataset is input into the general electrical load prediction expert model for training, and the model parameters obtained after training are saved as initialization parameters. Based on the initialization parameters, the partially specialized electrical load prediction expert models are fine-tuned using a low-rank adaptation mechanism. A routing embedding with electrical load pattern discrimination capability is generated through class-aware contrastive learning, and the input electrical load samples are dynamically allocated to the corresponding partially specialized electrical load prediction expert models according to the routing embedding.
[0141] Prediction module 330 is used to predict the electrical load time series using the assigned partially specialized electrical load prediction expert models.
[0142] Preferably, the time series samples of electrical load are input into the general electrical load prediction expert model in an end-to-end manner for training, so that the general electrical load prediction expert model learns the common dynamic characteristics of electrical load across regions and time scales. After training, the model parameters are used as the initialization parameters.
[0143] Preferably, the initialization parameters include weight parameters and bias parameters of the multi-head self-attention layer, and prediction head weight parameters and bias parameters for electrical load numerical output.
[0144] Preferably, based on the initialization parameters, the partially specialized electrical load prediction expert model is fine-tuned using a low-rank adaptation mechanism, including:
[0145] Using the parameters of the general electrical load forecasting expert model, several partially specialized electrical load forecasting expert models are initialized for fine-tuning so as to satisfy $\theta_i \leftarrow \theta_g$, where $\theta_g$ denotes the parameters of the general electrical load forecasting expert model and $\theta_i$ denotes the parameters of the $i$-th partially specialized electrical load forecasting expert;
[0146] During fine-tuning, a low-rank adaptation approach is adopted in which only a subset of parameter subspaces is updated, preserving general knowledge while enhancing adaptability to specific time-series patterns. During fine-tuning, the output of the $i$-th electrical load forecasting expert satisfies $\hat{y}_i = f(x;\ \theta_g + \Delta\theta_i)$, where $x$ denotes the input time series, $\Delta\theta_i$ denotes the low-rank increment matrix, and $\hat{y}_i$ denotes the final output of the $i$-th electrical load forecasting expert after fine-tuning;
[0147] The low-rank increment matrix $\Delta\theta_i$ is implemented as the product of two learnable matrices and can be expressed as $\Delta\theta_i = U_i V_i$, where $U_i \in \mathbb{R}^{d_{out} \times r}$, $V_i \in \mathbb{R}^{r \times d_{in}}$, and $r$ is the intrinsic rank, which is smaller than the dimensions of the original parameters.
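The low-rank adaptation idea can be sketched on a single linear layer, used here as a stand-in for one weight matrix of an expert. The general weights stay frozen and only the two small factor matrices would be trained. The matrix names, shapes, and values are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply matrix a (m x r) by matrix b (r x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def lora_forward(w_general, u, v, x):
    """Compute (W_general + U V) x; W_general itself is left unchanged."""
    delta = matmul(u, v)  # low-rank increment with rank at most r
    adapted = [[w + d for w, d in zip(w_row, d_row)]
               for w_row, d_row in zip(w_general, delta)]
    return matvec(adapted, x)

# Hypothetical 3 x 3 general weights (identity) and a rank-1 increment.
w_general = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u = [[1.0], [0.0], [0.0]]   # 3 x 1
v = [[0.0, 1.0, 0.0]]       # 1 x 3
x = [1.0, 2.0, 3.0]

adapted_out = lora_forward(w_general, u, v, x)
zero_u = [[0.0], [0.0], [0.0]]
unchanged_out = lora_forward(w_general, zero_u, v, x)
trainable = len(u) * len(u[0]) + len(v) * len(v[0])  # 6, versus 9 for the full matrix
```

When the increment is zero the expert reproduces the general model exactly, which is why each specialized expert can start from the general parameters and drift only within a low-dimensional subspace.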
[0148] Preferably, generating a route embedding with electrical load pattern discrimination capability through class-aware contrastive learning includes:
[0149] Extract the latent representation of the input electrical load time series samples;
[0150] Define a contrastive loss function and map the latent representation to a decision embedding using a routing network;
[0151] By using contrastive learning to optimize the representation capability of the decision embedding, a routing embedding with electrical load pattern discrimination capability is obtained to guide sample allocation;
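[0152] wherein the contrastive loss function $\mathcal{L}_{i}$ is calculated as follows:
[0153] $\mathcal{L}_{i} = -\log \frac{\exp(\mathrm{sim}(h_i, h_{i^{+}})/\tau)}{\exp(\mathrm{sim}(h_i, h_{i^{+}})/\tau) + \sum_{j \neq i,\, j \neq i^{+}} \exp(\mathrm{sim}(h_i, h_j)/\tau)}$,
[0154] where $\mathrm{sim}(\cdot,\cdot)$ is the similarity function, $\tau$ is the temperature hyperparameter, $h_i$ denotes the anchor, $h_{i^{+}}$ denotes the positive sample, and $h_j$ denotes a negative sample.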
[0155] Preferably, the electrical load time series is predicted using the assigned partially specialized electrical load prediction expert models. This includes computing, with the Softmax function, the probability distribution of the electrical load time-series sample to be predicted from its corresponding routing embedding, assigning the sample to one or more partially specialized electrical load prediction expert models for prediction according to the probability distribution, and computing a weighted sum of the multiple prediction results to obtain the final electrical load prediction result.
[0156] Preferably, both the general electrical load forecasting expert model and the partially specialized electrical load forecasting expert model are constructed based on the PatchTST time series modeling architecture.
[0157] Example 4
[0158] Figure 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present invention. As shown in Figure 4, the electronic device includes a processor 410, a memory 420, an input device 430, and an output device 440. The number of processors 410 in the electronic device may be one or more; one processor 410 is taken as an example in Figure 4. The processor 410, memory 420, input device 430, and output device 440 in the electronic device may be connected via a bus or by other means; connection via a bus is taken as an example in Figure 4.
[0159] The memory 420, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions / modules corresponding to the power load prediction method based on a hybrid expert architecture in the embodiments of the present invention. The processor 410 executes various functional applications and data processing of the electronic device by running the software programs, instructions, and modules stored in the memory 420, thereby implementing the power load prediction method based on a hybrid expert architecture as described in Embodiments 1 and 2 above.
[0160] The memory 420 may primarily include a program storage area and a data storage area. The program storage area may store the operating system and at least one application program required for a given function; the data storage area may store data created based on terminal usage. Furthermore, the memory 420 may include high-speed random access memory and non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. In some instances, the memory 420 may further include memory remotely located relative to the processor 410, which can be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0161] Input device 430 can be used to receive input such as user identity information and time-series data. Output device 440 may include display devices such as a display screen.
[0162] Example 5
[0163] Embodiment 5 of the present invention also provides a storage medium containing computer-executable instructions, which can be used by a computer to execute an electrical load forecasting method based on a hybrid expert architecture, the method comprising:
[0164] Obtain a training dataset of electrical load time series and construct a prediction framework composed of multiple electrical load prediction expert models. The electrical load prediction expert models include at least a general electrical load prediction expert model and multiple partially specialized electrical load prediction expert models.
[0165] The general electric load prediction expert model is pre-trained by inputting the electric load time series training dataset into the general electric load prediction expert model for training, and the model parameters obtained after training are saved as initialization parameters.
[0166] Based on the initialization parameters, the partially specialized electrical load prediction expert models are fine-tuned using a low-rank adaptation mechanism.
[0167] A routing embedding with electrical load pattern discrimination capability is generated through class-aware contrastive learning, and the input electrical load samples are dynamically allocated to the corresponding partially specialized electrical load prediction expert models based on the routing embedding.
[0168] The electrical load time series is predicted using the assigned partially specialized electrical load prediction expert models.
[0169] Of course, the computer-executable instructions provided in the embodiments of the present invention are not limited to the method operations described above, but can also perform related operations in the power load prediction method based on a hybrid expert architecture provided in any embodiment of the present invention.
[0170] Based on the above description of the implementation methods, those skilled in the art can clearly understand that the present invention can be implemented using software and necessary general-purpose hardware, and of course, it can also be implemented using hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory, hard disk, or optical disk, etc., including several instructions to cause an electronic device (which may be a mobile phone, personal computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present invention.
[0171] It is worth noting that in the embodiments of the above-mentioned load forecasting method and device based on hybrid expert architecture, the various units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be achieved; in addition, the specific names of each functional unit are only for easy differentiation and are not used to limit the scope of protection of the present invention.
[0172] For those skilled in the art, various other corresponding changes and modifications can be made based on the technical solutions and concepts described above, and all such changes and modifications should fall within the protection scope of the claims of this invention.
Claims
1. A method for predicting electrical load based on a hybrid expert architecture, characterized in that it includes the following steps:
obtaining an electrical load time-series training dataset and constructing a prediction framework composed of multiple electrical load prediction expert models, the electrical load prediction expert models including at least a general electrical load prediction expert model and multiple partially specialized electrical load prediction expert models;
pre-training the general electrical load prediction expert model by inputting the electrical load time-series training dataset into the general electrical load prediction expert model for training, and saving the model parameters obtained after training as initialization parameters;
based on the initialization parameters, fine-tuning the partially specialized electrical load prediction expert models using a low-rank adaptation mechanism; the initialization parameters include the weight parameters and bias parameters of the multi-head self-attention layer, as well as the prediction head weight parameters and bias parameters for electrical load numerical output;
wherein fine-tuning the partially specialized electrical load prediction expert models using a low-rank adaptation mechanism based on the initialization parameters includes:
using the parameters of the general electrical load forecasting expert model, several partially specialized electrical load forecasting expert models are initialized for fine-tuning so as to satisfy $\theta_i \leftarrow \theta_g$, where $\theta_g$ denotes the parameters of the general electrical load forecasting expert model and $\theta_i$ denotes the parameters of the $i$-th partially specialized electrical load forecasting expert;
during fine-tuning, a low-rank adaptation approach is adopted in which only a subset of parameter subspaces is updated, preserving general knowledge while enhancing adaptability to specific time-series patterns; during fine-tuning, the output of the $i$-th electrical load forecasting expert satisfies $\hat{y}_i = f(x;\ \theta_g + \Delta\theta_i)$, where $x$ denotes the input time series, $\Delta\theta_i$ denotes the low-rank increment matrix, and $\hat{y}_i$ denotes the final output of the $i$-th electrical load forecasting expert after fine-tuning;
the low-rank increment matrix $\Delta\theta_i$ is implemented as the product of two learnable matrices and can be expressed as $\Delta\theta_i = U_i V_i$, where $U_i \in \mathbb{R}^{d_{out} \times r}$, $V_i \in \mathbb{R}^{r \times d_{in}}$, and $r$ is the intrinsic rank, which is smaller than the dimensions of the original parameters;
generating a routing embedding with electrical load pattern discrimination capability through class-aware contrastive learning, and dynamically allocating the input electrical load samples to the corresponding partially specialized electrical load prediction expert models based on the routing embedding;
predicting the electrical load time series using the assigned partially specialized electrical load prediction expert models.
2. The load forecasting method based on a hybrid expert architecture as described in claim 1, characterized in that electrical load time-series samples are input into the general electrical load prediction expert model in an end-to-end manner for training, so that the general electrical load prediction expert model learns the common dynamic characteristics of electrical load across regions and time scales; after training, the model parameters are used as the initialization parameters.
3. The load forecasting method based on a hybrid expert architecture as described in claim 1, characterized in that generating a routing embedding with electrical load pattern discrimination capability through class-aware contrastive learning includes:
extracting the latent representation of the input electrical load time-series samples;
defining a contrastive loss function and mapping the latent representation to a decision embedding using a routing network;
optimizing the representation capability of the decision embedding through contrastive learning to obtain a routing embedding with electrical load pattern discrimination capability, which guides sample allocation;
wherein the contrastive loss function $\mathcal{L}_{i}$ is calculated as follows: $\mathcal{L}_{i} = -\log \frac{\exp(\mathrm{sim}(h_i, h_{i^{+}})/\tau)}{\exp(\mathrm{sim}(h_i, h_{i^{+}})/\tau) + \sum_{j \neq i,\, j \neq i^{+}} \exp(\mathrm{sim}(h_i, h_j)/\tau)}$, where $\mathrm{sim}(\cdot,\cdot)$ is the similarity function, $\tau$ is the temperature hyperparameter, $h_i$ denotes the anchor, $h_{i^{+}}$ denotes the positive sample, and $h_j$ denotes a negative sample.
4. The load forecasting method based on a hybrid expert architecture as described in claim 1, characterized in that predicting the electrical load time series using the assigned partially specialized electrical load prediction expert models includes: computing, with the Softmax function, the probability distribution of the electrical load time-series sample to be predicted from its corresponding routing embedding; assigning the sample to one or more partially specialized electrical load prediction expert models for prediction according to the probability distribution; and computing a weighted sum of the multiple prediction results to obtain the final electrical load prediction result.
5. The load forecasting method based on a hybrid expert architecture as described in claim 1, characterized in that both the general electrical load prediction expert model and the partially specialized electrical load prediction expert models are built on the PatchTST time-series modeling architecture.
6. An electrical load forecasting device based on a hybrid expert architecture, characterized in that it includes:
a module used to acquire the electrical load time-series training dataset and construct a prediction framework composed of multiple electrical load prediction expert models, the electrical load prediction expert models including at least a general electrical load prediction expert model and multiple partially specialized electrical load prediction expert models;
a training module used to pre-train the general electrical load prediction expert model, wherein the electrical load time-series training dataset is input into the general electrical load prediction expert model for training and the model parameters obtained after training are saved as initialization parameters; based on the initialization parameters, the partially specialized electrical load prediction expert models are fine-tuned using a low-rank adaptation mechanism; a routing embedding with load pattern discrimination capability is generated through class-aware contrastive learning, and the input load samples are dynamically allocated to the corresponding partially specialized electrical load prediction expert models according to the routing embedding; the initialization parameters include the weight parameters and bias parameters of the multi-head self-attention layer, and the prediction head weight parameters and bias parameters for load numerical output;
wherein fine-tuning the partially specialized electrical load prediction expert models using a low-rank adaptation mechanism based on the initialization parameters includes:
using the parameters of the general electrical load forecasting expert model, several partially specialized electrical load forecasting expert models are initialized for fine-tuning so as to satisfy $\theta_i \leftarrow \theta_g$, where $\theta_g$ denotes the parameters of the general electrical load forecasting expert model and $\theta_i$ denotes the parameters of the $i$-th partially specialized electrical load forecasting expert;
during fine-tuning, a low-rank adaptation approach is adopted in which only a subset of parameter subspaces is updated, preserving general knowledge while enhancing adaptability to specific time-series patterns; during fine-tuning, the output of the $i$-th electrical load forecasting expert satisfies $\hat{y}_i = f(x;\ \theta_g + \Delta\theta_i)$, where $x$ denotes the input time series, $\Delta\theta_i$ denotes the low-rank increment matrix, and $\hat{y}_i$ denotes the final output of the $i$-th electrical load forecasting expert after fine-tuning;
the low-rank increment matrix $\Delta\theta_i$ is implemented as the product of two learnable matrices and can be expressed as $\Delta\theta_i = U_i V_i$, where $U_i \in \mathbb{R}^{d_{out} \times r}$, $V_i \in \mathbb{R}^{r \times d_{in}}$, and $r$ is the intrinsic rank, which is smaller than the dimensions of the original parameters;
a prediction module used to predict the electrical load time series using the assigned partially specialized electrical load prediction expert models.
7. An electronic device comprising a processor, a storage medium, and a computer program, wherein the computer program is stored in the storage medium, characterized in that, when the computer program is executed by the processor, it implements the electrical load prediction method based on a hybrid expert architecture as described in any one of claims 1 to 5.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the electrical load prediction method based on a hybrid expert architecture as described in any one of claims 1 to 5.