A univariate-based power load prediction method, a terminal and a storage medium

By combining multiple moving average decomposition, convolutional neural networks, and extended long short-term memory networks with a time-series-component dual-dimensional attention mechanism, the problem of lack of external features in power load forecasting is solved, achieving high-precision and low-cost power load forecasting.

CN122241336A · Pending · Publication Date: 2026-06-19 · STATE GRID FUJIAN POWER ELECTRIC CO ECONOMIC RESEARCH INSTITUTE +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
STATE GRID FUJIAN POWER ELECTRIC CO ECONOMIC RESEARCH INSTITUTE
Filing Date
2026-01-30
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing power load forecasting technologies, lacking external feature support, struggle to effectively analyze the multi-scale characteristics of load sequences, resulting in insufficient forecast accuracy and robustness. In particular, the deployment cost of models is high in small and medium-sized power grids and data-constrained scenarios.

Method used

The historical power load sequence is decomposed into multiple subsequences using a multiple moving average decomposition method. Local spatiotemporal features are extracted by a convolutional neural network, and time series modeling is performed using an extended long short-term memory network. Dynamic weighting is then performed using a time series-component dual-dimensional attention mechanism to output the power load prediction results.

Benefits of technology

Without relying on external features, it significantly improves the accuracy and robustness of power load forecasting, reduces data acquisition costs, and is suitable for power load forecasting in data-constrained scenarios.



Abstract

This application provides a univariate-based power load forecasting method, terminal, and storage medium. It uses a multiple moving average decomposition method to decompose the historical power load sequence into subsequences. Each subsequence is reconstructed into a two-dimensional matrix and input into a convolutional neural network to extract local temporal features from the reconstructed sequence. Temporal modeling of the features output by the convolutional neural network is performed with an extended long short-term memory (xLSTM) network. The features output by the xLSTM network are dynamically weighted based on a temporal-component dual-dimensional attention mechanism. Finally, the weighted prediction results of each load component are fused to output the power load forecast for the target time, thereby achieving high-precision prediction of short-term power load under univariate conditions.

Description

Technical Field

[0001] This invention relates to the field of power system load forecasting and intelligent analysis technology, and in particular to a univariate-based power load forecasting method, terminal and storage medium.

Background Technology

[0002] With the continuous expansion of the power system and the ongoing evolution of electricity consumption structure, the volatility, peak-to-valley difference, and uncertainty of power load have intensified. Power load forecasting, as a crucial foundation for power system operation management and dispatch control, directly impacts generation planning, grid dispatch optimization, and supply-demand balance management, playing a vital role in ensuring the safety and economy of the power system.

[0003] Existing power load forecasting technologies typically incorporate multiple exogenous variables, such as meteorological data, date type, and socioeconomic indicators, to improve forecast accuracy. However, in actual power grid operation, this reliance on external data has certain limitations in engineering applications. Constrained by data acquisition conditions, communication stability, and regional differences, high-quality and continuous external data is often difficult to guarantee, especially in small and medium-sized power grids, distribution networks, or newly built areas. The lack, lag, or inconsistent standards of external data are common problems, limiting the practical application scope and engineering effectiveness of multivariate forecasting methods.

[0004] Univariate forecasting methods, which are based solely on historical load time series and introduce no external features, reduce dependence on multi-source external data but face greater modeling challenges than multivariate forecasting. Because explanatory variables such as meteorological conditions and social activities are absent, load series exhibit stronger non-stationarity and randomness. Existing methods often model the original series directly as a whole, making it difficult to isolate and analyze multi-scale features such as long-term trends, periodic changes, and short-term fluctuations. Furthermore, without the guidance of external features, models struggle to autonomously capture the temporal dependencies of key time points and complex components. This easily leads to underfitting or delayed responses to abrupt load changes, so forecast accuracy and robustness fail to meet high standards.

Summary of the Invention

[0005] The technical problem to be solved by this invention is to provide a single-variable power load forecasting method, terminal and storage medium that can effectively mine the inherent evolution law, autonomously focus on key information and achieve high-precision forecasting by relying only on a single variable without introducing any external features, so as to solve the problems of difficult power load forecasting and high model deployment cost in data-constrained scenarios.

[0006] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is as follows:

[0007] A univariate-based power load forecasting method includes: collecting historical power load sequences and performing multiple moving average decomposition on the historical power load sequences to obtain multiple subsequences; reconstructing the subsequences into two-dimensional matrices and extracting the local spatiotemporal features of each reconstructed sequence using a convolutional neural network; inputting the local spatiotemporal features into an extended long short-term memory network for temporal modeling, which outputs temporal features containing long-range dependencies; dynamically weighting the temporal features based on a time-component dual-dimensional attention mechanism; and calculating the short-term power load forecast based on the weighted results of the temporal features.

[0008] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is as follows: A univariate-based power load forecasting terminal includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the various steps of the univariate-based power load forecasting method described above.

[0009] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is as follows: A computer storage medium storing a computer program that, when executed by a processor, implements the steps of the aforementioned univariate-based power load forecasting method.

[0010] The beneficial effects of this invention are as follows: It utilizes a multiple moving average decomposition method to decompose the historical power load sequence into sub-sequences; reconstructs each sub-sequence into a two-dimensional matrix and inputs it into a convolutional neural network to extract local temporal features from the reconstructed sequence; performs temporal modeling on the features output by the convolutional neural network based on an extended long short-term memory network; dynamically weights the features output by the extended long short-term memory network based on a temporal-component dual-dimensional attention mechanism; and integrates the weighted prediction results of each load component to output the power load prediction result at the target time, thereby achieving high-precision prediction of short-term power load under univariate conditions.

Attached Figure Description

[0011] Figure 1 is a flowchart of a univariate-based power load forecasting method according to an embodiment of the present invention;
Figure 2 is a diagram showing the load decomposition results in an embodiment of the present invention;
Figure 3 is a diagram showing the CNN feature extraction results in an embodiment of the present invention;
Figure 4 is a comparison chart of the load prediction results of three models, MA-CNN-sLSTM, MA-CNN-mLSTM, and MA-CNN-xLSTM, in an embodiment of the present invention;
Figure 5 is a comparison chart of the final load prediction results of three models based on the TCA mechanism, MA-CNN-sLSTM-TCA, MA-CNN-mLSTM-TCA, and MA-CNN-xLSTM-TCA, in an embodiment of the present invention;
Figure 6 is a schematic diagram of a univariate-based power load forecasting terminal according to an embodiment of the present invention.
Label Explanation: 1. Univariate-based power load forecasting terminal; 2. Memory; 3. Processor.

Detailed Implementation

[0012] To explain in detail the technical content, objectives, and effects of the present invention, the following description is provided in conjunction with the embodiments and accompanying drawings.

[0013] Before detailing the embodiments of this application, some related concepts are explained first:
(1) Univariate load forecasting: a technical method that relies solely on a single historical power load time series, without introducing external auxiliary variables such as weather, electricity price, and socio-economic activities, to construct a forecasting model that estimates the load value at a specific future time.
(2) Extended Long Short-Term Memory Network (xLSTM): an improved sequence modeling architecture based on traditional LSTM. It overcomes the limitations of traditional LSTM, such as limited memory capacity, inability to dynamically modify memory, and difficulty in parallel computation, by fusing two types of sub-components, sLSTM and mLSTM, and by introducing an exponential gating mechanism and matrix-based memory components.
(3) Time-component dual-dimensional attention mechanism (TCA): an attention mechanism designed specifically for processing multi-component time series after decomposition. It dynamically evaluates and weights the input features along two complementary dimensions: in the time dimension, it focuses on the importance of different historical time steps to the current prediction, so as to strengthen the influence of key time nodes; in the component dimension, it evaluates the relative contribution of the sequence components at the same time step.

[0014] In existing technologies, with the increasing demands for forecast accuracy and timeliness in power system operation, short-term load forecasting has become a key technology supporting real-time grid dispatch and renewable energy consumption. Although introducing multiple exogenous variables can theoretically assist modeling, in practical engineering applications, the difficulty or unreliability of meteorological and socioeconomic data collection makes univariate forecasting methods, which rely solely on historical load sequences and do not require external auxiliary information, widely popular due to their greater engineering applicability and low deployment cost. However, existing univariate forecasting models still face significant limitations under the strict absence of external feature support: on the one hand, due to the lack of external explanatory variables, load data exhibits stronger non-stationarity and multi-scale aliasing characteristics, making it difficult for traditional data-driven models to effectively extract and analyze their deep time-series patterns, resulting in insufficient ability to capture sudden load changes and long-term trends; on the other hand, without the constraint of external feature guidance, mainstream forecasting architectures struggle to autonomously identify key time nodes in the sequence and lack an adaptive dynamic weighting mechanism for input features, making it impossible for the model to accurately focus on high-value information in complex fluctuations. Furthermore, existing research often focuses on improving single models, lacking sufficient collaborative optimization in feature extraction, time-series modeling, and information filtering. This makes it difficult to fully tap the predictive potential with limited data sources, thus limiting the model's generalization ability and practical application effectiveness in scenarios lacking external information support. 
Therefore, constructing a predictive architecture capable of deeply deconstructing the inherent multi-scale laws of power load and achieving autonomous focusing of key features without external guidance is crucial to overcoming the bottleneck of univariate power load prediction.

[0015] To at least solve the above problems, and referring to Figure 1, this invention provides a univariate-based power load forecasting method, comprising: collecting historical power load sequences and performing multiple moving average decomposition on the historical power load sequences to obtain multiple subsequences; reconstructing the subsequences into two-dimensional matrices and extracting the local spatiotemporal features of each reconstructed sequence using a convolutional neural network; inputting the local spatiotemporal features into an extended long short-term memory network for temporal modeling, which outputs temporal features containing long-range dependencies; dynamically weighting the temporal features based on a time-component dual-dimensional attention mechanism; and calculating the short-term power load forecast based on the weighted results of the temporal features.

[0016] As described above, the beneficial effects of this invention are as follows: This invention focuses on solving the engineering pain points of difficulty in acquiring high-quality external data, delayed updates, and high costs in actual power grids. It constructs a prediction system that strictly relies on a single variable of historical power load, fundamentally eliminating dependence on meteorological and socio-economic indicators, and significantly reducing the data collection cost and implementation threshold of the model. Addressing the technical bottlenecks of single-variable application scenarios with limited information sources and difficult feature mining, this invention first uses a multiple moving average decomposition strategy to decouple the strongly non-stationary original load into clear multi-scale components, effectively solving the problem of difficulty in separating trend and fluctuation terms from a single data source without external feature assistance. Based on this, it utilizes two-dimensional matrix reconstruction and convolutional neural networks to fully capture the spatiotemporal correlation patterns such as intraday periodicity and local fluctuations contained in the load sequence. Furthermore, it adopts an extended long short-term memory network to replace the traditional architecture, breaking through the memory capacity bottleneck under limited single-variable samples and significantly enhancing the model's ability to characterize long-term trends and abrupt changes. Simultaneously, it combines a time-series-component dual-dimensional attention mechanism to adaptively focus on key historical moments and important load components, achieving dynamic optimization of feature weights. Finally, it outputs the prediction results through the weighted superposition of multiple components. 
This method significantly enhances the model's ability to capture complex load patterns and its generalization performance in data-constrained scenarios, providing a low-cost, highly robust, and reliable prediction scheme for power grid dispatch that is independent of external data.

[0017] Furthermore, collecting historical power load sequences includes: collecting the historical power load sequence of the target power system, $[P_{t-1}, P_{t-2}, \ldots, P_{t-NL}]$, where $P_{t-i}$ denotes the historical load value of the target power system at time $t-i$, and NL represents the preset historical time step length.

[0018] As described above, the first step is to extract a load sequence of length NL from the historical database of the target power system. This sequence provides a complete historical load context for subsequent models, enabling them to capture load changes. By setting an appropriate NL, sufficient time dependence can be preserved without introducing too much irrelevant information; at the same time, the adjustability of the historical length allows the model to flexibly adapt to different load characteristics.
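The window extraction described above can be sketched as follows. This is a minimal illustration only; the function name `history_window` and the use of a plain Python list as the "historical database" are assumptions made for the example, not part of the application.

```python
from typing import List, Sequence


def history_window(load: Sequence[float], t: int, nl: int) -> List[float]:
    """Return [P_{t-1}, P_{t-2}, ..., P_{t-NL}] for prediction index t.

    `load[k]` is the load at time step k; `nl` plays the role of NL.
    """
    if nl <= 0:
        raise ValueError("nl must be positive")
    if t - nl < 0 or t > len(load):
        raise ValueError("not enough history before t")
    # Most recent value first, matching the ordering used in the text.
    return [load[t - i] for i in range(1, nl + 1)]


if __name__ == "__main__":
    series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
    print(history_window(series, t=5, nl=3))  # [14.0, 13.0, 12.0]
```

Because NL is a plain parameter, the same helper can be reused to compare different history lengths on a validation set.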

[0019] Furthermore, performing multiple moving average decomposition on the historical power load sequence includes first filling missing values and replacing outliers in the historical power load sequence.

[0020] As described above, before multiple moving average decomposition is performed on the historical load series, the original data is preprocessed: missing values are filled using interpolation or forward/backward filling, and outliers are replaced. This step keeps noise and gaps from distorting the subsequent decomposition, improving the accuracy of the trend and periodic components.

[0021] Furthermore, performing multiple moving average decomposition on the historical power load sequence includes: using the deep moving average function DeepAvg, decomposing the preprocessed load sequence $X$ into an overall trend sequence $X_{\mathrm{trend}}$, a periodic trend sequence $X_{\mathrm{cyc}}$, and a disturbance fluctuation sequence $X_{\mathrm{dist}}$. The overall trend sequence is calculated as $X_{\mathrm{trend}} = \mathrm{DeepAvg}(X)$; the periodic trend sequence is calculated as $X_{\mathrm{cyc}} = \mathrm{DeepAvg}(X - X_{\mathrm{trend}})$; the disturbance fluctuation sequence is calculated as $X_{\mathrm{dist}} = X - X_{\mathrm{trend}} - X_{\mathrm{cyc}}$. The DeepAvg deep moving average function is calculated as follows: for any time point $t$ in the input sequence $x$, the corresponding deep moving average is defined as:

$$\mathrm{DeepAvg}(x)_t = \frac{1}{|\Omega_t|} \sum_{j \in \Omega_t} x_j$$

[0022] In the formula, $\Omega_t$ represents the set of valid time indices associated with time point $t$; $|\Omega_t|$ represents the number of elements in the index set $\Omega_t$; and $x_j$ indicates the value of the input sequence at index $j$.

[0023] As described above, the univariate power load series is decomposed step by step through the deep moving average (the DeepAvg function). First, the data is averaged point by point over a smoothing window to obtain the overall trend. Then the residual left after removing the overall trend from the load series undergoes a deep moving average again to obtain the periodic trend. The remaining portion is defined as the disturbance fluctuation. Through this decomposition, the original non-stationary load sequence is broken down into three subsequences that tend to be stationary: the long-term trend, the intraday/weekly cyclical trend, and the random disturbance. With more stationary inputs, subsequent models can converge faster and overfit less. In practical scenarios such as power dispatching, demand response, and peak-valley forecasting, the componentized load characteristics help improve forecast accuracy, reduce operating costs, and provide a more reliable basis for rolling dispatching decisions.
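The three-stage decomposition described above can be sketched as follows. The smoother used here is an ordinary centered moving average standing in for DeepAvg, which is an assumption made only to keep the example short; the names `centered_moving_average` and `decompose` are likewise hypothetical.

```python
from typing import Callable, List, Tuple

Series = List[float]


def centered_moving_average(x: Series, half_window: int = 2) -> Series:
    """Stand-in smoother (NOT DeepAvg): mean over a clipped centered window."""
    n = len(x)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def decompose(x: Series, smooth: Callable[[Series], Series]) -> Tuple[Series, Series, Series]:
    """Split x into overall trend, periodic trend, and disturbance."""
    trend = smooth(x)                                    # first smoothing pass
    residual = [a - b for a, b in zip(x, trend)]         # what the trend misses
    periodic = smooth(residual)                          # second smoothing pass
    disturbance = [a - b for a, b in zip(residual, periodic)]  # remainder
    return trend, periodic, disturbance


if __name__ == "__main__":
    load = [float(v % 7) + 0.1 * v for v in range(50)]
    trend, periodic, disturbance = decompose(load, centered_moving_average)
    rebuilt = [a + b + c for a, b, c in zip(trend, periodic, disturbance)]
    print(max(abs(a - b) for a, b in zip(rebuilt, load)) < 1e-9)
```

By construction the three components sum back to the input, so no information is lost whichever smoother is plugged in.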

[0024] Further, reconstructing the subsequence into a two-dimensional matrix includes: constructing a two-dimensional matrix from historical data for a preset number of consecutive days prior to the prediction time, with the preset number of days as the number of rows; the elements of each row correspond to the sampling points within one day at a preset time granularity, forming an s×N input structure, where s represents the preset number of days and N represents the number of sampling points per day.

[0025] As described above, each subsequence is reconstructed into an s×N two-dimensional matrix according to a time window: the number of rows s represents the past few days, and the number of columns N represents the intraday sampling points. This structure maps intraday cycles and cross-day trends to a two-dimensional space, making it easier for convolutional networks to extract local spatiotemporal features. It can quickly capture intraday fluctuations and cross-day patterns, reducing the difficulty of model training and improving prediction accuracy.

[0026] Furthermore, extracting the local spatiotemporal features of each reconstructed sequence using a convolutional neural network includes: the convolutional neural network adopts a two-layer architecture, with each layer containing a preset number of fixed-size convolutional kernels that slide and scan with a first fixed stride; the reconstructed sequence is input into the convolutional neural network; each convolution operation is followed by a max-pooling layer with a second fixed stride; and the resulting feature map is flattened into a one-dimensional vector, thereby obtaining the local spatiotemporal features of the reconstructed sequence output by the convolutional neural network.

[0027] As described above, a two-layer convolutional neural network (CNN) is used to extract local spatiotemporal features from the s×N two-dimensional matrix. Each layer contains several fixed-size convolutional kernels (e.g., 3×3), and the convolution slides with a preset stride of 1, which captures the relationships between adjacent sampling points within the cross-day and intraday time windows. Each convolution is immediately followed by a max-pooling layer with a second fixed stride, which further compresses the feature map, suppresses local noise, and retains the most significant activation values, thereby achieving spatial downsampling. Through the combination of two convolution-pooling layers, the network first extracts low-level local patterns (such as short-term fluctuations and intraday peaks and troughs) and then aggregates them into higher-level spatiotemporal features. Finally, the feature map is flattened into a one-dimensional vector, forming the input for subsequent xLSTM processing. This mechanism improves the model's ability to identify intraday cycles and cross-day trends in scenarios such as power load forecasting and demand response scheduling, reduces training difficulty, and improves prediction accuracy and robustness.
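A rough pure-Python sketch of the two convolution-pooling stages is given below. Several details are assumptions chosen for the illustration and are not stated in the application: zero ("same") padding so that a 3-row input survives two 3×3 convolutions, a ReLU activation, pooling of width 2 along the intraday axis only, and the function names themselves. The kernel weights would normally be learned; here they are passed in explicitly.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(channels: List[Matrix], kernels: List[List[Matrix]]) -> List[Matrix]:
    """Stride-1 convolution with zero padding, followed by ReLU.

    kernels[o][c] is the kernel linking input channel c to output channel o.
    """
    rows, cols = len(channels[0]), len(channels[0][0])
    out = []
    for per_input in kernels:
        fmap = [[0.0] * cols for _ in range(rows)]
        for c, kernel in enumerate(per_input):
            kh, kw = len(kernel), len(kernel[0])
            for r in range(rows):
                for q in range(cols):
                    acc = 0.0
                    for dr in range(kh):
                        for dc in range(kw):
                            rr, cc = r + dr - kh // 2, q + dc - kw // 2
                            if 0 <= rr < rows and 0 <= cc < cols:
                                acc += kernel[dr][dc] * channels[c][rr][cc]
                    fmap[r][q] += acc
        out.append([[max(0.0, v) for v in row] for row in fmap])
    return out


def max_pool_cols(channels: List[Matrix], size: int = 2) -> List[Matrix]:
    """Max pooling along the intraday (column) axis only; stride equals size."""
    return [
        [[max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)] for row in fmap]
        for fmap in channels
    ]


def cnn_features(matrix: Matrix, layer1: List[List[Matrix]], layer2: List[List[Matrix]]) -> List[float]:
    """Two conv+pool stages, then flatten to a one-dimensional vector."""
    x = max_pool_cols(conv2d_same([matrix], layer1))
    x = max_pool_cols(conv2d_same(x, layer2))
    return [v for fmap in x for row in fmap for v in row]


if __name__ == "__main__":
    ident = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    day_matrix = [[1.0] * 96 for _ in range(3)]          # s = 3, N = 96
    feats = cnn_features(day_matrix, [[ident], [ident]], [[ident, ident], [ident, ident]])
    print(len(feats))  # 144 = 2 channels x 3 rows x 24 columns
```

With s = 3 rows, pooling along the day axis would collapse the matrix almost immediately, which is why this sketch pools only across the intraday sampling points.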

[0028] Furthermore, the local spatiotemporal features are input into an extended long short-term memory network for temporal modeling, outputting temporal features containing long-range dependencies, including: The extended long short-term memory network is based on an extended long short-term memory fusion block, which integrates stable long short-term memory components and matrix long short-term memory components to form a stacked architecture. The local spatiotemporal features are mixed with the hidden state by the stable long short-term memory component to generate candidate memory. The memory component is updated by gating control, and the first hidden state is calculated based on the updated memory component. The local spatiotemporal features are multiplied by the preset key vector using the matrix long short-term memory component. The matrix memory component is obtained by combining the result of the multiplication calculation and the forget gate calculation. The matrix memory component is then normalized to obtain the second hidden state. The extended long short-term memory fusion block stacks the first hidden states of all time steps in chronological order to obtain a first feature matrix, and stacks the second hidden states of all time steps in chronological order to obtain a second feature matrix. The first feature matrix and the second feature matrix are then integrated by feature splicing projection, residual connection and layer normalization to output temporal features containing long-range dependencies.

[0029] As described above, this method employs an extended long short-term memory (xLSTM) network whose core is an xLSTM fusion block. At each time step, the local spatiotemporal features extracted by the CNN undergo dual-component memory processing. First, the stable long short-term memory component (sLSTM) mixes the input features with the previous hidden state to generate candidate memories; memory updates are controlled through input, forget, and output gates to obtain the first hidden state. Then, the matrix long short-term memory component (mLSTM) multiplies the features by a preset key vector (an outer product) and combines the result with the forget-gate calculation to obtain the matrix memory, which is normalized to obtain the second hidden state. The first and second hidden states of all time steps are stacked in chronological order into two feature matrices, which are then integrated through feature concatenation and projection, residual connections, and layer normalization to output temporal features containing long-range dependencies. This dual-component architecture retains the robust memory updates of traditional LSTMs while using matrix memories to capture cross-dimensional correlations, improving the modeling of cross-day cycles, seasonality, and sudden events. Applied to power load forecasting, it can more accurately capture long-term trends and short-term fluctuations, thereby improving the accuracy of peak-valley forecasting and the reliability of scheduling decisions.
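For orientation, the two memory updates can be sketched in pure Python as below. The equations follow the generally published xLSTM formulation (exponential input gate with a log-space stabilizer for sLSTM; outer-product matrix memory with a normalizer for mLSTM) rather than anything spelled out in this application, so they should be read as an assumption. The sLSTM cell is reduced to a single hidden unit, the mLSTM gates are passed in as precomputed numbers, and the fusion step (concatenation, projection, residual connection, layer normalization) is omitted. All parameter names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def slstm_step(x: float, h_prev: float, c_prev: float, n_prev: float, m_prev: float,
               p: Dict[str, float]) -> Tuple[float, float, float, float]:
    """One sLSTM step for a single hidden unit; p holds hypothetical weights."""
    z = math.tanh(p["wz"] * x + p["rz"] * h_prev)      # candidate memory: input mixed with hidden state
    i_pre = p["wi"] * x + p["ri"] * h_prev             # input-gate pre-activation (exponential gate)
    log_f = math.log(sigmoid(p["wf"] * x + p["rf"] * h_prev))  # log of a sigmoid forget gate
    o = sigmoid(p["wo"] * x + p["ro"] * h_prev)        # output gate
    m = max(log_f + m_prev, i_pre)                     # stabilizer keeps the exponentials bounded
    i_gate = math.exp(i_pre - m)
    f_gate = math.exp(log_f + m_prev - m)
    c = f_gate * c_prev + i_gate * z                   # memory update
    n = f_gate * n_prev + i_gate                       # normalizer update
    return o * c / n, c, n, m


def mlstm_step(q: List[float], k: List[float], v: List[float],
               i_gate: float, f_gate: float, o: List[float],
               c_prev: List[List[float]], n_prev: List[float]):
    """One mLSTM step with precomputed scalar gates i_gate and f_gate."""
    d = len(k)
    k = [kj / math.sqrt(d) for kj in k]                # scaled key
    c = [[f_gate * c_prev[a][b] + i_gate * v[a] * k[b] for b in range(d)] for a in range(d)]
    n = [f_gate * n_prev[b] + i_gate * k[b] for b in range(d)]
    denom = max(abs(sum(nb * qb for nb, qb in zip(n, q))), 1.0)
    h = [o[a] * sum(c[a][b] * q[b] for b in range(d)) / denom for a in range(d)]
    return h, c, n


if __name__ == "__main__":
    params = {key: 0.5 for key in ("wz", "rz", "wi", "ri", "wf", "rf", "wo", "ro")}
    h = c = n = m = 0.0
    for x in (0.2, -0.1, 0.4):
        h, c, n, m = slstm_step(x, h, c, n, m, params)
        print(round(h, 4))
```

In the sLSTM cell the normalizer n accumulates the same gate weights as the memory c, so the ratio c/n stays within the range of the tanh candidates and the hidden state remains bounded even with exponential gating.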

[0030] Furthermore, dynamically weighting the temporal features based on the time-component dual-dimensional attention mechanism includes: reshaping the temporal features to obtain reconstructed features of size T×C×D, where T is the number of time steps, C is the number of components, and D is the dimension of the hidden layer; calculating a first attention score of the reconstructed features in the temporal dimension; calculating a second attention score of the reconstructed features in the component dimension; and, based on the first attention score and the second attention score, weighting the feature vector of the c-th component at time step t.

[0031] As described above, the temporal-component dual-dimensional attention mechanism (TCA) dynamically evaluates and weights input features from two complementary dimensions: in the temporal dimension, it evaluates the importance of different historical time steps by calculating attention scores and focuses on key time nodes; in the component dimension, it evaluates the relative contribution between sequence components at the same time by calculating component attention weights.
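One possible reading of this dual-dimensional weighting is sketched below. The application does not give the scoring functions, so everything specific here is an assumption: dot-product scoring against two hypothetical query vectors `q_time` and `q_comp`, softmax normalization over time steps and (per time step) over components, and multiplication of the two weights.

```python
import math
from typing import List, Tuple

Features = List[List[List[float]]]  # indexed as [t][c][d], size T x C x D


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def tca_weight(features: Features, q_time: List[float], q_comp: List[float]
               ) -> Tuple[Features, List[float], List[List[float]]]:
    """Weight each feature vector by a time weight and a component weight."""
    T, C, D = len(features), len(features[0]), len(q_time)
    # Time dimension: score each step on its component-averaged feature vector.
    time_scores = []
    for t in range(T):
        pooled = [sum(features[t][c][d] for c in range(C)) / C for d in range(D)]
        time_scores.append(dot(q_time, pooled))
    alpha = softmax(time_scores)
    # Component dimension: compare the components at the same time step.
    beta = [softmax([dot(q_comp, features[t][c]) for c in range(C)]) for t in range(T)]
    weighted = [
        [[alpha[t] * beta[t][c] * v for v in features[t][c]] for c in range(C)]
        for t in range(T)
    ]
    return weighted, alpha, beta


if __name__ == "__main__":
    feats = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
             [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]]   # T = 2, C = 3, D = 2
    _, alpha, beta = tca_weight(feats, q_time=[1.0, 1.0], q_comp=[1.0, -1.0])
    print(alpha, beta)
```

Because both sets of weights are softmax outputs, the time weights sum to one across steps and the component weights sum to one at each step, which keeps the weighted features on a comparable scale before fusion.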

[0032] Another embodiment of the present invention provides a univariate-based power load forecasting terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the various steps of the univariate-based power load forecasting method described above.

[0033] Another embodiment of the present invention provides a computer storage medium storing a computer program thereon, which, when executed by a processor, implements the various steps of the above-described univariate-based power load forecasting method.

[0034] The present invention provides a univariate-based power load forecasting method, terminal, and storage medium. Without introducing any external features, the method can effectively uncover inherent evolutionary patterns, autonomously focus on key information, and achieve high-precision forecasting relying solely on a single variable, which addresses the problems of difficult power load forecasting and high model deployment costs in data-constrained scenarios. The following detailed implementation illustrates this approach. Referring to Figure 1, one embodiment of the present invention is as follows: a univariate-based power load forecasting method includes the following steps:

S1. Collect historical power load sequences and perform multiple moving average decomposition on the historical power load sequences to obtain multiple subsequences.

[0035] S11. Collect the historical power load sequence of the target power system, $[P_{t-1}, P_{t-2}, \ldots, P_{t-NL}]$, where $P_{t-i}$ denotes the historical load value of the target power system at time $t-i$, and NL represents the preset historical time step length. The historical power load sequence characterizes the state information of the load as it evolves over time.

[0036] Specifically, in this embodiment, the prediction timescale is short-term power load forecasting, and the modeling and prediction data are one year of historical load data from a prefecture-level city's power system, with a time resolution of 15 minutes. Taking one continuous year as the study period, the annual load data is divided chronologically into consecutive time steps, each corresponding to a 15-minute load sampling point, denoted as t = 1, 2, …, T. At any prediction time t, historical load data from the preceding NL time steps is selected to form a unified load state vector $[P_{t-1}, P_{t-2}, \ldots, P_{t-NL}]$.

[0037] S12. Fill in missing values and replace outliers in the historical power load sequence.

[0038] In this embodiment, missing values are filled as follows: linear interpolation fills consecutive missing data, and missing values at the beginning and end of the sequence are filled by forward and backward filling to ensure the continuity of the time series. Outliers are replaced as follows: based on the standard-deviation (σ) principle, the mean μ and standard deviation σ of the load data are calculated; values falling outside the resulting range around the mean are identified as outliers and replaced with the mean of the two normal data points before and after them, so that outliers do not interfere with the decomposition results.
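A minimal sketch of this preprocessing is shown below. The threshold multiplier `k = 3.0` is an assumption (the conventional three-sigma choice), as are the function names and the use of `None` to mark missing samples.

```python
from statistics import mean, pstdev
from typing import List, Optional


def fill_missing(series: List[Optional[float]]) -> List[float]:
    """Linear interpolation for interior gaps; edge gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    first, last = known[0], known[-1]
    for i in range(first):                      # leading gap: backward fill
        out[i] = series[first]
    for i in range(last + 1, len(series)):      # trailing gap: forward fill
        out[i] = series[last]
    for a, b in zip(known, known[1:]):          # interior gaps: linear interpolation
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = series[a] + w * (series[b] - series[a])
    return out


def replace_outliers(series: List[float], k: float = 3.0) -> List[float]:
    """Replace values farther than k*sigma from the mean (k is an assumed setting)."""
    mu, sigma = mean(series), pstdev(series)
    flagged = [abs(v - mu) > k * sigma for v in series]
    out = list(series)
    for i, is_outlier in enumerate(flagged):
        if not is_outlier:
            continue
        # Nearest normal neighbours before and after the outlier.
        prev = next((series[j] for j in range(i - 1, -1, -1) if not flagged[j]), None)
        nxt = next((series[j] for j in range(i + 1, len(series)) if not flagged[j]), None)
        neighbours = [v for v in (prev, nxt) if v is not None]
        out[i] = mean(neighbours) if neighbours else mu
    return out


if __name__ == "__main__":
    print(fill_missing([None, 1.0, None, 3.0, None]))  # [1.0, 1.0, 2.0, 3.0, 3.0]
    noisy = [10.0] * 10 + [100.0] + [12.0] * 10
    print(replace_outliers(noisy)[10])                  # 11.0
```

Filling gaps before flagging outliers matters here, because the mean and standard deviation are only defined once every sample has a value.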

[0039] In this embodiment, the historical power load sequence is further divided chronologically into a training set, a validation set, and a test set in a ratio of 8:1:1 to avoid data leakage; each component is then min-max normalized using the maximum and minimum values of the training set to ensure a uniform data scale.
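The split and normalization can be sketched as follows. The helper names are hypothetical; the key point illustrated is that the scaling bounds come from the training portion only.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float], ratios: Tuple[int, int, int] = (8, 1, 1)
                        ) -> Tuple[List[float], List[float], List[float]]:
    """Split in time order (no shuffling) so later data never leaks into training."""
    n, total = len(series), sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = list(series[:n_train])
    val = list(series[n_train:n_train + n_val])
    test = list(series[n_train + n_val:])
    return train, val, test


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Scaling bounds are taken from the training set only."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("training data is constant; min-max scaling is undefined")
    return lo, hi


def min_max_scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    data = [float(i) for i in range(100)]
    train, val, test = chronological_split(data)
    lo, hi = fit_min_max(train)
    print(len(train), len(val), len(test))      # 80 10 10
    print(min_max_scale(test, lo, hi)[-1] > 1)  # True
```

Values in the validation and test sets can fall outside [0, 1] after scaling, as the last line shows; that is expected when the bounds are fixed from the training data.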

[0040] S13. Using the deep moving average function DeepAvg, the preprocessed load sequence $X$ is decomposed into an overall trend sequence $X_{\mathrm{trend}}$, a periodic trend sequence $X_{\mathrm{cyc}}$, and a disturbance fluctuation sequence $X_{\mathrm{dist}}$.

[0041] Specifically, the DeepAvg deep moving average function is used to decompose the preprocessed load sequence: the overall trend sequence is calculated as $X_{\mathrm{trend}} = \mathrm{DeepAvg}(X)$; the periodic trend sequence is calculated as $X_{\mathrm{cyc}} = \mathrm{DeepAvg}(X - X_{\mathrm{trend}})$; and the disturbance fluctuation sequence is calculated as $X_{\mathrm{dist}} = X - X_{\mathrm{trend}} - X_{\mathrm{cyc}}$.

[0042] The DeepAvg deep moving average function is calculated as follows: for any time point $t$ to be calculated in the input sequence $x$, the corresponding deep moving average is defined as:

$$\mathrm{DeepAvg}(x)_t = \frac{1}{|\Omega_t|} \sum_{j \in \Omega_t} x_j$$

[0043] In the formula, $\Omega_t$ represents the set of valid time indices associated with time point $t$; $|\Omega_t|$ represents the number of elements in the index set $\Omega_t$; and $x_j$ indicates the value of the input sequence at index $j$.

[0044] The construction rules for the index set associated with time point t (denoted $\Omega_t$) include: (1) constructing a same-day time neighborhood index: within the date to which time point t belongs, select the indexes corresponding to multiple adjacent time nodes within a preset time window before and after it; (2) constructing a similar-day time neighborhood index: select several historical dates that have similar periodic attributes to the date to which time point t belongs, and in each similar date, first determine the anchor point corresponding to time point t in terms of time of day, then select the indexes corresponding to adjacent time nodes within the same time window before and after the anchor point; (3) set merging and deduplication: merge the above two types of indexes and deduplicate them to form the final index set.
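The three construction rules can be sketched as follows. The application does not say how "similar dates" are chosen, so this example takes them as a caller-supplied list of day offsets into the past (for instance 7 and 14 for the same weekday in earlier weeks); that choice, the clipping of each neighborhood to its own day, and all names are assumptions.

```python
from typing import List, Sequence


def deepavg_index_set(t: int, n_points: int, points_per_day: int,
                      half_window: int, similar_day_offsets: Sequence[int]) -> List[int]:
    """Build the merged, deduplicated index set for time index t."""
    day, pos = divmod(t, points_per_day)
    indices = set()
    # (1) Same-day neighbourhood, clipped to the day that contains t.
    day_start = day * points_per_day
    day_end = min(day_start + points_per_day, n_points)
    indices.update(range(max(day_start, t - half_window), min(day_end, t + half_window + 1)))
    # (2) Similar-day neighbourhoods around the anchor at the same intraday position.
    for offset in similar_day_offsets:
        sim_day = day - offset
        if offset <= 0 or sim_day < 0:
            continue                          # only past days inside the data
        sim_start = sim_day * points_per_day
        anchor = sim_start + pos
        indices.update(range(max(sim_start, anchor - half_window),
                             min(sim_start + points_per_day, anchor + half_window + 1)))
    # (3) A set already merges and deduplicates; sort for readability.
    return sorted(indices)


def deep_avg_at(x: Sequence[float], t: int, points_per_day: int,
                half_window: int, similar_day_offsets: Sequence[int]) -> float:
    """Mean of x over the index set of t."""
    idx = deepavg_index_set(t, len(x), points_per_day, half_window, similar_day_offsets)
    return sum(x[j] for j in idx) / len(idx)


if __name__ == "__main__":
    x = [float(i) for i in range(12)]         # 3 toy "days" of 4 points each
    print(deepavg_index_set(9, len(x), 4, 1, [1, 2]))  # [0, 1, 2, 4, 5, 6, 8, 9, 10]
    print(deep_avg_at(x, 9, 4, 1, [1, 2]))             # 5.0
```

Averaging across similar days at the same time of day is what lets this smoother preserve the daily shape that an ordinary sliding window would flatten.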

[0045] Please refer to Figure 2, which illustrates the sequences obtained from the multiple moving average (MA) decomposition. After MA decomposition, the original electricity load time series yields the overall trend sequence (the trend term in Figure 2), the cyclical trend sequence (the cyclical trend term in Figure 2) and the perturbation fluctuation sequence (the periodic term in Figure 2). The overall trend sequence reflects the overall trend of load changes over a longer time scale; the cyclical trend sequence describes the medium-term variation characteristics of the load with stable periodicity; and the perturbation fluctuation sequence characterizes the high-frequency fluctuations and random disturbances in the load sequence. The above decomposition effectively reduces the complexity of the original load sequence, providing a foundation for subsequent component modeling and combined prediction.

[0046] S2. Reconstruct the subsequences into a two-dimensional matrix, and extract the local spatiotemporal features of each reconstructed sequence using a convolutional neural network.

[0047] S21. Construct a two-dimensional matrix from the historical data of a preset number of consecutive days before the prediction time. The preset number of days is the number of rows, and the columns of the matrix correspond to the sampling points within a day at a preset time granularity, forming an s×N input structure, where s represents the preset number of days and N represents the number of sampling points per day.

[0048] In this embodiment, s represents 3 days. The two-dimensional matrix reconstruction is based on historical data from the three consecutive days prior to the prediction time. The one-dimensional load sequence is truncated and stacked by day to form a 3-row × N-column two-dimensional input structure, where N is the number of sampling points per day. In this embodiment, N=96 at a 15-minute power load granularity. Selecting three consecutive days as the reconstruction time span is based on the optimal balance between the power load time series pattern and model adaptability: this duration constitutes the smallest complete verification unit for load trend changes, which is sufficient to cover the typical transition period between weekdays and rest days or the short-term inertial trend of consecutive workdays, ensuring that the model can capture intraday fluctuations, cross-day continuity, and short-term trend changes. At the same time, this duration avoids the non-stationarity interference and trend feature dilution caused by introducing too much outdated historical data, significantly reducing the input dimension and overfitting risk of the model while ensuring the timeliness of features.
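The day-wise truncation and stacking can be sketched as follows. The function name is hypothetical, and the sketch assumes that the prediction index `t_pred` falls on a day boundary, so that each column lines up with the same clock time on every row; the source does not state how a mid-day prediction time is handled.

```python
from typing import List


def to_day_matrix(series: List[float], t_pred: int,
                  days: int = 3, points_per_day: int = 96) -> List[List[float]]:
    """Stack the `days` full days before index t_pred into a days x points_per_day matrix."""
    start = t_pred - days * points_per_day
    if start < 0:
        raise ValueError("not enough history before the prediction time")
    window = series[start:t_pred]
    return [window[r * points_per_day:(r + 1) * points_per_day] for r in range(days)]


# Five days of 15-minute samples; predict the day that starts at index 4 * 96.
series = [float(i) for i in range(96 * 5)]
matrix = to_day_matrix(series, t_pred=96 * 4)
```

Row 0 is the oldest day and row 2 the most recent, so moving down a column compares the same time slot across consecutive days.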

[0049] Building upon this foundation, the constructed two-dimensional matrix structure transforms univariate time-series prediction into an image processing-like problem. This structure is precisely suited to the characteristics of convolutional neural networks (CNNs). Through explicit row and column design, the column dimension of the matrix strictly corresponds to fixed moments within the day to focus on intraday cyclical patterns, while the row dimension corresponds to the number of consecutive days prior to prediction to focus on cross-day continuation patterns. This structured arrangement achieves feature alignment for the same moment on different dates. Utilizing local perception mechanisms and weight-sharing characteristics, the CNN, through the sliding scan of the convolutional kernel across the matrix, can simultaneously capture both the horizontal intraday load evolution and the vertical cross-day load stability, thereby efficiently extracting composite local features across days and time periods. Furthermore, this architecture significantly reduces the number of parameters compared to fully connected layers, and, combined with pooling operations, effectively isolates the spread of single-point noise, structurally ensuring the model's prediction reliability and environmental adaptability even in the absence of external feature support.

[0050] S22. The convolutional neural network adopts a two-layer architecture, with each layer containing a preset number of fixed-size convolutional kernels that slide and scan with a first fixed stride and introduce nonlinearity through the ReLU activation function. The reconstructed sequence is input into the convolutional neural network. Each convolution operation is followed by a max pooling layer with a second fixed stride, and the resulting features are finally flattened into a one-dimensional vector, thereby obtaining the local spatiotemporal features of the reconstructed sequence output by the convolutional neural network.

[0051] In this embodiment, each layer of the convolutional neural network contains eight 3×3 convolutional kernels, which slide and scan with a stride of 1, and introduce nonlinearity through the ReLU activation function; each convolutional operation is followed by a max pooling layer with a stride of 2, and finally the extracted high-dimensional features are flattened into a one-dimensional vector.

[0052] Specifically, the first convolutional layer: the convolutional kernel slides across a 3×96 matrix, covering local blocks of 3 days × 3 sampling points: horizontally (second dimension) it captures short-term fluctuations at the 15-minute level within a day; vertically (first dimension) it captures the continuation of trends across days. ReLU activation function: introduces non-linearity, allowing the network to learn non-linear combinations. Pooling layer: reduces the 3×96 to 2×48, retaining the maximum activation value while reducing the number of parameters and noise propagation. Second convolutional layer: again slides a 3×3 convolutional kernel, further aggregating a broader local context (3 days × 3 sampling points). Second pooling: reduces the 2×48 to 1×24, where each channel represents different features across days. This results in a 1×24×8 dimensional feature vector, which can then be directly fed into an extended LSTM or fully connected layer for temporal modeling.
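The shape sequence above (3×96 to 2×48 to 1×24 with 8 channels) can be checked with a small pure-Python sketch. Several details are assumptions rather than statements of the source: zero "same" padding in the convolutions, a 2×2 pooling window, and pooling that keeps partial windows at the border (ceil mode), which is what the stated sizes imply. The kernels are random stand-ins for learned weights.

```python
import random
from typing import List

FeatureMaps = List[List[List[float]]]  # channels x rows x cols


def conv3x3_same_relu(x: FeatureMaps, kernels) -> FeatureMaps:
    """3x3 convolution, stride 1, zero padding, followed by ReLU.
    kernels[o][c] is the 3x3 kernel from input channel c to output channel o."""
    rows, cols = len(x[0]), len(x[0][0])
    out = []
    for k_out in kernels:
        fmap = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                acc = 0.0
                for ch, k in enumerate(k_out):
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            rr, cc = r + dr, c + dc
                            if 0 <= rr < rows and 0 <= cc < cols:
                                acc += k[dr + 1][dc + 1] * x[ch][rr][cc]
                fmap[r][c] = max(acc, 0.0)  # ReLU
        out.append(fmap)
    return out


def max_pool_2x2_ceil(x: FeatureMaps) -> FeatureMaps:
    """2x2 max pooling with stride 2; partial windows at the border are kept."""
    out = []
    for fmap in x:
        rows, cols = len(fmap), len(fmap[0])
        pooled = []
        for r in range(0, rows, 2):
            pooled.append([
                max(fmap[rr][cc]
                    for rr in range(r, min(r + 2, rows))
                    for cc in range(c, min(c + 2, cols)))
                for c in range(0, cols, 2)
            ])
        out.append(pooled)
    return out


def random_kernels(n_out: int, n_in: int, rng: random.Random):
    return [[[[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
             for _ in range(n_in)] for _ in range(n_out)]


rng = random.Random(0)
x = [[[rng.random() for _ in range(96)] for _ in range(3)]]   # 1 channel, 3 x 96
h1 = max_pool_2x2_ceil(conv3x3_same_relu(x, random_kernels(8, 1, rng)))   # 8 x 2 x 48
h2 = max_pool_2x2_ceil(conv3x3_same_relu(h1, random_kernels(8, 8, rng)))  # 8 x 1 x 24
features = [v for fmap in h2 for row in fmap for v in row]    # flattened, length 192
```

With unpadded convolutions or floor-mode pooling the three-row input would shrink faster than the sizes quoted in the text, which is why padding and ceil-mode pooling are assumed here.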

[0053] In this way, local patterns in both the horizontal (intraday short-term fluctuations) and vertical (cross-day trends) dimensions can be adaptively extracted from a two-dimensional matrix, ultimately yielding a 1×24×8 vector that represents local spatiotemporal features. This vector retains both time and periodic information and possesses abstract features that can be used for further modeling using subsequent LSTM or fully connected layers.

[0054] Please refer to Figure 3, which shows a heatmap of the intermediate process of CNN feature extraction and the final feature activation results. After MA decomposition of the load data, the historical load data of three consecutive days is reconstructed into a two-dimensional feature matrix (the heatmap on the left of Figure 3), where the horizontal axis represents the 96 time steps within a day and the vertical axis represents the consecutive historical days. This matrix preserves the cyclical structure of the load within a day in the time dimension, and intuitively presents the numerical inertia and dynamic gradient changes of the load value at the same moment over consecutive days in the cross-day dimension. Through the vertical convolution operation of the CNN, this structure captures the short-term guiding pattern of historical data on the prediction day, thereby compensating for the missing cross-day correlation information under univariate input. The feature extraction results on the right of Figure 3 show that the feature dimensions exhibit significant differential activation, indicating that the CNN successfully identified and focused on the key patterns of the cross-day trend gradient and intraday local fluctuations in the two-dimensional matrix, which verifies the effectiveness of its feature extraction.

[0055] S3. Input the local spatiotemporal features into an extended long short-term memory network for time series modeling, and output time series features containing long-range dependencies.

[0056] The extended long short-term memory network is based on the extended long short-term memory fusion block (XLSTMBlock), which forms a stacked architecture by fusing stable long short-term memory (sLSTM) components and matrix long short-term memory (mLSTM) components. The fusion of the sLSTM and mLSTM modules is a complementary design addressing the coexistence of short-term fluctuations and long-term cycles in univariate load data: sLSTM utilizes exponential gating and scalar memory to focus on capturing short-term high-frequency fluctuations and instantaneous changes, while filtering random noise through a stabilization mechanism; mLSTM utilizes matrix-based memory units, increasing storage capacity from linear to quadratic, while significantly improving computational efficiency through the parallelism of matrix operations. Compared to traditional LSTM, this architecture overcomes the capacity bottleneck of scalar memory; compared to Transformer, while maintaining long-term modeling capabilities, this architecture reduces computational complexity from quadratic to linear, avoiding the risk of overfitting under limited univariate data, and is more suitable for the real-time and lightweight requirements of power terminals.

[0057] S31. The local spatiotemporal features and hidden states are mixed by the stable long short-term memory components to generate candidate memories. The memory components are updated by gating control. The first hidden state is calculated based on the updated memory components. The first hidden states of all time steps are stacked in chronological order to obtain the first feature matrix.

[0058] Specifically, the sLSTM component includes candidate memory generation, gating control and stable state update, and its state variables are the hidden state and the stable state. Its key calculations include:

Candidate memory state: [formula missing], where the terms are the input vector at step t, the hidden state at step t, the weight matrix and the bias vector;
Input gate (exponential activation): [formula missing];
Forget gate (sigmoid activation): [formula missing];
Output gate (sigmoid activation): [formula missing], where the activation is the sigmoid function;
Memory component update: [formula missing].

[0059] In the formula, the updated quantity is the scalar memory component of the sLSTM.

[0060] First hidden state: [formula missing].

[0061] In the formula, the gating term is the output value of the output gate of the sLSTM at the t-th time step. It is a vector with the same dimension as the number of hidden units in the sLSTM, and its core function is to control the proportion of the memory unit information of the current time step that is output to the hidden state.
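Since the formulas of this step are not legible here, the following sketch follows the commonly published sLSTM formulation instead: a tanh candidate, an exponential input gate, sigmoid forget and output gates, a normalizer state and a log-domain stabilizer. It should be read as an assumption about what the embodiment computes, not as a reconstruction of it. All parameter names (`Wz`, `Rz`, `bz` and so on), the recurrent matrices and the random initial weights are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, x: Vector, R: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W x + R h + b row by row."""
    return [sum(w * xi for w, xi in zip(W[j], x))
            + sum(r * hi for r, hi in zip(R[j], h)) + b[j]
            for j in range(len(b))]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def slstm_step(x: Vector, state: Dict[str, Vector], p: Dict[str, list]) -> Dict[str, Vector]:
    h, c, n, m = state["h"], state["c"], state["n"], state["m"]
    z = [math.tanh(a) for a in affine(p["Wz"], x, p["Rz"], h, p["bz"])]   # candidate memory
    i_pre = affine(p["Wi"], x, p["Ri"], h, p["bi"])                       # input gate = exp(i_pre)
    f = [sigmoid(a) for a in affine(p["Wf"], x, p["Rf"], h, p["bf"])]     # forget gate
    o = [sigmoid(a) for a in affine(p["Wo"], x, p["Ro"], h, p["bo"])]     # output gate
    new: Dict[str, Vector] = {"h": [], "c": [], "n": [], "m": []}
    for j in range(len(h)):
        log_f = math.log(f[j])
        m_new = max(log_f + m[j], i_pre[j])        # stabilizer keeps the exponentials bounded
        i_s = math.exp(i_pre[j] - m_new)
        f_s = math.exp(log_f + m[j] - m_new)
        c_new = f_s * c[j] + i_s * z[j]            # scalar memory update
        n_new = f_s * n[j] + i_s                   # normalizer update
        new["c"].append(c_new)
        new["n"].append(n_new)
        new["m"].append(m_new)
        new["h"].append(o[j] * c_new / n_new)      # hidden state
    return new


def init_params(d_in: int, d_hid: int, rng: random.Random) -> Dict[str, list]:
    p: Dict[str, list] = {}
    for g in "zifo":
        p["W" + g] = [[rng.uniform(-0.3, 0.3) for _ in range(d_in)] for _ in range(d_hid)]
        p["R" + g] = [[rng.uniform(-0.3, 0.3) for _ in range(d_hid)] for _ in range(d_hid)]
        p["b" + g] = [0.0] * d_hid
    return p


rng = random.Random(0)
d_in, d_hid, steps = 8, 4, 24
params = init_params(d_in, d_hid, rng)
state = {key: [0.0] * d_hid for key in ("h", "c", "n", "m")}
first_feature_matrix = []                          # hidden states stacked in time order
for _ in range(steps):
    x_t = [rng.uniform(-1.0, 1.0) for _ in range(d_in)]
    state = slstm_step(x_t, state, params)
    first_feature_matrix.append(state["h"])
```

Because the memory and the normalizer are updated with the same gate weights, their ratio is a weighted average of past candidates, so the hidden state stays bounded even though the input gate is exponential.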

[0062] S32. The local spatiotemporal features are multiplied by the preset key vector using the matrix long short-term memory component. The matrix memory component is obtained by combining the result of the multiplication calculation and the forget gate calculation. The matrix memory component is normalized to obtain the second hidden state. The second hidden states of all time steps are stacked in chronological order to obtain the second feature matrix.

[0063] Specifically, the mLSTM component replaces the traditional scalar memory with a matrix memory component, and its state variables are the matrix memory and the normalized state. Key calculations include:

Gating mechanism: [formulas missing];
Matrix memory component: [formula missing], where the two vectors are the value vector and the key vector, respectively, and the operator between them is the outer product;
Normalized state: [formula missing], where the norm is the L2 norm;
Second hidden state: [formula missing].

[0064] In the formula, tr represents the trace operation of the matrix.
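The matrix memory idea can be illustrated with a minimal sketch: each step writes the outer product of a value vector and a key vector into the memory, and a query reads it back. The write rule follows the description above; the read-out and normalizer follow the commonly published mLSTM formulation and are an assumption, since the embodiment's own normalization (which mentions an L2 norm and a trace) is not legible here. Gates are passed in as plain numbers, and all function and variable names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mlstm_update(C: Matrix, n: Vector, k: Vector, v: Vector,
                 i_gate: float, f_gate: float) -> Tuple[Matrix, Vector]:
    """Decay the old memory with the forget gate and add the gated outer product v k^T."""
    C_new = [[f_gate * C[r][c] + i_gate * v[r] * k[c] for c in range(len(k))]
             for r in range(len(v))]
    n_new = [f_gate * n[c] + i_gate * k[c] for c in range(len(k))]
    return C_new, n_new


def mlstm_read(C: Matrix, n: Vector, q: Vector, o_gate: Vector) -> Vector:
    """Retrieve C q, normalized by max(|n . q|, 1) and scaled by the output gate."""
    denom = max(abs(sum(nc * qc for nc, qc in zip(n, q))), 1.0)
    return [o * sum(C[r][c] * q[c] for c in range(len(q))) / denom
            for r, o in enumerate(o_gate)]


# Store two key-value pairs under orthogonal keys, then query each key.
C = [[0.0, 0.0], [0.0, 0.0]]
n = [0.0, 0.0]
C, n = mlstm_update(C, n, k=[1.0, 0.0], v=[2.0, 3.0], i_gate=1.0, f_gate=1.0)
C, n = mlstm_update(C, n, k=[0.0, 1.0], v=[5.0, 7.0], i_gate=1.0, f_gate=1.0)
first = mlstm_read(C, n, q=[1.0, 0.0], o_gate=[1.0, 1.0])
second = mlstm_read(C, n, q=[0.0, 1.0], o_gate=[1.0, 1.0])
```

With orthogonal keys the two stored values do not interfere, which illustrates why a d×d matrix memory can hold more associations than a length-d scalar memory.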

[0065] S33. The extended long short-term memory fusion block stacks the first hidden states of all time steps in chronological order to obtain a first feature matrix, and stacks the second hidden states of all time steps in chronological order to obtain a second feature matrix. The first feature matrix and the second feature matrix are integrated by feature splicing projection, residual connection and layer normalization to output temporal features containing long-range dependencies.

[0066] Specifically, the XLSTM fusion block integrates the features of the two components through temporal parallel processing, feature splicing projection, residual connection and layer normalization, as follows:

Temporal parallel processing: the first hidden states of all time steps are stacked in chronological order to obtain the first feature matrix [symbol missing], and the second hidden states of all time steps are stacked in chronological order to obtain the second feature matrix [symbol missing]. The dimensions of these matrices are given by the number of hidden units in the sLSTM and the matrix dimension of the mLSTM, respectively.

Feature splicing and projection:

[0067] [formula missing]

[0068] In the formula, the parameters are the projection weight matrix and the projection bias.

Input projection and residual connection: [formula missing]

[0069] Layer normalization: [formula missing]. The result is the output temporal feature of the XLSTMBlock, which can be used as input to the next XLSTMBlock layer or as the feature input for the final prediction task.
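The four fusion operations can be sketched as follows. The exact formulas are not legible here, so the order of operations (concatenate, project, add a projected copy of the block input, normalize each time step) is an assumption drawn from the prose, the layer normalization omits learned scale and shift, and all names and dimensions are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def linear(rows: Matrix, W: Matrix, b: List[float]) -> Matrix:
    """Apply y = W x + b to every row; W has shape (out_dim, in_dim)."""
    return [[sum(w * x for w, x in zip(W[j], row)) + b[j] for j in range(len(W))]
            for row in rows]


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one time step to zero mean and unit variance across features."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def fuse(H_s: Matrix, H_m: Matrix, X: Matrix,
         W_p: Matrix, b_p: List[float], W_in: Matrix, b_in: List[float]) -> Matrix:
    concat = [hs + hm for hs, hm in zip(H_s, H_m)]        # feature splicing per time step
    projected = linear(concat, W_p, b_p)                  # projection to the model width
    skip = linear(X, W_in, b_in)                          # input projection
    summed = [[a + s for a, s in zip(pr, sr)] for pr, sr in zip(projected, skip)]  # residual
    return [layer_norm(row) for row in summed]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, d_in, d_s, d_m, d_model = 24, 8, 4, 4, 6
H_s = rand_matrix(T, d_s, rng)      # stacked sLSTM hidden states
H_m = rand_matrix(T, d_m, rng)      # stacked mLSTM hidden states
X = rand_matrix(T, d_in, rng)       # block input
out = fuse(H_s, H_m, X,
           rand_matrix(d_model, d_s + d_m, rng), [0.0] * d_model,
           rand_matrix(d_model, d_in, rng), [0.0] * d_model)
```

The output keeps one row per time step, so it can be passed to another block of the same kind or to the attention stage.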

[0070] Please refer to Figure 4, which compares the load prediction results of three model architectures (MA-CNN-sLSTM, MA-CNN-mLSTM and MA-CNN-xLSTM) on the same test set. In the figure, the solid black line represents the actual load curve, while the three dashed lines of different colors correspond to the predicted outputs of the MA-CNN-sLSTM, MA-CNN-mLSTM and MA-CNN-xLSTM models, respectively. Overall, the MA-CNN-xLSTM prediction curve fits the actual load curve most closely, accurately tracking the peaks, troughs and fluctuations in load. The MA-CNN-sLSTM prediction is second best: it is stable for most periods but exhibits some lag or bias at points of significant fluctuation and at turning points. In contrast, the MA-CNN-mLSTM prediction curve deviates most from the actual values, with predictions significantly higher or lower than the actual load in multiple periods, showing large prediction errors and instability.

[0071] Please refer to Table 1, which shows the load prediction performance metrics for three models: MA-CNN-sLSTM, MA-CNN-mLSTM, and MA-CNN-xLSTM.

[0072] Table 1. Load forecasting performance indicators for the three models

[0073] The MA-CNN-xLSTM model achieved the best performance on all three evaluation metrics, with an MAE of 172.28 MW, an RMSE of 221.05 MW and a MAPE of 2.63%, significantly outperforming the MA-CNN-sLSTM and MA-CNN-mLSTM models. In contrast, the MA-CNN-mLSTM model exhibited a significantly larger prediction error, indicating its limitations in modeling highly volatile load sequences. Overall, the MA-CNN-xLSTM model, which incorporates the extended LSTM, demonstrates significant advantages in both prediction accuracy and stability, making it more suitable for forecasting complex power load time series.

[0074] S4. Based on the time-component dual-dimensional attention mechanism, the time-series features are dynamically weighted.

[0075] The Temporal-Component Dual-Dimensional Attention Mechanism (TCA) dynamically evaluates and weights input features from two complementary dimensions: in the temporal dimension, it evaluates the importance of different historical time steps by calculating attention scores and focuses on key time nodes; in the component dimension, it evaluates the relative contribution between sequence components at the same time by calculating component attention weights.

[0076] Specifically, the TCA takes as input a three-dimensional tensor whose dimensions are T, C and D, where T is the number of time steps, C is the number of components and D is the hidden layer dimension. Its output consists of a fused feature O, weighted by both temporal and component attention, and the component attention weights.

S41. The temporal attention sub-component independently calculates time-step weights for each component. Key steps include:

Input rearrangement: the input tensor is rearranged to extract the time-step features of the c-th component.

[0077] First attention score calculation: [formula missing], where the feature term is the hidden feature vector of the c-th component at time step t and the weight term is the temporal attention weight matrix;
Weight normalization: [formula missing]

[0078] Feature weighting: [formula missing], forming a weighted tensor.

S42. The component attention sub-component calculates component weights for the time-weighted features. Key steps include:

Attention score calculation: [formula missing], where the weight term is the component attention weight matrix;
Weight normalization: [formula missing];
Feature fusion: [formula missing].

The TCA introduces an attention entropy constraint, which calculates the entropy of the component attention weights: [formula missing]

[0079] This entropy term, scaled by a small coefficient, is added to the total loss as a regularization loss.
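The two attention stages and the entropy term can be sketched as below. Several choices are assumptions, because the scoring formulas are not legible here: scores are plain dot products with a weight vector (the source mentions weight matrices), both normalizations are softmax, time-weighted features are summed over time, and the entropy enters the loss with a negative sign so that minimizing the loss discourages over-concentrated component weights. The coefficient 0.01 and all names are hypothetical.

```python
import math
import random
from typing import List


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def tca(H, w_time: List[float], w_comp: List[float]):
    """H[t][c] is the D-dimensional feature of component c at time step t."""
    T, C, D = len(H), len(H[0]), len(H[0][0])
    alphas, weighted = [], []
    for c in range(C):
        # Temporal attention: one weight per time step, computed separately per component.
        alpha = softmax([dot(w_time, H[t][c]) for t in range(T)])
        u_c = [sum(alpha[t] * H[t][c][d] for t in range(T)) for d in range(D)]
        alphas.append(alpha)
        weighted.append(u_c)
    # Component attention over the time-weighted features.
    beta = softmax([dot(w_comp, u) for u in weighted])
    fused = [sum(beta[c] * weighted[c][d] for c in range(C)) for d in range(D)]
    entropy = -sum(b * math.log(b) for b in beta if b > 0.0)
    return fused, beta, alphas, entropy


rng = random.Random(0)
T, C, D = 24, 3, 6
H = [[[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(C)] for _ in range(T)]
w_time = [rng.uniform(-1.0, 1.0) for _ in range(D)]
w_comp = [rng.uniform(-1.0, 1.0) for _ in range(D)]
fused, beta, alphas, entropy = tca(H, w_time, w_comp)
reg_loss = -0.01 * entropy   # assumed sign and coefficient for the entropy regularizer
```

The component weights `beta` are returned alongside the fused feature because the final prediction step reuses them to weight the per-component forecasts.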

[0080] S5. Short-term power load forecast results are calculated based on the weighted results of each time series feature.

[0081] The final predicted value of power load at the target time is obtained by superimposing the prediction results of each weighted component, specifically including S51 to S52.

[0082] S51. The prediction results of the three components are weighted and summed to obtain the load prediction value at the target time.

[0083] Based on the component attention weights obtained by the TCA mechanism, the prediction results of the components are weighted and summed:

$$\hat{P} = \sum_{c=1}^{3} \beta_c \hat{P}_c$$

[0084] In the formula, $\hat{P}_c$ is the predicted value of the c-th component (trend term, periodic trend term, periodic term), and $\beta_c$ is the attention weight of that component.

[0085] S52. The mean absolute error, root mean square error, and mean absolute percentage error are used as evaluation indicators for the prediction results.
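Steps S51 and S52 can be illustrated with a short sketch using the standard definitions of the three metrics. The function names and the sample numbers are illustrative only, and the MAPE assumes that no actual load value is zero.

```python
import math
from typing import List


def combine(component_preds: List[float], weights: List[float]) -> float:
    """S51: weighted sum of the per-component predictions."""
    return sum(w * p for w, p in zip(weights, component_preds))


def mae(actual: List[float], predicted: List[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: List[float], predicted: List[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative numbers only (MW).
total = combine([1000.0, 200.0, -50.0], [0.5, 0.3, 0.2])
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
scores = (mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE carry the unit of the load (MW here), while MAPE is scale-free, which is why the three are reported together.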

[0086] Please refer to Figure 5, which compares the load prediction results of three model architectures (MA-CNN-sLSTM-TCA, MA-CNN-mLSTM-TCA and MA-CNN-xLSTM-TCA) on the same test set after introducing the TCA mechanism. After the TCA mechanism is introduced, the fit between the three prediction curves and the actual load curve is significantly improved. Compared with the models without attention in Figure 4, the fluctuation range of the prediction results of each model is significantly reduced; especially during peak and trough periods of rapid load changes, the prediction curves track the true values more closely and promptly. Among them, the prediction trajectory of MA-CNN-xLSTM-TCA is the smoothest and almost coincides with the true curve, demonstrating the best tracking ability. It is worth noting that the MA-CNN-mLSTM model, which originally performed poorly, shows a significant improvement in prediction performance after integrating TCA, with the curve deviation converging considerably.

[0087] Please refer to Table 2, which shows the final performance comparison of the three models MA-CNN-sLSTM-TCA, MA-CNN-mLSTM-TCA, and MA-CNN-xLSTM-TCA after introducing the TCA mechanism.

[0088] Table 2 Comparison of the final load forecasting performance of the three models

[0089] The quantitative data in Table 2 strongly support the effectiveness of the TCA mechanism in processing complex temporal features: after introducing TCA, the evaluation metrics of all models were significantly optimized. Among them, MA-CNN-xLSTM-TCA achieved the best performance, with the lowest MAE (125.63MW), RMSE (181.18MW), and MAPE (1.91%). The performance of MA-CNN-sLSTM-TCA and MA-CNN-mLSTM-TCA was very close, and both were far superior to their corresponding versions without attention. In particular, after introducing TCA, the MAPE of the MA-CNN-mLSTM model dropped significantly from 14.45% to 2.26%. This significant performance optimization is not an independent contribution from a single dimension, but rather stems from the synergistic modeling effect of the temporal dimension and the component dimension. Specifically, in the temporal dimension, since changes in power load are often strongly driven by specific historical moments such as the peak value at the same time yesterday or sudden anomalies, this mechanism can accurately capture and highlight these sparse key time steps from long sequences, effectively solving the information dilution problem in long-range dependencies. Meanwhile, in terms of component dimensions, given the significant differences in the contributions of trend, periodic, and fluctuation components under different prediction scenarios—for example, emphasizing the periodic component during stable periods and the fluctuation component during abrupt changes—this mechanism achieves adaptive dynamic weighting of the contributions of each component. Through this joint characterization of temporal focus and feature reorganization, the model can simultaneously capture the temporal evolution of the load and its multi-scale structural features, thereby achieving a substantial improvement in prediction accuracy.

[0090] In summary, this invention provides a univariate-based power load forecasting method. For univariate power load forecasting, firstly, a multiple moving average decomposition is performed on the historical load sequence, breaking down the original sequence into several smooth subsequences to capture trend and periodic components at different frequencies. Then, each subsequence is truncated daily and stacked into a two-dimensional matrix. A convolutional neural network combining convolutional kernels and pooling is used to extract local spatiotemporal features bidirectionally along both time (rows) and intraday cycles (columns), significantly reducing noise and compressing dimensionality. These features are then fed into an extended long short-term memory network (XLSTM), which integrates stable long short-term memory and matrix long short-term memory. This allows for rapid response to short-term abrupt changes and expands capacity through matrix memory to capture long-range dependencies across days, overcoming the memory bottleneck of traditional LSTM. Based on this, temporal-component dual-dimensional attention (TCA) is introduced. First, attention weights are calculated individually for each component of the temporal features along the time axis. Then, component attention is applied to the weighted results of different components, dynamically focusing on key historical nodes and important components, and preventing over-concentration through entropy constraints. Finally, the weighted features are linearly combined to obtain the final short-term load forecast value. This process improves upon traditional LSTM / Transformer in terms of mean absolute error, root mean square error, and mean absolute percentage error on load data with a sampling granularity of 15 minutes, demonstrating superior real-time performance and lightweight characteristics, making it suitable for scenarios such as power grid dispatching, demand response, and load management.

[0091] According to another aspect of the invention, Figure 6 is a schematic diagram of a univariate-based power load forecasting terminal according to an embodiment of the present invention. The terminal includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the univariate-based power load forecasting method described above.

[0092] According to another aspect of the present invention, a computer storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the steps of a univariate-based power load forecasting method as described above.

[0093] The above description is merely an embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent modifications made based on the content of the present invention specification and drawings, or direct or indirect applications in related technical fields, are similarly included within the patent protection scope of the present invention.

Claims

1. A univariate-based power load forecasting method, characterized in that it includes: collecting historical power load sequences, and performing multiple moving average decomposition on the historical power load sequences to obtain multiple subsequences; reconstructing the subsequences into a two-dimensional matrix, and extracting the local spatiotemporal features of each reconstructed sequence using a convolutional neural network; inputting the local spatiotemporal features into an extended long short-term memory network for temporal modeling, and outputting temporal features containing long-range dependencies; dynamically weighting the temporal features based on a time-series-component dual-dimensional attention mechanism; and calculating the short-term power load forecast result based on the weighted results of the temporal features.

2. The univariate-based power load forecasting method according to claim 1, characterized in that collecting historical power load sequences includes: collecting the historical power load sequence $[P_{t-1}, P_{t-2}, \ldots, P_{t-NL}]$ of the target power system, where $P_{t-i}$ indicates the historical load value of the target power system at time $t-i$, and $NL$ represents the preset historical time step length.

3. The univariate-based power load forecasting method according to claim 1, characterized in that, before the multiple moving average decomposition is performed on the historical power load sequence, the method further includes: filling missing values and replacing outliers in the historical power load sequence.

4. The univariate-based power load forecasting method according to claim 3, characterized in that performing multiple moving average decomposition on the historical power load sequence includes: decomposing the preprocessed load sequence X into an overall trend sequence, a cyclical trend sequence and a perturbation fluctuation sequence using the deep moving average function DeepAvg; the calculation formula for the overall trend sequence is: [formula missing]; the calculation formula for the periodic trend sequence is: [formula missing]; the calculation formula for the disturbance fluctuation sequence is: [formula missing]; the DeepAvg deep moving average function is calculated as follows: for any time point t in the input sequence, the corresponding deep moving average is defined as $\mathrm{DeepAvg}(X)_t = \frac{1}{|\Omega_t|} \sum_{i \in \Omega_t} X_i$, where $\Omega_t$ represents the set of valid time indices associated with time point t, $|\Omega_t|$ represents the number of elements in the index set $\Omega_t$, and $X_i$ indicates the value of the input sequence at index i.

5. The univariate-based power load forecasting method according to claim 1, characterized in that reconstructing the subsequences into a two-dimensional matrix includes: constructing a two-dimensional matrix from the historical data of a preset number of consecutive days prior to the prediction time, with the preset number of days as the number of rows and the columns corresponding to the sampling points within a day at a preset time granularity, forming an s×N input structure, where s represents the preset number of days and N represents the number of sampling points per day.

6. The univariate-based power load forecasting method according to claim 1, characterized in that, Local spatiotemporal features of each reconstructed sequence are extracted using a convolutional neural network, including: The convolutional neural network adopts a two-layer architecture, with each layer containing a preset number of fixed-size convolutional kernels, which slide and scan with a first fixed stride. The reconstructed sequence is input into the convolutional neural network. Each convolutional operation in the convolutional neural network is followed by a second max-pooling layer with a fixed stride to flatten the reconstructed sequence into a one-dimensional vector, thereby obtaining the local spatiotemporal features of the reconstructed sequence output by the convolutional neural network.

7. The univariate-based power load forecasting method according to claim 1, characterized in that, The local spatiotemporal features are input into an extended long short-term memory network for temporal modeling, and the output temporal features containing long-range dependencies include: The extended long short-term memory network is based on an extended long short-term memory fusion block, which integrates stable long short-term memory components and matrix long short-term memory components to form a stacked architecture. The local spatiotemporal features are mixed with the hidden state by the stable long short-term memory component to generate candidate memory. The memory component is updated by gating control, and the first hidden state is calculated based on the updated memory component. The local spatiotemporal features are multiplied by the preset key vector using the matrix long short-term memory component. The matrix memory component is obtained by combining the result of the multiplication calculation and the forget gate calculation. The matrix memory component is then normalized to obtain the second hidden state. The extended long short-term memory fusion block stacks the first hidden states of all time steps in chronological order to obtain a first feature matrix, and stacks the second hidden states of all time steps in chronological order to obtain a second feature matrix. The first feature matrix and the second feature matrix are then integrated by feature splicing projection, residual connection and layer normalization to output temporal features containing long-range dependencies.

8. The univariate-based power load forecasting method according to claim 1, characterized in that dynamically weighting the time-series features based on a time-series-component dual-dimensional attention mechanism includes: rearranging the time-series features to obtain reconstructed features, whose dimensions are given by T, the number of time steps, C, the number of components, and D, the dimension of the hidden layer; calculating a first attention score of the reconstructed features in the temporal dimension; calculating a second attention score of the reconstructed features in the component dimension; and weighting the feature vector of the c-th component at time step t based on the first attention score and the second attention score.

9. A univariate-based power load forecasting terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it implements each step of the univariate-based power load forecasting method according to any one of claims 1 to 8.

10. A computer storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by the processor, it implements the various steps of the univariate-based power load forecasting method according to any one of claims 1 to 8.