Time series prediction method based on multi-scale network
By constructing a time series prediction method using multi-scale networks, we have addressed the shortcomings of existing models in extracting multi-scale features and local contextual information, achieving more accurate time series prediction and faster training speed.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- CIVIL AVIATION UNIV OF CHINA
- Filing Date
- 2024-03-01
- Publication Date
- 2026-06-23
AI Technical Summary
Existing time series prediction models struggle to extract multi-scale time series features simultaneously, neglect local contextual information and the correlation between variables, and fail to deeply analyze the importance of covariates.
A time series prediction method based on multi-scale networks is proposed, including a sequence correlation discrimination module, a multi-scale information extraction module, a local attention module, and a feature fusion module. Multi-scale and local contextual information is extracted through a recursive downsampling-convolution-interaction structure and a convolutional local attention mechanism, and prediction evaluation is performed using fully connected layers.
It improves the model's prediction accuracy and generalization ability, can more accurately capture the correlation between variables and local information, reduces the amount of input data to the model, and improves training speed and prediction performance.

Figure CN117972636B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer technology, and in particular relates to a time series prediction method based on multi-scale networks.
Background Technology
[0002] Multidimensional time-series data forecasting has demonstrated broad application value in various fields such as power, finance, transportation, and meteorology. This technology can effectively predict future trends of data within specific fields, enabling early warning of emergencies, mitigating economic losses, and assisting decision-makers in making data-supported decisions.
[0003] In the field of time series data prediction, early research mainly focused on statistical methods, but these methods mostly rely on linear relationships in the data for modeling and perform poorly in predicting nonlinear data and identifying complex patterns.
[0004] In contrast, deep learning models can effectively model nonlinear relationships between features and exhibit stronger learning capabilities. Many important algorithms in the field of time series forecasting involve recurrent neural networks (RNNs) and convolutional neural networks (CNNs). However, existing time series forecasting models still face the following challenges: first, it is difficult to simultaneously extract multi-scale time series features from the data; second, attention models that rely on data points often ignore local contextual information and the correlation information between variables; and third, past models have not conducted in-depth analysis of covariates, neglecting the importance of key covariates.
Summary of the Invention
[0005] In view of this, the present invention aims to overcome the above-mentioned shortcomings of the prior art and proposes a time series prediction method based on multi-scale networks. By constructing a new prediction model, MSLA, it can not only extract multi-scale information but also capture the correlations between variables and local information, thereby improving the model's prediction performance.
[0006] To achieve the above objectives, the technical solution of the present invention is implemented as follows:
[0007] The first aspect of this invention proposes a time series prediction method based on multi-scale networks, comprising:
[0008] A prediction model is constructed, comprising a sequence correlation discrimination module and a main module. The sequence correlation discrimination module is used to determine the correlation between multidimensional time-series data and identify key variables. The main module includes a multi-scale information extraction module, a local attention module, a feature fusion module, and a prediction evaluation module. The multi-scale information extraction module is used to extract information at different scales from the original data. The local attention module is used to extract the correlation between variables. The feature fusion module is used to perform weighted fusion of the features extracted from the above two parts in each training iteration. The prediction evaluation module uses a fully connected layer as the decoder of the model, outputs the predicted sequence, and evaluates the results of each training iteration.
[0009] The prediction model is trained to obtain the final prediction model, which is then used to perform time series prediction based on the multi-scale network.
[0010] Furthermore, the multi-scale information extraction module adopts a recursive downsampling-convolution-interaction structure to extract multi-scale information.
[0011] Furthermore, the multi-scale information extraction module uses a stacked convolutional network as an encoder and replaces the conventional convolution operation in the feature extraction process with dilated convolution.
[0012] Furthermore, the multi-scale information extraction module includes a basic block MSI-Block, which downsamples the input data, divides the sequence into two subsequences, odd and even, and then inputs them into different convolutional filters for feature extraction. Each MSI-Block incorporates an interaction learning method between the two sequences.
[0013] Furthermore, the local attention module first divides the multidimensional sequence into several data segments for different data, performs overall convolution on each segment, and finally performs attention operations.
[0014] Furthermore, the local attention module introduces a fully connected layer as a decoder.
[0015] Furthermore, when calculating Query (Q) and Key (K), the local attention module performs convolution with a kernel of size greater than 1, thereby focusing attention on the local context so that more relevant local features can be matched.
[0016] A second aspect of the present invention proposes a time series prediction device based on a multi-scale network, comprising:
[0017] The model building module is used to construct a prediction model, which includes a sequence correlation discrimination module and a main module. The sequence correlation discrimination module is used to determine the correlation between multi-dimensional time series data and identify key variables. The main module includes a multi-scale information extraction module, a local attention module, a feature fusion module, and a prediction evaluation module. The multi-scale information extraction module is used to extract information at different scales from the original data. The local attention module is used to extract the correlation between variables. The feature fusion module is used to perform weighted fusion of the features extracted from the above two parts in each training iteration. The prediction evaluation module uses a fully connected layer as the decoder of the model, outputs the predicted sequence, and evaluates the results of each training iteration.
[0018] The prediction module is used to train the prediction model to obtain the final prediction model, and to perform time series prediction using a multi-scale network.
[0019] A third aspect of the present invention provides an electronic device, including a processor and a memory communicatively connected to the processor and used to store executable instructions of the processor, the processor being used to execute the above-described time series prediction method based on multi-scale networks.
[0020] The fourth aspect of the present invention proposes a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the above-described time series prediction method based on multi-scale networks.
[0021] Compared with existing technologies, the time series prediction method based on multi-scale networks described in this invention has the following advantages:
[0022] This invention proposes a novel MSLA model. Its Multi-Scale Information Extraction (MSI) module extracts multi-scale information from the data; the Local Attention (LA) module extracts the relationships between variables, enhancing the model's ability to extract overall information from local contexts; and the Data Association Analysis (DAA) module, i.e., the sequence correlation discrimination module, identifies key dimensional variables, reducing the amount of input data and thus speeding up the model. Experiments demonstrate that the proposed model provides more accurate predictions than the other models compared. In multi-scenario experiments, the model achieved excellent results, exhibiting strong generalization ability and showing great application potential in the civil aviation field.
Attached Figure Description
[0023] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an undue limitation of the invention. In the drawings:
[0024] Figure 1 is a schematic diagram of the overall architecture of the prediction model of the present invention;
[0025] Figure 2 is a schematic diagram of the MSI-Block structure of the present invention;
[0026] Figure 3 is a schematic diagram of local attention in the present invention.
Detailed Implementation
[0027] It should be noted that, unless otherwise specified, the embodiments and features described in the present invention can be combined with each other.
[0028] In the description of this invention, it should be understood that the terms "center," "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientations or positional relationships based on the orientations or positional relationships shown in the accompanying drawings, are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. Furthermore, the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined with "first," "second," etc., may explicitly or implicitly include one or more of that feature. In the description of this invention, unless otherwise stated, "a plurality of" means two or more.
[0029] In the description of this invention, it should be noted that, unless otherwise explicitly specified and limited, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art will understand the specific meaning of the above terms in this invention based on the specific circumstances.
[0030] The present invention will now be described in detail with reference to the accompanying drawings and embodiments.
[0031] Example 1:
[0032] Let the historical time series matrix be $X \in \mathbb{R}^{N \times T}$, where N is the number of variables (dimensions) collected at each time point and T is the length of the past time period, and let $X_i$ denote the time series of the i-th variable. This invention aims to learn a model $F(\cdot)$ that uses the past time series $X_{1:T}$ to predict the time series $X_{T+1:T+\tau}$ over the next τ time steps.
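To make the notation concrete, the following sketch slices an N-by-L series matrix into input windows $X_{s:s+T}$ and targets $X_{s+T:s+T+\tau}$. It is an illustrative assumption rather than the patent's data pipeline: the function name `make_windows` and the stride-1 sliding window are hypothetical choices.

```python
import numpy as np

def make_windows(X, T, tau):
    """Slice an (N, L) series matrix into (past, future) training pairs.

    Hypothetical helper: inputs[s] = X[:, s:s+T] and targets[s] = X[:, s+T:s+T+tau],
    for S = L - T - tau + 1 windows taken with stride 1.
    """
    N, L = X.shape
    S = L - T - tau + 1
    if S < 1:
        raise ValueError("series too short for the requested window and horizon")
    inputs = np.stack([X[:, s:s + T] for s in range(S)])                # (S, N, T)
    targets = np.stack([X[:, s + T:s + T + tau] for s in range(S)])     # (S, N, tau)
    return inputs, targets

# Usage: 2 variables, 10 time steps, look back T=4 and predict tau=2 steps
X = np.arange(20, dtype=float).reshape(2, 10)
inputs, targets = make_windows(X, T=4, tau=2)
print(inputs.shape, targets.shape)  # (5, 2, 4) (5, 2, 2)
```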
[0033] This invention constructs a prediction model, MSLA, which consists of three parts; its overall framework is shown in Figure 1. First, a sequence correlation discrimination module (Data Corr) determines the correlation between multi-dimensional time series data and identifies key variables. Next comes the main body of the model, composed of a multi-scale information extraction (MSI) module, a local attention (LA) module, a feature fusion module, and a prediction and evaluation module. The MSI module extracts information at different scales of the original data, while the LA module compensates for the MSI module's inability to extract correlation information between variables. The feature fusion module performs a weighted fusion of the features extracted by these two modules in each training iteration. Finally, the prediction and evaluation module uses fully connected layers as the model's decoder, outputs the predicted sequences, and evaluates the results of each training iteration. Training these network layers together yields the final model.
[0034] Introducing more covariate data increases the total information contained in the data, but the proportion of key information that helps improve prediction accuracy may decrease, which hinders the extraction of key features. Experiments have shown that using only key covariates can achieve good prediction results while speeding up model training. Therefore, the sequence correlation discrimination module of this invention adopts the DTW (Dynamic Time Warping) algorithm. Based on dynamic programming, DTW computes the similarity between two time series of unequal length by warping the time axis, which solves the problem of similarity calculation for such series. After processing, highly correlated data typically show higher similarity. In addition, this module can be switched on or off as needed; it can be turned off when the model needs to predict all input variables.
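The dynamic-programming idea behind DTW can be sketched as follows. This is the textbook formulation, not necessarily the exact variant used in the invention: the absolute-difference local cost and the function name `dtw_distance` are assumptions. A smaller distance means the two series are more similar after warping.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with an absolute-difference local cost (assumed)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # D[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: diagonal match, step in a only, step in b only
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return float(D[n, m])

print(dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # 0.0: the repeated 1 is absorbed by warping
print(dtw_distance([0, 0, 0], [1, 1]))              # 3.0
```

Because the two sequences may have different lengths, a plain point-by-point distance is undefined here, which is why the warping step matters.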
[0035] Specifically, the multi-scale information extraction module is obtained by optimizing the encoder of the Scinet model. This encoder is a stacked convolutional network that uses a rich set of convolutional filters to capture dynamic temporal dependencies at multiple resolutions. Compared with other recent time series prediction models, Scinet has achieved significantly better results.
[0036] Analysis of the Scinet model shows that its convolutional part uses a fixed kernel size k of 5. To expand the range of information extraction without increasing computational cost, this invention replaces the conventional convolution in the feature extraction process with dilated convolution, and allows the kernel size k and the dilation factor r to be adjusted flexibly. The optimized MSI module can therefore more easily extract data information at different resolutions.
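Why dilation widens the view at no extra cost can be shown with a small NumPy sketch. It assumes a single-channel, unpadded ("valid") convolution written as cross-correlation, as deep-learning layers implement it. The helper names are hypothetical. A kernel of size k with dilation r still has k weights per output but spans $(k-1)r+1$ input steps, so k = 5, r = 2 covers 9 steps.

```python
import numpy as np

def receptive_field(k, r):
    """Input steps covered by one output of a size-k kernel with dilation r."""
    return (k - 1) * r + 1

def dilated_conv1d(x, w, r):
    """'Valid' 1-D dilated convolution (cross-correlation form).

    Each output uses k inputs spaced r steps apart: k weights and k multiply-adds
    per output regardless of r, while the covered span grows with r.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k = len(w)
    out_len = len(x) - receptive_field(k, r) + 1
    return np.array([sum(w[j] * x[t + j * r] for j in range(k)) for t in range(out_len)])

print(receptive_field(5, 1), receptive_field(5, 2))    # 5 9
print(dilated_conv1d(np.arange(10), np.ones(5), r=2))  # [20. 25.]
```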
[0037] MSI is composed of basic blocks called MSI-Blocks. As shown in Figure 2, an MSI-Block downsamples the input data by dividing the sequence into two subsequences, an odd subsequence $X_{\mathrm{odd}}$ and an even subsequence $X_{\mathrm{even}}$, which are then fed into different convolutional filters for feature extraction. To reduce the impact of information lost during downsampling, two inter-sequence interaction learning steps are incorporated into each MSI-Block. First, two different one-dimensional functions φ and ψ map $X_{\mathrm{even}}$ and $X_{\mathrm{odd}}$, respectively, to two hidden states; the hidden states are converted to exponential form and multiplied element-wise with the other subsequence to obtain $X^{s}_{\mathrm{odd}}$ and $X^{s}_{\mathrm{even}}$. Finally, these results are mapped to two further hidden states using the one-dimensional functions ρ and η, and addition or subtraction is applied to obtain the final sub-features $X'_{\mathrm{odd}}$ and $X'_{\mathrm{even}}$. The specific calculation process is as follows:
[0038] $X^{s}_{\mathrm{odd}} = X_{\mathrm{odd}} \odot \exp\big(\phi(X_{\mathrm{even}})\big)$ (1)
[0039] $X^{s}_{\mathrm{even}} = X_{\mathrm{even}} \odot \exp\big(\psi(X_{\mathrm{odd}})\big)$ (2)
[0040] $X'_{\mathrm{odd}} = X^{s}_{\mathrm{odd}} \pm \rho(X^{s}_{\mathrm{even}})$ (3)
[0041] $X'_{\mathrm{even}} = X^{s}_{\mathrm{even}} \pm \eta(X^{s}_{\mathrm{odd}})$ (4)
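The two interaction steps of an MSI-Block can be sketched in NumPy as below. Several details are assumptions not fixed by the text. The second scaling function is written ψ (the Scinet convention). φ, ψ, ρ and η are stood in for by hypothetical random 1-D convolutions followed by tanh. The "±" in (3) and (4) is resolved as + for the odd branch and − for the even branch.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_conv_fn(k=3):
    """Hypothetical stand-in for phi, psi, rho, eta: a 'same'-padded 1-D conv + tanh."""
    w = rng.normal(scale=0.1, size=k)
    return lambda s: np.tanh(np.convolve(s, w, mode="same"))

phi, psi, rho, eta = (make_conv_fn() for _ in range(4))

def msi_block(x):
    """Split x into even/odd subsequences and apply interactions (1)-(4)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2 or len(x) // 2 < 3:
        raise ValueError("need an even length with at least 3 samples per subsequence")
    x_even, x_odd = x[0::2], x[1::2]  # 0-based: positions 0, 2, 4, ... form the "even" half
    # (1)-(2): scale each subsequence by exp of a transform of the other one
    xs_odd = x_odd * np.exp(phi(x_even))
    xs_even = x_even * np.exp(psi(x_odd))
    # (3)-(4): "+" for the odd branch, "-" for the even branch (assumed sign choice)
    x_odd_new = xs_odd + rho(xs_even)
    x_even_new = xs_even - eta(xs_odd)
    return x_even_new, x_odd_new

x = np.sin(np.linspace(0, 6, 16))
x_even_new, x_odd_new = msi_block(x)
print(x_even_new.shape, x_odd_new.shape)  # (8,) (8,)
```

The exponential scaling keeps each multiplier positive, so one subsequence can rescale, but never flip the sign of, the other before the additive exchange in (3) and (4).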
[0042] The entire MSI module adopts a recursive downsampling-convolution-interaction structure, which can extract information at multiple scales. This invention also introduces a local attention module to simultaneously and effectively extract dependency information between variables and local context information.
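The recursive downsampling-convolution-interaction structure can be outlined as a binary tree. At each level the sequence is split into even and odd halves, a block transforms them, each half is processed again at the next coarser scale, and the halves are re-interleaved into the original time order. The sketch below is a hypothetical outline: `msi_tree` and its `block` argument are illustrative names, and in the real module the block would be an MSI-Block with learned filters. An identity block is used here only to check that splitting and merging realign time steps correctly.

```python
import numpy as np

def split_even_odd(x):
    return x[0::2], x[1::2]

def interleave(even, odd):
    """Inverse of split_even_odd: put even/odd samples back in time order."""
    out = np.empty(len(even) + len(odd), dtype=float)
    out[0::2], out[1::2] = even, odd
    return out

def msi_tree(x, block, levels):
    """Recursive downsampling-convolution-interaction over `levels` scales."""
    x = np.asarray(x, dtype=float)
    if levels == 0:
        return x
    even, odd = split_even_odd(x)
    even, odd = block(even, odd)              # convolution + interaction at this scale
    even = msi_tree(even, block, levels - 1)  # coarser scales on each branch
    odd = msi_tree(odd, block, levels - 1)
    return interleave(even, odd)              # realign to the original time order

# Usage: a toy block that shifts only the odd branch, applied at one level
print(msi_tree(np.arange(8.0), lambda e, o: (e, o + 1), levels=1))  # [0. 2. 2. 4. 4. 6. 6. 8.]
```

With L levels, the sequence length should be divisible by $2^L$ so that every split produces equal halves.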
[0043] In traditional attention mechanisms, the calculation of Query (Q), Key (K), and Value (V) may shift the focus of attention, because each score reflects only the correlation between single time points. For example, in Figure 3(a), the middle point attends only to the leftmost point, which has a similar value, without considering its contextual trend. Meanwhile, related real-world time series often exhibit strong correlations; in traffic data, for example, weather and holiday changes are highly correlated with traffic flow. Such data contain valuable information, so the dependencies between variables cannot be ignored.
[0044] To match the characteristics of time series prediction and extract inter-variable correlations and local contextual information, this invention optimizes the attention mechanism into a convolutional local attention (LA) module. When computing Q and K, a convolution kernel of size greater than 1 is used, which focuses attention on the local context so that more relevant local features are matched, as shown in Figure 3(b). Because the convolution spans data from different dimensions, this operation also strengthens the extraction of relationships between sequence variables. The specific improvements over the traditional attention model are as follows:
[0045] 1) For different data, the multidimensional sequence is first divided into several data segments, each segment is subjected to overall convolution, and finally attention operation is performed. This solves the problem of high computational complexity of the attention mechanism when calculating the relationship between points, enabling the model to handle longer data sequences.
[0046] 2) Traditional attention-based decoders may accumulate errors, leading to a decrease in prediction accuracy. Therefore, this module eliminates the original decoding layer and introduces a fully connected layer as the decoder.
[0047] 3) Rather than being limited to the original number of input dimensions, the input dimension is appropriately expanded according to the dimensionality of the time series data. Experiments have shown that this enhances the stability and generalization ability of the model.
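The core idea of convolutional Q/K computation can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions. Q and K come from a causal convolution of kernel size k across all input variables, so each score compares local windows rather than single points. V uses a pointwise projection. The causal left-padding, the random weights, and `d_model` > `d_in` (echoing the expanded input dimension in item 3) are assumptions. The segment-wise processing of item 1 is not modeled, and all function names are hypothetical.

```python
import numpy as np

def causal_conv(x, w):
    """Causal 1-D convolution over time. x: (T, d_in); w: (k, d_in, d_out).

    Output row t mixes x[t-k+1 .. t] across all input variables.
    """
    k, T = w.shape[0], x.shape[0]
    xp = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)  # left zero-padding
    return sum(xp[j:j + T] @ w[j] for j in range(k))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def conv_local_attention(x, wq, wk, wv):
    """Scaled dot-product attention whose Q and K come from kernel-size-k convolutions."""
    q = causal_conv(x, wq)
    key = causal_conv(x, wk)
    v = x @ wv  # pointwise value projection
    attn = softmax(q @ key.T / np.sqrt(q.shape[1]))
    return attn @ v, attn

rng = np.random.default_rng(0)
T, d_in, d_model, k = 24, 7, 16, 3
x = rng.normal(size=(T, d_in))
wq = rng.normal(size=(k, d_in, d_model))
wk = rng.normal(size=(k, d_in, d_model))
wv = rng.normal(size=(d_in, d_model))
out, attn = conv_local_attention(x, wq, wk, wv)
print(out.shape, attn.shape)  # (24, 16) (24, 24)
```

With k = 1 the convolution reduces to an ordinary linear projection, and the module falls back to point-wise attention. This is consistent with the observation later in the text that very small kernels perform like the original attention.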
[0048] In each training iteration, the feature fusion module performs a weighted fusion of the features extracted by the MSI and LA modules. Across multiple experiments, the best training results were obtained by fusing the MSI and LA network features with weights of 0.7:0.3.
[0049] Specifically, at the end of each training iteration the prediction evaluation module first uses a fully connected layer as a decoder to output the predicted sequence, and then evaluates each output.
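A minimal sketch of fusion followed by decoding, assuming both modules emit features of the same shape (T, d). The 0.7:0.3 weights come from the text. The single fully connected layer that maps the time axis from T input steps to τ output steps is an assumed layout. The function and parameter names are hypothetical.

```python
import numpy as np

def fuse_and_decode(h_msi, h_la, w_out, b_out, alpha=0.7):
    """Weighted fusion (alpha for MSI, 1 - alpha for LA), then a linear decoder.

    h_msi, h_la: (T, d) features; w_out: (T, tau); b_out: (tau,).
    """
    fused = alpha * h_msi + (1.0 - alpha) * h_la
    # project along the time axis: (d, T) @ (T, tau) -> (d, tau), then back to (tau, d)
    return (fused.T @ w_out + b_out).T

T, d, tau = 8, 3, 2
h_msi = np.ones((T, d))
h_la = np.zeros((T, d))
w_out = np.full((T, tau), 1.0 / T)  # toy decoder weights that average over time
pred = fuse_and_decode(h_msi, h_la, w_out, np.zeros(tau))
print(pred.shape)  # (2, 3)
```

Keeping the decoder a single linear map, rather than an autoregressive attention decoder, matches the motivation in the text of avoiding error accumulation across decoding steps.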
[0050] The effectiveness of the present invention will be verified through experiments below.
[0051] Four datasets were used for model training and testing: (1) Etth1, power transformer oil temperature data provided by State Grid, consists of power load data from China spanning two years at hourly granularity. Each data point contains eight feature dimensions: the date, the target oil temperature, and six different power load features; (2) Pems04, a traffic dataset that reflects complex spatiotemporal patterns in a public transportation network and is characterized by its large number of data dimensions;
[0052] (3) Solar-AL is solar energy data, with 52 feature dimensions recorded at each time point; (4) Exchange_Rate is the exchange rate dataset, with 7 feature dimensions recorded at each time point; the overall information is shown in Table 1.
[0053] Table 1
[0054]
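[0055] To evaluate prediction accuracy, this invention uses mean squared error (MSE), mean absolute error (MAE), and root relative squared error (RSE) as evaluation metrics. They are calculated as follows:
[0056] $\mathrm{MSE} = \frac{1}{\tau}\sum_{i=1}^{\tau}(\hat{x}_i - x_i)^2$
[0057] $\mathrm{MAE} = \frac{1}{\tau}\sum_{i=1}^{\tau}\lvert\hat{x}_i - x_i\rvert$
[0058] $\mathrm{RSE} = \frac{\sqrt{\sum_{i=1}^{\tau}(x_i - \hat{x}_i)^2}}{\sqrt{\sum_{i=1}^{\tau}(x_i - \mathrm{mean}(x))^2}}$
[0059] where $\hat{x}_i$ is the model's predicted value, $x_i$ is the true value, τ is the length of the predicted sequence, and mean(·) denotes the average of the data.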
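The three metrics can be computed directly with NumPy. This sketch assumes the standard definitions of MSE, MAE, and root relative squared error, summing over every predicted entry. A model that always predicts the mean of the true data gets an RSE of exactly 1, which gives RSE a natural reference point.

```python
import numpy as np

def mse(pred, true):
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean((pred - true) ** 2))

def mae(pred, true):
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean(np.abs(pred - true)))

def rse(pred, true):
    """Root relative squared error: prediction error relative to the spread of the true data."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.sqrt(np.sum((true - pred) ** 2)) / np.sqrt(np.sum((true - true.mean()) ** 2)))

true = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.0, 2.0, 3.0, 5.0])
print(mse(pred, true), mae(pred, true))  # 0.25 0.25
print(round(rse(pred, true), 4))         # 0.4472
```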
[0060] In this invention, 70% of the dataset is used as the training set, 15% as the validation set, and the remaining 15% as the test set. The model is built on the PyTorch platform with Python 3.8, running on a multi-core CPU with 64 GB of RAM and an RTX 2060 graphics card.
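The text does not say how the 70/15/15 split is taken. For forecasting, a common choice is a chronological split without shuffling, so that validation and test data lie strictly after the training data. The following sketch assumes that choice. The helper name is hypothetical.

```python
import numpy as np

def chronological_split(data, train=0.7, val=0.15):
    """Split along the first (time) axis without shuffling; the remainder is the test set."""
    n = len(data)
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

series = np.arange(1000)
tr, va, te = chronological_split(series)
print(len(tr), len(va), len(te))  # 700 150 150
```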
[0061] The experimental results on the Exchange_Rate dataset are shown in Table 2. The input window length was set to 168. Compared with the baseline model Scinet, the RSE increased by 11.05% at a prediction length of 3 and decreased by 45.42%, 48.05%, and 29.15% at prediction lengths of 6, 12, and 24, respectively. The reductions at prediction lengths of 6, 12, and 24 are significant. Although the RSE at a prediction length of 3 was higher than that of Scinet, the model still outperformed the other baseline models. Because exchange rate data is relatively irregular and lacks significant periodicity compared with the other datasets, the good results are mainly attributed to the MSI module's ability to extract multi-scale features and the LA module's focus on local contextual information and inter-variable correlation.
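The percentages in these comparisons are relative changes of an error metric against the baseline. The sketch assumes the usual definition. A negative value means the error went down, so a "decrease of −11.05%" is the same as an 11.05% increase. The values below are hypothetical and are not taken from Table 2.

```python
def relative_change(model_err, baseline_err):
    """Percentage change of an error metric versus the baseline (negative = lower error)."""
    return 100.0 * (model_err - baseline_err) / baseline_err

print(relative_change(0.5, 1.0))   # -50.0, i.e. a 50% reduction
print(relative_change(1.25, 1.0))  # 25.0, i.e. a 25% increase
```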
[0062] Table 2
[0063]
[0064] The experimental results on the Solar-AL dataset are shown in Table 3. The input window was set to 160, and the prediction lengths were 3, 6, 12, and 24. Compared with the baseline model, the RSE of this model decreased by 4.45%, 17.22%, 38.31%, and 46.36%, respectively. A prominent characteristic of solar energy data is its periodicity, primarily 24-hour and seasonal cycles, and the experiments show that the MSLA model extracts these characteristics particularly well.
[0065] Table 3
[0066]
[0067] Etth1 One-Dimensional and Multi-Dimensional Time Series Prediction:
[0068] Experiments were conducted with different input window sizes and prediction lengths. When the input window length was set to 48, the corresponding prediction length was 24; when the input window length was set to 96, the corresponding prediction length was 48. Single-dimensional and multi-dimensional data predictions were performed on this dataset. In the single-dimensional prediction experiment, only oil temperature (OT) was predicted. Compared with the baseline model, the MSE of this model was reduced by 29.5% and 10.5%, respectively, as shown in Table 4. In the multi-dimensional prediction experiment, compared with the baseline model, the MSE of this model was reduced by 43.22% and 33.74%, respectively, as shown in Table 5. These experiments validated the excellent performance of the model in both multi-dimensional and single-dimensional data prediction scenarios, demonstrating that the model can comprehensively extract information, making it adaptable to prediction tasks of various dimensions.
[0069] Table 4
[0070]
[0071] Table 5
[0072]
[0073] The prediction results for traffic time series data are shown in Table 6. With an input window and prediction length of 12, the MAE of this model decreased by 0.26% compared with the baseline model, further demonstrating the effectiveness of MSLA for time series prediction. Unlike Etth1, Pems04 shows significant differences between weekday and weekend traffic, and traffic at adjacent locations is strongly correlated. Accurate prediction on Pems04 therefore indicates that the model can capture both long-term and short-term dependencies as well as association information between variables. Previous models for traffic data typically use graph neural networks such as STGCN and LSGCN to capture spatial relationships, while modeling temporal dependencies with conventional TCN or LSTM frameworks; these GNN-based methods generally outperform pure RNN or TCN methods. This model instead extracts the association information between adjacent nodes through its local attention module, and still achieves excellent results.
[0074] Table 6
[0075]
[0076] To verify the effectiveness of each component of the MSLA model, ablation experiments were conducted uniformly on the Solar-AL dataset. Components of the MSLA model were removed one at a time for comparison, with the following settings:
[0077] Experiment A involves removing the data association analysis module;
[0078] Experiment B involves removing the local attention module LA;
[0079] Experiment C involves removing the Multiscale Information Extraction (MSI) module;
[0080] Experiment D involves replacing the LA module with the original transformer;
[0081] Experiment E is the original MSLA model.
[0082] The experimental results are shown in Table 7. It can be seen that in Experiment A, after removing the data association analysis module, the experimental performance was at the same level as the MSLA model, but the training speed decreased due to the increased input data volume. Combining Experiments B and C, it can be seen that the multi-scale information extraction module (MSI) has the greatest impact on the experimental results and plays a crucial role. In Experiment D, replacing the local attention module with the original transformer demonstrated that it both worsened the results and slowed down the model training speed, confirming that the optimization of the local attention module has a significant positive impact on the experimental results. In summary, through these experiments, we conclude that the MSLA model proposed in this invention is effective, and each component module is valuable.
[0083] Table 7
[0084]
[0085] This invention also analyzes the rationality of the parameters, including:
[0086] Analysis of the size of multi-scale convolution dilation factor r and convolution kernel k:
[0087] After repeated experiments, the optimal settings were determined to be a convolution kernel size k of 5 and a dilation factor r of 2. The experimental results on the Solar-AL dataset are shown in Table 8.
[0088] Table 8
[0089]
[0090]
[0091] Analysis of the kernel k values for Local Attention (LA):
[0092] Experiments on the Etth1 power dataset show that the value of k significantly affects the results, with the best performance achieved when the Local Attention (LA) kernel size is 3. Further experiments on higher-dimensional data (more than 40 dimensions), such as the Solar-AL dataset, tested k values of 1, 3, and 9. The best results are achieved when k is 9: the Local Attention (LA) and Multi-Scale Information Extraction (MSI) modules complement each other, giving faster model convergence and better training performance, as shown in Table 9.
[0093] Table 9
[0094]
[0095] We also found that when the kernel size k ≤ 2 in the local attention module, the results are comparable to those obtained when the module is replaced by the original attention module. Analysis suggests that when the data dimensionality is large, the data contain abundant internal correlation information; if the convolution window over adjacent time points is too small, feature extraction suffers, and data from different dimensions may even interfere with each other. When k is 9, the ability to extract local information is enhanced, and together with the complementary effect of the other modules, the model achieves the best results.
[0096] Input data dimensional analysis:
[0097] This section verifies the impact of increasing the input data dimension d_model in the attention module LA on the experimental results. The Pems04 dataset was used for the experiments, with an original dimension of 307. The experimental results show that appropriately increasing the input data dimension makes it easier for the neural network to extract information, thereby improving the experimental performance. When d_model is 320, the MAE is 19.74; when d_model is 600, the MAE is 19.41.
[0098] Analysis of the impact of training batch size on experiments:
[0099] Improving prediction performance on small datasets can greatly expand the model's application scenarios, and in this setting batch size has a significant impact on the results and requires special attention. We conducted experiments on the small Exchange_Rate dataset. The results show that batch size should not be too large for small datasets: as batch size increases, the number of parameter updates per training epoch decreases significantly, which hinders model convergence. The experimental results are shown in Table 10.
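The convergence argument comes down to simple arithmetic. The number of optimizer steps per epoch is the number of training samples divided by the batch size, rounded up. The sample count below is hypothetical, since the size of the Exchange_Rate training split is not given here.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Optimizer steps in one pass over the training set (last partial batch kept)."""
    return math.ceil(num_samples / batch_size)

for bs in (16, 64, 256):
    print(bs, updates_per_epoch(5000, bs))
# 16 313
# 64 79
# 256 20
```

Moving from a batch size of 16 to 256 cuts the number of updates per epoch by roughly a factor of 16, which is why small datasets favor smaller batches.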
[0100] Table 10
[0101]
[0102] Experimental results of the sequence correlation discrimination module:
[0103] Identifying which covariates are strongly correlated with the target variable allows us to adjust the dimensionality of the model's input data and improve training speed. For the Etth1 dataset, each time point contains eight features (time, six different power load characteristics, and oil temperature): DATE, HUFL, HULL, MUFL, MULL, LUFL, LULL, and OT. We used the DTW algorithm to calculate the correlation between each variable and OT. After normalization, values in the ranges 0-0.3, 0.3-0.6, and 0.6-1.0 represent strong, moderate, and weak correlation, respectively. Table 11 shows the correlation between each variable and the target value OT in the Etth1 dataset. Accordingly, for Etth1 we can use five dimensions (DATE, HULL, MULL, LULL, and OT) for prediction, which improves prediction performance by 3.5% compared with using all dimensions while also improving training efficiency by 2.8%.
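The selection step can be sketched as follows. DTW distances to the target are normalized to [0, 1] and bucketed with the thresholds given above, where a smaller distance means stronger correlation. The text does not specify the normalization or which bucket a boundary value falls into. Min-max scaling and half-open intervals are therefore assumptions, and the distances below are made-up values, not those of Table 11.

```python
import numpy as np

def classify_by_dtw(distances):
    """Min-max normalize DTW distances to the target and bucket them (assumed scheme)."""
    names = list(distances)
    d = np.array([distances[n] for n in names], dtype=float)
    span = d.max() - d.min()
    norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    labels = {}
    for name, v in zip(names, norm):
        if v < 0.3:
            labels[name] = "strong"
        elif v < 0.6:
            labels[name] = "moderate"
        else:
            labels[name] = "weak"
    return labels

# Hypothetical DTW distances of four covariates to the target OT
labels = classify_by_dtw({"HUFL": 40.0, "HULL": 10.0, "MULL": 12.0, "MUFL": 24.0, "LUFL": 35.0})
selected = [n for n, c in labels.items() if c == "strong"]
print(labels)
print(selected)  # ['HULL', 'MULL']
```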
[0104] Table 11
[0105]
[0106] This invention studies the problem of multi-scenario time series data prediction and proposes a novel MSLA model to address the shortcomings of existing work. The model's Multi-Scale Information Extraction (MSI) module extracts multi-scale information from the data; the Local Attention (LA) module extracts the correlations between variables, enhancing the model's ability to extract overall information from local contexts; and the Data Association Analysis (DAA) module identifies key dimensional variables, reducing the amount of input data and thus enabling faster model execution.
[0107] We evaluated the model's performance through experiments and compared it with several other models. Experimental results show that the proposed model provides more accurate predictions than the other models, with an average improvement of 20%. In multi-scenario experiments, the model achieved good results, demonstrating strong generalization ability and showing great application potential in the civil aviation field. Furthermore, time series data exhibit seasonality, periodicity, and trend characteristics; combining the model with data decomposition could further improve prediction accuracy and extend the prediction horizon.
[0108] Example 2:
[0109] A time series prediction device based on multi-scale networks includes:
[0110] The model building module is used to construct a prediction model, which includes a sequence correlation discrimination module and a main module. The sequence correlation discrimination module is used to determine the correlation between multi-dimensional time series data and identify key variables. The main module includes a multi-scale information extraction module, a local attention module, a feature fusion module, and a prediction evaluation module. The multi-scale information extraction module is used to extract information at different scales from the original data. The local attention module is used to extract the correlation between variables. The feature fusion module is used to perform weighted fusion of the features extracted from the above two parts in each training iteration. The prediction evaluation module uses a fully connected layer as the decoder of the model, outputs the predicted sequence, and evaluates the results of each training iteration.
[0111] The prediction module is used to train the prediction model to obtain the final prediction model, and to use the final model to perform time series prediction based on the multi-scale network.
[0112] Example 3:
[0113] An electronic device includes a processor and a memory communicatively connected to the processor for storing processor-executable instructions, the processor being used to execute the aforementioned time series prediction method based on multi-scale networks.
[0114] Example 4:
[0115] A computer-readable storage medium storing a computer program which, when executed by a processor, implements the aforementioned time series prediction method based on multi-scale networks.
[0116] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A time series prediction method based on multi-scale networks, characterized in that it comprises: constructing a prediction model, which includes a sequence correlation degree discrimination module and a main module. The sequence correlation discrimination module is used to determine the correlation between multidimensional time series data and identify key variables. It adopts the DTW algorithm, which is based on dynamic programming and calculates the similarity between two time series of unequal length through time warping. The main module includes a multi-scale information extraction module, a local attention module, a feature fusion module, and a prediction and evaluation module. The multi-scale information extraction module is used to extract information at different scales from the original data. The local attention module is used to extract the correlation between variables. The feature fusion module is used to perform weighted fusion of the features extracted by the multi-scale information extraction module and the local attention module in each training iteration. The multi-scale information extraction module is obtained by optimizing the encoder of the SCINet model. This encoder is a stacked convolutional network that uses a rich set of convolutional filters to capture dynamic temporal dependencies at multiple resolutions. Dilated convolution is used instead of conventional convolution in the feature extraction process. The multi-scale information extraction module includes basic blocks (MSI-Blocks) that downsample the input data by dividing the sequence into an odd subsequence $F_{odd}$ and an even subsequence $F_{even}$. $F_{odd}$ and $F_{even}$ are fed into different convolutional filters for feature extraction. Each MSI-Block incorporates two inter-sequence interaction learning steps. First, two different one-dimensional functions $\phi$ and $\psi$ map $F_{even}$ and $F_{odd}$ to two hidden states. These are converted to exponential form and multiplied element-wise with $F_{odd}$ and $F_{even}$ to obtain $F^{s}_{odd}$ and $F^{s}_{even}$. Then one-dimensional functions $\rho$ and $\eta$ map $F^{s}_{even}$ and $F^{s}_{odd}$ to two further hidden states, and addition and subtraction operations yield the final sub-features $F'_{odd}$ and $F'_{even}$. The specific calculation process is as follows:
$$F^{s}_{odd} = F_{odd} \odot \exp\big(\phi(F_{even})\big), \qquad F^{s}_{even} = F_{even} \odot \exp\big(\psi(F_{odd})\big)$$
$$F'_{odd} = F^{s}_{odd} \pm \rho\big(F^{s}_{even}\big), \qquad F'_{even} = F^{s}_{even} \pm \eta\big(F^{s}_{odd}\big)$$
The local attention module is a convolutional local attention module (LA). When calculating the Query (Q) and Key (K), a convolution kernel of size greater than 1 is used, so that attention focuses on the local context and more relevant local features can be matched. The prediction and evaluation module uses a fully connected layer as the decoder of the model, outputs a prediction sequence, and evaluates the results of each training iteration. The prediction model is trained to obtain the final prediction model, which is then used to perform time series prediction based on the multi-scale network.
2. The time series prediction method based on multi-scale networks according to claim 1, characterized in that the multi-scale information extraction module adopts a recursive downsampling-convolution-interaction structure to extract multi-scale information.
3. The time series prediction method based on multi-scale networks according to claim 1, characterized in that the local attention module first divides the multidimensional sequence into several data segments, applies a convolution to each segment as a whole, and finally performs the attention operation.
4. The time series prediction method based on multi-scale networks according to claim 1, characterized in that the local attention module introduces a fully connected layer as a decoder.
5. A prediction apparatus for implementing the time series prediction method based on multi-scale networks as described in any one of claims 1-4, characterized in that it comprises: a model building module, used to build a prediction model, which includes a sequence correlation degree discrimination module and a main module. The sequence correlation discrimination module is used to determine the correlation between multidimensional time series data and identify key variables. It adopts the DTW algorithm, which is based on dynamic programming and calculates the similarity between two time series of unequal length through time warping. The main module includes a multi-scale information extraction module, a local attention module, a feature fusion module, and a prediction and evaluation module. The multi-scale information extraction module is used to extract information at different scales from the original data. The local attention module is used to extract the correlation between variables. The feature fusion module is used to perform weighted fusion of the features extracted by the multi-scale information extraction module and the local attention module in each training iteration. The multi-scale information extraction module is obtained by optimizing the encoder of the SCINet model. This encoder is a stacked convolutional network that uses a rich set of convolutional filters to capture dynamic temporal dependencies at multiple resolutions. Dilated convolution is used instead of conventional convolution in the feature extraction process. The multi-scale information extraction module includes basic blocks (MSI-Blocks) that downsample the input data by dividing the sequence into an odd subsequence $F_{odd}$ and an even subsequence $F_{even}$. $F_{odd}$ and $F_{even}$ are fed into different convolutional filters for feature extraction. Each MSI-Block incorporates two inter-sequence interaction learning steps. First, two different one-dimensional functions $\phi$ and $\psi$ map $F_{even}$ and $F_{odd}$ to two hidden states. These are converted to exponential form and multiplied element-wise with $F_{odd}$ and $F_{even}$ to obtain $F^{s}_{odd}$ and $F^{s}_{even}$. Then one-dimensional functions $\rho$ and $\eta$ map $F^{s}_{even}$ and $F^{s}_{odd}$ to two further hidden states, and addition and subtraction operations yield the final sub-features $F'_{odd}$ and $F'_{even}$. The specific calculation process is as follows:
$$F^{s}_{odd} = F_{odd} \odot \exp\big(\phi(F_{even})\big), \qquad F^{s}_{even} = F_{even} \odot \exp\big(\psi(F_{odd})\big)$$
$$F'_{odd} = F^{s}_{odd} \pm \rho\big(F^{s}_{even}\big), \qquad F'_{even} = F^{s}_{even} \pm \eta\big(F^{s}_{odd}\big)$$
The local attention module is a convolutional local attention module (LA). When calculating the Query (Q) and Key (K), a convolution kernel of size greater than 1 is used, so that attention focuses on the local context and more relevant local features can be matched. The prediction and evaluation module uses a fully connected layer as the decoder of the model, outputs a prediction sequence, and evaluates the results of each training iteration. A prediction module is used to train the prediction model to obtain the final prediction model, and to use that model to perform time series prediction based on the multi-scale network.
6. An electronic device, comprising a processor and a memory communicatively connected to the processor and used for storing processor-executable instructions, characterized in that the processor is used to execute the time series prediction method based on multi-scale networks as described in any one of claims 1-4.
7. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the time series prediction method based on multi-scale networks as described in any one of claims 1-4.
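The following is a minimal single-head sketch of the convolutional local attention described in claim 1. Q and K are produced by convolutions over time with kernel width greater than 1, so each query and key summarizes a local window. V uses an ordinary per-step linear projection. The idea resembles the convolutional self-attention proposed for the LogSparse Transformer. The weight shapes, "same" zero padding, single head, absence of causal masking, and the names `conv1d_same` and `conv_local_attention` are assumptions of this sketch, not details taken from the patent.

```python
import numpy as np


def conv1d_same(x, w):
    """1-D convolution over time. x: (T, d_in), w: (k, d_in, d_out) -> (T, d_out)."""
    k = w.shape[0]
    left = (k - 1) // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))  # zero padding keeps length T
    return sum(xp[i:i + x.shape[0]] @ w[i] for i in range(k))


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv_local_attention(x, wq, wk, wv):
    """Single-head attention whose Q and K come from width-k (k > 1) convolutions."""
    if wq.shape[0] <= 1 or wk.shape[0] <= 1:
        raise ValueError("Q/K kernels must span more than one time step")
    q = conv1d_same(x, wq)  # each query summarizes a local window of steps
    k = conv1d_same(x, wk)  # keys likewise, so matching compares local shapes
    v = x @ wv              # values: ordinary per-step linear projection
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v
```

Width-1 kernels would reduce Q and K to ordinary per-step projections, which is standard dot-product attention and the case the claim contrasts with. This sketch therefore rejects them.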