A multivariate time series prediction method based on wavelet decomposition and multi-scale block hybrid

By combining wavelet decomposition with multi-scale block partitioning, the method addresses multi-scale feature modeling and temporal dependency fusion in multivariate time series prediction, enabling efficient and accurate forecasting; it is particularly suited to complex multivariate time series tasks.

CN122196479A · Pending · Publication Date: 2026-06-12 · NANJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
NANJING UNIV OF POSTS & TELECOMM
Filing Date
2026-03-23
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing deep learning methods for multivariate time series prediction suffer from insufficient multi-scale feature modeling, inadequate temporal dependency fusion, low computational efficiency, and inconsistent prediction results, especially in scenarios with high noise and significant multi-scale variation.

Method used

A hybrid approach combining wavelet decomposition and multi-scale block partitioning is adopted to explicitly decompose multivariate time-series data into multiple resolutions, construct independent branches for parallel processing, generate prediction sequences through a wavelet reconstruction module, and introduce a patch structure-aware loss function for joint supervised training to optimize prediction accuracy and structural consistency.

Benefits of technology

It significantly improves the accuracy and robustness of multivariate time series prediction, effectively handles high-noise and non-stationary multi-scale feature scenarios, and provides more accurate and efficient prediction results.

Generated by Eureka AI based on patent content.

Abstract figure: CN122196479A_ABST

Abstract

The present application relates to the technical field of time series prediction, and in particular to a multivariate time series prediction method based on wavelet decomposition and multi-scale block mixing, comprising: inputting preprocessed multivariate time series data into a wavelet decomposition module for explicit multi-resolution decomposition to obtain an approximation coefficient sequence and detail coefficient sequences, which correspond to information components at different time scales; constructing multiple independent branches and inputting each coefficient sequence into its own branch for parallel processing, which sequentially performs local feature enhancement, multi-scale block adaptive fusion, cross-time and cross-variable interaction, and future value prediction to obtain the prediction result of each branch; inputting all prediction results into a wavelet reconstruction module for inverse transformation and fusion to generate a prediction sequence; and, given the real sequence, constructing a patch structure-aware loss function together with the prediction sequence and performing joint supervised training with the mean squared error to optimize prediction accuracy and structural consistency.

Description

Technical Field

[0001] This invention relates to the field of time series prediction technology, specifically to a multivariate time series prediction method based on a hybrid of wavelet decomposition and multi-scale block partitioning.

Background Technology

[0002] Multivariate time series forecasting plays a crucial role in fields such as power load forecasting, financial market analysis, and weather forecasting, and has attracted continuous attention from academia and industry. Early forecasting relied primarily on traditional statistical methods such as autoregressive (AR), moving average (MA), and autoregressive integrated moving average (ARIMA) models. Built on assumptions of linearity and stationarity, these methods relied on human experience to define model structures, which made it difficult to characterize the nonlinear relationships and dynamic evolution of complex time series data, and their performance was limited in multivariate, long-term forecasting tasks. In recent years, deep learning, with its powerful nonlinear fitting and feature learning capabilities, has opened new paths for multivariate time series forecasting, and various deep learning methods have been systematically developed. Among these, Convolutional Neural Networks (CNNs) excel at capturing local and periodic patterns; Recurrent Neural Networks (RNNs) and their variant, Long Short-Term Memory (LSTM), specialize in sequence dependency modeling; Transformers can characterize long-range global dependencies; and the simple multilayer perceptron (MLP) has regained attention due to its high computational efficiency.

[0003] Existing mainstream deep learning methods have made some progress in multivariate time series prediction, but they share common problems in modeling complex dynamic patterns, namely:

[0004] Convolutional neural network (CNN) technology offers good predictive performance, but its temporal modeling capabilities are limited by the kernel size and network depth, resulting in a fixed modeling scale and difficulty in adaptively capturing multi-scale temporal features. More importantly, the channel-wise nature of convolution operations makes it difficult to capture global coupling relationships between variables; it can only achieve simple cross-variable interactions through channel concatenation, failing to characterize the complex dynamic dependencies between variables.

[0005] The Transformer-based approach has two core problems. First, the computational and memory overhead of the self-attention mechanism grows quadratically with the sequence length, resulting in high training and deployment costs for ultra-long sequences and high-dimensional multivariate scenarios. Second, the multi-scale temporal structure of time series data is mostly learned implicitly by the model, without explicit representation or targeted fusion strategies for information at different time scales, which limits the ability to mine multi-level features.

[0006] Existing MLP-Mixer-type methods have limitations in modeling time dependencies. Multi-scale temporal pattern interactions rely on pre-defined mixing dimensions and structures, and the model lacks the ability to adapt to local dynamic changes. In real-world scenarios where periodicity and trend overlap and scale differences are significant, this method struggles to automatically focus on key information intervals.

[0007] Furthermore, most existing methods only use pointwise error losses such as MSE (Mean Squared Error) and MAE (Mean Absolute Error) as optimization objectives, without considering the structural consistency between the predicted sequence and the real sequence in terms of local fluctuation patterns, statistical distribution, and overall trend. This makes the model prone to problems such as trend drift and local pattern distortion in long-term predictions or tasks with significant multi-scale changes, reducing the reliability and interpretability of the prediction results.

Summary of the Invention

[0008] To address the shortcomings of existing deep learning methods in multi-scale feature modeling, temporal dependency fusion, and computational efficiency, this invention provides a multivariate time series prediction method based on a hybrid of wavelet decomposition and multi-scale block partitioning. The specific technical solution is as follows:

[0009] The preprocessed multivariate time series data is input into the wavelet decomposition module for explicit multi-resolution decomposition, yielding an approximation coefficient sequence and detail coefficient sequences that correspond to information components at different time scales.

[0010] Multiple independent branches are constructed, and each coefficient sequence is input into its own independent branch for parallel processing. Local feature enhancement, multi-scale block adaptive fusion, cross-time and cross-variable interaction, and future value prediction are performed in sequence to obtain the prediction result of each independent branch.

[0011] All prediction results are integrated and input into the wavelet reconstruction module for inverse transformation and fusion to generate a prediction sequence;

[0012] Given a real sequence, a patch structure-aware loss function is constructed by combining it with the predicted sequence, and the mean squared error is used for joint supervised training to optimize prediction accuracy and structural consistency.

[0013] Preferably, the preprocessed multivariate time-series data is input into the wavelet decomposition module for explicit multi-resolution decomposition, yielding an approximation coefficient sequence and a detail coefficient sequence, which correspond to information components at different time scales, including:

[0014] Based on multivariate time-series data, instance normalization and dimension permutation are performed sequentially.

[0015] With a preset wavelet basis and decomposition level, the wavelet decomposition module performs explicit multi-resolution decomposition on the preprocessed multivariate time series data, yielding an approximation coefficient sequence and multiple detail coefficient sequences that correspond to information components at different time scales.

[0016] Preferably, the number of independent branches equals the number of coefficient sequences, and all independent branches share the same structure, comprising, in sequence, a preprocessing convolution module, a multi-scale block embedding module, a two-layer Mixer module that introduces convolution branches, and a prediction module.

[0017] Preferably, each coefficient sequence is input into an independent branch for parallel processing, sequentially achieving local feature enhancement, multi-scale block adaptive fusion, cross-time and cross-variable interaction, and future value prediction, resulting in the prediction results of each independent branch, including:

[0018] The coefficient sequence is input into the preprocessing convolution module, and is processed sequentially through the convolutional backbone and the gated weight generator to obtain enhanced features;

[0019] The enhanced features are input into the multi-scale block embedding module, divided into multiple patches, and projected features are obtained through feature mapping. Attention weights are generated to adaptively fuse the projected features to obtain a multi-scale embedding representation.

[0020] The multi-scale embedding representation is input into a two-layer Mixer module, and a parallel feature interaction mechanism is designed to fuse the time dimension and the feature dimension to obtain the output feature.

[0021] The output features are input into the prediction module, and the output features are mapped to future time series prediction values through feature flattening and linear projection to obtain the prediction results.
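The prediction module in [0021] can be sketched in plain Python as below. This is a minimal illustration, not the patented implementation: it assumes a channel-independent head applied to one channel's `[N][d]` matrix of output features, and the names `prediction_head`, `weight`, and `bias` are hypothetical.

```python
from typing import List, Sequence


def prediction_head(
    features: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Flatten one channel's [N][d] features and project them to H future values.

    `weight` is a hypothetical [H][N*d] matrix and `bias` a length-H vector.
    """
    flat = [v for row in features for v in row]  # feature flattening: N*d values
    if any(len(row) != len(flat) for row in weight):
        raise ValueError("each weight row must match the flattened feature length")
    # linear projection: one output per forecast step
    return [sum(w * x for w, x in zip(row, flat)) + b for row, b in zip(weight, bias)]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]  # N = 2 patches, d = 2 features
    W = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]  # H = 2 forecast steps
    print(prediction_head(feats, W, [0.0, 1.0]))  # [1.0, 3.5]
```

Flattening before a single linear layer lets every forecast step draw on all patches at once, which is why the head stays cheap compared with an autoregressive decoder.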

[0022] Preferably, the coefficient sequence is input to the preprocessing convolution module, and is processed sequentially through the convolutional backbone and the gated weight generator to obtain enhanced features, including:

[0023] Local temporal features are extracted from the coefficient sequence by depthwise convolution; after batch normalization and an activation function, the initial features are obtained by pointwise convolution.

[0024] After average pooling of the initial features, they are input into the gate weight generator, which generates gate weights in conjunction with the activation function. The enhanced features are obtained by fusing the initial features and gate weights through the gated residuals.

[0025] Preferably, the enhanced features are input into the multi-scale block embedding module, divided into multiple patches, projected features are obtained through feature mapping, and attention weights are generated to adaptively fuse the projected features to obtain a multi-scale embedding representation, including:

[0026] Preset the patch length and patch sliding step size, perform patch division operation, determine the number of patches, and obtain multiple patches;

[0027] Projected features are obtained by feature mapping of each patch at multiple time scales;

[0028] Attention weights are generated by assigning a corresponding scalar attention score to each projected feature through a shared MLP.

[0029] Multi-scale embedding representations are obtained by adaptively fusing projection features and attention weights.

[0030] Preferably, the multi-scale embedding representation is input into a two-layer Mixer module, and a parallel feature interaction mechanism is designed to fuse the time dimension and the feature dimension to obtain the output features, including:

[0031] The dual-layer Mixer module adopts a dual-layer hybrid structure of token mixing and feature mixing;

[0032] In the token-mixing structure, long-range and local dependencies of the multi-scale embedding representation are captured by fully connected layers and convolutional layers, respectively, and output feature 1 is obtained by adaptive fusion through a learnable gating mechanism.

[0033] In the feature-mixing structure, the dimensions of output feature 1 are permuted and batch normalization is applied, after which the different feature dimensions interact and are recombined to obtain output feature 2.

[0034] Output feature 1 and output feature 2 are residually connected to obtain the output feature.

[0035] Preferably, all prediction results are integrated into the wavelet reconstruction module for inverse transformation and fusion to generate a prediction sequence, including:

[0036] Using the wavelet basis and decomposition level, the prediction results are fused through the inverse discrete wavelet transform to generate a reconstructed sequence.

[0037] The reconstructed sequence undergoes instance denormalization and dimension permutation to obtain the predicted sequence.
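As a hedged illustration of [0036] and [0037], the sketch below uses the Haar wavelet as an assumed basis (the source leaves the basis as a preset parameter) and plain Python rather than a wavelet library. It rebuilds a sequence from coefficients ordered `[A_J, D_J, ..., D_1]` and then undoes instance normalization with saved statistics. All function names are hypothetical, and even-length levels without boundary padding are assumed.

```python
import math
from typing import List, Sequence


def haar_idwt_step(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert one level of the orthonormal Haar DWT."""
    if len(approx) != len(detail):
        raise ValueError("approximation and detail must have the same length")
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / math.sqrt(2))  # even-indexed sample
        out.append((a - d) / math.sqrt(2))  # odd-indexed sample
    return out


def haar_waverec(coeffs: Sequence[Sequence[float]]) -> List[float]:
    """Rebuild a sequence from [A_J, D_J, ..., D_1], coarsest detail first."""
    signal = list(coeffs[0])
    for detail in coeffs[1:]:
        signal = haar_idwt_step(signal, detail)
    return signal


def instance_denorm(x: Sequence[float], mean: float, std: float) -> List[float]:
    """Undo instance normalization with the statistics saved at the input stage."""
    return [v * std + mean for v in x]


if __name__ == "__main__":
    # per-branch forecasts, chosen by hand for illustration
    coeffs = [[8.0], [-2.0], [math.sqrt(2), 0.0]]
    print(instance_denorm(haar_waverec(coeffs), mean=1.0, std=2.0))  # about [9, 5, 11, 11]
```

Because the Haar transform is orthonormal, reconstruction is exact up to floating-point rounding; the only lossy parts of the pipeline are the branch forecasts themselves.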

[0038] Preferably, given the real sequence, a patch structure-aware loss function is constructed by combining it with the predicted sequence, and joint supervised training is performed using the mean squared error to optimize prediction accuracy and structural consistency, including:

[0039] The real sequence and the predicted sequence are divided into overlapping local patches along the time dimension; a correlation coefficient loss, a variance distribution loss, and a mean alignment loss are then constructed in sequence and combined to obtain the patch structure-aware loss function.

[0040] The mean squared error loss function is combined with the patch structure-aware loss function to establish the total training loss function, which is used to optimize prediction accuracy and structural consistency.
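A minimal sketch of the total training loss in [0039] and [0040] follows. The source names the three components but not their formulas, so the forms below are assumptions: one minus the Pearson correlation per patch, the absolute difference of patch standard deviations, and the absolute difference of patch means, each averaged over patches and added to the MSE with hypothetical weights `lam` and `weights`.

```python
import math
from typing import List, Sequence, Tuple


def make_patches(x: Sequence[float], length: int, stride: int) -> List[List[float]]:
    """Split a 1-D sequence into overlapping patches along time."""
    return [list(x[s:s + length]) for s in range(0, len(x) - length + 1, stride)]


def _mean_std(p: Sequence[float]) -> Tuple[float, float]:
    m = sum(p) / len(p)
    return m, math.sqrt(sum((v - m) ** 2 for v in p) / len(p))


def patch_structure_loss(pred: Sequence[float], true: Sequence[float],
                         length: int = 4, stride: int = 2,
                         eps: float = 1e-8) -> Tuple[float, float, float]:
    """Return (correlation, variance, mean) losses averaged over patches (assumed forms)."""
    pp, tp = make_patches(pred, length, stride), make_patches(true, length, stride)
    if not pp:
        raise ValueError("sequence shorter than the patch length")
    corr_l = var_l = mean_l = 0.0
    for p, t in zip(pp, tp):
        mp, sp = _mean_std(p)
        mt, st = _mean_std(t)
        cov = sum((a - mp) * (b - mt) for a, b in zip(p, t)) / len(p)
        corr_l += 1.0 - cov / (sp * st + eps)  # local fluctuation pattern
        var_l += abs(sp - st)                  # local spread (assumed variance term)
        mean_l += abs(mp - mt)                 # local level alignment
    n = len(pp)
    return corr_l / n, var_l / n, mean_l / n


def total_loss(pred: Sequence[float], true: Sequence[float], lam: float = 0.5,
               weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
               length: int = 4, stride: int = 2) -> float:
    """MSE plus a weighted patch structure-aware term (lam and weights are hypothetical)."""
    mse = sum((a - b) ** 2 for a, b in zip(pred, true)) / len(true)
    parts = patch_structure_loss(pred, true, length, stride)
    return mse + lam * sum(w * c for w, c in zip(weights, parts))


if __name__ == "__main__":
    true = [0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 4.0]
    print(total_loss(true, true))                     # close to 0
    print(total_loss([v + 1.0 for v in true], true))  # about 1.5: MSE 1.0 plus 0.5 * mean term
```

The shifted example shows why the extra term helps: the prediction has the right local shape (correlation and spread match), so only the MSE and the mean-alignment term penalize it.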

[0041] To address the aforementioned problems, the present invention also provides an electronic device comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other via the communication bus, and the processor calls logical instructions in the memory to execute a multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization as described in any of the preceding claims.

[0042] The present invention has the following beneficial effects:

[0043] 1. By introducing an explicit wavelet decomposition mechanism, the wavelet decomposition module decouples the preprocessed multivariate time-series data into multi-resolution components, laying a structured multi-scale representation foundation for subsequent parallel branch processing. Multiple independent branches are constructed to process each coefficient sequence in parallel, obtaining the corresponding prediction results through a series of processes. This enhances the extraction capability of local time-series features and suppresses noise interference, achieving local feature enhancement. Adaptive fusion of information at different time granularities is achieved; efficient information exchange between the time and variable dimensions is realized, enabling cross-time and cross-variable interaction; and differentiated weight allocation is introduced to strengthen the modeling of cross-variable dependencies. The overall method, by optimizing the modeling strategy and computational path, significantly reduces computational redundancy while improving the ability to capture multi-scale dynamic patterns. It can provide more accurate, robust, and efficient prediction results in complex multivariate time-series prediction tasks, and is particularly suitable for handling practical application scenarios with high noise, non-stationarity, and coexisting multi-scale features.

[0044] 2. The electronic device provided by this invention has the same beneficial effects as the multivariate time series prediction method based on wavelet decomposition and multi-scale block hybrid provided by this invention, and will not be described in detail here.

Attached Figure Description

[0045] To more clearly illustrate the technical solutions and advantages in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0046] Figure 1 is a flowchart of an implementation of a multivariate time series prediction method based on a hybrid of wavelet decomposition and multi-scale block partitioning, as provided in one embodiment of the present invention.

[0047] Figure 2 is a flowchart of the steps of a multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization, as provided in one embodiment of the present invention.

Detailed Implementation

[0048] To further illustrate the technical means and effects adopted by the present invention to achieve its intended purpose, the following, in conjunction with the accompanying drawings and preferred embodiments, details the specific implementation, structure, features, and effects of a multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization proposed according to the present invention. In the following description, different "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.

[0049] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0050] The following description, in conjunction with the accompanying drawings, details a specific scheme for a multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization provided by this invention.

[0051] Referring to Figure 1 and Figure 2, which show flowcharts of a multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization provided in the first embodiment of the present invention, the method includes:

[0052] Step S1: Input the preprocessed multivariate time series data into the wavelet decomposition module for explicit multi-resolution decomposition to obtain the approximation coefficient sequence and the detail coefficient sequences, which correspond to the information components at different time scales.

[0053] Step S2: Construct multiple independent branches, input each coefficient sequence into the independent branches for parallel processing, and sequentially realize local feature enhancement, multi-scale block adaptive fusion, cross-time and cross-variable interaction, and future value prediction to obtain the prediction results of each independent branch;

[0054] Step S3: Integrate all prediction results into the wavelet reconstruction module for inverse transformation and fusion to generate a prediction sequence;

[0055] Step S4: Given the real sequence, construct a patch structure-aware loss function by combining it with the predicted sequence, and train it under joint supervision using mean squared error to optimize prediction accuracy and structural consistency.

[0056] It should be noted that by using a preprocessing convolution module to perform local feature enhancement and noise suppression on each input coefficient sequence, and by combining this with attention-weighted multi-scale block embedding to adaptively fuse information at different temporal granularities, the method addresses the shortcomings of existing methods in multi-scale feature modeling.

[0057] To better illustrate the process, the entire method was implemented on hardware with an NVIDIA GeForce RTX 3090 graphics card. The Adam optimizer was used for training with an initial learning rate of 0.001. The input length was fixed at 96, and the prediction horizons were 96, 192, 336, and 720 steps. The batch size was set to 128. Dropout and early stopping were used, with the early stopping patience uniformly set to 5. These settings can be adjusted according to the actual situation.

[0058] Further, step S1 includes:

[0059] Step S11: Perform instance normalization and dimension permutation processing sequentially based on multivariate time series data.

[0060] As an optional implementation, the multivariate time series data in this embodiment comes from five widely used multivariate time series forecasting benchmark datasets. The ETT (Electricity Transformer Temperature) dataset contains four subsets, ETTh1, ETTh2, ETTm1, and ETTm2, which record multivariate operating data such as the oil temperature and load of power transformers; ETTh1 and ETTh2 are sampled hourly, and ETTm1 and ETTm2 every 15 minutes. The Weather dataset contains 21 meteorological variables sampled every 10 minutes and exhibits strong predictability.

[0061] Specifically, instance normalization is performed on the multivariate time series data to map features with different units or numerical ranges into a unified interval, eliminating the influence of scale and improving training efficiency and prediction accuracy. Next, dimension permutation adjusts the order of the feature and time-step axes, which serves to verify the model's robustness or explore the intrinsic structure of the data, yielding the preprocessed time series data, denoted $X \in \mathbb{R}^{C \times L}$, where $C$ is the number of channels and $L$ is the sequence length of the multivariate time series data.
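As a hedged sketch of the preprocessing in [0061], the code below normalizes one sample per channel over time (in the spirit of reversible instance normalization) and then swaps the time and channel axes. It assumes the raw sample is stored time-major as `[L][C]`; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def instance_normalize(
    x: Sequence[Sequence[float]], eps: float = 1e-5
) -> Tuple[List[List[float]], List[float], List[float]]:
    """Normalize each channel of a time-major [L][C] sample over its own time axis.

    Returns the normalized sample plus per-channel means and standard deviations,
    which are kept so the forecast can be denormalized later.
    """
    length, channels = len(x), len(x[0])
    means = [sum(x[t][c] for t in range(length)) / length for c in range(channels)]
    stds = [
        math.sqrt(sum((x[t][c] - means[c]) ** 2 for t in range(length)) / length + eps)
        for c in range(channels)
    ]
    normed = [[(x[t][c] - means[c]) / stds[c] for c in range(channels)] for t in range(length)]
    return normed, means, stds


def permute_time_channel(x: Sequence[Sequence[float]]) -> List[List[float]]:
    """Dimension permutation: swap the time and channel axes ([L][C] -> [C][L])."""
    return [list(col) for col in zip(*x)]


if __name__ == "__main__":
    sample = [[1.0, 10.0], [3.0, 30.0]]  # L = 2 time steps, C = 2 channels
    normed, means, stds = instance_normalize(sample)
    print(means)                          # [2.0, 20.0]
    print(permute_time_channel(normed))   # approximately [[-1, 1], [-1, 1]]
```

After normalization both channels land on the same scale even though the second is ten times larger, which is the point of the step.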

[0062] Step S12: Preset the wavelet basis and decomposition level, and use the wavelet decomposition module to perform explicit multi-resolution decomposition on the preprocessed multivariate time series data, obtaining an approximation coefficient sequence and multiple detail coefficient sequences that correspond to information components at different time scales.

[0063] Specifically, the preprocessed multivariate time-series data is subjected to explicit multi-resolution decomposition using a wavelet decomposition module. The corresponding calculation formula is as follows:

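[0064] $$\{A_J, D_J, D_{J-1}, \dots, D_1\} = \mathrm{DWT}(X;\ \psi,\ J)$$

[0065] where $\mathrm{DWT}(\cdot)$ denotes wavelet decomposition; $X \in \mathbb{R}^{C \times L}$ denotes the preprocessed data with $C$ channels and sequence length $L$; $\psi$ denotes the wavelet basis; $J$ denotes the decomposition level; $A_J \in \mathbb{R}^{C \times L_J}$ denotes the approximation coefficient sequence; $D_j \in \mathbb{R}^{C \times L_j}$ denotes the detail coefficient sequence at the $j$-th level, $j = 1, \dots, J$; and $L_j$ denotes the number of wavelet coefficients at that level.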

[0066] Note that, to reduce information redundancy and highlight multi-scale characteristics, only the approximation coefficient sequence of the final decomposition level $J$ is retained, while the detail coefficient sequences of all $J$ levels are retained for subsequent modeling, yielding one approximation coefficient sequence and $J$ detail coefficient sequences. For brevity, the approximation branch and the detail branches are uniformly denoted by $W^{(k)}$ in the subsequent description, which represents the coefficient sequence at any level $k$.
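To make [0063] to [0066] concrete, the following plain-Python sketch performs a multi-level discrete wavelet decomposition of one channel and keeps only the final approximation plus every level's details. The Haar basis is an assumption (the source leaves the wavelet basis as a preset parameter), as is the even-length requirement; in practice a library such as PyWavelets (`pywt.wavedec`) returns coefficients in the same `[cA_J, cD_J, ..., cD_1]` order. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT on an even-length sequence."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even-length input at every level")
    approx = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    return approx, detail


def haar_wavedec(x: Sequence[float], level: int) -> List[List[float]]:
    """Decompose to `level` levels and return [A_J, D_J, ..., D_1].

    Only the final-level approximation is kept, while the details of every
    level are kept, so the result holds level + 1 coefficient sequences.
    """
    approx: List[float] = list(x)
    details: List[List[float]] = []
    for _ in range(level):
        approx, d = haar_dwt_step(approx)
        details.append(d)              # details[0] is D_1 (finest)
    return [approx] + details[::-1]    # coarsest detail first


if __name__ == "__main__":
    coeffs = haar_wavedec([4.0, 2.0, 5.0, 5.0], level=2)
    print(len(coeffs))  # 3: one approximation branch plus two detail branches
    print(coeffs)       # approximately [[8.0], [-2.0], [1.414, 0.0]]
```

The length of the returned list is exactly the number of independent branches the method builds, $J + 1$.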

[0067] Furthermore, the number of independent branches equals the number of coefficient sequences, and all independent branches share the same structure, comprising, in sequence, a preprocessing convolution module, a multi-scale block embedding module, a two-layer Mixer module that introduces convolution branches, and a prediction module.

[0068] To clarify, each independent branch is an independent resolution branch: the approximation coefficient sequence and each detail coefficient sequence obtained above correspond to one independent branch each. All branches share a unified structure to ensure consistency and scalability of the processing flow. Each branch includes a preprocessing convolution module, a multi-scale block embedding module, a two-layer Mixer module with convolutional branches, and a prediction module, which sequentially perform local feature enhancement, multi-scale block adaptive fusion, cross-time and cross-variable interaction, and future value prediction.

[0069] Further, step S2 includes:

[0070] Step S21: The coefficient sequence is input into the preprocessing convolution module, and is processed sequentially through the convolution backbone and the gated weight generator to obtain enhanced features.

[0071] The preprocessing convolution module consists of two parts: a convolutional backbone and a gated weight generator. Combined with residual connections and gated weighting, feature fusion is achieved. This not only preserves the integrity of the original information for the coefficient sequence, but also adaptively enhances the discriminative features extracted by convolution through the gating mechanism.

[0072] Further, step S21 includes:

[0073] Step S211: Extract local temporal features by applying depthwise convolution to the coefficient sequence; after batch normalization and an activation function, obtain the initial features by pointwise convolution.

[0074] Specifically, the coefficient sequence is input into the preprocessing convolution module. Depthwise convolution is first performed to extract local temporal features; the output is then batch-normalized and activated, followed by pointwise convolution for inter-channel information exchange. The activation function is GELU (Gaussian Error Linear Unit), which weights each input by the standard Gaussian cumulative distribution function; its smoother nonlinearity alleviates the vanishing-gradient problem and helps the model learn more complex feature representations. The corresponding calculation formulas are:

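[0075] $$H^{(k)} = \mathrm{GELU}\big(\mathrm{BN}(\mathrm{DWConv}(W^{(k)}))\big)$$

[0076] $$F^{(k)} = \mathrm{PWConv}(H^{(k)})$$

[0077] where $W^{(k)}$ denotes the coefficient sequence at level $k$; $F^{(k)}$ denotes the initial features corresponding to the coefficient sequence at level $k$; $\mathrm{BN}$ denotes batch normalization; $\mathrm{PWConv}$ denotes pointwise convolution; $H^{(k)}$ denotes an intermediate variable; $\mathrm{GELU}$ denotes the GELU activation function; and $\mathrm{DWConv}$ denotes depthwise convolution.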
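The backbone in [0074] to [0077] can be sketched for a single sample in plain Python. This is an illustration under stated assumptions, not the patented code: the kernels, batch-normalization statistics, and pointwise weights are hypothetical parameters passed in by hand, and batch normalization is shown in inference mode without its affine terms.

```python
import math
from typing import List, Sequence


def depthwise_conv1d(x: Sequence[Sequence[float]],
                     kernels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-channel 1-D convolution ([C][T] in and out), odd kernel, zero 'same' padding.

    Computed as cross-correlation, which is what deep-learning conv layers do.
    """
    out = []
    for ch, k in zip(x, kernels):
        r, n = len(k) // 2, len(ch)
        out.append([
            sum(k[j] * ch[t + j - r] for j in range(len(k)) if 0 <= t + j - r < n)
            for t in range(n)
        ])
    return out


def batch_norm(x: Sequence[Sequence[float]], mean: Sequence[float],
               var: Sequence[float], eps: float = 1e-5) -> List[List[float]]:
    """Inference-mode batch normalization with running statistics (affine terms omitted)."""
    return [[(v - m) / math.sqrt(s + eps) for v in ch] for ch, m, s in zip(x, mean, var)]


def gelu(v: float) -> float:
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def pointwise_conv1d(x: Sequence[Sequence[float]],
                     weight: Sequence[Sequence[float]]) -> List[List[float]]:
    """1x1 convolution mixing channels at each time step; weight is [C_out][C_in]."""
    steps = len(x[0])
    return [[sum(w[c] * x[c][t] for c in range(len(x))) for t in range(steps)] for w in weight]


def conv_backbone(x, kernels, bn_mean, bn_var, pw_weight) -> List[List[float]]:
    """DWConv -> BN -> GELU gives H; PWConv(H) gives the initial features F."""
    h = [[gelu(v) for v in ch] for ch in batch_norm(depthwise_conv1d(x, kernels), bn_mean, bn_var)]
    return pointwise_conv1d(h, pw_weight)


if __name__ == "__main__":
    # one channel, identity kernel, unit BN statistics, identity pointwise weight
    print(conv_backbone([[0.0, 1.0]], [[0.0, 1.0, 0.0]], [0.0], [1.0], [[1.0]]))  # about [[0.0, 0.8413]]
```

Splitting a full convolution into depthwise and pointwise parts keeps temporal filtering per channel and moves all channel mixing into the cheap 1x1 step.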

[0078] Step S212: Average-pool the initial features and feed the result into the gated weight generator, which generates gating weights through activation functions; then fuse the initial features and gating weights through gated residual fusion to obtain the enhanced features.

[0079] To clarify, the gated weight generator is a lightweight MLP network used to dynamically adjust the contribution of the output of the convolutional backbone to each channel. That is, by introducing learnable parameters and nonlinear transformations, the gated weight generator can adjust the weight of each channel in real time according to the characteristics of the input data, thereby enhancing the ability to capture key features and suppressing noise or irrelevant information.

[0080] Specifically, the output of the convolutional backbone, i.e., the initial features $F^{(k)}$, is average-pooled along the time dimension, followed by a linear transformation that reduces the feature dimensionality. The reduced features are activated by GELU to capture fine-grained nonlinear relationships between features. Another linear transformation then restores the feature dimensionality to that of the initial features, and a sigmoid activation generates channel-level gating weights. The corresponding calculation formula is as follows:

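[0081] $$G^{(k)} = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F^{(k)}))\big)$$

[0082] where $G^{(k)}$ denotes the gating weights corresponding to the coefficient sequence at level $k$; $F^{(k)}$ denotes the initial features; $\sigma$ denotes the sigmoid activation function; $\mathrm{MLP}$ denotes the multilayer perceptron (two linear layers with GELU between them); and $\mathrm{AvgPool}$ denotes average pooling along the time dimension.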

[0083] Next, the gating weights $G^{(k)}$ and the initial features $F^{(k)}$ are combined through gated residual fusion to obtain the enhanced features; the corresponding calculation formula is:

[0084]

[0085] where $E^{(k)}$ denotes the enhanced features corresponding to the coefficient sequence at level $k$, and $\odot$ denotes element-wise multiplication.
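The gating step in [0080] to [0085] can be sketched as follows. Because the source text does not reproduce the fusion formula, the residual form `E = F + G ⊙ F` used here is an assumption, as are the hypothetical parameter names `W1`, `b1`, `W2`, `b2`; the two linear layers with GELU between them follow the description in [0080].

```python
import math
from typing import List, Sequence, Tuple


def gelu(v: float) -> float:
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def sigmoid(v: float) -> float:
    # numerically stable logistic function
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_enhance(
    F: Sequence[Sequence[float]],
    W1: Sequence[Sequence[float]], b1: Sequence[float],
    W2: Sequence[Sequence[float]], b2: Sequence[float],
) -> Tuple[List[List[float]], List[float]]:
    """Channel gating on [C][T] initial features F.

    W1 ([r][C]) reduces the channel dimension and W2 ([C][r]) restores it.
    The residual form E = F + G * F is an assumption for illustration.
    """
    pooled = [sum(ch) / len(ch) for ch in F]  # average pooling over time -> [C]
    hidden = [gelu(sum(w * p for w, p in zip(row, pooled)) + b) for row, b in zip(W1, b1)]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b) for row, b in zip(W2, b2)]
    enhanced = [[f + g * f for f in ch] for ch, g in zip(F, gates)]  # gated residual fusion
    return enhanced, gates


if __name__ == "__main__":
    E, G = gated_enhance([[2.0, 4.0]], [[0.0]], [0.0], [[0.0]], [0.0])
    print(G, E)  # [0.5] [[3.0, 6.0]]
```

With this residual form a gate near zero leaves the features untouched, so the gate can only amplify informative channels and never erase the backbone output.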

[0086] Step S22: Input the enhanced features into the multi-scale block embedding module, divide it into multiple patches, obtain the projected features through feature mapping, and generate attention weights to adaptively fuse the projected features to obtain the multi-scale embedding representation.

[0087] The multi-scale block embedding module can adaptively allocate attention according to the input content at different time scales, and more flexibly model short-term fluctuations and long-term trends. That is, it adopts an attention-weighted multi-scale block embedding mechanism to effectively capture local patterns at different time scales in time series data. A series of processing is performed on the enhanced features to obtain multi-scale embedding representations to enhance the ability to model multi-scale time series patterns. Then, it is sent to the subsequent Mixer module for further feature mixing and prediction generation.

[0088] Further, step S22 includes:

[0089] Step S221: Preset the patch length and patch sliding step, perform patch division operation, determine the number of patches, and obtain multiple patches.

[0090] Specifically, a patch is an overlapping local segment of the input sequence. Given the enhanced features $E^{(k)}$ with sequence length $T$, the patch length is set to $P$ and the patch sliding stride to $S$; the number of patches is determined by the following formula:

[0091]

[0092] where $N$ denotes the number of patches; $i$ denotes the patch index, with $i = 1, \dots, N$; and $\lfloor \cdot \rfloor$ denotes the floor function.

[0093] Based on the number of patches $N$, the enhanced features $E^{(k)}$ are partitioned into a series of overlapping local segments, with padding at the end performed by replicating the last time step so that all time points are covered. This yields the patched output $\{x_i\}_{i=1}^{N}$, where each patch $x_i$ has length $P$.
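Since the patch-count formula in [0091] is not reproduced in the source text, the sketch below assumes the PatchTST-style convention: append `S` copies of the last time step, then slide a window of length `P` with stride `S`, which gives $N = \lfloor (T - P)/S \rfloor + 2$ patches. The function names are hypothetical.

```python
from typing import List, Sequence


def patchify(x: Sequence[float], patch_len: int, stride: int) -> List[List[float]]:
    """Split one channel into overlapping patches after replicate-padding the end.

    Assumption (PatchTST-style): `stride` copies of the last time step are appended,
    so the final patch reaches the end of the sequence.
    """
    if len(x) < patch_len:
        raise ValueError("sequence shorter than the patch length")
    padded = list(x) + [x[-1]] * stride          # replicate the last time step
    starts = range(0, len(padded) - patch_len + 1, stride)
    return [padded[s:s + patch_len] for s in starts]


def num_patches(seq_len: int, patch_len: int, stride: int) -> int:
    """N = floor((T - P) / S) + 2 under the padding assumption above."""
    return (seq_len - patch_len) // stride + 2


if __name__ == "__main__":
    seq = [float(t) for t in range(8)]           # T = 8
    patches = patchify(seq, patch_len=4, stride=2)
    print(len(patches), num_patches(8, 4, 2))    # 4 4
    print(patches[-1])                           # [6.0, 7.0, 7.0, 7.0]
```

Replicating the last value rather than zero-padding avoids introducing an artificial drop at the end of the window.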

[0094] Step S223: Perform feature mapping on each patch at multiple time scales to obtain projected features.

[0095] Specifically, a multi-scale pooling strategy is employed to capture local patterns at different temporal granularities. For each patch, a predefined scale set $\mathcal{S} = \{s_1, \dots, s_M\}$ is assumed, where $s_m$ denotes the target length of the $m$-th scale. Adaptive pooling is performed for each scale: if $P > s_m$, each patch is resized from length $P$ to the target length $s_m$ using one-dimensional adaptive average pooling; conversely, if $P \le s_m$, the original patch is used directly. Next, each patch at each scale is projected onto a unified feature dimension through an independent linear layer. The corresponding calculation formula is:

[0096]

[0097] where the left-hand side denotes the projection feature of a given patch at a given scale, and the linear operator denotes the independent linear layer of that scale. Projection features at all scales are obtained in the same way.
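Step S223 can be sketched as follows. This is a minimal sketch under stated assumptions: the pooling uses the same bin-edge rule as PyTorch's `adaptive_avg_pool1d`, a target length not shorter than the patch is assumed to keep the raw patch, and the random weights are placeholders for learned parameters. `adaptive_avg_pool1d` and `project_multiscale` here are hypothetical NumPy helpers, not a library API.

```python
import numpy as np


def adaptive_avg_pool1d(x: np.ndarray, out_len: int) -> np.ndarray:
    """Average-pool the last axis of x into out_len bins.

    Bin i spans [floor(i*L/out_len), ceil((i+1)*L/out_len)), the same
    bin-edge rule PyTorch's adaptive_avg_pool1d uses.
    """
    in_len = x.shape[-1]
    out = np.empty(x.shape[:-1] + (out_len,))
    for i in range(out_len):
        start = (i * in_len) // out_len
        end = ((i + 1) * in_len + out_len - 1) // out_len  # integer ceil
        out[..., i] = x[..., start:end].mean(axis=-1)
    return out


def project_multiscale(patches, scales, d_model, rng):
    """Pool every patch to each target scale, then apply one linear layer per scale.

    patches: (channels, num_patches, patch_len)
    returns: (num_scales, channels, num_patches, d_model)
    Weights are random placeholders standing in for learned parameters.
    """
    patch_len = patches.shape[-1]
    projected = []
    for s in scales:
        # Assumption: a target length >= patch_len keeps the raw patch.
        pooled = patches if s >= patch_len else adaptive_avg_pool1d(patches, s)
        w = rng.standard_normal((pooled.shape[-1], d_model)) / np.sqrt(pooled.shape[-1])
        b = np.zeros(d_model)
        projected.append(pooled @ w + b)  # independent linear layer for this scale
    return np.stack(projected)


z = project_multiscale(np.ones((2, 5, 16)), [4, 8, 16], 32, np.random.default_rng(0))
print(z.shape)  # (3, 2, 5, 32)
```

Because each scale has its own linear layer, patches pooled to different lengths all land in the same `d_model`-dimensional space, which is what allows the per-scale features to be weighted and summed in the next steps.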

[0098] Step S224: Assign a corresponding scalar attention score to each projected feature using a shared MLP to generate attention weights.

[0099] Specifically, to assess the importance of each scale dynamically, a shared MLP computes a scalar attention score from the projected feature at each scale; the scores across all scales are then normalized with Softmax to obtain the corresponding attention weights, calculated using the following formula:

[0100]

[0101] where the left-hand side denotes the attention weight of a given patch at a given scale, Softmax denotes the Softmax function, and MLP denotes the shared MLP. Attention weights for all scales are obtained in the same way.

[0102] Step S225: Adaptively fuse the projection features and attention weights to obtain a multi-scale embedding representation.

[0103] Specifically, the multi-scale embedding representation is obtained by weighting the projection features at each scale, and the corresponding calculation formula is as follows:

[0104]

[0105] where the left-hand side denotes the multi-scale embedding representation of a given patch, and ⊙ denotes element-wise multiplication.
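Steps S224 and S225 (scoring, Softmax over scales, weighted fusion) can be illustrated with the NumPy sketch below. The shared MLP is assumed, for illustration only, to be two linear layers with a tanh hidden activation; the patent does not specify its layout, and `fuse_scales` is a hypothetical name.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift by max for stability
    return e / e.sum(axis=axis, keepdims=True)


def fuse_scales(z, w1, w2):
    """Attention-weighted fusion of per-scale projections.

    z:  (num_scales, channels, num_patches, d_model) projected features
    w1: (d_model, hidden), w2: (hidden, 1) weights of the shared MLP
    (hypothetical two-layer MLP with a tanh hidden layer).
    """
    scores = np.tanh(z @ w1) @ w2        # one scalar score per scale and patch
    alpha = softmax(scores, axis=0)      # normalise across scales for each patch
    fused = (alpha * z).sum(axis=0)      # element-wise weighting, summed over scales
    return fused, alpha


rng = np.random.default_rng(0)
z = rng.standard_normal((3, 2, 5, 8))
fused, alpha = fuse_scales(z, rng.standard_normal((8, 4)), rng.standard_normal((4, 1)))
print(fused.shape)  # (2, 5, 8)
```

Since the same MLP scores every scale, the number of scoring parameters does not grow with the number of scales, and the Softmax guarantees that the weights of each patch sum to one.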

[0106] Step S23: Input the multi-scale embedding representation into the two-layer Mixer module, design a parallel feature interaction mechanism, fuse the time dimension and feature dimension, and obtain the output feature.

[0107] To clarify, the two-layer Mixer module is based on the MLP-Mixer architecture and aims to efficiently fuse information from the time dimension and the feature dimension through a parallel feature interaction mechanism.

[0108] Furthermore, step S23 includes:

[0109] The two-layer Mixer module adopts a two-layer hybrid structure of token mixing and feature mixing.

[0110] To clarify, token mixing models the dependencies between different patches (tokens) within the same channel while keeping the channels independent, whereas the feature mixing layer operates within each patch so that different feature dimensions interact and recombine. Two identical Mixer blocks are stacked to form the mixing module. Combining token mixing and feature mixing, with residual connections and normalization layers to keep training stable and features flowing, achieves efficient information fusion across time and features at reduced computational cost and provides strong representation learning for multivariate time series prediction.

[0111] Step S231: Based on the token mixing structure, capture the long-range and local dependencies of the multi-scale embedding representation with fully connected layers and convolutional layers respectively, and fuse them adaptively through a learnable gating mechanism to obtain output feature one.

[0112] Specifically, the token mixing structure uses two parallel branches: a fully connected branch and a convolutional branch capture long-range and local dependencies between patches respectively, and a learnable gating mechanism fuses them adaptively. First, after the multi-scale embedding representation enters the token mixing structure, two-dimensional batch normalization is applied and the dimensions are permuted for subsequent processing. Next, the fully connected branch mixes the patch sequence with two linear transformations and a GELU activation to capture global dependencies between patches. The corresponding calculation formula is:

[0113]

[0114] where the left-hand side is the intermediate variable output by the fully connected branch for the multi-scale embedding representation; the first linear transformation, applied to the dimension-permuted embedding, expands the patch dimension by the expansion factor; GELU denotes the GELU activation function; and the second linear transformation restores the patch dimension to its original size.
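The global (fully connected) branch of token mixing can be sketched as below. Assumptions: NumPy stands in for a deep-learning framework, batch normalization is omitted, GELU uses its common tanh approximation, and `token_mixing_fc` is a hypothetical name. The key point is the permutation, which makes the linear layers act along the patch axis.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def token_mixing_fc(e, w_up, w_down):
    """Global branch of token mixing: an MLP applied along the patch axis.

    e:      (channels, num_patches, d_model) normalised embeddings
    w_up:   (num_patches, num_patches * r)  expands N -> N*r
    w_down: (num_patches * r, num_patches)  restores N*r -> N
    """
    t = e.transpose(0, 2, 1)            # permute to (channels, d_model, num_patches)
    t = gelu(t @ w_up) @ w_down         # every output patch sees every input patch
    return t.transpose(0, 2, 1)         # permute back to (channels, num_patches, d_model)


rng = np.random.default_rng(0)
n, d, r = 6, 4, 2
e = rng.standard_normal((3, n, d))
out = token_mixing_fc(e, rng.standard_normal((n, n * r)), rng.standard_normal((n * r, n)))
print(out.shape)  # (3, 6, 4)
```

Because the weight matrices span the whole patch axis, a change in any one patch can influence every output patch, which is what gives this branch its long-range view.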

[0115] Next, the convolutional branch applies several one-dimensional convolutions to extract local patterns of the patch sequence, focusing on short-term dependencies between adjacent patches. The corresponding calculation formula is:

[0116]

[0117] where the left-hand side is the intermediate variable output by the convolutional branch for the multi-scale embedding representation, and the convolution operator denotes a stack of convolutional layers, each followed by a GELU activation; the convolutions may be depthwise separable or ordinary.
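A minimal sketch of the local (convolutional) branch follows, assuming the depthwise option: each feature dimension has its own length-k filter sliding along the patch axis, with zero padding so the number of patches is unchanged. Kernel sizes, the number of layers and the names `depthwise_conv1d_same` and `token_mixing_conv` are illustrative assumptions.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def depthwise_conv1d_same(x, kernel):
    """Depthwise 1-D convolution along the patch axis with zero 'same' padding.

    x: (channels, num_patches, d_model); kernel: (k, d_model) with k odd,
    i.e. one length-k filter per feature dimension (cross-correlation form).
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)))
    out = np.zeros(x.shape)
    for j in range(k):
        out += xp[:, j:j + x.shape[1], :] * kernel[j]
    return out


def token_mixing_conv(e, kernels):
    """Local branch: stacked depthwise convolutions, each followed by GELU."""
    h = e
    for kernel in kernels:
        h = gelu(depthwise_conv1d_same(h, kernel))
    return h


x = np.zeros((1, 7, 2))
x[0, 3, :] = 1.0                         # an impulse at patch 3
print(depthwise_conv1d_same(x, np.ones((3, 2)))[0, :, 0])  # [0. 0. 1. 1. 1. 0. 0.]
```

The impulse example shows the locality that distinguishes this branch from the fully connected one: with a kernel of size 3, a patch only affects itself and its immediate neighbours per layer.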

[0118] Finally, a gated weight generator adaptively fuses the outputs of the two branches: a learnable gating mechanism produces output feature one of the token mixing structure. Because it is gate-weighted, this output can dynamically adjust the relative contributions of global and local information according to the input content. The corresponding calculation formulas are as follows:

[0119]

[0120]

[0121] where the symbols denote, respectively, output feature one of a given patch, the gating weight of that patch, the Sigmoid activation function, pointwise convolution, and element-wise multiplication.
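The gated fusion can be sketched as follows. Because the formulas are not reproduced in the text, this sketch assumes one common form: the gate is a Sigmoid of a pointwise (kernel-size-1) convolution over the concatenated branch outputs, and output feature one is g ⊙ global + (1 − g) ⊙ local. The concatenated input and the complementary (1 − g) weighting are assumptions, and `gated_fusion` is a hypothetical name.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(h_global, h_local, w_gate, b_gate):
    """Fuse the global and local branches with a learned per-element gate.

    Assumed form: g = sigmoid(PWConv([h_global, h_local])),
                  out = g * h_global + (1 - g) * h_local.
    A kernel-size-1 (pointwise) convolution is a linear map over the
    feature axis applied independently at every patch position.
    h_global, h_local: (channels, num_patches, d_model)
    w_gate: (2 * d_model, d_model), b_gate: (d_model,)
    """
    stacked = np.concatenate([h_global, h_local], axis=-1)
    g = sigmoid(stacked @ w_gate + b_gate)
    return g * h_global + (1.0 - g) * h_local, g


hg, hl = np.ones((1, 3, 2)), np.zeros((1, 3, 2))
out, g = gated_fusion(hg, hl, np.zeros((4, 2)), np.zeros(2))
print(out[0, 0])  # [0.5 0.5]  (a neutral gate averages the two branches)
```

With this form the gate interpolates between the two branches element by element: values near 1 favour global (fully connected) context, values near 0 favour local (convolutional) context.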

[0122] Step S232: Based on the feature fusion structure, permute the dimensions of output feature one and apply batch normalization, then let the different feature dimensions interact and recombine to obtain output feature two.

[0123] Specifically, the feature fusion structure first permutes the dimensions of output feature one and then applies batch normalization. The different feature dimensions then interact and recombine through two linear layers, which apply a nonlinear transformation and integrate information along the feature dimension, yielding output feature two. The corresponding calculation formula is:

[0124]

[0125] where the left-hand side denotes output feature two of a given patch; the first linear layer, applied to the processed output feature one, expands the feature dimension by the expansion factor; GELU denotes the GELU activation function; and the second linear layer restores the feature dimension to its original size.

[0126] Step S233: Perform residual connection between output feature one and output feature two to obtain the output feature.

[0127] Specifically, to promote gradient flow and model stability, residual connections are used within each Mixer layer, and the corresponding calculation formula is as follows:

[0128]

[0129] where the left-hand side denotes the output features of the first Mixer layer. The output features of the second layer are obtained in the same way, giving the final output features of the two-layer Mixer module.
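Steps S232 and S233 can be sketched together: an MLP over the feature axis produces output feature two, and the residual connection adds it back to output feature one. This NumPy sketch omits batch normalization and uses the tanh approximation of GELU; `feature_mixing` and `mixer_output` are hypothetical names.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def feature_mixing(h1, w_up, w_down):
    """Output feature two: an MLP over the feature axis, shared by every patch.

    h1: (channels, num_patches, d_model); w_up: (d_model, d_model * r);
    w_down: (d_model * r, d_model). Batch normalisation is omitted here.
    """
    return gelu(h1 @ w_up) @ w_down


def mixer_output(h1, w_up, w_down):
    # Residual connection: output feature one plus output feature two.
    return h1 + feature_mixing(h1, w_up, w_down)


rng = np.random.default_rng(0)
h1 = rng.standard_normal((3, 5, 4))
out = mixer_output(h1, rng.standard_normal((4, 8)), rng.standard_normal((8, 4)))
print(out.shape)  # (3, 5, 4)
```

The residual path means that if the feature MLP contributes nothing (for example, zero output weights), the layer passes output feature one through unchanged, which is one reason residual connections help gradient flow in the stacked two-layer module.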

[0130] Step S24: Input the output features into the prediction module, which maps them into future time series prediction values through feature flattening and linear projection, to obtain the prediction results.

[0131] The prediction module is designed to be lightweight and consists of only two steps: feature flattening and linear projection. It maps the multi-scale features fused by the Mixer module, i.e., the output features, to future time series predictions of a specified length. While maintaining efficient computation, it ensures that discriminative time series information can be extracted from high-dimensional embeddings.

[0132] Specifically, the output features are first flattened over their last two dimensions, namely the patch dimension and the feature dimension, converting them into a two-dimensional tensor. The corresponding calculation formula is:

[0133]

[0134] where the left-hand side is the intermediate variable obtained by processing the output features, with the shape given in the formula, and Flatten denotes the flattening operation.

[0135] Flattening concatenates the embedding vectors of all patches within each channel into one high-dimensional feature vector, preserving local temporal patterns at multiple scales and positions. A linear projection layer then maps this high-dimensional feature directly from the flattened dimension to the target prediction length. The corresponding calculation formula is:

[0136]

[0137] where the left-hand side denotes the prediction result and the operator denotes the linear projection layer. The prediction results for all coefficient sequences are obtained in the same way.
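The prediction module reduces to a reshape followed by one matrix product. The NumPy sketch below is illustrative only; the weight shapes follow the description (flattened patch × feature dimension mapped to the prediction length), and `prediction_head` is a hypothetical name.

```python
import numpy as np


def prediction_head(h, w, b):
    """Flatten the patch and feature axes, then project linearly to the horizon.

    h: (channels, num_patches, d_model) Mixer output
    w: (num_patches * d_model, horizon), b: (horizon,)
    returns: (channels, horizon)
    """
    channels = h.shape[0]
    flat = h.reshape(channels, -1)   # concatenate all patch embeddings per channel
    return flat @ w + b


h = np.random.default_rng(0).standard_normal((3, 5, 4))
y = prediction_head(h, np.zeros((20, 12)), np.zeros(12))
print(y.shape)  # (3, 12)
```

The same projection weights are shared by all channels here, consistent with the channel-independent processing described for the Mixer module.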

[0138] Furthermore, step S3 includes:

[0139] Step S31: Using the wavelet basis and decomposition level, fuse the prediction results through the inverse discrete wavelet transform to generate a reconstructed sequence.

[0140] Specifically, this step is the inverse of the wavelet decomposition described above and uses the same wavelet basis and decomposition level. The multi-resolution coefficient sequences predicted by the independent branches are recombined into a complete future time series: the inverse discrete wavelet transform fuses the prediction results into a reconstructed sequence of uniform length. The corresponding calculation formula is as follows:

[0141]

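[0142] where the left-hand side denotes the reconstructed sequence, the reconstruction operator denotes wavelet reconstruction (the inverse discrete wavelet transform), and the remaining terms denote the predicted approximate coefficient sequence at the final decomposition level and the predicted detail coefficient sequences at each level.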
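To make the decomposition/reconstruction loop concrete, the sketch below implements a multi-level Haar DWT and its inverse in NumPy and checks perfect reconstruction. The Haar basis and the requirement that the length be divisible by 2**level are simplifying assumptions; the patent leaves the wavelet basis as a preset choice, and a library such as PyWavelets (`pywt.wavedec` / `pywt.waverec`) would handle other bases and boundary modes. The coefficient order [cA_J, cD_J, ..., cD_1] mirrors the PyWavelets convention.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_wavedec(x, level):
    """Multi-level Haar DWT along the last axis -> [cA_J, cD_J, ..., cD_1].

    Assumes the length is divisible by 2**level (no boundary extension).
    """
    assert x.shape[-1] % (2 ** level) == 0, "length must be divisible by 2**level"
    details = []
    approx = x
    for _ in range(level):
        even, odd = approx[..., 0::2], approx[..., 1::2]
        details.append((even - odd) / SQRT2)   # detail (high-pass) coefficients
        approx = (even + odd) / SQRT2          # approximation (low-pass) coefficients
    return [approx] + details[::-1]


def haar_waverec(coeffs):
    """Inverse of haar_wavedec: merge approximation and details level by level."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(approx.shape[:-1] + (2 * approx.shape[-1],))
        out[..., 0::2] = (approx + detail) / SQRT2
        out[..., 1::2] = (approx - detail) / SQRT2
        approx = out
    return approx


x = np.random.default_rng(0).standard_normal((2, 96))   # 2 channels, horizon 96
coeffs = haar_wavedec(x, level=3)
print([c.shape[-1] for c in coeffs])          # [12, 12, 24, 48]
print(np.allclose(haar_waverec(coeffs), x))   # True
```

In the forecasting setting, each branch would predict one entry of this coefficient list for the future horizon, and the inverse transform fuses those predicted coefficients into a single series of the full prediction length.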

[0143] Step S32: Apply instance inverse normalization and dimension permutation to the reconstructed sequence to restore the original scale and distribution of the data, yielding the final prediction output, i.e., the prediction sequence.
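Instance normalization and its inverse can be sketched as below, assuming a RevIN-style scheme without learnable affine parameters: per-channel statistics are computed on the input window during preprocessing and reused to rescale the forecast, after which the array is permuted from (channels, horizon) to (horizon, channels). The function names and the `eps` value are illustrative assumptions.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalise each channel of x (channels, length) by its own mean and std."""
    mean = x.mean(axis=-1, keepdims=True)
    std = np.sqrt(x.var(axis=-1, keepdims=True) + eps)
    return (x - mean) / std, mean, std


def instance_denorm(y, mean, std):
    """Undo instance_norm on a (channels, horizon) array, then permute to (horizon, channels)."""
    return (y * std + mean).T


window = np.random.default_rng(0).standard_normal((3, 50)) * 10 + 5
normed, mean, std = instance_norm(window)
restored = instance_denorm(normed, mean, std)
print(restored.shape, np.allclose(restored, window.T))  # (50, 3) True
```

In the full pipeline the array passed to `instance_denorm` would be the reconstructed forecast rather than the normalized input; the round trip here only checks that the two functions are exact inverses.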

[0144] It can be noted that the wavelet reconstruction module ensures a complete closed loop from multi-resolution analysis to full time series prediction, and is a key link connecting the internal representation of the model with the final prediction result.

[0145] Further, step S4 includes:

[0146] Step S41: Based on the real sequence and the predicted sequence, divide them into overlapping local patches along the time dimension, and construct the correlation coefficient loss, variance distribution loss and mean alignment loss in sequence, and integrate them to obtain the patch structure-aware loss function.

[0147] Specifically, given the real sequence and the predicted sequence, both are divided into overlapping local patches along the time dimension, yielding the real-sequence patches and the predicted-sequence patches respectively. The corresponding calculation formulas are:

[0148]

[0149]

[0150] where Patch denotes the patching (block partition) operation.

[0151] Next, correlation coefficient loss, variance distribution loss, and mean alignment loss are constructed sequentially using patches corresponding to the real and predicted sequences. The corresponding calculation formulas are as follows:

[0152]

[0153]

[0154]

[0155] where the symbols denote, respectively: the correlation coefficient loss; the number of patches; the patch index; the Pearson correlation coefficient; the variance distribution loss; the KL divergence; the mean alignment loss; and the means of the real and predicted values within each patch.
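The three patch-level losses can be sketched in NumPy as follows. The correlation loss (1 − Pearson, averaged over patches) and the mean alignment loss (absolute gap between patch means) follow the definitions above. The exact form of the variance distribution loss is not reproduced in the text, so this sketch assumes the patch-wise structural loss convention of a KL divergence between Softmax-normalized, mean-centred patches; treat that term as an assumption. `patch_losses` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def to_patches(x, patch_len, stride):
    # x: (..., length) -> (..., num_patches, patch_len), overlapping windows
    return sliding_window_view(x, patch_len, axis=-1)[..., ::stride, :]


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def patch_losses(y, y_hat, patch_len, stride, eps=1e-8):
    p, q = to_patches(y, patch_len, stride), to_patches(y_hat, patch_len, stride)
    dp = p - p.mean(axis=-1, keepdims=True)
    dq = q - q.mean(axis=-1, keepdims=True)
    # Correlation loss: 1 - Pearson correlation, averaged over patches.
    corr = (dp * dq).sum(-1) / (np.sqrt((dp**2).sum(-1) * (dq**2).sum(-1)) + eps)
    l_corr = np.mean(1.0 - corr)
    # Variance-distribution loss (assumed form): KL between softmax of centred patches.
    sp, sq = softmax(dp), softmax(dq)
    l_var = np.mean((sp * (np.log(sp + eps) - np.log(sq + eps))).sum(-1))
    # Mean alignment loss: absolute gap between patch means.
    l_mean = np.mean(np.abs(p.mean(-1) - q.mean(-1)))
    return l_corr, l_var, l_mean


y = np.sin(np.linspace(0, 4 * np.pi, 64))
print(patch_losses(y, y + 1.0, patch_len=8, stride=4))  # corr and var ~0, mean ~1
```

The example isolates what each term measures: a constant offset leaves the shape of every patch intact, so the correlation and variance terms stay near zero, and only the mean alignment term reacts to the shift.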

[0156] Note that fixed weights replace the dynamic weights used in the original patch-structure-aware loss. An initial training run is performed with dynamic weights, during which the trend and converged value of each loss term's weight are recorded. Based on these experimental results, a set of fixed weight coefficients is determined through fine-tuning and kept unchanged in subsequent training. Fixed weights avoid the extra gradient computations and potential instability that dynamic weight adjustment introduces during training, and they reduce dependence on specific gradient response patterns; this makes the behavior of the loss function more predictable and eases consistent comparison and transfer learning across datasets and tasks. The patch-structure-aware loss therefore combines the correlation coefficient loss, variance distribution loss and mean alignment loss through fixed weight coefficients, and the corresponding calculation formula is:

[0157]

[0158] where the left-hand side denotes the patch-structure-aware loss function, and the three coefficients are fixed weighting coefficients subject to the constraint given in the formula.

[0159] Step S42: The mean squared error loss function is combined with the patch structure-aware loss function to establish the total training loss function, which is used to optimize prediction accuracy and structural consistency.

[0160] Specifically, the mean squared error (MSE) loss function quantifies the prediction error as the average of the squared differences between the predicted and actual values; it is combined with the patch-structure-aware loss function to form the total training loss function. The corresponding calculation formula is as follows:

[0161]

[0162] where the left-hand side denotes the total training loss function and the coefficient is an adjustable fusion coefficient.
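The fixed-weight combination and the total training loss can be sketched as two small functions. The weight values (0.5, 0.25, 0.25) and the fusion coefficient 0.5 are hypothetical placeholders, not the values fixed in the patent, which are chosen after an initial dynamic-weight run.

```python
import numpy as np


def ps_loss(l_corr, l_var, l_mean, weights=(0.5, 0.25, 0.25)):
    """Fixed-weight patch-structure-aware loss.

    The weights are hypothetical placeholders (here summing to 1); the
    patent fixes its own values after an initial dynamic-weight run.
    """
    a, b, c = weights
    return a * l_corr + b * l_var + c * l_mean


def total_loss(y, y_hat, l_ps, lam=0.5):
    """MSE plus the patch-structure-aware loss scaled by an adjustable coefficient."""
    mse = float(np.mean((y - y_hat) ** 2))
    return mse + lam * l_ps


print(total_loss(np.zeros(4), np.ones(4), l_ps=ps_loss(2.0, 2.0, 2.0)))  # 2.0
```

Keeping the weights as plain constants means the loss adds no trainable parameters or extra gradient paths, which is the stability and reproducibility argument made above for fixed weights.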

[0163] On top of the MSE objective, the correlation coefficient loss, variance distribution loss and mean alignment loss work together to strengthen the joint modeling of short-term fluctuations, local statistical distributions and medium- to long-term trends, improving the structural fidelity and overall consistency of the predicted sequence as well as the robustness and generalization ability of the model.

[0164] In summary, the explicit wavelet decomposition module decouples the preprocessed multivariate time series into multi-resolution components, laying a structured multi-scale foundation for the subsequent parallel branches. Multiple independent branches process the coefficient sequences in parallel to obtain the corresponding prediction results. Within each branch, local feature enhancement strengthens the extraction of local temporal features and suppresses noise; multi-scale block adaptive fusion combines information at different temporal granularities; and cross-time and cross-variable interaction exchanges information efficiently between the time dimension and the variable dimension, with differentiated weight allocation strengthening the modeling of cross-variable dependencies. By optimizing the modeling strategy and computational path, the overall method reduces computational redundancy while improving its ability to capture multi-scale dynamic patterns. It provides more accurate, robust and efficient predictions in complex multivariate time series tasks and is particularly suitable for practical scenarios with high noise, non-stationarity and coexisting multi-scale features.

[0165] To illustrate and verify the reliability of the proposed multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization (WDMPM), eight advanced models that have performed well in multivariate time series prediction in recent years were selected as baselines for comparative experiments. (1) WPMixer (Wavelet Patch Mixer) decomposes the original sequence into sub-sequences of different frequencies through the wavelet transform and then models the features of each sub-sequence with a mixer network. (2) Fredformer (Frequency Debiased Transformer) introduces a frequency-domain bias correction into the attention mechanism to counter frequency bias in time series data. (3) iTransformer adopts an inverted structure combined with channel-wise feature modeling to reduce information redundancy between the channels of time series data. (4) Sensorformer (Sensor-aware Transformer) proposes a two-stage cross-patch attention mechanism that first compresses global patch information into a low-dimensional representation and then extracts cross-variable and cross-temporal dependencies simultaneously. (5) PatchTST (Patch-based Time Series Transformer) divides long time series into fixed-length patches and combines them with channel-independent self-attention, significantly reducing the computational complexity of long-sequence prediction. (6) MSGNet (Multi-Scale Graph Network) extracts frequency features of the sequence through a frequency-domain transformation and uses adaptive graph convolution to capture the dynamic dependencies of the time series. (7) TimesNet (Time-Series Network) maps one-dimensional time series into two-dimensional tensors and uses two-dimensional CNNs to capture local and global features. (8) DLinear (Decomposition-Linear) decomposes the original sequence into trend and residual components, models each with a linear layer, and fuses them for the output. Together these models represent state-of-the-art approaches based on wavelet decomposition, frequency-domain processing, Transformer variants, multi-scale graph networks and linear models.

[0166] Mean squared error (MSE) and mean absolute error (MAE) were used as quantitative measures of prediction accuracy; lower values indicate higher accuracy. MSE is more sensitive to large errors, while MAE more directly reflects the average deviation of the predictions. Table 1 shows the quantitative evaluation results.
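The different sensitivities of the two metrics can be shown with a toy example (illustrative numbers only, unrelated to Table 1): two error patterns with the same total absolute error receive the same MAE, but MSE penalizes the single large error four times as heavily.

```python
import numpy as np


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


y = np.zeros(4)
spread = np.array([1.0, 1.0, 1.0, 1.0])   # four moderate errors
spike = np.array([0.0, 0.0, 0.0, 4.0])    # one large error
print(mae(y, spread), mae(y, spike))      # 1.0 1.0
print(mse(y, spread), mse(y, spike))      # 1.0 4.0
```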

[0167] Table 1 Quantitative evaluation results

[0168]

[0169] Experimental results on five public datasets show that the prediction accuracy of this invention is superior to that of existing mainstream methods: it achieves the best overall performance on four datasets (ETTh1, ETTh2, ETTm1 and Weather), and the best result in 20 of the 25 MSE metrics and in 20 of the MAE metrics, demonstrating its overall advantage in prediction accuracy.

[0170] Although the margin over the second-best method varies with the dataset and prediction length, this invention shows a stable advantage in most cases. On the ETT12 dataset, for example, its average MSE is 3.05% lower and its average MAE 2.29% lower than those of the second-best model, WPMixer; on the Weather dataset with a prediction length of 96, its MSE is 5.63% lower than that of Fredformer. These results indicate that the proposed method keeps a consistent lead on the ETT series datasets and achieves clear accuracy gains on complex multivariate prediction tasks such as Weather, supporting its effectiveness for multivariate time series prediction.

[0171] Compared with WPMixer, which is also based on wavelet decomposition, this invention reduces the computational cost on the ETTm1 dataset from 12.4292 to 5.3327 GFLOPs (billions of floating-point operations) at a prediction length of 192, a reduction of approximately 57.1%, and from 23.6044 to 10.1108 GFLOPs at a prediction length of 720, a reduction of approximately 57.2%. This efficiency gain makes the invention more practical for long sequences and high-dimensional time series, enabling efficient prediction with limited computing resources.
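The reported percentages follow directly from the GFLOPs figures above; the short check below recomputes them as relative reductions, (before − after) / before.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# GFLOPs figures reported above for WPMixer vs. this invention on ETTm1.
for horizon, before, after in [(192, 12.4292, 5.3327), (720, 23.6044, 10.1108)]:
    print(horizon, f"{100 * relative_reduction(before, after):.1f}%")
# 192 57.1%
# 720 57.2%
```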

[0172] This invention uses the discrete wavelet transform to decompose time series data explicitly, separating multi-scale components such as short-term fluctuations and long-term trends. A parallel multi-branch architecture then enhances local features through preprocessing convolution, achieves adaptive scale fusion through attention-weighted multi-scale block embedding, and uses a two-layer Mixer module to exchange information efficiently across the time and feature dimensions. The patch-structure-aware loss further constrains the consistency between the predicted sequence and the real data in local structure and statistical distribution. The proposed WDMPM shows clear advantages in explicit multi-scale feature modeling and adaptive fusion; especially in complex multi-dimensional time series scenarios, its prediction accuracy and robustness significantly outperform those of existing time series prediction methods.

[0173] A second embodiment of the present invention provides an electronic device, which includes a processor, a communication interface, a memory, and a communication bus. The processor, the communication interface, and the memory communicate with each other through the communication bus. The processor calls logical instructions in the memory to execute a multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization as described in any embodiment of the present invention.

[0174] In operation, the device executes the multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization. Therefore, whether the device and program data are integrated, or different hardware is configured to achieve a function with an effect similar to that of the present invention, it falls within the protection scope of the present invention. The device has the same beneficial effects as the aforementioned method, which are not repeated here.

[0175] It should be noted that the order of the above embodiments of the present invention is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. The processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0176] The various embodiments in this specification are described in a progressive manner. The same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on describing the differences from other embodiments.

Claims

1. A multivariate time series prediction method based on a hybrid wavelet decomposition and multi-scale block partitioning, characterized in that, The method includes: The preprocessed multivariate time series data is input into the wavelet decomposition module for explicit multi-resolution decomposition, resulting in an approximate coefficient sequence and a detail coefficient sequence, which correspond to information components at different time scales. Multiple independent branches are constructed, and each coefficient sequence is input into the independent branch for parallel processing. Local feature enhancement, multi-scale block adaptive fusion, cross-time and cross-variable interaction, and future value prediction are realized in sequence to obtain the prediction results of each independent branch. All prediction results are integrated and input into the wavelet reconstruction module for inverse transformation and fusion to generate a prediction sequence; Given a real sequence, a patch structure-aware loss function is constructed by combining it with the predicted sequence, and the mean squared error is used for joint supervised training to optimize prediction accuracy and structural consistency.

2. The multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization according to claim 1, characterized in that, The preprocessed multivariate time-series data is input into the wavelet decomposition module for explicit multi-resolution decomposition, yielding approximate coefficient sequences and detail coefficient sequences, which correspond to information components at different time scales, including: Based on multivariate time-series data, instance normalization and dimension permutation are performed sequentially. By pre-setting the wavelet basis and decomposition level, the preprocessed multivariate time series data is explicitly decomposed into multiple resolutions using the wavelet decomposition module, resulting in an approximate coefficient sequence and multiple detail coefficient sequences, corresponding to information components at different time scales.

3. The multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization according to claim 1, characterized in that, The number of independent branches corresponds to the coefficient sequence, and each independent branch has the same structure, which includes a preprocessing convolution module, a multi-scale block embedding module, a two-layer Mixer module that introduces convolution branches, and a prediction module arranged in sequence.

4. The multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization according to claim 3, characterized in that, Each coefficient sequence is input into an independent branch for parallel processing, sequentially implementing local feature enhancement, multi-scale block adaptive fusion, cross-time and cross-variable interaction, and future value prediction, yielding the prediction results for each independent branch, including: The coefficient sequence is input into the preprocessing convolution module, and is processed sequentially through the convolutional backbone and the gated weight generator to obtain enhanced features; The enhanced features are input into the multi-scale block embedding module, divided into multiple patches, and projected features are obtained through feature mapping. Attention weights are generated to adaptively fuse the projected features to obtain a multi-scale embedding representation. The multi-scale embedding representation is input into a two-layer Mixer module, and a parallel feature interaction mechanism is designed to fuse the time dimension and the feature dimension to obtain the output feature. The output features are input into the prediction module, and the output features are mapped to future time series prediction values through feature flattening and linear projection to obtain the prediction results.

5. The multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization according to claim 4, characterized in that, The coefficient sequence is input into the preprocessing convolution module, and is processed sequentially through the convolutional backbone and the gated weight generator to obtain enhanced features, including: Local temporal features are extracted from the coefficient sequence by depthwise convolution; after batch normalization and an activation function, initial features are obtained by pointwise convolution. The initial features are average-pooled and input into the gated weight generator, which generates gate weights together with an activation function. The enhanced features are obtained by fusing the initial features and the gate weights through a gated residual connection.

6. The multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization according to claim 4, characterized in that, The enhanced features are input into the multi-scale block embedding module, divided into multiple patches, and projected features are obtained through feature mapping. Attention weights are then generated to adaptively fuse the projected features to obtain a multi-scale embedding representation, including: Preset the patch length and patch sliding step size, perform patch division operation, determine the number of patches, and obtain multiple patches; Projected features are obtained by performing feature mapping on each patch at multiple time scales; Attention weights are generated by assigning a corresponding scalar attention score to each projected feature through a shared MLP. Multi-scale embedding representations are obtained by adaptively fusing projection features and attention weights.

7. The multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization according to claim 4, characterized in that, Multi-scale embedding representations are input into a two-layer Mixer module. A parallel feature interaction mechanism is designed to fuse the temporal and feature dimensions to obtain output features, including: The two-layer Mixer module adopts a hybrid structure of token mixing and feature mixing; Based on the token mixing structure, long-range and local dependencies of the multi-scale embedding representations are captured through fully connected layers and convolutional layers respectively, and output feature one is obtained by adaptive fusion through a learnable gating mechanism. Based on the feature fusion structure, after permuting the dimensions of output feature one and performing batch normalization, the different feature dimensions are interacted and recombined to obtain output feature two. Output feature one and output feature two are residually connected to obtain the output feature.

8. The multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization according to claim 2, characterized in that, All prediction results are integrated into the wavelet reconstruction module for inverse transform and fusion to generate a prediction sequence, including: Using the wavelet basis and decomposition level, the prediction results are fused through the inverse discrete wavelet transform to generate a reconstructed sequence. The reconstructed sequence is subjected to instance inverse normalization and dimension permutation to obtain the predicted sequence.

9. The multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization according to claim 1, characterized in that, Given a real sequence, a patch structure-aware loss function is constructed by combining it with the predicted sequence, and joint supervised training is performed using mean squared error to optimize prediction accuracy and structural consistency, including: Based on the real sequence and the predicted sequence being divided into overlapping local patches along the time dimension, correlation coefficient loss, variance distribution loss and mean alignment loss are constructed in sequence and integrated to obtain the patch structure-aware loss function. The mean squared error loss function is combined with the patch structure-aware loss function to establish the total training loss function, which is used to optimize prediction accuracy and structural consistency.

10. An electronic device, characterized in that, The device includes a processor, a communication interface, a memory, and a communication bus. The processor, the communication interface, and the memory communicate with each other through the communication bus. The processor calls logical instructions in the memory to execute the multivariate time series prediction method based on wavelet decomposition and multi-scale block hybridization as described in any one of claims 1 to 9.