Lightweight long-term time series prediction method and system for industrial time series data
By using adaptive multi-scale decomposition and geometric sparse attention mechanism to process industrial time series data, this method solves the problem of insufficient prediction performance of traditional methods in complex scenarios, and achieves high-precision long-term multivariate time series prediction, which is applicable to fields such as energy monitoring and industrial scheduling.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HANGZHOU NORMAL UNIVERSITY
- Filing Date
- 2026-05-21
- Publication Date
- 2026-06-19
AI Technical Summary
Existing industrial time series forecasting methods are difficult to effectively handle modern industrial data that is high-dimensional, nonlinear, has abrupt change points, or has long-term dependence. The prediction performance of traditional methods drops significantly in complex scenarios, and deep learning models are insufficient in dealing with the complexity, time-varying nature, and interpretability of industrial time series data.
An adaptive multi-scale decomposition module is adopted, which combines stacked stationary wavelet transform and geometric sparse attention mechanism. Through multi-scale decomposition and sparsification, trend and residual features are extracted. Discrete Fourier transform and one-dimensional convolution are combined to extract the correlation between variables, thus constructing a lightweight industrial time series prediction model.
It improves the accuracy of long-term multivariate time series forecasting, can capture complex fine-grained correlations between variables, and is applicable to fields such as energy monitoring and industrial scheduling. It has good practicality and promotion prospects and is suitable for integration into existing forecasting systems.
Figure CN122241386A_ABST
Description
Technical Field
[0001] This invention belongs to the field of industrial time series forecasting technology, specifically relating to a lightweight long-term time series forecasting method and system for industrial time series data.
Background Technology
[0002] Modern industrial systems are rapidly developing towards high informatization, integration, and intelligence. In various industrial production processes, such as high-end equipment manufacturing, petrochemicals, and semiconductor production, sensor networks are constantly collecting massive amounts of time-series operational parameter data, forming a vast amount of industrial time-series data. In this process, hundreds of millions of sensors are deployed in every corner of the industrial site, forming the "nerve endings" of the industrial system, continuously generating TB to PB levels of time-series data. This data—from the vibration spectrum of rotating equipment to the temperature curves of reactors, from the cycle time of production lines to the real-time power consumption of energy—is not only an operational record but also a "digital mine" for understanding equipment lifespan, process bottlenecks, energy efficiency shortcomings, and key aspects of quality control. This data not only contains a complete picture of equipment operating status but also harbors crucial information predicting future trends. How to accurately extract patterns and predict the future from this data has become one of the core challenges driving the intelligent upgrading of industry. Industrial time-series prediction technology has emerged in this context and become a current research hotspot, and it is already closely related to daily life.
[0003] Traditional time series forecasting methods are primarily based on mathematical statistics. Classic methods, such as autoregressive moving average models and their seasonal extensions, have been industry standard tools for decades by modeling the linear dependencies of time series. These methods offer advantages such as clear theory and strong interpretability, but their linear assumptions, strict requirements for data stationarity, and limitations of manual feature engineering make them ill-suited to increasingly complex real-world scenarios. Especially when dealing with high-dimensional, nonlinear, abrupt, or long-period dependent modern industrial and internet data, the predictive performance of traditional methods often declines significantly. The rise of machine learning has brought a new paradigm to time series forecasting. Support vector regression captures nonlinear relationships through kernel function techniques, while ensemble learning methods such as gradient boosting and decision trees can model complex feature interactions. While these methods improve nonlinear fitting capabilities, they essentially still process time series data into independent, identically distributed feature-label pairs, failing to fully exploit the inherent temporal dynamic dependencies between data points. The limitations of this modeling approach become increasingly apparent, particularly when dealing with long series, multi-period superposition, or scenarios influenced by external variables.
[0004] In recent years, deep learning technology, especially neural network architectures specifically designed for sequential data, has greatly promoted the development of the field of time series prediction. Long Short-Term Memory (LSTM) networks and gated recurrent units (GRUs), through their gating mechanisms, effectively address the problem of long-term dependency learning, becoming milestone models for handling sequence prediction tasks. Temporal convolutional networks employ causal dilated convolution structures, achieving efficient parallel computation while maintaining temporal order constraints. Meanwhile, the Transformer architecture and its variants, originating from natural language processing, capture global dependencies within sequences through self-attention mechanisms, demonstrating breakthrough performance in long sequence prediction tasks. Models such as Informer and Autoformer, through innovative mechanisms such as ProbSparse self-attention and sequence decomposition architectures, have significantly improved the accuracy and efficiency of long-term time series prediction.
Summary of the Invention
[0005] The purpose of this invention is to provide a lightweight long-term time series forecasting method and system for industrial time series data.
[0006] In a first aspect, the present invention provides a lightweight long-term time series forecasting method for industrial time series data, the method comprising:
[0007] Obtain historical observations of the target object containing different variables, and preprocess the historical observations;
[0008] An industrial time-series data prediction model is constructed. This model includes an adaptive multi-scale decomposition module, a trend term feature extraction module, a residual term feature extraction module, and a fusion module. The adaptive multi-scale decomposition module splits the input features into trend terms and residual terms. The trend term feature extraction module extracts trend term features from the trend terms. The residual term feature extraction module incorporates a geometrically sparse attention module to process the residual terms. The residual term feature extraction module extracts the correlation between variables in the residual terms across different frequency bands using the geometrically sparse attention module to obtain residual term features. The fusion module fuses the trend term features and residual term features to obtain the output of the industrial time-series data prediction model.
[0009] The preprocessed historical observations are input into the industrial time series data prediction model to obtain the variables of the measured target at future times, thus completing the prediction of industrial time series data.
[0010] Preferably, in the geometric sparse attention module, the norm squared tensors of the value matrix and the key matrix, as well as the similarity matrix between them, are obtained respectively; the wedge product norm is obtained based on the two norm squared tensors and the similarity matrix, and the similarity matrix and the wedge product norm are weighted and fused to obtain a mixed score matrix; the mixed score matrix is sparsified to obtain a sparse score matrix; the sparse score matrix is weighted and normalized, and then multiplied with the query matrix to obtain the attention features output by the geometric sparse attention module.
[0011] Preferably, the wedge product norm $G$ is obtained as follows:
[0012] $$G = \sqrt{N^{V} N^{K} - S^{2}}$$
[0013] where $N^{V}$ is the norm squared tensor of the value matrix; $N^{K}$ is the norm squared tensor of the key matrix; and $S$ is the similarity matrix. The products and the square are taken element-wise for each pair of a value vector and a key vector.
[0014] The similarity matrix is the product of the transpose of the key matrix and the value matrix.
[0015] Preferably, the residual term feature extraction module includes a stacked stationary wavelet transform module, a geometrically sparse attention module, an inverse stacked stationary wavelet transform module, and a linear mapping layer connected in sequence. The stacked stationary wavelet transform module is used to perform stacked stationary wavelet transform on the residual term and obtain multi-scale signal variables based on the decomposition results. The geometrically sparse attention module is used to dynamically focus on key parts of the multi-scale signal variables to obtain attention features. The inverse stacked stationary wavelet transform module is used to inversely transform the attention features into data augmentation features. The linear mapping layer is used to map the data augmentation features to the prediction range to obtain residual term features.
[0016] Preferably, the trend feature extraction module includes a spatiotemporal feature extraction module and a fusion submodule connected in series; the spatiotemporal feature extraction module includes a time domain feature extraction module and a spatial domain feature extraction module;
[0017] In the time-domain feature extraction module, the trend term is transformed from the time domain to the frequency domain to obtain the frequency domain features; complex weight parameters are introduced to modulate the frequency domain features element by element, and the modulated features are restored to the time domain and fused with the trend term through residual connection to obtain the time-domain features;
[0018] In the spatial domain feature extraction module, the spatial domain features are obtained by exchanging the variable dimension and the time dimension of the time domain features and then applying one-dimensional convolution.
[0019] The fusion submodule is used to fuse the time-domain features and the spatial-domain features to obtain the trend term features.
[0020] As a preferred embodiment, in the adaptive multi-scale decomposition module, multiple depthwise separable convolutional layers of different scales are used in parallel to extract input features at different time steps to obtain trend features corresponding to multiple time steps; the trend features corresponding to multiple time steps are fused to obtain a trend term; and the residual term is obtained by subtracting the trend term from the input features.
[0021] Preferably, the trend features are fused by introducing multiple weight parameters; by calculating the Softmax distribution of the weight parameters, the trend features corresponding to multiple time steps are weighted and aggregated to obtain the trend term.
[0022] Preferably, the industrial time-series data prediction model is trained using a dataset constructed from historical observations at different times, and a total loss function including a basic loss term and a regularization term is constructed to guide the updating of model parameters; the basic loss term is used to supervise prediction accuracy; and the regularization term is used to improve the interpretability and sparsity of the geometric sparse attention mechanism.
[0023] Preferably, the preprocessing is as follows: before being input into the industrial time series data prediction model, the historical observations undergo reversible instance normalization, and the normalized observations are then inverted and embedded.
[0024] In a second aspect, the present invention provides a lightweight long-term time series forecasting system for industrial time series data, which is used to perform the above-mentioned lightweight long-term time series forecasting method; the lightweight long-term time series forecasting system includes a data extraction module, a preprocessing module, and an industrial data forecasting module;
[0025] The data extraction module is used to extract historical observation values of different variables of the target under test; the preprocessing module is used to convert the historical observation values into data suitable for input into the industrial data prediction module; the industrial data prediction module is used to predict the variables of the target under test at future times based on the data output by the preprocessing module.
[0026] In a third aspect, the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the aforementioned lightweight long-term time series prediction method.
[0027] In a fourth aspect, the present invention provides a readable storage medium storing a computer program; when executed by a processor, the computer program implements the aforementioned lightweight long-term time series prediction method.
[0028] The beneficial effects of this invention are:
[0029] 1. This invention combines stacked stationary wavelet transform and geometric sparse attention mechanism to extract residual term features from the residual term. The time-domain signal is transformed to the multi-scale frequency domain through stacked stationary wavelet transform, and the geometric sparse attention mechanism combines dot product and wedge product to calculate the geometric correlation between variables, thereby effectively capturing the temporal pattern of the residual term under short-term fluctuations. At the same time, this invention uses Top_k sparsity to process features in the geometric sparse attention module. The Top_k sparsity strategy highlights key dependencies and reduces computational complexity.
[0030] 2. In the adaptive multi-scale decomposition module, this invention uses multiple depthwise separable convolutional layers of different scales to extract trend features under different receptive fields in parallel, and integrates them using a Softmax distribution of weight parameters. This enables the accurate handling of variable specificity in multivariate time series and enhances the model's ability to capture trend changes. Simultaneously, in the time domain, this invention combines discrete Fourier transform and complex weight parameters to extract trend features from the trend term, allowing the model to selectively focus on different frequency components. In the spatial domain, one-dimensional convolution is used to extract the correlation between variables and fuse it with the features extracted in the time domain to obtain the trend term output.
[0031] 3. While ensuring the model's lightweight nature, this invention effectively improves the accuracy of long-term multivariate time series forecasting, particularly excelling at capturing complex fine-grained correlations and multi-scale time series patterns among variables. It can be widely applied in fields such as energy monitoring, industrial scheduling, and traffic flow forecasting. Furthermore, based on the needs of intelligent manufacturing and the actual pain points of industrial sites, this invention addresses the shortcomings of existing deep learning models in handling the complexity, time-varying nature, and interpretability of industrial time series data. Its modular design also facilitates integration into existing forecasting systems or edge computing devices, demonstrating excellent practicality and promising prospects. It not only possesses significant theoretical innovation value but also has urgent practical significance for promoting the process of industrial intelligence.
Attached Figure Description
[0032] Figure 1 is the overall flowchart of the present invention.
[0033] Figure 2 is a framework diagram of the trend term feature extraction module in this invention.
[0034] Figure 3 is a framework diagram of the stacked stationary wavelet transform.
[0035] Figure 4 is a framework diagram of the geometric sparse attention module in this invention.
[0036] Figure 5 is a heatmap showing the correlation between features in this invention.
Detailed Implementation
[0037] The present invention will be further described below with reference to the accompanying drawings.
[0038] A lightweight long-term time series forecasting method for industrial time series data is proposed. The lightweight long-term time series forecasting system includes a data extraction module, a preprocessing module, and an industrial data forecasting module. The data extraction module extracts historical observations of different variables of the measured target. The preprocessing module converts the historical observations into data suitable for input into the industrial data forecasting module. The industrial data forecasting module predicts the variables of the measured target at future times based on the data output from the preprocessing module.
[0039] As shown in Figure 1, this lightweight long-term time series forecasting method includes the following steps:
[0040] Step 1: Building the dataset
[0041] A dataset is constructed from the historical observations $X \in \mathbb{R}^{L \times C}$ of the time series, where $L$ is the length of the input sequence and $C$ is the number of variables. After all variables in the dataset are unified to the same data distribution using RevIN (Reversible Instance Normalization), the normalized data $\hat{X}$ is obtained. It is represented as:
[0042] $$\hat{x}_{t}^{(i)} = \frac{x_{t}^{(i)} - \mu^{(i)}}{\sigma^{(i)}}$$
[0043] where $x_{t}^{(i)}$ is the value of the $i$-th variable at the $t$-th time step in the historical observations, and $\mu^{(i)}$ and $\sigma^{(i)}$ are the mean and standard deviation of the $i$-th variable over all time steps.
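The normalization step and its later inversion can be illustrated with a minimal Python sketch. It assumes that each variable is standardized with its own mean and standard deviation over the time steps; the function names and the small stabilizer `eps` are illustrative assumptions and are not taken from this description.

```python
from statistics import fmean, pstdev


def revin_normalize(series, eps=1e-5):
    """Standardize each variable (column) of a [time][variable] table.

    Returns the normalized table together with the per-variable statistics
    that are needed to undo the transform later. `eps` is an assumed
    stabilizer that avoids division by zero for constant variables.
    """
    columns = list(zip(*series))                    # one tuple per variable
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) + eps for col in columns]   # population std over time
    normalized = [
        [(value - means[i]) / stds[i] for i, value in enumerate(row)]
        for row in series
    ]
    return normalized, means, stds


def revin_denormalize(normalized, means, stds):
    """Map normalized values back to the original data distribution."""
    return [
        [value * stds[i] + means[i] for i, value in enumerate(row)]
        for row in normalized
    ]


# Four time steps of two variables with very different value ranges.
history = [[10.0, 200.0], [12.0, 220.0], [14.0, 240.0], [16.0, 260.0]]
normalized, means, stds = revin_normalize(history)
restored = revin_denormalize(normalized, means, stds)
```

Keeping `means` and `stds` next to the normalized data is what makes the transform reversible: the same statistics are reused when the model output is mapped back to the original scale.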
[0044] Step 2: Data Embedding
[0045] The normalized historical observations $\hat{X}$ are inverted and embedded to obtain the high-dimensional time series data $H$. It is represented as:
[0046] $$H = \mathrm{Embedding}\big(\hat{X}^{\top}\big)$$
[0047] where $\mathrm{Embedding}(\cdot)$ represents a mapping whose output has a high-dimensional size $D$ for each variable.
[0048] Step 3: Construct an industrial time-series data prediction model
[0049] The industrial time-series data prediction model includes an Adaptive Multi-Scale Decomposition (AMSD) module, a trend term feature extraction module, a residual term feature extraction module, and a fusion module. The AMSD module decomposes the input features into trend and residual terms. The trend and residual term feature extraction modules extract trend and residual features from the trend and residual terms, respectively. The fusion module combines the trend and residual features to obtain the output of the industrial time-series data prediction model.
[0050] (1) Adaptive multi-scale decomposition module
[0051] In the adaptive multi-scale decomposition module, multiple depthwise separable convolutional layers with different scales (kernel sizes) are used to extract, in parallel, the trend features $T_{k}$ under different receptive fields. The channel-independent convolutional design accurately handles the variable specificity of multivariate time series and enhances the model's ability to capture trend changes. The trend features are represented as:
[0052] $$T_{k}^{(i)} = \mathrm{DWConv}_{k}\big(H^{(i)}\big), \quad k \in \mathcal{K} = \{k_{1}, \dots, k_{N}\}$$
[0053] where $T_{k}^{(i)}$ is the trend feature corresponding to the $i$-th variable; $H^{(i)}$ is the high-dimensional time series data of the $i$-th variable; $\mathrm{DWConv}_{k}$ represents a depthwise separable convolutional layer; $k$ indicates the kernel size, with different kernel sizes corresponding to different time scales; $N$ represents the number of depthwise separable convolutional layers; and $\mathcal{K}$ represents the set of convolutional kernel sizes.
[0054] To integrate trend features across different time scales, a set of trainable weight parameters $w_{k}$ is introduced. The trend term $X_{\text{trend}}$ is obtained by weighting and aggregating the trend features extracted at each scale using the Softmax distribution of the weight parameters. It is represented as:
[0055] $$X_{\text{trend}} = \sum_{k \in \mathcal{K}} \frac{\exp(w_{k})}{\sum_{j \in \mathcal{K}} \exp(w_{j})} \, T_{k}$$
[0056] where $w_{k}$ and $w_{j}$ are weight parameters; $\mathcal{K}$ is the set of kernel sizes; and $T_{k}$ is the trend feature for kernel size $k$, aggregating the trend features of all variables.
[0057] After the trend term $X_{\text{trend}}$ is obtained, the residual term $X_{\text{res}}$ is obtained by subtracting the trend term from the high-dimensional time series data $H$. It is represented as:
[0058] $$X_{\text{res}} = H - X_{\text{trend}}$$
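The decomposition into a trend term and a residual term can be sketched for a single variable as follows. This is only an illustration under stated assumptions: fixed moving-average kernels stand in for the learned depthwise separable convolutions, edge values are replicated for padding, and the names `moving_average`, `decompose`, and `scale_logits` are hypothetical.

```python
import math


def moving_average(series, kernel_size):
    """Same-length moving average with edge replication (odd kernel_size).

    Stands in for one learned depthwise convolution over a single variable.
    """
    half = kernel_size // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[t:t + kernel_size]) / kernel_size for t in range(len(series))]


def softmax(logits):
    """Numerically stable Softmax over a list of scalars."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decompose(series, kernel_sizes, scale_logits):
    """Split one variable into (trend, residual) using several time scales."""
    weights = softmax(scale_logits)               # one weight per kernel size
    per_scale = [moving_average(series, k) for k in kernel_sizes]
    trend = [
        sum(w * scale[t] for w, scale in zip(weights, per_scale))
        for t in range(len(series))
    ]
    residual = [x - m for x, m in zip(series, trend)]
    return trend, residual


# A rising line with an alternating fluctuation on top of it.
signal = [0.5 * t + (1.0 if t % 2 else -1.0) for t in range(12)]
trend, residual = decompose(signal, kernel_sizes=[3, 5, 7], scale_logits=[0.0, 0.0, 0.0])
```

Because the residual is defined by subtraction, the trend and the residual always add back to the input, whichever kernels and weights are used.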
[0059] (2) Trend Item Feature Extraction Module
[0060] As shown in Figure 2, the trend term feature extraction module includes a cascaded spatiotemporal feature extraction module and a fusion submodule. The spatiotemporal feature extraction module includes a time domain feature extraction module and a spatial domain feature extraction module. In the time domain feature extraction module, the discrete Fourier transform is used to transform the trend term $X_{\text{trend}}$ from the time domain to the frequency domain to obtain the frequency domain features $F$. It is represented as:
[0061] $$F = \mathrm{DFT}(X_{\text{trend}})$$
[0062] where $\mathrm{DFT}(\cdot)$ represents the discrete Fourier transform.
[0063] To enable the model to selectively focus on different frequency components, a set of learnable complex weight parameters $W_{f}$ is introduced. These weights are independent of the variable channels and shared across all channels, and are used to modulate the frequency domain features $F$ element by element. The modulated frequency domain features $\tilde{F}$ are represented as:
[0064] $$\tilde{F} = W_{f} \odot F$$
[0065] where $W_{f}$ is the complex weight parameter and $\odot$ indicates element-wise multiplication.
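[0066] The modulated frequency domain features $\tilde{F}$ are restored to the time domain through the inverse discrete Fourier transform, and a residual connection with the trend term $X_{\text{trend}}$ is introduced to preserve the original time domain information, thus obtaining the time domain features $X_{\text{time}}$. It is represented as:
[0067] $$X_{\text{time}} = \mathrm{LayerNorm}\big(X_{\text{trend}} + \mathrm{IFFT}(\tilde{F})\big)$$
[0068] where $\mathrm{LayerNorm}(\cdot)$ represents layer normalization and $\mathrm{IFFT}(\cdot)$ represents the inverse fast Fourier transform.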
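The time domain feature extraction described above (transform to the frequency domain, element-wise modulation by complex weights, inverse transform, residual connection) can be sketched as follows. The sketch makes several assumptions: it uses a direct O(n²) discrete Fourier transform instead of a fast transform, omits layer normalization, keeps only the real part of the restored signal, and uses fixed rather than learned weights. The function names are illustrative.

```python
import cmath


def dft(series):
    """Discrete Fourier transform of a sequence (direct O(n^2) evaluation)."""
    n = len(series)
    return [
        sum(series[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def idft(spectrum):
    """Inverse discrete Fourier transform (direct O(n^2) evaluation)."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
        for t in range(n)
    ]


def modulate_in_frequency(trend, complex_weights):
    """Scale each frequency bin by a complex weight and add a residual connection."""
    spectrum = dft(trend)
    modulated = [w * c for w, c in zip(complex_weights, spectrum)]
    restored = idft(modulated)
    # Residual connection with the input. Only the real part is kept; the
    # imaginary part vanishes when the weights preserve conjugate symmetry.
    return [x + r.real for x, r in zip(trend, restored)]


trend = [1.0, 2.0, 3.0, 4.0]
# Keeping only the zero-frequency bin passes just the mean of the signal.
keep_mean_only = [1 + 0j, 0j, 0j, 0j]
time_features = modulate_in_frequency(trend, keep_mean_only)
all_pass = modulate_in_frequency(trend, [1 + 0j] * 4)
```

With `keep_mean_only`, the restored signal is the constant mean of the input, so the output is the input shifted by its mean; with all-ones weights the frequency path reproduces the input and the residual connection doubles it.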
[0069] In the spatial domain feature extraction module, the variable dimension and the time dimension of the time domain features $X_{\text{time}}$ are exchanged, and a one-dimensional convolution operation is then applied to obtain the spatial domain features $X_{\text{space}}$. It is represented as:
[0070] $$X_{\text{space}} = \mathrm{Conv1d}\big(X_{\text{time}}^{\top}\big)$$
[0071] where $\mathrm{Conv1d}(\cdot)$ represents a one-dimensional convolution operation.
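[0072] In the fusion submodule, the variable dimension and the time dimension of the spatial domain features $X_{\text{space}}$ are exchanged back, and the result is fused with the time domain features $X_{\text{time}}$ through a residual connection to obtain the trend term features $Y_{\text{trend}}$. It is represented as:
[0073] $$Y_{\text{trend}} = \mathrm{LayerNorm}\big(X_{\text{time}} + X_{\text{space}}^{\top}\big)$$
[0074] where $\mathrm{LayerNorm}(\cdot)$ represents layer normalization.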
[0075] (3) Residual Term Feature Extraction Module
[0076] The residual term feature extraction module includes a stacked stationary wavelet transform (SSWT) module, a geometrically sparse attention module, an inverse stacked stationary wavelet transform (ISSWT) module, and a linear mapping layer. The stacked stationary wavelet transform module performs a stacked stationary wavelet transform on the residual term and obtains multi-scale signal variables from the decomposition results. The geometrically sparse attention module dynamically focuses on key components of the multi-scale signal variables to obtain attention features. The inverse stacked stationary wavelet transform module inversely transforms the attention features into data augmentation features (time series data). The linear mapping layer maps the data augmentation features to the prediction range to obtain residual term features.
[0077] As shown in Figure 3, in the stacked stationary wavelet transform module, the stacked stationary wavelet transform is used to decompose the residual term $X_{\text{res}}$, and all detail coefficients $d_{1}, \dots, d_{J}$ obtained from the decomposition are stacked with the approximation coefficients $a_{J}$ from the last decomposition. The stacking operation adds a new dimension, resulting in the multi-scale signal variables $Z$, represented as:
[0078] $$Z = \mathrm{Stack}(d_{1}, \dots, d_{J}, a_{J}), \quad (d_{1}, \dots, d_{J}, a_{J}) = \mathrm{SSWT}(X_{\text{res}};\, h, g)$$
[0079] where $\mathrm{SSWT}(\cdot)$ represents the stacked stationary wavelet transform; $h$ and $g$ are learnable low-pass and high-pass filters of a given kernel size; $\mathrm{Stack}(\cdot)$ represents the stacking operation; and $J$ is the decomposition scale.
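The stacking of detail and approximation coefficients can be illustrated with an undecimated Haar transform. This is a sketch under explicit assumptions: fixed Haar averaging and differencing filters replace the learnable filters, boundaries are handled circularly, and the inverse is included only to show that the stacked bands retain all information. The names `haar_swt` and `haar_iswt` are hypothetical.

```python
def haar_swt(series, levels):
    """Undecimated (a trous) Haar transform with circular boundary handling.

    Returns a stack of `levels` detail bands followed by the final
    approximation band; every band has the same length as the input.
    """
    n = len(series)
    approx = list(series)
    bands = []
    for level in range(levels):
        step = 2 ** level                         # filter dilation doubles per level
        shifted = [approx[(t + step) % n] for t in range(n)]
        detail = [(a - b) / 2 for a, b in zip(approx, shifted)]
        approx = [(a + b) / 2 for a, b in zip(approx, shifted)]
        bands.append(detail)
    bands.append(approx)
    return bands


def haar_iswt(bands):
    """Invert haar_swt: approximation plus detail restores the previous level."""
    approx = bands[-1]
    for detail in reversed(bands[:-1]):
        approx = [a + d for a, d in zip(approx, detail)]
    return approx


residual = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.0, 2.0]
bands = haar_swt(residual, levels=3)              # 3 detail bands + 1 approximation
reconstructed = haar_iswt(bands)
```

Since no downsampling takes place, all bands keep the input length, which is what allows them to be stacked along a new dimension and passed to one attention head per scale.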
[0080] As shown in Figure 4, the geometrically sparse attention module adopts a one-to-one "scale-attention head" design and takes the multi-scale signal variables $Z$ as input. For each attention head, the squared L2 norms of the value matrix $V$ and the key matrix $K$ and the similarity matrix $S$ between them are obtained. It is represented as:
[0081] $$N^{V}_{i} = \lVert v_{i} \rVert_{2}^{2}, \quad N^{K}_{j} = \lVert k_{j} \rVert_{2}^{2}$$
[0082] $$S = K^{\top} V$$
[0083] where $N^{V}$ and $N^{K}$ are the norm squared tensors corresponding to the value matrix $V$ and the key matrix $K$; $v_{i}$ is the $i$-th vector of the value matrix; and $k_{j}$ is the $j$-th vector of the key matrix.
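[0084] To extract correlations between variables, this invention combines the dot product with the wedge product. The wedge product norm $G$ is calculated from the norm squared tensor $N^{V}$, the norm squared tensor $N^{K}$, and the similarity matrix $S$; the similarity matrix $S$ and the wedge product norm $G$ are then weighted and fused to obtain the mixed score matrix $M$. It is represented as:
[0085] $$G = \sqrt{N^{V} N^{K} - S^{2}}$$
[0086] $$M = \alpha S + \beta G$$
[0087] where $\alpha$ and $\beta$ are weighting coefficients, and the products and the square in $G$ are taken element-wise for each pair of a value vector and a key vector.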
[0088] Top_k sparsification is applied to the mixed score matrix $M$ (for each score dimension of the mixed score matrix, only the $k$ highest scores are retained) to obtain the sparse score matrix $\tilde{M}$. The Softmax function and a Dropout layer are then applied to the sparse score matrix $\tilde{M}$ for weight normalization and to prevent overfitting, yielding the attention weight matrix $A$. Multiplying the attention weight matrix $A$ by the query matrix $Q$ yields the attention features $O$. The formulas are expressed as follows:
[0089] $$\tilde{M}_{ij} = \begin{cases} M_{ij}, & M_{ij} \in \mathrm{TopK}(M_{i}, k) \\ -\infty, & \text{otherwise} \end{cases}$$
[0090] $$A = \mathrm{Dropout}\big(\mathrm{Softmax}(\tilde{M})\big)$$
[0091] $$O = A Q$$
[0092] where $\tilde{M}_{ij}$ are the elements of the sparse score matrix; $M_{ij}$ is the score in the $i$-th row and $j$-th column of the mixed score matrix $M$; $\mathrm{TopK}(\cdot, k)$ indicates taking the $k$ highest scores; and $Q$ is the query matrix.
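The scoring, sparsification, and normalization steps of the geometrically sparse attention module can be sketched in plain Python. Several points are assumptions made for illustration only: the wedge product norm is computed through the Lagrange identity, the two scores are mixed as a convex combination with a single coefficient `alpha`, discarded scores are masked with negative infinity before Softmax, Dropout is omitted, and rows of the score matrix are indexed by value vectors. Ties at the Top_k threshold are all kept.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def geometric_sparse_attention(values, keys, queries, alpha=0.5, top_k=2):
    """Return (attention weights, attention features) for small list inputs.

    `values` and `keys` are lists of vectors; `queries` has one row per key.
    """
    weights, outputs = [], []
    for v in values:
        scores = []
        for k in keys:
            similarity = dot(v, k)
            # Lagrange identity: |v ^ k|^2 = |v|^2 |k|^2 - (v . k)^2
            wedge_sq = dot(v, v) * dot(k, k) - similarity ** 2
            wedge = math.sqrt(max(wedge_sq, 0.0))   # clamp rounding noise
            scores.append(alpha * similarity + (1 - alpha) * wedge)
        # Keep only the top_k scores of this row; mask the rest.
        threshold = sorted(scores, reverse=True)[top_k - 1]
        kept = [s if s >= threshold else -math.inf for s in scores]
        peak = max(kept)
        exps = [math.exp(s - peak) for s in kept]   # exp(-inf) evaluates to 0.0
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append(
            [sum(w * q[c] for w, q in zip(row, queries)) for c in range(len(queries[0]))]
        )
    return weights, outputs


values = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
weights, features = geometric_sparse_attention(values, keys, queries, alpha=0.75, top_k=2)
```

The dot product rewards aligned vectors while the wedge product norm rewards vectors that span a large area, so the key `[1.0, 1.0]`, which is both partly aligned and partly orthogonal to each value vector, receives the highest mixed score in this example.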
[0093] The inverse stacked stationary wavelet transform module applies the inverse stacked stationary wavelet transform to convert the attention features $O$ into the data augmentation features $X_{\text{aug}}$. The data augmentation features are then mapped to the prediction range through a linear mapping layer to obtain the residual term features $Y_{\text{res}}$. It is represented as:
[0094] $$X_{\text{aug}} = \mathrm{ISSWT}(O;\, \tilde{h}, \tilde{g})$$
[0095] $$Y_{\text{res}} = \mathrm{Linear}(X_{\text{aug}})$$
[0096] where $\tilde{h}$ and $\tilde{g}$ are wavelet reconstruction filters, corresponding to the "low-frequency reconstruction kernel" and the "high-frequency reconstruction kernel," respectively, used to resynthesize the decomposed wavelet coefficients (low-frequency approximation coefficients and high-frequency detail coefficients) back into the original signal.
[0097] (4) Fusion module
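[0098] In the fusion module, the trend term features $Y_{\text{trend}}$ and the residual term features $Y_{\text{res}}$ are summed, and the sum is then inversely normalized back to the original data distribution to obtain the output $\hat{Y}$ of the industrial time series data prediction model. It is represented as:
[0099] $$\hat{Y} = \mathrm{RevIN}^{-1}\big(Y_{\text{trend}} + Y_{\text{res}}\big) \in \mathbb{R}^{P \times C}$$
[0100] where $\mathrm{RevIN}^{-1}(\cdot)$ indicates the inverse normalization operation; $P$ indicates the prediction length; and $C$ is the number of variables.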
[0101] Step 4: Model Training
[0102] This invention trains the industrial time-series data prediction model on the dataset, using the Adam optimizer to minimize the total loss function. To simultaneously optimize the model's prediction accuracy and structural sparsity, the total loss function includes a basic loss term that supervises prediction accuracy and a regularization term aimed at improving the interpretability and sparsity of the geometrically sparse attention mechanism. The total loss function $\mathcal{L}$ is defined as:
[0103] $$\mathcal{L} = \mathcal{L}_{\text{base}}(Y, \hat{Y}) + \lambda \, \mathcal{L}_{\text{reg}}(A)$$
[0104] where $\mathcal{L}_{\text{base}}$ is the basic loss term; $\mathcal{L}_{\text{reg}}$ is the regularization term; $Y$ and $\hat{Y}$ are the actual value and the predicted value, respectively; $A$ is the attention weight matrix learned by the model; and $\lambda$ is an adjustable hyperparameter used to balance the contributions of the two losses.
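[0105] The basic loss term uses the mean squared error (MSE), which penalizes larger prediction errors more heavily, and is obtained as follows:
[0106] $$\mathcal{L}_{\text{base}} = \frac{1}{n} \sum_{i=1}^{n} \big(y_{i} - \hat{y}_{i}\big)^{2}$$
[0107] where $n$ indicates the number of samples, and $y_{i}$ and $\hat{y}_{i}$ are the actual value and the predicted value of the $i$-th sample.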
[0108] The sparse attention regularization term acts on the attention weight matrix. It is represented as:
[0109] (25)
[0110] where the averaging operator in the formula indicates taking the average value.
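The way the basic loss term and the regularization term combine into the total loss can be sketched as follows. The mean squared error follows the description; the regularization term here is purely an assumed stand-in (the mean row entropy of the attention weights, which is lower for sparser rows), and the regularization term of the invention may take a different form. The names and the value of `lam` are illustrative.

```python
import math


def mse(targets, predictions):
    """Basic loss term: mean squared error over all samples."""
    errors = [(y - p) ** 2 for y, p in zip(targets, predictions)]
    return sum(errors) / len(errors)


def mean_row_entropy(attention):
    """Assumed sparsity penalty: average entropy of the attention rows.

    A row concentrated on a single entry has entropy 0; a uniform row has
    the highest entropy, so minimizing this value favors sparse attention.
    """
    entropies = [
        -sum(w * math.log(w) for w in row if w > 0.0)
        for row in attention
    ]
    return sum(entropies) / len(entropies)


def total_loss(targets, predictions, attention, lam=0.01):
    """Total loss = basic loss term + lam * regularization term."""
    return mse(targets, predictions) + lam * mean_row_entropy(attention)


targets = [1.0, 2.0, 3.0]
predictions = [1.0, 2.0, 5.0]
attention = [[0.5, 0.5], [1.0, 0.0]]      # one diffuse row, one sparse row
loss = total_loss(targets, predictions, attention, lam=0.1)
```

The hyperparameter `lam` plays the balancing role described above: a larger value pushes the attention rows toward fewer active entries at the cost of prediction accuracy.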
[0111] Step 5: Model Evaluation
[0112] This invention validates its method on six long-term industrial time series forecasting benchmark datasets, including the Solar Energy dataset, the Electricity Dataset (ECL), and four ETT datasets (ETTh1, ETTh2, ETTm1, and ETTm2). For the four ETT datasets, the training, validation, and test sets are partitioned in a 6:2:2 ratio; for the other two datasets, a 7:1:2 ratio is used. All datasets use a sequence length of 96 and a batch size of 256. The wavelet transform decomposition scale is set to 3 by default (corresponding to high, medium, and low frequency signals). This invention (GeoSAT) is compared with multi-time series data forecasting models such as TimeKAN, MSGNet, and TimeMixer, and the comparison is performed using MSE (mean squared error) and MAE (mean absolute error) evaluation metrics. The specific comparison results are shown in Table 1 (red represents the best performance, and blue represents the next best).
[0113] Table 1. Multivariate long-term time series prediction results of different models on 6 real public datasets.
[0114]
[0115]
[0116] As can be seen from Table 1, the method proposed in this invention achieves the best overall performance.
[0117] The model performance and resource utilization of this invention and existing multi-time series data prediction models were evaluated based on mean squared error (MSE), total number of parameters, GPU memory usage (MB), and training time (seconds). The results are shown in Table 2.
[0118] Table 2 Comparison of model performance and resource utilization on different datasets
[0119]
[0120] As can be seen from the data in Table 2, the present invention has achieved the best results in terms of both model size and training speed, which shows that the present invention has the best lightweight effect.
[0121] To further verify the effectiveness of the present invention, the fitting results of the present invention and existing multi-time series data prediction models were compared. The Solar dataset was selected, and the selected backtracking window size was 96. The prediction lengths from top to bottom were {96, 192, 336, 720}.
[0122] To verify the effectiveness of the model's attention mechanism in capturing correlations between variables, the correlations between features before and after attention were compared on the ETTh1 dataset, as shown in Figure 5. Figure 5 shows that the correlation between variables 4 and 6 in the dataset was negative before attention was applied and improved significantly after attention enhancement. Variables 4 and 6 represent MULL and OT (oil temperature), respectively. Physically speaking, medium-level loads are the most common loads in the real world. Unused loads lead to additional energy losses, which are ultimately dissipated as heat, thus heating the transformer oil and causing the oil temperature to rise. This means that when MULL increases, it affects the change in OT. The attention function of this invention can capture and amplify this correlation, increasing the weight of the correlation coefficient between the variables.
Claims
1. A lightweight long-term time series forecasting method for industrial time series data, characterized in that: The method includes: Obtain historical observations of the target object containing different variables, and preprocess the historical observations; An industrial time-series data prediction model is constructed. This model includes an adaptive multi-scale decomposition module, a trend term feature extraction module, a residual term feature extraction module, and a fusion module. The adaptive multi-scale decomposition module splits the input features into trend terms and residual terms. The trend term feature extraction module extracts trend term features from the trend terms. The residual term feature extraction module incorporates a geometrically sparse attention module to process the residual terms. The residual term feature extraction module extracts the correlation between variables in the residual terms across different frequency bands using the geometrically sparse attention module to obtain residual term features. The fusion module fuses the trend term features and residual term features to obtain the output of the industrial time-series data prediction model. The preprocessed historical observations are input into the industrial time series data prediction model to obtain the variables of the measured target at future times, thus completing the prediction of industrial time series data.
2. The lightweight long-term time series forecasting method for industrial time series data according to claim 1, characterized in that: in the geometric sparse attention module, the norm squared tensors of the value matrix and of the key matrix, and the similarity matrix between the two matrices, are obtained; the wedge product norm is obtained from the two norm squared tensors and the similarity matrix; the similarity matrix and the wedge product norm are weighted and fused to obtain a mixed score matrix; the mixed score matrix is sparsified to obtain a sparse score matrix; and the sparse score matrix is normalized into weights and multiplied by the query matrix to obtain the attention features output by the geometric sparse attention module.
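The sequence of operations in claim 2 can be sketched as follows. This is a hedged illustration, not the patented implementation: the wedge product norm is computed with the standard Lagrange identity, the fusion weight `alpha`, the per-row top-k sparsification, and the softmax normalization are all assumptions, and the function name `geometric_sparse_attention` is hypothetical. The roles of the value, key, and query matrices follow the wording of the claim.

```python
import numpy as np

def geometric_sparse_attention(q, k, v, alpha=0.5, top_k=2):
    """q, k, v: arrays of shape (N, d). Returns (attention features, weights)."""
    v_sq = np.sum(v ** 2, axis=-1, keepdims=True)    # (N, 1) squared norms of value rows
    k_sq = np.sum(k ** 2, axis=-1, keepdims=True).T  # (1, N) squared norms of key rows
    sim = v @ k.T                                    # (N, N) similarity matrix

    # Assumed form: ||a ^ b||^2 = ||a||^2 ||b||^2 - <a, b>^2 (Lagrange identity).
    wedge = np.sqrt(np.maximum(v_sq * k_sq - sim ** 2, 0.0))

    # Weighted fusion of similarity and wedge product norm (alpha is an assumption).
    mixed = alpha * sim + (1.0 - alpha) * wedge

    # Assumed sparsification: keep the top_k largest scores in each row.
    threshold = np.sort(mixed, axis=-1)[:, -top_k][:, None]
    sparse = np.where(mixed >= threshold, mixed, -np.inf)

    # Softmax over the surviving scores; masked entries get weight zero.
    sparse = sparse - sparse.max(axis=-1, keepdims=True)
    weights = np.exp(sparse)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Claim 2 multiplies the normalized weights by the query matrix.
    return weights @ q, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 3)) for _ in range(3))
out, w = geometric_sparse_attention(q, k, v, alpha=0.5, top_k=2)
print(out.shape, (w > 0).sum(axis=-1))
```

Because the wedge term grows when two vectors are closer to orthogonal, mixing it with the inner-product similarity lets the score reflect both alignment and spanned area; how the two are balanced in the actual model is not stated in the claim.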
3. The lightweight long-term time series forecasting method for industrial time series data according to claim 2, characterized in that the wedge product norm is obtained by a formula (not reproduced in this text) whose terms are: the norm squared tensor of the value matrix; the norm squared tensor of the key matrix; and the similarity matrix.
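The formula referred to in claim 3 does not survive in this text. For orientation only, the standard Lagrange identity expresses the norm of the wedge product of two vectors using exactly the three quantities the claim lists. The symbols below (v_i, k_j, S_ij) are assumptions chosen for illustration, and the patent's own formula may differ in notation or scaling.

```latex
\[
  \lVert v_i \wedge k_j \rVert
  = \sqrt{\lVert v_i \rVert^{2}\,\lVert k_j \rVert^{2} - S_{ij}^{2}},
  \qquad
  S_{ij} = \langle v_i, k_j \rangle
\]
```

Here the first factor would correspond to the norm squared tensor of the value matrix, the second to the norm squared tensor of the key matrix, and S to the similarity matrix.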
4. The lightweight long-term time series forecasting method for industrial time series data according to claim 1, characterized in that: the residual term feature extraction module comprises a stacked stationary wavelet transform module, the geometric sparse attention module, an inverse stacked stationary wavelet transform module, and a linear mapping layer connected in sequence; the stacked stationary wavelet transform module is used to perform a stacked stationary wavelet transform on the residual term and to obtain multi-scale signal variables from the decomposition results; the geometric sparse attention module is used to dynamically focus on key parts of the multi-scale signal variables to obtain attention features; the inverse stacked stationary wavelet transform module is used to inversely transform the attention features into data augmentation features; and the linear mapping layer is used to map the data augmentation features to the prediction range to obtain the residual term features.
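The decompose, process, and invert structure of claim 4 can be illustrated with a small undecimated Haar transform. This is a sketch under stated assumptions: the wavelet family, the unnormalized averaging filters, the circular boundary handling, and the function names `haar_swt` and `haar_iswt` are all illustrative choices, since the claim does not specify them. The attention step that would sit between the two transforms is omitted.

```python
import numpy as np

def haar_swt(x, levels=2):
    """Undecimated (a trous) Haar decomposition along the last axis.

    Returns an array of shape (levels + 1, *x.shape): the detail bands from
    fine to coarse, followed by the final approximation, stacked on axis 0.
    """
    approx = np.asarray(x, dtype=float)
    bands = []
    for j in range(levels):
        # Dilate the two-tap filter by 2**j; circular boundary is an assumption.
        shifted = np.roll(approx, 2 ** j, axis=-1)
        bands.append((approx - shifted) / 2.0)  # detail band at level j + 1
        approx = (approx + shifted) / 2.0       # approximation at level j + 1
    bands.append(approx)
    return np.stack(bands, axis=0)

def haar_iswt(bands):
    """Inverse of haar_swt: with these filters the bands simply sum back."""
    return bands.sum(axis=0)

rng = np.random.default_rng(0)
residual = rng.normal(size=(2, 16))       # (variables, time steps)
stacked = haar_swt(residual, levels=3)    # (4, 2, 16) multi-scale signal
reconstructed = haar_iswt(stacked)
print(stacked.shape, np.allclose(reconstructed, residual))
```

Because no downsampling takes place, every band keeps the original length, which is what allows the bands to be stacked into a single tensor for attention.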
5. The lightweight long-term time series forecasting method for industrial time series data according to claim 1, characterized in that: the trend term feature extraction module comprises a spatiotemporal feature extraction module and a fusion submodule connected in series; the spatiotemporal feature extraction module comprises a time-domain feature extraction module and a spatial-domain feature extraction module; in the time-domain feature extraction module, the trend term is transformed from the time domain to the frequency domain to obtain frequency-domain features, complex weight parameters are introduced to modulate the frequency-domain features element by element, and the modulated features are restored to the time domain and fused with the trend term through a residual connection to obtain the time-domain features; in the spatial-domain feature extraction module, the spatial-domain features are obtained by exchanging the variable dimension and the time dimension of the time-domain features and then applying a one-dimensional convolution; and the fusion submodule is used to fuse the time-domain features and the spatial-domain features to obtain the trend term features.
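The two branches of claim 5 can be sketched in NumPy as follows. This is an illustration under assumptions, not the patented model: the complex weights and the convolution kernel would be learned in practice, the convolution is taken to slide along the variable axis after the swap, additive fusion is assumed, and all function names are hypothetical.

```python
import numpy as np

def time_domain_features(trend, complex_weights):
    """trend: (T, C); complex_weights: complex array of shape (T // 2 + 1, C)."""
    spectrum = np.fft.rfft(trend, axis=0)        # time domain -> frequency domain
    modulated = spectrum * complex_weights       # element-wise complex modulation
    restored = np.fft.irfft(modulated, n=trend.shape[0], axis=0)
    return trend + restored                      # residual connection

def spatial_domain_features(time_feats, kernel):
    """Swap the time and variable axes, then apply a 1-D convolution."""
    swapped = time_feats.T                       # (C, T)
    # Assumption: the kernel slides along the variable axis at each time step.
    mixed = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, swapped)
    return mixed.T                               # back to (T, C)

def trend_features(trend, complex_weights, kernel):
    t_feats = time_domain_features(trend, complex_weights)
    s_feats = spatial_domain_features(t_feats, kernel)
    return t_feats + s_feats                     # assumed additive fusion

rng = np.random.default_rng(0)
trend = rng.normal(size=(16, 4))
identity_weights = np.ones((16 // 2 + 1, 4), dtype=complex)
time_feats = time_domain_features(trend, identity_weights)
spatial_feats = spatial_domain_features(time_feats, np.array([0.0, 1.0, 0.0]))
fused = trend_features(trend, identity_weights, np.array([0.25, 0.5, 0.25]))
print(fused.shape)
```

With all-ones weights the frequency branch returns the trend unchanged, so the residual connection doubles it; a kernel of `[0, 1, 0]` likewise leaves the spatial branch as an identity, which makes both branches easy to sanity-check.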
6. The lightweight long-term time series forecasting method for industrial time series data according to claim 1, characterized in that: in the adaptive multi-scale decomposition module, multiple depthwise separable convolutional layers of different scales are applied in parallel to the input features to extract them at different time steps, thereby obtaining trend features corresponding to multiple time steps; the trend term is obtained by fusing the trend features corresponding to the multiple time steps; and the residual term is obtained by subtracting the trend term from the input features.
7. The lightweight long-term time series forecasting method for industrial time series data according to claim 6, characterized in that the trend features are fused as follows: multiple weight parameters are introduced, and the trend features corresponding to the multiple time steps are weighted and aggregated according to the Softmax distribution of the weight parameters to obtain the trend term.
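Claims 6 and 7 together describe a decomposition that can be sketched as follows. As a simplification, fixed moving-average kernels stand in for the learned depthwise separable convolutions, and the kernel sizes, edge padding, and the names `adaptive_decompose` and `logits` are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def adaptive_decompose(x, kernel_sizes=(3, 7, 13), logits=None):
    """x: (T, C). Returns (trend, residual, fusion weights)."""
    if logits is None:
        logits = np.zeros(len(kernel_sizes))     # learnable in a real model
    weights = softmax(np.asarray(logits, dtype=float))

    trends = []
    for size in kernel_sizes:                    # odd sizes keep the length at T
        kernel = np.ones(size) / size            # stand-in for a learned kernel
        pad = size // 2
        padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
        # Depthwise: each variable is filtered independently along time.
        smoothed = np.stack(
            [np.convolve(padded[:, c], kernel, mode="valid")
             for c in range(x.shape[1])], axis=1)
        trends.append(smoothed)

    # Softmax-weighted aggregation of the per-scale trend features.
    trend = sum(w * t for w, t in zip(weights, trends))
    residual = x - trend
    return trend, residual, weights

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=(48, 3)), axis=0)
trend, residual, fusion_weights = adaptive_decompose(x, logits=[0.0, 1.0, 2.0])
print(trend.shape, fusion_weights.round(3))
```

Because the residual is defined as the input minus the trend, the two parts always sum back to the input regardless of the kernel sizes or fusion weights.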
8. The lightweight long-term time series forecasting method for industrial time series data according to claim 1, characterized in that: the industrial time series data prediction model is trained using a dataset constructed from historical observations at different times, and a total loss function comprising a basic loss term and a regularization term is constructed to guide the update of the model parameters; the basic loss term is used to supervise prediction accuracy; and the regularization term is used to improve the interpretability and sparsity of the geometric sparse attention mechanism.
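A loss with this two-part structure might look like the sketch below. The claim does not state the form of either term, so mean squared error for the basic loss, row-wise entropy of the attention weights for the regularizer, and the coefficient `lam` are all assumptions; the function name `total_loss` is hypothetical.

```python
import numpy as np

def total_loss(pred, target, attn_weights, lam=0.01, eps=1e-12):
    """Returns (total, base, reg) for predictions and row-normalized attention."""
    # Basic loss term: mean squared error (assumed choice).
    base = np.mean((pred - target) ** 2)
    # Regularization term: mean row entropy; lower entropy means sparser rows.
    reg = np.mean(-np.sum(attn_weights * np.log(attn_weights + eps), axis=-1))
    return base + lam * reg, base, reg

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.5, 2.0], [2.0, 4.0]])

uniform_attn = np.full((4, 4), 0.25)   # dense attention: high entropy
onehot_attn = np.eye(4)                # fully sparse attention: near-zero entropy

total_u, base_u, reg_u = total_loss(pred, target, uniform_attn)
total_s, base_s, reg_s = total_loss(pred, target, onehot_attn)
print(total_u, total_s)
```

An entropy penalty is used here instead of an L1 penalty because the L1 norm of softmax-normalized rows is always one and therefore cannot encourage sparsity.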
9. The lightweight long-term time series forecasting method for industrial time series data according to claim 1, characterized in that the preprocessing is as follows: the historical observations input to the industrial time series data prediction model are reversibly normalized, and the normalized observations are then inverted and embedded.
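One possible reading of this preprocessing step is sketched below. It assumes that "reversibly normalized" means per-variable standardization whose statistics are kept so the model output can be mapped back, and that "inverted and embedded" means swapping the time and variable axes and projecting each variable's whole series to one embedding vector. Both readings, the projection matrix `proj`, and the function names are assumptions.

```python
import numpy as np

def reversible_normalize(x, eps=1e-5):
    """x: (T, C). Standardize each variable and keep the statistics."""
    mean = x.mean(axis=0, keepdims=True)
    std = np.sqrt(x.var(axis=0, keepdims=True) + eps)
    return (x - mean) / std, (mean, std)

def reversible_denormalize(y, stats):
    """Undo reversible_normalize using the stored statistics."""
    mean, std = stats
    return y * std + mean

def inverted_embedding(x_norm, proj):
    """Swap to (C, T) and project each variable's series to a D-dim token."""
    return x_norm.T @ proj                       # (C, T) @ (T, D) -> (C, D)

rng = np.random.default_rng(0)
history = 50.0 + 10.0 * rng.normal(size=(96, 7))    # (time steps, variables)
normed, stats = reversible_normalize(history)
proj = rng.normal(size=(96, 32)) / np.sqrt(96)      # learnable in a real model
tokens = inverted_embedding(normed, proj)
restored = reversible_denormalize(normed, stats)
print(tokens.shape, np.allclose(restored, history))
```

Keeping the statistics alongside the normalized data is what makes the step reversible: predictions produced in the normalized space can be rescaled to the original units with the same mean and standard deviation.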
10. A lightweight long-term time series forecasting system for industrial time series data, characterized in that: the system is used to execute the lightweight long-term time series forecasting method for industrial time series data according to claim 1; the system comprises a data extraction module, a preprocessing module, and an industrial data forecasting module; the data extraction module is used to extract historical observations of different variables of the target object; the preprocessing module is used to convert the historical observations into data suitable for input into the industrial data forecasting module; and the industrial data forecasting module is used to predict the variables of the target object at future times based on the data output by the preprocessing module.