An industrial time-series data anomaly detection method based on a spatiotemporal graph attention network

By constructing a spatiotemporal graph and using a spatiotemporal graph attention network to extract the spatiotemporal features of time-series data, the problem of unconsidered sensor relationships is solved, and higher anomaly detection accuracy and sensor anomaly localization capability are achieved.

CN117272196B · Active · Publication Date: 2026-06-16 · ZHEJIANG UNIV OF TECH


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: ZHEJIANG UNIV OF TECH
Filing Date: 2023-08-23
Publication Date: 2026-06-16


IPC: G06F18/2433; G06F18/25; G06N3/042; G06N3/0464; G06F123/02

CPC: G06F18/2433; G06F18/253; G06N3/042; G06N3/0464; G06F2123/02

AI Tagging

Application Domain

Biological models


AI Technical Summary

⚠Technical Problem

Existing technologies fail to effectively consider the potential relationships between sensors when processing industrial time-series data, resulting in insufficient accuracy in detecting anomalies when processing time-series data with paired coupling characteristics.

⚗Method used

A spatiotemporal graph is constructed from time-series data of industrial sensors. The spatiotemporal features of the time-series data are extracted through a spatiotemporal graph attention network, and spatial features are extracted using a graph convolutional network. Combined with the periodic variation features of the time-series data, the features are fused through the spatiotemporal graph attention network to establish an anomaly detection model.

🎯Benefits of technology

It improves the accuracy of anomaly detection in industrial time-series data, effectively identifies abnormal situations and locates abnormal sensors, and enhances the stability and reliability of the model.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses an anomaly detection method for industrial time-series data based on a spatiotemporal graph attention network, comprising the following steps: 1) constructing a fully connected sensor time-series graph by connecting sensor data nodes pairwise to form a complete graph, wherein each node represents sensor data and each edge weight represents the dependence between sensors; 2) using a graph convolutional network to extract the spatial information of the fully connected time-series graph and generate a spatial feature vector; 3) using TimesNet to extract the temporal information of the fully connected time-series graph and generate a temporal feature vector; 4) using a spatiotemporal graph attention network to fuse the temporal feature vector and the spatial feature vector; 5) training the model on historical normal time-series data and feeding the spatiotemporal feature vector into a fully connected layer to obtain a predicted value at each time step; and 6) for abnormal data in a test set, calculating an anomaly score from the difference between the predicted value and the observed value, and adaptively calculating the anomaly threshold by the boxplot method.


Description

Technical Field

[0001] This invention relates to the fields of industrial time-series data and anomaly detection, and specifically to an anomaly detection method for industrial time-series data based on a spatiotemporal graph attention network.

Background Technology

[0002] With the rapid development of digitalization in manufacturing, production lines and intelligent equipment have achieved real-time perception of their operating status and environment through sensors, controllers, and smart instruments, generating a large amount of industrial time-series data. Anomaly detection using time-series data can promptly identify abnormal situations, allowing for appropriate measures to prevent accidents.

[0003] In recent years, scholars have proposed prediction-based anomaly monitoring methods using autoregressive models (ARIMA, VAR) and recurrent neural networks (LSTM, GRU). These methods train a model on historical time-series data to capture the trends, periodicities, and other regularities inherent in the data, then compare the actual observed values with the model's predictions and detect anomalies by calculating the error. Maya et al. used a hierarchical long short-term memory network (International Journal of Data Science and Analytics, 2019) to select the data closest to the measured value from multiple candidate predicted values for model training, detecting anomalies based on the error between the predicted and measured values. Liu et al. combined a Bayesian model with the isolation forest algorithm (Journal of Cleaner Production, 2020) and used autoregressive methods to predict residuals for tap water quality anomaly detection. Wang et al. cascaded a moving autoregressive model with an artificial neural network (Applied Energy, 2020) to achieve high accuracy in both linear and nonlinear power load anomaly detection. Vos et al. combined long short-term memory networks with support vector machines (Mechanical Systems and Signal Processing, 2021) to enable the model to better capture statistical characteristics during regression prediction. However, the above methods have difficulty processing time-series data with potentially pairwise-coupled characteristics, because they do not consider the potential relationships between sensors when constructing the model.

[0004] Graph Neural Networks (GNNs) have received widespread attention in fields such as time-series prediction and fault diagnosis. For example, Feng et al. used a diffusion graph convolutional network (IEEE Transactions on Instrumentation and Measurement, 2020) to accurately predict the final composition by utilizing the correlation between element concentrations during steelmaking. Deng et al.'s GDN (Association for the Advancement of Artificial Intelligence, 2021) simultaneously inputs the learned graph structure and the time-series data into a multi-layer graph attention network to obtain predicted values at each time step, and uses the L2 distance between observed values and predicted values as the outlier score.

[0005] By utilizing GNNs to accurately extract rich spatiotemporal features from industrial time-series data and capturing regular patterns in the time-series data based on these features, the performance and accuracy of anomaly detection in industrial time-series data can be improved, which has significant application value.

Summary of the Invention

[0006] To improve the accuracy of anomaly detection by mining the temporal and spatial features contained in industrial time-series data, this invention proposes an anomaly detection method for industrial time-series data based on a spatiotemporal graph attention network. First, the industrial sensor time-series data is constructed into a spatiotemporal graph, and then the spatiotemporal features of the time-series data are extracted through the spatiotemporal graph attention network.

[0007] A method for anomaly detection in industrial time-series data based on spatiotemporal graph attention networks includes the following steps:

[0008] 1) Data preprocessing:

[0009] 1.1) Data normalization

[0010] Since different sensors have different data characteristics, the data from each sensor needs to be normalized separately. The calculation formula is as follows:

[0011]

[0012] where $s_i$ is the normalized value of the i-th sensor, computed from the data collected by the i-th sensor and from the mean and variance of the original measurement values of the i-th sensor;

[0013] 1.2) Time-series data partitioning

[0014] The sliding window segmentation technique divides sensor time-series data into multiple windows, making the data distribution within each window more stable, which helps improve the stability and reliability of the model. At time t, the input of the model is the historical time-series data of each sensor before time t. A sliding window of size w is used to extract the historical time-series data within the window, which is expressed as:

[0015]
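The preprocessing in steps 1.1) and 1.2) can be sketched as follows. This is a minimal illustration, not the patent's formulas (1) and (2), which are not reproduced in this text. It assumes a z-score form (subtract the per-sensor mean, divide by the per-sensor standard deviation) and a window that holds the w samples immediately before time t; the function names, `eps`, and the synthetic data are hypothetical.

```python
import numpy as np

def normalize_per_sensor(data, eps=1e-8):
    """Standardize each sensor column separately; data has shape (T, N).

    Assumption: z-score normalization. The source only states that the mean
    and variance of each sensor are used.
    """
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    return (data - mean) / (std + eps)

def sliding_windows(data, w):
    """Build model inputs and targets from a (T, N) array.

    inputs[n]  has shape (N, w): the w samples before time t = n + w
    targets[n] has shape (N,):   the sample at time t = n + w
    """
    T = data.shape[0]
    inputs = np.stack([data[t - w:t].T for t in range(w, T)])
    targets = data[w:]
    return inputs, targets

# Synthetic stand-in for three sensors with very different ranges.
rng = np.random.default_rng(0)
raw = rng.normal(loc=[5.0, 100.0, -3.0], scale=[1.0, 20.0, 0.5], size=(200, 3))
norm = normalize_per_sensor(raw)
x, y = sliding_windows(norm, w=25)
print(x.shape, y.shape)
```

With 200 time steps and w = 25, this yields 175 windows, each paired with the sensor readings that the model is expected to predict.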

[0016] 2) Construction of the fully connected temporal graph:

[0017] A temporal fully connected graph is a fully connected graph composed of pairwise connected data nodes, which can be represented as G = (V, E), where V is the set of nodes, each node representing a time window of sensor data; E is the set of edges, and $e_{ij} \in E$ represents a weighted edge between node i and node j. The weight of the edge is calculated using the following formula:

[0018]

[0019]

[0020] where $C_i$ represents the set of candidate nodes excluding node i, and x represents the time window data.

[0021] The adjacency matrix of the time-series fully connected graph of N sensors is denoted A. The values of its elements are the weights calculated above, represented as follows:

[0022] $A_{ij} = e_{ij}$ for $j \in C_i$  (5)
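The graph construction in step 2) can be illustrated with a short sketch. The edge-weight formulas (3) and (4) are not reproduced in this text, so the code below assumes cosine similarity between the window data of two sensors, which is one common choice for such weights and may differ from the patent's definition. The helper `top_k_neighbors` anticipates the Topk neighbor selection described in step 3.3); all function names are hypothetical.

```python
import numpy as np

def cosine_adjacency(windows):
    """windows: (N, w) window data, one row per sensor.

    Returns an (N, N) matrix with A[i, j] = cos(x_i, x_j) for j != i and a
    zero diagonal, since the candidate set C_i excludes node i itself.
    Assumption: cosine similarity stands in for formulas (3)-(4).
    """
    norms = np.linalg.norm(windows, axis=1, keepdims=True)
    unit = windows / np.maximum(norms, 1e-12)
    A = unit @ unit.T
    np.fill_diagonal(A, 0.0)
    return A

def top_k_neighbors(A, k):
    """Indices of the k highest-weight neighbors of each node, shape (N, k)."""
    masked = A.copy()
    np.fill_diagonal(masked, -np.inf)  # never pick the node itself
    return np.argsort(-masked, axis=1)[:, :k]

windows = np.array([[1.0, 2.0, 3.0],
                    [2.0, 4.0, 6.0],
                    [3.0, 2.0, 1.0],
                    [-1.0, -2.0, -3.0]])
A = cosine_adjacency(windows)
neighbors = top_k_neighbors(A, k=2)
print(np.round(A, 3))
print(neighbors[0])
```

In this toy example sensor 1 is a scaled copy of sensor 0, so their edge weight is 1, while sensor 3 is the mirror image of sensor 0 and receives weight -1.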

[0023] 3) Spatiotemporal graph attention network framework:

[0024] 3.1) Spatial Feature Extraction of Time Series Data

[0025] This invention first employs a Graph Convolutional Network (GCN) to extract spatial features from time-series data. Through adaptive aggregation of node features, it integrates different types of sensor data into a unified vector representation. The calculation formula for graph convolution is as follows:

[0026]

[0027]

[0028] where $v^{(l)}$ represents the features of each node in the l-th layer, δ represents the nonlinear transformation, I represents the identity matrix, A represents the adjacency matrix together with its corresponding degree matrix, and $W^{(l)}$ represents the weight matrix of the l-th layer.
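The symbols defined above (identity matrix, adjacency matrix, degree matrix, layer weights, nonlinearity) match the widely used GCN propagation rule, so one layer can be sketched as below. Formulas (6) and (7) are not reproduced in this text; the code assumes the standard symmetric normalization with added self-loops and ReLU as the nonlinearity δ. It also assumes non-negative edge weights so that every degree is positive. The function name and example matrices are hypothetical.

```python
import numpy as np

def gcn_layer(A, V, W):
    """One graph-convolution layer (assumed standard form).

    A: (N, N) non-negative adjacency matrix
    V: (N, d_in) node features of layer l
    W: (d_in, d_out) weight matrix of layer l
    Returns ReLU(D^{-1/2} (A + I) D^{-1/2} V W), with D the degree matrix of A + I.
    """
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    deg = A_tilde.sum(axis=1)                 # diagonal of the degree matrix
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ V @ W, 0.0)     # ReLU as the nonlinearity

# Three sensors on a path graph: 0 - 1 - 2.
A_path = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))
weights = rng.normal(size=(4, 2))
spatial = gcn_layer(A_path, features, weights)
print(spatial.shape)
```

Each output row mixes a sensor's own features with those of its neighbors, which is how the layer folds different sensor types into one vector representation.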

[0029] The spatial feature vectors of each sensor obtained from the fully connected timing graph using GCN are represented as follows:

[0030]

[0031] where $v_i$ represents the spatial feature vector of the i-th sensor at time t, d represents the dimension of the vector, and N represents the number of sensor nodes.

[0032] 3.2) Extraction of temporal features from time series data

[0033] Industrial time-series data often consists of the superposition of multiple change processes, exhibiting both intra-period and inter-period variations. This invention uses TimesNet to extract the periodic variation features from time-series data. The specific steps are as follows:

[0034] 3.2.1): The time-series data is input into a 1x1 convolutional layer and an embedding layer respectively, and then their respective embedding results are summed to obtain a one-dimensional time-series vector $X_{1D}$;

[0035] 3.2.2): The periodic information of the one-dimensional time-series vector $X_{1D}$ is extracted using the Fast Fourier Transform and then superimposed onto the one-dimensional time-series vector. For a one-dimensional time-series vector of length T, the transformation process is represented as follows:

[0036] $Z = \mathrm{Avg}(\mathrm{FFT}(X_{1D}))$  (9)

[0037] $\{f_1, …, f_k\} = \mathrm{argTopk}(Z)$  (10)

[0038]

[0039] where Z represents the intensity of each frequency component in the one-dimensional time-series vector, FFT denotes the Fast Fourier Transform, argTopk selects the periodic information $\{f_1, …, f_k\}$ of the k highest-intensity frequencies, and Padding(·) pads the end of the one-dimensional time-series vector with zeros so that the length of the reshaped two-dimensional time-series vector matches the length of the selected period information.

[0040] Using formulas (9)-(11), we can obtain k two-dimensional time-series vectors containing different periodic information. Its columns represent adjacent moments, and its rows represent adjacent periods;
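Steps 3.2.1) and 3.2.2) can be sketched for a single channel as follows. The code takes the amplitude of the real FFT as the frequency intensity Z, keeps the k strongest non-zero frequencies, converts each frequency f into a period length ceil(T / f), and folds the zero-padded series into a two-dimensional grid with one period per row. The period formula, the single-channel simplification (no averaging over channels), and the function names are assumptions made for illustration; formula (11) itself is not reproduced in this text.

```python
import numpy as np

def top_k_periods(x, k):
    """Return the k strongest frequencies, their period lengths and intensities."""
    T = len(x)
    z = np.abs(np.fft.rfft(x))         # intensity of each frequency component
    z[0] = 0.0                         # ignore the constant (zero-frequency) term
    freqs = np.argsort(z)[::-1][:k]    # indices of the k largest intensities
    freqs = freqs[freqs > 0]
    periods = np.ceil(T / freqs).astype(int)
    return freqs, periods, z[freqs]

def fold_to_2d(x, period):
    """Zero-pad x at the end and reshape it to (number of periods, period)."""
    T = len(x)
    n_rows = int(np.ceil(T / period))
    padded = np.concatenate([x, np.zeros(n_rows * period - T)])
    return padded.reshape(n_rows, period)

# Signal with two superimposed cycles of length 12 and 32.
t = np.arange(96)
x = np.sin(2 * np.pi * t / 12) + 0.5 * np.sin(2 * np.pi * t / 32)
freqs, periods, strengths = top_k_periods(x, k=2)
grids = [fold_to_2d(x, p) for p in periods]
print(freqs, periods, [g.shape for g in grids])
```

Moving along a row of a grid visits adjacent time points inside one period, while moving down a column compares the same phase across adjacent periods, which is what lets a 2D convolution see both kinds of variation.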

[0041] 3.2.3): For each obtained two-dimensional time-series vector, the ResNeXt convolution strategy is used to integrate residual information to better extract sensor temporal features. The extracted two-dimensional temporal feature vector is represented as follows:

[0042]

[0043] 3.2.4): After the two-dimensional time-series feature vector is transformed back to one-dimensional space and information aggregation is performed, it is represented as follows:

[0044]

[0045]

[0046]

[0047] In formula (13), Trunc means removing the zeros added in formula (11), Reshape means converting the two-dimensional time series feature vector back to one-dimensional form, and formulas (14) and (15) mean performing SoftMax weighted summation using the corresponding frequency intensities.
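The aggregation described for formulas (13)-(15) can be sketched as below: each two-dimensional feature grid is flattened and truncated to the original length (undoing the zero padding), and the k resulting series are combined with softmax weights derived from their frequency intensities. The formulas themselves are not reproduced in this text, so the exact form, the function names, and the toy inputs are assumptions.

```python
import numpy as np

def unfold_to_1d(grid, length):
    """Reshape a 2D grid back to 1D and drop the padded tail (Trunc)."""
    return grid.reshape(-1)[:length]

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def aggregate(features_2d, strengths, length):
    """Softmax-weighted sum of the k unfolded series.

    features_2d: list of k arrays, each (rows_i, period_i) with rows_i * period_i >= length
    strengths:   k frequency intensities, one per grid
    """
    weights = softmax(np.asarray(strengths, dtype=float))
    series = np.stack([unfold_to_1d(g, length) for g in features_2d])
    return weights @ series, weights

# Two toy grids covering a series of length 10 (both padded to 12 and 10 cells).
grid_a = np.ones((3, 4))
grid_b = 3.0 * np.ones((2, 5))
fused, weights = aggregate([grid_a, grid_b], strengths=[np.log(3.0), 0.0], length=10)
print(weights, fused)
```

Because the first intensity is ln 3 larger than the second, the softmax weights are 0.75 and 0.25, so the stronger period dominates the aggregated temporal feature.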

[0048] Through the above steps, the temporal feature vector $X_i$ is obtained.

[0049] 3.3) Spatiotemporal feature fusion

[0050] This invention designs a Spatial-Temporal Fusion Graph Attention Network (STFGAT) to fuse spatial and temporal features, obtaining a vector $z_i$ that fuses spatiotemporal features. The fusion process is represented as follows:

[0051]

[0052] where $W_z$ is a learnable weight matrix, $X_i$ is the temporal feature vector learned through TimesNet in step 3.2), and $N(i) = \mathrm{Topk}(A_{ij})$ represents the k neighbors of node i with the highest weights in the weight matrix A; k is a hyperparameter that determines the number of neighbors selected for node i. The spatiotemporal fusion attention coefficient $\alpha_{i,j}$ is calculated as follows:

[0053]

[0054]

[0055]

[0056] where the concatenation operator denotes vector concatenation, $W_g$ is the weight matrix of the linear transformation, $v_i$ is the spatial feature vector obtained by GCN convolution in step 3.1), and a is the coefficient vector learned for the attention mechanism. The attention coefficients θ(i,j) are calculated using the LeakyReLU() nonlinear activation function and then normalized with the exp()-based softmax() function.
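A possible reading of the fusion step is sketched below. Formulas (16)-(19) are not reproduced in this text, so the code assumes a GDN-style graph attention layer built from the symbols named above: each node forms g_i by concatenating its spatial vector v_i with the linearly transformed temporal vector W_g X_i, the raw score θ(i,j) is LeakyReLU(a · [g_i, g_j]), scores are softmax-normalized over node i and its Topk neighbors, and z_i is the ReLU of the attention-weighted sum of W_z X_j. The exact composition, the inclusion of the node itself, the ReLU, and all shapes are assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def fuse(X, V, A, W_g, W_z, a, k):
    """Assumed spatiotemporal fusion attention.

    X: (N, d_t) temporal features   V: (N, d_s) spatial features
    A: (N, N) edge weights          W_g: (d_g, d_t)   W_z: (d_z, d_t)
    a: (2 * (d_s + d_g),) attention coefficient vector
    Returns fused vectors Z (N, d_z) and attention coefficients (N, N).
    """
    N = X.shape[0]
    G = np.concatenate([V, X @ W_g.T], axis=1)   # g_i = [v_i, W_g X_i]
    H = X @ W_z.T                                # W_z X_j for every node
    masked = A.copy()
    np.fill_diagonal(masked, -np.inf)            # exclude self when ranking neighbors
    Z = np.zeros_like(H)
    alphas = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(-masked[i])[:k]        # Topk neighbors of node i
        idx = np.concatenate([[i], nbrs])        # node i attends to itself too
        theta = leaky_relu(np.array([a @ np.concatenate([G[i], G[j]]) for j in idx]))
        e = np.exp(theta - theta.max())          # numerically stable softmax
        alpha = e / e.sum()
        alphas[i, idx] = alpha
        Z[i] = np.maximum(alpha @ H[idx], 0.0)
    return Z, alphas

rng = np.random.default_rng(0)
N, d_t, d_s, d_g, d_z, k = 5, 6, 4, 3, 8, 2
X = rng.normal(size=(N, d_t))
V = rng.normal(size=(N, d_s))
A = rng.uniform(size=(N, N))
W_g = rng.normal(size=(d_g, d_t))
W_z = rng.normal(size=(d_z, d_t))
a = rng.normal(size=2 * (d_s + d_g))
Z, alphas = fuse(X, V, A, W_g, W_z, a, k)
print(Z.shape, alphas.sum(axis=1))
```

Restricting attention to the k strongest neighbors keeps the fully connected graph from diluting each sensor's representation with weakly related sensors.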

[0057] After the above feature extraction steps, the final spatiotemporal fusion feature vector of N sensor nodes is represented as follows:

[0058] 4) Anomaly detection in industrial time-series data based on prediction:

[0059] 4.1) Model Training

[0060] This invention trains a model using historical normal data to accurately capture the trends, periodicity, and patterns of normal time-series data. When the trained model encounters anomalous data in the test set, it compares the predicted normal data with the actual observed data, identifying sensor data that significantly deviates from expected behavior, thus determining whether an anomaly has occurred. Furthermore, by comparing the expected behavior and observed behavior of each sensor, it can also detect the anomalous state of specific sensors.

[0061] The spatiotemporal fusion feature vectors are input to stacked fully connected layers with output dimension N to predict the sensor values at time step t, represented as follows:

[0062]

[0063] where MLP represents the stacked fully connected layers, and the output is the predicted value of the i-th sensor at time step t;

[0064] The loss is calculated by comparing the sensor prediction values calculated above with the sensors' true values. The mean squared error loss function is adopted and minimized using the Adam optimizer, so that the model can accurately learn the distribution pattern of normal data. The loss function calculation formula is as follows:

[0065]

[0066] Where w is the size of the sliding window, and T is the training set;
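The training objective can be illustrated with a small self-contained sketch: a mean squared error between predictions and true sensor values, minimized with a hand-written Adam update. For brevity a single linear layer stands in for the stacked fully connected layers (MLP), and the fused features and targets are synthetic; both are assumptions, as are the hyperparameter values. The patent's loss formula is not reproduced in this text.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all time steps and sensors."""
    return float(np.mean((pred - target) ** 2))

def train_linear_head(Z, S, steps=500, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit S ~ Z @ W by minimizing MSE with Adam.

    Z: (T, d) fused feature vectors (stand-in)   S: (T, N) true sensor values
    """
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(Z.shape[1], S.shape[1]))
    m = np.zeros_like(W)   # first-moment estimate
    v = np.zeros_like(W)   # second-moment estimate
    history = []
    for step in range(1, steps + 1):
        pred = Z @ W
        history.append(mse(pred, S))
        grad = 2.0 * Z.T @ (pred - S) / pred.size   # gradient of the MSE w.r.t. W
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** step)             # bias correction
        v_hat = v / (1 - beta2 ** step)
        W -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return W, history

rng = np.random.default_rng(1)
Z_train = rng.normal(size=(64, 8))
S_train = Z_train @ rng.normal(size=(8, 3))
W_fit, history = train_linear_head(Z_train, S_train)
print(history[0], history[-1])
```

Training only on normal data means a low loss certifies that the model reproduces normal behavior; large errors at test time are then evidence of abnormal behavior.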

[0067] By continuously iterating through the above training process, the spatiotemporal fusion vector representation of the sensor converges, resulting in the final accurate prediction model.

[0068] 4.2) Threshold setting

[0069] For abnormal data in the test set, the boxplot statistical method is used to adaptively calculate the outlier threshold. This method describes the overall distribution of normal data through the statistical median, 25th percentile, 75th percentile, upper boundary, and lower boundary. From these statistics, a box plot containing most of the normal data is generated, and outliers are detected as the values that exceed its upper and lower boundaries. The calculation method is as follows:

[0070] UpperThreshold = Q3 + 1.5 × IQR  (22)

[0071] LowerThreshold = Q1 - 1.5 × IQR  (23)

[0072] Where Q3 is the 75th percentile, Q1 is the 25th percentile, IQR is the interquartile range, UpperThreshold represents the upper boundary of the outlier threshold, and LowerThreshold represents the lower boundary of the outlier threshold.
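Formulas (22) and (23) translate directly into a few lines of code. The sketch below uses NumPy's default linear-interpolation percentiles, which is an assumption since the source does not say how the quartiles are estimated; the function name and the sample scores are hypothetical.

```python
import numpy as np

def boxplot_thresholds(scores, whisker=1.5):
    """Lower and upper outlier thresholds from Q1, Q3 and the IQR."""
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr

scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0])
lower, upper = boxplot_thresholds(scores)
flags = (scores < lower) | (scores > upper)
print(lower, upper, flags)
```

Here Q1 = 3.25 and Q3 = 7.75, so the thresholds are -3.5 and 14.5 and only the value 50 is flagged. Because the thresholds come from the score distribution itself, they adapt to each dataset without manual tuning.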

[0073] 4.3) Anomaly Detection

[0074] The trained model calculates individual outlier values for each sensor based on predicted values and abnormal observations, and combines these values into an overall outlier score for each time scale. When the overall outlier score exceeds a set outlier threshold, an anomaly can be identified at that time point. This method allows for the integration of outlier information from multiple sensors to determine the occurrence of abnormal events, and it can locate abnormal sensors through individual outlier values. The calculation process for individual outlier values and outlier scores is as follows:

[0075]

[0076]

[0077] $Error_i(t)$ represents the individual outlier value, $A_s(t)$ represents the overall outlier score at time t, and IQR and m represent the interquartile range and median of the individual outlier values over time, respectively.

[0078] Ultimately, when the anomaly score $A_s(t)$ is outside the threshold range, an anomaly is considered to have occurred at time t, and the anomaly label is output as 1. After the above steps, the anomaly detection of multivariate time-series data is completed.
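The scoring step can be sketched as follows. Formulas (24) and (25) are not reproduced in this text, so the code assumes a common robust form: the absolute prediction error of each sensor is centered by its median m and scaled by its IQR over time, and the overall score at each time step is the maximum over sensors. The choice of the maximum as the combination rule, the function name, and the synthetic data are assumptions.

```python
import numpy as np

def anomaly_scores(pred, obs, eps=1e-8):
    """pred, obs: (T, N) predicted and observed sensor values.

    Returns per-sensor normalized errors (T, N), the overall score per time
    step (T,), and the index of the sensor that contributes most (T,).
    """
    err = np.abs(obs - pred)
    m = np.median(err, axis=0)
    q1, q3 = np.percentile(err, [25, 75], axis=0)
    individual = (err - m) / (q3 - q1 + eps)   # robust normalization per sensor
    overall = individual.max(axis=1)           # assumed combination rule
    culprit = individual.argmax(axis=1)
    return individual, overall, culprit

rng = np.random.default_rng(0)
pred = np.zeros((100, 3))
obs = rng.normal(scale=0.1, size=(100, 3))
obs[60, 1] += 10.0                             # inject a fault on sensor 1 at t = 60
individual, overall, culprit = anomaly_scores(pred, obs)
print(int(overall.argmax()), int(culprit[60]))
```

The per-sensor values are what allow localization: once a time step is flagged by the overall score, the sensor with the largest individual value is the most likely source.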

[0079] The beneficial effects of this invention are:

[0080] This invention proposes an anomaly detection method for industrial time-series data based on a spatiotemporal graph attention network. First, the industrial time-series data is modeled as a spatiotemporal graph. Then, the spatiotemporal graph attention network is used to simultaneously and iteratively extract the temporal and spatial features in the spatiotemporal graph. Finally, the spatiotemporal features are deeply fused, which improves the accuracy of anomaly detection for time-series data.

Attached Figure Description

[0081] Figure 1 shows the industrial time-series data anomaly detection framework of the present invention;

[0082] Figure 2 shows the structure of the periodic temporal feature extraction module of the present invention;

[0083] Figure 3 shows the calculation of the spatiotemporal fusion attention coefficient in the present invention.

Detailed Implementation

[0084] The invention will now be further described with reference to the accompanying drawings.

[0085] With reference to Figures 1, 2, and 3, and based on the SWaT (Secure Water Treatment) dataset, the specific implementation of the anomaly detection method for industrial time-series data based on a spatiotemporal graph attention network is further explained, including the following steps:

[0086] Step (1) Dataset partitioning and preprocessing:

[0087] The SWaT dataset includes time-series data collected continuously from 51 sensors. This time-series data is normalized using equation (1) to ensure that sensor data with different attributes and ranges have the same metric. All processed data is divided into three subsets: 60% as subset T1, 20% as subset T2, and 20% as subset T3. Subset T1 is used for model training and contains only normal data generated by the device under normal operating conditions. Subset T2 is used for hyperparameter tuning and threshold selection. Subset T3 is used to verify the model's anomaly detection performance; both T2 and T3 contain both normal and abnormal data generated by the device under abnormal operating conditions. Before being input into the model, the data needs to be segmented by a time window using equation (2), with the time window size set to 25.

[0088] Step (2) Construction of the fully connected temporal graph:

[0089] A temporal fully connected graph is a fully connected graph composed of pairwise connected sensor nodes, which can be represented as G = (V, E), where V is the set of nodes, each node representing a time window of sensor data; E is the set of edges, and $e_{ij} \in E$ represents the weighted edge between node i and node j, with the edge weight calculated using equation (4). The adjacency matrix of the time-series fully connected graph including N sensors is denoted A, and each of its elements is calculated using equation (5).

[0090] Step (3) Spatiotemporal graph attention network framework:

[0091] Step (3.1) Spatial feature extraction: Spatial feature extraction is performed on the constructed time series graph using GCN. The time series data and the adjacency matrix of the time series graph obtained in step (2) are input into equation (6)-equation (7) to obtain the sensor spatial feature vector v.

[0092] Step (3.2) Temporal feature extraction: In the TimesNet module shown in Figure 2, time-series data is input into a 1x1 convolutional layer and an embedding layer, respectively, and then their respective embedding results are summed to obtain a one-dimensional time-series vector $X_{1D}$.

[0093] Using equations (9)-(10), the periodic information $\{f_1, …, f_k\}$ of the k highest-intensity frequencies of the one-dimensional time-series vector $X_{1D}$ is extracted. Using equation (11), the information of each period is superimposed onto $X_{1D}$, yielding k two-dimensional time-series vectors. The columns represent adjacent time points, and the rows represent adjacent periods.

[0094] For each two-dimensional time-series vector, the ResNeXt convolution strategy is used to integrate residual information to better extract the temporal features of the sensor. The extracted two-dimensional temporal feature vector is given by equation (12).

[0095] Using equations (13)-(15), the k two-dimensional time-series feature vectors are aggregated to obtain the final temporal feature vector $X_i$ of this module.

[0096] Step (3.3) Spatiotemporal feature fusion: As shown in Figure 3, the temporal feature vector X obtained in step (3.2) and the spatial feature vector v obtained in step (3.1) are input into equations (17)-(19) to calculate the spatiotemporal fusion attention coefficient $\alpha_{ij}$.

[0097] The temporal feature vector X, the spatial feature vector v, the attention coefficient $\alpha_{ij}$ obtained in the above steps, and the adjacency matrix A are input into equation (16) to obtain the spatiotemporal fusion feature vectors of the N sensors.

Step (4) Anomaly detection in industrial time-series data based on prediction:

[0098] The spatiotemporal fusion feature vector of each sensor is input into stacked fully connected layers with an output dimension of N to predict the sensor values at time step t.

[0099] The training process of the model uses the normal dataset T1 divided in step (1) as training data and uses the obtained sensor prediction values to continuously optimize the loss function in equation (21), so that the model learns the distribution trend of normal time-series data from each sensor. The trained model predicts well for sensors in normal states, but its prediction error for abnormal data is large. Therefore, the prediction error is used to determine whether the data is abnormal.

[0100] Anomaly detection is performed on the T3 dataset. Anomaly scores are calculated using equations (24)-(25). The obtained anomaly scores are compared with the adaptive thresholds in equations (22)-(23) to complete the anomaly detection process.

Claims

1. A method for detecting anomalies in industrial time-series data based on a spatiotemporal graph attention network, characterized in that it includes the following steps:
1) Data preprocessing: normalize the data from each sensor and partition the time-series data;
2) Construction of a fully connected temporal graph: connect industrial sensor data nodes pairwise to form a fully connected graph, where nodes represent sensor data and edge weights represent the dependencies between sensors;
3) Spatiotemporal graph attention network framework: extract spatial features from the time-series data, extract temporal features from the time-series data, and perform spatiotemporal feature fusion; step 3) is as follows:
3.1) Spatial feature extraction of time-series data: first, a Graph Convolutional Network (GCN) is used to extract spatial features from the time-series data. Through adaptive aggregation of node features, data from different types of sensors are integrated into a unified vector representation. The graph convolution is calculated by formulas (6)-(7), where $v^{(l)}$ represents the features of each node in the l-th layer, δ represents the nonlinear transformation, I represents the identity matrix, A represents the adjacency matrix together with its corresponding degree matrix, and $W^{(l)}$ represents the weight matrix of the l-th layer. The spatial feature vectors of each sensor obtained from the fully connected temporal graph using GCN are given by formula (8), where $v_i$ represents the spatial feature vector of the i-th sensor at time t, d represents the dimension of the vector, and N represents the number of sensor nodes;
3.2) Extraction of temporal features from time-series data: TimesNet is used to extract periodic variation features from the time-series data, with the following specific steps:
3.2.1): The time-series data is input into a 1x1 convolutional layer and an embedding layer respectively, and then their respective embedding results are summed to obtain a one-dimensional time-series vector $X_{1D}$;
3.2.2): The periodic information of the one-dimensional time-series vector $X_{1D}$ is extracted using the Fast Fourier Transform and then superimposed onto the one-dimensional time-series vector. For a one-dimensional time-series vector of length T, the transformation process is given by formulas (9)-(11), where Z represents the intensity of each frequency component in the one-dimensional time-series vector, FFT denotes the Fast Fourier Transform, argTopk selects the periodic information $\{f_1, …, f_k\}$ of the k highest-intensity frequencies, and Padding(·) pads the end of the one-dimensional time-series vector with zeros so that the length of the reshaped two-dimensional time-series vector matches the length of the selected periodic information. Using formulas (9)-(11), k two-dimensional time-series vectors containing different periodic information are obtained; their columns represent adjacent time points, and their rows represent adjacent periods;
3.2.3): For each obtained two-dimensional time-series vector, the ResNeXt convolution strategy is used to integrate residual information to better extract sensor temporal features. The extracted two-dimensional temporal feature vector is given by formula (12);
3.2.4): After the two-dimensional time-series feature vector is transformed back to one-dimensional space and information aggregation is performed, it is given by formulas (13)-(15). In formula (13), Trunc means removing the zeros added in formula (11), Reshape means converting the two-dimensional time-series feature vector back to one-dimensional form, and formulas (14) and (15) mean performing SoftMax weighted summation using the corresponding frequency intensities. The temporal feature vector $X_i$ is obtained through the above steps;
3.3) Spatiotemporal feature fusion: the Spatial-Temporal Fusion Graph Attention Network (STFGAT) fuses spatial and temporal features to obtain a vector $z_i$ that integrates spatiotemporal features. The fusion process is given by formula (16), where $W_z$ is a learnable weight matrix, $X_i$ is the temporal feature vector learned through TimesNet in step 3.2), and $N(i) = \mathrm{Topk}(A_{ij})$ represents the k neighbors of node i with the highest weights in the weight matrix A; k is a hyperparameter that determines the number of neighbors selected for node i. The spatiotemporal fusion attention coefficient $\alpha_{i,j}$ is calculated by formulas (17)-(19), where the concatenation operator denotes vector concatenation, $W_g$ is the weight matrix of the linear transformation, $v_i$ is the spatial feature vector obtained through GCN convolution in step 3.1), and a is the coefficient vector learned for the attention mechanism. The attention coefficients θ(i,j) are calculated using the LeakyReLU() nonlinear activation function and then normalized with the exp()-based softmax() function. After the above feature extraction steps, the final spatiotemporal fusion feature vectors of the N sensor nodes are obtained;
4) Anomaly detection in industrial time-series data based on prediction: establish a prediction model that can capture the trend, periodicity, and pattern of time-series data, train the model on the training set, set thresholds for abnormal data in the test set, and perform anomaly detection based on the anomaly thresholds.

2. The method for detecting anomalies in industrial time-series data based on a spatiotemporal graph attention network according to claim 1, characterized in that step 1) is as follows:
1.1) Data normalization: the data from each sensor are normalized separately by formula (1), where $s_i$ is the normalized value of the i-th sensor, computed from the data collected by the i-th sensor and from the mean and variance of the original measurement values of the i-th sensor;
1.2) Time-series data partitioning: at time t, the model input consists of the historical time-series data of each sensor prior to time t. A sliding window of size w is used to extract the historical time-series data within the window, as given by formula (2).

3. The method for detecting anomalies in industrial time-series data based on a spatiotemporal graph attention network according to claim 1, characterized in that step 2) is as follows: the time-series fully connected graph is represented as G = (V, E), where V is the set of nodes, each node representing a time window of sensor data; E is the set of edges, and $e_{ij} \in E$ represents the weighted edge between node i and node j. The weight of the edge is calculated by formulas (3)-(4), where $C_i$ represents the set of candidate nodes excluding node i, and x represents the time window data. The adjacency matrix of the time-series fully connected graph of N sensors is denoted A, and the values of its elements are the weights calculated above: $A_{ij} = e_{ij}$ for $j \in C_i$.

4. The method for detecting anomalies in industrial time-series data based on a spatiotemporal graph attention network according to claim 1, characterized in that step 4) is as follows:
4.1) Model training: the spatiotemporal fusion feature vectors are input to stacked fully connected layers with output dimension N to predict the sensor values at time step t, where MLP represents the stacked fully connected layers and the output is the predicted value of the i-th sensor at time step t. The loss is calculated by comparing the sensor prediction values calculated above with the sensors' true values; the mean squared error loss function is used, and the Adam optimizer is used to minimize it. The loss function is given by formula (21), where w is the size of the sliding window and T is the training set. By continuously iterating through the above training process, the spatiotemporal fusion vector representation of the sensors converges, resulting in the final accurate prediction model;
4.2) Threshold setting: for the abnormal data in the test set, i.e., the data generated under abnormal operating conditions, which includes abnormal values detected by sensors, the boxplot statistical method is used to adaptively calculate the anomaly threshold: UpperThreshold = Q3 + 1.5 × IQR and LowerThreshold = Q1 - 1.5 × IQR, where Q3 is the 75th percentile, Q1 is the 25th percentile, IQR is the interquartile range, UpperThreshold represents the upper boundary of the outlier threshold, and LowerThreshold represents the lower boundary of the outlier threshold;
4.3) Anomaly detection: based on the predicted values and the abnormal observed values, the individual outlier value of each sensor is calculated, and these values are combined to form an overall anomaly score for each time point. When the overall anomaly score exceeds the set anomaly threshold, an abnormal situation is determined to have occurred at that time point. The individual outlier values and anomaly scores are calculated by formulas (24)-(25), where $Error_i(t)$ represents the individual outlier value, $A_s(t)$ represents the overall anomaly score at time t, and IQR and m represent the interquartile range and median of the individual outlier values over time, respectively. Ultimately, when the anomaly score $A_s(t)$ is outside the threshold range, an anomaly is considered to have occurred at time t, and the anomaly label is output as 1. After the above steps, the anomaly detection of multivariate time-series data is completed.