A method for jointly determining anomalies based on reconstruction and prediction

By combining a multivariate time-series anomaly reconstruction model with prediction and reconstruction methods, the accuracy problem of sensor anomaly detection in semiconductor manufacturing process is solved, the false alarm and missed alarm rates are reduced, and equipment maintenance costs are decreased.

CN116340872B · Active · Publication Date: 2026-06-30 · SHENZHEN ZHIXIAN FUTURE IND SOFTWARE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHENZHEN ZHIXIAN FUTURE IND SOFTWARE CO LTD
Filing Date: 2023-03-29
Publication Date: 2026-06-30

IPC: G06F18/2433; G06F18/214

AI Tagging

Technology Topics

Algorithm Multiple sensor

AI Technical Summary

Technical Problem

In the semiconductor manufacturing process, existing univariate time-series prediction models are prone to false alarms and missed alarms and lack robustness to disturbances and noise, which creates redundancy in the monitoring system and increases maintenance costs.

Method used

A multivariate time-series anomaly reconstruction model is adopted, which combines prediction and reconstruction methods to train data. The reconstructed data from multiple sensors is compared with real data to improve the accuracy of anomaly detection.

Benefits of technology

By comprehensively utilizing reconstruction and prediction models, the accuracy of anomaly detection has been improved, false alarms and missed alarms have been reduced, and the maintenance costs of production equipment have been lowered.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to a method for determining anomalies based on joint reconstruction and prediction, comprising: acquiring multivariate time-series data, wherein the time-series data is data generated by multiple sensors of equipment in a semiconductor manufacturing process; inputting a first target subsequence corresponding to a first time window from the time-series data into a reconstruction model to obtain reconstructed data for the first time window, wherein the cutoff time point of the first time window is the (n+1)th time point; inputting a second target subsequence from the time-series data up to the nth time point into a prediction model to obtain predicted data for the (n+1)th time point; aggregating the target data corresponding to the (n+1)th time point from the reconstructed data with the predicted data to obtain total predicted data for the (n+1)th time point; and comparing the total predicted data with the measured data corresponding to the time-series data at the (n+1)th time point to determine whether an anomaly occurs at the (n+1)th time point.

Description

Technical Field

[0001] This invention relates to the field of semiconductor manufacturing, and more particularly to a method for determining anomalies based on a combination of reconstruction and prediction.

Background Technology

[0002] In semiconductor manufacturing, multiple sensors typically monitor the production status simultaneously on the same machine. These sensors on the same machine are interconnected; when an anomaly occurs, some of these interconnected sensors may simultaneously generate abnormal data, thus triggering an alarm.

[0003] Existing technologies model these sensor data separately using univariate time-series prediction, predicting data individually for each sensor and then comparing it with the actual data to determine whether to trigger an alarm. However, this method may produce false alarms, meaning that a sensor may erroneously generate an alarm message due to its own malfunction or environmental noise when no actual anomaly has occurred. This creates redundancy in the entire monitoring system and increases the maintenance costs of production equipment.

[0004] When choosing a modeling method, if only prediction is used to train the data, the model will be very sensitive to the randomness of the time series and lack robustness to disturbances and noise. If only reconstruction is used to train the data, the model may ignore individual data features, especially when the overall value still conforms to the original window data distribution, which may lead to missed alarms.

Summary of the Invention

[0005] This specification describes one or more embodiments of a method for anomaly identification based on joint reconstruction and prediction. It simultaneously models data from multiple sensors using a multivariate time-series anomaly reconstruction model, comparing the reconstructed data from all sensors at any given time point with the actual data to obtain better prediction results. Furthermore, when selecting a modeling method, both prediction and reconstruction methods are used to train the data to better uncover the inherent relationships within the multivariate time-series data, leading to improved prediction outcomes.

[0006] This specification provides a method for identifying anomalies based on a joint approach of reconstruction and prediction, including:

[0007] Acquire multivariate time-series data, which is data generated by multiple sensors of equipment during semiconductor manufacturing;

[0008] The first target subsequence corresponding to the first time window in the time series data is input into the reconstruction model to obtain the reconstruction data for the first time window, where the cutoff time of the first time window is the (n+1)th time point.

[0009] The second target subsequence up to the nth time point in the time series data is input into the prediction model to obtain the prediction data for the (n+1)th time point;

[0010] The target data corresponding to the (n+1)th time point in the reconstructed data is aggregated with the predicted data to obtain the total predicted data for the (n+1)th time point.

[0011] The total predicted data is compared with the measured data corresponding to the time series data at the (n+1)th time point to determine whether an anomaly occurs at the (n+1)th time point.

[0012] In one possible implementation, it also includes:

[0013] When an anomaly occurs at time point n+1, the abnormal sensor and the corresponding anomaly type are determined based on the total predicted data and the measured data.

[0014] In one possible implementation, it also includes:

[0015] Based on the abnormal sensor, the anomaly type, the device number, and the number of the wafer being processed by the device, corresponding knowledge points are determined. These knowledge points are used to generate or update a knowledge graph in the semiconductor field.

[0016] In one possible implementation, the total predicted data is compared with the measured data corresponding to the time series data at time point n+1 to determine whether an anomaly occurs at time point n+1, including:

[0017] Calculate the error between the total predicted data and the measured data, and determine whether an anomaly occurs at the (n+1)th time point based on the comparison result of the error and a preset first threshold.

[0018] In one possible implementation, determining the abnormal sensor and its corresponding abnormality type based on the total predicted data and the measured data includes:

[0019] At time point n+1, the error between the data corresponding to any target sensor among the plurality of sensors in the total predicted data and the measured data is calculated. Based on the comparison result of the error and the preset threshold corresponding to the target sensor, it is determined whether the target sensor has an anomaly and the corresponding anomaly type.

[0020] In one possible implementation, the reconstruction model is a variational autoencoder (VAE); the first target subsequence corresponding to a first time window in the time series data is input into the reconstruction model to obtain reconstructed data for the first time window, including:

[0021] The first target subsequence is encoded using the encoder of a variational autoencoder (VAE) to obtain a sequence of latent space variables.

[0022] The latent space variable sequence is decoded using the decoder of the variational autoencoder (VAE) to obtain the reconstructed data.

[0023] In one possible implementation, the prediction model includes a self-attention model, a graph attention network (GAT), and a fully connected neural network; the second target subsequence up to time point n in the time series data is input into the prediction model to obtain the prediction data for time point n+1, including:

[0024] The second target subsequence is encoded in the time dimension using a self-attention model to obtain an encoded sequence.

[0025] The second target subsequence is input into the graph attention network GAT to determine the graph relationship information between the multiple sensors, wherein each sensor corresponds to a node in the graph;

[0026] The encoded sequence and the graph relationship information are concatenated and then input into the fully connected neural network to obtain the predicted data.

[0027] In one possible implementation, the reconstruction model and the prediction model are trained using sample time-series data, the training including:

[0028] The data sequence in the sample time series data is divided into several sequence segments of the same size;

[0029] The reconstruction model and the prediction model are trained in multiple rounds using the sequence segments. Each round of training uses one sample sequence segment from the sequence segments. Any round of training includes:

[0030] The third target subsequence corresponding to the third time window in the sample sequence segment is input into the reconstruction model to obtain the reconstruction data for the third time window, where the cutoff time point of the third time window is the (m+1)th time point.

[0031] The fourth target subsequence up to time point m in the sample sequence segment is input into the prediction model to obtain the prediction data for time point m+1.

[0032] The reconstruction error is determined based on the reconstructed data and the data corresponding to the sample sequence fragment at time point m+1.

[0033] The prediction error is determined based on the predicted data and the data corresponding to the sample sequence segment at time point m+1.

[0034] The total error is obtained by combining the reconstruction error and the prediction error. The values of the parameters in the reconstruction model and the prediction model are adjusted by minimizing the total error.

[0035] In one possible implementation, the total error is obtained by combining the reconstruction error with the prediction error, including:

[0036] Add the reconstruction error to the prediction error to obtain the total error; or

[0037] The total error is obtained by averaging the reconstruction error and the prediction error; or

[0038] The total error is obtained by taking the maximum of the reconstruction error and the prediction error; or

[0039] The total error is obtained by taking the minimum of the reconstruction error and the prediction error.

[0040] In one possible implementation, the types of error include at least: root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE).

[0041] This invention proposes a method for anomaly identification based on joint reconstruction and prediction. It simultaneously models data from multiple sensors using a multivariate time-series anomaly reconstruction model, comparing the reconstructed data from all sensors at any given time point with the actual data to obtain better prediction results. Furthermore, when selecting a modeling method, both prediction and reconstruction methods are used to train the data, thereby better uncovering the inherent relationships within the multivariate time-series data and achieving superior prediction outcomes.

Attached Figure Description

[0042] To more clearly illustrate the technical solutions of the various embodiments disclosed in this specification, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only a few embodiments disclosed in this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0043] Figure 1 is a framework diagram of a method for jointly determining anomalies based on reconstruction and prediction, as disclosed in an embodiment of the present invention.

[0044] Figure 2 is a flowchart of a method for jointly determining anomalies based on reconstruction and prediction, as disclosed in an embodiment of the present invention.

[0045] Figure 3 is a flowchart of the method for training the reconstruction model and the prediction model disclosed in an embodiment of the present invention.

Detailed Implementation

[0046] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0047] According to one embodiment, Figure 1 presents a framework for a method of identifying anomalies based on joint reconstruction and prediction. As shown in Figure 1, the framework used in this method mainly consists of three parts: a reconstruction model, a prediction model, and an anomaly evaluation layer. The reconstruction model can be a variational autoencoder (VAE), including an encoder and a decoder; the prediction model can be composed of a self-attention model, a graph attention network (GAT), and a fully connected neural network.

[0048] During the training phase, the multivariate time series data of the training samples are input into the two models respectively, and the reconstruction error and prediction error of the two models are calculated separately. Then, a joint training method is adopted to minimize the reconstruction error and prediction error at the same time, and the parameters of the two models are adjusted.

[0049] During the prediction phase, multivariate time-series data are input into two models. Each model provides a predicted value for each sensor at the next time point. The anomaly assessment layer aggregates the sensor predictions from the two models to obtain a more accurate overall prediction value for each sensor. Then, the overall predicted value is compared with the actual value to determine whether an anomaly has occurred at that time point.

[0050] The following will provide further explanation and description with reference to the accompanying drawings and specific embodiments. These embodiments do not constitute a limitation on the embodiments of the present invention.

[0051] Figure 2 is a flowchart illustrating a method for jointly determining anomalies based on reconstruction and prediction, as disclosed in an embodiment of the present invention. As shown in Figure 2, the method includes at least the following steps: Step 201, acquiring multivariate time-series data, wherein the time-series data is data generated by multiple sensors of equipment during semiconductor manufacturing; Step 202, inputting a first target subsequence corresponding to a first time window from the time-series data into a reconstruction model to obtain reconstructed data for the first time window, wherein the cutoff time point of the first time window is the (n+1)th time point; Step 203, inputting a second target subsequence up to the nth time point from the time-series data into a prediction model to obtain predicted data for the (n+1)th time point; Step 204, aggregating the target data corresponding to the (n+1)th time point from the reconstructed data with the predicted data to obtain total predicted data for the (n+1)th time point; Step 205, comparing the total predicted data with the measured data corresponding to the time-series data at the (n+1)th time point to determine whether an anomaly occurs at the (n+1)th time point.

[0052] In step 201, multivariate time-series data is acquired, which is data generated by multiple sensors of the equipment during the semiconductor manufacturing process.

[0053] Time-series data refers to the raw data generated by sensors in semiconductor manufacturing equipment, processed by a Functional Data Conversion (FDC) system. A semiconductor manufacturing machine is equipped with many sensors, each monitoring a specific parameter such as temperature, humidity, voltage, current, or pressure. The value of any parameter output by a sensor over a given period is processed by the FDC system to obtain a set of univariate time-series data. By combining the time-series data from all sensors, multivariate time-series data is obtained.

[0054] Multivariate time series data can be represented as $D = \{X_1, X_2, \ldots, X_n\}$, where $X_i \in \mathbb{R}^k$ represents the values of the k sensors at the i-th time point. $X_i$ is a k-dimensional vector, where k is the number of sensors, and i = 1, 2, ..., n.
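As an illustrative sketch only, the combination of per-sensor univariate series into the multivariate sequence D can be written as follows. The sensor names and values are made up, and the helper `to_multivariate` is a hypothetical name that does not appear in the patent; plain Python lists are used to keep the sketch dependency-free.

```python
# Hypothetical univariate series, one list per sensor (values are invented).
sensor_series = {
    "temperature": [21.0, 21.2, 21.1, 21.4],
    "pressure": [1.00, 1.01, 0.99, 1.02],
    "voltage": [5.0, 5.1, 5.0, 4.9],
}


def to_multivariate(series_by_sensor):
    """Combine k univariate series of equal length n into D = [X_1, ..., X_n]."""
    names = list(series_by_sensor)
    lengths = {len(series_by_sensor[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all sensors must report the same number of time points")
    # zip(*...) turns sensor-major data into time-major k-dimensional vectors.
    columns = (series_by_sensor[name] for name in names)
    return names, [list(x) for x in zip(*columns)]


names, D = to_multivariate(sensor_series)
n, k = len(D), len(D[0])   # n time points, k sensors
```

Here `D[i - 1]` plays the role of the vector X_i in the text, because Python lists are 0-based while the time points are numbered from 1.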

[0055] In step 202, the first target subsequence corresponding to the first time window in the time series data is input into the reconstruction model to obtain the reconstruction data for the first time window, where the cutoff time of the first time window is the (n+1)th time point.

[0056] Specifically, if a first time window of length ω is preset, then the first target subsequence up to the (n+1)th time point can be represented as $S_1 = \{X_{n-\omega+2}, X_{n-\omega+3}, \ldots, X_{n+1}\}$. The first target subsequence $S_1$ is input into the reconstruction model to obtain the reconstructed data $R_1$ for the first time window. Each data point in $R_1$ is the reconstructed data corresponding to a data point in the first target subsequence $S_1$. For example, $R_1$ contains the reconstructed data corresponding to $X_{n-\omega+2}$ and the reconstructed data corresponding to $X_{n+1}$.
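The index arithmetic of the first time window can be checked with a short sketch. The function name `first_window` and the toy data are assumptions made for illustration; the only thing taken from the text is that the window has length ω and ends at the (n+1)th time point.

```python
def first_window(D, n, omega):
    """Return S_1 = [X_{n-omega+2}, ..., X_{n+1}] using the 1-based indices of the text.

    D is a list of per-time-point vectors, so X_i is stored at D[i - 1].
    """
    start = n - omega + 2   # 1-based index of the first element of S_1
    end = n + 1             # 1-based index of the last element of S_1
    if start < 1 or end > len(D):
        raise ValueError("window does not fit inside the available data")
    return D[start - 1:end]


# Toy data: X_i = [i, 10 * i] for i = 1..6 (two hypothetical sensors).
D = [[i, 10 * i] for i in range(1, 7)]
S1 = first_window(D, n=5, omega=3)   # X_4, X_5, X_6
```

With n = 5 and ω = 3 the window starts at index n - ω + 2 = 4 and ends at n + 1 = 6, so it contains exactly ω vectors.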

[0057] In one embodiment, the reconstruction model is a variational autoencoder (VAE). In this case, step 202 specifically includes: encoding the first target subsequence $S_1$ using the encoder of the VAE to obtain the latent space variable sequence $V_1$; and decoding the latent space variable sequence $V_1$ using the decoder of the VAE to obtain the reconstructed data $R_1$.
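The encode-then-decode data flow of step 202 can be sketched with a deliberately tiny, untrained VAE. Everything about the architecture here is an assumption for illustration: the class name `ToyVAE`, the single linear layer per head, the latent dimension, and the choice to encode each time point independently are not specified by the patent. The sketch only shows how $S_1$ maps to $V_1$ and then to $R_1$.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyVAE:
    """Untrained single-layer VAE applied to each time point (assumed design)."""

    def __init__(self, k, latent_dim, seed=0):
        self.rng = random.Random(seed)
        self.W_mu = random_matrix(latent_dim, k, self.rng)      # encoder: mean head
        self.W_logvar = random_matrix(latent_dim, k, self.rng)  # encoder: log-variance head
        self.W_dec = random_matrix(k, latent_dim, self.rng)     # decoder

    def encode(self, x):
        mu = matvec(self.W_mu, x)
        logvar = matvec(self.W_logvar, x)
        # Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1).
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z):
        return matvec(self.W_dec, z)

    def reconstruct(self, S1):
        V1 = [self.encode(x) for x in S1]   # latent space variable sequence
        R1 = [self.decode(z) for z in V1]   # reconstructed data, aligned with S1
        return V1, R1


S1 = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4], [0.3, 0.3, 0.2]]
vae = ToyVAE(k=3, latent_dim=2)
V1, R1 = vae.reconstruct(S1)
target = R1[-1]   # reconstruction for the last time point of the window
```

Because the window ends at the (n+1)th time point, the last element of `R1` is the target data that is later aggregated with the prediction.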

[0058] In step 203, the second target subsequence up to time point n in the time series data is input into the prediction model to obtain the prediction data for time point n+1.

[0059] Specifically, the second target subsequence up to the nth time point can be denoted $S_2$. The first data point of $S_2$ may be the same as that of $S_1$ or different, as long as the last data point of $S_2$ is $X_n$. The second target subsequence $S_2$ is input into the prediction model to obtain the predicted data for the (n+1)th time point.

[0060] In one embodiment, the prediction model includes a self-attention model, a graph attention network (GAT), and a fully connected neural network. In this case, step 203 specifically includes: using the self-attention model to encode the second target subsequence $S_2$ along the time dimension to obtain an encoded sequence; inputting the second target subsequence into the graph attention network (GAT) to determine the graph relationship information between the multiple sensors, wherein each sensor corresponds to a node in the graph; and concatenating the encoded sequence and the graph relationship information and then inputting the result into the fully connected neural network to obtain the predicted data.
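A minimal forward-pass sketch of this three-part prediction model is given below, under several assumptions that the patent does not state: the self-attention uses Q = K = V without learned projections, the GAT is a single-head layer over a fully connected sensor graph, each sensor's node feature is its own time series, and only the last encoded time step is concatenated with the graph output. All function and parameter names are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def self_attention(S):
    """Scaled dot-product self-attention over time points (Q = K = V = S)."""
    scale = math.sqrt(len(S[0]))
    out = []
    for q in S:
        weights = softmax([dot(q, key) / scale for key in S])
        out.append([sum(w * v[j] for w, v in zip(weights, S)) for j in range(len(q))])
    return out


def graph_attention(node_feats, W, a, slope=0.2):
    """Single-head GAT layer on a fully connected graph, one node per sensor."""
    h = [matvec(W, f) for f in node_feats]
    out = []
    for hi in h:
        scores = []
        for hj in h:
            e = dot(a, hi + hj)                       # a^T [W h_i || W h_j]
            scores.append(e if e > 0 else slope * e)  # LeakyReLU
        alpha = softmax(scores)
        out.append([sum(w * hj[d] for w, hj in zip(alpha, h)) for d in range(len(hi))])
    return out


def predict_next(S2, params):
    encoded = self_attention(S2)                   # T x k, encoded along time
    node_feats = [list(col) for col in zip(*S2)]   # k nodes, each with T features
    graph_info = graph_attention(node_feats, params["W_gat"], params["a"])
    features = encoded[-1] + [v for node in graph_info for v in node]  # concatenation
    return matvec(params["W_fc"], features)        # one predicted value per sensor


rng = random.Random(0)
T, k, F = 4, 3, 2   # time steps, sensors, GAT output features per node
params = {
    "W_gat": rand_matrix(F, T, rng),
    "a": [rng.uniform(-0.5, 0.5) for _ in range(2 * F)],
    "W_fc": rand_matrix(k, k + k * F, rng),
}
S2 = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4], [0.3, 0.3, 0.2], [0.4, 0.2, 0.1]]
prediction = predict_next(S2, params)
```

The output `prediction` has one entry per sensor and stands in for the predicted data at the (n+1)th time point; a practical implementation would normally rely on a deep learning framework and trained weights.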

[0061] In a more specific embodiment, the self-attention model is the self-attention module of the Transformer model.

[0062] In step 204, the target data corresponding to the (n+1)th time point in the reconstructed data is aggregated with the predicted data to obtain the total predicted data for the (n+1)th time point.

[0063] There are various ways to aggregate the target data and the predicted data, such as calculating the arithmetic mean, calculating the geometric mean, or simply summing them. No specific method is mandated here.
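The three aggregation options named above can be sketched element-wise, one value per sensor. The function name `aggregate`, the `method` keyword, and the sample values are assumptions for illustration.

```python
import math


def aggregate(recon_target, predicted, method="mean"):
    """Aggregate the reconstruction target and the prediction, sensor by sensor."""
    if len(recon_target) != len(predicted):
        raise ValueError("both inputs must have one value per sensor")
    pairs = list(zip(recon_target, predicted))
    if method == "mean":        # arithmetic mean
        return [(r + p) / 2 for r, p in pairs]
    if method == "geometric":   # geometric mean, defined here for non-negative values only
        if any(r < 0 or p < 0 for r, p in pairs):
            raise ValueError("geometric mean requires non-negative values")
        return [math.sqrt(r * p) for r, p in pairs]
    if method == "sum":         # simple sum
        return [r + p for r, p in pairs]
    raise ValueError(f"unknown aggregation method: {method}")


recon_target = [2.0, 8.0]   # hypothetical reconstruction for the (n+1)th time point
predicted = [8.0, 2.0]      # hypothetical prediction for the (n+1)th time point

total_mean = aggregate(recon_target, predicted, "mean")        # [5.0, 5.0]
total_geo = aggregate(recon_target, predicted, "geometric")    # [4.0, 4.0]
total_sum = aggregate(recon_target, predicted, "sum")          # [10.0, 10.0]
```

Note that a simple sum changes the scale of the result relative to the measured data, so any threshold applied afterwards would have to account for that.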

[0064] In step 205, the total predicted data is compared with the measured data $X_{n+1}$ corresponding to the time series data at the (n+1)th time point to determine whether an anomaly occurs at the (n+1)th time point.

[0065] In one embodiment, the error between the total predicted data and the measured data $X_{n+1}$ is calculated, and the error is compared with a preset first threshold to determine whether an anomaly occurs at the (n+1)th time point.

[0066] There are several ways to calculate the error, such as using the root mean square error (RMSE), mean square error (MSE), or mean absolute error (MAE); no specific method is mandated here.
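The error measures and the threshold comparison of step 205 can be sketched as follows. The decision rule "error greater than the first threshold means anomaly" is an assumption carried over from the per-sensor example in the description; the function names and sample values are also hypothetical.

```python
import math


def mse(a, b):
    """Mean square error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rmse(a, b):
    """Root mean square error."""
    return math.sqrt(mse(a, b))


def mae(a, b):
    """Mean absolute error."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def is_anomalous(total_pred, measured, first_threshold, metric=rmse):
    # Assumption: an anomaly is reported when the error exceeds the threshold.
    return metric(total_pred, measured) > first_threshold


total_pred = [1.0, 2.0, 3.0]   # hypothetical total predicted data
measured = [1.0, 2.0, 7.0]     # hypothetical measured data X_{n+1}

flag = is_anomalous(total_pred, measured, first_threshold=1.0)
```

For these values the squared differences are 0, 0 and 16, so the MSE is 16/3, the RMSE is about 2.31, and the MAE is 4/3; with a first threshold of 1.0 the time point is flagged.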

[0067] In some possible implementations, the method further includes: step 206, when an anomaly occurs at time point n+1, determining the anomaly sensor and the corresponding anomaly type based on the total predicted data and the measured data.

[0068] In one embodiment, at the (n+1)th time point, for any target sensor among the plurality of sensors, the error between the data corresponding to the target sensor in the total predicted data and the data corresponding to the target sensor in the measured data $X_{n+1}$ is calculated, i.e., the error between the components of the total predicted data and of $X_{n+1}$ in the dimension corresponding to the target sensor. Whether the target sensor is abnormal, and the corresponding anomaly type, are determined based on the comparison of this error with a preset threshold corresponding to the target sensor. For example, when the error is greater than the threshold, it is determined that the target sensor is abnormal at the (n+1)th time point, and the corresponding anomaly type is determined based on the type of machine parameter detected by the target sensor.

[0069] The preset thresholds for the aforementioned target sensors can be set by engineers based on actual conditions or experience, or calculated by data models based on historical data. Different thresholds can also be set for different sensors based on actual conditions.

[0070] Since the error described in step 206 is the error between scalars, when calculating the error, the difference between the two can be directly calculated, or the absolute value of the difference can be calculated; there is no limitation here.
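Step 206 can be sketched as a per-sensor comparison against sensor-specific thresholds. The sensor names, threshold values, and the convention of naming the anomaly type after the monitored parameter are assumptions for illustration; the absolute difference is used as the scalar error, which is one of the two options mentioned above.

```python
def find_abnormal_sensors(sensor_names, total_pred, measured, thresholds):
    """Return (sensor, anomaly type) pairs for sensors whose error exceeds their threshold."""
    findings = []
    for name, pred, actual in zip(sensor_names, total_pred, measured):
        error = abs(actual - pred)   # scalar error in this sensor's dimension
        if error > thresholds[name]:
            # Assumption: the anomaly type is derived from the monitored parameter.
            findings.append((name, f"{name} anomaly"))
    return findings


sensor_names = ["temperature", "pressure", "voltage"]
total_pred = [350.0, 1.20, 5.0]    # hypothetical total predicted data
measured = [362.0, 1.25, 5.0]      # hypothetical measured data X_{n+1}
thresholds = {"temperature": 5.0, "pressure": 0.1, "voltage": 0.2}

findings = find_abnormal_sensors(sensor_names, total_pred, measured, thresholds)
```

Only the temperature sensor is reported here: its error of 12.0 exceeds its threshold of 5.0, while the pressure and voltage errors stay below theirs.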

[0071] In some possible implementations, the method further includes: step 207, determining corresponding knowledge points based on the anomaly sensor, the anomaly type, the device number of the device, and the number of the wafer being processed by the device, wherein the knowledge points are used to generate or update a knowledge graph in the semiconductor field.

[0072] In some embodiments, since multiple anomaly types may exist simultaneously at a certain point in time, corresponding to multiple target sensors being abnormal, the knowledge point can take the form of a tuple. Specifically, the tuple has the form (anomaly type 1, ..., anomaly type m, device number, wafer number). For example, a specific knowledge point can be (excessive temperature, excessive pressure, machine 2, wafer 3).
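The tuple form of a knowledge point can be sketched directly. The helper name `make_knowledge_point` and the use of a set as a stand-in store are assumptions; how the knowledge graph itself is built or updated is not described in the text and is not modelled here.

```python
def make_knowledge_point(anomaly_types, device_number, wafer_number):
    """Build (anomaly type 1, ..., anomaly type m, device number, wafer number)."""
    if not anomaly_types:
        raise ValueError("a knowledge point needs at least one anomaly type")
    return (*anomaly_types, device_number, wafer_number)


knowledge_point = make_knowledge_point(
    ["excessive temperature", "excessive pressure"], "machine 2", "wafer 3"
)

# Hypothetical store: a set keeps each distinct knowledge point once.
knowledge_points = set()
knowledge_points.add(knowledge_point)
knowledge_points.add(knowledge_point)   # adding the same point again has no effect
```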

[0073] The steps shown in Figure 2 use a trained model to perform anomaly detection on the data; the method for training the model is shown in Figure 3.

[0074] Figure 3 is a flowchart illustrating the method for training the reconstruction model and the prediction model disclosed in an embodiment of the present invention. As shown in Figure 3, the reconstruction model and the prediction model are trained using sample time-series data. The training includes: step 310, dividing the data sequence in the sample time-series data into several sequence segments of the same size; step 320, using the several sequence segments to train the reconstruction model and the prediction model in multiple rounds, with each round using one sample sequence segment from the several sequence segments. Any round of training includes: step 321, inputting the third target subsequence corresponding to the third time window from the sample sequence segment into the reconstruction model to obtain the reconstruction data for the third time window, where the cutoff time point of the third time window is the (m+1)th time point; step 322, inputting the fourth target subsequence up to the mth time point in the sample sequence segment into the prediction model to obtain the prediction data for the (m+1)th time point; step 323, determining the reconstruction error based on the reconstructed data and the data corresponding to the sample sequence segment at the (m+1)th time point; step 324, determining the prediction error based on the prediction data and the data corresponding to the sample sequence segment at the (m+1)th time point; step 325, combining the reconstruction error and the prediction error to obtain the total error, and adjusting the values of the parameters in the reconstruction model and the prediction model by minimizing the total error.

[0075] The sample time-series data used to train the model is a sequence of k-dimensional vectors, where the i-th vector represents the values of the k sensors at time point i, k is the number of sensors, and i = 1, 2, ..., n. The time-series data used to train the model contains no outliers.

[0076] In step 310, the data sequence in the sample time series data is divided into several sequence segments of the same size, denoted as $D_1, D_2, \ldots, D_m$.
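Step 310 can be sketched as a simple split into equal-size pieces. Two details are assumptions not fixed by the text: the segments are taken as consecutive and non-overlapping, and a trailing remainder shorter than the segment size is dropped. The function name `split_into_segments` is hypothetical.

```python
def split_into_segments(sequence, segment_size):
    """Split a sequence into consecutive, non-overlapping segments of equal size.

    Assumption: a trailing remainder shorter than segment_size is discarded.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    count = len(sequence) // segment_size
    return [sequence[i * segment_size:(i + 1) * segment_size] for i in range(count)]


sample = list(range(1, 11))                 # stand-in for 10 time points
segments = split_into_segments(sample, 3)   # D_1, D_2, D_3
```

With 10 time points and a segment size of 3, three segments are produced and the tenth time point is left out.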

[0077] In step 320, the reconstruction model and the prediction model are trained in multiple rounds using the sequence segments, with each training round using one sample sequence segment $D_i$ from the sequence segments. Any round of training includes steps 321 to 325.

[0078] In step 321, the third target subsequence corresponding to the third time window in the sample sequence segment is input into the reconstruction model to obtain the reconstruction data for the third time window, where the cutoff time point of the third time window is the (m+1)th time point.

[0079] Specifically, if a third time window of length μ is preset, then the third target subsequence up to the (m+1)th time point can be represented as $S_3 = \{X_{m-\mu+2}, X_{m-\mu+3}, \ldots, X_{m+1}\}$. The third target subsequence $S_3$ is input into the reconstruction model to obtain the reconstructed data $R_3$ for the third time window. Each data point in $R_3$ is the reconstructed data corresponding to a data point in the third target subsequence $S_3$.

[0080] In one embodiment, the reconstruction model is a variational autoencoder (VAE). In this case, the implementation method of step 321 can refer to step 202, and will not be detailed here.

[0081] In step 322, the fourth target subsequence $S_4$ up to the mth time point in the sample sequence segment is input into the prediction model to obtain the predicted data for the (m+1)th time point.

[0082] In one embodiment, the prediction model includes a self-attention model, a graph attention network (GAT), and a fully connected neural network. In this case, the implementation method of step 322 can refer to step 203, and will not be detailed here.

[0083] In a more specific embodiment, the self-attention model is the self-attention module of the Transformer model.

[0084] In step 323, the reconstruction error is determined based on the reconstructed data and the data corresponding to the sample sequence fragment at time point m+1.

[0085] Specifically, the reconstruction error $loss_{re}$ is determined from the reconstructed data corresponding to the (m+1)th time point and $X_{m+1}$.

[0086] In step 324, the prediction error is determined based on the predicted data and the data corresponding to the sample sequence fragment at time point m+1.

[0087] Specifically, the prediction error $loss_{pr}$ is determined from the predicted data for the (m+1)th time point and $X_{m+1}$.

[0088] In step 325, the reconstruction error $loss_{re}$ is combined with the prediction error $loss_{pr}$ to obtain the total error $loss_{total}$. The values of the parameters in the reconstruction model and the prediction model are adjusted by minimizing the total error.

[0089] There are several ways to combine the reconstruction error and the prediction error to obtain the total error. For example, the total error can be obtained by adding the reconstruction error and the prediction error; by averaging them; by taking the maximum of the two; or by taking the minimum of the two. No specific method is mandated here.

[0090] There are several ways to calculate the errors mentioned above, such as using root mean square error (RMSE), mean square error (MSE), or mean absolute error (MAE); no specific method is mandated here.
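The four ways of combining the two errors into the total error can be sketched as a small helper. The function name `combine_errors`, the `mode` keyword, and the sample loss values are assumptions for illustration; the gradient-based parameter update that would minimize the result is not shown.

```python
def combine_errors(loss_re, loss_pr, mode="sum"):
    """Combine the reconstruction error and the prediction error into a total error."""
    if mode == "sum":
        return loss_re + loss_pr
    if mode == "mean":
        return (loss_re + loss_pr) / 2
    if mode == "max":
        return max(loss_re, loss_pr)
    if mode == "min":
        return min(loss_re, loss_pr)
    raise ValueError(f"unknown combination mode: {mode}")


loss_re = 0.25   # hypothetical reconstruction error for one training round
loss_pr = 0.75   # hypothetical prediction error for the same round

totals = {mode: combine_errors(loss_re, loss_pr, mode)
          for mode in ("sum", "mean", "max", "min")}
```

For these values the total error is 1.0 for the sum, 0.5 for the average, 0.75 for the maximum, and 0.25 for the minimum.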

[0091] After multiple rounds of training of the reconstruction model and the prediction model using the sequence segments $D_1, D_2, \ldots, D_m$, the trained reconstruction model and prediction model are obtained, which can then be used for anomaly detection in the steps described with reference to Figure 2.

[0092] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.

[0093] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0094] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above description is only a specific embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for jointly identifying anomalies based on reconstruction and prediction, comprising:
acquiring multivariate time-series data, the time-series data being data generated by multiple sensors of equipment during semiconductor manufacturing;
inputting a first target subsequence corresponding to a first time window in the time-series data into a reconstruction model to obtain reconstructed data for the first time window, wherein the cutoff time of the first time window is the (n+1)th time point and the reconstruction model is a variational autoencoder (VAE), and wherein obtaining the reconstructed data includes: encoding the first target subsequence using the encoder of the VAE to obtain a latent space variable sequence, and decoding the latent space variable sequence using the decoder of the VAE to obtain the reconstructed data;
inputting a second target subsequence up to the nth time point in the time-series data into a prediction model to obtain predicted data for the (n+1)th time point, wherein the prediction model includes a self-attention model, a graph attention network (GAT), and a fully connected neural network, and wherein obtaining the predicted data includes: encoding the second target subsequence along the time dimension using the self-attention model to obtain an encoded sequence; inputting the second target subsequence into the GAT to determine graph relationship information between the multiple sensors, where each sensor corresponds to a node in the graph; and concatenating the encoded sequence and the graph relationship information, then inputting the concatenated sequence into the fully connected neural network to obtain the predicted data;
aggregating target data corresponding to the (n+1)th time point in the reconstructed data with the predicted data to obtain total predicted data for the (n+1)th time point, wherein the aggregation includes calculating the arithmetic mean of the target data and the predicted data; and
comparing the total predicted data with measured data corresponding to the (n+1)th time point in the time-series data to determine whether an anomaly occurs at the (n+1)th time point.

2. The method according to claim 1, characterized in that the method further includes: when an anomaly occurs at the (n+1)th time point, determining the abnormal sensor and the corresponding anomaly type based on the total predicted data and the measured data.

3. The method according to claim 2, characterized in that the method further includes: determining corresponding knowledge points based on the abnormal sensor, the anomaly type, the device number, and the number of the wafer being processed by the device, the knowledge points being used to generate or update a knowledge graph in the semiconductor field.

4. The method according to claim 1, characterized in that comparing the total predicted data with the measured data corresponding to the (n+1)th time point in the time-series data to determine whether an anomaly occurs at the (n+1)th time point includes: calculating the error between the total predicted data and the measured data, and determining whether an anomaly occurs at the (n+1)th time point based on the result of comparing the error with a preset first threshold.

5. The method according to claim 2, characterized in that determining the abnormal sensor and the corresponding anomaly type based on the total predicted data and the measured data includes: for any target sensor among the multiple sensors, calculating the error between the data corresponding to the target sensor at the (n+1)th time point in the total predicted data and in the measured data; and determining, based on the result of comparing the error with a preset threshold corresponding to the target sensor, whether the target sensor is abnormal and the corresponding anomaly type.

6. The method according to claim 1, characterized in that the reconstruction model and the prediction model are trained using sample time-series data, and the training includes:
dividing the data sequence in the sample time-series data into several sequence fragments of the same size; and
training the reconstruction model and the prediction model in multiple rounds using the sequence fragments, wherein each round of training uses one sample sequence fragment from the sequence fragments, and any round of training includes:
inputting a third target subsequence corresponding to a third time window in the sample sequence fragment into the reconstruction model to obtain reconstructed data for the third time window, wherein the cutoff time of the third time window is the (m+1)th time point;
inputting a fourth target subsequence up to the mth time point in the sample sequence fragment into the prediction model to obtain predicted data for the (m+1)th time point;
determining a reconstruction error based on the reconstructed data and the data corresponding to the (m+1)th time point in the sample sequence fragment;
determining a prediction error based on the predicted data and the data corresponding to the (m+1)th time point in the sample sequence fragment;
combining the reconstruction error and the prediction error to obtain a total error; and
adjusting the values of the parameters in the reconstruction model and the prediction model by minimizing the total error.

7. The method according to claim 6, characterized in that combining the reconstruction error and the prediction error to obtain the total error includes: adding the reconstruction error to the prediction error to obtain the total error; or averaging the reconstruction error and the prediction error to obtain the total error; or taking the maximum of the reconstruction error and the prediction error as the total error; or taking the minimum of the reconstruction error and the prediction error as the total error.

8. The method according to claim 4 or 6, characterized in that the types of the error include at least: root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE).
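For illustration only, the detection steps recited in the claims (arithmetic-mean aggregation, error calculation, and threshold comparison) might be sketched in Python as follows. The function names are hypothetical, and the rule that an anomaly is flagged when the error exceeds the threshold is an assumption: the claims state only that the decision is based on comparing the error with a preset threshold. Determination of the anomaly type is not sketched, because the text above does not specify how it is derived.

```python
import math
from typing import List, Sequence


def aggregate(rec_target: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Total predicted data: per-sensor arithmetic mean of the reconstructed
    target data and the predicted data for the same time point."""
    if len(rec_target) != len(predicted):
        raise ValueError("inputs must cover the same sensors")
    return [(r + p) / 2.0 for r, p in zip(rec_target, predicted)]


def error(total_pred: Sequence[float], measured: Sequence[float], kind: str = "rmse") -> float:
    """Error between total predicted data and measured data (RMSE, MSE, or MAE)."""
    diffs = [t - m for t, m in zip(total_pred, measured)]
    if kind == "mae":
        return sum(abs(d) for d in diffs) / len(diffs)
    mean_sq = sum(d * d for d in diffs) / len(diffs)
    if kind == "mse":
        return mean_sq
    if kind == "rmse":
        return math.sqrt(mean_sq)
    raise ValueError(f"unknown error type: {kind}")


def is_anomalous(total_pred: Sequence[float], measured: Sequence[float],
                 first_threshold: float, kind: str = "rmse") -> bool:
    """Assumed rule: an anomaly occurs when the error exceeds the first threshold."""
    return error(total_pred, measured, kind) > first_threshold


def abnormal_sensors(total_pred: Sequence[float], measured: Sequence[float],
                     thresholds: Sequence[float]) -> List[int]:
    """Indices of sensors whose absolute error exceeds their own preset threshold
    (assumed rule; with a single value per sensor, absolute error is used)."""
    return [
        i for i, (t, m, th) in enumerate(zip(total_pred, measured, thresholds))
        if abs(t - m) > th
    ]


# Example with three sensors at one time point (made-up values).
total = aggregate([1.0, 2.0, 3.0], [3.0, 2.0, 5.0])   # [2.0, 2.0, 4.0]
measured = [2.0, 2.0, 7.0]
flag = is_anomalous(total, measured, first_threshold=1.5)
suspects = abnormal_sensors(total, measured, [0.5, 0.5, 0.5])
```

With these made-up values the third sensor deviates by 3.0, so the overall check flags the time point and the per-sensor check singles out index 2.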