A cluster-type equipment fault response method and system based on graph network optimization

By using a graph network optimization method, an efficient fault response strategy is generated, which solves the problems of scheduling strategy redundancy and complex process handling in fault response of clustered equipment, achieves high-quality fault response, reduces equipment loss and is applicable to multiple operating modes.

CN115859079B · Active · Publication Date: 2026-06-16 · TSINGHUA UNIVERSITY +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TSINGHUA UNIVERSITY
Filing Date
2022-11-15
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing fault response methods for clustered equipment suffer from redundant scheduling strategies and difficulty in handling complex processes, resulting in significant losses when equipment fails. Furthermore, existing methods are limited to specific operating modes.

Method used

A graph network-based optimization approach is adopted to collect equipment state features, establish a graph model, and use a pre-trained policy model to generate efficient fault response strategies. This includes undirected graph models with dynamic and static features and graph attention networks for fault response strategy generation and equipment control.

Benefits of technology

It implements a high-quality fault response strategy, reduces the transition time before and after a fault, reduces equipment losses, and is applicable to a variety of processing conditions with strong scalability.

Generated by Eureka AI based on patent content.

Figure CN115859079B_ABST

Abstract

The application belongs to the technical field of equipment fault processing and relates to a cluster-type equipment fault response method and system based on graph network optimization, comprising the following steps: collecting the processing state of the cluster-type equipment; if a piece of equipment fails, determining a dynamic feature f_d and a static feature f_s of that equipment according to the processing state; mapping the equipment to a node in a graph model according to its dynamic feature f_d and static feature f_s, and marking the node; inputting the graph model with the marked node into a pre-trained strategy model to generate a fault response strategy; and controlling the cluster-type equipment based on the fault response strategy to respond to the fault. The application can respond to faults during the scheduling of cluster-type equipment, obtains high-quality fault response strategies, and is better suited to the different operating modes of cluster-type equipment.

Description

Technical Field

[0001] This invention relates to a cluster-type equipment fault response method and system based on graph network optimization, belonging to the field of equipment fault handling technology.

Background Technology

[0002] Cluster-type equipment has been widely and successfully used in the semiconductor manufacturing industry. A single cluster-type tool integrates multiple computer-controlled processing units, giving it high efficiency, flexibility, and reconfigurability. When a large batch of wafers is processed, cluster-type equipment typically runs a periodic steady-state schedule, which is relatively easy to control. In actual production, however, the processing modules (processing chambers) of cluster-type equipment may fail because of machine malfunctions, material shortages, or similar causes. Shutting the equipment down for repair immediately after a failure causes significant losses, for two main reasons: wafers are still being processed in the processing modules at the moment of shutdown, so an abrupt stop damages them; and there is usually a long interval between the moment of failure and the moment the repair engineer begins work. A fault response method and system for cluster-type equipment is therefore needed that keeps processing running on the remaining healthy chambers after a failure, minimizing losses. Scheduling cluster-type equipment is, however, highly complex, chiefly because there are no intermediate buffers between processing units, so deadlocks must be avoided at every scheduling step. Furthermore, for some processes the time a completed wafer may remain inside the equipment (its dwell time) must be strictly limited to minimize damage from residual heat and gases. These factors make fault response in cluster-type equipment challenging.

[0003] There are currently two common methods for responding to failures in cluster-type equipment: a virtual-wafer-based method and a strategy-analysis-based method. After a failure, the virtual-wafer-based method introduces virtual wafers one at a time to bring the equipment gradually to an idle state, and then introduces real wafers to bring it gradually back to a fully loaded state, thereby completing the transition between the pre-failure and post-failure states. By introducing virtual wafers and applying the pre-failure and post-failure schedules, this method successfully responds to processing module failures. Although simple and feasible, it produces many redundant scheduling steps during an actual fault response, which limits the quality of the resulting response strategy. The strategy-analysis-based method first fixes the robot's action sequence and then adjusts the robot's waiting time before it retrieves wafers from each processing module so as to satisfy each module's dwell time constraint as far as possible. However, this method struggles with fault response in cluster-type equipment running complex processes, and it is limited to equipment scheduled with backward or exchange strategies.

Summary of the Invention

[0004] To address the aforementioned problems, the present invention aims to provide a fault response method and system for clustered devices based on graph network optimization. This method and system can respond to faults during the scheduling process of clustered devices, obtain high-quality fault response strategies, and have a wider applicability to different operating modes of clustered devices.

[0005] To achieve the above objectives, the present invention proposes the following technical solution: a fault response method for clustered equipment based on graph network optimization, comprising the following steps: collecting the processing status of the clustered equipment; if a piece of equipment fails, determining its dynamic features f_d and static features f_s based on the processing status; mapping the equipment to a node in a graph model based on its dynamic features f_d and static features f_s, and marking the node; inputting the graph model with the marked node into a pre-trained strategy model to generate a fault response strategy; and controlling the clustered equipment based on the fault response strategy to respond to the fault.

[0006] Furthermore, the dynamic features f_d include: the remaining processing time or dwell time St_i of the wafer in each chamber i; a Boolean variable Bw_i indicating whether a wafer is present in each chamber i; the number of wafers M_ri that the robotic arm can place into each chamber i; and the remaining resources K_r of the robotic arm.

[0007] Furthermore, the static features f_s include: the processing time r_i of each chamber i; the time w for the robot to pick up and place a wafer; the rotation time v of the robot; the remaining processing time or dwell time of the target wafer in each chamber i; and a Boolean variable indicating whether the target wafer is present in each chamber i.

[0008] Furthermore, the graph model is established as follows: a graph model is built from the topology of the wafer fabrication process in the clustered equipment, and the static features f_s and dynamic features f_d of each node in the graph model are measured; the graph is modeled as an undirected graph in which the feature of node i is h_i = [f_s || f_d], where || denotes vector concatenation; each edge of the undirected graph model is then split into two directed edges, each of which represents a robot-arm action of picking up a wafer and placing it.

[0009] Furthermore, the strategy model employs a graph attention network.

[0010] Furthermore, the fault response strategy is generated as follows: the directed graph of the graph model is input into the pre-trained strategy model; features are extracted from the directed graph to obtain high-dimensional features of each node; the similarity between the nodes' high-dimensional features is calculated, and an action decision is output; the action to be executed is sampled based on the similarity, and the connection information of the edge corresponding to that action, together with the node high-dimensional features obtained from the first embedding, is input into a graph attention network for a second node embedding; finally, the high-dimensional features from the second node embedding are input into a multilayer perceptron to predict the fault response strategy required in the current step.

[0011] Furthermore, the fault response strategy includes: the robot arm action to be executed in the current step and the waiting time before the robot arm executes the action.

[0012] Furthermore, the policy model is trained using the policy gradient method.

[0013] This invention also discloses a fault response system for clustered equipment based on graph network optimization, comprising: a data acquisition module, used to acquire the processing status of the clustered equipment and, if a piece of equipment fails, to determine its dynamic features f_d and static features f_s based on the processing status; a state calculation module, used to map the equipment to a node in the graph model based on its dynamic features f_d and static features f_s and to mark the node; a graph network optimization module, used to input the graph model with the marked node into the pre-trained policy model to generate a fault response policy; and a fault response module, used to control the clustered equipment according to the fault response policy to respond to the fault.

[0014] The present invention also discloses a computer-readable storage medium storing a computer program, which is executed by a processor to implement any of the above-described clustered device fault response methods based on graph network optimization.

[0015] The present invention has the following advantages due to the adoption of the above technical solutions:

[0016] 1. The solution of the present invention has a wide range of applications, is applicable to various processing conditions, and has strong scalability.

[0017] 2. The graph network optimization method of this invention fully considers the operating status of the clustered equipment and the topology of the manufacturing process flow, yielding a higher-quality scheduling strategy. A high-quality schedule shortens the transition between the steady states before and after a failure, which significantly reduces the losses incurred by clustered equipment after a failure, and the method can be widely applied to fault response of clustered equipment in actual wafer manufacturing.

Attached Figure Description

[0018] Figure 1 is a schematic diagram of a cluster-type device fault response method based on graph network optimization in one embodiment of the present invention;

[0019] Figure 2 is a schematic diagram of the structure of a cluster-type device according to an embodiment of the present invention, wherein PM denotes a processing chamber of the cluster-type device and the processing flow is PM1/PM2 → PM3/PM4, where PMi/PMj indicates that chambers i and j process in parallel.

[0020] Figure 3 is a schematic diagram of the structure of an undirected graph model in one embodiment of the present invention;

[0021] Figure 4 is a schematic diagram of a cluster-type device fault response system based on graph network optimization in one embodiment of the present invention.

Detailed Implementation

[0022] To enable those skilled in the art to better understand the technical solutions of the present invention, the present invention is described in detail through specific embodiments. However, it should be understood that the specific embodiments are provided only for a better understanding of the present invention and should not be construed as limiting the present invention. In the description of the present invention, it should be understood that the terminology used is for descriptive purposes only and should not be construed as indicating or implying relative importance.

[0023] To address the problem of numerous redundant scheduling steps in existing scheduling strategies, this invention proposes a fault response method and system for clustered equipment based on graph network optimization. The method collects the processing status of the clustered equipment; if a piece of equipment fails, its dynamic features f_d and static features f_s are determined based on the processing status. The equipment is then mapped to a node in a graph model based on f_d and f_s, and the node is marked. The graph model with the marked node is input into a pre-trained policy model to generate a fault response policy, and the clustered equipment is controlled according to this policy to respond to the fault. This method can respond to faults during the scheduling of clustered equipment, yields high-quality fault response policies, and applies more broadly to the different operating modes of clustered equipment. The invention is described in further detail below with reference to the accompanying drawings and embodiments.

[0024] Example 1

[0025] This embodiment discloses a fault response method for clustered devices based on graph network optimization which, as shown in Figure 1 and Figure 2, includes the following steps:

[0026] S1 Collect the processing status of the clustered equipment; if a piece of equipment fails, determine its dynamic features f_d and static features f_s based on the processing status;

[0027] The status of the cluster-type equipment during the current processing run is monitored in real time. If a fault is detected in any device within the cluster-type equipment, the static features f_s and dynamic features f_d of that device are calculated from the received status data. If no device has failed, monitoring of the equipment's overall fault status continues.

[0028] In this implementation, the dynamic features f_d include, but are not limited to: the remaining processing time or dwell time St_i of the wafer in each chamber i; a Boolean variable Bw_i indicating whether a wafer is present in each chamber i; the number of wafers M_ri that the robotic arm can place into each chamber i; and the remaining resources K_r of the robotic arm. The static features f_s include, but are not limited to: the processing time r_i of each chamber i; the time w for the robot to pick up and place a wafer; the rotation time v of the robot; the remaining processing time or dwell time of the target wafer in each chamber i; and a Boolean variable indicating whether the target wafer is present in each chamber i.
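As an illustration only, the per-chamber feature vectors listed above could be assembled as follows. The class `ChamberState`, its field names, and the flat-list layout of f_d and f_s are assumptions made for this sketch; the patent does not specify an encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChamberState:
    """Hypothetical per-chamber snapshot; field names are illustrative only."""
    remaining_time: float      # St_i: remaining processing time or dwell time of the wafer in chamber i
    has_wafer: bool            # Bw_i: whether a wafer is present in chamber i
    placeable_wafers: int      # M_ri: wafers the robot arm can still place into chamber i
    process_time: float        # r_i: processing time of chamber i
    target_remaining: float    # remaining processing/dwell time of the target wafer in chamber i
    target_present: bool       # whether the target wafer is present in chamber i


def dynamic_features(c: ChamberState, robot_resources: int) -> List[float]:
    """f_d = [St_i, Bw_i, M_ri, K_r] as a flat float vector."""
    return [c.remaining_time, float(c.has_wafer), float(c.placeable_wafers), float(robot_resources)]


def static_features(c: ChamberState, pick_place_time: float, rotation_time: float) -> List[float]:
    """f_s = [r_i, w, v, target remaining time, target-present flag]."""
    return [c.process_time, pick_place_time, rotation_time,
            c.target_remaining, float(c.target_present)]
```

For example, a PM1-like chamber (139 s processing time, robot times of 1 s as in the embodiment) holding a wafer with a hypothetical 40 s left would give `dynamic_features(ChamberState(40.0, True, 0, 139.0, 0.0, False), 1)`.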

[0029] S2 Based on the dynamic features f_d and static features f_s of the device, map the device to a node in the graph model and label the node;

[0030] The graph model is established as follows:

[0031] S2.1 Establish a graph model based on the topology of the wafer fabrication process in the clustered equipment, and measure the static features f_s and dynamic features f_d of each node in the graph model;

[0032] S2.2 Model the static features f_s and dynamic features f_d as an undirected graph, as shown in Figure 3: nodes 1, 2, 3, and 4 correspond to chambers PM1, PM2, PM3, and PM4, respectively, and nodes 0 and 5 correspond to the vacuum locks of the cluster-type device. The relevant process parameters of the cluster-type device are as follows: the processing time of PM1/PM2 is 139 seconds with a dwell time constraint of 180 seconds; the processing time of PM3/PM4 is 120 seconds with a dwell time constraint of 180 seconds; the robot rotation time is 1 second; and the robot wafer pick-and-place time is 1 second. Chamber PM1 fails at a certain moment. The feature of node i in the undirected graph model is h_i = [f_s || f_d], where || denotes vector concatenation;

[0033] S2.3 Each edge of the undirected graph model is split into two edges of a directed graph, where each edge represents the action of the robot arm picking up and placing wafers.
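Steps S2.2 and S2.3 can be sketched for the six-node embodiment as below. The edge list is an assumption: it follows the process flow vacuum lock → PM1/PM2 → PM3/PM4 → vacuum lock, which gives 8 undirected edges, but Figure 3 itself is not reproduced here. Representing the node "mark" as a Boolean fault flag is also a hypothetical choice.

```python
from typing import Dict, List, Set, Tuple

# Assumed undirected edges: node 0 (vacuum lock) -> PM1/PM2 (1, 2) -> PM3/PM4 (3, 4) -> node 5 (vacuum lock).
UNDIRECTED_EDGES: List[Tuple[int, int]] = [
    (0, 1), (0, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 5), (4, 5),
]


def node_feature(f_s: List[float], f_d: List[float]) -> List[float]:
    """h_i = [f_s || f_d]: concatenate static and dynamic features."""
    return list(f_s) + list(f_d)


def to_directed(edges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Split each undirected edge {i, j} into directed edges i->j and j->i;
    each directed edge stands for one pick-and-place move of the robot arm."""
    directed: List[Tuple[int, int]] = []
    for i, j in edges:
        directed.append((i, j))
        directed.append((j, i))
    return directed


def mark_faulty(num_nodes: int, faulty: Set[int]) -> Dict[int, bool]:
    """Hypothetical node marking: True for nodes whose chamber has failed."""
    return {n: n in faulty for n in range(num_nodes)}


directed = to_directed(UNDIRECTED_EDGES)   # 16 directed edges
marks = mark_faulty(6, {1})                # PM1 (node 1) fails in the embodiment
```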

[0034] S3 Input the graph model with the marked nodes into the pre-trained policy model to generate a fault response policy.

[0035] In this embodiment, a graph attention network is used as the strategy model. The specific computational expression for the graph attention network is:

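[0036] $$e_{ij} = a\left(W h_i \,\|\, W h_j\right)$$

[0037] $$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}(e_{ij})\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}(e_{ik})\right)}$$

[0038] $$h'_i = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j\right)$$

[0039] Here, h_i is the feature of node i; e_ij is the correlation between nodes i and j, which is used to calculate the attention coefficient α_ij between them; a(·) is the attention mechanism applied to the concatenated transformed features; N_i is the set of neighbors of node i; h′_i is the updated feature of node i; || denotes vector concatenation; and W denotes the network weights. The LeakyReLU function is defined as: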

[0040] $$y_i = \begin{cases} x_i, & x_i \ge 0 \\ x_i / a_i, & x_i < 0 \end{cases}$$

[0041] where a_i ∈ (1, +∞) is a constant. σ denotes the sigmoid function:

[0042] $$\sigma(x) = \frac{1}{1 + e^{-x}}$$

[0043] where e is the base of the natural logarithm.
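A minimal, dependency-free sketch of one graph-attention layer built from the pieces defined above (LeakyReLU of the form x / a for negative inputs, a softmax over each node's neighbors, and a sigmoid output). It assumes the standard GAT formulation in which the attention function is a learnable vector `att` applied to the concatenation [W h_i || W h_j]; `att`, `neighbors`, and the default constant `a = 5.0` are illustrative assumptions, not values from the patent.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Compute W x for a row-major matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, a: float = 5.0) -> float:
    """LeakyReLU in the form above: x if x >= 0, else x / a, with a > 1."""
    return x if x >= 0 else x / a


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gat_layer(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
              W: Matrix, att: Vector, a: float = 5.0) -> Dict[int, Vector]:
    """One graph-attention layer. `att` (length 2 * out_dim) is an assumed
    learnable vector standing in for the attention function."""
    z = {i: matvec(W, hi) for i, hi in h.items()}          # W h_i for every node
    out: Dict[int, Vector] = {}
    for i, nbrs in neighbors.items():                        # nbrs must be non-empty
        # e_ij = att . [W h_i || W h_j] for each neighbour j
        e = [sum(p * q for p, q in zip(att, z[i] + z[j])) for j in nbrs]
        scores = [leaky_relu(x, a) for x in e]
        m = max(scores)                                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [v / total for v in exps]                    # alpha_ij: softmax over N_i
        agg = [sum(al * z[j][k] for al, j in zip(alpha, nbrs)) for k in range(len(W))]
        out[i] = [sigmoid(v) for v in agg]                   # h'_i = sigma(sum_j alpha_ij W h_j)
    return out
```

In the embodiment, `h` would hold h_i = [f_s || f_d] for nodes 0 to 5 and `neighbors` would come from the directed graph; adding each node to its own neighbor list (a self-loop, as in the original GAT paper) is a further assumption.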

[0044] The method for generating fault response strategies is as follows:

[0045] S3.1 Input the directed graph of the graph model into the pre-trained policy model, and extract features from the directed graph to obtain the high-dimensional features of each node.

[0046] S3.2 Calculate the similarity between the high-dimensional features of the nodes and output an action decision. When computing node feature similarity, an attention mechanism is introduced into the inner product; the feature similarity between node i and node j is calculated as:

[0047] $$s_{ij} = \left(a_{ij} \odot h'_i\right)^{\top} h'_j$$

[0048] $$a_{ij} = \mathrm{MLP}\left(h'_i \,\|\, h'_j\right)$$

[0049] Here, s_ij is the feature similarity between node i and node j; h′_i and h′_j are the high-dimensional features of node i and node j, respectively; a_ij is the vector of attention coefficients of node i for node j; || denotes vector concatenation; ⊙ denotes element-wise multiplication of vectors; and MLP denotes a multilayer perceptron. Through the above calculations, the similarity s_ij is obtained for each of the 8 edges in Figure 2, where node i and node j are connected by an edge.
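The attention-weighted inner product described in S3.2 can be sketched as follows, reading it as s_ij = Σ_k a_ij[k] · h′_i[k] · h′_j[k] with a_ij produced by an MLP on the concatenated features. The two-layer ReLU perceptron and its parameter layout `(W1, b1, W2, b2)` are hypothetical; the patent does not give the MLP's architecture.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine map W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """Hypothetical two-layer perceptron with a ReLU hidden layer."""
    hidden = [max(0.0, v) for v in linear(W1, b1, x)]
    return linear(W2, b2, hidden)


def edge_similarity(hi: Vector, hj: Vector, params: Sequence) -> float:
    """s_ij = sum_k a_ij[k] * h'_i[k] * h'_j[k], with a_ij = MLP(h'_i || h'_j)."""
    a_ij = mlp(hi + hj, *params)           # attention coefficients, one per feature dimension
    return sum(a * x * y for a, x, y in zip(a_ij, hi, hj))
```

If the MLP outputs all ones, the similarity reduces to the plain inner product; learned coefficients let the policy emphasize the feature dimensions that matter for each candidate move.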

[0050] S3.3 Sample the action to execute based on the similarity, and input the connection information of the edge corresponding to that action, together with the node high-dimensional features obtained from the previous embedding, into the graph attention network for a second node embedding.
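One way to turn edge similarities into a sampled action, as in S3.3, is a softmax over the candidate directed edges followed by weighted sampling. The softmax normalization is an assumption (the patent only says the action is sampled based on the similarity), and the second node embedding is not shown.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def action_distribution(sims: Dict[Edge, float]) -> Dict[Edge, float]:
    """Softmax over directed-edge similarities s_ij (assumed normalization)."""
    m = max(sims.values())                 # subtract max for numerical stability
    exps = {e: math.exp(s - m) for e, s in sims.items()}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


def sample_action(sims: Dict[Edge, float], rng: Optional[random.Random] = None) -> Edge:
    """Sample one pick-and-place move (directed edge i -> j) in proportion to its probability."""
    rng = rng or random.Random()
    probs = action_distribution(sims)
    edges: List[Edge] = list(probs)
    return rng.choices(edges, weights=[probs[e] for e in edges], k=1)[0]
```

Passing a seeded `random.Random` makes a rollout reproducible; at deployment one might instead take the most probable edge, which is a design choice the patent leaves open.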

[0051] S3.4 The high-dimensional features after secondary node embedding are input into the multilayer perceptron to predict the fault response strategy required for the current step. The fault response strategy includes: the robot arm action to be performed in the current step and the waiting time before the robot arm performs the action.
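The output of S3.4, a robot-arm action plus a waiting time, can be represented as below. For brevity this sketch replaces the multilayer perceptron with a single linear layer and uses softplus to keep the waiting time non-negative; `FaultResponseStep`, `predict_step`, and the softplus choice are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FaultResponseStep:
    """One step of the fault response strategy: which move, and how long to wait first."""
    action: Tuple[int, int]   # directed edge i -> j: pick a wafer at node i, place it at node j
    wait_time: float          # seconds the robot arm waits before executing the action


def softplus(x: float) -> float:
    # log(1 + e^x), written to avoid overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def predict_step(action: Tuple[int, int], embedding: List[float],
                 weights: List[float], bias: float) -> FaultResponseStep:
    """Hypothetical single-layer head: maps the second-embedding features
    to a non-negative waiting time."""
    raw = sum(w * x for w, x in zip(weights, embedding)) + bias
    return FaultResponseStep(action=action, wait_time=softplus(raw))
```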

[0052] In this embodiment, the policy model is trained using the policy gradient method, and its optimization objective is defined as maximizing the value of the initial state, i.e., the objective function is:

[0053] $$J(\theta) = V^{\pi_\theta}(s_0)$$

[0054] where V^{π_θ}(s_0) is the value function of the initial state s_0 under policy π_θ.
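The policy-gradient training above can be illustrated with REINFORCE, which ascends J(θ) using the Monte Carlo return as an estimate of V(s_0). In this sketch, a tabular softmax policy stands in for the graph attention policy, and the per-step reward (for example, the negative transition time) is an assumption, since the patent does not define it.

```python
import math
from typing import Dict, List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}; G_0 is a Monte Carlo sample of V(s_0)."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [v / total for v in exps]


def reinforce_update(logits: Dict[str, List[float]],
                     episode: List[Tuple[str, int, float]],
                     lr: float = 0.1, gamma: float = 1.0) -> None:
    """One REINFORCE step on a tabular softmax policy (a stand-in for the GAT policy).

    episode holds (state_key, action_index, reward) triples. For a softmax policy,
    d log pi(a|s) / d logits[s][k] = 1[k == a] - pi(k|s).
    """
    returns = discounted_returns([r for _, _, r in episode], gamma)
    grads = {s: [0.0] * len(v) for s, v in logits.items()}
    for (s, a, _), g in zip(episode, returns):
        for k, p in enumerate(softmax(logits[s])):
            grads[s][k] += g * ((1.0 if k == a else 0.0) - p)
    for s, gvec in grads.items():          # apply after accumulating: gradient ascent on J(theta)
        for k, gk in enumerate(gvec):
            logits[s][k] += lr * gk
```

Gradients are accumulated before any parameter changes so that every step in the episode is scored under the same policy π_θ, as the REINFORCE estimator requires.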

[0055] S4 Control the clustered devices according to the fault response strategy to respond to their faults.

[0056] The quality of the scheduling schemes obtained by existing fault response methods for clustered devices is compared with that obtained by the fault response method of this embodiment; the comparison results are shown in Table 1.

[0057] Table 1. Comparison of scheduling scheme quality obtained from fault response methods for several cluster-type devices.

[0058]

[0059] As Table 1 shows, when the cluster-type equipment fault response method of this embodiment responds to faults, the transition time between the cyclic steady states before and after the fault is, for the same number of completed wafers, significantly shorter than with existing cluster-type equipment fault response methods, and the quality of its scheduling scheme is greatly improved over existing methods.

[0060] The graph network optimization method in this embodiment fully considers the operating status of the clustered devices and the topology of the manufacturing process flow, yielding a higher-quality scheduling strategy. A high-quality schedule shortens the transition between the steady states before and after a failure, which significantly reduces the losses incurred by clustered devices after a failure, and the method can be widely applied to fault response of clustered devices in actual wafer manufacturing.

[0061] Example 2

[0062] Based on the same inventive concept, this embodiment discloses a cluster-type device fault response system based on graph network optimization which, as shown in Figure 4, includes:

[0063] The data acquisition module collects the processing status of the cluster-type equipment and digitizes it into process data and status data. The fault detection module detects, using a preset fault detection algorithm, whether a fault has occurred in the cluster-type equipment and passes the result to the determination module. If a piece of equipment has failed, its dynamic features f_d and static features f_s are determined based on the processing status;

[0064] The determination module decides, based on whether the cluster-type equipment has failed, whether to activate the state calculation module to start calculating the fault response strategy.

[0065] The state calculation module is used to map the device to a node in the graph model based on its dynamic features f_d and static features f_s, and to label the node;

[0066] The graph network optimization module is used to input the graph model with marked nodes into the pre-trained policy model to generate fault response policies.
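The module chain of Example 2 (acquire, detect, decide, compute state, run the policy) can be wired together as in the sketch below. Each module is modeled as a pluggable callable; the class `FaultResponsePipeline`, its field names, and the convention that fault detection returns the failed node's index (or `None`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class FaultResponsePipeline:
    """Hypothetical wiring of the Example 2 modules; each field is a pluggable callable."""
    acquire: Callable[[], Dict[str, Any]]                     # data acquisition module
    detect_fault: Callable[[Dict[str, Any]], Optional[int]]   # fault detection: failed node id or None
    build_graph: Callable[[Dict[str, Any], int], Any]         # state calculation module
    policy: Callable[[Any], Any]                              # graph network optimization module

    def step(self) -> Optional[Any]:
        status = self.acquire()
        faulty = self.detect_fault(status)
        # Determination module: only compute a response when a fault is present;
        # otherwise the caller simply keeps monitoring.
        if faulty is None:
            return None
        graph = self.build_graph(status, faulty)
        return self.policy(graph)
```

Calling `step()` periodically reproduces the monitoring loop of S1: it returns `None` while no device has failed and a policy output once one has.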

[0067] Example 3

[0068] Based on the same inventive concept, this embodiment discloses a computer-readable storage medium storing a computer program, which is executed by a processor to implement any of the above-mentioned clustered device fault response methods based on graph network optimization.

[0069] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific embodiments of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the protection scope of the claims of the present invention. The above content is only a specific embodiment of this application, but the protection scope of this application is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be covered within the protection scope of this application. Therefore, the protection scope of this application should be determined by the protection scope of the claims.

Claims

1. A fault response method for clustered equipment based on graph network optimization, characterized in that it includes the following steps: collecting the processing status of the clustered equipment; if a piece of equipment fails, determining the dynamic features f_d and static features f_s of that equipment based on the processing status; mapping the equipment to a node in the graph model based on its dynamic features f_d and static features f_s, and marking the node; inputting the graph model with the marked node into the pre-trained policy model to generate a fault response policy; and controlling the cluster-type equipment based on the fault response strategy to respond to its fault; wherein the dynamic features f_d include: the remaining processing time or dwell time St_i of the wafer in each chamber i, a Boolean variable Bw_i indicating whether a wafer is present in each chamber i, the number of wafers M_ri that the robotic arm can place into each chamber i, and the remaining resources K_r of the robotic arm; the static features f_s include: the processing time r_i of each chamber i, the time w for the robotic arm to pick up and place a wafer, the rotation time v of the robotic arm, the remaining processing time or dwell time of the target wafer in each chamber i, and a Boolean variable indicating whether the target wafer is present in each chamber i; and the graph model is established as follows: a graph model is built from the topology of the wafer fabrication process in the clustered equipment, and the static features f_s and dynamic features f_d of each node in the graph model are measured; the graph is modeled as an undirected graph in which the feature of node i is h_i = [f_s || f_d], where || denotes vector concatenation; and each edge of the undirected graph model is split into two directed edges, each of which represents a robotic-arm action of picking up and placing a wafer.

2. The cluster-type equipment fault response method based on graph network optimization as described in claim 1, characterized in that, The strategy model employs a graph attention network.

3. The cluster-type equipment fault response method based on graph network optimization as described in claim 2, characterized in that, The method for generating the fault response strategy is as follows: The directed graph model is input to a pre-trained policy model, and features are extracted from the directed graph to obtain high-dimensional features of each node in the directed graph. Calculate the similarity between the high-dimensional features of each node, and then output the action decision; Based on the similarity, the execution action is sampled, and the connection information of the edge corresponding to the execution action and the high-dimensional features of the node obtained from the previous embedding are input into the graph attention network for secondary node embedding. The high-dimensional features after secondary node embedding are input into a multilayer perceptron to predict the fault response strategy required in the current step.

4. The cluster-type equipment fault response method based on graph network optimization as described in claim 3, characterized in that, The fault response strategy includes: the robot arm action to be executed in the current step and the waiting time before the robot arm executes the action.

5. The cluster-type equipment fault response method based on graph network optimization as described in claim 4, characterized in that, The policy model is trained using the policy gradient method.

6. A cluster-type equipment fault response system based on graph network optimization, characterized in that it includes: a data acquisition module, used to collect the processing status of the cluster-type equipment and, if a piece of equipment fails, to determine the dynamic features f_d and static features f_s of that equipment based on the processing status; a state calculation module, used to map the equipment to a node in the graph model based on its dynamic features f_d and static features f_s and to mark the node; a graph network optimization module, used to input the graph model with the marked node into the pre-trained policy model to generate a fault response policy; and a fault response module, used to control the cluster-type equipment according to the fault response strategy to respond to its fault; wherein the dynamic features f_d include: the remaining processing time or dwell time St_i of the wafer in each chamber i, a Boolean variable Bw_i indicating whether a wafer is present in each chamber i, the number of wafers M_ri that the robotic arm can place into each chamber i, and the remaining resources K_r of the robotic arm; the static features f_s include: the processing time r_i of each chamber i, the time w for the robotic arm to pick up and place a wafer, the rotation time v of the robotic arm, the remaining processing time or dwell time of the target wafer in each chamber i, and a Boolean variable indicating whether the target wafer is present in each chamber i; and the graph model is established as follows: a graph model is built from the topology of the wafer fabrication process in the clustered equipment, and the static features f_s and dynamic features f_d of each node in the graph model are measured; the graph is modeled as an undirected graph in which the feature of node i is h_i = [f_s || f_d], where || denotes vector concatenation; and each edge of the undirected graph model is split into two directed edges, each of which represents a robotic-arm action of picking up and placing a wafer.

7. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that is executed by a processor to implement the clustered device fault response method based on graph network optimization as described in any one of claims 1-5.