Reinforcement learning based industrial process spatiotemporal causal directed graph modeling method
By constructing a spatiotemporal causal directed graph based on reinforcement learning, the problem of neglecting the time dimension in existing technologies is solved, and a complete causal description and efficient prediction of industrial processes are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2023-07-03
- Publication Date
- 2026-06-30
AI Technical Summary
Existing causal directed graph modeling methods for industrial processes only consider causal relationships in the spatial dimension, ignoring time delays in the temporal dimension, which results in an inability to fully describe the causality of industrial processes.
A spatiotemporal causal directed graph is constructed using a reinforcement learning-based approach. Through data preprocessing, Markov decision processes, and graph attention networks, causal relationships in spatiotemporal process data are mined. The graph structure is updated using policy networks and proximal policy optimization algorithms to obtain a spatiotemporal causal directed graph that best matches the target industrial process.
It improves the interpretability and predictive accuracy of process modeling, can reveal the underlying mechanisms of industrial processes, and is applicable to downstream tasks such as data prediction and fault detection.
Figure CN116880381B_ABST
Description
Technical Field
[0001] This invention belongs to the field of industrial artificial intelligence and specifically relates to a spatiotemporal causal directed graph modeling method for industrial processes based on reinforcement learning.
Background Technology
[0002] With the development of deep learning and artificial intelligence, observational data has become a valuable resource for manufacturing enterprises. Data-driven industrial process modeling methods offer a variety of black-box models for estimation or prediction based on observational data. However, industrial processes mostly involve risk-sensitive tasks, where any unexpected event can lead to catastrophic consequences. In this context, process models must not only deliver high-accuracy predictions but also be interpretable. In real industrial processes, equipment connections and control loops cause process variables to interact and exhibit causal relationships. Causality reveals the underlying mechanisms of industrial processes and enhances the interpretability of process modeling. Specifically, causality can be characterized using directed graphs. However, existing causal directed graph modeling methods for industrial processes only consider causal relationships in the spatial dimension and neglect time delays in the temporal dimension. The time delay between causal variables is crucial for describing the complete causality of industrial processes. Therefore, there is an urgent need for a new spatiotemporal causal directed graph modeling method that captures the complete causality of industrial processes in both the spatial and temporal dimensions.
Summary of the Invention
[0003] The purpose of this invention is to provide a method for spatiotemporal causal directed graph modeling of industrial processes based on reinforcement learning, which can mine the causal relationships within spatiotemporal process data and obtain a spatiotemporal causal directed graph of a specific industrial process.
[0004] To achieve the above-mentioned objectives, the technical solution adopted by this invention is as follows:
[0005] In a first aspect, the present invention provides a spatiotemporal causal directed graph modeling method for industrial processes based on reinforcement learning, comprising:
[0006] S1. For the target industrial process, the spatiotemporal process data of all process variables collected during the operation of the target industrial process are preprocessed to obtain standardized time series data for each process variable.
[0007] S2. Construct the target industrial process as a spatiotemporal causal directed graph for causal description. The structure of the spatiotemporal causal directed graph includes process nodes, virtual nodes, and directed edges. Each process node corresponds to a process variable. The node attribute of the process node is the time series data of the corresponding process variable. The causal relationship between process nodes is represented by directed edges, and the time delay between process nodes is represented by virtual nodes. If two process variables have a causal relationship, there is a directed edge or a directed path composed of virtual nodes between the corresponding process nodes. The number of virtual nodes between process nodes with causal relationship is positively correlated with the time delay between the corresponding process variables. The initialized spatiotemporal causal directed graph only contains process nodes and has no virtual nodes or directed edges.
[0008] S3. For the spatiotemporal causal directed graph, construct a spatiotemporal causal directed graph identification process based on a Markov decision process, including an environment and an agent. The environment consists of states, actions, state transitions, and rewards. The state is the graph structure of the spatiotemporal causal directed graph, and the action is drawn from a discrete four-dimensional action space. The state transition is constrained by the maximum number of virtual nodes and by the requirement that the cause node and the result node be distinct. The reward is calculated from the change in the transfer entropy with a prediction horizon. The agent includes a policy network GAPN and a proximal policy optimization algorithm PPO.
[0009] S4. Spatiotemporal causal directed graph identification is performed based on reinforcement learning. During the identification process, after initializing the environment and agent, actions are generated based on the policy network GAPN to adjust the graph structure of the spatiotemporal causal directed graph. The agent continuously interacts with the environment by performing actions, observes new states, and obtains corresponding rewards. After collecting a predetermined number of state, action, and reward interaction data, the policy network GAPN is updated through backpropagation using the proximal policy optimization algorithm PPO to complete one round of iteration. Then, the updated policy network GAPN is used for the next round of iteration until convergence, completing the training of the policy network GAPN.
[0010] S5. Based on the trained policy network GAPN, the spatiotemporal causal directed graph constructed in S2 is re-identified to obtain the spatiotemporal causal directed graph that best matches the target industrial process.
[0011] As a preferred embodiment of the first aspect above, in S1, data preprocessing includes outlier removal, missing value imputation, and max-min standardization. After preprocessing, the spatiotemporal process data of all process variables can be read both by time, as the process data at each time t, and by variable, as the time series of the i-th process variable. Here $n_p$ is the total number of process variables in the target industrial process, and $N$ is the time length of the spatiotemporal process data.
[0012] As a preferred embodiment of the first aspect above, in S2, if there is a causal relationship between two process variables in the spatiotemporal causal directed graph, then the time delay $\tau_{i\to j}$ from the corresponding cause process node $p_i$ to the result process node $p_j$ is:
[0013] $\tau_{i\to j} = (n_{ij} + 1)\,\Delta t$
[0014] where $n_{ij}$ is the number of virtual nodes from $p_i$ to $p_j$, and $\Delta t$ is the preset minimum time delay between causal process variables.
[0015] As a preferred embodiment of the first aspect above, in S3, the graph structure of the spatiotemporal causal directed graph that serves as the state in the environment is characterized by a structure matrix $W$. Each element $w_{ij}$ of the structure matrix indicates the causal structure from a process node $p_i$ to another process node $p_j$ and is defined as follows:
[0016]
[0017] State transitions in the environment are likewise expressed as a transformation of the structure matrix, $W_{k+1} \leftarrow \langle W_k, a_k \rangle$, in place of the transformation of the graph structure, $G_{k+1} \leftarrow \langle G_k, a_k \rangle$, where $W_k$ and $W_{k+1}$ are the structure matrices before and after action $a_k$ is executed, and $G_k$ and $G_{k+1}$ are the graph structures of the spatiotemporal causal directed graph before and after action $a_k$ is executed.
[0018] As a preferred embodiment of the first aspect above, in S3, the action in the environment is drawn from a discrete four-dimensional action space. The first dimension $a_{\text{first}}$ indicates the process node that the agent selects as the cause process node. The second dimension $a_{\text{second}}$ indicates another process node that the agent selects as the result process node. The third dimension $a_{\text{operate}}$ indicates whether a directed edge or virtual node is added or deleted from the cause process node to the result process node in order to adjust the graph structure. The fourth dimension $a_{\text{stop}}$ indicates whether the spatiotemporal causal directed graph identification process can be stopped after the current action is completed.
[0019] As a preferred embodiment of the first aspect above, in S3, for two process nodes $p_i$ and $p_j$ in the environment, the transfer entropy with a prediction horizon is defined as:
[0020]
[0021] where $x_i$ and $x_j$ are the node attributes of process nodes $p_i$ and $p_j$, namely the standardized time series data of the process variables corresponding to the two process nodes, and $h$ is the prediction horizon, representing the time delay from $p_i$ to $p_j$.
[0022] As a preferred embodiment of the first aspect above, in S3, the policy network GAPN in the agent consists of a graph attention network, multilayer perceptrons (MLPs), and a softmax layer. First, the initialized spatiotemporal causal directed graph is passed through a graph attention network with L layers, where each graph attention layer uses an attention mechanism to aggregate information from neighboring nodes. The node embeddings output by the final graph attention layer are then fed into the MLPs. The MLPs and the softmax function predict the probability distribution over the selectable actions, and the final selected action is obtained by sampling from that distribution.
[0023] As a preferred embodiment of the first aspect mentioned above, the policy network GAPN has four multilayer perceptrons, $\mathrm{MLP}_{\text{first}}$, $\mathrm{MLP}_{\text{second}}$, $\mathrm{MLP}_{\text{operate}}$, and $\mathrm{MLP}_{\text{stop}}$, which predict the four action dimensions $a_{\text{first}}$, $a_{\text{second}}$, $a_{\text{operate}}$, and $a_{\text{stop}}$ of the discrete four-dimensional action space, respectively. The inputs to all four MLPs are the final node embeddings $H^L$ output by the last graph attention layer. The four action dimensions finally output by the policy network GAPN are calculated by the following formulas:
[0024] $f_{\text{first}}(s_k) = \mathrm{softmax}(\mathrm{MLP}_{\text{first}}(H^L))$
[0025]
[0026]
[0027] $f_{\text{stop}}(s_k) = \mathrm{softmax}(\mathrm{MLP}_{\text{stop}}(H^L))$
[0028] In the formulas: $a_{\text{first}}$ is the index of the cause process node, sampled from all $n_p$ process nodes according to the corresponding probability distribution; $a_{\text{second}}$ is the index of the result process node, sampled according to the corresponding probability distribution from the $n_p - 1$ process nodes that remain after the cause process node is excluded; the node attributes used in the formulas are those of the cause process node and the result process node determined by $a_{\text{first}}$ and $a_{\text{second}}$; $a_{\text{operate}}$ and $a_{\text{stop}}$ are each sampled from {0,1} according to their respective probability distributions.
[0029] As a preferred embodiment of the first aspect above, the target industrial process is a sulfur recovery unit (SRU) in petrochemical refining.
[0030] In a second aspect, this invention provides a soft measurement method for industrial processes based on spatiotemporal causal directed graphs, as detailed below:
[0031] First, following any of the reinforcement learning-based spatiotemporal causal directed graph modeling methods for industrial processes described in the first aspect above, the spatiotemporal causal directed graph $G_{\text{final}}$ that best matches the target industrial process is obtained;
[0032] Then, the spatiotemporal process data X of all process variables in the target industrial process and the time series data y of the quality variable to be predicted are collected and standardized. With X as the node attributes of $G_{\text{final}}$ and y as the label values, a graph neural network (GNN) is trained that takes $G_{\text{final}}$ with node attributes as input and predicts the quality variable;
[0033] Finally, using the trained graph neural network GNN, the quality variable at future times is predicted from the spatiotemporal process data collected in real time and the spatiotemporal causal directed graph $G_{\text{final}}$.
[0034] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0035] This invention employs spatiotemporal causal directed graphs to describe the spatiotemporal causality between industrial process variables, considering both the causal relationship and the time delay from causal variables to outcome variables, thereby improving the interpretability of process modeling. To discover spatiotemporal causal relationships from process data, this invention describes the spatiotemporal causal directed graph identification process as a Markov decision process and develops a reinforcement learning-based method for identifying spatiotemporal causal directed graphs to obtain the optimal spatiotemporal causal directed graph for the industrial process.
Description of the Drawings
[0036] Figure 1 is a schematic diagram of the spatiotemporal causal directed graph modeling method for industrial processes based on reinforcement learning;
[0037] Figure 2 is an example of a spatiotemporal causal directed graph;
[0038] Figure 3 is a schematic diagram of the spatiotemporal causal directed graph identification process based on a Markov decision process;
[0039] Figure 4 is a flowchart of spatiotemporal causal directed graph identification based on reinforcement learning;
[0040] Figure 5 is a process flow diagram of the sulfur recovery unit in an embodiment of the present invention;
[0041] Figure 6 is a graph showing the reward changes during the training process of this invention;
[0042] Figure 7 shows the graph structure of the SRU process obtained by the comparison method SSAM;
[0043] Figure 8 shows the graph structure of the SRU process obtained by the comparison method STSGCN;
[0044] Figure 9 shows the graph structure of the SRU process obtained by the method proposed in this invention;
[0045] Figure 10 shows the prediction results in the embodiment of the present invention.
Detailed Description
[0046] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the present invention can be practiced in many other ways different from those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the present invention. Therefore, the present invention is not limited to the specific embodiments disclosed below. Technical features in the various embodiments of the present invention can be combined accordingly without mutual conflict.
[0047] In a preferred embodiment of the present invention, a spatiotemporal causal directed graph modeling method for industrial processes based on reinforcement learning is provided, which includes the following steps S1 to S5, as shown in Figure 1. The implementation of each step is described in detail below.
[0048] S1. For the target industrial process, perform data preprocessing on the spatiotemporal process data of all process variables collected during the operation of the target industrial process, and obtain standardized time series data for each process variable.
[0049] It should be noted that the target industrial process in this invention can theoretically include any industrial process with causal process variables, such as a sulfur recovery unit (SRU) in petrochemical refining in the following examples. After determining the target industrial process, the process flow to be modeled is first understood, the process variables involved in industrial process modeling are clarified, and then spatiotemporal process data for a period of time is collected through a distributed control system, and data preprocessing is performed, including abnormal data processing and standardization.
[0050] In embodiments of the present invention, the above data preprocessing includes outlier removal, missing value imputation, and max-min standardization. The preprocessed spatiotemporal process data of all process variables are organized as follows.
[0051] Read by time, the data give the process data at each time t; read by variable, they give the time series of the i-th process variable. Here $n_p$ is the total number of process variables in the target industrial process, and $N$ is the time length of the spatiotemporal process data.
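The three preprocessing stages named above can be sketched for a single process variable as follows. This is only an illustration: the source names the stages but not the rules, so the 3-sigma outlier criterion, the forward-fill imputation, and the function name `preprocess_series` are all assumptions.

```python
import math
from statistics import mean, pstdev


def preprocess_series(values, z_thresh=3.0):
    """Outlier removal -> missing-value imputation -> max-min scaling.

    `values` is a list of floats; missing samples are None or NaN.
    The outlier and imputation rules are assumptions, not from the source.
    """
    # Treat NaN like None so both count as missing.
    series = [None if (v is None or math.isnan(v)) else float(v) for v in values]
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed samples")

    # 1) Outlier removal (assumed rule): samples further than
    #    z_thresh standard deviations from the mean become missing.
    mu, sigma = mean(observed), pstdev(observed)
    if sigma > 0:
        series = [
            None if (v is not None and abs(v - mu) > z_thresh * sigma) else v
            for v in series
        ]

    # 2) Imputation (assumed rule): forward fill; a leading gap is
    #    filled with the first valid sample.
    last = next(v for v in series if v is not None)
    filled = []
    for v in series:
        last = v if v is not None else last
        filled.append(last)

    # 3) Max-min standardization to the interval [0, 1].
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0 for _ in filled]
    return [(v - lo) / (hi - lo) for v in filled]
```

For example, `preprocess_series([1.0, None, 3.0, 5.0])` fills the gap with the previous sample and returns `[0.0, 0.0, 0.5, 1.0]`.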
[0052] S2. Construct the target industrial process as a spatiotemporal causal directed graph for causal description. The structure of the spatiotemporal causal directed graph includes process nodes, virtual nodes, and directed edges. Each process node corresponds to a process variable. The node attribute of the process node is the time series data of the corresponding process variable. The causal relationship between process nodes is represented by directed edges, and the time delay between process nodes is represented by virtual nodes. If two process variables have a causal relationship, there is a directed edge or a directed path composed of virtual nodes between the corresponding process nodes. The number of virtual nodes between process nodes with causal relationship is positively correlated with the time delay between the corresponding process variables. The initialized spatiotemporal causal directed graph only has process nodes and no virtual nodes or directed edges.
[0053] It should be noted that a complete causal relationship should include both spatial and temporal factors. Spatially, the cause and its effect are physically linked; temporally, the cause always precedes its effect. Therefore, this invention introduces a directed graph to describe the spatiotemporal causality of industrial processes, using process nodes to represent process variables and virtual nodes to characterize time delays. If two process variables are causally related, there are directed edges or directed paths composed of virtual nodes between the corresponding process nodes. The number of virtual nodes represents the different time delays between causally related process variables.
[0054] In an embodiment of the present invention, if there is a causal relationship between two process variables in a spatiotemporal causal directed graph, then the time delay $\tau_{i\to j}$ from the corresponding cause process node $p_i$ to the result process node $p_j$ is:
[0055] $\tau_{i\to j} = (n_{ij} + 1)\,\Delta t$
[0056] where $n_{ij}$ is the number of virtual nodes from $p_i$ to $p_j$, and $\Delta t$ is the preset minimum time delay between causal process variables.
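As a small worked illustration of the delay formula, the helper below evaluates $(n_{ij} + 1)\,\Delta t$. The function name `time_delay` and the value of $\Delta t$ used in the example are hypothetical; the source does not give a numerical $\Delta t$.

```python
def time_delay(n_virtual: int, delta_t: float) -> float:
    """Delay from cause node to result node: (n_ij + 1) * delta_t."""
    if n_virtual < 0:
        raise ValueError("number of virtual nodes must be non-negative")
    return (n_virtual + 1) * delta_t
```

With an assumed $\Delta t$ of 1.0 time unit, a direct edge (no virtual nodes) gives `time_delay(0, 1.0) == 1.0`, and a path through two virtual nodes gives `time_delay(2, 1.0) == 3.0`.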
[0057] Figure 2 gives an example of a spatiotemporal causal directed graph (STCG), where solid circles represent process nodes and dashed circles represent virtual nodes.
[0058] S3. For the spatiotemporal causal directed graph, construct a spatiotemporal causal directed graph identification process based on a Markov decision process, including an environment (RL environment) and an agent (RL agent). The RL environment consists of states, actions, state transitions, and rewards. The state is the graph structure of the spatiotemporal causal directed graph; the action is drawn from a discrete four-dimensional action space; the state transition is constrained by the maximum number of virtual nodes and by the requirement that the cause node and the result node be different; and the reward is calculated from the change in the transfer entropy with a prediction horizon. The RL agent includes a policy network GAPN and a proximal policy optimization algorithm PPO.
[0059] In this invention, as shown in Figure 3, the spatiotemporal causal directed graph identification process is described as a Markov decision process, including an RL environment and an RL agent. The specific forms of the RL environment and RL agent in the embodiments of this invention are elaborated below.
[0060] In the RL environment, the graph structure of the spatiotemporal causal directed graph, which serves as the state, is characterized by a structure matrix $W$. Each element $w_{ij}$ of the structure matrix indicates the causal structure from a process node $p_i$ to another process node $p_j$ and is defined as follows:
[0061]
[0062] Furthermore, since the nodes of the graph are numbered, state transitions in the RL environment are likewise expressed as a transformation of the structure matrix, $W_{k+1} \leftarrow \langle W_k, a_k \rangle$, in place of the transformation of the graph structure, $G_{k+1} \leftarrow \langle G_k, a_k \rangle$, where $W_k$ and $W_{k+1}$ are the structure matrices before and after action $a_k$ is executed, and $G_k$ and $G_{k+1}$ are the graph structures of the spatiotemporal causal directed graph before and after action $a_k$ is executed.
[0063] In the aforementioned RL environment, the action is drawn from a discrete four-dimensional action space. The first dimension $a_{\text{first}}$ indicates the process node that the RL agent selects as the cause process node. The second dimension $a_{\text{second}}$ indicates another process node that the RL agent selects as the result process node. The third dimension $a_{\text{operate}}$ indicates whether a directed edge or virtual node is added or deleted from the cause process node to the result process node in order to adjust the graph structure. The fourth dimension $a_{\text{stop}}$ indicates whether the spatiotemporal causal directed graph identification process can be stopped after the current action is completed.
[0064] In the above RL environment, for two process nodes $p_i$ and $p_j$, the transfer entropy with a prediction horizon is defined as:
[0065]
[0066] where $x_i$ and $x_j$ are the node attributes of process nodes $p_i$ and $p_j$, namely the standardized time series data of the process variables corresponding to the two process nodes, and $h$ is the prediction horizon, representing the time delay from $p_i$ to $p_j$.
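To make the role of the prediction horizon h concrete, the sketch below estimates a transfer entropy from $x_i$ to $x_j$ by histogram counting. It is an assumption-laden illustration, not the formula of this invention (which is not reproduced in the text): it uses a common textbook form with embedding dimension 1, equal-width bins on data already scaled to [0, 1], and the hypothetical names `discretize` and `transfer_entropy`.

```python
import math
from collections import Counter


def discretize(series, n_bins=4):
    """Map values in [0, 1] (max-min standardized) to integer bin indices."""
    return [min(max(int(v * n_bins), 0), n_bins - 1) for v in series]


def transfer_entropy(x_i, x_j, h=1, n_bins=4):
    """Plug-in estimate (in nats) of transfer entropy from x_i to x_j.

    Assumed form, embedding dimension 1:
        sum p(f, y, x) * log( p(f | y, x) / p(f | y) )
    with f = x_j[t + h], y = x_j[t], x = x_i[t].
    """
    if len(x_i) != len(x_j) or h < 1 or len(x_i) <= h:
        raise ValueError("series must have equal length greater than h >= 1")
    xi, xj = discretize(x_i, n_bins), discretize(x_j, n_bins)
    triples = [(xj[t + h], xj[t], xi[t]) for t in range(len(xj) - h)]
    n = len(triples)

    # Joint and marginal counts needed for the two conditional probabilities.
    c_fyx = Counter(triples)
    c_yx = Counter((y, x) for _, y, x in triples)
    c_fy = Counter((f, y) for f, y, _ in triples)
    c_y = Counter(y for _, y, _ in triples)

    te = 0.0
    for (f, y, x), count in c_fyx.items():
        p_f_given_yx = count / c_yx[(y, x)]
        p_f_given_y = c_fy[(f, y)] / c_y[y]
        te += (count / n) * math.log(p_f_given_yx / p_f_given_y)
    return te
```

If $x_j$ is a copy of $x_i$ delayed by two samples, this estimate is large at h = 2 and close to zero at h = 1, which is the behavior that lets a horizon-dependent transfer entropy distinguish candidate time delays.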
[0067] The policy network GAPN in the above RL agent consists of a graph attention network, multilayer perceptrons (MLPs), and a softmax layer. First, the initial spatiotemporal causal directed graph is passed through a graph attention network with L layers, where each graph attention layer uses an attention mechanism to aggregate information from neighboring nodes. The node embeddings output by the last graph attention layer are then fed into the MLPs. The MLPs and the softmax function predict the probability distribution over the available actions, and the final selected action is obtained by sampling from that distribution.
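The neighbor aggregation performed by one graph attention layer can be sketched as below. This follows the widely used single-head graph attention formulation (linear projection, LeakyReLU attention scores, softmax over neighbors); the source does not state the number of heads, the feature sizes, the output nonlinearity, or how virtual nodes are embedded, so those details and the name `gat_layer` are assumptions.

```python
import math


def gat_layer(H, adj, W, a, slope=0.2):
    """One single-head graph attention layer (no output nonlinearity).

    H:   node features, n rows of length d_in
    adj: adj[i] lists the neighbours of node i (a self-loop is added here)
    W:   projection weights, d_in rows of length d_out
    a:   attention vector of length 2 * d_out
    """
    d_out = len(W[0])
    # Linear projection Z = H W.
    Z = [[sum(h[k] * W[k][c] for k in range(len(W))) for c in range(d_out)]
         for h in H]
    out = []
    for i, z_i in enumerate(Z):
        nbrs = sorted(set(adj[i]) | {i})
        # Unnormalized scores e_ij = LeakyReLU(a^T [z_i || z_j]).
        scores = []
        for j in nbrs:
            s = sum(w * v for w, v in zip(a, z_i + Z[j]))
            scores.append(s if s > 0 else slope * s)
        # Softmax over the neighbourhood gives the attention coefficients.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the projected neighbour features.
        out.append([sum(al * Z[j][c] for al, j in zip(alphas, nbrs))
                    for c in range(d_out)])
    return out
```

With a zero attention vector every neighbor receives equal weight, so the layer reduces to a neighborhood mean; learned values of `a` and `W` let the network weight some neighbors more than others.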
[0068] The aforementioned policy network GAPN contains four multilayer perceptrons, $\mathrm{MLP}_{\text{first}}$, $\mathrm{MLP}_{\text{second}}$, $\mathrm{MLP}_{\text{operate}}$, and $\mathrm{MLP}_{\text{stop}}$, which predict the four action dimensions $a_{\text{first}}$, $a_{\text{second}}$, $a_{\text{operate}}$, and $a_{\text{stop}}$ of the discrete four-dimensional action space, respectively. The inputs to all four MLPs are the final node embeddings $H^L$ output by the last graph attention layer. The four action dimensions finally output by the policy network GAPN are calculated by the following formulas:
[0069] $f_{\text{first}}(s_k) = \mathrm{softmax}(\mathrm{MLP}_{\text{first}}(H^L))$
[0070]
[0071]
[0072] $f_{\text{stop}}(s_k) = \mathrm{softmax}(\mathrm{MLP}_{\text{stop}}(H^L))$
[0073] In the formulas: $a_{\text{first}}$ is the index of the cause process node, sampled from all $n_p$ process nodes according to the corresponding probability distribution; $a_{\text{second}}$ is the index of the result process node, sampled according to the corresponding probability distribution from the $n_p - 1$ process nodes that remain after the cause process node is excluded; the node attributes used in the formulas are those of the cause process node and the result process node determined by $a_{\text{first}}$ and $a_{\text{second}}$; $a_{\text{operate}}$ and $a_{\text{stop}}$ are each sampled from {0,1} according to their respective probability distributions.
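The sampling of the four action dimensions can be illustrated as follows. This sketch takes the raw outputs (logits) of the four MLP heads as given inputs and only shows the softmax and sampling steps, including the exclusion of the cause node when the result node is drawn. In the source the second and third heads also appear to depend on the attributes of the nodes already selected; passing their logits in directly is a simplification, and the names `softmax` and `sample_action` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits_first, logits_second, logits_operate, logits_stop, rng):
    """Sample [a_first, a_second, a_operate, a_stop] from four categorical heads."""
    n_p = len(logits_first)
    # Cause node: sampled from all n_p process nodes.
    a_first = rng.choices(range(n_p), weights=softmax(logits_first))[0]
    # Result node: the cause node is dropped before the softmax,
    # leaving n_p - 1 candidates.
    candidates = [j for j in range(n_p) if j != a_first]
    probs_second = softmax([logits_second[j] for j in candidates])
    a_second = rng.choices(candidates, weights=probs_second)[0]
    # Binary heads: add/delete and continue/stop, each sampled from {0, 1}.
    a_operate = rng.choices([0, 1], weights=softmax(logits_operate))[0]
    a_stop = rng.choices([0, 1], weights=softmax(logits_stop))[0]
    return [a_first, a_second, a_operate, a_stop]
```

A call such as `sample_action([0.0] * 5, [0.0] * 5, [0.0, 0.0], [2.0, 0.0], random.Random(0))` returns one action for a five-variable process; by construction the first two entries always differ.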
[0074] S4. Spatiotemporal causal directed graph identification is performed based on reinforcement learning. The identification process requires the use of the Markov decision process framework constructed in S3. Specifically, after initializing the RL environment and RL agent, actions to adjust the graph structure of the spatiotemporal causal directed graph are generated based on the policy network GAPN. The RL agent continuously interacts with the environment by executing actions, observes new states, and obtains corresponding rewards. After collecting a predetermined number of state, action, and reward interaction data, the policy network GAPN is updated through backpropagation using the proximal policy optimization algorithm PPO, completing one round of iteration. Then, the updated policy network GAPN is used for the next round of iteration until convergence, completing the training of the policy network GAPN.
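The quantity that proximal policy optimization maximizes when updating the policy network can be written down compactly. The sketch below is the standard clipped surrogate objective of PPO, not a reproduction of this invention's training code: the clipping parameter 0.2, the way advantages are estimated, and the treatment of the log-probability of a four-dimensional action as the sum of its four per-dimension log-probabilities are all assumptions.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logps / old_logps: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    """
    if not advantages:
        raise ValueError("batch must not be empty")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any incentive to move the ratio
        # outside [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the current and data-collecting policies agree, the ratio is 1 and the objective equals the mean advantage; clipping matters only once the updated policy drifts away from the policy that generated the interaction data.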
[0075] It should be noted that each iteration in S4 requires the collection of a predetermined amount of state, action, and reward interaction data before execution. The specific number can be optimized based on actual conditions. Since the action in the fourth dimension of the policy network GAPN may cause the spatiotemporal causal directed graph identification process to stop before the predetermined amount of data collection is completed, if this occurs, the RL environment and RL agent can be reinitialized, and the iteration can continue until the predetermined amount of data collection is completed.
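The data collection rule described here, in which an early stop triggers a restart until the predetermined number of interactions is reached, can be sketched as a short loop. The callables `env_reset`, `env_step`, and `policy` are hypothetical placeholders for the RL environment and the policy network GAPN; for brevity only the environment is reinitialized on a stop.

```python
def collect_interactions(env_reset, env_step, policy, n_required):
    """Collect n_required (state, action, reward) tuples, restarting on stop.

    env_reset() -> initial state
    env_step(state, action) -> (next_state, reward)
    policy(state) -> [a_first, a_second, a_operate, a_stop]
    """
    buffer = []
    state = env_reset()
    while len(buffer) < n_required:
        action = policy(state)
        next_state, reward = env_step(state, action)
        buffer.append((state, action, reward))
        # a_stop == 1 ends the current identification run early, so the
        # environment is reinitialized and collection continues.
        state = env_reset() if action[3] == 1 else next_state
    return buffer
```

The returned buffer is what a subsequent PPO update would consume; its size corresponds to the predetermined amount of interaction data mentioned above.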
[0076] S5. Based on the trained policy network GAPN, the spatiotemporal causal directed graph constructed in S2 is re-identified to obtain the spatiotemporal causal directed graph that best matches the target industrial process.
[0077] The training and inference processes in S4 and S5 above are shown in Figure 4. It should be noted that the identification process in S5 is consistent with that in S4 and also uses the Markov decision process framework constructed in S3, so the specific process is not elaborated further. However, once the fourth dimension $a_{\text{stop}}$ of the action output by the policy network GAPN gives the instruction to stop the spatiotemporal causal directed graph identification process, the identification process stops and no further action needs to be executed. The spatiotemporal causal directed graph obtained when identification stops can be regarded as the spatiotemporal causal directed graph that best matches the target industrial process.
[0078] Based on the spatiotemporal causal directed graph modeling method for industrial processes shown in S1 to S5 above, a spatiotemporal causal directed graph of industrial processes can be obtained. This directed graph reflects the causal relationships exhibited by the interaction of process variables due to the existence of equipment connections and control loops in real industrial processes. The causality in this spatiotemporal causal directed graph can reveal the potential mechanisms of industrial processes, improve the interpretability of process modeling, and thus be used for downstream tasks such as data prediction or fault detection.
[0079] Therefore, the present invention can also provide a soft measurement method for industrial processes based on spatiotemporal causal directed graphs, the specific implementation of which is as follows:
[0080] First, following the reinforcement learning-based spatiotemporal causal directed graph modeling method for industrial processes shown in S1 to S5 above, the spatiotemporal causal directed graph $G_{\text{final}}$ that best matches the target industrial process is obtained;
[0081] Then, the spatiotemporal process data X of all process variables in the target industrial process and the time series data y of the quality variable to be predicted are collected and standardized. With X as the node attributes of $G_{\text{final}}$ and y as the label values, a graph neural network (GNN) is trained that takes $G_{\text{final}}$ with node attributes as input and predicts the quality variable;
[0082] Finally, using the trained graph neural network GNN, the spatiotemporal process data collected in real time are used as the node attributes of the spatiotemporal causal directed graph $G_{\text{final}}$ to predict the quality variable at future times.
[0083] The above-mentioned reinforcement learning-based spatiotemporal causal directed graph modeling method for industrial processes, as well as the above-mentioned spatiotemporal causal directed graph-based soft measurement method for industrial processes, will be applied to a specific example to demonstrate their specific implementation and technical effects.
[0084] Example
[0085] As shown in Figure 1, this embodiment uses a specific industrial process as an example to illustrate the effectiveness of the invention. The sulfur recovery unit (SRU) is a key process in petroleum refining and chemical engineering. Its main function is to recover elemental sulfur from acid gases before they are released into the atmosphere, thereby mitigating environmental pollution. The proposed method is evaluated on a publicly available SRU dataset, and its process flow is shown in Figure 5. Table 1 provides detailed information on the five process variables involved in the SRU.
[0086] Table 1. Process variables involved in SRU
[0087]
| Serial number | Process variable description |
| --- | --- |
| 1 | MEA zone gas flow |
| 2 | MEA zone first air flow |
| 3 | MEA zone second air flow |
| 4 | SWS zone gas flow |
| 5 | SWS zone air flow |
[0088] For SRU industrial processes, this embodiment provides a spatiotemporal causal directed graph modeling method based on reinforcement learning. The implementation steps are as follows:
[0089] Step 1: Data Acquisition and Preprocessing
[0090] Spatiotemporal process data containing $n_p$ process variables with time length $N$ are collected through a distributed control system. Outlier removal, missing value imputation, and max-min standardization are then performed. Read by time, the preprocessed data give the process data at each time t; read by variable, they give the time series of the i-th process variable.
[0091] Step 2: Causal description based on spatiotemporal causal directed graph
[0092] To describe the spatiotemporal causal dependencies between process variables in a SRU, this embodiment constructs a spatiotemporal causal directed graph (STCG), denoted as G = (P, V, M), where is np A set of process nodes. is n v A set of virtual nodes It is a set of directed edges. Each process node in STCG corresponds to a process variable in the SRU industrial process, and the time series associated with the process variable is considered a node attribute of the corresponding process node. If two process variables are causally related, there is a directed edge or a directed path consisting of virtual nodes between the corresponding process nodes. The number of virtual nodes between causally related process variables indicates the existence of different time delays. Based on the above analysis, from the causal process node p i To the result process node p j Time delay τ i→j for
[0093] $\tau_{i \to j} = (n_{ij} + 1)\,\Delta t$
[0094] where $n_{ij}$ is the number of virtual nodes from $p_i$ to $p_j$, and $\Delta t$ is the minimum time delay between causally related process variables.
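As an illustration of how virtual nodes encode delay, the sketch below expands a set of causal links into an adjacency matrix over process nodes followed by virtual nodes, and evaluates the delay formula for each link. The function name `expand_stcg` and the dictionary input format are assumptions made for this example only.

```python
import numpy as np

def expand_stcg(n_p, links, delta_t=1.0):
    """links: {(i, j): n_ij}, where n_ij is the number of virtual nodes on the
    directed path from process node i to process node j.

    Returns an adjacency matrix (rows = source, columns = target) over the
    n_p process nodes followed by all virtual nodes, and the delay per link.
    """
    n_v = sum(links.values())
    adj = np.zeros((n_p + n_v, n_p + n_v), dtype=int)
    delays = {}
    nxt = n_p  # index of the next unused virtual node
    for (i, j), n_ij in links.items():
        prev = i
        for _ in range(n_ij):       # chain the virtual nodes one after another
            adj[prev, nxt] = 1
            prev, nxt = nxt, nxt + 1
        adj[prev, j] = 1            # final hop into the result process node
        delays[(i, j)] = (n_ij + 1) * delta_t
    return adj, delays

# p_0 -> p_1 through a plain directed edge, p_1 -> p_2 through two virtual nodes.
adj, delays = expand_stcg(3, {(0, 1): 0, (1, 2): 2}, delta_t=1.0)
```

With this layout every edge on a path accounts for one unit of $\Delta t$, so a path with two virtual nodes has three edges and a delay of $3\Delta t$.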
[0095] Step 3: Spatiotemporal Causal Directed Graph Identification Process Based on Markov Decision Process
[0096] This embodiment describes the STCG identification process as a Markov decision process, including an RL environment and an RL agent. The RL environment consists of states, actions, state transitions, and rewards; the RL agent consists of a policy network GAPN and an optimization algorithm PPO.
[0097] 3.1 State
[0098] Constructing a complete STCG requires multiple decision-making steps within a Markov decision process. At each step, the RL agent modifies the graph structure of the STCG by adding or removing a directed edge or a virtual node. Therefore, the graph structure of the intermediate STCG is defined as the state, i.e., $s_k = G_k$. A structure matrix $W_k$ is introduced to characterize the graph structure of the intermediate STCG. Each element $w_{ij}$ of the structure matrix indicates the causal structure from a process node $p_i$ to another process node $p_j$, defined as follows:
[0099]
[0100] 3.2 Actions
[0101] A discrete four-dimensional action space is defined, namely
[0102] $a_k = [a_{first}, a_{second}, a_{operate}, a_{stop}]$
[0103] The first dimension $a_{first}$ indicates the process node selected by the RL agent as the cause. The second dimension $a_{second}$ indicates another process node selected by the RL agent as the result. The third dimension $a_{operate}$ indicates whether a directed edge or virtual node is added or removed between the cause process node and the result process node. For addition ($a_{operate} = 1$): if there is no directed edge or directed path consisting of virtual nodes from the cause process node to the result process node, a directed edge is added; otherwise, a virtual node is added. For removal ($a_{operate} = 0$): if there is only one directed edge from the cause process node to the result process node, that directed edge is deleted; otherwise, a virtual node is deleted. The fourth dimension $a_{stop}$ indicates whether the STCG identification process can stop after the current action: a value of 0 means the identification process continues, and a value of 1 means it can be stopped.
[0104] 3.3 State Transition
[0105] Given the current state and the chosen action, the state transition specifies the potential next state, i.e., $G_{k+1} \leftarrow \langle G_k, a_k \rangle$. Using the structure matrix, this embodiment expresses the transition as $W_{k+1} \leftarrow \langle W_k, a_k \rangle$. The RL environment also enforces specific rules; for example, the maximum number of virtual nodes between two process nodes is constrained, and the cause and the result of a causal relationship must not be the same process node. Feasible actions proposed by the policy network are executed to update the current state, while infeasible actions are rejected, leaving the current state unchanged.
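The action semantics and the transition rules can be sketched together as one function. The encoding of the structure matrix used here is an assumption made for illustration, since the defining formula is not reproduced in this text: `W[i, j] = 0` means no causal link from $p_i$ to $p_j$, and `W[i, j] = m >= 1` means a link with `m - 1` virtual nodes. The cap `MAX_VIRTUAL` and the function name `step` are likewise hypothetical.

```python
import numpy as np

MAX_VIRTUAL = 3  # hypothetical cap on virtual nodes between two process nodes

def step(W, action, max_virtual=MAX_VIRTUAL):
    """Apply action = (a_first, a_second, a_operate, a_stop) to W.

    Returns (W_next, accepted); an infeasible action leaves W unchanged.
    Assumed encoding: W[i, j] = 0 means no link, m >= 1 means m - 1 virtual nodes.
    """
    i, j, operate, _stop = action
    if i == j:                       # cause and result must be distinct nodes
        return W, False
    W_next = W.copy()
    if operate == 1:                 # add a directed edge, or one more virtual node
        if W[i, j] >= max_virtual + 1:
            return W, False          # would exceed the virtual-node cap
        W_next[i, j] += 1
    else:                            # remove a virtual node, or the last directed edge
        if W[i, j] == 0:
            return W, False          # nothing to remove
        W_next[i, j] -= 1
    return W_next, True

W0 = np.zeros((3, 3), dtype=int)
W1, ok1 = step(W0, (0, 1, 1, 0))     # adds the directed edge p_0 -> p_1
W2, ok2 = step(W1, (0, 1, 1, 0))     # adds one virtual node on that link
W3, ok3 = step(W2, (1, 1, 1, 0))     # rejected: same node as cause and result
```

Under this assumed encoding the delay of a link is simply `W[i, j]` times $\Delta t$, which keeps the state compact at the cost of not tracking virtual nodes individually.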
[0106] 3.4 Rewards
[0107] This embodiment uses the transfer entropy with a prediction horizon (TE-PH) to quantify the spatiotemporal causal relationship from a process node $p_i$ to another process node $p_j$:
[0108]
[0109] where $x_i$ and $x_j$ are the node attributes of $p_i$ and $p_j$, i.e., the time series taken from the collected process data, and $h$ is the prediction horizon, representing the time delay from $p_i$ to $p_j$. Feasible actions in the STCG identification process change either the existence of causality or the magnitude of the time delay between two process nodes. To quantify the effect of each action, this embodiment takes the change in TE-PH caused by the action as the reward: if the action increases the TE-PH value relative to the previous state, a positive reward is assigned; otherwise, a negative reward is assigned.
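For intuition, a transfer entropy with horizon $h$ can be estimated from data as sketched below. This is a common histogram (plug-in) estimator with embedding dimension 1 and is only an assumption about the general form; the exact TE-PH expression of this embodiment is not reproduced in this text. The function name `te_ph` and the parameter `bins` are hypothetical.

```python
import numpy as np

def te_ph(x_i, x_j, h, bins=4):
    """Binned transfer entropy (bits) from series x_i to series x_j at horizon h."""
    if h < 1:
        raise ValueError("h must be a positive integer")
    x_i, x_j = np.asarray(x_i, float), np.asarray(x_j, float)

    def discretize(x):
        edges = np.linspace(x.min(), x.max(), bins + 1)[1:-1]
        return np.digitize(x, edges)          # integer labels 0 .. bins-1

    xi, xj = discretize(x_i), discretize(x_j)
    fut, cur, src = xj[h:], xj[:-h], xi[:-h]  # x_j(t+h), x_j(t), x_i(t)
    joint = np.zeros((bins, bins, bins))
    np.add.at(joint, (fut, cur, src), 1.0)    # count co-occurrences
    joint /= joint.sum()
    p_cur_src = joint.sum(axis=0)             # p(x_j(t), x_i(t))
    p_fut_cur = joint.sum(axis=2)             # p(x_j(t+h), x_j(t))
    p_cur = joint.sum(axis=(0, 2))            # p(x_j(t))
    te = 0.0
    for f, c, s in zip(*np.nonzero(joint)):
        p = joint[f, c, s]
        te += p * np.log2(p * p_cur[c] / (p_cur_src[c, s] * p_fut_cur[f, c]))
    return te

# Synthetic check: x_j is x_i delayed by two samples.
rng = np.random.default_rng(0)
x_i = rng.random(2000)
x_j = np.roll(x_i, 2)
te_at_true_delay = te_ph(x_i, x_j, h=2)
te_at_wrong_delay = te_ph(x_i, x_j, h=1)
```

On this synthetic pair the estimate is large at the true delay and close to zero at the wrong one, which is the behavior a reward based on the change in TE-PH relies on when the agent adds or removes virtual nodes.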
[0110] 3.4 Policy Network
[0111] This embodiment employs a Graph Attention Policy Network (GAPN) to predict the probability distribution over the possible actions in the current state, thereby helping the RL agent make optimal decisions. GAPN takes an intermediate STCG graph structure with a variable number of nodes as input and outputs the predicted action. The node attributes of all nodes in the intermediate STCG are used as the initial node embedding, i.e., the process node attributes together with 0, where 0 represents the virtual node attributes initialized as empty. A graph attention network consisting of L graph attention (GAT) layers is used to compute the final node embedding. At each GAT layer, an attention mechanism is used to aggregate information from neighboring nodes:
[0112]
[0113]
[0114] Where LeakyReLU is the activation function, and a and W are the trainable neural network parameters of the GAT layer.
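A single graph attention layer can be sketched in NumPy as follows. The sketch follows the widely used GAT formulation (attention scores from LeakyReLU applied to a learned vector `a` and linearly transformed embeddings `W`), which matches the symbols named above but is an assumption about the exact equations of this embodiment. The final nonlinearity is omitted, self-loops are added, and the shapes and the function name `gat_layer` are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, adj, W, a):
    """One attention layer.

    H:   (n, d_in) node embeddings
    adj: (n, n), adj[i, j] = 1 if node j sends information to node i
    W:   (d_in, d_out) trainable weight matrix
    a:   (2 * d_out,) trainable attention vector
    """
    n = H.shape[0]
    Z = H @ W                                    # linear transform of every node
    d = Z.shape[1]
    # e[i, j] = LeakyReLU(a^T [Z_i || Z_j]), computed by splitting a into halves
    e = leaky_relu((Z @ a[:d])[:, None] + (Z @ a[d:])[None, :])
    mask = (adj + np.eye(n)) > 0                 # neighbors plus a self-loop
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)         # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)    # softmax over each neighborhood
    return alpha @ Z, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)
adj = np.zeros((4, 4))
adj[1, 0] = 1   # node 0 -> node 1
adj[2, 1] = 1   # node 1 -> node 2
H_next, alpha = gat_layer(H, adj, W, a)
```

Stacking L such layers (with a nonlinearity between them) lets information travel along directed paths of up to L edges, which is how a process node can see a cause that sits behind several virtual nodes.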
[0115] Given state $s_k$, this embodiment obtains the final node embedding $H^L$ of the intermediate STCG from the output of the last graph attention layer. The probability distributions of the candidate actions are then predicted using multilayer perceptrons (MLPs) and softmax functions, and the action $a_k$ is obtained by sampling each of its components from the corresponding probability distribution:
[0116]
[0117]
[0118]
[0119] $f_{stop}(s_k) = \mathrm{softmax}(\mathrm{MLP}_{stop}(H^L)),\quad a_{stop} \sim f_{stop}(s_k) \in \{0, 1\}$
[0120] It should be noted that $a_{first}$ is the index of the cause process node, sampled according to its probability distribution from all $n_p$ process nodes; $a_{second}$ is the index of the result process node, sampled according to its probability distribution from the remaining $n_p - 1$ process nodes other than the cause process node; the node attributes of the cause process node and the result process node are those determined by $a_{first}$ and $a_{second}$; and $a_{operate}$ and $a_{stop}$ are each sampled from {0, 1} according to their respective probability distributions.
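The sampling rules above can be sketched as follows. The four logit arrays stand in for the outputs of the MLP heads and are hypothetical inputs; in the embodiment the later heads depend on the nodes already selected, which this simplified sketch does not model. Masking the cause node before sampling the result node enforces that the two differ.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sample_action(logits_first, logits_second, logits_operate, logits_stop, rng):
    """Sample (a_first, a_second, a_operate, a_stop) component by component."""
    n_p = len(logits_first)
    a_first = rng.choice(n_p, p=softmax(logits_first))
    masked = np.array(logits_second, dtype=float)
    masked[a_first] = -np.inf        # the result node must differ from the cause
    a_second = rng.choice(n_p, p=softmax(masked))
    a_operate = rng.choice(2, p=softmax(logits_operate))
    a_stop = rng.choice(2, p=softmax(logits_stop))
    return int(a_first), int(a_second), int(a_operate), int(a_stop)

rng = np.random.default_rng(0)
samples = [
    sample_action(np.zeros(5), np.zeros(5), np.zeros(2), np.zeros(2), rng)
    for _ in range(200)
]
```

Setting a logit to negative infinity gives that node exactly zero probability after the softmax, so no rejection loop is needed.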
[0121] 3.5 Optimization Algorithm
[0122] The Proximal Policy Optimization (PPO) algorithm used in this embodiment is a state-of-the-art policy gradient algorithm and is used to optimize the GAPN policy network. PPO employs an Actor-Critic architecture, in which the Actor selects actions based on the learned policy and the Critic estimates the value function. The PPO algorithm is existing technology, and its principles will not be elaborated further.
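For readers unfamiliar with PPO, its clipped surrogate objective can be sketched as below. This is the standard form from the PPO literature, not something specific to this embodiment; the clipping value `clip_eps = 0.2` is an assumed default, and the value-function and entropy terms that usually accompany this loss are left out.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over a batch of actions."""
    ratio = np.exp(np.asarray(logp_new, float) - np.asarray(logp_old, float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Taking the minimum removes the incentive to move the policy too far.
    return -np.mean(np.minimum(unclipped, clipped))

loss_same_policy = ppo_clip_loss([0.0], [0.0], [1.0])           # ratio = 1
loss_clipped = ppo_clip_loss([np.log(2.0)], [0.0], [1.0])       # ratio = 2, clipped to 1.2
loss_negative_adv = ppo_clip_loss([np.log(2.0)], [0.0], [-1.0]) # not clipped
```

For a four-dimensional action such as the one used here, the log-probability of an action would be the sum of the log-probabilities of its components if they are sampled independently given the state; that factorization is an assumption of this note.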
[0123] Step 4: Application of Spatiotemporal Causal Directed Graph Recognition Based on Reinforcement Learning
[0124] As shown in Figure 4, the application of reinforcement learning-based spatiotemporal causal directed graph identification is divided into an offline training stage and an inference stage.
[0125] 4.1 Offline Training Phase:
[0126] (1) Based on the above definitions, initialize the RL environment and RL agent;
[0127] (2) Based on the policy network GAPN, the RL agent interacts with the environment by performing actions, observes new states, and obtains corresponding rewards;
[0128] (3) Collect several state-action-reward interaction data, update the neural network parameters based on backpropagation, and minimize the loss function defined by the PPO optimization algorithm;
[0129] (4) Repeat (1)-(3) to perform reinforcement learning iterative training until the cumulative reward converges and the trained policy network is obtained.
[0130] 4.2 Inference Stage:
[0131] Based on the optimal identification strategy provided by the trained policy network, multi-step decision-making is performed on the STCG within the Markov decision process framework until the stopping condition is met, thus obtaining the optimal spatiotemporal causal directed graph STCG for the industrial process.
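The inference stage amounts to rolling out the trained policy until it signals a stop. A minimal sketch of that loop is given below; the `policy` and `transition` callables and their signatures are hypothetical interfaces, and the scripted `toy_policy` merely stands in for a trained GAPN so that the loop can be executed. The step limit `max_steps` is an added safeguard, not something stated in the source.

```python
import numpy as np

def run_identification(policy, transition, W0, max_steps=100):
    """Roll out a policy until it emits a_stop = 1 or max_steps is reached.

    policy(W) -> (a_first, a_second, a_operate, a_stop)   # hypothetical interface
    transition(W, action) -> (W_next, accepted)           # hypothetical interface
    """
    W, steps = W0, 0
    while steps < max_steps:
        action = policy(W)
        W, _accepted = transition(W, action)
        steps += 1
        if action[3] == 1:   # a_stop = 1: identification may stop after this action
            break
    return W, steps

# Toy stand-ins so the loop can be run; a trained GAPN would replace toy_policy.
_script = iter([(0, 1, 1, 0), (0, 1, 1, 0), (1, 2, 1, 1), (2, 0, 1, 0)])

def toy_policy(W):
    return next(_script)

def toy_transition(W, action):
    i, j, operate, _stop = action
    W_next = W.copy()
    W_next[i, j] += 1 if operate == 1 else -1
    return W_next, True

W_final, n_steps = run_identification(
    toy_policy, toy_transition, np.zeros((3, 3), dtype=int)
)
```

The third scripted action carries the stop flag, so the rollout ends after three steps and the fourth action is never requested.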
[0132] Step 5: Application Method of Soft Sensing in Industrial Processes Based on Spatiotemporal Causal Directed Graphs
[0133] After the optimal spatiotemporal causal directed graph $G_{final}$ of the industrial process is obtained, the STCG can be used as input to a graph neural network (GNN) to accomplish industrial tasks such as soft sensing and fault diagnosis. Taking soft sensing as an example, the specific implementation steps are as follows:
[0134] (1) Collect and standardize data of M quality variables through manual sampling and laboratory analysis. Collect corresponding process variable data through a distributed control system.
[0135] (2) The identified spatiotemporal causal directed graph $G_{final}$ and the process variable data are used as the input to the GNN, and the quality variable data are used as the output of the GNN. The GNN is trained on the training data with mean squared error as the loss function:
[0136]
[0137] (3) Deploy the trained GNN and apply it to new process variable data $x_{new}$ to predict the quality variables in real time.
[0138] In this embodiment, 200 process data points from the publicly available SRU dataset are used to identify the spatiotemporal causal directed graph of the SRU. Comparison methods include SSAM (Spatial Self-Attention Mechanism), STSGCN (Spatiotemporal Synchronous Graph Convolutional Network), and this invention (a reinforcement learning-based method for modeling spatiotemporal causal directed graphs of industrial processes).
[0139] Figure 6 shows how the reward changes during training of this invention, indicating that the training algorithm converges stably. During STCG construction, the RL agent maximizes the reward by altering the spatiotemporal causal relationships between process variables, thereby obtaining the optimal spatiotemporal causal directed graph STCG for the SRU process.
[0140] Figure 7 shows the graph structure of the SRU process obtained by the comparison method SSAM. Because SSAM does not consider the time dimension and imposes no causal constraints, the graph structure it obtains only describes spatial correlation.
[0141] Figure 8 shows the graph structure of the SRU process obtained by the comparison method STSGCN, where hollow and solid circles represent process nodes at two consecutive time steps, respectively. STSGCN establishes a spatiotemporal synchronous graph by adding temporal edges between two consecutive time steps, but it is limited to two time steps, and its graph learning process lacks causal constraints. Therefore, the graph structure of the SRU process obtained by STSGCN can only describe local spatiotemporal correlations.
[0142] Figure 9 shows the graph structure of the SRU process obtained by the method proposed in this invention, namely a spatiotemporal causal directed graph (STCG), where hollow circles represent process nodes and solid circles represent virtual nodes. Two process nodes with a causal relationship are connected by a directed edge or by a directed path composed of virtual nodes. During STCG identification, causal constraints and guidance are provided by the reward design based on transfer entropy with a prediction horizon. Therefore, the STCG captures the spatial and temporal causal structure of the SRU process simultaneously on one directed graph, and these causal relationships can be further exploited by graph neural networks.
[0143] Using the obtained spatiotemporal causal directed graph and a graph attention network (GAT), soft sensing of the SO2 content in the SRU process was performed. The first 5000 process data points were used to train the soft sensing model, and the remaining data were used for testing. The model's root mean square error on the test set was 0.0268, and its coefficient of determination was 0.7305. The prediction performance is shown in Figure 10.
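The two reported metrics can be computed as in the sketch below, using their standard definitions. The function names are hypothetical, and the small arrays are made-up illustration values, not data from the SRU experiment.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustration values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
example_rmse = rmse(y_true, y_pred)     # 0.5
example_r2 = r2_score(y_true, y_pred)   # 0.8
```

Because the process data are max-min standardized, an RMSE is expressed in the standardized scale, provided the quality variable is standardized in the same way.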
[0144] As described above, the spatiotemporal causal directed graph modeling method for industrial processes based on reinforcement learning proposed in this invention has satisfactory causal process modeling results, and the spatiotemporal causal directed graph obtained by this invention can be used to complete practical industrial tasks.
[0145] The above embodiments are not intended to limit the present invention, and the present invention is not limited to the above embodiments. Any embodiment that meets the requirements of the present invention is within the protection scope of the present invention.
Claims
1. A method for modeling spatiotemporal causal directed graphs of industrial processes based on reinforcement learning, characterized in that it comprises:
S1. For the target industrial process, the spatiotemporal process data of all process variables collected during operation of the target industrial process are preprocessed to obtain standardized time series data for each process variable.
S2. The target industrial process is constructed as a spatiotemporal causal directed graph for causal description. The structure of the spatiotemporal causal directed graph includes process nodes, virtual nodes, and directed edges. Each process node corresponds to a process variable, and the node attribute of the process node is the time series data of the corresponding process variable. The causal relationship between process nodes is represented by directed edges, and the time delay between process nodes is represented by virtual nodes. If two process variables have a causal relationship, there is a directed edge or a directed path composed of virtual nodes between the corresponding process nodes. The number of virtual nodes between process nodes with a causal relationship is positively correlated with the time delay between the corresponding process variables. The initialized spatiotemporal causal directed graph contains only process nodes and has no virtual nodes or directed edges.
S3. For the spatiotemporal causal directed graph, a spatiotemporal causal directed graph identification process based on a Markov decision process is constructed, including an environment and an agent. The environment consists of states, actions, state transitions, and rewards. The state is the graph structure of the spatiotemporal causal directed graph, and the action is taken from a discrete four-dimensional action space. The state transition is constrained by the maximum number of virtual nodes and by the requirement that the cause node and the result node are distinct. The reward is calculated from the change in transfer entropy with a prediction horizon. The agent includes a policy network GAPN and a proximal policy optimization algorithm PPO.
S4. Spatiotemporal causal directed graph identification is performed based on reinforcement learning. During the identification process, after the environment and the agent are initialized, actions are generated by the policy network GAPN to adjust the graph structure of the spatiotemporal causal directed graph. The agent continuously interacts with the environment by performing actions, observes new states, and obtains the corresponding rewards. After a predetermined number of state, action, and reward interaction data have been collected, the policy network GAPN is updated through backpropagation using the proximal policy optimization algorithm PPO to complete one round of iteration. The updated policy network GAPN is then used for the next round of iteration until convergence, completing the training of the policy network GAPN.
S5. Based on the trained policy network GAPN, the spatiotemporal causal directed graph constructed in S2 is re-identified to obtain the spatiotemporal causal directed graph that best matches the target industrial process.
In S2, if there is a causal relationship between two process variables in the spatiotemporal causal directed graph, the time delay $\tau_{i \to j}$ from the corresponding cause process node $p_i$ to the result process node $p_j$ is: $\tau_{i \to j} = (n_{ij} + 1)\,\Delta t$; where $n_{ij}$ is the number of virtual nodes from $p_i$ to $p_j$, and $\Delta t$ is the predefined minimum time delay between causally related process variables;
In S3, the graph structure of the spatiotemporal causal directed graph that represents the state in the environment is characterized by a structure matrix $W$, each element $w_{ij}$ of which indicates the causal structure from a process node $p_i$ to another process node $p_j$ and is defined as follows: ; state transitions in the environment use the transformation of the structure matrix $W_{k+1} \leftarrow \langle W_k, a_k \rangle$ in place of the transformation of the graph structure $G_{k+1} \leftarrow \langle G_k, a_k \rangle$, where $W_k$ and $W_{k+1}$ are the structure matrices before and after action $a_k$ is executed, and $G_k$ and $G_{k+1}$ are the spatiotemporal causal directed graph structures before and after action $a_k$ is executed;
In S3, in the discrete four-dimensional action space that represents the action in the environment, the first dimension $a_{first}$ indicates the process node selected by the agent as the cause process node; the second dimension $a_{second}$ indicates another process node selected by the agent as the result process node; the third dimension $a_{operate}$ indicates adding or removing a directed edge or virtual node from the cause process node to the result process node so as to adjust the graph structure; and the fourth dimension $a_{stop}$ indicates whether the spatiotemporal causal directed graph identification process can be stopped after the current action is completed;
The target industrial process is the treatment process of the sulfur recovery unit (SRU) in oil refining and chemical engineering.
2. The reinforcement learning-based spatiotemporal causal directed graph modeling method for industrial processes as described in claim 1, characterized in that, in step S1, data preprocessing includes outlier removal, missing value imputation, and max-min standardization. After preprocessing, the spatiotemporal process data of all process variables take the form of a data matrix $X \in \mathbb{R}^{n_p \times N}$, whose $t$-th column is the process data at time $t$ and whose $i$-th row $x_i$ is the time series of the $i$-th process variable; $n_p$ represents the total number of process variables in the target industrial process, and $N$ represents the time length of the spatiotemporal process data.
3. The reinforcement learning-based spatiotemporal causal directed graph modeling method for industrial processes as described in claim 1, characterized in that, in S3, for two process nodes $p_i$ and $p_j$ in the environment, the transfer entropy with a prediction horizon is defined as: ; where $x_i$ and $x_j$ are the node attributes of process nodes $p_i$ and $p_j$, namely the standardized time series data of the process variables corresponding to the two process nodes; $h$ is the prediction horizon, representing the time delay from $p_i$ to $p_j$; and $N$ represents the time length of the spatiotemporal process data.
4. The reinforcement learning-based spatiotemporal causal directed graph modeling method for industrial processes as described in claim 1, characterized in that, in S3, the policy network GAPN in the agent consists of a graph attention network, a multilayer perceptron (MLP), and a softmax layer. First, the initialized spatiotemporal causal directed graph is passed through a graph attention network with L layers, each of which uses an attention mechanism to aggregate information from neighboring nodes. The final node embedding output by the last graph attention layer is fed into the MLP. The MLP and the softmax function predict the probability distribution over the available actions, from which the final selected action is sampled.
5. The reinforcement learning-based spatiotemporal causal directed graph modeling method for industrial processes as described in claim 4, characterized in that the policy network GAPN has four multilayer perceptrons (MLPs), $\mathrm{MLP}_{first}$, $\mathrm{MLP}_{second}$, $\mathrm{MLP}_{operate}$, and $\mathrm{MLP}_{stop}$, which are used to predict the four action dimensions of the discrete four-dimensional action space. The inputs to the four MLPs are the final node embeddings output by the last graph attention layer. The four action dimensions finally output by the policy network GAPN are calculated by the following formulas: ; ; ; $f_{stop}(s_k) = \mathrm{softmax}(\mathrm{MLP}_{stop}(H^L)),\ a_{stop} \sim f_{stop}(s_k) \in \{0, 1\}$; in the formulas: $a_{first}$ is the index of the cause process node, sampled according to its probability distribution from all $n_p$ process nodes; $a_{second}$ is the index of the result process node, sampled according to its probability distribution from the remaining $n_p - 1$ process nodes other than the cause process node; the node attributes of the cause process node and the result process node are those determined by $a_{first}$ and $a_{second}$; and $a_{operate}$ and $a_{stop}$ are each sampled from {0, 1} according to their respective probability distributions.
6. A soft sensing method for industrial processes based on spatiotemporal causal directed graphs, characterized in that: first, according to the reinforcement learning-based spatiotemporal causal directed graph modeling method for industrial processes as described in any one of claims 1 to 5, the spatiotemporal causal directed graph $G_{final}$ that best matches the target industrial process is obtained; then, the spatiotemporal process data of all process variables in the target industrial process and the time series data of the quality variables to be predicted are collected and standardized, the spatiotemporal process data are taken as the node attributes of the spatiotemporal causal directed graph $G_{final}$, the quality variable time series data are taken as the label values, and a graph neural network (GNN) that takes the spatiotemporal causal directed graph with node attributes as input is trained to predict the quality variables; finally, the trained graph neural network (GNN) is applied to the real-time spatiotemporal process data and the spatiotemporal causal directed graph $G_{final}$ to predict the quality variables at future moments.