A control method and system for PEMFC injection gas supply system based on deep reinforcement learning

A model-based deep reinforcement learning approach is used: a dynamic system model of the PEMFC injection gas supply system is established using a deep neural network, and the learned model is then combined with an actor-critic framework for interaction. This addresses the difficulty of applying control strategies and the low sample efficiency of traditional methods, and achieves real-time optimal control of the PEMFC injection gas supply system.

CN117613311B · Active · Publication Date: 2026-06-30 · SHANDONG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SHANDONG UNIV
Filing Date: 2023-11-28
Publication Date: 2026-06-30


Abstract

This invention belongs to the technical field of fuel cell gas supply systems and provides a control method and system for PEMFC injection gas supply systems based on deep reinforcement learning. Addressing the difficulty in establishing accurate control models for PEMFC injection gas supply systems, this invention proposes a control method for PEMFC injection gas supply systems based on deep reinforcement learning. First, a dynamic system model of the PEMFC injection gas supply system is established using a deep neural network. Second, an actor-critic framework is used to interact with the learned dynamic system model of the PEMFC injection gas supply system and maximize the cumulative reward within the prediction interval, thus learning a neural network strategy based on model predictive control. Finally, by fixing the parameters of the actor network model and deploying the actor network in the controller of the PEMFC injection gas supply system, real-time optimal control of the PEMFC injection gas supply system can be achieved.

Description

Technical Field

[0001] This invention belongs to the technical field of fuel cell gas supply systems, and particularly relates to a control method and system for a PEMFC injection gas supply system based on model-based deep reinforcement learning.

Background Technology

[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.

[0003] Proton exchange membrane fuel cells (PEMFCs) have attracted much attention due to advantages such as high energy conversion efficiency, low operating noise, fast response, and low pollution. In injection-type PEMFCs, the gas supply system is controlled to supply an excess of hydrogen and oxygen, thus avoiding the fuel cell life degradation caused by hydrogen and oxygen starvation. At the same time, injectors replace the hydrogen circulation pump to circulate unreacted hydrogen, and the water generated in the reaction is discharged through hydrogen circulation, which avoids flooding and improves hydrogen utilization. However, the injection-type PEMFC gas supply system is a complex, multivariable, nonlinear, and strongly coupled system. Establishing a precise control model is difficult, and traditional model-based control methods are challenging to apply. Therefore, intelligent control technology is needed to ensure the efficient operation of injection-type PEMFCs.

[0004] With the development of artificial intelligence technology, deep reinforcement learning can be used to solve the control problem of PEMFC injection gas supply system. Compared with traditional model-based control methods, this control method does not require prior knowledge of the PEMFC injection gas supply system. It obtains the optimal control strategy by interacting with the PEMFC injection gas supply system.

[0005] The inventors discovered that the traditional control strategy for injection gas supply systems, which uses model-free deep reinforcement learning, has low sample efficiency and requires a large number of interactions with the real system to obtain a good control strategy. The interaction process often deviates from the normal operating conditions of the system, affecting the normal operation of the real system and even causing damage to the system.

[0006] In addition to learning control strategies, model-based deep reinforcement learning also learns the dynamic system model of the environment, thereby greatly reducing the amount of data required and improving sample efficiency. Furthermore, because it interacts with the model rather than the real system, model-based deep reinforcement learning does not affect the normal operation of the real system and can ensure the safe operation of the system.

Summary of the Invention

[0007] To address at least one of the technical problems mentioned above, this invention provides a control method and system for a PEMFC injection gas supply system based on deep reinforcement learning. By learning the dynamic system model of the PEMFC injection gas supply system, the required amount of data is greatly reduced and sample efficiency is improved. Furthermore, based on deep reinforcement learning, the optimal control strategy is learned by interacting with the learned dynamic system model of the PEMFC injection gas supply system, which does not affect the normal operation of the real system and ensures the safe operation of the system.

[0008] To achieve the above objectives, the present invention adopts the following technical solution:

[0009] The first aspect of the present invention provides a control method for a PEMFC injection gas supply system based on model-based deep reinforcement learning, comprising the following steps:

[0010] The control problem of the PEMFC injection gas supply system is described as a Markov decision process, and a dynamic system model of the PEMFC injection gas supply system is established using a deep neural network.

[0011] The optimal control strategy is obtained by interacting with the dynamic system model of the PEMFC injection gas supply system based on the reinforcement learning mechanism, specifically including:

[0012] Based on the actor-critic framework, the actor-critic network interacts with the learned dynamic system model of the PEMFC injection gas supply system in a rolling prediction time domain. By maximizing the cumulative reward within the prediction interval, a neural network policy based on model predictive control is learned.

[0013] The parameters of the actor network model are fixed and the actor network is deployed in the controller of the PEMFC jet gas supply system to achieve real-time optimal control of the PEMFC jet gas supply system.

[0014] Furthermore, the description of the PEMFC injection gas supply system control problem as a Markov decision process includes:

[0015] Define states, actions, and rewards;

[0016] The state is defined as the control objective of the PEMFC injection gas supply system and the system state variables related to the control objective. The state s observed by the agent at time t is:

[0017]

[0018] The action of the agent at time t is defined as:

[0019] $a_t = [AP_t,\ HPS_t,\ HRS_t]$,

[0020] The reward obtained by the agent at time t is defined as:

[0021]

[0022] where the state variables are the excess oxygen ratio, the excess hydrogen ratio, the anode-cathode membrane pressure difference $\Delta P_t$, the cathode pressure, the mass flow rate at the injector outlet, the anode inlet pressure $P_t^{sm,an}$, and the system output power $P_t^{fc}$; $AP_t$ is the air compressor voltage, $HPS_t$ is the setpoint of the hydrogen pressure regulating valve, and $HRS_t$ is the setpoint of the hydrogen return regulating valve; the reward uses the target setpoints at time t of the excess oxygen ratio, the excess hydrogen ratio, and the anode-cathode membrane pressure difference of the PEMFC injection gas supply system (the last denoted $\Delta P_t^{target}$); and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the reward coefficients of the environment for the three tracking errors.

[0023] Furthermore, the control objective of the PEMFC jet gas supply system is to make the oxygen excess ratio, hydrogen excess ratio, and anode-cathode membrane pressure difference track the optimal set values. This is achieved by changing three variables: the air compressor voltage, the hydrogen pressure regulating valve set value, and the hydrogen return regulating valve set value.

[0024] Furthermore, the establishment of a dynamic system model for the PEMFC injection gas supply system using a deep neural network includes:

[0025] A trajectory of length M is obtained by interacting with the PEMFC injection air supply system using the controller;

[0026] The trajectory is divided into M tuples and stored in the model training experience replay pool. The data in the model training experience replay pool is then normalized.

[0027] The deep neural network model is trained based on normalized data. Based on the current state and actions of the PEMFC injection gas supply system, the deep neural network is used to predict the state change of the PEMFC injection gas supply system at the next moment.

[0028] Furthermore, the actor network takes the state of the PEMFC jet gas supply system as input and outputs the optimal control strategy corresponding to the current state of the PEMFC jet gas supply system. The critic network takes the state of the PEMFC jet gas supply system and the corresponding control action as input and outputs the value corresponding to the state and control action. Furthermore, the actor-critic framework network interacts with the learned dynamic system model of the PEMFC jet gas supply system in a rolling prediction time domain manner, including:

[0029] The actor network, which learns a deterministic policy, interacts with the dynamic system model of the PEMFC injection gas supply system in a rolling prediction time domain. The interaction yields a trajectory whose length equals the prediction interval. The sum of the rewards of the state transitions within the trajectory is the cumulative reward within the prediction interval. This actual cumulative reward obtained from the interaction is used as the update target of the critic network.

[0030] Furthermore, the training process of the actor-critic deep neural network includes:

[0031] K tuples of data are randomly sampled from the policy learning experience replay pool. The critic network model parameters are updated by minimizing the critic network loss function using gradient descent. The actor network model parameters are updated using gradient ascent along the deterministic policy gradient. The actor-critic network interacts with the dynamic system model of the PEMFC injection gas supply system learned by the deep neural network to continuously update the network model parameters of the actor and the critic, thereby obtaining the optimal control policy.

[0032] A second aspect of the present invention provides a control system for a PEMFC injection gas supply system based on deep reinforcement learning, comprising:

[0033] The dynamic system model building module is configured to describe the control problem of the PEMFC injection gas supply system as a Markov decision process and to build a dynamic system model of the PEMFC injection gas supply system using a deep neural network.

[0034] The deep reinforcement learning module is configured to interact with the learned dynamic system model of the PEMFC injection gas supply system based on reinforcement learning mechanisms to obtain the optimal control strategy, specifically including:

[0035] The dynamic system model of the PEMFC injection gas supply system interacts with the actor-critic framework network in a rolling prediction time domain. By maximizing the cumulative reward within the prediction interval, a neural network policy based on model predictive control is learned.

[0036] The control module is configured to fix the parameters of the actor network model and deploy the actor network in the controller of the PEMFC jet gas supply system to achieve real-time optimal control of the PEMFC jet gas supply system.

[0037] A third aspect of the present invention provides a computer-readable storage medium.

[0038] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the model-based deep reinforcement learning-based PEMFC injection gas supply system control method described above.

[0039] A fourth aspect of the present invention provides a computer device.

[0040] A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps in the model-based deep reinforcement learning-based PEMFC injection gas supply system control method described above.

[0041] Compared with the prior art, the beneficial effects of the present invention are:

[0042] 1. To address the problem that model-free deep reinforcement learning requires extensive interaction with the PEMFC injection gas supply system to obtain a better control strategy, this invention proposes a control method for the PEMFC injection gas supply system based on model deep reinforcement learning. First, a dynamic system model of the PEMFC injection gas supply system is established using a deep neural network. Then, the optimal control strategy is learned by interacting with the learned dynamic system model of the PEMFC injection gas supply system, thereby greatly reducing the interaction with the real PEMFC injection gas supply system.

[0043] 2. To address the problem of compounding error that arises in model-based deep reinforcement learning when the optimal policy is learned by randomly sampling action sequences within the prediction interval, this invention uses an actor-critic framework to directly learn a neural network policy based on model predictive control, thereby eliminating the compounding error.

[0044] 3. To address the issue of reduced learning stability caused by using a bootstrapping strategy to update the critic network, this invention utilizes the actor network to interact with the learned dynamic system model of the PEMFC injection system in a rolling prediction time domain. The actual cumulative reward within the prediction interval obtained from the interaction is used as the update target of the critic network, thereby improving the stability of learning.

[0045] Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Attached Figure Description

[0046] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.

[0047] Figure 1 is a flowchart of the control method for a PEMFC injection gas supply system based on deep reinforcement learning provided in an embodiment of the present invention.

[0048] Figure 2 is a block diagram of the PEMFC injection gas supply system provided in an embodiment of the present invention.

Detailed Implementation

[0049] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0050] It should be noted that the following detailed description is illustrative and intended to provide further explanation of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0051] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of exemplary embodiments according to the invention. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.

[0052] This invention proposes a control method for PEMFC injection gas supply system based on deep reinforcement learning. First, a dynamic system model of the PEMFC injection gas supply system is established using a deep neural network. Then, the optimal control strategy is learned by interacting with the obtained dynamic system model based on a reinforcement learning mechanism, thereby greatly reducing the interaction with the real PEMFC injection gas supply system.

[0053] To address the problem of compounding error that arises in model-based deep reinforcement learning when the optimal policy is learned by randomly sampling action sequences within the prediction interval, this invention uses an actor-critic framework to directly learn a neural network policy based on model predictive control, thereby eliminating the compounding error.

[0054] To address the issue of reduced learning stability caused by using a bootstrapping strategy to update the critic network, this invention integrates model predictive control. It uses an actor network that interacts with the learned dynamic system model of the PEMFC injection gas supply system in a rolling prediction time domain. The actual cumulative reward within the prediction interval obtained from this interaction is used as the update target for the critic network, instead of an update target constructed by bootstrapping. This improves learning stability and increases the convergence speed of reinforcement learning. After model-based reinforcement learning training, fixing the actor network model parameters and deploying the actor network in the controller of the PEMFC injection gas supply system enables real-time optimal control of the PEMFC injection gas supply system.

[0055] In summary, to address the difficulty in establishing an accurate control model for PEMFC jet gas supply systems, this invention proposes a control method for PEMFC jet gas supply systems based on deep reinforcement learning. First, a dynamic system model of the PEMFC jet gas supply system is established using a deep neural network. Second, an actor-critic framework is used to interact with the learned dynamic system model of the PEMFC jet gas supply system in a rolling prediction time domain, maximizing the cumulative reward within the prediction interval. This learns a neural network strategy based on model predictive control. Finally, by fixing the parameters of the actor network model and deploying the actor network in the controller of the PEMFC jet gas supply system, real-time optimal control of the PEMFC jet gas supply system can be achieved.

[0056] Example 1

[0057] Referring to Figure 1 and Figure 2, this embodiment provides a control method for a PEMFC injection gas supply system based on model-based deep reinforcement learning, including the following steps:

[0058] Step 1: Construct a Markov Decision Process (MDP);

[0059] First, the control problem of the PEMFC injection air supply system is described as a Markov decision process, defining the state, action, and reward as follows:

[0060] State: The state is defined as the control objective of the PEMFC injection gas supply system and the system state variables related to the control objective. The state s observed by the agent at time t is:

[0061]

[0062] where the state variables are the excess oxygen ratio, the excess hydrogen ratio, the anode-cathode membrane pressure difference $\Delta P_t$, the cathode pressure, the mass flow rate at the injector outlet, the anode inlet pressure $P_t^{sm,an}$, and the system output power $P_t^{fc}$.

[0063] Action: The control objective of the PEMFC injection gas supply system is to ensure that the excess oxygen ratio, excess hydrogen ratio, and anode-cathode membrane pressure difference track the optimal setpoints. This control is achieved by changing three variables: the air compressor voltage, the hydrogen pressure regulating valve setpoint, and the hydrogen return regulating valve setpoint. The action of the agent at time t is defined as follows:

[0064] $a_t = [AP_t,\ HPS_t,\ HRS_t]$ (2)

[0065] where $AP_t$ is the air compressor voltage, $HPS_t$ is the setpoint of the hydrogen pressure regulating valve, and $HRS_t$ is the setpoint of the hydrogen return regulating valve.

[0066] Furthermore, for hardware safety, both the amplitude of the action and its maximum absolute change per step need to be limited. The clipping formula is as follows:

[0067] $a_{t+1} = \mathrm{clip}(a_t + \Delta \times z_{t+1},\ a_{min},\ a_{max})$ (3)

[0068] where $\Delta$ is the maximum allowable change of the action vector, $z_{t+1}$ is the normalized action vector, and $a_{max}$ and $a_{min}$ are the upper-limit vector and lower-limit vector of the action, respectively.
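The clipping rule of equation (3) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `clip_action`, the numeric limits, and the assumption that each component of the normalized action lies in [-1, 1] are all hypothetical and are not given in the text.

```python
def clip_action(a_t, z_next, delta, a_min, a_max):
    """Element-wise a_{t+1} = clip(a_t + delta * z_{t+1}, a_min, a_max)."""
    a_next = []
    for a, z, d, lo, hi in zip(a_t, z_next, delta, a_min, a_max):
        z = max(-1.0, min(1.0, z))              # assumption: normalized action in [-1, 1]
        a_next.append(max(lo, min(hi, a + d * z)))
    return a_next


# Hypothetical values for [AP, HPS, HRS]; the patent gives no numeric limits.
a_t = [100.0, 1.5, 0.5]
z_next = [1.0, -0.5, 2.0]
delta = [5.0, 0.1, 0.05]
a_min = [50.0, 1.0, 0.0]
a_max = [102.0, 2.0, 1.0]
a_next = clip_action(a_t, z_next, delta, a_min, a_max)
```

In this example the first component is limited by the upper bound, and the third is limited by the per-step change because its normalized action exceeds 1.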

[0069] Reward: The control objective of the PEMFC jet gas supply system is to achieve optimal setpoints for the excess oxygen ratio, excess hydrogen ratio, and anode-cathode membrane pressure difference, i.e., to achieve tracking control of these three variables. To achieve the control objective, the environment needs to provide a higher reward for actions that produce smaller tracking errors, and conversely, to penalize actions that produce larger tracking errors, i.e., to provide a lower reward. Therefore, the reward obtained by the agent at time t is defined as:

[0070]

[0071] where the reward uses the target setpoints at time t of the excess oxygen ratio, the excess hydrogen ratio, and the anode-cathode membrane pressure difference of the PEMFC injection gas supply system (the last denoted $\Delta P_t^{target}$), and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the reward coefficients of the environment for the three tracking errors.
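The exact form of equation (4) does not survive in this text. As a rough illustration only, the sketch below assumes one plausible form consistent with the description: a negative weighted sum of the absolute tracking errors, so that smaller errors give a higher reward. The function name `tracking_reward`, the use of absolute (rather than squared) errors, and all numeric values are assumptions.

```python
def tracking_reward(tracked, targets, coeffs):
    """Assumed reward: r = -(l1*|e1| + l2*|e2| + l3*|e3|) over three tracking errors."""
    return -sum(c * abs(x - x_ref) for x, x_ref, c in zip(tracked, targets, coeffs))


# Order: [excess oxygen ratio, excess hydrogen ratio, membrane pressure difference]
tracked = [2.1, 1.4, 0.25]     # hypothetical measured values
targets = [2.0, 1.5, 0.0]      # hypothetical setpoints
coeffs = [1.0, 1.0, 2.0]       # hypothetical lambda_1..lambda_3

r = tracking_reward(tracked, targets, coeffs)
r_perfect = tracking_reward(targets, targets, coeffs)
```

Under this assumed form, perfect tracking gives the maximum reward of zero and any tracking error makes the reward negative.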

[0072] Step 2: Establish a dynamic system model of the PEMFC injection gas supply system based on deep neural networks.

[0073] In this embodiment, the dynamic system model of the PEMFC injection gas supply system is a deep neural network with parameters $\varphi$. The input to the deep neural network is the state $s_t$ and the action $a_t$ of the PEMFC injection gas supply system. The output is the change in the state of the PEMFC injection gas supply system.

[0074] In this embodiment, the deep neural network has one input layer, 12 hidden layers and one output layer. The layers are fully connected. The input layer has 10 neurons, the output layer has 7 neurons, and each hidden layer has 64 neurons.

[0075] Based on the current state and actions of the PEMFC injection air supply system, a deep neural network is used to predict the state of the dynamic system at the next moment:

[0076]

[0077] where the deep neural network model is the dynamic system model of the PEMFC injection gas supply system obtained through learning, and $\varphi$ denotes the parameters of the deep neural network model.
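The prediction step can be sketched as follows, assuming (from the statement that the network outputs the state change) that the next state is the current state plus the network output. The layer sizes follow the embodiment (10 inputs, 12 hidden layers of 64 neurons, 7 outputs); the ReLU activation, the random untrained weights, and the names `init_mlp`, `mlp_forward`, and `predict_next_state` are assumptions made for illustration.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Create fully connected layers with small random weights and zero biases."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """Forward pass; ReLU on hidden layers (assumed), linear output layer."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def predict_next_state(layers, s_t, a_t):
    """Assumed form: s_{t+1} = s_t + f(s_t, a_t), where f predicts the state change."""
    delta = mlp_forward(layers, s_t + a_t)      # concatenate 7 states and 3 actions
    return [s + d for s, d in zip(s_t, delta)]


sizes = [10] + [64] * 12 + [7]
model = init_mlp(sizes)
s_t = [0.1, 0.2, 0.0, 0.3, 0.1, 0.4, 0.5]       # hypothetical normalized state
a_t = [0.2, 0.1, 0.0]                            # hypothetical normalized action
s_next = predict_next_state(model, s_t, a_t)
```

Predicting the change rather than the next state directly keeps the network output small when consecutive states are close, which is a common reason for this design choice.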

[0078] The specific process of learning the dynamic system model of the PEMFC injection gas supply system based on deep neural network is as follows:

[0079] Step 201: Collect training data. Using the default controller (a PID controller), interact with the PEMFC injection gas supply system to obtain a trajectory $\tau = (s_0, a_0, s_1, \ldots, s_{M-2}, a_{M-2}, s_{M-1})$. Using a PID controller to interact with the PEMFC injection gas supply system ensures the safe operation of the PEMFC during data collection.

[0080] Step 202: Process the data. First, divide the trajectory $\tau$ into M tuples $(s_t, a_t, s_{t+1})$ and store them in the model training experience replay pool $R_m$. Second, to eliminate differences in magnitude between the data, equation (6) is used to normalize the data in the model training experience replay pool $R_m$.

[0081]

[0082] Step 203: Train the deep neural network model. The parameters of the deep neural network model are updated by minimizing the mean squared error loss function using gradient descent. Equation (7) is the mean squared error loss function of the deep neural network, and equation (8) is the parameter update process of the deep neural network model.

[0083]

[0084]

[0085] where $x_{norm} \in [0,1]$ is the normalized data; $x$ is a state or action of the PEMFC injection gas supply system; the mean and the standard deviation $\sigma(x)$ are computed over all states or actions of the PEMFC injection gas supply system stored in the experience replay pool; $t \in [0, M-2]$; $M$ is the capacity of the model training experience replay pool $R_m$; $L(\varphi)$ is the mean squared error loss function of the deep neural network; $\eta_\varphi$ is the learning rate of the deep neural network; $\nabla$ denotes the gradient; and $\varphi_i$ denotes the parameters of the deep neural network model.
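Steps 202 and 203 can be illustrated with the sketch below. It assumes that equation (6) is a mean and standard-deviation scaling (as the definitions above suggest) and uses the population standard deviation; both points, the helper names, and the toy data are assumptions. With states indexed 0 to M-1, the split produces one tuple for each t in [0, M-2].

```python
import statistics


def split_into_transitions(states, actions):
    """Turn (s_0, a_0, s_1, ..., a_{M-2}, s_{M-1}) into (s_t, a_t, s_{t+1}) tuples."""
    return [(states[t], actions[t], states[t + 1]) for t in range(len(actions))]


def zscore_columns(rows):
    """Normalize each column as (x - mean) / std; constant columns are left unscaled."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    normalized = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return normalized, means, stds


def mse(predictions, targets):
    """Mean squared error over all components of all samples."""
    count = sum(len(p) for p in predictions)
    total = sum((p - y) ** 2 for pr, tg in zip(predictions, targets) for p, y in zip(pr, tg))
    return total / count


# Toy trajectory with 2 state variables and 1 action variable (hypothetical data).
states = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
actions = [[0.1], [0.2]]
transitions = split_into_transitions(states, actions)

inputs = [s + a for s, a, _ in transitions]                       # model input: state and action
deltas = [[n - c for n, c in zip(s_next, s)] for s, _, s_next in transitions]  # model target
norm_inputs, input_means, input_stds = zscore_columns(inputs)
baseline_loss = mse([[0.0, 0.0]] * len(deltas), deltas)          # loss of predicting "no change"
```

The stored means and standard deviations would also be needed at deployment time so that controller inputs are scaled the same way as the training data.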

[0086] Step 3: Model-based deep reinforcement learning fused with model predictive control.

[0087] By integrating the concept of model predictive control, and using the actor-critic framework to interact with the learned dynamic system model of the PEMFC injection gas supply system, a neural network policy based on model predictive control is learned by maximizing the cumulative reward within the prediction interval.

[0088] This invention uses an actor network that learns a deterministic policy to interact with the learned dynamic system model of the PEMFC injection gas supply system in a rolling prediction time domain. The cumulative reward within the prediction interval obtained from the interaction is used as the update target of the critic network, instead of the reward estimated by the critic network, thereby avoiding the bootstrapping strategy and improving learning stability.

[0089] The input to the actor network is the state of the PEMFC injection air supply system, and the output is the optimal control strategy corresponding to the current state of the PEMFC injection air supply system.

[0090] The critic network takes the state of the PEMFC injection gas supply system and the corresponding control action as input, and outputs the value corresponding to the state and control action.

[0091] In this embodiment, the actor network has one input layer, two hidden layers, and one output layer, all fully connected. The input layer has 7 neurons, the output layer has 3 neurons, and each hidden layer has 64 neurons. The critic network has one input layer, three hidden layers, and one output layer, all fully connected. The input layer has 10 neurons, the output layer has 1 neuron, and each hidden layer has 64 neurons.

[0092] In step 3, the specific learning process of model-based deep reinforcement learning fused with model predictive control includes:

[0093] Step 301: Initialization. Randomly initialize the actor network parameters and critic network parameters, setting n=0.

[0094] Step 302: Collect training data for model-based predictive control evaluation.

[0095] A tuple $(s_t, a_t, s_{t+1})$ is randomly selected from the model training experience replay pool $R_m$ to initialize the state $s_{n,1}$. Following the concept of model predictive control, within a prediction interval of length H, the actor network interacts with the dynamic system model of the PEMFC injection gas supply system learned by the deep neural network in a rolling prediction time domain. The interaction yields a trajectory $\tau$ of length H within the prediction interval. The sum of the rewards of the state transitions within the trajectory $\tau$ is the cumulative reward $R_{n,1}$ within the prediction interval. The cumulative reward consists entirely of actual rewards obtained from the interaction and does not include rewards estimated by the critic network, which avoids the bootstrapping strategy and improves learning stability. The resulting tuple $(s_{n,1}, a_{n,1}, R_{n,1})$ is stored in the policy learning experience replay pool $R_{ac}$. Then let n = n + 1 and repeat the above process until the policy learning experience replay pool $R_{ac}$ contains N tuples. The cumulative reward $R_{n,1}$ within the prediction interval is calculated by equation (9).
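The data collection of step 302 can be sketched as an H-step rollout of the actor through the learned model. Equation (9) is not reproduced in this text, so the sketch assumes a plain (optionally discounted) sum of per-step rewards; the `gamma` parameter, the reward signature, and the scalar toy actor, model, and reward are all hypothetical stand-ins.

```python
def rollout_return(s0, actor, model, reward_fn, horizon, gamma=1.0):
    """Roll the actor through the learned model for `horizon` steps.

    Returns (s_{n,1}, a_{n,1}, R_{n,1}): the initial state, the first action,
    and the cumulative reward over the prediction interval (no critic estimate).
    """
    state = s0
    first_action = actor(state)
    action = first_action
    total = 0.0
    for k in range(horizon):
        next_state = model(state, action)
        total += (gamma ** k) * reward_fn(state, action, next_state)
        state = next_state
        action = actor(state)
    return s0, first_action, total


# Toy scalar stand-ins (assumptions): a linear actor, additive dynamics,
# and a reward that penalizes the distance of the next state from zero.
def toy_actor(s):
    return -0.5 * s


def toy_model(s, a):
    return s + a


def toy_reward(s, a, s_next):
    return -abs(s_next)


replay_pool_ac = [rollout_return(1.0, toy_actor, toy_model, toy_reward, horizon=3)]
s_first, a_first, cumulative_reward = replay_pool_ac[0]
```

In the toy run the state halves at every step (1.0, 0.5, 0.25, 0.125), so the three per-step rewards sum to -0.875.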

[0096]

[0097] Step 303: Train the actor-critic deep neural network.

[0098] K tuples of data are randomly sampled from the policy learning experience replay pool $R_{ac}$. The critic network model parameters are updated by minimizing the critic network loss function using gradient descent, and the actor network model parameters are updated using gradient ascent along the deterministic policy gradient. The network model parameters of the actor and the critic are continuously updated by interacting with the dynamic system model of the PEMFC injection gas supply system learned by the deep neural network, and finally the optimal control strategy is obtained. Equation (10) is the loss function of the critic network, equation (11) is the update process of the critic network model parameters, equation (12) is the deterministic policy gradient of the actor network, and equation (13) is the update process of the actor network model parameters.

[0099]

[0100]

[0101]

[0102]

[0103] where H is the model prediction interval; N is the capacity of the policy learning experience replay pool $R_{ac}$; $L(\theta^Q)$ is the loss function of the critic network; $\nabla$ denotes the gradient; $\theta^Q$ and $\theta^\pi$ are the parameters of the critic network and actor network models, respectively; $\eta_Q$ and $\eta_\pi$ are the learning rates of the critic network and the actor network, respectively; and J is the loss function of the actor network.
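Equations (10) to (13) are not reproduced in this text. The sketch below illustrates the two updates described in step 303 under strong simplifying assumptions: a scalar state and action, a linear critic $Q(s,a) = w_s s + w_a a + b$ regressed onto the rollout return R with a mean squared error loss, and a linear actor $a = \theta^\pi s$ updated along the chain-rule gradient $\partial Q / \partial a \cdot \partial a / \partial \theta^\pi$. The embodiment uses neural networks instead; all names and values here are hypothetical.

```python
def critic(theta_q, s, a):
    """Linear critic Q(s, a) = w_s * s + w_a * a + b (stand-in for the critic network)."""
    w_s, w_a, b = theta_q
    return w_s * s + w_a * a + b


def critic_loss(theta_q, batch):
    """Mean squared error between the rollout return R and Q(s, a)."""
    return sum((R - critic(theta_q, s, a)) ** 2 for s, a, R in batch) / len(batch)


def critic_step(theta_q, batch, lr):
    """One gradient-descent step on the critic loss."""
    grads = [0.0, 0.0, 0.0]
    for s, a, R in batch:
        err = critic(theta_q, s, a) - R
        for i, feature in enumerate((s, a, 1.0)):
            grads[i] += 2.0 * err * feature / len(batch)
    return [p - lr * g for p, g in zip(theta_q, grads)]


def actor_step(theta_pi, theta_q, batch, lr):
    """One gradient-ascent step for the actor a = theta_pi * s.

    Deterministic policy gradient: dQ/dtheta_pi = (dQ/da) * (da/dtheta_pi) = w_a * s.
    """
    w_a = theta_q[1]
    grad = sum(w_a * s for s, _, _ in batch) / len(batch)
    return theta_pi + lr * grad


# Hypothetical sampled tuples (s, a, R) from the policy learning replay pool.
batch = [(1.0, -0.5, -0.875), (2.0, -1.0, -1.75), (0.5, -0.25, -0.4375)]
theta_q = [0.0, 0.0, 0.0]
theta_pi = -0.5

loss_before = critic_loss(theta_q, batch)
theta_q = critic_step(theta_q, batch, lr=0.05)
loss_after = critic_loss(theta_q, batch)
theta_pi_new = actor_step(theta_pi, theta_q, batch, lr=0.05)
```

Because the critic target is the rollout return itself rather than a critic estimate of a later state, the regression target does not move as the critic parameters change, which is the stability argument made in the text.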

[0104] Step 4: Control of the PEMFC injection gas supply system based on model-based deep reinforcement learning.

[0105] After the model-based deep reinforcement learning training is completed, the parameters of the actor network model are fixed and the actor network is deployed in the controller of the PEMFC injection gas supply system to achieve real-time optimal control of the PEMFC injection gas supply system.
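Deployment with fixed parameters can be pictured as an inference-only controller. The sketch below is an assumption-laden illustration: the class name `FrozenActorController`, the single tanh layer standing in for the trained actor network, and the numeric limits are hypothetical; only the 7-input, 3-output shape and the clipping rule of equation (3) come from the text.

```python
import math


class FrozenActorController:
    """Inference-only controller: parameters are stored as tuples and never updated."""

    def __init__(self, weights, biases, delta, a_min, a_max):
        self._weights = tuple(tuple(row) for row in weights)
        self._biases = tuple(biases)
        self._delta = tuple(delta)
        self._a_min = tuple(a_min)
        self._a_max = tuple(a_max)

    def act(self, state, previous_action):
        """Map the state to a normalized action, then apply the clipping of equation (3)."""
        action = []
        for row, b, d, lo, hi, a_prev in zip(
            self._weights, self._biases, self._delta,
            self._a_min, self._a_max, previous_action,
        ):
            z = math.tanh(sum(w * x for w, x in zip(row, state)) + b)   # z in [-1, 1]
            action.append(max(lo, min(hi, a_prev + d * z)))
        return action


# Hypothetical parameters: zero weights stand in for a trained 7-input, 3-output actor.
controller = FrozenActorController(
    weights=[[0.0] * 7 for _ in range(3)],
    biases=[0.0, 0.0, 0.0],
    delta=[5.0, 0.1, 0.05],
    a_min=[50.0, 1.0, 0.0],
    a_max=[100.0, 2.0, 1.0],
)
action = controller.act([0.0] * 7, previous_action=[101.0, 1.5, 0.5])
```

At run time only the forward pass is evaluated, so the per-step cost is a handful of multiply-accumulate operations, which is consistent with the real-time control goal stated above.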

[0106] Example 2

[0107] This embodiment provides a control system for a PEMFC injection gas supply system based on model-based deep reinforcement learning, including:

[0108] The dynamic system model building module is configured to describe the control problem of the PEMFC injection gas supply system as a Markov decision process and to build a dynamic system model of the PEMFC injection gas supply system using a deep neural network.

[0109] The deep reinforcement learning module is configured to interact with the learned dynamic system model of the PEMFC injection gas supply system based on reinforcement learning mechanisms to obtain the optimal control strategy, specifically including:

[0110] The dynamic system model of the PEMFC injection gas supply system interacts with the actor-critic framework network in a rolling prediction time domain. By maximizing the cumulative reward within the prediction interval, a neural network policy based on model predictive control is learned.

[0111] The control module is configured to fix the parameters of the actor network model and deploy the actor network in the controller of the PEMFC jet gas supply system to achieve real-time optimal control of the PEMFC jet gas supply system.

[0112] Example 3

[0113] This embodiment provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps in the model-based deep reinforcement learning-based PEMFC injection gas supply system control method described above.

[0114] Example 4

[0115] This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps in the model-based deep reinforcement learning-based PEMFC injection gas supply system control method described above.

[0116] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of hardware embodiments, software embodiments, or embodiments combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

[0117] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0118] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0119] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0120] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.

[0121] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A control method for a PEMFC injection gas supply system based on deep reinforcement learning, characterized in that it comprises the following steps: describing the control problem of the PEMFC injection gas supply system as a Markov decision process, and establishing a dynamic system model of the PEMFC injection gas supply system using a deep neural network; obtaining an optimal control strategy through reinforcement learning that interacts with the learned dynamic system model of the PEMFC injection gas supply system, specifically including: an actor-critic network interacts with the learned dynamic system model of the PEMFC injection gas supply system in a rolling prediction horizon manner, and a neural network policy based on model predictive control is learned by maximizing the cumulative reward within the prediction interval; and fixing the parameters of the actor network model and deploying the actor network in the controller of the PEMFC injection gas supply system to achieve real-time optimal control of the PEMFC injection gas supply system.

2. The control method for a PEMFC injection gas supply system based on deep reinforcement learning as described in claim 1, characterized in that describing the control problem of the PEMFC injection gas supply system as a Markov decision process includes defining states, actions, and rewards; the state is defined as the control objectives of the PEMFC injection gas supply system and the system state variables related to the control objectives, and the state observed by the agent at time t comprises the oxygen excess ratio, the hydrogen excess ratio, the anode-cathode membrane pressure difference, the cathode pressure, the mass flow rate at the injector outlet, the anode inlet pressure, and the system output power; the action of the agent at time t comprises the air compressor voltage, the setpoint of the hydrogen pressure regulating valve, and the setpoint of the hydrogen return regulating valve; the reward obtained by the agent at time t is defined in terms of the tracking errors of the oxygen excess ratio, the hydrogen excess ratio, and the anode-cathode membrane pressure difference with respect to their target setpoints for the PEMFC injection gas supply system at each time, together with the reward coefficients of the environment for the three tracking errors.

3. The control method for a PEMFC injection gas supply system based on deep reinforcement learning as described in claim 1, characterized in that the control objective of the PEMFC injection gas supply system is to make the oxygen excess ratio, the hydrogen excess ratio, and the anode-cathode membrane pressure difference track their optimal setpoints by adjusting three variables: the air compressor voltage, the setpoint of the hydrogen pressure regulating valve, and the setpoint of the hydrogen return regulating valve.

4. The control method for a PEMFC injection gas supply system based on deep reinforcement learning as described in claim 1, characterized in that establishing the dynamic system model of the PEMFC injection gas supply system using a deep neural network includes: obtaining a trajectory of length M by using a controller to interact with the PEMFC injection gas supply system; dividing the trajectory into M tuples and storing them in a model training experience replay pool; normalizing the data in the model training experience replay pool; training the deep neural network model on the normalized data; and, based on the current state and action of the PEMFC injection gas supply system, using the deep neural network to predict the state change of the PEMFC injection gas supply system at the next moment.

5. The control method for a PEMFC injection gas supply system based on deep reinforcement learning as described in claim 1, characterized in that the actor network takes the state of the PEMFC injection gas supply system as input and outputs the optimal control strategy corresponding to the current state of the PEMFC injection gas supply system; and the critic network takes the state of the PEMFC injection gas supply system and the control action corresponding to the current state as input and outputs the value corresponding to that state and control action.

6. The control method for a PEMFC injection gas supply system based on deep reinforcement learning as described in claim 1, characterized in that the interaction of the actor-critic network with the learned dynamic system model of the PEMFC injection gas supply system in a rolling prediction horizon manner includes: the actor network, which learns a deterministic policy, interacts with the dynamic system model of the PEMFC injection gas supply system in a rolling prediction horizon manner; the interaction between the two yields a trajectory whose length equals the prediction interval; the cumulative reward corresponding to the state transitions within the trajectory is the cumulative reward within the prediction interval; and the actual cumulative reward within the prediction interval obtained from the interaction is used as the update target of the critic network.

7. The control method for a PEMFC injection gas supply system based on deep reinforcement learning as described in claim 1, characterized in that the training process of the actor-critic network includes: randomly sampling K tuples of data from a policy learning experience replay pool; updating the critic network model parameters by minimizing the critic network loss function using gradient descent; updating the actor network model parameters using gradient ascent and the deterministic policy gradient; and having the actor-critic network interact with the dynamic system model of the PEMFC injection gas supply system learned by the deep neural network to continuously update the network model parameters of the actor and the critic, thereby obtaining the optimal control policy.

8. A control system for a PEMFC injection gas supply system based on deep reinforcement learning, characterized in that it includes: a dynamic system model building module, configured to describe the control problem of the PEMFC injection gas supply system as a Markov decision process and to build a dynamic system model of the PEMFC injection gas supply system using a deep neural network; a model-based deep reinforcement learning module, configured to obtain the optimal control strategy through interaction between a reinforcement learning mechanism and the learned dynamic system model of the PEMFC injection gas supply system, specifically including: an actor-critic network interacts with the learned dynamic system model of the PEMFC injection gas supply system in a rolling prediction horizon manner, and a neural network policy based on model predictive control is learned by maximizing the cumulative reward within the prediction interval; and a control module, configured to fix the parameters of the actor network model and deploy the actor network in the controller of the PEMFC injection gas supply system to achieve real-time optimal control of the PEMFC injection gas supply system.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the steps of the control method for a PEMFC injection gas supply system based on deep reinforcement learning as described in any one of claims 1-7.

10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the control method for a PEMFC injection gas supply system based on deep reinforcement learning as described in any one of claims 1-7.
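
As a non-limiting illustration of the model-building steps recited in claim 4, the following Python sketch splits a trajectory into M tuples, normalizes the replay pool data, fits a model to the normalized state change, and predicts the next state from the current state and action. It is an illustrative sketch only and forms no part of the claims. The function names, the toy plant, and the three-dimensional state and action are assumptions made for the example. In particular, a linear least-squares fit is used here as a stand-in for training the deep neural network, so that the example remains short and verifiable.

```python
import numpy as np


def trajectory_to_tuples(states, actions):
    """Split a trajectory with M actions into M tuples (s_t, a_t, s_{t+1})."""
    return [(states[t], actions[t], states[t + 1]) for t in range(len(actions))]


def fit_state_change_model(pool):
    """Fit a normalized model that predicts the state change s_{t+1} - s_t."""
    s = np.array([p[0] for p in pool])
    a = np.array([p[1] for p in pool])
    s_next = np.array([p[2] for p in pool])
    x, y = np.hstack([s, a]), s_next - s
    # Normalization statistics computed from the replay pool.
    x_mean, x_std = x.mean(axis=0), x.std(axis=0) + 1e-8
    y_mean, y_std = y.mean(axis=0), y.std(axis=0) + 1e-8
    xn = np.hstack([(x - x_mean) / x_std, np.ones((len(x), 1))])  # bias column
    yn = (y - y_mean) / y_std
    # ASSUMPTION: linear least squares stands in for deep-network training.
    w, *_ = np.linalg.lstsq(xn, yn, rcond=None)

    def predict_next_state(state, action):
        q = (np.concatenate([state, action]) - x_mean) / x_std
        delta = (np.append(q, 1.0) @ w) * y_std + y_mean  # de-normalize the change
        return np.asarray(state) + delta

    return predict_next_state


# Toy plant (assumption): s_{t+1} = 0.9 * s_t + 0.1 * a_t, driven by random actions.
rng = np.random.default_rng(0)
M = 50
actions = rng.uniform(0.0, 1.0, size=(M, 3))
states = [np.zeros(3)]
for t in range(M):
    states.append(0.9 * states[-1] + 0.1 * actions[t])

pool = trajectory_to_tuples(states, actions)
predict_next_state = fit_state_change_model(pool)
s_q, a_q = np.array([0.2, 0.4, 0.6]), np.array([1.0, 0.0, 0.5])
prediction = predict_next_state(s_q, a_q)
```

Predicting the state change rather than the next state directly keeps the regression target centered near zero when consecutive states are close. The same statistics are used to normalize the inputs and to de-normalize the prediction.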