An Efficient Value Function Iterative Reinforcement Learning Method for Shared Recurrent Neural Networks

A method that combines a recurrent neural network with value function iterative reinforcement learning, addressing the problems of long interaction time and high sampling cost.
CN111582441B | Active | Publication Date: 2021-07-30 | TSINGHUA UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TSINGHUA UNIV
Publication Date
2021-07-30


Abstract

The invention discloses an efficient value function iterative reinforcement learning method for a shared recurrent neural network. The method includes: obtaining sample data through interaction between an agent and the environment, and adding the sample data to a sample pool; randomly selecting sample data from the sample pool as training sample data; normalizing the output of the Critic network according to the training sample data, and updating its MLP network and the shared LSTM parameters; updating the parameters of the MLP part of the Critic network; and updating the third Critic network and the fourth Critic network in the Critic network, as well as the parameters of the second Actor network in the Actor network. The method combines a recurrent neural network with value function iteration to improve training efficiency and shorten training time.
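The first two steps of the abstract (adding interaction samples to a sample pool, then randomly drawing training samples from it) can be sketched as below. This is a minimal illustration only: the class name `SamplePool`, the transition layout, and the fixed capacity are assumptions for the example and are not taken from the patent text, which does not specify how the pool is implemented.

```python
import random
from collections import deque


class SamplePool:
    """Fixed-capacity pool of transitions (hypothetical name and layout)."""

    def __init__(self, capacity, seed=None):
        # Assumption: oldest samples are discarded once capacity is reached.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Store one agent-environment interaction."""
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniformly random batch without replacement."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


# Usage: fill the pool with five dummy transitions, then draw two of them.
pool = SamplePool(capacity=3, seed=0)
for step in range(5):
    pool.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = pool.sample(batch_size=2)
```

With a capacity of 3, only the three most recent transitions remain available for sampling; whether the patented method evicts old samples this way is not stated in the excerpt.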

Description

Technical field

[0001] The invention relates to the technical field of reinforcement learning, in particular to an efficient value function iterative reinforcement learning method for a shared recurrent neural network.

Background art

[0002] Reinforcement learning is based on the theoretical framework of the Markov decision process, which models a sequential decision-making task as a trial-and-error learning problem in which an agent interacts with the system environment. Two types of model-free reinforcement learning algorithms, value function iteration methods and policy optimization methods, are widely used to solve various decision-making problems. Compared with policy optimization methods, value function iteration methods can update using data generated by historical policies, so they require fewer interactions with the environment, have a higher utilization rate of samples, and are more capable of solving real environment decis...
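The claim that value function iteration can reuse data generated by historical policies can be illustrated with tabular Q-learning, a standard value function iteration method. The sketch below is not the patented method (which uses Actor and Critic networks with a shared LSTM); the function name, the state and action labels, and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
from collections import defaultdict


def q_learning_update(q, transition, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update from a stored transition.

    The update does not depend on which policy produced the transition,
    which is why old samples can be replayed repeatedly.
    """
    state, action, reward, next_state, done = transition
    # Bootstrap from the best next action unless the episode ended.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Two transitions collected earlier by some historical policy (made-up data).
history = [
    ("s0", "right", 0.0, "s1", False),
    ("s1", "right", 1.0, "s2", True),
]

q = defaultdict(float)
for _ in range(2):  # replay the same stored data twice
    for transition in history:
        q_learning_update(q, transition, actions=("left", "right"))
```

Replaying the same two transitions a second time propagates the terminal reward back to `s0` without any new interaction with the environment, which is the sample-efficiency advantage the paragraph describes.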

Claims
