An Efficient Value Function Iterative Reinforcement Learning Method for Shared Recurrent Neural Networks
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TSINGHUA UNIV
- Publication Date
- 2021-07-30
Figure 1 
Figure 2 
Figure 3
Abstract
Description
Technical Field
[0001] The invention relates to the technical field of reinforcement learning, and in particular to an efficient value function iteration reinforcement learning method for shared recurrent neural networks.
Background Art
[0002] Reinforcement learning builds on the theoretical framework of the Markov decision process and models a sequential decision-making task as a trial-and-error learning problem in which an agent interacts with the system environment. Two families of model-free reinforcement learning algorithms, value function iteration methods and policy optimization methods, are widely used to solve various decision-making problems. Compared with policy optimization methods, value function iteration methods can perform updates using data generated by historical policies, so they require fewer interactions with the environment, use samples more efficiently, and are more capable of solving real environment decis...
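To illustrate why value function iteration can reuse data produced by earlier policies, here is a minimal sketch of tabular Q-learning trained only on stored transitions. It is not the patented method and involves no recurrent network. The four-state chain environment, the constants, and the names `step`, `collect`, and `q_iteration` are all hypothetical and chosen only for this illustration.

```python
import random

# Hypothetical toy chain: states 0..3, state 3 is terminal and pays reward 1.
N_STATES = 4
ACTIONS = (0, 1)          # 0 = move left, 1 = move right
GAMMA, ALPHA = 0.9, 0.5   # assumed discount factor and learning rate


def step(state, action):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def collect(episodes, rng):
    """Gather transitions with a uniformly random behavior policy."""
    buffer = []
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)
            s2, r, done = step(s, a)
            buffer.append((s, a, r, s2, done))
            s = s2
    return buffer


def q_iteration(buffer, sweeps, rng):
    """Apply Q-learning updates to replayed transitions only.

    No new environment interaction happens here: the data came from a
    different (random) policy, which is what makes the update off-policy.
    """
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, r, s2, done in rng.sample(buffer, len(buffer)):
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
    return q


rng = random.Random(0)
replay = collect(episodes=20, rng=rng)
q_table = q_iteration(replay, sweeps=50, rng=rng)
# Greedy action in each non-terminal state according to the learned values.
greedy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

In this sketch the greedy policy read from `q_table` moves right in every non-terminal state, even though the stored transitions were generated by a random policy. A policy optimization method would typically need fresh data from the current policy for each update.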