An Efficient Value Function Iterative Reinforcement Learning Method for Shared Recurrent Neural Networks

Recurrent neural network and reinforcement learning technology, applied to efficient value function iterative reinforcement learning, addresses the problems of long interaction time and high sampling cost.

Active Publication Date: 2021-07-30

TSINGHUA UNIV

8 Cites, 0 Cited by

Problems solved by technology

Policy search methods that use a recurrent neural network can solve some partially observable problems in the environment. In practical tasks, however, this type of method suffers from long interaction time between the agent and the environment and from high sampling cost.

Embodiment Construction

[0058] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0059] The following describes an efficient value function iterative reinforcement learning method for a shared recurrent neural network according to an embodiment of the present invention with reference to the accompanying drawings.

[0060] First, the efficient value function iterative reinforcement learning method for a shared recurrent neural network proposed by the present invention includes two modules: a Critic module and an Actor module. In the Critic module, the problem of overestimation of the value funct...
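The two-module layout can be pictured with a minimal sketch. It assumes, without confirmation from the text above, that one shared recurrent encoder summarizes the observation history and feeds both an Actor head and a Critic head. A plain tanh recurrent cell stands in for the LSTM, single linear layers stand in for the MLP parts, and every class and variable name here is hypothetical.

```python
import math
import random


def linear(weights, bias, x):
    """Return weights @ x + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rng, rows, cols, scale=0.1):
    """Small random weight matrix with entries in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


class SharedRecurrentEncoder:
    """Tanh recurrent cell standing in for the shared LSTM (assumption)."""

    def __init__(self, rng, obs_dim, hidden_dim):
        self.w_in = random_matrix(rng, hidden_dim, obs_dim)
        self.w_rec = random_matrix(rng, hidden_dim, hidden_dim)
        self.bias = [0.0] * hidden_dim
        self.hidden_dim = hidden_dim

    def encode(self, observations):
        """Fold a sequence of observations into one hidden state."""
        zeros = [0.0] * self.hidden_dim
        h = list(zeros)
        for obs in observations:
            from_input = linear(self.w_in, self.bias, obs)
            from_state = linear(self.w_rec, zeros, h)
            h = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
        return h


class Head:
    """Single linear layer standing in for an MLP head (assumption)."""

    def __init__(self, rng, in_dim, out_dim):
        self.w = random_matrix(rng, out_dim, in_dim)
        self.b = [0.0] * out_dim

    def __call__(self, x):
        return linear(self.w, self.b, x)


rng = random.Random(0)
OBS_DIM, HIDDEN_DIM, ACTION_DIM = 3, 4, 2

encoder = SharedRecurrentEncoder(rng, OBS_DIM, HIDDEN_DIM)
actor_head = Head(rng, HIDDEN_DIM, ACTION_DIM)
# Assumed Critic input: shared hidden state concatenated with the action.
critic_head = Head(rng, HIDDEN_DIM + ACTION_DIM, 1)

history = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]   # two past observations
h = encoder.encode(history)                       # shared summary of the history
action = [math.tanh(a) for a in actor_head(h)]    # bounded action from the Actor
q_value = critic_head(h + action)[0]              # scalar value from the Critic
```

Because both heads read the same hidden state, the history is encoded once per step rather than once per module, which is one plausible reading of "shared" in the title.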

Abstract

The invention discloses an efficient value function iterative reinforcement learning method for a shared recurrent neural network. The method includes: obtaining sample data through interaction between an agent and the environment, and adding the sample data to a sample pool; randomly selecting sample data from the sample pool as training sample data; normalizing the output of the Critic network according to the training sample data, and updating its MLP network and the shared LSTM parameters; updating the parameters of the MLP part of the Critic network; and updating the third Critic network and the fourth Critic network in the Critic network, as well as the parameters of the second Actor network of the Actor network. The method combines a recurrent neural network with value function iteration to improve training efficiency and shorten training time.
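The first two steps of the abstract (adding interaction samples to a sample pool, then randomly selecting training samples from it) can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the fixed capacity, the tuple layout of a sample, and the names `SamplePool`, `add`, and `sample` are all hypothetical.

```python
import random
from collections import deque


class SamplePool:
    """Bounded pool of interaction samples with uniform random selection."""

    def __init__(self, capacity, seed=None):
        # Assumption: once the pool is full, the oldest sample is discarded.
        self._samples = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, observation, action, reward, next_observation, done):
        """Store one agent-environment interaction."""
        self._samples.append(
            (observation, action, reward, next_observation, done)
        )

    def sample(self, batch_size):
        """Draw training samples uniformly at random, without replacement."""
        batch_size = min(batch_size, len(self._samples))
        return self._rng.sample(list(self._samples), batch_size)

    def __len__(self):
        return len(self._samples)


pool = SamplePool(capacity=3, seed=0)
for step in range(5):
    pool.add(observation=step, action=0, reward=1.0,
             next_observation=step + 1, done=False)

batch = pool.sample(batch_size=2)   # two distinct samples from the last three
```

Drawing from the pool rather than from the latest trajectory is what lets data produced by earlier policies be reused for later updates.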

Description

Technical field

[0001] The invention relates to the technical field of reinforcement learning, in particular to an efficient value function iterative reinforcement learning method for a shared recurrent neural network.

Background

[0002] Reinforcement learning is based on the theoretical framework of the Markov decision process, which models a sequential decision-making task as a trial-and-error learning problem in which an agent interacts with the system environment. Two types of model-free reinforcement learning algorithms, value function iteration methods and policy optimization methods, are widely used to solve various decision-making problems. Compared with policy optimization methods, value function iteration methods can use data generated by historical policies to update the algorithm, so they require fewer interactions with the environment, use samples more efficiently, and are more capable of solving real environment decis...
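The reuse of data generated by historical policies can be made concrete with the textbook one-step target used by value function iteration methods. The form below is the standard Q-learning objective, given only for orientation; it is not taken from the patent, and the symbols (pool $\mathcal{D}$, discount $\gamma$, termination flag $d$, target parameters $\theta^{-}$) are assumptions introduced here.

```latex
y = r + \gamma \,(1 - d)\, \max_{a'} Q_{\theta^{-}}(s', a'),
\qquad
L(\theta) = \mathbb{E}_{(s,\,a,\,r,\,s',\,d) \sim \mathcal{D}}
\left[ \bigl( Q_{\theta}(s, a) - y \bigr)^{2} \right]
```

The expectation is taken over samples stored in $\mathcal{D}$, regardless of which past policy produced them, so each interaction with the environment can contribute to many updates.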

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06N3/04, G06N3/08

CPC: G06N3/049, G06N3/08, G06N3/045

Inventor: 杨君, 薛晨, 芦维宁, 梁斌, 赵千川

Owner TSINGHUA UNIV
