
Reinforcement learning method based on bidirectional model

A reinforcement-learning technique built on a bidirectional (forward and backward) model, applied in the field of reinforcement learning, which achieves small cumulative model error and strong asymptotic performance

Pending Publication Date: 2020-11-17
SHANGHAI JIAO TONG UNIV

Problems solved by technology

[0007] In general, however, these studies are limited to forward models, and to avoid the impact of excessive cumulative model error they must compromise on the length and applicability of the generated trajectories.
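The compounding-error problem described above can be illustrated with a toy example (the dynamics and bias values below are hypothetical, not from the patent): even a small one-step prediction bias in a learned forward model grows with the rollout horizon, which is why forward-model methods must keep generated trajectories short.

```python
# Toy illustration: a small per-step model bias compounds over a forward rollout.

def true_step(s):
    # hypothetical true dynamics: damped drift
    return 0.9 * s + 1.0

def model_step(s):
    # learned model with a small one-step bias of 0.05
    return 0.9 * s + 1.05

def rollout_error(horizon, s0=0.0):
    """Absolute gap between true and model rollouts after `horizon` steps."""
    s_true, s_model = s0, s0
    for _ in range(horizon):
        s_true = true_step(s_true)
        s_model = model_step(s_model)
    return abs(s_model - s_true)

# The gap grows monotonically with the horizon.
errors = [rollout_error(h) for h in (1, 5, 20)]
```

Under these toy dynamics the gap after one step is 0.05, but after twenty steps it has grown roughly ninefold, which is the effect the bidirectional model is designed to mitigate.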



Embodiment Construction

[0024] The following describes the preferred embodiments of the present application with reference to the accompanying drawings to make the technical content clearer and easier to understand. The present application can be embodied in many different forms of embodiments, and the protection scope of the present application is not limited to the embodiments mentioned herein.

[0025] The idea, specific structure and technical effects of the present invention are further described below so that its purpose, features and effects can be fully understood; the protection scope of the present invention is not limited thereto.

[0026] This embodiment mainly addresses the MuJoCo robot control problems in OpenAI's open-source Gym library. Specifically, the state is defined as the position and velocity of each part of the robot, and the action is the force applied to each part. The goal is to make the robot move as far as possible without falling down, and at th...
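The state/action/reward structure of such a locomotion task can be sketched with a minimal stand-in environment (a hypothetical toy class, not the Gym or MuJoCo API): the state is a vector of positions and velocities, the action is a force, and the reward encourages forward progress while penalizing a fall.

```python
# Illustrative stand-in for a MuJoCo-style locomotion task (hypothetical).
import numpy as np

class ToyWalkerEnv:
    """Minimal 1-D stand-in: one body with a position and a velocity."""

    def __init__(self):
        self.pos = 0.0
        self.vel = 0.0

    def reset(self):
        self.pos, self.vel = 0.0, 0.0
        return np.array([self.pos, self.vel])       # state = (position, velocity)

    def step(self, force):
        # the action is a force; crude Euler integration of the dynamics
        self.vel += 0.1 * float(force)
        self.pos += 0.1 * self.vel
        fallen = abs(self.vel) > 10.0               # stand-in for "robot fell"
        reward = self.vel - (100.0 if fallen else 0.0)  # reward forward speed
        return np.array([self.pos, self.vel]), reward, fallen

env = ToyWalkerEnv()
state = env.reset()
state, reward, done = env.step(1.0)
```

A real MuJoCo task has many coordinates and actuators rather than one, but the interaction loop — observe state, apply force, receive reward, check for a fall — has the same shape.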


Abstract

A reinforcement learning method based on a bidirectional model, used for robot control. The method comprises a forward model, a backward model, a forward policy and a backward policy; trajectories are generated bidirectionally from a real state, and the algorithm iterates over three stages — data collection, model learning and policy optimization — until convergence. Compared with a traditional forward model, the bidirectional model accumulates a smaller model error when generating virtual trajectories of the same length, and in further simulated control experiments the method achieves better sample efficiency and asymptotic performance than previous model-based methods.
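The bidirectional rollout idea in the abstract can be sketched as follows: starting from a real state, a forward model extends the trajectory into the future while a backward model extends it into the past, so each half accumulates model error over only half the total horizon. The models, policies and dynamics below are hypothetical placeholders, not the patent's learned networks.

```python
# Hedged sketch of a bidirectional rollout (placeholder models and policies).

def forward_model(s, a):
    return s + a                      # placeholder next-state predictor

def backward_model(s, a):
    return s - a                      # placeholder previous-state predictor

def forward_policy(s):
    return 1.0                        # placeholder forward action choice

def backward_policy(s):
    return 1.0                        # placeholder backward action choice

def bidirectional_rollout(s_real, half_horizon):
    """Generate a trajectory of 2*half_horizon + 1 states around s_real."""
    future, past = [], []
    s = s_real
    for _ in range(half_horizon):     # roll forward from the real state
        s = forward_model(s, forward_policy(s))
        future.append(s)
    s = s_real
    for _ in range(half_horizon):     # roll backward from the same real state
        s = backward_model(s, backward_policy(s))
        past.append(s)
    return past[::-1] + [s_real] + future

traj = bidirectional_rollout(0.0, 3)  # 7 states centered on the real state
```

The key point is that a trajectory of total length 2k is stitched from two rollouts of length k, so the worst-case compounding horizon is halved relative to a purely forward rollout of the same length.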

Description

Technical field

[0001] The invention relates to the field of reinforcement learning methods, in particular to research on cumulative model errors in model-based reinforcement learning.

Background technique

[0002] Reinforcement learning can be divided into model-free and model-based reinforcement learning according to whether the environment is modeled. Model-free reinforcement learning directly trains a policy function or value function from data sampled in the real environment, while model-based reinforcement learning first learns a model that fits the state-transition function from real data obtained by interacting with the environment; the model is then used to generate simulated trajectories to optimize a policy or controller. Although model-free reinforcement learning has achieved very good results on many tasks, these results often require a large amount of data from interaction with the environme...
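The generic model-based loop described in the background — collect real data, fit a model to the observed transitions, then optimize the policy on model-generated trajectories — can be sketched with trivial stubs (hypothetical throughout; this is the general scheme, not the patent's specific algorithm):

```python
# Schematic model-based RL loop with stub components (hypothetical).

def collect_data(dynamics, policy, s0, steps):
    """Stage 1: run the real dynamics with the policy, record transitions."""
    data, s = [], s0
    for _ in range(steps):
        a = policy(s)
        s_next = dynamics(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_model(data):
    """Stage 2 (stub 'model learning'): average the observed state change."""
    delta = sum(s_next - s for s, a, s_next in data) / len(data)
    return lambda s, a: s + delta      # learned transition function

def model_rollout(model, s0, steps):
    """Stage 3 uses model-generated trajectories instead of the real env."""
    s = s0
    for _ in range(steps):
        s = model(s, 0.0)
    return s

real_dynamics = lambda s, a: s + 1.0   # hypothetical true environment
data = collect_data(real_dynamics, lambda s: 0.0, 0.0, 5)
model = fit_model(data)
predicted = model_rollout(model, 0.0, 5)
```

The appeal is sample efficiency: once the model is fitted, policy optimization consumes cheap simulated transitions rather than expensive real ones — at the cost of the cumulative model error discussed above.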

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N 20/00; B25J 9/16
CPC: G06N 20/00; B25J 9/163
Inventor 张伟楠赖行沈键
Owner SHANGHAI JIAO TONG UNIV