Reinforcement learning method based on bidirectional model

A technique combining reinforcement learning with forward models, applied in the field of reinforcement learning, that achieves a small cumulative model error and excellent asymptotic performance.

Pending Publication Date: 2020-11-17

SHANGHAI JIAO TONG UNIV

1 Cites, 4 Cited by

Problems solved by technology

[0007] In general, however, these studies are limited to forward models. To keep the cumulative model error from growing too large, they must compromise on the length of the generated trajectories and on applicability.
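The trade-off between trajectory length and cumulative error can be illustrated with a toy calculation. The sketch below is not taken from the patent: it assumes one-dimensional dynamics in which every model step adds a fixed error `eps`, and the names `rollout`, `max_abs_error` and `compare` are hypothetical. It compares a forward-only rollout of length 2k with a k-step forward plus k-step backward rollout from the same real state.

```python
def rollout(step_fn, start, steps):
    """Apply step_fn repeatedly and return every visited state."""
    states = [start]
    for _ in range(steps):
        states.append(step_fn(states[-1]))
    return states


def max_abs_error(predicted, truth):
    """Largest absolute deviation between two equally long trajectories."""
    return max(abs(p - t) for p, t in zip(predicted, truth))


def compare(k, eps):
    """Return (forward-only error, bidirectional error) for 2k virtual steps.

    Assumption: the true dynamics shift the state by 1 per step, and both
    learned models are off by the same additive error eps per step.
    """
    true_fwd = lambda s: s + 1.0
    true_bwd = lambda s: s - 1.0
    model_fwd = lambda s: s + 1.0 + eps
    model_bwd = lambda s: s - 1.0 - eps
    start = 0.0  # the real state from which rollouts branch

    forward_only = max_abs_error(
        rollout(model_fwd, start, 2 * k), rollout(true_fwd, start, 2 * k)
    )
    bidirectional = max(
        max_abs_error(rollout(model_fwd, start, k), rollout(true_fwd, start, k)),
        max_abs_error(rollout(model_bwd, start, k), rollout(true_bwd, start, k)),
    )
    return forward_only, bidirectional


print(compare(4, 0.5))
```

Under this simplified error model, the worst-case error of the forward-only rollout grows with 2k steps, while each half of the bidirectional rollout only compounds over k steps. Real learned models have state-dependent errors, so this shows the intuition only.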

Embodiment Construction

[0024] The following describes the preferred embodiments of the present application with reference to the accompanying drawings to make the technical content clearer and easier to understand. The present application can be embodied in many different forms of embodiments, and the protection scope of the present application is not limited to the embodiments mentioned herein.

[0025] The idea, specific structure and technical effects of the present invention will be further described below to fully understand the purpose, features and effects of the present invention, but the protection of the present invention is not limited thereto.

[0026] This embodiment is mainly used to solve the MuJoCo robot control problems in OpenAI's open-source library Gym. Specifically, the state is defined as the position and velocity of each part of the robot, and the action is the force applied to each part. The goal is to make the robot move as far as possible without falling down, and at th...
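The state and action definitions above can be pictured with a much smaller stand-in. The following sketch is not a Gym or MuJoCo environment: `PointMassEnv`, its time step, its velocity limit and its reward are all hypothetical choices made for illustration. It models a single body whose state is (position, velocity), whose action is a force, and whose reward is the distance moved in one step.

```python
from dataclasses import dataclass


@dataclass
class PointMassEnv:
    """Hypothetical 1-D unit-mass body; a toy stand-in, not a MuJoCo task."""

    dt: float = 0.5            # assumed integration time step
    max_speed: float = 10.0    # assumed limit; exceeding it ends the episode
    position: float = 0.0
    velocity: float = 0.0

    def reset(self):
        """Return the initial state (position, velocity)."""
        self.position, self.velocity = 0.0, 0.0
        return (self.position, self.velocity)

    def step(self, force):
        """Apply a force for one time step and return (state, reward, done)."""
        old_position = self.position
        self.velocity += force * self.dt          # unit mass: a = F
        self.position += self.velocity * self.dt
        reward = self.position - old_position     # distance moved this step
        done = abs(self.velocity) > self.max_speed  # crude proxy for "falling"
        return (self.position, self.velocity), reward, done


env = PointMassEnv()
state = env.reset()
state, reward, done = env.step(2.0)
print(state, reward, done)
```

The `reset`/`step` shape loosely mirrors the interaction loop that Gym-style environments expose, which is what the data collection stage of a model-based method would call repeatedly.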

Abstract

A reinforcement learning method based on a bidirectional model, used for robot control, characterized by comprising a forward model, a backward model, a forward policy and a backward policy. Trajectories are generated bidirectionally from a real state, and three stages (data collection, model learning and policy optimization) are iterated until the algorithm converges. Compared with a traditional forward model, the bidirectional model has a smaller cumulative model error when generating virtual trajectories of the same length. In further simulated control experiments, the method achieves better sample efficiency and asymptotic performance than previous model-based methods.
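The three-stage iteration described in the abstract can be sketched on a toy problem. Everything below is an illustrative assumption rather than the patented implementation: the 1-D dynamics in `env_step`, the least-squares `fit_gain`, the constant-action policies, and the crude "pick the better direction" step that stands in for policy optimization are all hypothetical.

```python
import random


def env_step(state, action):
    """Hypothetical true dynamics: each action shifts the state by 2 * action."""
    return state + 2.0 * action


def collect(policy, state, n_steps):
    """Stage 1: gather real transitions (state, action, next_state)."""
    data = []
    for _ in range(n_steps):
        action = policy(state)
        next_state = env_step(state, action)
        data.append((state, action, next_state))
        state = next_state
    return data, state


def fit_gain(data):
    """Stage 2: least-squares fit of w in (next_state - state) = w * action."""
    num = sum(a * (s2 - s) for s, a, s2 in data)
    den = sum(a * a for _, a, _ in data)
    return num / den


def bidirectional_rollout(start, k, w_fwd, w_bwd, fwd_policy, bwd_policy):
    """Generate k virtual steps backward and k forward from a real state."""
    backward = [start]
    for _ in range(k):
        s_next = backward[-1]
        backward.append(s_next - w_bwd * bwd_policy(s_next))  # predicted predecessor
    forward = [start]
    for _ in range(k):
        s = forward[-1]
        forward.append(s + w_fwd * fwd_policy(s))             # predicted successor
    return backward[:0:-1] + forward  # time-ordered, 2k + 1 states


def train(iterations=3, k=2, seed=0):
    rng = random.Random(seed)
    direction, state, buffer = -1.0, 0.0, []
    for _ in range(iterations):
        # Stage 1: data collection with some random exploration.
        explore = lambda s: direction if rng.random() < 0.8 else -direction
        new_data, state = collect(explore, state, 10)
        buffer.extend(new_data)
        # Stage 2: model learning (the same toy fit serves both directions).
        w_fwd = fit_gain(buffer)
        w_bwd = fit_gain(buffer)
        # Stage 3: stand-in for policy optimization on virtual trajectories.
        def final_state(d):
            traj = bidirectional_rollout(state, k, w_fwd, w_bwd,
                                         lambda s: d, lambda s: d)
            return traj[-1]
        direction = max((-1.0, 1.0), key=final_state)
    return direction, w_fwd


print(train())
```

A fixed number of iterations replaces the convergence check here. In a realistic setting the models and policies would be learned function approximators, and the virtual trajectories would feed a proper policy optimization algorithm instead of the two-way comparison used above.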

Description

Technical field

[0001] The invention relates to the field of reinforcement learning methods, in particular to the study of cumulative model error in model-based reinforcement learning.

Background technique

[0002] Reinforcement learning can be divided into model-free and model-based reinforcement learning, according to whether the environment is modeled. Model-free reinforcement learning directly trains a policy function or value function on data sampled from the real environment, whereas model-based reinforcement learning first learns a model that fits the state transition function from real data obtained by interacting with the environment, and then uses the model to generate simulated trajectories to optimize a policy or controller. Although model-free reinforcement learning has achieved very good results on many tasks, the achievement of these results often requires a large amount of data interacting with the environme...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06N20/00, B25J9/16

CPC: G06N20/00, B25J9/163

Inventor: 张伟楠, 赖行, 沈键

Owner: SHANGHAI JIAO TONG UNIV
