Robot reinforcement learning method based on adaptive model

An adaptive-model reinforcement learning technique, applied in the field of artificial intelligence, that addresses problems such as data distribution shift and achieves high accuracy, strong asymptotic performance, and a small feature distribution distance.

Active Publication Date: 2021-02-02
SHANGHAI JIAO TONG UNIV
6 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

An effective way to address data distribution shift in domain adaptation is to learn features with invariant properties.




Embodiment Construction

[0036] The following describes the preferred embodiments of the present application with reference to the accompanying drawings to make the technical content clearer and easier to understand. The present application can be embodied in many different forms of embodiments, and the protection scope of the present application is not limited to the embodiments mentioned herein.

[0037] The idea, specific structure, and technical effects of the present invention are further described below so that its purpose, features, and effects can be fully understood, but the protection scope of the present invention is not limited thereto.

[0038] For an environment model built from a neural network, we can regard its first few layers as a feature extractor and its last few layers as a decoder. Given a data input (s, a), the state s is the position and velocity of each part of the robot, and the action a is the force applied to each part. First, the hidden layer feature h is obtained ...
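The split described in [0038] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `EnvModel`, the layer sizes, the tanh activation, and the choice to have the decoder output a next state plus a scalar reward are all assumptions made for the example.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Create one dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(layer, x, activation=None):
    """Apply one dense layer to the vector x, with an optional activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    if activation is not None:
        out = [activation(v) for v in out]
    return out

class EnvModel:
    """Hypothetical environment model: feature extractor followed by decoder."""

    def __init__(self, state_dim, action_dim, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + action_dim
        # First few layers: feature extractor mapping (s, a) to the feature h.
        self.extractor_layers = [make_layer(in_dim, hidden_dim, rng),
                                 make_layer(hidden_dim, hidden_dim, rng)]
        # Last few layers: decoder. Assumption: it predicts the next state
        # and a scalar reward; the source text does not state the outputs.
        self.decoder_layers = [make_layer(hidden_dim, hidden_dim, rng),
                               make_layer(hidden_dim, state_dim + 1, rng)]

    def extract(self, state, action):
        """Return the hidden layer feature h for the input (s, a)."""
        h = list(state) + list(action)
        for layer in self.extractor_layers:
            h = dense(layer, h, math.tanh)
        return h

    def decode(self, h):
        """Map the feature h to (predicted next state, predicted reward)."""
        out = dense(self.decoder_layers[0], h, math.tanh)
        out = dense(self.decoder_layers[1], out)
        return out[:-1], out[-1]

    def predict(self, state, action):
        return self.decode(self.extract(state, action))
```

Exposing `extract` separately from `decode` is what makes the hidden feature h available to any step that compares features of real and virtual inputs.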



Abstract

The invention provides a robot reinforcement learning method based on an adaptive model. The method adds a model adaptation step alongside the normal learning of the environment model, which improves the accuracy of the virtual data. Specifically, when the model is a neural network, the adaptation step improves the accuracy of the model on virtual data by reducing the distance between the feature distributions of real data and virtual data at a hidden layer of the network. Experiments show that in fields such as robot control, the method achieves higher sample efficiency and final performance than previous model-based reinforcement learning methods.
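To make the idea of a distance between feature distributions concrete, the sketch below computes a squared maximum mean discrepancy (MMD) with a Gaussian kernel between two batches of hidden layer features. The abstract does not say which distance the invention uses, so MMD, the kernel, the `bandwidth` parameter, and the function names are assumptions chosen only for illustration.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(real_feats, virtual_feats, bandwidth=1.0):
    """Biased estimate of squared MMD between two batches of features.

    real_feats: hidden features computed from real (s, a) inputs.
    virtual_feats: hidden features computed from virtual (s, a) inputs.
    """
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(real_feats, real_feats)
            + mean_kernel(virtual_feats, virtual_feats)
            - 2.0 * mean_kernel(real_feats, virtual_feats))
```

Under these assumptions, a term such as a weighted `mmd_squared` value could be added to the usual prediction loss, so that training pulls the two feature distributions together while the model is still fitted to real transitions.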

Description

Technical field

[0001] The invention relates to the field of artificial intelligence, in particular to a robot reinforcement learning method.

Background

[0002] In the field of robot control, we define the state as the position and velocity of each part of the robot, and the action as the force applied to each part. The goal is to make the robot move as far as possible without falling over while keeping the control force as small as possible. Reinforcement learning can be used to accomplish this task.

[0003] Reinforcement learning can be divided into model-free and model-based reinforcement learning according to whether the environment is modeled. Model-free reinforcement learning uses (state, action, next state, reward) tuples sampled in the real environment to directly train a policy or value function, while model-based reinforcement learning meth...
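The control objective in paragraph [0002] (move far, do not fall, use little force) can be illustrated with a simple per-step reward. This is a sketch under stated assumptions: the function name, the one-dimensional position arguments, the quadratic control cost, and the values of `ctrl_cost_weight` and `alive_bonus` are hypothetical and do not come from the patent.

```python
def locomotion_reward(x_before, x_after, action, fallen,
                      ctrl_cost_weight=0.1, alive_bonus=1.0):
    """Per-step reward for the task sketched in [0002].

    x_before, x_after: forward position of the robot before and after the step.
    action: forces applied to each part of the robot.
    fallen: True if the robot has fallen over.
    """
    # Reward distance covered in the forward direction.
    forward_progress = x_after - x_before
    # Penalize large control forces (assumed quadratic cost).
    control_cost = ctrl_cost_weight * sum(f * f for f in action)
    # Reward staying upright; no bonus once the robot has fallen.
    healthy = 0.0 if fallen else alive_bonus
    return forward_progress - control_cost + healthy
```

Each (state, action, next state, reward) tuple collected in the real environment would pair a transition with a value of this kind.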


Application Information

Patent Type & Authority: Applications (China)
IPC(8): B25J9/16, G06K9/62, G06N3/04
CPC: B25J9/161, B25J9/1664, G06N3/045, G06F18/214, Y02T10/40
Inventor: 张伟楠, 沈键, 赵晗
Owner: SHANGHAI JIAO TONG UNIV