Multi-objective robot control method based on dynamic model and post-event experience replay

A technology combining a dynamic model with a control method, applied in fields such as specific mathematical models, computational models, and program-controlled manipulators. It addresses slow robot task training, large off-policy bias, and poor generalization, with the effect of improving data utilization efficiency.

Active Publication Date: 2022-01-25
SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

The problem with dynamic-model-based methods is that when the dynamic model does not fit the real environment closely enough, model errors arise. As the number of interaction steps grows, these errors accumulate and can harm the training of the agent.
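The accumulation of model error can be seen in a toy example. The sketch below, which is illustrative only and not taken from the patent, rolls a hypothetical true system and a slightly mis-specified learned model forward from the same state. The one-step coefficient error is only 0.02, but the gap between predicted and real states grows at every step, which is why long imagined rollouts can hurt training.

```python
from typing import Callable, List


def rollout(step_fn: Callable[[float, float], float], s0: float,
            actions: List[float]) -> List[float]:
    """Roll a 1-D state forward through step_fn and return every visited state."""
    states = [s0]
    for a in actions:
        states.append(step_fn(states[-1], a))
    return states


def true_step(s: float, a: float) -> float:
    # Hypothetical real dynamics: a mildly expanding linear system.
    return 1.05 * s + a


def model_step(s: float, a: float) -> float:
    # Hypothetical learned model: same form, small coefficient error (1.07 vs 1.05).
    return 1.07 * s + a


def compounding_errors(s0: float, actions: List[float]) -> List[float]:
    """Absolute gap between model-predicted and real states at each step."""
    real = rollout(true_step, s0, actions)
    pred = rollout(model_step, s0, actions)
    return [abs(p - r) for p, r in zip(pred, real)]


if __name__ == "__main__":
    for k, err in enumerate(compounding_errors(1.0, [0.0] * 10)):
        print(f"step {k:2d}: |model - real| = {err:.4f}")
```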
[0012] (1) Existing techniques usually train a policy network to complete one specific task and therefore generalize poorly. The multi-objective reinforcement learning technique of this invention trains on a large number of goals simultaneously, so a single model can complete every task within a given goal space;
[0013] (2) Existing techniques do not use post-event experience replay and often cannot learn from failed episodes. This invention uses post-event experience replay to make better use of failure data and to accelerate the training of robot tasks;
[0014] (3) Existing techniques do not use model-based value function expansion and usually learn with a single-step temporal-difference method. This invention uses model-based expansion to accelerate both value function learning and agent training;
[0015] (4) Multi-step value function estimation in the prior art suffers a large off-policy bias when learning off-policy. Because this invention expands the value function with a model, it has no off-policy bias, although it does introduce some model error.
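Paragraphs [0014] and [0015] contrast single-step TD learning and off-policy multi-step returns with model-based value expansion. A minimal sketch of a model-based k-step target follows, assuming a learned one-step model, a goal-conditioned reward function, the current policy, and a target critic, all passed in as hypothetical callables. The first reward comes from the stored real transition; the remaining steps are imagined by the model using actions from the current policy, so no off-policy correction is needed, while any model error enters through the imagined states.

```python
from typing import Callable

State = float
Goal = float
Action = float


def model_based_n_step_target(
    s_next: State, g: Goal, r: float, n: int, gamma: float,
    model: Callable[[State, Action], State],              # learned dynamics (hypothetical)
    policy: Callable[[State, Goal], Action],              # current goal-conditioned policy
    reward: Callable[[State, Action, State, Goal], float],
    q_target: Callable[[State, Goal, Action], float],     # target critic
) -> float:
    """k-step target for a stored transition (s, a, r, s_next) with goal g.

    With n == 1 this reduces to the usual one-step TD target
    r + gamma * Q'(s_next, g, pi(s_next, g)).
    """
    ret = r                      # real reward from the replay buffer
    s = s_next
    discount = gamma
    for _ in range(n - 1):       # imagined steps under the current policy
        a = policy(s, g)
        s2 = model(s, a)
        ret += discount * reward(s, a, s2, g)
        discount *= gamma
        s = s2
    return ret + discount * q_target(s, g, policy(s, g))  # bootstrap at the horizon


if __name__ == "__main__":
    # Toy 1-D reaching task (all hypothetical): move halfway to the goal each step.
    def model(s, a): return s + a
    def policy(s, g): return 0.5 * (g - s)
    def reward(s, a, s2, g): return 0.0 if abs(s2 - g) < 0.05 else -1.0
    def q_target(s, g, a): return 0.0
    for n in (1, 3, 5):
        print(n, model_based_n_step_target(0.0, 1.0, -1.0, n, 0.98,
                                           model, policy, reward, q_target))
```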



Detailed Description of the Embodiments

[0043] As shown in Figure 1, the multi-objective robot control method based on a dynamic model and post-event experience replay proceeds as follows:

[0044] (1) Set the multi-objective reinforcement learning parameters;

[0045] (2) Under these multi-objective reinforcement learning parameters, derive the Actor and Critic loss functions of the deterministic policy gradient algorithm, $L_{actor}$ and $L_{critic}$;

[0046] (3) Build a dynamic model and use it, together with single-step value function estimation and multi-step value function expansion, to accelerate multi-objective reinforcement learning training;

[0047] (4) Apply post-event experience replay: in multi-objective reinforcement learning, replace the goals of failed experiences with the goals that were actually achieved.
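Step (4) relabels failed experience with the goal that was actually achieved. The sketch below uses the "final" strategy from hindsight experience replay, in which the episode's last achieved goal becomes the substitute goal, together with a 0/-1 sparse reward and a 0.05 distance tolerance. The excerpt does not say which relabeling strategy or reward the patent uses, so these choices, and the names `Transition`, `sparse_reward`, and `relabel_final`, are assumptions.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Goal = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    next_state: Tuple[float, ...]
    achieved_goal: Goal   # goal actually reached after taking the action
    goal: Goal            # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Goal, goal: Goal, tol: float = 0.05) -> float:
    # Assumed convention: 0 on success (within tol), -1 otherwise.
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_final(episode: List[Transition]) -> List[Transition]:
    """Return copies of the episode's transitions whose goal is replaced by the
    goal actually achieved at the end of the episode, with rewards recomputed."""
    if not episode:
        return []
    new_goal = episode[-1].achieved_goal
    return [replace(t, goal=new_goal, reward=sparse_reward(t.achieved_goal, new_goal))
            for t in episode]
```

In use, both the original and the relabeled transitions would go into the replay buffer, so a failed attempt at one goal still yields successful examples for another. Sampling substitute goals from later steps of the same episode (the "future" strategy) is a common alternative that produces more varied goals.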

[0048] The details of the multi-objective reinforcement learning parameters are as follows:

[0049] Reinforcement learning is expressed as a Markov decision p...
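Step (2) names the Actor and Critic losses, but their form is not visible in this excerpt. The sketch below assumes the standard goal-conditioned DDPG definitions, $L_{critic} = \mathbb{E}[(Q(s,g,a) - y)^2]$ with $y = r + \gamma Q'(s', g, \pi'(s', g))$, and $L_{actor} = -\mathbb{E}[Q(s, g, \pi(s, g))]$; the patent's exact formulation may differ. Scalars and plain callables stand in for the networks, and all names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# One transition: (state, action, reward, next_state, goal); scalars for brevity.
Transition = Tuple[float, float, float, float, float]
QFn = Callable[[float, float, float], float]   # Q(s, g, a)
PiFn = Callable[[float, float], float]         # pi(s, g)


def critic_loss(batch: Sequence[Transition], q: QFn, q_targ: QFn,
                pi_targ: PiFn, gamma: float) -> float:
    """Mean squared TD error against the one-step target
    y = r + gamma * Q'(s', g, pi'(s', g)). In a real implementation y is
    treated as a constant (no gradient flows through it)."""
    total = 0.0
    for s, a, r, s_next, g in batch:
        y = r + gamma * q_targ(s_next, g, pi_targ(s_next, g))
        total += (q(s, g, a) - y) ** 2
    return total / len(batch)


def actor_loss(batch: Sequence[Transition], q: QFn, pi: PiFn) -> float:
    """Negative mean Q-value of the actor's own actions; minimizing it moves
    pi(s, g) toward actions the critic rates highly."""
    return -sum(q(s, g, pi(s, g)) for s, _, _, _, g in batch) / len(batch)
```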


Abstract

The invention discloses a multi-objective robot control method based on a dynamic model and post-event experience replay. The invention can learn a policy that covers the entire goal space and therefore generalizes better than existing methods. It uses model-based value function estimation and post-event experience replay to improve data utilization efficiency in multi-objective reinforcement learning. Compared with other off-policy value function estimation methods, it has no off-policy bias. Although model errors exist, the invention balances model error against learning speed by taking a weighted sum of single-step and multi-step value function estimates, and achieves better performance.
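The abstract balances model error against learning speed through a weighted sum of single-step and multi-step value estimates. The excerpt does not give the weights, so the sketch below assumes a normalized weighted average over the 1-step target and the model-based k-step targets, plus a hypothetical geometric schedule `geometric_weights` in the spirit of TD(λ): a smaller `lam` leans on the real one-step target, a larger one on longer model rollouts.

```python
from typing import List, Sequence


def weighted_target(targets: Sequence[float], weights: Sequence[float]) -> float:
    """Normalized weighted sum of value targets.

    targets[0] is the single-step target; targets[k - 1] is the model-based
    k-step target. Weights are assumed non-negative.
    """
    if not targets or len(targets) != len(weights):
        raise ValueError("targets and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * y for w, y in zip(weights, targets)) / total


def geometric_weights(n: int, lam: float) -> List[float]:
    # Hypothetical schedule: weight lam**(k - 1) for the k-step target, so long
    # (more model-dependent) rollouts are trusted less when lam < 1.
    return [lam ** k for k in range(n)]
```

Setting `lam` to 0 recovers pure single-step TD learning, which has no model error but propagates value slowly.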

Description

Technical field

[0001] The invention relates to the technical field of robot control, in particular to a multi-objective robot control method based on a dynamic model and post-event experience replay.

Background

[0002] Reinforcement learning: a class of machine learning methods composed mainly of two parts, an agent and an environment. Based on the current state, the agent executes an action according to a policy and applies it to the environment. After receiving the action, the environment returns a new state and a reward.

[0003] Deep reinforcement learning: combining deep neural networks with reinforcement learning lets reinforcement learning solve complex problems with large or even continuous state spaces. Robot control is a control problem with a continuous state space.

[0004] Multi-objective reinforcement learning: ordinary reinforcement learning accomplishes one specific goal, but there are...
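[0002] describes the agent-environment loop. A minimal sketch follows, assuming a hypothetical 1-D reaching environment (`LineReachEnv`) with a sparse reward and a hand-written `greedy_policy` standing in for a learned policy. The goal is passed to the policy as an input, which is the goal-conditioned setting that multi-objective reinforcement learning builds on.

```python
from typing import Callable, Tuple


class LineReachEnv:
    """Hypothetical 1-D environment: the agent moves a point toward a goal."""

    def __init__(self, goal: float, tol: float = 0.05):
        self.goal = goal
        self.tol = tol
        self.state = 0.0

    def reset(self) -> float:
        self.state = 0.0
        return self.state

    def step(self, action: float) -> Tuple[float, float, bool]:
        # Clip the action to [-0.1, 0.1], apply it, return (new state, reward, done).
        self.state += max(-0.1, min(0.1, action))
        done = abs(self.state - self.goal) <= self.tol
        reward = 0.0 if done else -1.0
        return self.state, reward, done


def greedy_policy(state: float, goal: float) -> float:
    # Stand-in for a learned goal-conditioned policy: head straight for the goal.
    return goal - state


def run_episode(env: LineReachEnv, policy: Callable[[float, float], float],
                max_steps: int = 50) -> Tuple[int, float]:
    """Agent observes the state, acts, and receives a new state and reward."""
    state = env.reset()
    total = 0.0
    for t in range(max_steps):
        action = policy(state, env.goal)
        state, reward, done = env.step(action)
        total += reward
        if done:
            return t + 1, total
    return max_steps, total
```

Because the goal is an argument rather than baked into the environment, the same policy can be run against any goal, for example `run_episode(LineReachEnv(goal=0.5), greedy_policy)`.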


Application Information

Patent Type & Authority: Patents (China)
IPC(8): B25J9/16; B25J13/00; G06F30/27; G06N7/00; G06F113/28
CPC: B25J9/1602; B25J13/00; G06F30/27; G06F2113/28; G06N7/01
Inventors: Li Xiu, Yang Rui, Lyu Jiafei, Yang Yu (李秀, 杨瑞, 吕加飞, 杨宇)
Owner SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV