Multi-objective robot control method based on dynamic model and post-event experience replay

A control method based on a dynamic model, applied in fields such as specific mathematical models, computational models, and program-controlled manipulators. It addresses poor generalization and large off-policy bias (offline deviation), accelerates robot task training, and improves data utilization efficiency.

Active Publication Date: 2022-01-25

SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV



AI Technical Summary

Problems solved by technology

Methods based on a dynamic model suffer from model error when the model does not fit the real environment well enough. This error accumulates as the number of interaction steps grows and can harm the training of the agent.
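The accumulation of model error can be illustrated with a toy example. The sketch below is not taken from the patent: it assumes a scalar linear system, and the coefficients `TRUE_A` and `MODEL_A` are hypothetical values chosen only to show how a small one-step mismatch grows over a rollout.

```python
def rollout(a, s0, steps):
    """Roll the scalar linear system s' = a * s forward for `steps` steps."""
    states = [s0]
    for _ in range(steps):
        states.append(a * states[-1])
    return states


TRUE_A = 0.99   # hypothetical coefficient of the real environment
MODEL_A = 1.00  # hypothetical learned coefficient, off by 0.01

real = rollout(TRUE_A, 1.0, 20)
pred = rollout(MODEL_A, 1.0, 20)

# Absolute gap between the model rollout and the real rollout at each step.
errors = [abs(p - r) for p, r in zip(pred, real)]

for step in (1, 5, 20):
    print(step, round(errors[step], 4))
```

In this toy setting the gap after `k` steps is `1 - 0.99**k`, so it keeps growing with the rollout length even though the one-step mismatch is small.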

[0012] (1) Existing techniques usually train one policy network per specific task and generalize poorly. The multi-objective reinforcement learning technique described here trains on a large number of goals at the same time, so one model can complete every task within a given goal space.

[0013] (2) Existing techniques do not use post-event (hindsight) experience replay and often cannot learn from failure data. This technique uses post-event experience replay to make better use of failure data and accelerate the training of robot tasks.

[0014] (3) Existing techniques do not use model-based value function expansion and typically learn with a single-step temporal-difference method. This technique accelerates the learning of the value function and the training of the agent.

[0015] (4) Multi-step value function estimation in the prior art has a large off-policy bias (offline deviation) when the policy is off-policy. The method of this patent uses model-based value function expansion and therefore has no off-policy bias, although it does introduce a certain model error.


Embodiment Construction

[0043] As shown in Figure 1, the multi-objective robot control method based on a dynamic model and post-event experience replay proceeds as follows:

[0044] (1) Set the multi-objective reinforcement learning parameters;

[0045] (2) Under these parameter settings, obtain the loss functions L_actor and L_critic of the Actor and the Critic in the deterministic policy gradient algorithm;

[0046] (3) Establish a dynamic model and, based on it, combine single-step value function estimation with multi-step value function expansion to accelerate multi-objective reinforcement learning training;

[0047] (4) Use post-event experience replay: in multi-objective reinforcement learning, replace the goals of failed experiences with the goals that were actually achieved.
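Step (4) can be sketched as a goal-relabeling pass over a stored episode. This is a minimal illustration, not the patent's implementation: the `Transition` layout, the sparse reward with tolerance `tol`, and the choice to relabel with the final achieved goal are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    state: Vec
    action: Vec
    next_state: Vec
    achieved_goal: Vec  # goal actually reached after taking the action
    goal: Vec           # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Assumed reward: 0 when the goal is reached within `tol`, otherwise -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Replace each transition's goal with the goal the episode actually reached."""
    if not episode:
        return []
    new_goal = episode[-1].achieved_goal
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.achieved_goal, new_goal))
        for t in episode
    ]


# A failed two-step episode: the agent was asked to reach 1.0 but ended at 0.4.
episode = [
    Transition((0.0,), (0.2,), (0.2,), (0.2,), (1.0,), -1.0),
    Transition((0.2,), (0.2,), (0.4,), (0.4,), (1.0,), -1.0),
]
relabeled = relabel_with_final_goal(episode)
```

After relabeling, the last transition counts as a success for the substituted goal, so the failed episode still provides a useful learning signal. Both the original and the relabeled transitions would typically be kept in the replay buffer.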

[0048] The details of the multi-objective reinforcement learning parameters are as follows:

[0049] Reinforcement learning is expressed as a Markov decision p...


Abstract

The invention discloses a multi-objective robot control method based on a dynamic model and post-event experience replay. The method learns a policy that covers the entire goal space and therefore generalizes better than existing methods. It uses model-based value function estimation together with post-event experience replay to improve data utilization efficiency in multi-objective reinforcement learning. Unlike other off-policy value function estimation methods, it has no off-policy bias (offline deviation). Model error remains, but the invention takes a weighted sum of the single-step and multi-step value function estimates to balance model error against learning speed, which yields better performance.
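The weighted combination of single-step and multi-step estimates might look like the sketch below. The available text does not give the exact weighting formula, so the convex combination with weight `alpha`, the function names, and the callable signatures (`model`, `policy`, `reward_fn`, `q_fn`) are assumptions for illustration only.

```python
from typing import Callable, Tuple


def value_targets(
    reward: float,
    next_state: float,
    goal: float,
    model: Callable[[float, float], float],        # learned dynamics: (s, a) -> s'
    policy: Callable[[float, float], float],       # current policy: (s, g) -> a
    reward_fn: Callable[[float, float], float],    # goal-conditioned reward: (s', g) -> r
    q_fn: Callable[[float, float, float], float],  # critic: (s, a, g) -> Q
    gamma: float,
    n_steps: int,
) -> Tuple[float, float]:
    """Return the single-step and the n-step model-expanded value targets."""
    one_step = reward + gamma * q_fn(next_state, policy(next_state, goal), goal)

    # Expand with the model, choosing actions with the current policy, so the
    # imagined steps are on-policy by construction.
    ret, discount, s = reward, gamma, next_state
    for _ in range(n_steps - 1):
        a = policy(s, goal)
        s = model(s, a)
        ret += discount * reward_fn(s, goal)
        discount *= gamma
    multi_step = ret + discount * q_fn(s, policy(s, goal), goal)
    return one_step, multi_step


def mixed_target(one_step: float, multi_step: float, alpha: float) -> float:
    """Assumed weighting: alpha = 0 trusts only the single-step target."""
    return (1.0 - alpha) * one_step + alpha * multi_step
```

A larger `alpha` leans on the model rollout and tends to propagate value faster, while a smaller `alpha` limits exposure to model error. With `n_steps = 1` the two targets coincide.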

Description

Technical field

[0001] The invention relates to the technical field of robot control, in particular to a multi-objective robot control method based on dynamic models and post-event experience replay.

Background art

[0002] Reinforcement learning: A class of methods in machine learning, mainly composed of two parts: agent and environment. According to the current state, the agent executes actions according to a certain strategy (policy) and acts on the environment. After the environment receives the action, it returns a new state and a reward.

[0003] Deep reinforcement learning: The combination of deep neural networks and reinforcement learning enables reinforcement learning to effectively solve complex problems in large state spaces and even continuous state spaces. Robot control is a continuous state space control problem.

[0004] Multi-objective reinforcement learning: The usual reinforcement learning is to accomplish a specific goal, but there are...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): B25J9/16, B25J13/00, G06F30/27, G06N7/00, G06F113/28

CPC: B25J9/1602, B25J13/00, G06F30/27, G06F2113/28, G06N7/01

Inventor: 李秀, 杨瑞, 吕加飞, 杨宇

Owner: SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV
