Robot imitation learning method based on virtual scene training

A learning method using virtual scene technology, applied in the field of imitation learning and artificial intelligence. It addresses the problems of heavy computation and of the learner receiving rewards only infrequently, thereby reducing training cost and risk.

Pending Publication Date: 2020-04-10
SOUTH CHINA UNIV OF TECH


Problems solved by technology

[0006] The purpose of the present invention is to address complex skill-learning tasks that require multi-step decision-making. In such tasks the learner cannot be rewarded frequently, and common reinforcement learning algorithms require a suitable reward function to solve for the optimal action policy; in many cases it is extremely complicated and impractical to design a sufficiently comprehensive and well-behaved reward function.
Therefore, the present invention uses imitation learning and inverse reinforcement learning ...

Method used




Embodiment

[0064] For ease of understanding, this embodiment takes the CartPole pole-balancing game as an example.

[0065] A robot imitation learning method based on virtual scene training, as shown in Figure 1, includes the following steps:

[0066] S1. Design a robot model and a virtual interactive environment according to the specific task, comprising the following steps:

[0067] S1.1. Design the robot model and virtual environment according to the specific task, using the Unity3D engine to build the simulation environment. The simulation environment should be as close to the real environment as possible and here includes the cart, the straight pole on the cart, and the slide rail along which the cart moves. Its purpose is to provide a visual graphical interface that helps train the model faster and transfer it later, reduces the dangers that may arise from training directly in the real environment, and lowers the training cost;
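The patent builds this scene in Unity3D; purely as an illustration of the same interface, the sketch below is a minimal Python stand-in for the cart-pole simulation (cart, pole, slide rail). The class name, physics constants, and termination thresholds are standard cart-pole values assumed for illustration, not taken from the patent.

```python
import numpy as np

class CartPoleSim:
    """Minimal stand-in for the Unity3D cart-pole scene: a cart on a slide
    rail with a hinged pole. State is (cart position, cart velocity,
    pole angle, pole angular velocity); the action pushes the cart."""

    def __init__(self, cart_mass=1.0, pole_mass=0.1, pole_length=0.5,
                 force_mag=10.0, dt=0.02):
        self.cart_mass = cart_mass
        self.pole_mass = pole_mass
        self.pole_length = pole_length   # half-length of the pole
        self.force_mag = force_mag
        self.dt = dt
        self.state = None

    def reset(self):
        # Start near the upright equilibrium with a small random perturbation.
        self.state = np.random.uniform(-0.05, 0.05, size=4)
        return self.state.copy()

    def step(self, action):
        # action in {0, 1}: push the cart left or right.
        x, x_dot, theta, theta_dot = self.state
        force = self.force_mag if action == 1 else -self.force_mag
        total_mass = self.cart_mass + self.pole_mass
        cos_t, sin_t = np.cos(theta), np.sin(theta)

        # Standard cart-pole dynamics with Euler integration.
        temp = (force + self.pole_mass * self.pole_length
                * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (9.8 * sin_t - cos_t * temp) / (
            self.pole_length * (4.0 / 3.0
                                - self.pole_mass * cos_t ** 2 / total_mass))
        x_acc = temp - self.pole_mass * self.pole_length * theta_acc * cos_t / total_mass

        x += self.dt * x_dot
        x_dot += self.dt * x_acc
        theta += self.dt * theta_dot
        theta_dot += self.dt * theta_acc
        self.state = np.array([x, x_dot, theta, theta_dot])

        # Episode ends when the cart leaves the rail or the pole falls over.
        done = bool(abs(x) > 2.4 or abs(theta) > 12 * np.pi / 180)
        return self.state.copy(), done
```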

[0068] S1.2. Using the domain randomization method, randomize the environmental ...
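The step text is truncated here, but domain randomization conventionally means resampling the simulator's physical parameters for each training episode so that the learned policy does not overfit to one fixed virtual environment. A minimal sketch, reusing the hypothetical CartPoleSim class above; the parameter ranges are illustrative assumptions, not values from the patent.

```python
import numpy as np

def make_randomized_env():
    """Domain randomization sketch: draw new physical parameters each
    episode so the policy must work across a family of simulators."""
    return CartPoleSim(
        cart_mass=np.random.uniform(0.8, 1.2),
        pole_mass=np.random.uniform(0.05, 0.2),
        pole_length=np.random.uniform(0.4, 0.7),
        force_mag=np.random.uniform(8.0, 12.0),
        dt=np.random.uniform(0.018, 0.022),
    )

# One freshly randomized environment per training episode.
env = make_randomized_env()
state = env.reset()
```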



Abstract

The invention discloses a robot imitation learning method based on virtual scene training. The method comprises the following steps: designing a robot model and a virtual interaction environment according to a specific task; collecting and organizing an expert data set; determining a state value space S and an action value space A according to the specific task, and determining the structures of the policy generator network and the discriminator network according to S and A; sampling data from the policy generator, designing a parameter-updating strategy, and alternately training the policy generator and the discriminator against the expert data set with an adversarial training method until the discriminator converges to a saddle point; and testing the trained network model composed of the policy generator and the discriminator, taking the real environment state as the input of the policy generator so as to obtain the action output. With this method, a value return function is learned through the discriminator's judgment; a large number of computationally expensive intermediate steps of inverse reinforcement learning are bypassed; and the learning process is simpler and more efficient.
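The alternating generator/discriminator training described in the abstract follows the generative-adversarial imitation learning pattern. Below is a minimal PyTorch sketch of one alternating update for the CartPole example; the network sizes, optimizers, one-hot action encoding, surrogate reward, and the simple policy-gradient step are assumptions made for illustration rather than the patent's exact parameter-updating strategy.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 4, 2   # CartPole: |S| = 4, |A| = 2 (assumed sizes)

# Policy generator: maps a state to a distribution over actions.
policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, ACTION_DIM), nn.Softmax(dim=-1))
# Discriminator: scores (state, action) pairs as expert-like vs. generated.
disc = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
                     nn.Linear(64, 1), nn.Sigmoid())

policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCELoss()

def one_hot(actions):
    # Encode integer actions as one-hot vectors for the discriminator input.
    return torch.eye(ACTION_DIM)[actions]

def train_step(expert_s, expert_a, gen_s, gen_a, gen_returns):
    """One alternating update: discriminator first, then the generator.

    expert_s / expert_a -- batch sampled from the expert data set
    gen_s / gen_a       -- batch sampled from the current policy generator
    gen_returns         -- discounted returns of the surrogate reward
                           derived from the discriminator along the
                           generator's sampled trajectories
    """
    # 1) Discriminator update: push expert pairs toward 1, generated toward 0.
    d_expert = disc(torch.cat([expert_s, one_hot(expert_a)], dim=1))
    d_gen = disc(torch.cat([gen_s, one_hot(gen_a)], dim=1))
    d_loss = (bce(d_expert, torch.ones_like(d_expert))
              + bce(d_gen, torch.zeros_like(d_gen)))
    disc_opt.zero_grad(); d_loss.backward(); disc_opt.step()

    # 2) Generator update: a plain policy-gradient step that uses the
    #    discriminator-derived returns as its learning signal.
    log_probs = torch.log(policy(gen_s).gather(1, gen_a.view(-1, 1)) + 1e-8)
    g_loss = -(log_probs.squeeze(1) * gen_returns).mean()
    policy_opt.zero_grad(); g_loss.backward(); policy_opt.step()
    return d_loss.item(), g_loss.item()
```

Training alternates these two updates, with fresh generator rollouts each round, until the discriminator can no longer reliably separate expert pairs from generated ones (the saddle point mentioned in the abstract). At test time, the real environment state is fed to the trained policy generator to obtain the action output.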

Description

Technical Field

[0001] The invention belongs to the technical field of imitation learning and artificial intelligence, and in particular relates to a robot imitation learning method based on virtual scene training.

Background

[0002] In traditional reinforcement learning tasks, the optimal policy is usually learned by maximizing cumulative rewards. This approach is simple and direct, and performs well when abundant training data is available. However, in multi-step (sequential) decision tasks the learner cannot be rewarded frequently, and learning from cumulative rewards faces a huge search space. At the same time, reinforcement learning requires a suitable reward function to solve for the optimal action policy, but in many cases it is not easy to design a sufficiently comprehensive and well-behaved reward function; in particular, in some complex application scenarios, such as collisions in automatic driving, it is difficult to have a reasonable ...


Application Information

IPC(8): G06F30/20; G06N3/00; G06N3/04; G06N3/08
CPC: G06N3/008; G06N3/08; G06N3/045
Inventors: 杜广龙 (Du Guanglong), 周万义 (Zhou Wanyi)
Owner SOUTH CHINA UNIV OF TECH