
Robot imitation learning method based on virtual scene training

A robot imitation learning method based on virtual scene training, in the field of imitation learning and artificial intelligence. It targets multi-step tasks in which the learner cannot be rewarded frequently and in which conventional algorithms require heavy computation, thereby reducing training cost and risk.

Pending Publication Date: 2020-04-10
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to solve complex skill-learning tasks that require multi-step decision-making. In such tasks the learner cannot be rewarded frequently, while common reinforcement learning algorithms need a suitable reward function to solve for the optimal action strategy; in many cases it is extremely complicated, and often unrealistic, to design a sufficiently comprehensive and excellent reward function.
Therefore, the present invention uses imitation learning and inverse reinforcement learning to solve such problems. To avoid the large amount of calculation, difficult network design, and complicated intermediate computation of common behavior-cloning and inverse reinforcement learning algorithms, the present invention introduces the idea of GAN to directly learn the distribution of the reward function, bypassing many intermediate steps of inverse reinforcement learning, especially the repeated reinforcement learning calculations, and thus reducing the amount of calculation.
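The core of the GAN idea above is that a discriminator trained to separate expert from generated (state, action) pairs can itself supply the reward signal, so no hand-designed reward function is needed. A minimal sketch under assumed dimensions (the network shape and the `-log(1 - D)` surrogate reward are common GAIL conventions, not specified in the patent text):

```python
import numpy as np

rng = np.random.default_rng(0)

class Discriminator:
    """Two-layer MLP that scores (state, action) pairs.

    High scores should mean "looks like expert data"; its sigmoid
    output doubles as a learned reward, replacing a hand-designed
    reward function. Sizes and ranges here are illustrative.
    """
    def __init__(self, state_dim, action_dim, hidden=32):
        d_in = state_dim + action_dim
        self.W1 = rng.normal(0.0, 0.1, (d_in, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def logits(self, states, actions):
        x = np.concatenate([states, actions], axis=-1)
        h = np.tanh(x @ self.W1 + self.b1)
        return h @ self.W2 + self.b2

    def reward(self, states, actions):
        # Common GAIL surrogate reward: -log(1 - D(s, a)).
        # Always positive, and larger when D rates the pair as expert-like.
        d = 1.0 / (1.0 + np.exp(-self.logits(states, actions)))
        return -np.log(1.0 - d + 1e-8)
```

Because the reward comes out of the discriminator directly, the repeated inner reinforcement-learning loops of classic inverse RL are avoided, which is the computational saving the paragraph above describes.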

Method used



Examples


Embodiment

[0064] For ease of understanding, this embodiment takes the CartPole balancing task as an example.

[0065] A robot imitation learning method based on virtual scene training, as shown in Figure 1, includes the following steps:

[0066] S1. Design a robot model and a virtual interactive environment according to the specific task; this includes the following steps:

[0067] S1.1. Design the robot model and virtual environment according to the specific task, using the Unity3D engine to build the simulation environment. The simulation environment should be as close to the real environment as possible, including the cart, the pole on the cart, and the slide rail it moves on. Its purpose is to provide a visual graphical interface that helps train the model faster and eases later transfer, reduces the dangers of training directly in the real environment, and lowers training costs;
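The patent builds its scene in Unity3D; as a stand-in for illustration, the cart-pole dynamics that such a scene simulates can be sketched with the classic equations of motion (parameter values here are the textbook defaults, not taken from the patent):

```python
import math

def cartpole_step(state, force, dt=0.02):
    """One Euler step of classic cart-pole dynamics.

    A lightweight stand-in for the Unity3D scene (cart, pole, slide
    rail). state = (x, x_dot, theta, theta_dot); `force` pushes the
    cart along the rail. Constants are the common textbook values.
    """
    g, m_cart, m_pole, length = 9.8, 1.0, 0.1, 0.5  # half pole length
    x, x_dot, theta, theta_dot = state
    total_mass = m_cart + m_pole
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + m_pole * length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (g * sin_t - cos_t * temp) / (
        length * (4.0 / 3.0 - m_pole * cos_t ** 2 / total_mass))
    x_acc = temp - m_pole * length * theta_acc * cos_t / total_mass
    return (x + dt * x_dot,
            x_dot + dt * x_acc,
            theta + dt * theta_dot,
            theta_dot + dt * theta_acc)
```

A visual engine such as Unity3D wraps dynamics like these in a rendered scene, which is what makes the training observable and transferable as the step above describes.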

[0068] S1.2. Combining domain randomization method, randomize the environmenta...
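Domain randomization, named in step S1.2, resamples the physical parameters of the simulated scene each episode so that a policy trained in simulation transfers better to the real robot. A minimal sketch; the parameter names and ranges are illustrative assumptions, not values from the patent:

```python
import random

def randomize_cartpole_params(rng=random):
    """Domain randomization sketch: draw fresh physical parameters
    for the simulated cart-pole scene at the start of each episode.
    Ranges are illustrative, not taken from the patent text."""
    return {
        "pole_length":    rng.uniform(0.4, 0.6),    # metres
        "pole_mass":      rng.uniform(0.05, 0.2),   # kg
        "cart_mass":      rng.uniform(0.8, 1.2),    # kg
        "track_friction": rng.uniform(0.0, 0.05),   # coefficient
    }
```

Training across many such randomized variants forces the policy to be robust to exactly the modelling errors that separate the simulator from the real environment.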



Abstract

The invention discloses a robot imitation learning method based on virtual scene training. The method comprises the following steps: designing a robot model and a virtual interaction environment according to a specific task; collecting and arranging an expert data set; determining a state value space S and an action value space A according to the specific task, and determining the network structures of a strategy generator and a discriminator from S and A; sampling data from the strategy generator, designing a parameter-update strategy, and alternately training the strategy generator and the discriminator against the expert data set with an adversarial training method until the discriminator converges to a saddle point; and testing the trained network model composed of the strategy generator and the discriminator, taking the real environment state as the input of the strategy generator so as to obtain the action output. The method learns the value return function through the discriminator, bypasses the many computationally expensive intermediate steps of inverse reinforcement learning, and makes the learning process simpler and more efficient.
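The alternating training described in the abstract can be sketched as a loop over interchanged generator and discriminator updates. The four interface methods below (`sample_trajectories`, `update`, `reward`, `converged`) are assumed names for illustration, not part of the patent text:

```python
def train_gail(generator, discriminator, expert_data, iters=100):
    """Adversarial training loop sketched from the abstract.

    Alternates: (1) sample trajectories from the strategy generator,
    (2) update the discriminator against the expert data set,
    (3) update the generator using the discriminator's output as the
    reward signal, until the discriminator nears its saddle point.
    The object interfaces are illustrative assumptions.
    """
    for _ in range(iters):
        fake = generator.sample_trajectories()
        discriminator.update(expert_data, fake)   # D: expert vs. generated
        rewards = discriminator.reward(fake)      # learned value return
        generator.update(fake, rewards)           # G: policy-gradient step
        if discriminator.converged():             # saddle-point criterion
            break
    return generator, discriminator
```

At the saddle point the discriminator can no longer tell generated trajectories from expert ones, which is the stopping condition the abstract states.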

Description

technical field

[0001] The invention belongs to the technical field of imitation learning and artificial intelligence, and in particular relates to a robot imitation learning method based on virtual scene training.

Background technique

[0002] In traditional reinforcement learning tasks, the optimal policy is usually learned by calculating cumulative rewards. This method is simple and direct, and performs well when ample training data is available. However, in multi-step (sequential) decision tasks, the learner cannot be rewarded frequently, and the search space for cumulative-reward-based learning is huge. At the same time, reinforcement learning requires a suitable reward function to solve for the optimal action strategy, but in many cases it is not easy to design a sufficiently comprehensive and excellent reward function, especially in complex application scenarios such as automatic driving, where for collisions it is difficult to have a reasona...
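The cumulative-reward learning referred to above can be written as maximizing the expected discounted return (standard reinforcement-learning notation, not taken from the patent text):

```latex
J(\pi) \;=\; \mathbb{E}_{\tau \sim \pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],
\qquad
\pi^{*} \;=\; \arg\max_{\pi}\, J(\pi),
```

where $\pi$ is the policy, $\gamma \in [0,1)$ the discount factor, and $r(s_t, a_t)$ the reward. The background's objection is that when $r$ is sparse or hard to specify, this objective is impractical, which motivates learning $r$ from expert demonstrations instead.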

Claims


Application Information

IPC(8): G06F30/20; G06N3/00; G06N3/04; G06N3/08
CPC: G06N3/008; G06N3/08; G06N3/045
Inventor: 杜广龙, 周万义
Owner: SOUTH CHINA UNIV OF TECH