
Generative adversarial interactive imitation learning method and system, storage medium and application

Involves learning methods and reinforcement learning technology. It is applied in the field of generative adversarial interactive imitation learning and addresses three problems: algorithm performance degradation, long running times, and the ill-posed recovery of reward function weights.

Pending Publication Date: 2021-09-10
OCEAN UNIV OF CHINA
Cites: 0 · Cited by: 3

AI Technical Summary

Problems solved by technology

However, recovering the weights of the reward function from a linear estimate is an ill-posed problem, because different reward functions can explain the same behavior.
[0007] (1) Most existing inverse reinforcement learning algorithms must use model information in an inner loop. This takes a long time to run and limits their application to large, complex tasks. Moreover, if the inner planning problem is not solved optimally, the algorithm's performance degrades significantly.
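The ill-posedness can be made concrete with a small example. The sketch below uses a hypothetical 4-state chain MDP, which is not from the patent. It has a linear reward r(s, a) = w · φ(s, a), where φ is a one-hot encoding of the next state. Three different weight vectors all yield the same greedy policy under value iteration: the original vector, a positive rescaling of it, and a constant shift of it. A demonstrator following that policy therefore cannot distinguish them. The environment, the features and the `greedy_policy` helper are illustrative assumptions.

```python
import numpy as np

N_STATES, GOAL = 4, 3
# Hypothetical deterministic chain: action 0 moves left, action 1 moves right (clipped at the ends).
transitions = np.array([[max(s - 1, 0), min(s + 1, GOAL)] for s in range(N_STATES)])
# Hypothetical feature phi(s, a): one-hot encoding of the state that action a leads to.
phi = np.eye(N_STATES)[transitions]  # shape (4, 2, 4)


def greedy_policy(w, gamma=0.9, iters=200):
    """Value iteration for the linear reward r(s, a) = w . phi(s, a);
    returns the greedy action in every state."""
    rewards = phi @ w  # shape (4, 2)
    v = np.zeros(N_STATES)
    for _ in range(iters):
        v = (rewards + gamma * v[transitions]).max(axis=1)
    return (rewards + gamma * v[transitions]).argmax(axis=1)


w_goal = np.array([0.0, 0.0, 0.0, 1.0])  # reward only for arriving at the goal
candidates = {
    "goal only": w_goal,
    "scaled x5": 5.0 * w_goal,   # positive rescaling
    "shifted +1": w_goal + 1.0,  # constant added to every feature weight
}
for name, w in candidates.items():
    print(name, greedy_policy(w))
```

All three weight vectors print the same policy, `[1 1 1 1]` (always move right). A zero weight vector is an even more degenerate explanation, because it makes every policy optimal.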

Method used

GA2IL combines generative adversarial imitation learning with an interactive learning framework. It works in two stages:
(1) a GAIL-like stage based on maximum entropy inverse reinforcement learning, which learns a reward function from expert demonstrations;
(2) an interactive reinforcement learning stage.



Embodiment Construction

[0067] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the examples. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0068] To address the problems in the prior art, the present invention provides a generative adversarial interactive imitation learning method, system, storage medium and application. The present invention is described in detail below with reference to the accompanying drawings.

[0069] As shown in Figure 1, the generative adversarial interactive imitation learning method provided by the present invention includes the following steps:

[0070] S101: A GAIL-like stage based on maximum entropy inverse reinforcement learning, used to learn reward functions from expert demonstrations and train human eval...
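The GAIL-like stage can be sketched with a logistic-regression discriminator over hypothetical state-action feature vectors. The discriminator D is trained to output 1 on expert pairs and 0 on pairs from the current policy. The learned reward is taken as log D − log(1 − D), one common choice that links adversarial imitation to maximum entropy inverse reinforcement learning. This excerpt does not give the patent's actual network architecture, loss or reward form. The synthetic Gaussian features, `train_discriminator` and `learned_reward` are stand-ins for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_discriminator(expert_feats, policy_feats, lr=0.1, steps=500):
    """Logistic-regression discriminator D(s, a) = sigmoid(theta . phi + b),
    trained by gradient descent on binary cross-entropy so that D -> 1 on
    expert pairs and D -> 0 on current-policy pairs."""
    x = np.vstack([expert_feats, policy_feats])
    y = np.concatenate([np.ones(len(expert_feats)), np.zeros(len(policy_feats))])
    theta = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(x @ theta + b)
        grad = p - y  # derivative of the cross-entropy with respect to the logit
        theta -= lr * x.T @ grad / len(y)
        b -= lr * grad.mean()
    return theta, b


def learned_reward(feats, theta, b):
    """r(s, a) = log D - log(1 - D); for a logistic D this is exactly the logit."""
    return feats @ theta + b


rng = np.random.default_rng(0)
expert = rng.normal(loc=1.0, scale=0.5, size=(200, 3))   # stand-in expert (s, a) features
policy = rng.normal(loc=-1.0, scale=0.5, size=(200, 3))  # stand-in current-policy features
theta, b = train_discriminator(expert, policy)
print(learned_reward(expert, theta, b).mean(), learned_reward(policy, theta, b).mean())
```

In a full adversarial loop, discriminator updates like these would alternate with policy-optimization steps on the learned reward. The reward then keeps moving as the policy's samples become harder to distinguish from the expert's.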


Abstract

The invention belongs to the technical field of artificial intelligence. It discloses a generative adversarial interactive imitation learning method and system, a storage medium, and an application. The method combines generative adversarial imitation learning with an interactive learning framework to form generative adversarial interactive imitation learning (GA2IL). GA2IL consists of two stages: (1) a GAIL-like stage based on maximum entropy inverse reinforcement learning; and (2) an interactive reinforcement learning stage. The expert demonstrations given to the GA2IL agent may be optimal or suboptimal. In either case, the agent can surpass the performance of the demonstrations and obtain an optimal or near-optimal policy. The method also improves policy stability and can scale to large, complex tasks.
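A toy sketch can illustrate how a second, interactive stage might correct a reward learned from suboptimal demonstrations. The excerpt does not specify how this stage works, so the sketch makes several assumptions:

- The stage shapes the stage-1 learned reward with evaluative feedback from a trainer.
- The trainer is simulated and hypothetical. It returns +1 for an action that moves toward the goal and −1 otherwise.
- Learning uses plain tabular Q-learning on a hypothetical 5-state chain.

The stage-1 reward is deliberately biased, as if learned from a suboptimal demonstration. Under that reward alone, the optimal behavior is to loop on the over-rewarded "left" action in state 0. Adding the feedback term with weight `beta` makes moving right optimal in every state. All names and numbers are illustrative.

```python
import numpy as np

N_STATES, GOAL = 5, 4  # hypothetical chain 0..4; an episode ends on reaching state 4
LEFT, RIGHT = 0, 1


def step(s, a):
    """Deterministic transition: left is clipped at state 0, right moves toward the goal."""
    return max(s - 1, 0) if a == LEFT else s + 1


def stage1_reward(s, a):
    """Hypothetical stage-1 reward, biased as if learned from a suboptimal
    demonstrator: it over-rewards taking "left" in state 0."""
    if s == 0 and a == LEFT:
        return 0.3
    return 0.1 if a == RIGHT else 0.0


def human_feedback(s, a):
    """Hypothetical simulated trainer standing in for human evaluative feedback."""
    return 1.0 if a == RIGHT else -1.0


def interactive_q_learning(beta, episodes=300, alpha=0.5, gamma=0.9,
                           epsilon=0.2, max_steps=20, seed=0):
    """Epsilon-greedy tabular Q-learning on r = stage1_reward + beta * human_feedback.
    Returns the greedy action in each non-terminal state."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, 2))
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = int(rng.integers(2))  # explore
            else:
                a = int(q[s].argmax())    # exploit
            s_next = step(s, a)
            r = stage1_reward(s, a) + beta * human_feedback(s, a)
            target = r if s_next == GOAL else r + gamma * q[s_next].max()
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
            if s == GOAL:
                break
    return q[:GOAL].argmax(axis=1)


print("stage-1 reward only:", interactive_q_learning(beta=0.0))
print("with trainer feedback:", interactive_q_learning(beta=1.0))
```

Weighting the feedback by a single constant `beta` is the simplest way to combine the two signals. Other shaping schemes would serve the same illustrative purpose.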

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence. In particular, it relates to a generative adversarial interactive imitation learning method, system, storage medium and application.

Background

[0002] Reinforcement learning (RL) attempts to let robots learn an optimal policy through trial-and-error interaction with the real world. In recent years, deep neural networks have advanced, and deep reinforcement learning (DRL), which combines deep learning with reinforcement learning, has achieved great success in many simulated tasks such as video games, motion behaviors and robotic manipulation. However, reward signals are sparse, so deep reinforcement learning, like reinforcement learning in general, suffers from low sample efficiency and slow convergence. In addition, it is very difficult to design an effective reward function for each different task. This series of problems makes it a great challeng...


Application Information

Patent type & authority: Application (China)
IPC(8): G06N3/04; G06N3/08
CPC: G06N3/08; G06N3/045
Inventors: 李光亮, 黄杰, 隽荣顺, 沙启鑫, 何波
Owner: OCEAN UNIV OF CHINA