Generative adversarial interactive imitation learning method and system, storage medium and application

A learning method and reinforcement learning technology applied in the field of generative adversarial interactive imitation learning, addressing the problems of algorithm performance degradation, long running time, and the ill-posed recovery of reward function weights.

Pending Publication Date: 2021-09-10

OCEAN UNIV OF CHINA



Problems solved by technology

Recovering the weights of the reward function using a linear estimate is an ill-posed problem: different reward functions can explain the same behavior.
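The ambiguity can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: a hypothetical 3-state chain, one-hot state features, a reward that is linear in those features, and plain value iteration as the planner. It compares the greedy policy obtained from three different weight vectors.

```python
import numpy as np

def greedy_policy(weights, features, transitions, gamma=0.9, iters=200):
    """Value iteration for a reward that is linear in state features.

    features:    (S, K) matrix, reward r(s) = features @ weights
    transitions: (A, S, S) array, transitions[a, s, s2] = P(s2 | s, a)
    Returns the greedy action index for every state.
    """
    reward = features @ weights
    values = np.zeros(features.shape[0])
    for _ in range(iters):
        # q[a, s] = r(s) + gamma * sum_s2 P(s2 | s, a) * V(s2)
        q = reward[None, :] + gamma * (transitions @ values)
        values = q.max(axis=0)
    return q.argmax(axis=0)

# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
FEATURES = np.eye(3)
TRANSITIONS = np.zeros((2, 3, 3))
for s in range(3):
    TRANSITIONS[0, s, max(s - 1, 0)] = 1.0
    TRANSITIONS[1, s, min(s + 1, 2)] = 1.0

w = np.array([0.0, 0.0, 1.0])  # reward only in the right-most state
policy_base = greedy_policy(w, FEATURES, TRANSITIONS)
policy_scaled = greedy_policy(5.0 * w, FEATURES, TRANSITIONS)  # rescaled weights
policy_shifted = greedy_policy(w + 3.0, FEATURES, TRANSITIONS)  # constant added to every state reward
print(policy_base, policy_scaled, policy_shifted)
```

In this toy setting all three weight vectors lead to the same greedy policy (always move right), so observing the behavior alone cannot tell them apart. An all-zero weight vector is a further degenerate case, since every policy is optimal under a zero reward.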

[0007] (1) Most existing inverse reinforcement learning algorithms need to use model information in an inner loop, which consumes a lot of running time and limits their application to large, complex tasks. If the planning problem is not solved optimally, the performance of the algorithm degrades sharply.

Embodiment Construction

[0067] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the examples. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0068] To address the problems in the prior art, the present invention provides a generative adversarial interactive imitation learning method, system, storage medium and application. The present invention is described in detail below with reference to the accompanying drawings.

[0069] As shown in Figure 1, the generative adversarial interactive imitation learning method provided by the present invention includes the following steps:

[0070] S101: A GAIL-like stage based on maximum entropy inverse reinforcement learning, used to learn reward functions from expert demonstrations and train human eval...


Abstract

The invention belongs to the technical field of artificial intelligence and discloses a generative adversarial interactive imitation learning method and system, a storage medium and an application. The method combines generative adversarial imitation learning (GAIL) with an interactive learning framework to form generative adversarial interactive imitation learning (GA2IL). GA2IL consists of two stages: (1) a GAIL-like stage based on maximum entropy inverse reinforcement learning; and (2) an interactive reinforcement learning stage. Whether the expert demonstration is optimal or suboptimal, the GA2IL agent can always surpass the performance of the expert demonstration and obtain an optimal or near-optimal policy. The method also improves the stability of the policy and can be extended to large-scale complex tasks.
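As a rough illustration of what a GAIL-like stage (1) involves, the sketch below fits a discriminator to separate expert state-action pairs from policy-generated ones and derives a reward from its output. It is a minimal sketch under stated assumptions, not the patented method: the logistic-regression discriminator, the synthetic 2-D data, and the reward form r(s, a) = -log(1 - D(s, a)) (a convention common in GAIL implementations) are all assumptions introduced here, and every function name is hypothetical. The interactive stage (2) is not covered.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_discriminator(expert_sa, policy_sa, lr=0.5, steps=500):
    """Fit D(s, a) = sigmoid(w . x + b) by gradient descent on the logistic loss.

    Expert pairs are labelled 1 and policy pairs 0 (an assumed convention).
    """
    x = np.vstack([expert_sa, policy_sa])
    y = np.concatenate([np.ones(len(expert_sa)), np.zeros(len(policy_sa))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        grad = sigmoid(x @ w + b) - y  # gradient of the loss w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def learned_reward(sa, w, b, eps=1e-8):
    """Assumed reward form: larger when a pair looks expert-like to D."""
    return -np.log(1.0 - sigmoid(sa @ w + b) + eps)

# Synthetic stand-ins for concatenated (state, action) vectors.
rng = np.random.default_rng(0)
expert_sa = rng.normal(loc=1.0, scale=0.5, size=(200, 2))
policy_sa = rng.normal(loc=-1.0, scale=0.5, size=(200, 2))

w, b = train_discriminator(expert_sa, policy_sa)
reward_expert = learned_reward(expert_sa, w, b)
reward_policy = learned_reward(policy_sa, w, b)
print(reward_expert.mean(), reward_policy.mean())
```

In a full GAIL-style loop the policy would be updated to maximize this learned reward and the discriminator retrained on fresh policy samples; here only a single discriminator fit is shown to keep the example short.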

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence, and in particular relates to a generative adversarial interactive imitation learning method, system, storage medium and application.

Background

[0002] Reinforcement learning (RL) attempts to let robots learn an optimal policy through trial and error by interacting with the real world. In recent years, with the development of deep neural networks, deep reinforcement learning (DRL), which combines deep learning and reinforcement learning, has achieved great success in many simulation tasks such as video games, sports behavior, and mechanical operations. However, because the reward signal is sparse, deep reinforcement learning, like reinforcement learning, has low sampling efficiency and slow convergence. In addition, it is very difficult to design an effective reward function for each different task. This series of problems makes it a great challeng...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/045

Inventor: 李光亮, 黄杰, 隽荣顺, 沙启鑫, 何波

Owner: OCEAN UNIV OF CHINA
