Method for selecting reward function in adversarial imitation learning

A reward function selection technology, applied in the field of adversarial imitation learning, that addresses the problem that a single reward function form cannot achieve optimal performance across tasks.

Active Publication Date: 2020-07-10

SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

2 Cites, 24 Cited by

Problems solved by technology

[0005] To solve the prior-art problem that the reward function form used in imitation learning algorithms based on generative adversarial networks is not optimal for all tasks, and therefore cannot achieve optimal performance across multiple tasks, the present invention provides a method for selecting a reward function in adversarial imitation learning.
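To make "reward function form" concrete, the following minimal sketch computes three forms that are commonly used in the adversarial imitation learning literature. The excerpt above does not say which forms the invention uses, so these three, the function names, and the convention that a discriminator output D(s, a) near 1 means "expert-like" are all assumptions made for illustration.

```python
import math

# Assumption: d = D(s, a) in (0, 1), where values near 1 mean the
# discriminator judges the state-action pair to be expert-like.

def reward_log_d(d: float) -> float:
    """r = log D(s, a); never positive on (0, 1]."""
    return math.log(d)

def reward_neg_log_one_minus_d(d: float) -> float:
    """r = -log(1 - D(s, a)); never negative on [0, 1)."""
    return -math.log(1.0 - d)

def reward_logit(d: float) -> float:
    """r = log D(s, a) - log(1 - D(s, a)); sign depends on d."""
    return math.log(d) - math.log(1.0 - d)

# Hypothetical registry of candidate forms (names are illustrative).
REWARD_FORMS = {
    "log_d": reward_log_d,
    "neg_log_one_minus_d": reward_neg_log_one_minus_d,
    "logit": reward_logit,
}

for d in (0.1, 0.5, 0.9):
    values = {name: round(fn(d), 3) for name, fn in REWARD_FORMS.items()}
    print(d, values)
```

The forms differ in sign for the same discriminator output: one is never positive, one is never negative, and one changes sign at d = 0.5. In episodic tasks an always-positive reward tends to favor longer episodes and an always-negative one tends to favor shorter episodes, which is one reason a single fixed form may not suit every task.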

Embodiment Construction

[0047] In order to make the technical problems, technical solutions and beneficial effects to be solved by the embodiments of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0048] It should be noted that when an element is referred to as being "fixed" or "disposed on" another element, it may be directly or indirectly on the other element. When an element is referred to as being "connected to" another element, it may be directly or indirectly connected to the other element. In addition, a connection may serve both a fixing function and a circuit communication function.

[0049] It is to be understood that the terms "length", "width", "top", "bottom", "front"...

Abstract

The invention provides a method for selecting a reward function in adversarial imitation learning. The method comprises the following steps: constructing a policy network with parameters theta, a discriminator network with parameters w, and at least two reward functions; obtaining demonstration data under an expert policy and storing it in an expert data buffer containing expert trajectories, wherein the input of the control policy network is the state returned by the simulation environment and its output is a decision action; updating the parameters of the discriminator network using state-action pairs under the expert policy and state-action pairs of the policy network; in the reward calculation stage, taking a state-action pair of the policy network as the input of the discriminator network and taking as the output the reward value computed by the reward function; selecting the reward function for the current task according to the values of the performance indexes of the different reward functions; and storing the parameters of the policy network corresponding to the selected reward function. The agent learns under the guidance of different reward functions, and the optimal reward function for a specific task scenario is then selected according to the performance evaluation index.
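The final selection step described in the abstract can be sketched as follows. This is not the patent's implementation: the function name `select_by_metric`, the representation of each training run as a zero-argument callable, the assumption that a larger performance index is better, and the placeholder runs (whose numbers are made up purely so the example executes) are all hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# Each candidate is a zero-argument callable that (in a real system) would
# run adversarial imitation learning with one reward function and return
# (policy_parameters, performance_index). Hypothetical interface.
TrainingRun = Callable[[], Tuple[Any, float]]

def select_by_metric(runs: Dict[str, TrainingRun]) -> Tuple[str, Any, Dict[str, float]]:
    """Train once per reward function and keep the best-scoring policy.

    Assumes a larger performance index is better.
    """
    if not runs:
        raise ValueError("at least one reward function is required")
    params_by_name: Dict[str, Any] = {}
    metric_by_name: Dict[str, float] = {}
    for name, run in runs.items():
        params, metric = run()
        params_by_name[name] = params
        metric_by_name[name] = metric
    best = max(metric_by_name, key=lambda n: metric_by_name[n])
    # Only the parameters for the selected reward function are returned,
    # mirroring the "store the corresponding policy parameters" step.
    return best, params_by_name[best], metric_by_name

# Placeholder runs: the parameters and scores below are invented values
# used only to exercise the selection logic, not experimental results.
placeholder_runs = {
    "reward_a": lambda: ({"theta": [0.1, 0.2]}, 120.0),
    "reward_b": lambda: ({"theta": [0.3, 0.4]}, 95.5),
}

best_name, best_params, metrics = select_by_metric(placeholder_runs)
print(best_name, best_params, metrics)
```

Keeping the selection logic separate from training means the same function works for any number of candidate reward functions and any scalar performance index.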

Description

Technical field

[0001] The invention relates to the technical field of reward function selection, in particular to a method for selecting a reward function in adversarial imitation learning.

Background

[0002] In recent years, as deep learning has made major breakthroughs in image recognition, speech technology, and natural language processing, deep reinforcement learning, which combines deep neural networks with reinforcement learning, has surpassed human performance in large-scale policy optimization problems such as Go and StarCraft. One of the bottlenecks of reinforcement learning is that, for practical control problems such as autonomous driving and robotics, designing a reasonable reward function based on expert experience takes considerable time and effort. Data-driven imitation learning provides a way to solve this problem. It does not require manually designed reward functions, and can learn policies comparable to exp...

Application Information

IPC(8): G06N3/08

CPC: G06N3/08

Inventor: 李秀, 王亚伟, 张明

Owner: SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV
