Method for selecting reward function in adversarial imitation learning

A reward function selection technique, applied in the field of adversarial imitation learning, that addresses the problem that a single fixed form of reward function cannot be optimal for every task.

Active Publication Date: 2020-07-10
SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0005] To solve the problem that the form of reward function used in imitation learning algorithms based on generative adversarial networks is not optimal in all tasks in the...



Embodiment Construction

[0047] To make the technical problems to be solved, the technical solutions, and the beneficial effects of the embodiments of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention, not to limit it.

[0048] It should be noted that when an element is referred to as being "fixed" or "disposed on" another element, it may be directly or indirectly on the other element. When an element is referred to as being "connected to" another element, it may be directly or indirectly connected to the other element. In addition, a connection may serve both a fixing function and a circuit communication function.

[0049] It is to be understood that the terms "length", "width", "top", "bottom", "front"...


Abstract

The invention provides a method for selecting a reward function in adversarial imitation learning. The method comprises the following steps: constructing a policy network with parameters θ, a discriminator network with parameters w, and at least two reward functions; obtaining demonstration data under an expert policy and storing it in an expert data buffer containing expert trajectories; controlling the policy network, whose input is the state returned by the simulation environment and whose output is a decision action; updating the parameters of the discriminator network using state-action pairs under the expert policy and state-action pairs of the policy network; in the reward calculation stage, feeding state-action pairs of the policy network into the discriminator network and taking as output the reward value computed by the reward function; selecting the reward function for the current task according to the values of the performance indexes of the different reward functions; and storing the parameters of the policy network corresponding to the selected reward function. The agent learns under the guidance of the different reward functions, and the optimal reward function for a specific task scenario is then selected according to the performance evaluation index.
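The selection step described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the abstract does not say which reward forms are used, so the three forms below are ones commonly seen in the adversarial imitation learning literature, written under the assumed convention that a discriminator output d close to 1 means "looks like the expert". The performance values and parameter placeholders in the usage example are made up.

```python
import math

EPS = 1e-8  # small constant to keep the logarithms finite

# Hypothetical candidate reward forms computed from the discriminator
# output d = D_w(s, a), with 0 < d < 1. The source does not list its forms.
def reward_log_d(d):
    return math.log(d + EPS)

def reward_neg_log_one_minus_d(d):
    return -math.log(1.0 - d + EPS)

def reward_log_ratio(d):
    return math.log(d + EPS) - math.log(1.0 - d + EPS)

CANDIDATE_REWARDS = {
    "log_d": reward_log_d,
    "neg_log_one_minus_d": reward_neg_log_one_minus_d,
    "log_ratio": reward_log_ratio,
}

def select_reward_function(results):
    """Pick the reward function with the largest performance index.

    `results` maps a reward-function name to a pair
    (performance_index, policy_parameters), one entry per policy trained
    under that reward function. Returns the winning name together with
    the policy parameters that should be stored.
    """
    best_name = max(results, key=lambda name: results[name][0])
    return best_name, results[best_name][1]

# Usage with made-up performance indexes (e.g. average evaluation return)
# and placeholder strings standing in for trained policy parameters.
example_results = {
    "log_d": (310.0, "theta_log_d"),
    "neg_log_one_minus_d": (295.5, "theta_neg_log_one_minus_d"),
    "log_ratio": (342.1, "theta_log_ratio"),
}
best_name, best_params = select_reward_function(example_results)
print(best_name, best_params)
```

In a full system each entry of `example_results` would come from training one copy of the policy network under the corresponding reward function and then evaluating it on the task; the sketch covers only the comparison and the choice of which parameters to keep.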

Description

Technical field

[0001] The invention relates to the technical field of reward function selection, in particular to a method for selecting a reward function in adversarial imitation learning.

Background art

[0002] In recent years, as deep learning has made major breakthroughs in image recognition, speech technology, and natural language processing, deep reinforcement learning, which combines deep neural networks with reinforcement learning, has also surpassed human performance in large-scale policy optimization problems such as Go and StarCraft. One bottleneck of reinforcement learning is that, for practical control problems such as autonomous driving and robotics, designing a reasonable reward function from expert experience takes considerable time and effort. Data-driven imitation learning offers a way around this problem. It does not require manually designed reward functions, and can learn strategies comparable to exp...


Application Information

IPC(8): G06N3/08
CPC: G06N3/08
Inventors: 李秀, 王亚伟, 张明
Owner: SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV