A method of interactive reinforcement learning from demonstration and human assessment feedback

A reinforcement learning technique with human feedback, applied in the field of artificial intelligence

Pending Publication Date: 2019-07-30
OCEAN UNIV OF CHINA

AI Technical Summary

Problems solved by technology

The performance an agent learns from demonstrations is usually limited by the performance of the trainer, whereas the performance an agent learns from human rewards generally exceeds the performance of the trainer on the task.

Embodiment Construction

[0011] The present invention will be described in detail below in combination with specific embodiments.

[0012] The present invention, IRL-TAMER, combines Inverse Reinforcement Learning (IRL), a typical method for agent learning from demonstration, with the TAMER framework, a typical method for agent learning from human rewards. We hypothesized that agents learning via IRL-TAMER require less feedback, especially negative feedback, than agents learning from human rewards alone. We tested the algorithm on the Grid World task domain and compared it with learning from human rewards via the TAMER framework alone. Agent learning with different discount factors was also compared. The results show that although an agent learning through IRL from a demonstration cannot obtain an effective control policy, it can still learn a useful value function from the demonstration, one that indicates which states are comparatively good. More importantly, learning from demonstrations reduces the amount of feedback, especially negative feedback, ...
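
The idea in the paragraph above can be illustrated with a small Grid World sketch. This is not the patented algorithm: the source does not give its update rules, so everything here is an assumption made for illustration. In particular, `value_from_demo` is a hand-made stand-in for the IRL step (it simply ranks demonstrated states by how late they appear), `simulated_human` is a hypothetical trainer that rewards moves toward the goal, and the tabular model `H`, the 5x5 grid, and the learning rate are arbitrary choices. The sketch only shows how a value function derived from a demonstration might seed a TAMER-style learner so that it needs less feedback.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]

# Assumed 5x5 grid with the goal in the bottom-right corner.
SIZE = 5
START: State = (0, 0)
GOAL: State = (4, 4)
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}


def step(state: State, action: str) -> State:
    """Move one cell; moves that would leave the grid keep the agent in place."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), SIZE - 1)
    y = min(max(state[1] + dy, 0), SIZE - 1)
    return (x, y)


def distance(state: State) -> int:
    """Manhattan distance to the goal."""
    return abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])


def value_from_demo(demo: List[State]) -> Dict[State, float]:
    """Stand-in for IRL (assumption): later demonstrated states get higher value."""
    n = len(demo)
    return {s: (i + 1) / n for i, s in enumerate(demo)}


def simulated_human(state: State, action: str) -> float:
    """Hypothetical trainer: +1 if the action moves closer to the goal, else -1."""
    return 1.0 if distance(step(state, action)) < distance(state) else -1.0


def train(prior: Dict[State, float], episodes: int = 5,
          alpha: float = 0.5, max_steps: int = 50) -> Tuple[int, int]:
    """TAMER-style learning; returns (total feedback, negative feedback)."""
    # H models the human reward for each (state, action) pair. It is seeded
    # with the prior value of the state that the action leads to.
    H = {((x, y), a): prior.get(step((x, y), a), 0.0)
         for x in range(SIZE) for y in range(SIZE) for a in ACTIONS}
    total = negative = 0
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            if s == GOAL:
                break
            # Act greedily on the predicted human reward (ties: first action).
            a = max(ACTIONS, key=lambda act: H[(s, act)])
            h = simulated_human(s, a)
            total += 1
            if h < 0:
                negative += 1
            # Move the prediction toward the feedback actually received.
            H[(s, a)] += alpha * (h - H[(s, a)])
            s = step(s, a)
    return total, negative


# One demonstration: along the top row, then down the right-hand column.
demo = [(x, 0) for x in range(SIZE)] + [(SIZE - 1, y) for y in range(1, SIZE)]

total_plain, neg_plain = train(prior={})
total_demo, neg_demo = train(prior=value_from_demo(demo))
print("human rewards only:  total =", total_plain, "negative =", neg_plain)
print("demonstration prior: total =", total_demo, "negative =", neg_demo)
```

In this toy setting the learner seeded from the demonstration follows the demonstrated path from the first episode, while the unseeded learner has to discover good actions through corrective feedback. How closely this mirrors the behaviour reported for IRL-TAMER depends on details the source does not provide.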

Abstract

The invention discloses a method of interactive reinforcement learning from demonstration and human assessment feedback. IRL-TAMER is formed by combining inverse reinforcement learning (IRL) with the TAMER framework. The beneficial effect of the method is that an intelligent agent can learn effectively from both human rewards and demonstrations.

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence and relates to a method for interactive reinforcement learning from demonstration and human evaluation feedback.

Background technique

[0002] Artificial intelligence (AI) research is experiencing an explosive boom, with the central goal of deploying autonomous agents to solve real-world problems. With the vigorous development of artificial intelligence, autonomous agents have proliferated rapidly and begun to enter people's daily lives. Since most agents deployed in real-world applications will be active in environments populated by humans, the ability to interact with and learn from human users in a natural way will be key to their success. Reinforcement learning with human evaluation feedback has proven to be a very effective way to help non-technical people guide agents to perform tasks. However, when learning from human rewards, the agent still needs to learn thr...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00
CPC: G06N20/00
Inventor: 李光亮何波冯晨林金莹张期磊
Owner: OCEAN UNIV OF CHINA