Improved Bayesian inverse reinforcement learning method based on combined feedback

Reinforcement learning and learner technology, applied in the field of machine learning, to solve problems such as the inability to achieve ...

Status: Inactive
Publication Date: 2019-07-05
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] Traditional inverse reinforcement learning often requires optimal demonstrat

Examples

Embodiment Construction

[0069] To make the object, technical solution, and features of the present invention clearer, the invention is described in further detail below with reference to specific implementation examples and the accompanying drawings. The basic flow chart of the method is shown in Figure 1.

[0070] The experimental environment is as follows:

[0071] To verify the performance of the proposed algorithm, the simulation experiments were implemented in MATLAB R2014a on the Windows 10 operating system. The PC has an Intel(R) Core i5-6500 processor at 3.2 GHz and 4 GB of memory.

[0072] The performance of the method described in the present invention will be evaluated in the following two simulated environments:

[0073] · Grid world navigation tasks.

[0074] · Highway car driving.

[0075] At present, an IRL method with feedback has not been ...

Abstract

The invention discloses an improved Bayesian inverse reinforcement learning method based on combined feedback, and provides an interactive learning method that combines expert feedback with demonstration. In learning from feedback (LfF), an expert evaluates the learner's behavior and gives feedback with different rewards in order to improve the learner's policy. In learning from demonstration (LfD), the agent attempts to learn its policy by observing expert demonstrations. The algorithm is divided into three learning stages: learning from non-optimal demonstrations; learning from feedback; and learning from demonstration and feedback together. To reduce the number of states and actions that require iteration, the agent's policy is improved by iterating the graphical Bayesian rule, which enhances the learned reward function and speeds up the search for the optimal action.
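The idea of combining demonstrations and feedback in one Bayesian update can be illustrated with a minimal sketch. The text visible here does not give the patent's graphical Bayesian rule, so everything below is an assumption made for illustration: a five-state chain environment, one candidate reward function per possible goal state, a Boltzmann (softmax) action likelihood as commonly used in Bayesian IRL, and binary expert feedback where approval uses the action likelihood and disapproval uses its complement. All names (`q_values`, `bayes_update`, `BETA`, and so on) are hypothetical.

```python
import numpy as np

N_STATES = 5          # chain of states 0..4 (hypothetical toy environment)
ACTIONS = (-1, +1)    # action index 0 moves left, index 1 moves right
GAMMA = 0.9           # discount factor (assumed)
BETA = 5.0            # expert rationality in the Boltzmann model (assumed)
LEFT, RIGHT = 0, 1


def step(s, a):
    """Deterministic chain dynamics, clipped at both ends."""
    return min(max(s + a, 0), N_STATES - 1)


def q_values(reward, n_iter=200):
    """Value iteration; reward[s2] is received on entering state s2."""
    q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(n_iter):
        v = q.max(axis=1)
        for s in range(N_STATES):
            for i, a in enumerate(ACTIONS):
                s2 = step(s, a)
                q[s, i] = reward[s2] + GAMMA * v[s2]
    return q


def action_probs(q, s):
    """Boltzmann (softmax) distribution over actions in state s."""
    p = np.exp(BETA * (q[s] - q[s].max()))
    return p / p.sum()


def bayes_update(posterior, q_tables, s, a_idx, feedback=None):
    """One Bayes step over the reward hypotheses.

    feedback=None: (s, a_idx) is an expert demonstration.
    feedback=+1 / -1: the expert approves / disapproves the learner's action.
    """
    like = np.array([action_probs(q, s)[a_idx] for q in q_tables])
    if feedback == -1:
        like = 1.0 - like  # assumed model: disapproval means "not the expert's choice"
    post = posterior * like
    return post / post.sum()


# One hypothesis per goal state: reward 1 for entering the goal, 0 elsewhere.
q_tables = [q_values(np.eye(N_STATES)[g]) for g in range(N_STATES)]
prior = np.full(N_STATES, 1.0 / N_STATES)

# Stage 1: a non-optimal, incomplete demonstration (one wasted step, stops early).
demos = [(0, LEFT), (0, RIGHT), (1, RIGHT), (2, RIGHT)]
post_demo = prior
for s, a in demos:
    post_demo = bayes_update(post_demo, q_tables, s, a)

# Stages 2 and 3: expert feedback on the learner's own actions refines the posterior.
feedback = [(3, LEFT, -1), (4, RIGHT, +1)]
post_final = post_demo
for s, a, f in feedback:
    post_final = bayes_update(post_final, q_tables, s, a, feedback=f)

print("after demonstrations:", np.round(post_demo, 3))
print("after feedback:      ", np.round(post_final, 3))
```

In this toy setting the flawed demonstration alone leaves most of the posterior mass on goal state 3, and the two feedback signals move it to goal state 4. A real implementation would replace the enumerated hypotheses with a richer reward model and the patent's own update rule.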

Description

Technical field

[0001] The invention belongs to the field of machine learning. It relates to the combination of an inverse reinforcement learning algorithm with the graphical Bayesian theorem, in which expert feedback is incorporated into the inverse reinforcement learning algorithm for interactive learning.

Background technique

[0002] In recent years, machine learning has been a research hotspot in many fields. A goal of machine learning is to train complex systems, such as autonomous cars and assistive robots, to perform complex tasks in the real world.

[0003] Reinforcement learning is one of the most widely used families of algorithms in machine learning, but because its reward function is set manually, it is highly subjective. Inverse reinforcement learning addresses this problem. The inverse reinforcement learning (IRL) problem was first proposed by Russell (1998), and many IRL algorithms now exist. For example, th...

Application Information

IPC(8): G06K9/62, G06N99/00
CPC: G06F18/29
Inventors: 张丽雅, 宁振虎, 薛菲, 王小平
Owner BEIJING UNIV OF TECH