A method and apparatus for reinforcement learning

A reinforcement learning technology, applied in the field of reinforcement learning, that addresses problems such as heavy manual labor and reward deviation, and achieves the effects of speeding up training, improving the final result, and expanding applications.

Status: Inactive
Publication Date: 2019-06-07
TSINGHUA UNIV
2 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

The common feature of these methods is that not only a lot of labor is required, but also the newly set reward function is not completely equivalent to the orig

Example Embodiment

[0021] The embodiments of the present invention will now be described in detail with reference to the drawings. It should be noted that the following description is only exemplary, and is not intended to limit the present invention. In addition, in the following description, the same reference numerals will be used to denote the same or similar components in different drawings. Different features in the different embodiments described below can be combined with each other to form other embodiments within the scope of the present invention.

[0022] Figure 1 shows a schematic flowchart of a method for reinforcement learning according to an embodiment of the present invention. As shown in Figure 1, the method of the present invention abstracts the environment into objects and the relationships between objects, then uses relational reinforcement learning to solve for the optimal state evaluation function of each environment state, and uses the difference of the evaluation ...

Abstract

The invention discloses a method and apparatus for reinforcement learning. The method comprises the following steps: running a reinforcement learning model to obtain a first training data set; deriving a state transition function and a reward function of the reinforcement learning model by counting the first training data set; based on the state transition function and the reward function, solving an optimal state estimation function through a relational reinforcement learning algorithm; modifying the reward values in the first training data set by using the optimal state estimation function; and training the reinforcement learning model by using the modified first training data set. The method can improve the final result obtained by training the reinforcement learning model and increase its training speed, thereby expanding the application of reinforcement learning to practical scenarios.
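The steps in the abstract can be illustrated with a minimal tabular sketch. It is not the patented implementation and rests on several assumptions: plain value iteration stands in for the relational reinforcement learning solver, the reward modification is assumed to be potential-based shaping of the form r + gamma * V(s') - V(s), and the function names (`estimate_model`, `value_iteration`, `reshape_rewards`) and the toy chain data are hypothetical.

```python
from collections import defaultdict

def estimate_model(transitions):
    """Estimate P(s'|s,a) and the mean reward R(s,a,s') by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    for s, a, r, s2 in transitions:
        counts[(s, a)][s2] += 1
        reward_sums[(s, a, s2)] += r
    P, R = {}, {}
    for (s, a), nexts in counts.items():
        total = sum(nexts.values())
        P[(s, a)] = {s2: n / total for s2, n in nexts.items()}
        for s2, n in nexts.items():
            R[(s, a, s2)] = reward_sums[(s, a, s2)] / n
    return P, R

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve for the optimal state value function of the estimated model.

    Assumption: ordinary value iteration replaces the relational solver.
    """
    states = {s for (s, _) in P} | {s2 for nx in P.values() for s2 in nx}
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q_values = [
                sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in nx.items())
                for (s0, a), nx in P.items() if s0 == s
            ]
            # States with no observed action (e.g. terminal) keep value 0.
            new_v = max(q_values) if q_values else 0.0
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

def reshape_rewards(transitions, V, gamma=0.9):
    """Assumed modification: potential-based shaping with V as potential."""
    return [(s, a, r + gamma * V[s2] - V[s], s2) for s, a, r, s2 in transitions]

# Hypothetical "first training data set": a chain 0 -> 1 -> 2 -> 3 where
# only the final step into state 3 is rewarded (a sparse, delayed signal).
data = [
    (0, "right", 0.0, 1),
    (1, "right", 0.0, 2),
    (2, "right", 1.0, 3),
    (1, "left", 0.0, 0),
    (2, "left", 0.0, 1),
]

P, R = estimate_model(data)
V = value_iteration(P, R)
shaped = reshape_rewards(data, V)
for original, modified in zip(data, shaped):
    print(original, "->", round(modified[2], 3))
```

Under these assumptions the solved values on the toy chain are approximately V(2) = 1, V(1) = 0.9, and V(0) = 0.81. After shaping, transitions along the optimal path carry a reward of about zero, while the two backward moves receive an immediate negative reward, so a mistake is signaled at the step where it happens rather than many actions later. The modified data set would then be passed to whatever learner is being trained.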

Description

Technical field

[0001] The present invention relates to the field of machine learning, and in particular to a method, device, and storage medium for reinforcement learning.

Background art

[0002] Reinforcement learning has achieved good results in many application fields. Deep reinforcement learning methods based on neural networks have surpassed top human players in Atari video games, Go, Japanese chess (shogi), and chess. However, training deep reinforcement learning models is very difficult, for two main reasons. First, the training signal is sparse throughout training: only a small part of the training data has a nonzero reward value. Second, the effect of the reward function is usually delayed: an action that deserves a reward often receives the corresponding reward signal only many actions later.

[0003] Traditional methods usually manually set the reward ...

Application Information

IPC(8): G06N20/00, G06N3/04
Inventors: 朱军 (Zhu Jun), 阎栋 (Yan Dong), 苏航 (Su Hang), 黄世宇 (Huang Shiyu)
Owner: TSINGHUA UNIV