A method and apparatus for reinforcement learning

A reinforcement learning technique that addresses two problems of hand-designed reward functions: the large amount of manual labor they require and the deviation they introduce between the trained policy and the optimal policy. It speeds up training, improves the final result, and broadens the range of applications.

Status: Inactive; Publication Date: 2019-06-07

TSINGHUA UNIV

Problems solved by technology

These methods share two drawbacks. First, they require a great deal of manual labor. Second, because the newly designed reward function is shaped by human knowledge, it is not fully equivalent to the original reward function of the environment, so the final trained policy deviates from the actual optimal policy.

Embodiment Construction

[0021] Embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the following description is exemplary only and is not intended to limit the present invention. Also, in the following description, the same reference numerals will be used to designate the same or similar components in different drawings. Different features in different embodiments described below can be combined with each other to form other embodiments within the scope of the present invention.

[0022] Figure 1 shows a schematic flowchart of a method for reinforcement learning according to an embodiment of the present invention. As shown in Figure 1, the method abstracts the environment into objects and the relationships between objects, then uses relational reinforcement learning to solve for the optimal state evaluation function of each environment state, and uses the difference of the evalua...
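The paragraph above is cut off before it says how the difference of evaluation values is used, so the exact formula is not visible here. One common way to turn value differences into a training signal is potential-based reward shaping, where each reward r for a transition from s to s' becomes r + gamma * V(s') - V(s). The sketch below assumes that form. The function name `shape_rewards`, the tuple layout `(state, action, reward, next_state)`, and the example values are all hypothetical and are not taken from the patent.

```python
def shape_rewards(transitions, V, gamma=0.9):
    """Return new (s, a, r, s') tuples with r replaced by r + gamma*V(s') - V(s).

    Assumption: V maps a state to its estimated value; states missing
    from V are treated as having value 0.0.
    """
    shaped = []
    for s, a, r, s_next in transitions:
        bonus = gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
        shaped.append((s, a, r + bonus, s_next))
    return shaped


# Hypothetical three-state chain with made-up state values.
V = {"start": 0.0, "mid": 0.5, "goal": 1.0}
data = [
    ("start", "right", 0.0, "mid"),   # moves toward the goal
    ("mid", "left", 0.0, "start"),    # moves away from the goal
    ("mid", "right", 1.0, "goal"),    # reaches the goal
]
shaped = shape_rewards(data, V, gamma=1.0)
for item in shaped:
    print(item)
# ('start', 'right', 0.5, 'mid')
# ('mid', 'left', -0.5, 'start')
# ('mid', 'right', 1.5, 'goal')
```

In this toy example the two transitions that originally carried zero reward now carry a non-zero signal whose sign reflects whether the step moved toward or away from higher-valued states.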

Abstract

The invention discloses a method and apparatus for reinforcement learning. The method comprises the following steps: running a reinforcement learning model to obtain a first training data set; deriving a state transition function and a reward function of the reinforcement learning model by counting over the first training data set; based on the state transition function and the reward function, solving for an optimal state estimation function through a relational reinforcement learning algorithm; modifying the reward values in the first training data set by using the optimal state estimation function; and training the reinforcement learning model by using the modified first training data set. The method improves the final result obtained by training the reinforcement learning model and increases its training speed, thereby broadening the application of reinforcement learning to practical scenarios.
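The second and third steps of the abstract can be illustrated with a small tabular sketch. It estimates a transition function and a reward function by counting over recorded `(state, action, reward, next_state)` tuples, then solves for state values on the estimated model. Note the substitution: plain tabular value iteration stands in for the relational reinforcement learning algorithm named in the abstract, which is not described in the visible text. The function names, the data layout, and the three-state example are assumptions made for illustration only.

```python
from collections import defaultdict


def estimate_model(transitions):
    """Estimate P(s'|s,a) and the mean reward R(s,a,s') by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sums[(s, a, s_next)] += r
    P, R = {}, {}
    for (s, a), nexts in counts.items():
        total = sum(nexts.values())
        P[(s, a)] = {s2: n / total for s2, n in nexts.items()}
        for s2, n in nexts.items():
            R[(s, a, s2)] = reward_sums[(s, a, s2)] / n
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration on the estimated model (stand-in solver)."""
    states = {s for (s, _a) in P} | {s2 for nexts in P.values() for s2 in nexts}
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        delta = 0.0
        for s in states:
            q_values = [
                sum(p * (R[(s0, a, s2)] + gamma * V[s2]) for s2, p in nexts.items())
                for (s0, a), nexts in P.items()
                if s0 == s
            ]
            if not q_values:  # no recorded action from s: treat it as terminal
                continue
            new_v = max(q_values)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    return V


# Hypothetical first training data set: chain 0 -> 1 -> 2, reward only at the end.
data = [(0, "right", 0.0, 1), (1, "right", 1.0, 2), (0, "stay", 0.0, 0)]
P, R = estimate_model(data)
V = value_iteration(P, R, gamma=0.9)
print({s: round(v, 3) for s, v in sorted(V.items())})
```

The resulting values are the kind of quantity the fourth step would use to modify the recorded rewards before the model is retrained on the modified data set.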

Description

Technical field

[0001] The present invention relates to the field of machine learning, in particular to a method, device and storage medium for reinforcement learning.

Background

[0002] Reinforcement learning has achieved good results in many application fields. Deep reinforcement learning methods based on neural networks have surpassed top human players in Atari video games, Go, shogi (Japanese chess) and chess. However, training deep reinforcement learning models is very difficult, for two main reasons. First, the training signal is sparse throughout the training process: only a small part of the training data has a non-zero reward value. Second, the effect of the reward function is usually delayed: an action that deserves a reward often receives the corresponding reward signal only many actions later.

[0003] Traditional methods usually manually set the reward ...

Application Information

IPC(8): G06N20/00, G06N3/04

Inventor 朱军, 阎栋, 苏航, 黄世宇

Owner TSINGHUA UNIV
