Reinforcement learning apparatus, control apparatus, and reinforcement learning method

A reinforcement learning and control apparatus technology, applied in the fields of electric programme control, program control, instruments, etc., that addresses the problems of impeded learning, learning results whose speed is extremely increased or decreased, and increasingly difficult reward trade-offs.

Active Publication Date: 2014-11-11
ATR ADVANCED TELECOMM RES INST INT +1

AI Technical Summary

Benefits of technology

[0008] With this configuration, the obstacle avoidance requirements are handled by a virtual external force, so robot motor learning can be performed quickly and stably with a simple reward function.
[0025] With a reinforcement learning apparatus according to the present invention, robot motor learning can be performed quickly and stably.

Problems solved by technology

Conventional techniques, however, have a problem in that the reward function for a complex motion trajectory is often expressed by the sum of various terms, and a trade-off occurring between the terms impedes learning (this is called the “trade-off problem”).
If the ratio between two such terms of the reward function is not set appropriately, the speed of the learned movement becomes extremely high or extremely low, resulting in undesirable motion trajectories.
This trade-off problem becomes more challenging if requirements such as obstacle avoidance, in addition to a reaching movement, are further imposed.
Too small a negative reward given upon contact with an obstacle results in collision of the robot arm with the obstacle, and too large a negative reward leads to a learning result in which the robot arm does not move from the starting point.
When the reward function has become too complex, the designer has to empirically adjust the balance between the elements, compromising the advantage of reinforcement learning.
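The trade-off described above can be illustrated with a small numerical sketch. It is only an illustration, not the reward function of the patent: the three candidate outcomes, the energy term, and all weights and numbers are hypothetical. Varying only the collision weight changes which outcome scores best under a summed reward.

```python
# Hypothetical candidate outcomes for a reaching task with one obstacle:
# (target reached, energy used, expected number of contacts with the obstacle).
CANDIDATES = {
    "stay":   (0.0, 0.0, 0.0),   # arm never leaves the starting point
    "direct": (1.0, 2.0, 1.0),   # short path that hits the obstacle
    "detour": (1.0, 5.0, 0.1),   # longer path with occasional contact
}

def total_reward(reached, energy, contacts, w_collision, w_goal=1.0, w_energy=0.1):
    """Summed reward: goal bonus minus energy and collision penalties (assumed form)."""
    return w_goal * reached - w_energy * energy - w_collision * contacts

def best_candidate(w_collision):
    """Return the name of the candidate with the highest summed reward."""
    return max(CANDIDATES, key=lambda name: total_reward(*CANDIDATES[name], w_collision))

for w in (0.1, 1.0, 10.0):
    print(f"collision weight {w}: best outcome = {best_candidate(w)}")
```

With these made-up numbers, a small collision weight favors the colliding path, a large one favors not moving at all, and only an intermediate value selects the detour, which is the kind of manual balancing the text describes.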

Examples

Embodiment 1

[0038] This embodiment describes a reinforcement learning apparatus and the like that have the function of generating a virtual external force. It covers a reinforcement learning apparatus in which the reinforcement learner and the virtual external force generator that constitute the apparatus are separated; a reinforcement learning apparatus capable of switching automatically from the virtual external force generator to the virtual external force approximator; a reinforcement learning apparatus that can improve the reusability of the virtual external force generator; and a reinforcement learning apparatus capable of performing model selection for the function approximator.

[0039] FIG. 1 is a schematic diagram illust...
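Paragraph [0038] mentions model selection for the function approximator and automatic switching from the generator to the approximator, but this excerpt does not say how either is done. The sketch below shows one possible reading under stated assumptions: the generator force law, the polynomial model family, the validation-error selection rule, and the switching threshold are all hypothetical.

```python
import numpy as np

def generator_force(distance):
    """Hypothetical generator: repulsion magnitude that vanishes beyond distance 1."""
    return np.maximum(0.0, 1.0 / distance - 1.0)

def fit_and_select(train_x, valid_x, degrees=(1, 3, 5)):
    """Fit one polynomial per degree to generator outputs and pick the
    degree with the lowest mean squared error on held-out points."""
    train_y = generator_force(train_x)
    valid_y = generator_force(valid_x)
    errors = {}
    for degree in degrees:
        coeffs = np.polyfit(train_x, train_y, degree)
        residual = np.polyval(coeffs, valid_x) - valid_y
        errors[degree] = float(np.mean(residual ** 2))
    best = min(errors, key=errors.get)
    return best, errors

train_x = np.linspace(0.3, 1.5, 40)    # distances used for fitting
valid_x = np.linspace(0.33, 1.47, 20)  # held-out distances for model selection
best_degree, errors = fit_and_select(train_x, valid_x)

SWITCH_THRESHOLD = 1e-2  # hypothetical accuracy required before switching
use_approximator = errors[best_degree] < SWITCH_THRESHOLD
print(best_degree, errors[best_degree], use_approximator)
```

Selecting on held-out points rather than on the fitting data is a common way to avoid favoring an overly flexible model; whether the patent uses such a criterion is not stated here.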

Embodiment 2

[0113] In this embodiment, a description will be given of a reinforcement learning apparatus and the like in which the reinforcement learner and the virtual external force generator are not separated.

[0114] FIG. 9 is a block diagram showing a reinforcement learning system B according to this embodiment.

[0115] The reinforcement learning system B includes the control object 1 and a reinforcement learning apparatus 3. The difference between the reinforcement learning apparatus 3 and the reinforcement learning apparatus 2 lies in whether the reinforcement learner and the virtual external force generator are separated or not. In the reinforcement learning apparatus 3, the reinforcement learner and the virtual external force generator are not separated.

[0116] The reinforcement learning apparatus 3 includes the reward function storage unit 211, the first-type environment parameter obtaining unit 212, the control parameter value calculation unit 213, the control parameter value output unit 214,...

Abstract

It is possible to perform robot motor learning in a quick and stable manner using a reinforcement learning apparatus including: a first-type environment parameter obtaining unit that obtains a value of one or more first-type environment parameters; a control parameter value calculation unit that calculates a value of one or more control parameters maximizing a reward by using the value of the one or more first-type environment parameters; a control parameter value output unit that outputs the value of the one or more control parameters to the control object; a second-type environment parameter obtaining unit that obtains a value of one or more second-type environment parameters; a virtual external force calculation unit that calculates the virtual external force by using the value of the one or more second-type environment parameters; and a virtual external force output unit that outputs the virtual external force to the control object.
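The data flow in the abstract, with a control parameter and a virtual external force computed separately and both applied to the control object, can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: the control object is a first-order 2D point, a fixed proportional controller stands in for the learned reward-maximizing controller, and the repulsive force law, obstacle position, and all constants are hypothetical.

```python
import numpy as np

def control_parameter(position, target, gain=1.0):
    """Stand-in for the learned controller (not learned here): uses the
    first-type environment parameter (own position) to move toward the target."""
    return gain * (target - position)

def virtual_external_force(position, obstacle, radius=1.0, strength=1.0):
    """Assumed repulsive force computed from the second-type environment
    parameter (offset to the obstacle); zero outside the given radius."""
    offset = position - obstacle
    dist = np.linalg.norm(offset)
    if dist >= radius or dist == 0.0:
        return np.zeros_like(position)
    return strength * (1.0 / dist - 1.0 / radius) * offset / dist

def simulate(start, target, obstacle, use_virtual_force=True, steps=400, dt=0.05):
    """Apply both outputs to a first-order point and track the closest approach."""
    position = np.array(start, dtype=float)
    target = np.array(target, dtype=float)
    obstacle = np.array(obstacle, dtype=float)
    min_dist = np.linalg.norm(position - obstacle)
    for _ in range(steps):
        u = control_parameter(position, target)
        f = virtual_external_force(position, obstacle) if use_virtual_force else 0.0
        position = position + dt * (u + f)
        min_dist = min(min_dist, np.linalg.norm(position - obstacle))
    return position, min_dist

final_with, dist_with = simulate((0, 0), (4, 0), (2, 0.1), use_virtual_force=True)
final_without, dist_without = simulate((0, 0), (4, 0), (2, 0.1), use_virtual_force=False)
print(final_with, dist_with, dist_without)
```

In this toy setting the controller itself knows nothing about the obstacle; the clearance comes entirely from the added force, which is why the reward used for learning would not need a collision term.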

Description

[0001] This application claims priority under 35 U.S.C. §119 to Japanese Patent Application No. 2011-074694, filed Mar. 30, 2011, which is incorporated by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to a reinforcement learning apparatus and the like that perform robot motor learning.

[0004] 2. Description of Related Art

[0005] Reinforcement learning has been widely used as a robot motor learning technique because it can be implemented even if the dynamics of the control object or the environment are unknown and it autonomously performs learning by simply setting a reward function according to the task (see, for example, JP 2007-66242A).

[0006] Conventional techniques, however, have a problem in that the reward function for a complex motion trajectory is often expressed by the sum of various terms, and a trade-off occurring between the terms impedes learning (this is called the “trade-off problem”). For example, the reward function...

Application Information

Patent Type & Authority: Patents (United States)
IPC(8): G05B13/02, G06F19/00
CPC: G05B13/0265, Y10S901/03
Inventors: SUGIMOTO, NORIKAZU; UEDA, YUGO; HASEGAWA, TADAAKI; IBA, SOSHI; AKATSUKA, KOJI
Owner: ATR ADVANCED TELECOMM RES INST INT