
Reinforcement learning method based on mixed behavior space

A reinforcement learning technology for mixed behavior spaces, applied in the field of reinforcement learning. It addresses the loss of accuracy, over-parameterization and lack of theoretical support found in existing methods, and produces behaviors without losing accuracy.

Status: Inactive. Publication date: 2021-01-05
SHANGHAI JIAO TONG UNIV
Cites: 3. Cited by: 3

AI Technical Summary

Problems solved by technology

In existing methods, however, the update step maximizes the Q values of all discrete actions instead of only the largest Q value. This leads to over-parameterization and causes unnecessary training.
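To make the over-parameterization point concrete, the sketch below contrasts two objectives for the continuous parameters, assuming a critic that returns one Q value per discrete action. Summing every Q value (the behavior criticized above) sends gradient to the parameters of actions that were never chosen, while masking keeps only the chosen action's Q value. The toy critic, its targets and all function names are hypothetical illustrations, not the patent's implementation; gradients are taken numerically to keep the example dependency-free.

```python
from typing import Callable, List

QFn = Callable[[List[float]], List[float]]


def toy_q_values(params: List[float]) -> List[float]:
    """Hypothetical critic: one Q value per discrete action.

    Action k's Q value depends only on its own continuous parameter and
    peaks at an arbitrary target chosen for illustration.
    """
    targets = [0.5, -1.0, 2.0]
    return [-(p - t) ** 2 for p, t in zip(params, targets)]


def objective_sum_all(q_fn: QFn, params: List[float]) -> float:
    """Maximize the sum of the Q values of all discrete actions."""
    return sum(q_fn(params))


def objective_masked(q_fn: QFn, params: List[float], chosen: int) -> float:
    """Keep only the Q value of the discrete action actually chosen."""
    q = q_fn(params)
    mask = [1.0 if k == chosen else 0.0 for k in range(len(q))]
    return sum(m * v for m, v in zip(mask, q))


def numeric_grad(f: Callable[[List[float]], float], params: List[float],
                 eps: float = 1e-6) -> List[float]:
    """Central-difference gradient of a scalar objective w.r.t. params."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


params = [0.0, 0.0, 0.0]
chosen = 2  # suppose the discrete actor selected action 2
g_all = numeric_grad(lambda p: objective_sum_all(toy_q_values, p), params)
g_mask = numeric_grad(lambda p: objective_masked(toy_q_values, p, chosen), params)
# Summing all Q values pushes every action's parameter; masking updates
# only the parameter of the chosen action.
print([round(g, 3) for g in g_all])   # [1.0, -2.0, 4.0]
print([round(g, 3) for g in g_mask])  # [0.0, 0.0, 4.0]
```

In this toy setting the masked objective leaves the parameters of actions 0 and 1 untouched, which is the kind of unnecessary update the masking is meant to avoid.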
[0008] An analysis of related research in China and abroad leads to the following conclusions: existing reinforcement learning algorithms for mixed behavior spaces have several shortcomings, such as loss of accuracy, lack of theoretical support and over-parameterization, and a relatively complete and general reinforcement learning algorithm for mixed behavior spaces has not yet been proposed.



Embodiment Construction

[0035] The following describes several preferred embodiments of the present invention with reference to the accompanying drawings, so as to make the technical content clearer and easier to understand. The present invention can be embodied in many different forms of embodiments, and the protection scope of the present invention is not limited to the embodiments mentioned herein.

[0036] In the drawings, components with the same structure are denoted by the same numerals, and components with similar structures or functions are denoted by similar numerals. The size and thickness of each component in the drawings are arbitrary, and the present invention does not limit them. To make the illustrations clearer, the thickness of some parts is exaggerated in places.

[0037] As shown in Figure 1, an embodiment of the present invention provides a reinforcement learning method based on a mixed ...



Abstract

The invention discloses a reinforcement learning method based on a mixed behavior space and relates to the field of reinforcement learning. The method consists of several parallel Actor networks, which act jointly to output structured behaviors, and a Critic network, which guides the training of the Actor networks. The Actor network comprises a state encoding network, a discrete Actor network and a continuous-parameter Actor network. The state encoding network encodes the state and feeds the encoding to both the discrete Actor network and the continuous-parameter Actor network; the discrete Actor network generates discrete actions, and the continuous-parameter Actor network generates the continuous parameters corresponding to those discrete actions. The invention can handle mixed behavior spaces that contain both continuous and discrete actions, and can be extended to any behavior space with a hierarchical structure. It obtains better reinforcement learning results than previous methods for mixed behavior spaces, does not lose behavior accuracy, and avoids over-parameterization through a mask operation.
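The actor structure described in the abstract can be sketched as a single forward pass: a shared state encoder feeds a discrete head (one score per discrete action) and a continuous-parameter head (one parameter per discrete action), and a one-hot mask keeps only the parameters that belong to the selected action. This is a minimal pure-Python sketch under stated assumptions: one linear layer per network, tanh activations, greedy action selection, one scalar parameter per action, and a reading of the "mask operation" as zeroing unselected parameters. The class and function names are hypothetical; only one Actor network is shown (the patent uses several in parallel), and the Critic and training loop are omitted.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def linear(weights: Matrix, x: List[float]) -> List[float]:
    """Dense layer without bias: returns weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class HybridActor:
    """Hypothetical actor: state encoder + discrete head + parameter head."""

    def __init__(self, state_dim: int, hidden_dim: int, num_actions: int,
                 seed: int = 0):
        rng = random.Random(seed)
        self.encoder = random_matrix(hidden_dim, state_dim, rng)
        self.discrete_head = random_matrix(num_actions, hidden_dim, rng)
        self.param_head = random_matrix(num_actions, hidden_dim, rng)
        self.num_actions = num_actions

    def act(self, state: List[float]) -> Tuple[int, List[float]]:
        # 1. State encoding network: shared representation of the state.
        h = [math.tanh(v) for v in linear(self.encoder, state)]
        # 2. Discrete Actor network: score each discrete action, pick greedily.
        scores = linear(self.discrete_head, h)
        k = max(range(self.num_actions), key=lambda i: scores[i])
        # 3. Continuous-parameter Actor network: one parameter per action,
        #    squashed into [-1, 1] by tanh.
        params = [math.tanh(v) for v in linear(self.param_head, h)]
        # 4. Mask: zero the parameters of actions that were not selected.
        mask = [1.0 if i == k else 0.0 for i in range(self.num_actions)]
        return k, [m * p for m, p in zip(mask, params)]


actor = HybridActor(state_dim=4, hidden_dim=8, num_actions=3)
action, masked_params = actor.act([0.1, -0.2, 0.3, 0.0])
print(action, masked_params)
```

Sharing the encoder means both heads see the same state features, so the discrete choice and its continuous parameter are computed from a common representation rather than by two independent networks.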

Description

Technical field

[0001] The invention relates to the field of reinforcement learning, and in particular to a reinforcement learning method based on a mixed behavior space.

Background art

[0002] Representing and learning complex policies in reinforcement learning is the problem of how to represent a policy, and learn it end to end, when that policy is relatively complex. The present invention mainly addresses mixed behavior spaces, in which each behavior consists partly of a discrete choice and partly of continuous parameters. For example, in an autonomous driving task, deciding whether to turn the steering wheel or to brake at the current step is a discrete action choice; if the wheel is turned, choosing the corresponding steering angle is a continuous-valued action selection. Most current reinforcement learning algorithms target purely discrete or purely continuous behavior spaces, and there are few algorith...
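The driving example can be written down as a data structure in which each discrete choice carries its own, possibly empty, list of bounded continuous parameters. The sketch below is a hypothetical encoding, with made-up action names and an assumed steering range of ±30 degrees, not a format defined by the invention; it only checks whether a (choice, parameters) pair is valid.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DiscreteChoice:
    name: str
    # (low, high) bounds for each continuous parameter of this choice.
    param_bounds: List[Tuple[float, float]] = field(default_factory=list)


@dataclass
class MixedActionSpace:
    choices: Dict[str, DiscreteChoice]

    def is_valid(self, choice: str, params: List[float]) -> bool:
        """True if the choice exists and every parameter is within bounds."""
        spec = self.choices.get(choice)
        if spec is None or len(params) != len(spec.param_bounds):
            return False
        return all(lo <= p <= hi for p, (lo, hi) in zip(params, spec.param_bounds))


# Hypothetical driving action space: steering takes an angle in degrees
# (assumed range), braking takes no continuous parameter.
driving = MixedActionSpace(choices={
    "steer": DiscreteChoice("steer", [(-30.0, 30.0)]),
    "brake": DiscreteChoice("brake", []),
})

print(driving.is_valid("steer", [12.5]))  # True
print(driving.is_valid("steer", [45.0]))  # False: angle out of range
print(driving.is_valid("brake", []))      # True
```

Nesting a `MixedActionSpace` inside a choice would be one way to represent the deeper hierarchical behavior spaces the abstract mentions, though the sketch does not attempt that.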


Application Information

IPC(8): G06N20/00
CPC: G06N20/00
Inventors: Su Rui (粟锐), Zhang Weinan (张伟楠), Yu Yong (俞勇)
Owner: SHANGHAI JIAO TONG UNIV