Reinforcement learning method based on mixed behavior space

A reinforcement learning technique for mixed behavior spaces, applied in the field of reinforcement learning. It addresses the loss of accuracy, over-parameterization, and lack of theoretical support found in earlier methods, and it handles mixed behavior spaces without losing accuracy.

Status: Inactive. Publication date: 2021-01-05

SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

In the update process, however, the goal is to maximize all Q values rather than only the largest Q value, which leads to over-parameterization and some unnecessary training.
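The contrast can be written out as a rough sketch. The notation here is an assumption and does not appear in the source: $s$ is the state, $K$ is the number of discrete actions, $x_k(s;\theta)$ is the continuous parameter produced for discrete action $k$ by a parameter network with weights $\theta$, and $Q$ is the action-value function. Reading "all Q values" as their sum is also an assumption.

```latex
% Assumed notation, for illustration only.
% Objective that pushes up every Q value at once:
J_{\text{all}}(\theta) = \sum_{k=1}^{K} Q\bigl(s, k, x_k(s;\theta)\bigr)
% Objective that only pushes up the largest Q value:
J_{\max}(\theta) = \max_{1 \le k \le K} Q\bigl(s, k, x_k(s;\theta)\bigr)
```

Under the first objective, the parameters of discrete actions that would never be selected still receive updates, which is one way to understand the "unnecessary training" described above.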

[0008] The following conclusion can be drawn from an analysis of related research at home and abroad: existing reinforcement learning methods for mixed behavior spaces have shortcomings such as loss of accuracy, lack of theoretical support, and over-parameterization. A relatively complete and general reinforcement learning algorithm for mixed behavior spaces has not yet been proposed.

Embodiment Construction

[0035] The following describes several preferred embodiments of the present invention with reference to the accompanying drawings, so as to make the technical content clearer and easier to understand. The present invention can be embodied in many different forms of embodiments, and the protection scope of the present invention is not limited to the embodiments mentioned herein.

[0036] In the drawings, components with the same structure are denoted by the same numerals, and components with similar structures or functions are denoted by similar numerals. The size and thickness of each component shown in the drawings are shown arbitrarily, and the present invention does not limit the size and thickness of each component. In order to make the illustration clearer, the thickness of parts is appropriately exaggerated in some places in the drawings.

[0037] As shown in Figure 1, the embodiment of the present invention provides a reinforcement learning method based on a mixed ...

Abstract

The invention discloses a reinforcement learning method based on a mixed behavior space and relates to the field of reinforcement learning. The method consists of a plurality of parallel Actor networks, which act jointly to output structured behaviors, and a Critic network, which guides the training of the Actor networks. The Actor network comprises a state coding network, a discrete Actor network, and a continuous parameter Actor network. The state coding network encodes the state and feeds the encoding into the discrete Actor network and the continuous parameter Actor network. The discrete Actor network generates discrete actions, and the continuous parameter Actor network generates the continuous parameters corresponding to those discrete actions. The invention can handle mixed behavior spaces that contain both continuous and discrete actions, and the method can be extended to all behavior spaces with hierarchical structures. It obtains better reinforcement learning results than previous methods for mixed behavior spaces, does not lose behavior accuracy, and avoids the over-parameterization problem through a mask operation.
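The forward pass described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `HybridActor`, the layer sizes, the `tanh` squashing, and the exact form of the mask (zeroing the parameters of unselected discrete actions) are all hypothetical, since the abstract does not specify them. The Critic network and all training logic are omitted.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Create (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply a dense layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class HybridActor:
    """Hypothetical actor: shared state encoder plus two output heads."""

    def __init__(self, state_dim, hidden_dim, n_discrete, seed=0):
        self.rng = random.Random(seed)
        self.encoder = init_layer(self.rng, state_dim, hidden_dim)
        self.discrete_head = init_layer(self.rng, hidden_dim, n_discrete)
        # Assumption: one scalar continuous parameter per discrete action.
        self.param_head = init_layer(self.rng, hidden_dim, n_discrete)

    def act(self, state):
        # State coding network: its output feeds both heads.
        code = [math.tanh(v) for v in linear(*self.encoder, state)]
        # Discrete Actor: a distribution over discrete actions.
        probs = softmax(linear(*self.discrete_head, code))
        # Continuous parameter Actor: parameters squashed into [-1, 1].
        params = [math.tanh(v) for v in linear(*self.param_head, code)]
        # Sample the discrete action, then mask out the other parameters
        # (assumed form of the mask operation).
        k = self.rng.choices(range(len(probs)), weights=probs)[0]
        masked = [p if i == k else 0.0 for i, p in enumerate(params)]
        return k, masked[k], probs, masked


actor = HybridActor(state_dim=3, hidden_dim=4, n_discrete=2)
k, x, probs, masked = actor.act([0.1, -0.2, 0.3])
print(f"discrete action {k} with continuous parameter {x:.3f}")
```

Sharing the state encoding means both heads see the same representation of the state, while the mask keeps only the parameter that belongs to the chosen discrete action.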

Description

Technical field

[0001] The invention relates to the field of reinforcement learning, and in particular to a reinforcement learning method based on a mixed behavior space.

Background

[0002] Representing and learning complex strategies in reinforcement learning concerns how to represent a strategy and learn it end-to-end when the strategy is relatively complex. The present invention mainly addresses the mixed behavior space problem, in which a behavior consists partly of a discrete choice and partly of continuous parameters. For example, in an autonomous driving task, whether to turn the steering wheel or brake at a given step is a discrete action choice. If the steering wheel is turned, the corresponding angle is a continuous-valued action choice. Most current reinforcement learning algorithms are aimed at purely discrete or purely continuous behavior spaces, and there are few algorith...
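The driving example can be made concrete with a small data structure. This is a sketch only: the name `HybridAction`, the action labels, the braking parameter, and the numeric ranges are assumptions made for illustration, and none of them come from the patent text.

```python
from dataclasses import dataclass

# Assumed continuous-parameter range for each discrete action.
PARAM_RANGES = {
    "steer": (-30.0, 30.0),  # steering angle in degrees (assumed range)
    "brake": (0.0, 1.0),     # braking strength (assumed parameter)
}


@dataclass(frozen=True)
class HybridAction:
    """One action in a mixed behavior space: a discrete choice plus its parameter."""

    kind: str     # which discrete action was chosen
    param: float  # continuous parameter belonging to that choice

    def __post_init__(self):
        if self.kind not in PARAM_RANGES:
            raise ValueError(f"unknown discrete action: {self.kind!r}")
        low, high = PARAM_RANGES[self.kind]
        if not low <= self.param <= high:
            raise ValueError(f"{self.kind} parameter must lie in [{low}, {high}]")


action = HybridAction("steer", 12.5)
print(action)
```

The point of the structure is that the continuous value only has meaning relative to the discrete choice it accompanies, which is what separates a mixed behavior space from a purely discrete or purely continuous one.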

Application Information

IPC(8): G06N20/00

CPC: G06N20/00

Inventors: 粟锐, 张伟楠, 俞勇

Owner: SHANGHAI JIAO TONG UNIV
