A method and system for generating state data for reinforcement learning

A state-data generation technology for reinforcement learning, applied in the field of deep reinforcement learning. It addresses problems such as instability in the training process, shortens the time required for exploration, reduces instability, and increases the number of rewards obtained.

Active Publication Date: 2021-04-02

PEKING UNIV

Cites: 6 | Cited by: 0


AI Technical Summary

Problems solved by technology

However, introducing intrinsic rewards deviates from the original goal and can easily cause instability in the training process.


Embodiment Construction

[0041] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present disclosure and to fully convey its scope to those skilled in the art.

[0042] According to an embodiment of the present application, a method for generating state data for reinforcement learning is proposed. As shown in Figure 1, the method comprises:

[0043] S101. Obtain all first state data of the agent in the first learning stage, and obtain, from the first state data, the second state data whose distance to the learning goal falls within a preset step-count range;

[0044] S102. Use all the first state data to train a variatio...
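Step S101 does not say how the distance to the learning goal is measured. The sketch below assumes (our assumption, not the text's) that each stored trajectory is a list of states, that a trajectory flagged as successful ends at the goal, and that a state's distance is the number of steps remaining until the goal. The function name `select_second_states` and the bounds `min_steps`/`max_steps` are hypothetical.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]


def select_second_states(
    trajectories: Sequence[Sequence[State]],
    reached_goal: Sequence[bool],
    min_steps: int,
    max_steps: int,
) -> Tuple[List[State], List[State]]:
    """Return (all first state data, goal-near second state data).

    Assumption: a trajectory flagged in `reached_goal` ends at the learning
    goal, so state i of it is (len(traj) - 1 - i) steps away from the goal.
    """
    first_states: List[State] = []
    second_states: List[State] = []
    for traj, success in zip(trajectories, reached_goal):
        first_states.extend(traj)  # every visited state counts as first state data
        if not success:
            continue  # distance to the goal is unknown for unsuccessful episodes
        last = len(traj) - 1
        for i, state in enumerate(traj):
            steps_to_goal = last - i
            if min_steps <= steps_to_goal <= max_steps:
                second_states.append(state)
    return first_states, second_states


# Two toy 1-D trajectories: the first reaches the goal at (3.0,), the second does not.
demo = [[(0.0,), (1.0,), (2.0,), (3.0,)], [(0.0,), (-1.0,)]]
first, second = select_second_states(demo, [True, False], min_steps=1, max_steps=2)
print(len(first), second)  # 6 [(1.0,), (2.0,)]
```

Only successful trajectories contribute second state data here; how the patent handles episodes that never reach the goal is not stated in this excerpt.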


Abstract

The invention discloses a state data generation method and system for reinforcement learning. The method comprises: obtaining all first state data of an agent in a first learning stage, and obtaining, from the first state data, second state data whose distance to a learning target falls within a preset step-count range; training a variational autoencoder with the first state data to obtain a trained encoder of the variational autoencoder, and sampling to obtain a plurality of first latent variables; inputting the second state data into the trained encoder to obtain a plurality of second latent variables; selecting third latent variables that meet a preset condition from the first latent variables and the second latent variables; and inputting the third latent variables into the decoder of the variational autoencoder to generate initial state data for a second learning stage. Because new state data are generated, the agent starts exploring from the new state data instead of the original state data with a certain probability, which shortens exploration time and increases reward frequency. The method is applicable to any reinforcement learning method.
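The latent-space steps of the abstract can be sketched as follows, under loudly stated assumptions. The trained VAE is represented by two placeholder callables, `encode(state) -> (mu, logvar)` and `decode(z) -> state`, standing in for real networks. The first latent variables are drawn from the standard normal prior, and the second latent variables are encoder samples via the usual reparameterization trick. The abstract does not specify the "preset condition", so the sketch substitutes a hypothetical rule: keep any candidate latent that lies within Euclidean distance `radius` of at least one goal-near latent. All function and parameter names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[Sequence[float]], Tuple[Vector, Vector]]  # state -> (mu, logvar)
Decoder = Callable[[Sequence[float]], Vector]                  # latent -> state


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> Vector:
    """Standard VAE reparameterization: z = mu + exp(0.5 * logvar) * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def generate_initial_states(
    encode: Encoder,
    decode: Decoder,
    second_states: Sequence[Sequence[float]],
    latent_dim: int,
    num_prior_samples: int,
    radius: float,
    seed: int = 0,
) -> List[Vector]:
    rng = random.Random(seed)
    # First latent variables: samples from the VAE's standard normal prior.
    first_latents = [
        [rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(num_prior_samples)
    ]
    # Second latent variables: encodings of the goal-near (second) state data.
    second_latents = []
    for state in second_states:
        mu, logvar = encode(state)
        second_latents.append(reparameterize(mu, logvar, rng))
    # Third latent variables: HYPOTHETICAL preset condition, keeping any candidate
    # within `radius` of at least one goal-near latent.
    third_latents = [
        z for z in first_latents + second_latents
        if any(math.dist(z, g) <= radius for g in second_latents)
    ]
    # Decode the selected latents into initial states for the second learning stage.
    return [decode(z) for z in third_latents]


def toy_encode(state: Sequence[float]) -> Tuple[Vector, Vector]:
    """Toy stand-in for a trained encoder: identity mean, near-zero variance."""
    return list(state), [-20.0] * len(state)


def toy_decode(z: Sequence[float]) -> Vector:
    """Toy stand-in for a trained decoder: identity mapping."""
    return list(z)


states = generate_initial_states(toy_encode, toy_decode, [[0.0, 0.0]],
                                 latent_dim=2, num_prior_samples=1000, radius=0.5)
```

With the toy identity VAE, every generated state lies near the single goal-near state at the origin. Swapping in the condition actually defined in the full specification would only change the `third_latents` filter.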

Description

Technical Field

[0001] This application relates to the field of deep reinforcement learning, and in particular to a method and system for generating state data for reinforcement learning.

Background

[0002] Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making tasks such as Go and robotic-arm control. In Reinforcement Learning (RL), the agent observes the state of the environment, selects the action with the greatest expected reward, and receives feedback from the environment. The agent is trained by temporal-difference learning, by policy gradients, or by an actor-critic algorithm that combines the two. In real-world applications, however, a common problem is that rewards are sparse: some tasks yield a reward only when the goal is completed, and the reward is zero in all other cases. Moreover, the goal is difficult to reach with traditional exploration strategies, which brings grea...
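To make the sparse-reward problem concrete, here is a toy sketch that is not taken from the patent: a 1-D chain where a reward of 1 is given only on reaching the last cell, explored by a uniformly random policy. With probability `p_generated`, an episode starts from a state drawn from a pool of near-goal states (standing in for generated initial states) instead of the original start state 0. The chain length, horizon, pool, and function name are all illustrative assumptions.

```python
import random
from typing import Sequence


def run_episodes(
    num_episodes: int,
    chain_length: int,
    horizon: int,
    p_generated: float,
    generated_starts: Sequence[int],
    seed: int = 0,
) -> int:
    """Return how many episodes obtain the sparse reward (reach the last cell)."""
    rng = random.Random(seed)
    goal = chain_length - 1
    rewarded = 0
    for _ in range(num_episodes):
        # Mixed reset: a generated start with probability p_generated, else the original start 0.
        if generated_starts and rng.random() < p_generated:
            state = rng.choice(generated_starts)
        else:
            state = 0
        for _ in range(horizon):
            # Uniformly random policy; the position is clipped at the left wall.
            state = max(0, state + rng.choice((-1, 1)))
            if state == goal:
                rewarded += 1  # reward 1 only at the goal, 0 everywhere else
                break
    return rewarded


baseline = run_episodes(1000, chain_length=21, horizon=30, p_generated=0.0, generated_starts=[])
mixed = run_episodes(1000, chain_length=21, horizon=30, p_generated=0.5,
                     generated_starts=[17, 18, 19])
print(baseline, mixed)  # the mixed-reset run is rewarded far more often
```

From state 0 a random walk almost never covers 20 cells within 30 steps, so the baseline sees almost no reward signal, while episodes started near the goal are rewarded most of the time.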

Claims


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06N20/00

CPC: G06N20/00

Inventors: 卢宗青 (Lu Zongqing), 姜杰川 (Jiang Jiechuan)

Owner: PEKING UNIV
