State data generation method and system for reinforcement learning

A state data generation technology for reinforcement learning, applied in the field of deep reinforcement learning. It addresses problems such as instability in the training process, and achieves the effects of shortening the time required for exploration, reducing instability, and increasing the number of rewards.

Active Publication Date: 2019-07-19
PEKING UNIV

AI Technical Summary

Problems solved by technology

However, introducing intrinsic rewards shifts the objective away from the original goal and can easily make the training process unstable.



Examples


Detailed Description of Embodiments

[0041] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0042] According to an embodiment of the present application, a method for generating state data for reinforcement learning is proposed. As shown in Figure 1, the method includes:

[0043] S101. Obtain all first state data of the agent in the first learning stage, and from that first state data obtain the second state data, namely the states that lie within a preset range of steps from the learning goal;

[0044] S102. Use all the first state data to train a vari...
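Step S101 can be illustrated with a minimal sketch. It is not the patent's implementation: the function name `select_near_goal_states`, the episode representation, and the choice to measure "steps from the learning goal" as the number of remaining steps in an episode that reached the goal are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
# Assumed representation: (states visited in order, whether the episode reached the goal).
Episode = Tuple[Sequence[State], bool]


def select_near_goal_states(
    episodes: Sequence[Episode],
    min_steps: int,
    max_steps: int,
) -> Tuple[List[State], List[State]]:
    """Return (first state data, second state data).

    First state data: every state seen in the first learning stage.
    Second state data: states whose remaining step count to the goal lies
    in [min_steps, max_steps]. Only episodes that reached the goal contribute,
    which is an assumption of this sketch.
    """
    first_states: List[State] = []
    second_states: List[State] = []
    for states, reached_goal in episodes:
        last_index = len(states) - 1
        for t, state in enumerate(states):
            first_states.append(state)
            steps_to_goal = last_index - t
            if reached_goal and min_steps <= steps_to_goal <= max_steps:
                second_states.append(state)
    return first_states, second_states
```

For example, with one successful four-state episode and a preset range of 1 to 2 steps, the two states immediately before the final state would be selected as second state data.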


Abstract

The invention discloses a state data generation method and system for reinforcement learning. The method comprises: obtaining all first state data of an agent in a first learning stage, and obtaining, from the first state data, second state data whose distance from the learning goal falls within a preset range of steps; training a variational autoencoder with the first state data to obtain its trained encoder, and sampling to obtain a plurality of first latent variables; inputting the second state data into the trained encoder to obtain a plurality of second latent variables; selecting, from the first and second latent variables, third latent variables that meet a preset condition; and inputting the third latent variables into the decoder of the variational autoencoder to generate initial state data for the second learning stage. With the newly generated state data, the agent starts exploring from the new state data, rather than from the original state data, with a certain probability. This shortens exploration time and increases reward frequency, and the method is applicable to any reinforcement learning method.
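The latent-variable steps of the abstract can be sketched as follows. This is a sketch under stated assumptions, not the patented method: `encode` and `decode` stand in for the trained variational autoencoder, the first latent variables are assumed to be drawn from a standard normal prior, and the "preset condition" (which the abstract does not specify) is replaced by a hypothetical rule that keeps first latent variables lying within `max_distance` of some second latent variable. All function names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def select_latents(first: List[Vector], second: List[Vector],
                   max_distance: float) -> List[Vector]:
    """Hypothetical preset condition: keep first latents near a second latent."""
    if not second:
        return []
    return [z for z in first
            if min(math.dist(z, g) for g in second) <= max_distance]


def generate_initial_states(encode: Callable[[Vector], Vector],
                            decode: Callable[[Vector], Vector],
                            second_states: Sequence[Vector],
                            num_samples: int, latent_dim: int,
                            max_distance: float,
                            rng: random.Random) -> List[Vector]:
    # First latent variables: sampled from an assumed standard normal prior.
    first = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
             for _ in range(num_samples)]
    # Second latent variables: encodings of the near-goal states.
    second = [encode(s) for s in second_states]
    # Third latent variables: those meeting the (hypothetical) condition.
    third = select_latents(first, second, max_distance)
    # Decoding yields candidate initial states for the second learning stage.
    return [decode(z) for z in third]


def choose_start_state(default_state, generated_states, probability: float,
                       rng: random.Random):
    """Start from a generated state with the given probability, else the original."""
    if generated_states and rng.random() < probability:
        return rng.choice(generated_states)
    return default_state
```

In use, `choose_start_state` would be called at the start of each episode of the second learning stage, so that the agent begins some episodes from generated states and the rest from the environment's original initial state.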

Description

Technical Field

[0001] This application relates to the field of deep reinforcement learning, in particular to a method and system for generating state data for reinforcement learning.

Background

[0002] Deep reinforcement learning (DRL) has achieved remarkable success in sequential decision-making tasks such as Go and robotic arm control. In reinforcement learning (RL), the agent observes the state of the environment, selects the action with the greatest expected reward, and receives the feedback given by the environment. The agent is trained by temporal difference learning or policy gradient, or by an actor-critic algorithm that combines the two. However, in real-world applications, a common problem is that rewards are sparse. Some tasks yield a reward only when the goal is completed, and the reward is zero in all other cases. The goal is also difficult to reach through traditional exploration strategies, which brings grea...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00
CPC: G06N20/00
Inventors: 卢宗青, 姜杰川
Owner: PEKING UNIV