A method to improve the efficiency of reinforcement learning exploration under the condition of limited resources
The method applies to reinforcement learning under resource constraints. It addresses resource-constrained reinforcement learning problems, improving exploration efficiency while saving resources.
Examples
Embodiment 1
[0054] As shown in Figure 1, this embodiment provides a method for improving the exploration efficiency of reinforcement learning under limited resources, namely the Resource-Aware Exploration Bonus (RAEB) method. The technical scheme of the invention is explained in detail as follows:
[0055] Task background:
[0056] Given a decision-making task in a real-world application, the problem can be modeled as a Markov decision process (MDP), represented by the tuple $(\mathcal{S}, \mathcal{A}, p, r, \gamma)$, where $\mathcal{S}$ is the state space and $\mathcal{A}$ is the action space, both of which are continuous; $p$ is the state transition probability density, $r$ is a deterministic reward function, and $\gamma$ is the discount factor. A policy, that is, a mapping from the state space to probability distributions over the action space, is denoted $\pi$; $\pi(\cdot \mid s)$ is the probability density function over the action space; the set of feasible policies is denoted $\Pi$; T...
Embodiment 2
[0095] This embodiment verifies the effect of the method of the present invention in simulated environments in which a robot transports goods. Specifically, a series of robot delivery tasks is designed based on the classic control tasks in OpenAI Gym and the robot locomotion tasks in MuJoCo. The delivery environment of each simulated robot is as follows:
[0096] The first is Delivery Mountain Car (see Figure 2): a mountain-car robot carrying goods is controlled to climb a hill in a two-dimensional plane. The robot may choose to unload or not unload at any location, and the goal is for it to unload the goods at the top of the hill. The state space is 3-dimensional and the action space is 2-dimensional.
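To make the shape of the Delivery Mountain Car task concrete, the following is a minimal Python sketch, not the code used in the embodiment. The text only states that the state is 3-dimensional and the action 2-dimensional. The sketch assumes that the third state component is the remaining goods, that the second action component is an unload signal, that unloading at the top yields a sparse reward of 1, and that the episode ends once the goods are used up. The motion constants are borrowed from Gym's continuous Mountain Car task; the class name `DeliveryMountainCarSketch` is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DeliveryMountainCarSketch:
    """Toy stand-in for the Delivery Mountain Car task (illustrative only)."""

    goods: int = 1              # assumed: units of cargo available per episode
    position: float = -0.5
    velocity: float = 0.0
    # Constants borrowed from Gym's continuous Mountain Car dynamics.
    min_position: float = -1.2
    max_position: float = 0.6
    max_speed: float = 0.07
    goal_position: float = 0.45
    power: float = 0.0015

    def state(self):
        # 3-dimensional state: position, velocity, remaining goods (assumed).
        return (self.position, self.velocity, float(self.goods))

    def step(self, action):
        # 2-dimensional action: push force and unload signal (assumed).
        force, unload = action
        force = max(-1.0, min(1.0, force))

        # Standard Mountain Car motion: engine force minus gravity along the slope.
        self.velocity += force * self.power - 0.0025 * math.cos(3 * self.position)
        self.velocity = max(-self.max_speed, min(self.max_speed, self.velocity))
        self.position += self.velocity
        self.position = max(self.min_position, min(self.max_position, self.position))
        if self.position == self.min_position and self.velocity < 0:
            self.velocity = 0.0

        reward = 0.0
        if unload > 0.0 and self.goods > 0:
            self.goods -= 1     # unloading consumes cargo wherever it happens
            if self.position >= self.goal_position:
                reward = 1.0    # assumed sparse reward for unloading at the top
        done = self.goods == 0  # assumed: episode ends when the cargo is gone
        return self.state(), reward, done
```

Under these assumptions, an agent that unloads at the bottom of the hill spends its only unit of goods, receives no reward, and ends the episode, which illustrates why exploration that ignores the resource can be wasteful in this kind of task.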
[0097] The second is Delivery Ant: an ant-shaped (spider-like) robot carrying goods is controlled to move forward along a corridor, and the robot can choose...


