A method to improve the efficiency of reinforcement learning exploration under the condition of limited resources

A reinforcement learning technology for resource-constrained settings, applied to improving the exploration efficiency of reinforcement learning. It addresses resource-constrained reinforcement learning problems and achieves improved exploration efficiency and resource savings.

Active Publication Date: 2022-07-15

UNIV OF SCI & TECH OF CHINA

14 Cites, 0 Cited by

Problems solved by technology

Existing techniques therefore struggle to solve resource-constrained reinforcement learning problems.

Examples

Embodiment 1

[0054] As shown in Figure 1, this embodiment provides a method for improving the exploration efficiency of reinforcement learning under the condition of limited resources, namely the Resource-Aware Exploration Bonus (RAEB) method. The technical scheme of the invention is explained in detail below:

[0055] Task background:

[0056] Given a decision-making task in a real-world application, the problem can be modeled as a Markov decision process, represented by the tuple $(\mathcal{S}, \mathcal{A}, p, r, \gamma)$, where $\mathcal{S}$ is the state space and $\mathcal{A}$ is the action space, both of which are continuous; $p$ is the state transition probability density; $r$ is a deterministic reward function; and $\gamma$ is the discount factor. The policy, that is, the mapping from the state space to probability distributions over the action space, is denoted $\pi$; $\pi(\cdot \mid s)$ is the probability density function on the action space; the set of feasible policies is denoted $\Pi$; T...
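To make the policy definition in [0056] concrete, the following is a minimal sketch of a stochastic policy over a continuous, one-dimensional action space. The Gaussian form, the linear mean, and the names `GaussianPolicy`, `weight`, and `std` are illustrative assumptions; none of them come from the patent text.

```python
import math
import random


class GaussianPolicy:
    """Hypothetical policy: maps a state to a Gaussian density over actions."""

    def __init__(self, weight: float, std: float):
        self.weight = weight  # assumed linear map from state to mean action
        self.std = std        # assumed fixed standard deviation

    def mean(self, state: float) -> float:
        return self.weight * state

    def density(self, action: float, state: float) -> float:
        """Probability density of `action` given `state`."""
        z = (action - self.mean(state)) / self.std
        return math.exp(-0.5 * z * z) / (self.std * math.sqrt(2.0 * math.pi))

    def sample(self, state: float, rng: random.Random) -> float:
        """Draw one action from the distribution the policy assigns to `state`."""
        return rng.gauss(self.mean(state), self.std)


policy = GaussianPolicy(weight=0.5, std=1.0)
rng = random.Random(0)
action = policy.sample(state=2.0, rng=rng)
print(action, policy.density(action, state=2.0))
```

For each fixed state the density integrates to one over the action space, which is what makes the policy a mapping from states to probability distributions over actions.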

Embodiment 2

[0095] This embodiment verifies the effect of the method of the present invention in simulated environments in which a robot transports goods. Specifically, a series of robot delivery tasks are designed based on the classic control tasks in OpenAI Gym and the robot locomotion tasks in MuJoCo. The simulated delivery environments are as follows:

[0096] The first is Delivery Mountain Car (see Figure 2): a mountain-car-shaped robot is controlled to climb a hill in a two-dimensional plane. The robot can choose to unload or not unload its goods at any location, and the goal is for it to unload the goods at the top of the hill. The state space is 3-dimensional and the action space is 2-dimensional.

[0097] The second is Delivery Ant: a spider-shaped robot is controlled to move forward in a corridor, and the robot can choose...
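As a rough illustration of the Delivery Mountain Car task in [0096], the sketch below builds a toy environment with a 3-dimensional state (position, velocity, remaining goods) and a 2-dimensional action (driving force, unload signal). The class name, the dynamics constants, the reward of 1 for unloading at the hilltop, and the rule that the episode ends after unloading are all assumptions made for illustration; the patent text shown here does not specify them.

```python
import math
from dataclasses import dataclass


@dataclass
class DeliveryMountainCarSketch:
    """Toy stand-in for the Delivery Mountain Car task (all constants assumed)."""

    position: float = -0.5
    velocity: float = 0.0
    goods: float = 1.0           # remaining resource; it cannot be refilled
    goal_position: float = 0.45  # assumed location of the hilltop

    def state(self):
        # 3-dimensional state: position, velocity, remaining goods.
        return (self.position, self.velocity, self.goods)

    def step(self, action):
        # 2-dimensional action: (force in [-1, 1], unload signal in [0, 1]).
        force, unload = action
        force = max(-1.0, min(1.0, force))
        # Assumed hill dynamics: engine force minus a gravity term.
        self.velocity += 0.0015 * force - 0.0025 * math.cos(3.0 * self.position)
        self.velocity = max(-0.07, min(0.07, self.velocity))
        self.position = max(-1.2, min(0.6, self.position + self.velocity))

        reward, done = 0.0, False
        if unload > 0.5 and self.goods > 0.0:
            self.goods = 0.0  # unloading consumes the only unit of goods
            reward = 1.0 if self.position >= self.goal_position else 0.0
            done = True  # assumed: the episode ends once the goods are gone
        return self.state(), reward, done


env = DeliveryMountainCarSketch()
print(env.step((1.0, 1.0)))  # unloading in the valley wastes the goods
```

The sketch shows why exploration is hard here: an agent that tries the unload action early spends its only resource and can no longer reach the rewarding outcome in that episode.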

Abstract

The invention discloses a method for improving the exploration efficiency of reinforcement learning under the condition of limited resources. The method includes: obtaining the new state and the corresponding extrinsic reward after the agent executes its action; Step 3, calculating, in a resource-aware manner, the intrinsic reward used by the agent to explore the environment; Step 4, calculating the agent's overall reward according to the overall reward formula; Step 5, updating the policy network of the agent's reinforcement learning algorithm according to the overall reward; Step 6, judging whether the cumulative value of all extrinsic rewards obtained in the agent's current round of environment exploration has been maximized; if not, returning to Step 1 to repeat the process; if so, ending the current round of environment exploration. The method can improve the exploration efficiency of mainstream reinforcement learning methods applied by agents under the condition of limited resources.
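The reward computation in Steps 3 and 4 can be sketched as follows. The abstract as reproduced here does not include the actual formulas, so everything concrete in this sketch is an assumption: the count-based novelty term, the way the bonus is scaled by the remaining resource, the additive combination with weight `beta`, and all function names. It is meant only to show the shape of the computation, not the patented RAEB formula.

```python
from collections import defaultdict


def novelty_bonus(visit_counts, state_key):
    """Assumed count-based novelty: 1 / sqrt(number of visits so far)."""
    visit_counts[state_key] += 1
    return visit_counts[state_key] ** -0.5


def resource_aware_bonus(novelty, resource_left, resource_used):
    """Assumed resource-aware scaling of the novelty bonus (Step 3).

    Actions that consume no resource keep the full bonus. Actions that
    consume resource get a smaller bonus the less resource remains.
    """
    if resource_used <= 0.0:
        return novelty
    return novelty * resource_left / (resource_left + resource_used)


def total_reward(extrinsic, intrinsic, beta=0.1):
    """Assumed combination (Step 4): extrinsic reward plus weighted bonus."""
    return extrinsic + beta * intrinsic


def shaped_rewards(transitions, beta=0.1):
    """Compute the overall reward for each observed transition.

    Each transition is (state_key, extrinsic, resource_left, resource_used),
    where resource_left is measured after the action was executed.
    """
    visit_counts = defaultdict(int)
    rewards = []
    for state_key, extrinsic, resource_left, resource_used in transitions:
        novelty = novelty_bonus(visit_counts, state_key)
        intrinsic = resource_aware_bonus(novelty, resource_left, resource_used)
        rewards.append(total_reward(extrinsic, intrinsic, beta))
    return rewards  # Step 5 would pass these to the policy network update


transitions = [("s0", 0.0, 1.0, 0.0), ("s0", 0.0, 0.0, 1.0), ("s1", 1.0, 0.0, 0.0)]
print(shaped_rewards(transitions))
```

In this toy run the second transition spends the last unit of resource, so its bonus drops to zero, while the resource-free transitions keep their full novelty bonus. The policy update in Step 5 is left out because it depends on the underlying reinforcement learning algorithm.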

Description

Technical field

[0001] The invention relates to the field of reinforcement learning for agents, and in particular to a method for improving the exploration efficiency of reinforcement learning under the condition of limited resources.

Background

[0002] Reinforcement learning (RL) has been widely used in autonomous vehicles, intelligent robots, and other intelligent agents. Reinforcement learning methods have a powerful ability to learn complex behaviors, so their application has recently attracted widespread attention. In many real-world tasks, such as autonomous driving, intelligent robotic missions, military deployment, game AI, and business decision-making, performing actions consumes certain types of resources. In autonomous driving, for example, acceleration consumes the car's fuel. Furthermore, resources may be scarce and non-replenishable. In video games, certain actions that can significantly affect the final score re...

Application Information

Patent Type & Authority: Patents (China)

IPC (8): G06N20/00

CPC: G06N20/00

Inventor: 王杰, 王治海, 潘涛星, 周祺, 李厚强

Owner: UNIV OF SCI & TECH OF CHINA
