Robot path exploration method based on double-agent competitive reinforcement learning

A reinforcement learning and intelligent-agent technology, applied in instruments, computer components, biological neural network models, etc. It addresses problems such as the difficulty of designing an ideal reward function and susceptibility to random noise, with the effect of solving the sparse-reward problem and achieving strong robustness.

Pending Publication Date: 2022-04-19
TONGJI UNIV

AI Technical Summary

Problems solved by technology

[0004] 1. Hand-coded reward functions: a large amount of relevant domain knowledge is introduced to construct a reward function that guides the agent to explore and learn along the ideal trajectory. However, this approach requires a sufficiently deep understanding of the domain, and it is difficult to design an ideal reward function for a given environment.
[0005] 2. Imitation learning, that is, …



Examples

Embodiment

[0040] The present invention is a robot path exploration method based on dual-agent competitive reinforcement learning. It uses the similarity of the states explored by two agents (in this example, two identical robots, or a single robot in a repeatable experimental environment) to generate intrinsic rewards, thereby strengthening the agents' exploration. This removes the need for complex reward-function design, avoids introducing domain knowledge, and exploits the determinism of the agents' explored trajectories to eliminate the influence of random noise.
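This excerpt does not fix a concrete similarity measure, so the following is a minimal sketch of the competitive intrinsic-reward idea, assuming discrete grid states and a Jaccard overlap between the sets of visited states; `trajectory_similarity` and `intrinsic_rewards` are illustrative names, not taken from the patent.

```python
# Minimal sketch of the competitive intrinsic reward (assumptions:
# discrete grid states, Jaccard overlap as the similarity measure).

def trajectory_similarity(traj1, traj2):
    """Similarity in [0, 1] between two k-step trajectories."""
    s1, s2 = set(traj1), set(traj2)
    if not s1 or not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)  # Jaccard overlap of visited states

def intrinsic_rewards(traj1, traj2):
    """Similarity is Agent1's extra reward; its opposite is Agent2's."""
    sim = trajectory_similarity(traj1, traj2)
    return sim, -sim

# Example: two 3-step trajectories that share two grid cells
r1, r2 = intrinsic_rewards([(0, 0), (0, 1), (1, 1)],
                           [(0, 0), (1, 0), (1, 1)])
```

On one reading of the noise-robustness claim: because both agents roll out from the same recorded state, differences between the two trajectories reflect policy differences rather than random resets, which is what lets the scheme cancel out random noise.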

[0041] The framework of this method is shown in Figure 1. The verification environment is the MultiRooms trajectory exploration environment from a standard reinforcement learning test suite: a series of exploration environments containing 4 to 6 rooms are randomly generated on a two-dimensional grid plane, as shown in Figure 2. The tr…
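For reference, environments of this kind are available as the MultiRoom tasks in the open-source gym-minigrid test suite. A short setup sketch follows; the specific environment ID and the pre-gymnasium step API are assumptions, since the excerpt does not name the exact variant used.

```python
# Sketch: a MultiRoom grid environment of the kind described
# (randomly generated connected rooms on a 2-D grid). Assumes the
# gym-minigrid package; the authors' exact environment ID is not stated.
import gym
import gym_minigrid  # noqa: F401 -- importing registers the MiniGrid-* envs

env = gym.make("MiniGrid-MultiRoom-N6-v0")  # 6 connected rooms
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```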

Abstract

The invention relates to a robot path exploration method based on double-agent competitive reinforcement learning. The method comprises the following steps:
  • S1: construct a Markov decision model and initialize the agents and the experience pools;
  • S2: record the current state st of agent Agent1, explore k steps, and record the current trajectory sequence to experience pool Buffer1;
  • S3: place agent Agent2 at state st, have it explore k steps, and record the current trajectory sequence to experience pool Buffer2;
  • S4: take the similarity between the two exploration trajectories as an additional reward for Agent1, and its opposite number as an additional reward for Agent2;
  • S5: update the policies of Agent1 and Agent2 when the amount of data in the experience pools meets the requirement;
  • S6: repeat steps S2-S5 until Agent1 reaches the target state or the set time tlimit is exceeded;
  • S7: repeat steps S1-S6 until the set number of training episodes is completed.
Compared with the prior art, the method enables the agent to explore more effectively, speeds up training, improves sample utilization efficiency, effectively eliminates random noise, and is more robust.
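Read as pseudocode, steps S1-S7 form a nested training loop. The sketch below lays it out in Python; the Env/Agent interfaces (env.set_state, agent.act, agent.add_reward, agent.update), the rollout helper, and batch_size are placeholders assumed for illustration, and trajectory_similarity is the overlap measure sketched earlier, not the patent's definition.

```python
# Schematic skeleton of steps S1-S7 (all interfaces are assumed placeholders).

def rollout(env, agent, state, k):
    """Run `agent` for k steps from `state`; return (trajectory, state, done)."""
    traj, done = [], False
    for _ in range(k):
        state, reward, done, _ = env.step(agent.act(state))
        traj.append(state)
        if done:
            break
    return traj, state, done

def train(env, agent1, agent2, k, t_limit, num_episodes, batch_size):
    for _ in range(num_episodes):                  # S7: repeat S1-S6 per episode
        buffer1, buffer2 = [], []                  # S1: initialize experience pools
        s_t, t, done = env.reset(), 0, False
        while not done and t < t_limit:            # S6: until goal or t_limit
            # S2: Agent1 explores k steps from s_t; trajectory -> Buffer1
            traj1, s_next, done = rollout(env, agent1, s_t, k)
            buffer1.append(traj1)
            # S3: Agent2 is placed at s_t and explores k steps; -> Buffer2
            env.set_state(s_t)                     # assumes a resettable environment
            traj2, _, _ = rollout(env, agent2, s_t, k)
            buffer2.append(traj2)
            # S4: similarity rewards Agent1; its opposite rewards Agent2
            sim = trajectory_similarity(traj1, traj2)
            agent1.add_reward(traj1, +sim)
            agent2.add_reward(traj2, -sim)
            # S5: update both policies once enough data has accumulated
            if len(buffer1) >= batch_size:
                agent1.update(buffer1)
                agent2.update(buffer2)
                buffer1.clear()
                buffer2.clear()
            env.set_state(s_next)                  # restore Agent1's position
            s_t, t = s_next, t + k
```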

Description

Technical field
[0001] The invention relates to the field of robot trajectory planning, and in particular to a robot path exploration method based on dual-agent competitive reinforcement learning.
Background technique
[0002] Reinforcement learning has made remarkable achievements in the field of robot control, but it rests on a reward mechanism: the agent's goal is to maximize cumulative reward. Most existing reinforcement learning scenarios for robot path exploration are sparse-reward environments, meaning the agent receives a positive reward only upon reaching the final goal and none otherwise; an agent that receives no feedback lacks an effective mechanism to update its own policy and converge to the ideal policy.
[0003] The current solutions to sparse rewards are as follows:
[0004] 1. Hand-coded reward functions, by int…

Application Information

IPC(8): G06K9/62; G06N3/02
CPC: G06N3/02; G06F18/295; G06F18/214
Inventors: 刘成菊 (Liu Chengju), 陈启军 (Chen Qijun), 张浩 (Zhang Hao)
Owner: TONGJI UNIV