Robot path exploration method based on double-agent competitive reinforcement learning

A reinforcement learning and intelligent-agent technology, applied in instruments, computer components, biological neural network models, etc. It addresses problems such as the difficulty of designing an ideal reward function and susceptibility to random noise, with the effect of solving the sparse-reward problem and achieving strong robustness.

Pending Publication Date: 2022-04-19
TONGJI UNIV

AI Technical Summary

Problems solved by technology

[0004] 1. Hand-coded reward functions: a large amount of relevant domain knowledge is introduced to construct a reward function that guides the agent to explore and learn along the ideal trajectory. However, this approach requires a sufficiently deep understanding of the domain, and it is difficult to design an ideal reward function for a given environment.
[0005] 2. Imitation learning, that is, …



Examples

Embodiment

[0040] The present invention is a robot path exploration method based on dual-agent competitive reinforcement learning. It uses the similarity of the states explored by two agents (in this example, two identical robots, or a single robot in a repeatable experimental environment) to generate intrinsic rewards, thereby strengthening the agents' exploration. This removes the need for complex reward-function design, avoids introducing domain knowledge, and exploits the determinism of the agents' explored trajectories to eliminate the influence of random noise.
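This excerpt does not fix a concrete similarity measure, so the following is a minimal sketch of the competitive intrinsic-reward idea, assuming discrete grid states and a Jaccard overlap between the sets of visited states; `trajectory_similarity` and `intrinsic_rewards` are illustrative names, not taken from the patent.

```python
# Minimal sketch of the competitive intrinsic reward (assumptions:
# discrete grid states, Jaccard overlap as the similarity measure).

def trajectory_similarity(traj1, traj2):
    """Similarity in [0, 1] between two k-step trajectories."""
    s1, s2 = set(traj1), set(traj2)
    if not s1 or not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)  # Jaccard overlap of visited states

def intrinsic_rewards(traj1, traj2):
    """Similarity is Agent1's extra reward; its opposite is Agent2's."""
    sim = trajectory_similarity(traj1, traj2)
    return sim, -sim

# Example: two 3-step trajectories that share two grid cells
r1, r2 = intrinsic_rewards([(0, 0), (0, 1), (1, 1)],
                           [(0, 0), (1, 0), (1, 1)])
```

On one reading of the noise-robustness claim: because both agents roll out from the same recorded state, differences between the two trajectories reflect policy differences rather than random resets, which is what lets the scheme cancel out random noise.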

[0041] The framework of this method is shown in Figure 1. The verification environment is the MultiRooms trajectory exploration environment from a standard reinforcement learning test suite: a series of exploration environments containing 4 to 6 rooms are randomly generated on a two-dimensional grid plane, as shown in Figure 2. The tr…
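For reference, environments of this kind are available as the MultiRoom tasks in the open-source gym-minigrid test suite. A short setup sketch follows; the specific environment ID and the pre-gymnasium step API are assumptions, since the excerpt does not name the exact variant used.

```python
# Sketch: a MultiRoom grid environment of the kind described
# (randomly generated connected rooms on a 2-D grid). Assumes the
# gym-minigrid package; the authors' exact environment ID is not stated.
import gym
import gym_minigrid  # noqa: F401 -- importing registers the MiniGrid-* envs

env = gym.make("MiniGrid-MultiRoom-N6-v0")  # 6 connected rooms
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```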

Abstract

The invention relates to a robot path exploration method based on double-agent competitive reinforcement learning. The method comprises the following steps:
  • S1: construct a Markov decision model and initialize the agents and the experience pools;
  • S2: record the current state st of agent Agent1, explore k steps, and record the current trajectory sequence to experience pool Buffer1;
  • S3: place agent Agent2 at state st, have it explore k steps, and record the current trajectory sequence to experience pool Buffer2;
  • S4: take the similarity between the two exploration trajectories as an additional reward for Agent1, and its opposite number as an additional reward for Agent2;
  • S5: update the policies of Agent1 and Agent2 when the amount of data in the experience pools meets the requirement;
  • S6: repeat steps S2-S5 until Agent1 reaches the target state or the set time tlimit is exceeded;
  • S7: repeat steps S1-S6 until the set number of training episodes is completed.
Compared with the prior art, the method enables the agent to explore more effectively, speeds up training, improves sample utilization efficiency, effectively eliminates random noise, and is more robust.
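Read as pseudocode, steps S1-S7 form a nested training loop. The sketch below lays it out in Python; the Env/Agent interfaces (env.set_state, agent.act, agent.add_reward, agent.update), the rollout helper, and batch_size are placeholders assumed for illustration, and trajectory_similarity is the overlap measure sketched earlier, not the patent's definition.

```python
# Schematic skeleton of steps S1-S7 (all interfaces are assumed placeholders).

def rollout(env, agent, state, k):
    """Run `agent` for k steps from `state`; return (trajectory, state, done)."""
    traj, done = [], False
    for _ in range(k):
        state, reward, done, _ = env.step(agent.act(state))
        traj.append(state)
        if done:
            break
    return traj, state, done

def train(env, agent1, agent2, k, t_limit, num_episodes, batch_size):
    for _ in range(num_episodes):                  # S7: repeat S1-S6 per episode
        buffer1, buffer2 = [], []                  # S1: initialize experience pools
        s_t, t, done = env.reset(), 0, False
        while not done and t < t_limit:            # S6: until goal or t_limit
            # S2: Agent1 explores k steps from s_t; trajectory -> Buffer1
            traj1, s_next, done = rollout(env, agent1, s_t, k)
            buffer1.append(traj1)
            # S3: Agent2 is placed at s_t and explores k steps; -> Buffer2
            env.set_state(s_t)                     # assumes a resettable environment
            traj2, _, _ = rollout(env, agent2, s_t, k)
            buffer2.append(traj2)
            # S4: similarity rewards Agent1; its opposite rewards Agent2
            sim = trajectory_similarity(traj1, traj2)
            agent1.add_reward(traj1, +sim)
            agent2.add_reward(traj2, -sim)
            # S5: update both policies once enough data has accumulated
            if len(buffer1) >= batch_size:
                agent1.update(buffer1)
                agent2.update(buffer2)
                buffer1.clear()
                buffer2.clear()
            env.set_state(s_next)                  # restore Agent1's position
            s_t, t = s_next, t + k
```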

Description

Technical field
[0001] The invention relates to the field of robot trajectory planning, and in particular to a robot path exploration method based on dual-agent competitive reinforcement learning.
Background technique
[0002] Reinforcement learning has made remarkable achievements in the field of robot control, but it rests on a reward mechanism: the agent's goal is to maximize cumulative reward. Most existing reinforcement learning scenarios for robot path exploration are sparse-reward environments, meaning the agent receives a positive reward only upon reaching the final goal and none otherwise; an agent that receives no feedback lacks an effective mechanism to update its own policy and converge to the ideal policy.
[0003] The current solutions to sparse rewards are as follows:
[0004] 1. Hand-coded reward functions, by int…

Application Information

IPC(8): G06K9/62; G06N3/02
CPC: G06N3/02; G06F18/295; G06F18/214
Inventors: 刘成菊 (Liu Chengju), 陈启军 (Chen Qijun), 张浩 (Zhang Hao)
Owner: TONGJI UNIV