The invention relates to a robot path exploration method based on dual-agent competitive reinforcement learning. The method comprises the following steps: S1, constructing a Markov decision model and initializing the agents and their experience pools.
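The abstract leaves the environment and the learning algorithm of step S1 unspecified. As a minimal illustrative sketch, the snippet below assumes a toy grid-world Markov decision model and two tabular epsilon-greedy Q-learning agents, each with its own experience pool; every name in it (GridMDP, Agent, buffer1, buffer2) is hypothetical.

```python
import random
from collections import defaultdict, deque

class GridMDP:
    """Toy Markov decision model: deterministic n x n grid with the goal in
    the bottom-right corner, a sparse goal reward, and a small step cost."""
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, n=8):
        self.n = n
        self.goal = (n - 1, n - 1)
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        row = min(max(self.state[0] + dr, 0), self.n - 1)
        col = min(max(self.state[1] + dc, 0), self.n - 1)
        self.state = (row, col)
        done = self.state == self.goal
        return self.state, (1.0 if done else -0.01), done

class Agent:
    """Tabular epsilon-greedy Q-learning agent (an assumed learner)."""
    def __init__(self, n_actions=4, eps=0.1, alpha=0.5, gamma=0.99):
        self.q = defaultdict(float)  # Q[(state, action)] -> value estimate
        self.n_actions, self.eps = n_actions, eps
        self.alpha, self.gamma = alpha, gamma

    def act(self, state):
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[(state, a)])

# S1: one environment copy per agent plus one experience pool each.
env1, env2 = GridMDP(), GridMDP()
agent1, agent2 = Agent(), Agent()
buffer1, buffer2 = deque(maxlen=10_000), deque(maxlen=10_000)
```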
S2, recording the current state st of agent Agent1, letting Agent1 explore k steps, and recording the current trajectory sequence to experience pool Buffer 1. S3, placing agent Agent2 at the state st, letting Agent2 explore k steps, and recording the current trajectory sequence to experience pool Buffer 2.
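Continuing the same sketch, steps S2 and S3 reduce to two k-step rollouts from the shared start state st. Placing Agent2 at Agent1's recorded state assumes the simulator lets its state be set directly (env.state below); the rollout helper and k = 10 are illustrative, and the trajectories are buffered only after the reward shaping of step S4.

```python
def rollout(env, agent, start_state, k):
    """Explore k steps from start_state and return the trajectory as a
    list of (state, action, reward, next_state, done) transitions."""
    env.state = start_state  # teleport to st (assumes a settable simulator)
    trajectory, state = [], start_state
    for _ in range(k):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return trajectory

# S2: record Agent1's current state st and explore k steps from it.
# S3: place Agent2 at the same st and let it explore k steps as well.
st = env1.state
traj1 = rollout(env1, agent1, st, k=10)
traj2 = rollout(env2, agent2, st, k=10)
```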
S4, taking the similarity between the two exploration trajectories as an additional reward for Agent1, and taking its opposite (the negated similarity) as an additional reward for Agent2. S5, updating the policies of Agent1 and Agent2 once the amount of data in the experience pools meets the requirement.
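The abstract does not prescribe a similarity measure for step S4. One simple assumed choice is the Jaccard overlap between the state sets the two trajectories visit, spread evenly over the stored transitions: positive for Agent1 and negated for Agent2, which pushes Agent2 toward regions Agent1 has not covered.

```python
def trajectory_similarity(traj_a, traj_b):
    """Jaccard overlap between the state sets visited by two trajectories
    (an assumed metric; the abstract does not prescribe one)."""
    states_a = {t[0] for t in traj_a} | {t[3] for t in traj_a}
    states_b = {t[0] for t in traj_b} | {t[3] for t in traj_b}
    return len(states_a & states_b) / max(len(states_a | states_b), 1)

def with_bonus(trajectory, bonus):
    """Spread an additional reward evenly over a trajectory's transitions."""
    per_step = bonus / max(len(trajectory), 1)
    return [(s, a, r + per_step, s2, d) for (s, a, r, s2, d) in trajectory]

# S4: the similarity is an extra reward for Agent1 and its opposite for
# Agent2; the shaped trajectories then enter the two experience pools.
sim = trajectory_similarity(traj1, traj2)
buffer1.extend(with_bonus(traj1, +sim))
buffer2.extend(with_bonus(traj2, -sim))
```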
S6, repeating steps S2-S5 until Agent1 reaches the target state or the set time limit tlimit is exceeded. S7, repeating steps S1-S6 until the set number of training episodes is completed.
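Tying steps S5-S7 together, a minimal training loop over the sketches above might look as follows; the minibatch size, episode count, the reading of tlimit as a per-episode wall-clock budget, and the one-step Q-learning update are all assumptions rather than the patented method's actual settings.

```python
import random
import time

BATCH = 32      # "enough data" threshold for a policy update (assumed)
EPISODES = 200  # the set number of training episodes (assumed)
T_LIMIT = 5.0   # per-episode wall-clock budget, standing in for tlimit
K = 10          # exploration horizon per round (the k of steps S2-S3)

def q_update(agent, buffer):
    """S5: one-step Q-learning update over a random minibatch."""
    for s, a, r, s2, done in random.sample(list(buffer), BATCH):
        best_next = max(agent.q[(s2, b)] for b in range(agent.n_actions))
        target = r + (0.0 if done else agent.gamma * best_next)
        agent.q[(s, a)] += agent.alpha * (target - agent.q[(s, a)])

for _ in range(EPISODES):                                    # S7: episode loop
    env1.reset()
    env2.reset()
    start, done = time.monotonic(), False
    while not done and time.monotonic() - start < T_LIMIT:   # S6: goal or tlimit
        st = env1.state                                      # S2
        traj1 = rollout(env1, agent1, st, k=K)
        traj2 = rollout(env2, agent2, st, k=K)               # S3
        sim = trajectory_similarity(traj1, traj2)            # S4
        buffer1.extend(with_bonus(traj1, +sim))
        buffer2.extend(with_bonus(traj2, -sim))
        if min(len(buffer1), len(buffer2)) >= BATCH:         # S5
            q_update(agent1, buffer1)
            q_update(agent2, buffer2)
        done = bool(traj1) and traj1[-1][4]                  # Agent1 at goal?
```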
Compared with the prior art, the method enables the agent to explore more effectively, accelerates training, improves sample utilization efficiency, effectively suppresses random noise, and offers greater robustness.