The invention relates to a robot path exploration method based on dual-agent competitive reinforcement learning. The method comprises the following steps:

S1, constructing a Markov decision model, and initializing the agents and the experience pools;
S2, recording the current state s_t of agent Agent1, letting Agent1 explore k steps, and recording the resulting trajectory sequence to experience pool Buffer1;
S3, placing agent Agent2 in state s_t, letting Agent2 explore k steps, and recording the resulting trajectory sequence to experience pool Buffer2;
S4, taking the similarity between the two exploration trajectories as an additional reward for Agent1, and taking its negative as an additional reward for Agent2;
S5, updating the policies of Agent1 and Agent2 once the amount of data in the experience pools meets the requirement;
S6, repeating steps S2 to S5 until Agent1 reaches the target state or the set time limit t_limit is exceeded;
S7, repeating steps S1 to S6 until the set number of training episodes is completed.

Compared with the prior art, the method enables the agent to explore more effectively, speeds up training, improves sample utilization efficiency, effectively suppresses random noise, and offers stronger robustness.
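
A minimal sketch of steps S2 to S6 is given below, assuming a gym-style toy environment (GridWorld), random stand-in policies, and a Jaccard overlap of visited states as the trajectory similarity; the abstract does not fix the similarity metric, policy class, or update rule, so those parts are illustrative placeholders rather than the patented method itself.

```python
import random
from collections import deque

# Hypothetical toy environment standing in for the robot workspace: states are
# integer cells on a line, the target state is cell n-1. All names here
# (GridWorld, rollout, similarity) are illustrative, not from the patent.
class GridWorld:
    def __init__(self, n=20):
        self.n, self.goal = n, n - 1
        self.state = 0

    def reset(self, state=0):
        self.state = state
        return self.state

    def step(self, action):  # action is -1 or +1
        self.state = max(0, min(self.n - 1, self.state + action))
        reward = 1.0 if self.state == self.goal else -0.01
        return self.state, reward, self.state == self.goal

def rollout(env, policy, start, k):
    """Explore k steps from `start`; return the trajectory as (s, a, r, s') tuples."""
    s = env.reset(start)
    traj = []
    for _ in range(k):
        a = policy(s)
        s2, r, done = env.step(a)
        traj.append((s, a, r, s2))
        s = s2
        if done:
            break
    return traj

def similarity(traj1, traj2):
    """Assumed metric: Jaccard overlap of the visited state sets (the abstract
    does not specify how trajectory similarity is measured)."""
    v1 = {t[0] for t in traj1} | {t[3] for t in traj1}
    v2 = {t[0] for t in traj2} | {t[3] for t in traj2}
    return len(v1 & v2) / max(1, len(v1 | v2))

env = GridWorld()
policy1 = policy2 = lambda s: random.choice((-1, 1))  # stand-in random policies
buffer1, buffer2 = deque(maxlen=10_000), deque(maxlen=10_000)
k, batch_min, t_limit = 10, 256, 1_000

for episode in range(100):                        # S7: loop over training episodes
    s_t, steps = env.reset(0), 0
    while steps < t_limit:                        # S6: stop on goal or time limit
        traj1 = rollout(env, policy1, s_t, k)     # S2: Agent1 explores k steps from s_t
        traj2 = rollout(env, policy2, s_t, k)     # S3: Agent2 is placed at the same s_t
        sim = similarity(traj1, traj2)            # S4: similarity as extra reward
        buffer1.extend((s, a, r + sim, s2) for s, a, r, s2 in traj1)
        buffer2.extend((s, a, r - sim, s2) for s, a, r, s2 in traj2)
        if len(buffer1) >= batch_min:             # S5: enough data -> update policies
            pass  # e.g. one gradient step per agent (DQN, PPO, ...) -- omitted here
        s_t = traj1[-1][3]                        # Agent1 continues from where it stopped
        steps += len(traj1)
        if s_t == env.goal:
            break
```

Note how the shaped rewards realize S4: Agent1's transitions are stored with r + sim and Agent2's with r - sim, so Agent2 is pushed toward regions Agent1 has not covered, which appears to be the competitive pressure the method relies on for more effective exploration.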