The invention discloses a reinforcement learning algorithm that includes a new Q-learning algorithm. The new Q-learning algorithm is implemented by the following steps: inputting collected data to a BP (back-propagation) neural network and calculating the input and output of each unit of the hidden layer and the output layer in state t; calculating the maximum output value m in state t and judging, based on that output, whether a collision with a barrier occurs; if a collision occurs, recording the threshold value of each unit and each connection weight of the BP neural network; otherwise, advancing to time t+1, collecting and normalizing data, calculating the input and output of each unit of the hidden layer and the output layer in state t+1, calculating the expected output value for state t, and adjusting the threshold values of the units of the output layer and the hidden layer; then judging whether the error is smaller than a given threshold value or the number of learning iterations is larger than a given value; if neither condition is satisfied, performing learning again; otherwise, recording the threshold value of each unit and each connection weight and finishing learning. The reinforcement learning algorithm provided by the invention offers good real-time performance and speed, and allows relearning at a later stage.
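
The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The network size, the sigmoid activation, the learning rate, the discount factor gamma, the reward values, the toy environment ToyCorridor and the callbacks read_sensors and step are all hypothetical. The expected output for state t is assumed to be the standard Q-learning target (reward plus gamma times the largest output in state t+1), which the abstract does not spell out. Bias terms stand in for the unit thresholds, and both weights and biases are adjusted by backpropagation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNet:
    """Tiny one-hidden-layer BP network; biases stand in for unit thresholds."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return (hidden outputs, output-layer outputs) for input x."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        q = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.w2, self.b2)]
        return h, q

    def update(self, x, action, target, lr=0.5):
        """One backpropagation step on output[action]; returns the squared error."""
        h, q = self.forward(x)
        err = target - q[action]
        delta_out = err * q[action] * (1.0 - q[action])
        # Hidden deltas use the output weights from before this update.
        delta_hid = [delta_out * self.w2[action][j] * h[j] * (1.0 - h[j])
                     for j in range(len(h))]
        for j in range(len(h)):
            self.w2[action][j] += lr * delta_out * h[j]
            self.b1[j] += lr * delta_hid[j]
            for i in range(len(x)):
                self.w1[j][i] += lr * delta_hid[j] * x[i]
        self.b2[action] += lr * delta_out
        return err * err


class ToyCorridor:
    """Hypothetical 1-D world with one range sensor facing a barrier."""

    def __init__(self, distance=5.0):
        self.distance = distance

    def read_sensors(self):
        return [self.distance]

    def step(self, action):
        # Illustrative actions: 0 backs away from the barrier, 1 advances.
        self.distance = min(self.distance + (1.0 if action == 0 else -1.0), 10.0)
        collided = self.distance <= 0.0
        return collided, (0.0 if collided else 0.1)


def normalize(raw, max_range=10.0):
    """Scale raw sensor readings into [0, 1]."""
    return [min(max(r / max_range, 0.0), 1.0) for r in raw]


def learn(net, read_sensors, step, gamma=0.9, err_tol=1e-3, max_iters=200):
    """Run the learning loop and return a snapshot of weights and biases."""
    state = normalize(read_sensors())
    n = 0
    for n in range(1, max_iters + 1):
        _, q = net.forward(state)
        action = max(range(len(q)), key=q.__getitem__)  # unit with maximum output m
        collided, reward = step(action)
        if collided:
            break  # collision: stop and record the parameters
        next_state = normalize(read_sensors())
        _, q_next = net.forward(next_state)
        # Assumed Q-learning target, clipped to the sigmoid output range.
        target = min(max(reward + gamma * max(q_next), 0.0), 1.0)
        sq_err = net.update(state, action, target)
        state = next_state
        if sq_err < err_tol:
            break  # error below the given threshold: learning finished
    return {"iterations": n, "w1": net.w1, "b1": net.b1, "w2": net.w2, "b2": net.b2}


if __name__ == "__main__":
    env = ToyCorridor()
    snapshot = learn(BPNet(1, 3, 2), env.read_sensors, env.step)
    print("iterations used:", snapshot["iterations"])
```

Because learn returns the recorded weights and biases rather than discarding them, a later run can start from the same BPNet object, which is one simple way to support the relearning mentioned in the abstract.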