Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning

A mobile robot path planning technology, applied in the field of two-dimensional position/course control, that addresses the long learning time and slow convergence speed of existing methods and achieves shorter learning time and higher learning efficiency.

Status: Inactive
Publication Date: 2012-11-28
SHANDONG UNIV
3 Cites, 77 Cited by

AI Technical Summary

Problems solved by technology

[0010] The invention addresses the shortcomings of existing reinforcement learning algorithms when applied to mobile robot path planning in unknown environments, such as long learning time and slow convergence speed.

Embodiment Construction

[0043] 1. Q-learning algorithm

[0044] The Q-learning algorithm is an iterative algorithm that assigns a Q value to each state-action pair. The Q value is defined as the sum of discounted reinforcement rewards. When an action changes the state of the environment, the robot receives a reinforcement signal, and the Q value is updated iteratively according to that signal. The Q value of a correct action keeps increasing and the Q value of a wrong action keeps decreasing until the Q value of every state-action pair stabilizes and converges, at which point the optimal path from the starting point to the target point is determined. The iterative process is as follows:

[0045]

[0046] where $s_0$ denotes the initial state (starting position) of the robot, $s_1$ denotes the state of the robot (its location in the environment) at $t = 1$, ..., $s_n$ denotes the state of the robot (its location in the environment) ...
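
The iterative update described in [0044] can be illustrated with a minimal sketch of the standard tabular Q-learning rule. This is the textbook form, not necessarily the exact formula of paragraph [0045], which is not reproduced here. The function name `q_update`, the action names, the learning rate `alpha`, and the discount factor `gamma` are all assumptions made for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update to q[(state, action)] in place.

    alpha (learning rate) and gamma (discount factor) are illustrative values.
    """
    # Best Q value currently known for the state the robot lands in.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the old estimate toward reward + discounted best future value.
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical example: grid cells are (row, col) tuples, all Q values start at 0.
actions = ["up", "down", "left", "right"]
q = defaultdict(float)
first = q_update(q, (0, 0), "right", 1.0, (0, 1), actions)
second = q_update(q, (0, 0), "right", 1.0, (0, 1), actions)
```

Repeating the same rewarded transition raises the Q value of that state-action pair toward the reward, which matches the statement that the Q value of a correct action keeps increasing until it converges.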

Abstract

The invention provides a mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning. A two-dimensional environment is represented with a grid method: each environment area block corresponds to a discrete location, and the state of the mobile robot at any moment is the environment location it occupies. Each search step of the mobile robot is based on the Q-learning iterative formula for a non-deterministic Markov decision process. At each step, Q values are backtracked sequentially from the tail end of the single chain (the Q value of the current state) to the head end of the chain, and this continues until the target state is reached. The mobile robot repeatedly searches for paths from the initial state to the target state, performing every search step as described, and the Q values of the states are iterated and optimized until they converge. The number of steps the algorithm needs to find the optimal path is far smaller than that of the classic Q-learning algorithm and the Q(λ) algorithm, so the learning time is shorter and the learning efficiency is higher. The advantage is more pronounced in large environments.
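
The backtracking idea in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented procedure: it uses a fixed learning rate `alpha` rather than the non-deterministic MDP formula the abstract mentions, an epsilon-greedy action choice, a reward of 1 at the goal and 0 elsewhere, and an obstacle-free grid. All names (`MOVES`, `step`, `backtrack_chain`, `run_episode`) are hypothetical.

```python
import random
from collections import defaultdict

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action, rows, cols, goal):
    """Move on a rows x cols grid; the border keeps the robot in place."""
    dr, dc = MOVES[action]
    r = min(max(state[0] + dr, 0), rows - 1)
    c = min(max(state[1] + dc, 0), cols - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == goal else 0.0)


def backtrack_chain(q, chain, actions, alpha=0.5, gamma=0.9):
    """Update Q values from the tail of the chain (latest step) back to its head."""
    for state, action, reward, next_state in reversed(chain):
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def run_episode(q, start, goal, rows, cols, rng, epsilon=0.2, max_steps=200):
    """Search once from start toward goal, backtracking along the chain after every step."""
    actions = list(MOVES)
    chain = []  # single chain of (state, action, reward, next_state) for this episode
    state = start
    while state != goal and len(chain) < max_steps:
        if rng.random() < epsilon:
            action = rng.choice(actions)  # explore
        else:
            best = max(q[(state, a)] for a in actions)
            action = rng.choice([a for a in actions if q[(state, a)] == best])
        nxt, reward = step(state, action, rows, cols, goal)
        chain.append((state, action, reward, nxt))
        backtrack_chain(q, chain, actions)  # tail-to-head update of the whole chain
        state = nxt
    return len(chain)


# Usage: repeat episodes on a 4 x 4 grid so the Q values keep being refined.
q = defaultdict(float)
rng = random.Random(0)
steps_per_episode = [run_episode(q, (0, 0), (3, 3), 4, 4, rng) for _ in range(20)]
```

Because every step re-updates the chain from its tail to its head, a reward found at the end of the chain reaches the earliest visited state within the same episode, whereas a plain one-step update would only change the most recent state-action pair.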

Description

Technical field

[0001] The invention relates to a method for mobile robot path planning using an improved reinforcement learning algorithm, and belongs to the technical field of artificial intelligence.

Background

[0002] Path planning is one of the key technologies in mobile robot research. A path planning algorithm lets the mobile robot search for an optimal or suboptimal collision-free path from the starting position to the target position according to a given performance index.

[0003] Depending on whether the environment information is completely known, path planning can be divided into offline global path planning, where the environment information is fully known, and online local path planning, where it is completely or partially unknown. These are also called static path planning and dynamic path planning. At present, research on global path planning in a known environment is relatively mature, and the problem o...

Application Information

IPC(8): G05D1/02
Inventor: 马昕, 孙国强, 许亚, 宋锐, 荣学文, 李贻斌
Owner: SHANDONG UNIV