A Navigation Path Planning Method Based on Policy Reuse and Reinforcement Learning

A navigation path planning technique based on policy reuse and reinforcement learning, applied in navigation and navigation computing tools. It addresses the insufficient reuse of source policies and achieves rapid planning of navigation paths, accurate completion of navigation tasks, and avoidance of negative transfer.

Active Publication Date: 2020-09-25
DONGGUAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to solve the problem of insufficient reuse of source policies in existing methods.



Examples


Specific Embodiment 1

[0031] Specific Embodiment 1: the navigation path planning method based on policy reuse and reinforcement learning described in this embodiment includes the following steps:

[0032] Step 1: Select the policy library corresponding to the current road network map, and compute the important states of each source policy in the library that does not yet contain key map positions;

[0033] Step 2: Set the maximum number of training cycles to K (K can be set fairly large; once the condition for learning independently is actually met, the method exits early and stops reusing source policies). Use the confidence values to select a reuse policy from the source policies in the policy library, and then reuse either the learner's own policy or the selected reuse policy;

[0034] Step 3: Update the new policy obtained through policy reuse by reinforcement learning, yielding an updated new policy;

[0035] Step 4: Determine whether to add the updated policy to the policy ...
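
Steps 2 and 3 can be illustrated with a minimal sketch. It assumes a tabular Q-learning learner and borrows the reuse-with-probability idea from PRQL, which the abstract names as a comparison method; it is not the patent's exact rule. The names `choose_action`, `q_update`, `psi`, `epsilon`, `alpha` and `gamma` are hypothetical and do not come from the source.

```python
import random

def choose_action(Q, state, actions, reuse_policy, psi, epsilon, rng):
    """Pick an action for one step.

    With probability psi follow the selected reuse policy (assumption:
    PRQL-style reuse); otherwise act epsilon-greedily on the learner's
    own Q-table.
    """
    if reuse_policy is not None and rng.random() < psi:
        return reuse_policy(state)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new value."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(state, action)]
```

In this sketch, setting `reuse_policy` to `None` corresponds to the learner reusing only its own policy, which matches the early-exit behaviour described in Step 2.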

Specific Embodiment 2

[0036] Specific Embodiment 2: this embodiment differs from Specific Embodiment 1 in that the specific process of Step 1 is:

[0037] Select the policy library corresponding to the current road network map. For each source policy in the library that does not contain key map positions (important states), the important states of that source policy must be computed;

[0038] For any source policy whose important states need to be computed, initialize the floating threshold θ = 0 and then run M′ policy execution cycles (M′ ≥ 8, kept small). In the first step of each policy execution cycle, select an edge position of the road network map as the initial state s_0 (the first eight cycles can use the edge positions in the eight directions of the road network map as initial states). For the t-th step of each policy execution cycle, the current state of the vehicle navigation system is...
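
The choice of initial states can be sketched as follows. The sketch assumes the road network map is a rectangular grid addressed by (row, column), which the source does not state; `eight_edge_starts`, `boundary_cells` and `initial_state` are hypothetical helper names. It covers only how s_0 is chosen, not the important-state computation itself.

```python
import random

def eight_edge_starts(rows, cols):
    """Return eight boundary cells, one per compass direction:
    N, NE, E, SE, S, SW, W, NW (assumed grid layout)."""
    top, bottom = 0, rows - 1
    left, right = 0, cols - 1
    mid_r, mid_c = rows // 2, cols // 2
    return [
        (top, mid_c), (top, right), (mid_r, right), (bottom, right),
        (bottom, mid_c), (bottom, left), (mid_r, left), (top, left),
    ]

def boundary_cells(rows, cols):
    """Return every cell on the outer edge of the grid."""
    return [(r, c) for r in range(rows) for c in range(cols)
            if r in (0, rows - 1) or c in (0, cols - 1)]

def initial_state(cycle, rows, cols, rng):
    """Pick s_0 for a policy execution cycle (0-based index).

    The first eight cycles use the eight compass edge cells in order;
    later cycles draw a random boundary cell (an assumption).
    """
    starts = eight_edge_starts(rows, cols)
    if cycle < len(starts):
        return starts[cycle]
    return rng.choice(boundary_cells(rows, cols))
```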

Specific Embodiment 3

[0045] Specific Embodiment 3: this embodiment differs from Specific Embodiment 2 in that the specific process of Step 2 is:

[0046] The selection of the source policy considers the state value function of each policy in the policy library, and excludes some policies in the library from the candidate policies in real time, based on changes in their confidence:

[0047] Step 21: In the first training cycle, the initial confidence p_k of every source policy π_k is set to 0.5. In each subsequent training cycle, the confidence of each source policy π_k is determined by whether the vehicle navigation system reached the target position s_G in the previous training cycle and by the important states of source policy π_k, as shown in formula (3):

[0048]

[0049] where I_k denotes the judgment condition;

[0050] Let τ′ be a trajectory containing all the states passed in the ...
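
The confidence bookkeeping in Step 21 can be sketched as below. Formula (3) is not reproduced in this text, so the sketch does not implement the confidence update; it only shows the 0.5 initialization and one plausible way to exclude low-confidence policies before choosing by value. The threshold `min_confidence` and the per-policy value estimates `start_values` are hypothetical assumptions, not values from the source.

```python
def init_confidences(num_source_policies, initial=0.5):
    """Set the confidence p_k of every source policy to 0.5, as in Step 21."""
    return [initial] * num_source_policies

def select_reuse_policy(confidences, start_values, min_confidence=0.1):
    """Return the index of the source policy to reuse, or None.

    Policies whose confidence is below min_confidence are excluded
    (assumed exclusion rule). Among the rest, the one with the highest
    value estimate is chosen. None means the learner falls back to its
    own policy.
    """
    candidates = [k for k, p in enumerate(confidences) if p >= min_confidence]
    if not candidates:
        return None
    return max(candidates, key=lambda k: start_values[k])
```

For example, with confidences `[0.5, 0.05, 0.5]` and value estimates `[1.0, 9.0, 2.0]`, the second policy is excluded despite its high value, and index 2 is returned.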


Abstract

The present invention provides a navigation path planning method based on policy reuse and reinforcement learning. It belongs to the technical field of navigation path planning and solves the problem of insufficient reuse of source policies in existing methods. The method introduces a function representing state importance to assist policy selection, policy reuse and policy library reconstruction, so that a navigation path can be planned rapidly in a road network map. Compared with traditional path planning methods, the ARES-TL algorithm uses reinforcement learning based on policy reuse: it keeps a complete policy library updated in real time, trades some storage space for the policy library in exchange for shorter computation time, and can cope with maps that receive small online updates. Compared with policy reuse methods of the same type, namely PRQL and OPS-TL, ARES-TL avoids the negative transfer caused by reusing irrelevant source policies, improves exploration efficiency, and completes navigation tasks accurately. The method provided by the present invention can be applied in the technical field of navigation path planning.

Description

Technical field

[0001] The invention belongs to the technical field of navigation path planning, and in particular relates to a navigation path planning method.

Background art

[0002] Navigation path planning is an important part of a navigation system, with applications in fields such as autonomous driving and logistics transportation. Its purpose is to compute the shortest path between a starting location and a target location on a given road network map. In practice, a navigation path planning algorithm generates a navigation policy for a given road network and target location, and that policy uses existing knowledge to give the direction of travel at the vehicle's real-time location. Existing navigation systems generally implement path planning through deterministic dynamic programming methods; common methods include the Dijkstra algorithm, the Floyd algorithm, and the A* algorithm...
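
For reference, the deterministic baseline mentioned above can be sketched with a standard Dijkstra shortest-path search. This illustrates the traditional approach the invention is compared against, not the invention itself; the adjacency-list format and the small example road network in the usage note are assumptions made for illustration.

```python
import heapq

def dijkstra(graph, source, target):
    """Shortest path on a graph with non-negative edge weights.

    graph maps each node to a list of (neighbor, weight) pairs.
    Returns (distance, path), or (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path
```

With the hypothetical road network `{"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 7)], "B": [("D", 3)]}`, calling `dijkstra(roads, "A", "D")` finds the route A, C, B, D with total cost 6. A search like this must be rerun from scratch whenever the map or target changes, which is the cost that a reusable policy library aims to reduce.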


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G01C21/34, G01C21/20
CPC: G01C21/20, G01C21/3407
Inventor: 郝建业, 王汉超, 侯韩旭
Owner: DONGGUAN UNIV OF TECH