A Navigation Path Planning Method Based on Policy Reuse and Reinforcement Learning
A navigation path planning technology based on policy reuse and reinforcement learning, applicable to navigation, mapping, and navigation computing tools. It addresses the problem of insufficient source policy reuse and achieves rapid planning of navigation paths, accurate navigation, and avoidance of negative transfer.
Specific Embodiment 1
[0031] Specific Embodiment 1: The navigation path planning method based on policy reuse and reinforcement learning described in this embodiment includes the following steps:
[0032] Step 1: Select the policy library corresponding to the current road network map, and compute the important states of those source policies in the library that do not yet record key map positions;
[0033] Step 2: Set the maximum number of training cycles to K (K may be set large; if the self-learning condition is actually reached, training exits automatically and policy reuse stops), use confidence values to select a reuse policy from the source policies in the library, and then reuse either the system's own policy or the selected reuse policy;
[0034] Step 3: Update the new policy obtained by policy reuse through reinforcement learning, yielding an updated new policy;
[0035] Step 4: Determine whether to add the updated policy to the policy ...
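For concreteness, the following is a minimal Python sketch of the four steps above on a toy grid "road network". Everything here (the grid and its step function, the Q-table policies, the reuse_prob parameter, the fixed episode caps) is illustrative; the patent does not specify these names or the exact reuse mechanism, so this is a sketch under stated assumptions, not the patented method.

```python
import random

# Toy 5x5 grid standing in for a road network map (illustrative only).
SIZE, GOAL = 5, (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action):
    """Deterministic transition: move, clamp to the grid, reward at GOAL."""
    x, y = state[0] + action[0], state[1] + action[1]
    nxt = (min(max(x, 0), SIZE - 1), min(max(y, 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

def greedy(q, s):
    """Greedy action of a Q-table policy (unseen pairs default to 0)."""
    return max(range(len(ACTIONS)), key=lambda a: q.get((s, a), 0.0))

def train(library, K=200, alpha=0.5, gamma=0.95, reuse_prob=0.3):
    q = {}                              # the new policy's Q-table
    conf = {k: 0.5 for k in library}    # Step 2: initial confidence 0.5
    for _ in range(K):                  # Step 2: at most K training cycles
        s = (0, 0)
        # Pick the reuse policy with the highest confidence (Step 2).
        k_star = max(conf, key=conf.get) if conf else None
        for _ in range(200):            # capped episode length
            if k_star is not None and random.random() < reuse_prob:
                a = greedy(library[k_star], s)   # reuse the source policy
            else:
                a = greedy(q, s)                 # follow the own policy
            s2, r, done = step(s, a)
            # Step 3: reinforcement-learning (Q-learning) update.
            best = max(q.get((s2, b), 0.0) for b in range(len(ACTIONS)))
            qa = q.get((s, a), 0.0)
            q[(s, a)] = qa + alpha * (r + gamma * best - qa)
            s = s2
            if done:
                break
    return q

new_q = train({"pi_0": {}})   # Step 4 (library admission test) omitted here
```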
Specific Embodiment 2
[0036] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in the specific process of Step 1:
[0037] Select the policy library corresponding to the current road network map. For each source policy in the library that does not record the key map positions (important states), the important states of that source policy must be computed;
[0038] For any source policy whose important states need to be computed, initialize the floating threshold θ = 0, then enter M′ (M′ ≥ 8, kept small) policy execution cycles. In the first step of each policy execution cycle, select an edge position of the road network map as the initial state s_0 (the first eight cycles may take the edge positions in the eight directions of the road network map as initial states). At the t-th step of each policy execution cycle, the current state of the vehicle navigation system is...
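Because paragraph [0038] is truncated, the sketch below fills in one plausible reading: the important state is taken to be the state whose estimated value raises the floating threshold θ across the M′ edge-started rollouts. That scoring rule is an assumption, not the patent's formula; the helpers step, greedy, ACTIONS, and new_q come from the sketch above.

```python
def important_state(q, edge_starts, m_prime=8, max_steps=100):
    theta = 0.0                 # floating threshold, initialized to 0
    best_state = None
    for i in range(m_prime):    # M' >= 8 policy execution cycles
        # First eight cycles: edge positions in the eight map directions.
        s = edge_starts[i % len(edge_starts)]
        for _ in range(max_steps):
            a = greedy(q, s)    # execute the source policy greedily
            s, _, done = step(s, a)
            v = max(q.get((s, b), 0.0) for b in range(len(ACTIONS)))
            if v > theta:       # raise the floating threshold (assumed rule)
                theta, best_state = v, s
            if done:
                break
    return best_state

# Edge positions of the toy grid in the eight compass directions.
edges = [(0, 0), (0, 2), (0, 4), (2, 0), (2, 4), (4, 0), (4, 2), (4, 4)]
key_state = important_state(new_q, edges)
```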
Specific Embodiment 3
[0045] Specific Embodiment 3: This embodiment differs from Specific Embodiment 2 in the specific process of Step 2:
[0046] Source policy selection considers the state value function of each policy in the policy library and, based on real-time changes in confidence, excludes some policies in the library from the candidate set:
[0047] Step 2.1: In the first training cycle, the initial confidence p_k of each source policy π_k is set to 0.5; in each subsequent training cycle, the confidence of each source policy π_k is determined by whether the vehicle navigation system reached the target position s_G in the previous training cycle and by the important state of source policy π_k, as shown in formula (3):
[0048] (Formula (3) appears only as an image in the source and is not reproduced.)
[0049] where I_k denotes the judgment condition;
[0050] Let τ′ be a trajectory containing all the states traversed in the ...
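Since formula (3) is not reproduced in the source, the sketch below substitutes an illustrative update consistent with the surrounding text: the judgment condition I_k is 1 exactly when the previous trajectory τ′ both reached s_G and passed through π_k's important state, and p_k is nudged toward I_k. The smoothing rate eta is likewise an assumption.

```python
def update_confidence(p_k, tau_prime, s_goal, important, eta=0.2):
    """Illustrative stand-in for formula (3): move p_k toward I_k."""
    reached_goal = s_goal in tau_prime          # did tau' reach s_G?
    passed_important = important in tau_prime   # did tau' pass pi_k's state?
    i_k = 1.0 if (reached_goal and passed_important) else 0.0  # condition I_k
    return (1 - eta) * p_k + eta * i_k

p_k = 0.5                                   # initial confidence (Step 2.1)
tau_prime = [(0, 0), (0, 1), (2, 2), (4, 4)]  # example trajectory tau'
p_k = update_confidence(p_k, tau_prime, s_goal=(4, 4), important=(2, 2))
```

Under this assumed rule, policies whose important states lie on successful trajectories gain confidence, while the others decay toward 0 and drop out of the candidate set, matching the real-time exclusion described in [0046].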