Path planning decision optimization method based on least squares truncated time domain difference learning

A time domain difference and least-squares technology, applied to two-dimensional position/course control, vehicle position/route/altitude control, instruments, etc. It can solve problems such as the difficulty of selecting a learning rate, reduced control efficiency and slow convergence speed, and achieves the effects of improving the learning speed and accuracy of the evaluator, improving efficiency and accuracy, and avoiding internal conflicts.

Pending Publication Date: 2022-05-20
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

[0004] 1. If the policy evaluation function uses a nonlinear neural network as the approximator and is optimized by stochastic gradient descent, the convergence of the evaluator is difficult to guarantee and the convergence speed is slow, which degrades the policy learning effect. An evaluator based on a linear approximator structure, by contrast, has a theoretical convergence guarantee, but its approximation ability depends on how well the feature representation is learned.
[0005] 2. If the policy evaluator is learned with the classic linear time domain difference method or the least squares time domain difference method, the linear time domain difference method suffers from low sample utilization and the difficulty of selecting a learning rate. The least squares time domain difference method does not require a learning rate and improves sample utilization, but it has poor asymptotic optimality and, in practice, is prone to ill-conditioned solutions when the matrix involved is not of full rank.
[0006] 3. Robot path planning is a continuous task. If a reinforcement learning method based on the linear time domain difference method is used for strategy evaluation, the low sample utilization leads to low control efficiency and poor control accuracy. If, instead, the least squares time domain difference method is used for strategy evaluation, its poor asymptotic optimality and ill-conditioned solutions still cause control efficiency and accuracy to degrade as the planning time increases.
[0007] 4. The deep reinforcement learning method uses a convolutional neural network to jointly learn the feature representation and the value function or policy function from high-dimensional observations, which also makes the learning process sample-inefficient.
Both the linear time domain difference method and the least squares time domain difference method rely on traditional kernel-based feature representations. Such feature representations depend heavily on manual feature construction, which makes it difficult to meet the strategy evaluation and sequential decision-making requirements of deep reinforcement learning under high-dimensional, complex observation conditions; when applied to robot path planning, this further increases control complexity and reduces control efficiency.
[0008] In summary, existing deep reinforcement learning suffers from difficulties in feature representation learning and low learning efficiency of the policy evaluator, so directly applying deep reinforcement learning technology to robot path planning decision optimization results in low learning efficiency of the controller.
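For reference, the following minimal sketch (not taken from the patent) illustrates the two classical policy evaluation updates contrasted in paragraphs [0005] and [0006]: a linear time domain difference (TD(0)) step, whose learning rate alpha must be tuned by hand, and a batch least squares time domain difference (LSTD) solution, whose matrix can become ill-conditioned and is therefore commonly regularized. All function names and parameter values are illustrative assumptions.

```python
# Minimal sketch (not from the patent): the two classical policy evaluation updates
# discussed above, for a linear value function V(s) ~= phi(s)^T theta.
import numpy as np

def td0_update(theta, phi_s, phi_s_next, reward, alpha=0.01, gamma=0.99):
    """One linear time domain difference (TD(0)) step; the learning rate alpha must be hand-tuned."""
    td_error = reward + gamma * phi_s_next @ theta - phi_s @ theta
    return theta + alpha * td_error * phi_s

def lstd_solve(samples, feat_dim, gamma=0.99, ridge=1e-3):
    """Batch least squares solution theta = A^{-1} b over (phi_s, reward, phi_s_next) samples.
    A small ridge term is added because A can be rank-deficient, i.e. ill-conditioned."""
    A = ridge * np.eye(feat_dim)
    b = np.zeros(feat_dim)
    for phi_s, reward, phi_s_next in samples:
        A += np.outer(phi_s, phi_s - gamma * phi_s_next)
        b += reward * phi_s
    return np.linalg.solve(A, b)
```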

Embodiment Construction

[0065] The present invention will be further described below in conjunction with the accompanying drawings and specific preferred embodiments, but the protection scope of the present invention is not limited thereby.

[0066] As shown in Figure 1, the path planning decision optimization method based on least squares truncated time domain difference learning in this embodiment comprises the following steps:

[0067] S1. Basis function learning: use a first strategy to collect the states, actions and rewards of the agent as it interacts with the environment; the collected states, actions and rewards constitute a sample data set, and the first strategy is the initial strategy or an allowable strategy. Based on the sample data set, a pre-training method is used to learn the basis function of the feature representation.
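As a hedged illustration of step S1 (a sketch under assumptions, not the patent's implementation), the code below collects interaction samples with an initial/allowable policy and pre-trains a small encoder whose output serves as the feature basis phi(s). The env.reset()/env.step() interface, the BasisEncoder network and the autoencoder-style pre-training objective are assumptions, since the excerpt only states that a pre-training method is used.

```python
# Hedged sketch of step S1: sample collection with a first (initial/allowable) policy,
# then pre-training of a feature basis. The reconstruction objective is an assumption.
import torch
import torch.nn as nn

def collect_samples(env, policy, num_steps=5000):
    """Roll out `policy` in `env` (assumed reset()/step() -> (next_state, reward, done) interface)."""
    data, state = [], env.reset()
    for _ in range(num_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        data.append((state, action, reward, next_state))
        state = env.reset() if done else next_state
    return data

class BasisEncoder(nn.Module):
    """Maps a raw observation to a feat_dim-dimensional feature vector phi(s)."""
    def __init__(self, obs_dim, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, obs_dim))

    def forward(self, obs):
        return self.encoder(obs)

def pretrain_basis(data, obs_dim, epochs=20, lr=1e-3):
    """Pre-train the encoder on the collected states; the trained encoder yields phi(s) for S2/S3."""
    model = BasisEncoder(obs_dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    states = torch.tensor([s for s, _, _, _ in data], dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model.decoder(model.encoder(states)), states)
        loss.backward()
        opt.step()
    return model
```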

[0068] S2. Evaluator learning: The evaluator uses the second strategy generated by the executor to collect the sample data of t...

Abstract

The invention discloses a path planning decision optimization method based on least squares truncated time domain difference learning. The method comprises the steps: S1, collecting agent-environment interaction samples through a first strategy, and learning a basis function for feature representation; S2, the evaluator collects agent-environment interaction samples using a second strategy generated by the executor, obtains sample features using the basis function, and uses the projected mean-squared Bellman error as a truncation index to decide whether a least squares time domain difference or a linear time domain difference update is performed, so as to obtain an approximately optimal strategy evaluator; S3, collecting samples of interaction with the environment using the strategy generated by the executor, obtaining sample features using the basis function, and obtaining the control strategy output by using the evaluator from step S2 as the evaluation function of the strategy executor; and S4, controlling the agent to carry out path planning according to the obtained control strategy. The method has the advantages of a simple implementation, high planning decision efficiency, high accuracy and the like.
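To make the S2 evaluator update concrete, the sketch below shows one plausible reading of the abstract (a sketch under assumptions, not the patent's verbatim procedure): the projected mean-squared Bellman error of the current parameters is computed from a batch of sample features, and its value relative to a threshold decides whether the batch is fitted with a least squares time domain difference solution or refined with linear time domain difference steps. The threshold value, the direction of the switch and the ridge regularization are assumptions.

```python
# Hedged sketch of the truncated evaluator update of step S2: the projected
# mean-squared Bellman error (MSPBE) acts as the truncation index that selects
# between a batch least-squares fit and incremental linear TD steps.
import numpy as np

def mspbe(theta, phis, rewards, next_phis, gamma=0.99, eps=1e-6):
    """MSPBE = E[delta*phi]^T E[phi phi^T]^{-1} E[delta*phi] estimated from a batch."""
    deltas = rewards + gamma * next_phis @ theta - phis @ theta        # TD errors
    d = (deltas[:, None] * phis).mean(axis=0)                          # E[delta * phi]
    C = phis.T @ phis / len(phis) + eps * np.eye(phis.shape[1])        # E[phi phi^T]
    return d @ np.linalg.solve(C, d)

def truncated_evaluator_update(theta, phis, rewards, next_phis,
                               gamma=0.99, threshold=1e-2, alpha=0.05, ridge=1e-3):
    """Assumed rule: least-squares fit while the MSPBE is large, linear TD steps otherwise."""
    if mspbe(theta, phis, rewards, next_phis, gamma) > threshold:
        # Far from the fixed point: use the sample-efficient least-squares solution.
        k = phis.shape[1]
        A = ridge * np.eye(k)
        b = np.zeros(k)
        for phi, r, phi_n in zip(phis, rewards, next_phis):
            A += np.outer(phi, phi - gamma * phi_n)
            b += r * phi
        return np.linalg.solve(A, b)
    # Close to the fixed point: fall back to cheap linear TD(0) steps.
    for phi, r, phi_n in zip(phis, rewards, next_phis):
        delta = r + gamma * phi_n @ theta - phi @ theta
        theta = theta + alpha * delta * phi
    return theta
```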

Description

Technical field

[0001] The invention relates to the technical field of robot intelligent control, and in particular to a path planning decision optimization method based on least squares truncated time domain difference learning.

Background technique

[0002] Path planning is to find a collision-free safe path from a start point to an end point for a robot within a specified range. For robot path planning, traditional methods represented by graph-search-based algorithms, artificial potential field methods and random sampling methods are usually used at present. However, these methods can only complete path planning decision optimization tasks in specific environments, lack learning ability and generalization performance, and also rely on prior information related to robot dynamics. Another type of data-driven method, with the deep reinforcement learning method as the representative algorithm, can achieve autonomous learning to solve...

Application Information

IPC(8): G05D1/02
CPC: G05D1/0259; G05D1/0276
Inventors: 方强, 兰奕星, 徐昕, 任君凯, 张一川, 周星
Owner NAT UNIV OF DEFENSE TECH