Unmanned aerial vehicle trajectory optimization method and device based on deep reinforcement learning and unmanned aerial vehicle

A technology combining reinforcement learning and trajectory optimization, applied in the field of drones, that addresses problems such as dimension explosion, information loss, and limited flight action plans, with the effect of avoiding losses and improving energy efficiency.

Active Publication Date: 2019-11-22

BEIJING UNIV OF POSTS & TELECOMM

Problems solved by technology

However, in practice, states and actions are usually continuous or infinite in number; converting them into a finite set loses information and risks a dimension explosion.

[0005] It can be seen that some prior-art technical solutions for optimizing UAV flight trajectories support relatively limited flight scenarios and flight action plans, making it difficult to cope with dynamically changing environmental information during UAV flight and to meet actual flight needs.

Examples

Embodiment 1

[0053] Deep reinforcement learning is a machine learning technique that combines reinforcement learning with deep neural networks. Specifically, a reinforcement learning agent interacts with the environment to collect reward information for taking different actions in different environmental states, and inductively learns the optimal behavior policy from the collected data, thereby gaining the ability to adapt to an unknown dynamic environment. The deep neural network significantly improves the algorithm's generalization in high-dimensional state and action spaces, so that the agent can adapt to more complex environments.

[0054] Embodiment 1 of the present invention provides a UAV trajectory optimization method based on deep reinforcement learning. As shown in Figure 1, the method includes the following steps:
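
The interaction loop described in [0053] can be illustrated with a deliberately simplified sketch. It assumes a single environmental state and a fixed set of actions (a so-called bandit setting), which is far simpler than the UAV scenario and is not the patent's method; the function name `run_bandit`, the reward values, and the exploration rate `epsilon` are all hypothetical.

```python
import random


def run_bandit(true_rewards, steps=2000, epsilon=0.1, seed=0):
    """Toy interaction loop: try actions, record rewards, prefer the best estimate."""
    rng = random.Random(seed)
    n = len(true_rewards)
    estimates = [0.0] * n  # running mean of observed reward per action
    counts = [0] * n       # how often each action has been tried
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the current estimates.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: estimates[a])
        # Environment feedback: a noisy reward around the hidden true mean.
        reward = true_rewards[action] + rng.gauss(0.0, 0.1)
        counts[action] += 1
        # Incremental mean update from the collected data.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


estimates = run_bandit([0.2, 0.8, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent is never told the true rewards; it recovers the preferred action only from the rewards it collects, which is the adaptation to an unknown environment that the paragraph refers to.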

[0055] S101, pre-constructing a deep reinforcement learning network based on a PPO algor...

Embodiment 2

[0065] Embodiment 2 of the present invention provides another embodiment of a UAV trajectory optimization method based on deep reinforcement learning.

[0066] In Embodiment 2 of the present invention, the PPO algorithm adopts the deep reinforcement learning structure of the actor-critic framework and consists of two networks, the action network and the evaluation network. The action network uses the PPO algorithm and a deep neural network to fit the policy function and decide actions; the evaluation network uses a deep neural network to fit the state-value function and optimize the policy parameters. The overall structure and related data interaction of the optimization method provided by Embodiment 2 of the present invention are shown in Figure 2.
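
As a rough sketch of how the two networks in [0066] interact, the code below computes a PPO policy objective from values that an evaluation network might supply. It assumes the commonly used clipped-surrogate variant of PPO and a one-step advantage estimate; the source does not state which PPO variant or advantage estimator is used, and the function names and the defaults `clip_eps=0.2` and `gamma=0.99` are assumptions for illustration only.

```python
import math


def one_step_advantages(rewards, values, next_values, gamma=0.99):
    """Advantage from the evaluation (critic) network: r + gamma * V(s') - V(s)."""
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective for the action (actor) network, to be maximized."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(nl - ol)
        # Clipping bounds how far a single update can move the policy.
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical batch of three transitions.
adv = one_step_advantages([1.0, 0.0, 0.5], [0.5, 0.4, 0.3], [0.4, 0.3, 0.0])
objective = ppo_clip_objective([-0.9, -1.2, -0.7], [-1.0, -1.0, -0.7], adv)
```

In a full implementation the log-probabilities and state values would come from neural networks and the objective would be maximized by gradient ascent; here they are plain numbers so that the arithmetic stays visible.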

[0067] In this embodiment, the UAV communication scenario used is that a single UAV base station provides services for multiple fixed IoT devices, and the IoT devices are randomly activate...

Embodiment 3

[0075] Embodiment 3 of the present invention provides a preferred embodiment of the UAV trajectory optimization method based on deep reinforcement learning. This embodiment describes in further detail the UAV communication modeling method used in the present invention and the energy-efficient UAV trajectory optimization method based on deep reinforcement learning.

[0076] The UAV communication model established in this embodiment considers a scenario where a UAV provides delay-tolerant services for N terrestrial Internet of Things devices. The Internet of Things devices are randomly distributed with fixed positions, and they collect data and transmit it to the UAV periodically or randomly. The goal is to optimize the flight trajectory of the UAV to maximize the cumulative energy efficiency under energy-limited conditions. In order to achieve this goal, the UAV should be able to detect its own remaining energy and determine the optimal return charging...
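
The objective in [0076] can be made concrete with a small numeric sketch. It assumes that per-step energy efficiency is the data delivered in a time step divided by the energy spent in that step, and that the cumulative figure is the sum of per-step values until the energy budget runs out; the excerpt does not give the patent's exact definitions, so these formulas, the function names, and the sample numbers are hypothetical.

```python
def instantaneous_energy_efficiency(bits_delivered, energy_used):
    """Assumed definition: data delivered per unit of energy in one time step."""
    if energy_used <= 0:
        raise ValueError("energy_used must be positive")
    return bits_delivered / energy_used


def cumulative_efficiency(steps, energy_budget):
    """Sum per-step efficiency until the next step would exceed the energy budget."""
    remaining = energy_budget
    total = 0.0
    for bits, energy in steps:
        if energy > remaining:
            break  # not enough energy left for this step
        remaining -= energy
        total += instantaneous_energy_efficiency(bits, energy)
    return total, remaining


# Hypothetical flight: (bits delivered, energy used) for each time step.
flight = [(100.0, 10.0), (60.0, 20.0), (50.0, 50.0)]
total, remaining = cumulative_efficiency(flight, energy_budget=40.0)
```

With these sample numbers the third step is skipped because it would need more energy than remains, which mirrors the energy-limited condition in the text.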

Abstract

The invention discloses an unmanned aerial vehicle trajectory optimization method and device based on deep reinforcement learning and an unmanned aerial vehicle. The method comprises the steps: constructing a reinforcement learning network in advance, and generating state data and action decision data in real time in a flight process of an unmanned aerial vehicle; and taking the state data as input, the action decision data as output and the instantaneous energy efficiency as reward return, optimizing strategy parameters by utilizing a PPO algorithm, and outputting an optimal strategy. The device comprises a construction module, a training data collection module and a training module. The unmanned aerial vehicle comprises a processor, and the processor is used for executing the unmanned aerial vehicle trajectory optimization method based on deep reinforcement learning. The method has the capability of carrying out autonomous learning from accumulated flight data, can intelligently determine the optimal flight speed, acceleration, flight direction and return time in an unknown communication scene, concludes a flight strategy with the optimal energy efficiency, and is higher in environment adaptability and generalization capability.

Description

Technical field

[0001] The invention relates to the technical field of wireless communication, in particular to a method and device for optimizing trajectory of an unmanned aerial vehicle based on deep reinforcement learning, and an unmanned aerial vehicle.

Background technique

[0002] UAV communication technology is considered to be an indispensable part of the fifth generation (5G) and subsequent evolution (5G+) mobile communication networks. However, the UAV communication system has a unique air-to-ground channel model, highly dynamic three-dimensional flight capability and limited flight energy, which makes the UAV communication system more complex than the traditional communication system.

[0003] At present, the methods used for UAV trajectory optimization mainly include traditional convex optimization algorithm and reinforcement learning algorithm. For example, there is a Chinese patent application with the application number "201811144956.3", which discloses a met...

Application Information

IPC(8): G05D1/10

CPC: G05D1/101

Inventors: 许文俊, 徐越, 吴思雷, 张治, 张平, 林家儒

Owner: BEIJING UNIV OF POSTS & TELECOMM
