Unmanned aerial vehicle trajectory optimization method and device based on deep reinforcement learning and unmanned aerial vehicle

This technology combines reinforcement learning and trajectory optimization in the field of drones. It addresses problems such as dimension explosion, information loss, and limited flight action plans, and it avoids information loss while improving energy efficiency.

Active Publication Date: 2019-11-22
BEIJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

However, in reality, states and actions are usually infinite or continuous. Converting them into a finite set loses information and risks dimension explosion.
[0005] It can be seen that some technical solutions
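The cost of discretization mentioned above can be illustrated with a minimal sketch. The function names, the bin count, and the value ranges are hypothetical and chosen only for illustration; they do not come from the patent.

```python
def table_size(bins_per_dim: int, state_dims: int, action_dims: int) -> int:
    """Entries in a tabular value function after uniform discretization."""
    return bins_per_dim ** (state_dims + action_dims)


def quantize(value: float, low: float, high: float, bins: int) -> int:
    """Map a continuous value in [low, high] to a bin index."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * bins)
    return min(index, bins - 1)  # keep the upper bound inside the last bin


# Dimension explosion: the table grows exponentially with the dimension count.
sizes = {dims: table_size(10, dims, 2) for dims in (2, 4, 6)}

# Information loss: two different speeds fall into the same bin.
same_bin = quantize(12.3, 0.0, 20.0, 10) == quantize(12.9, 0.0, 20.0, 10)
```

With 10 bins per dimension and 2 action dimensions, going from 2 to 6 state dimensions grows the table from 10^4 to 10^8 entries, while distinct continuous values become indistinguishable inside a bin.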



Examples


Embodiment 1

[0053] Deep reinforcement learning is a machine learning technique that combines reinforcement learning with deep neural networks. Specifically, a reinforcement learning agent interacts with the environment to collect reward information for taking different actions in different environmental states, and it inductively learns the optimal behavior strategy from the collected data, thereby gaining the ability to adapt to an unknown dynamic environment. The deep neural network can significantly improve the generalization ability of the algorithm in high-dimensional state and action spaces, which allows adaptation to more complex environments.

[0054] Embodiment 1 of the present invention provides a UAV trajectory optimization method based on deep reinforcement learning. As shown in Figure 1, the method includes the following steps:

[0055] S101, pre-constructing a deep reinforcement learning network based on a PPO algor...

Embodiment 2

[0065] Embodiment 2 of the present invention provides another embodiment of a UAV trajectory optimization method based on deep reinforcement learning.

[0066] In Embodiment 2 of the present invention, the PPO algorithm adopts the deep reinforcement learning structure of the actor-critic framework and is composed of two networks, the action network and the evaluation network. The action network uses the PPO algorithm and a deep neural network to fit the policy function and decide actions; the evaluation network uses a deep neural network to fit the state-value function and optimize the policy parameters. The overall structure and related data interaction of the optimization method provided by Embodiment 2 of the present invention are shown in Figure 2.
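The interaction between the two networks can be sketched as follows. The excerpt does not say which PPO variant is used, so this sketch assumes the widely used clipped surrogate objective and a one-step temporal-difference advantage. The function names and the values of `gamma` and `clip_eps` are assumptions, and plain lists of numbers stand in for the outputs of the action and evaluation networks.

```python
import math


def td_advantages(rewards, values, next_values, gamma=0.99):
    """One-step advantage estimates from the evaluation (critic) network.

    values[i] and next_values[i] are the critic's state-value estimates
    before and after step i. gamma is an assumed discount factor.
    """
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective that the action (actor) network maximizes."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Clipping the probability ratio limits how far a single update can move the policy away from the one that collected the data, which is the main reason PPO is considered stable for continuous-control problems such as flight.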

[0067] In this embodiment, the UAV communication scenario used is that a single UAV base station provides services for multiple fixed IoT devices, and the IoT devices are randomly activate...

Embodiment 3

[0075] Embodiment 3 of the present invention provides a preferred embodiment of the UAV trajectory optimization method based on deep reinforcement learning. This embodiment describes in further detail the UAV communication modeling method used in the present invention and the energy-efficient UAV trajectory optimization method based on deep reinforcement learning.

[0076] The UAV communication model established in this embodiment considers a scenario in which a UAV provides delay-tolerant services for N terrestrial Internet of Things devices. The devices are randomly distributed with fixed positions, and they collect data and transmit it to the UAV periodically or randomly. The goal is to optimize the flight trajectory of the UAV to maximize the cumulative energy efficiency under energy-limited conditions. To achieve this goal, the UAV should be able to detect its own remaining energy and determine the optimal return charging...
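One way to write the stated goal as an optimization problem is sketched below. The excerpt does not give the exact formulation, so every symbol here is an assumption: \pi is the flight policy, D_t the data collected in time step t, E_t the energy consumed in that step, \eta_t the instantaneous energy efficiency, and E_{\max} the available energy budget.

```latex
\max_{\pi} \; \sum_{t=1}^{T} \eta_t ,
\qquad \eta_t = \frac{D_t}{E_t} ,
\qquad \text{s.t.} \quad \sum_{t=1}^{T} E_t \le E_{\max}
```

Summing the per-step efficiency matches the reinforcement learning view in which the instantaneous energy efficiency is the reward and the agent maximizes the cumulative reward. A ratio of total data to total energy would be a different, equally plausible reading of "cumulative energy efficiency".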


Abstract

The invention discloses an unmanned aerial vehicle trajectory optimization method and device based on deep reinforcement learning, and an unmanned aerial vehicle. The method comprises the steps of: constructing a reinforcement learning network in advance and generating state data and action decision data in real time during the flight of an unmanned aerial vehicle; and, taking the state data as input, the action decision data as output, and the instantaneous energy efficiency as the reward, optimizing the policy parameters with a PPO algorithm and outputting an optimal policy. The device comprises a construction module, a training data collection module, and a training module. The unmanned aerial vehicle comprises a processor that executes the unmanned aerial vehicle trajectory optimization method based on deep reinforcement learning. The method can learn autonomously from accumulated flight data, can intelligently determine the optimal flight speed, acceleration, flight direction, and return time in an unknown communication scenario, derives a flight strategy with optimal energy efficiency, and offers stronger environmental adaptability and generalization capability.
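The data collection step described in the abstract can be sketched as a loop that records state, action, and reward tuples. Everything in this sketch is hypothetical: `ToyUavEnv`, its energy and throughput formulas, the 50% activation probability, and the two-component action (speed, heading) are illustrative stand-ins, not the patent's channel model, energy model, or action set.

```python
import math
import random


def instantaneous_energy_efficiency(bits_delivered, energy_joules):
    """Reward: data delivered per unit of energy in one step (0 if none used)."""
    return bits_delivered / energy_joules if energy_joules > 0 else 0.0


class ToyUavEnv:
    """Hypothetical 2D toy environment with fixed, randomly active devices."""

    def __init__(self, devices, battery_joules=500.0, seed=0):
        self.devices = devices            # fixed (x, y) device positions
        self.battery = battery_joules
        self.pos = (0.0, 0.0)
        self.rng = random.Random(seed)    # seeded for reproducible activation

    def state(self):
        return (self.pos[0], self.pos[1], self.battery)

    def step(self, action):
        speed, heading = action           # metres per step, radians
        self.pos = (self.pos[0] + speed * math.cos(heading),
                    self.pos[1] + speed * math.sin(heading))
        energy = 5.0 + 0.1 * speed ** 2   # made-up per-step propulsion cost
        self.battery -= energy
        bits = 0.0
        for dx, dy in self.devices:
            if self.rng.random() < 0.5:   # device is active in this step
                dist2 = (self.pos[0] - dx) ** 2 + (self.pos[1] - dy) ** 2
                bits += math.log2(1.0 + 100.0 / (1.0 + dist2))
        reward = instantaneous_energy_efficiency(bits, energy)
        return self.state(), reward, self.battery <= 0.0


def collect_rollout(env, policy, max_steps=50):
    """Run the policy and record (state, action, reward) tuples for training."""
    trajectory = []
    state = env.state()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory
```

A call such as `collect_rollout(ToyUavEnv([(10.0, 0.0), (0.0, 10.0)]), lambda s: (2.0, 0.0))` returns the tuples that a training module would then feed to a PPO update.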

Description

Technical field

[0001] The invention relates to the technical field of wireless communication, in particular to a method and device for optimizing trajectory of an unmanned aerial vehicle based on deep reinforcement learning, and an unmanned aerial vehicle.

Background technique

[0002] UAV communication technology is considered to be an indispensable part of the fifth generation (5G) and subsequent evolution (5G+) mobile communication networks. However, the UAV communication system has a unique air-to-ground channel model, highly dynamic three-dimensional flight capability and limited flight energy, which makes the UAV communication system more complex than the traditional communication system.

[0003] At present, the methods used for UAV trajectory optimization mainly include traditional convex optimization algorithm and reinforcement learning algorithm. For example, there is a Chinese patent application with the application number "201811144956.3", which discloses a met...


Application Information

IPC(8): G05D1/10
CPC: G05D1/101
Inventor: 许文俊, 徐越, 吴思雷, 张治, 张平, 林家儒
Owner: BEIJING UNIV OF POSTS & TELECOMM