Multi-unmanned aerial vehicle 3D hovering position joint optimization method and device and unmanned aerial vehicle base station

A multi-UAV joint optimization technology, applied in synchronization devices, vehicle position/route/altitude control, wireless communication, etc. It addresses the problems of existing methods: they cannot be applied to actual communication environments, are subject to many limiting factors, and show large gaps from real scenarios.

Active Publication Date: 2019-12-03
BEIJING UNIV OF POSTS & TELECOMM
Cites: 9; Cited by: 19

AI Technical Summary

Problems solved by technology

[0004] However, when this existing solution uses the game equilibrium method for optimization, it usually makes very strong assumptions about the wireless communication environment, such as: 1) the ground communication terminals are stationary, so the optimized control of the UAV is only for a real network; 2) the UAV and the ground communication terminal establish only a one-to-one static link; 3) the UAV moves only in the vertical direction and remains stationary in the horizontal direction, so the joint estimate of the two-dimensional state takes a fixed value; 4) there is only a single base station type; and so on.
These assumptions greatly limit the operating mechanism of the UAV system and cannot be applied to heterogeneous networks with multiple base station types, which is far from real scenarios.
[0005] In summary, the UAV hovering position optimization methods in the prior art cannot be applied to actual communication environments because of their many restrictive factors.



Examples


Embodiment 1

[0061] Embodiment 1 of the present invention provides a joint optimization method for multi-UAV 3D hovering positions. As shown in Figure 1, the method includes the following steps:

[0062] S101. Obtain state information of the heterogeneous network in which the UAVs are located.

[0063] S102. Input the state information into the pre-built deep reinforcement learning network, and decide the hovering position at the next moment using the current policy.

[0064] A policy is a mapping from states to actions.

[0065] S103. Obtain from the environment the reward function value of the UAV's hovering position at the current moment.

[0066] S104. Compute the update gradient based on the off-policy deep deterministic policy gradient algorithm (referred to below as the OPDPG algorithm).

[0067] S105. The multiple UAVs synchronously update their policy parameters.

[0068] S106. According to the gradient obtained in step S104, iteratively execute the steps from obtaining state information to synchro...
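The loop formed by steps S101 to S106 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: `ToyEnv`, `policy`, `train`, the quadratic reward, and the per-UAV offset parameters are all hypothetical, and the analytic reward gradient merely stands in for the gradient that step S104 would obtain from the deep reinforcement learning network.

```python
class ToyEnv:
    """Hypothetical stand-in for one UAV's view of the heterogeneous network."""

    def __init__(self, centre, best_offset):
        self.centre = list(centre)            # observed state, e.g. centre of served users
        self.best_offset = list(best_offset)  # optimum, unknown to the learner

    def observe(self):
        return list(self.centre)

    def reward(self, position):
        # Toy reward: negative squared distance to the best hovering position.
        return -sum((p - c - o) ** 2
                    for p, c, o in zip(position, self.centre, self.best_offset))

    def reward_gradient(self, position):
        # Analytic d(reward)/d(position); a placeholder for the S104 gradient.
        return [-2.0 * (p - c - o)
                for p, c, o in zip(position, self.centre, self.best_offset)]


def policy(theta, state):
    """Deterministic policy: next hovering position = state + learned offset."""
    return [s + t for s, t in zip(state, theta)]


def train(envs, thetas, lr=0.1, iterations=200):
    thetas = [list(t) for t in thetas]
    for _ in range(iterations):                          # S106: iterate
        grads = []
        for env, theta in zip(envs, thetas):
            state = env.observe()                        # S101: state information
            position = policy(theta, state)              # S102: next hovering position
            _reward = env.reward(position)               # S103: reward from environment
            # S104 (placeholder): position = state + theta, so the gradient
            # with respect to theta equals the gradient with respect to position.
            grads.append(env.reward_gradient(position))
        # S105: all UAVs update only after every gradient has been computed.
        thetas = [[t + lr * g for t, g in zip(theta, grad)]
                  for theta, grad in zip(thetas, grads)]
    return thetas
```

Calling `train` with one `ToyEnv` per UAV and zero-initialized offsets drives each UAV's offset toward that environment's `best_offset`; in the toy setting this plays the role of the policy converging over iterations.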

Embodiment 2

[0071] Embodiment 2 of the present invention provides another embodiment of a joint optimization method for 3D hovering positions of multiple UAVs.

[0072] The main flow chart of the optimization method provided by Embodiment 2 of the present invention is shown in Figure 2. The application scenario of this embodiment is a heterogeneous network in which a ground macro base station, a ground micro base station, and a UAV base station coexist. The ground communication terminal selects the base station to connect to according to the reference signal received power (RSRP). When the received signal power of an adjacent base station meets the switching condition, the terminal switches to that base station.
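The RSRP-based selection described in [0072] can be illustrated with a short sketch. The text does not state the switching condition, so the hysteresis margin used here is an assumption, and `select_base_station`, its parameters, and the base station names are hypothetical.

```python
def select_base_station(rsrp_dbm, serving=None, hysteresis_db=3.0):
    """Pick the base station a terminal should be connected to.

    rsrp_dbm: dict mapping base station name -> measured RSRP in dBm.
    serving: currently connected base station, or None for initial selection.
    hysteresis_db: ASSUMED switching condition; a neighbour must exceed the
        serving station's RSRP by this margin before the terminal switches.
    """
    best = max(rsrp_dbm, key=rsrp_dbm.get)
    if serving is None or serving not in rsrp_dbm:
        return best                      # initial selection: strongest RSRP
    if rsrp_dbm[best] > rsrp_dbm[serving] + hysteresis_db:
        return best                      # switching condition met
    return serving                       # otherwise stay connected
```

For example, a terminal connected to a UAV base station at -92 dBm stays connected when a micro base station is measured at -90 dBm, and switches once the micro base station reaches -88 dBm. The margin keeps the terminal from switching back and forth when two stations have nearly equal power.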

[0073] In this embodiment, the state information of the heterogeneous network environment is first obtained and input into the pre-established deep reinforcement learning network; the network uses the current policy function to determine the hovering position at t...

Embodiment 3

[0086] Embodiment 3 of the present invention provides another preferred embodiment of a joint optimization method for 3D hovering positions of multiple UAVs.

[0087] The OPDPG algorithm adopts an off-policy learning method, so the target policy obtained through training and the behavior policy used to explore the environment are different from each other. The target policy is a deterministic function: in a given state s_i, it outputs a_i = π(s_i), so that the UAV greedily selects the optimal action. However, greedy selection alone cannot guarantee sufficient exploration and learning of the environment states, so a behavior policy β(a|s) is introduced, which takes actions in the form of a random process so that the UAVs can explore unknown environments.
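The distinction between the deterministic π(s) and the exploratory β(a|s) can be sketched as below. The linear form of π, the Gaussian noise, and the names `target_policy` and `behavior_policy` are assumptions chosen for illustration; the text only says that β takes actions in the form of a random process.

```python
import random


def target_policy(weights, state):
    """pi(s): deterministic map from state to action (ASSUMED linear here)."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def behavior_policy(weights, state, sigma=0.1, rng=random):
    """beta(a|s): the target action perturbed by zero-mean Gaussian noise.

    The Gaussian noise is an illustrative choice of random process; sigma
    controls how far exploration strays from the greedy action.
    """
    return [a + rng.gauss(0.0, sigma) for a in target_policy(weights, state)]
```

Experience is collected with `behavior_policy` while the parameters of `target_policy` are the ones being improved, which is what makes the scheme off-policy. Setting `sigma=0` recovers the purely greedy action.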

[0088] In the embodiment of the present invention, the OPDPG algorithm uses the actor-critic method. The actor-critic method combines value-function-based and policy-gradient-based reinforcement learning methods, inherits the ...
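How a value-function estimate and a policy-gradient update can be paired, as [0088] describes, is sketched below for a deliberately tiny case. Everything here is hypothetical: the action is a single scalar (for instance a hover altitude), the policy is just that scalar `theta`, and the value estimate Q(a) is quadratic in the action. The patent's networks are deep and state-dependent; this only shows the two update rules.

```python
def features(action):
    """Quadratic features, so Q(a) = w[0] + w[1]*a + w[2]*a**2."""
    return [1.0, action, action * action]


def q_value(w, action):
    return sum(wi * fi for wi, fi in zip(w, features(action)))


def critic_update(w, action, reward, next_action, lr=0.01, gamma=0.0):
    """Value-function part: one semi-gradient temporal-difference step."""
    td_error = reward + gamma * q_value(w, next_action) - q_value(w, action)
    return [wi + lr * td_error * fi for wi, fi in zip(w, features(action))]


def actor_update(theta, w, lr=0.1):
    """Policy-gradient part: move the deterministic action theta along
    dQ/da evaluated at a = theta (the policy here is simply a = theta)."""
    dq_da = w[1] + 2.0 * w[2] * theta
    return theta + lr * dq_da
```

If the value estimate were exactly Q(a) = -(a - 2)^2, that is `w = [-4.0, 4.0, -1.0]`, the policy step would be proportional to 4 - 2*theta and would vanish at theta = 2, the action that maximizes the estimated value.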



Abstract

The invention discloses a joint optimization method and device for multi-UAV 3D hovering positions, and a UAV base station. The method comprises the following steps: first, acquiring state information of the heterogeneous network in which the UAVs are located; inputting the state information into a pre-built deep reinforcement learning network, determining the hovering positions at the next moment through the current policy, and obtaining from the environment the reward function values of the UAVs' hovering positions at the current moment; computing an update gradient based on the off-policy deep deterministic policy gradient algorithm, and synchronously updating the policy parameters of the multiple UAVs; and, according to the update gradient, iteratively executing the steps from acquiring state information to synchronously updating the policy parameters of the multiple UAVs, so that the target policy function gradually converges until an optimal policy is obtained. The device comprises a state acquisition unit, a reward unit, a gradient updating unit and a training unit. The method can be executed by a processor of the UAV base station. With the method, the device and the UAV base station of the invention, multiple UAVs can learn autonomously in the environment and adapt to dynamic, non-stationary environmental changes.

Description

Technical field

[0001] The invention relates to the technical field of wireless communication, in particular to a joint optimization method and device for multi-UAV 3D hovering positions and a UAV base station.

Background technique

[0002] The multi-UAV hovering position optimization technology is an indispensable key technology in the UAV communication system, and the wireless communication system is developing into a diversified and heterogeneous form. In a heterogeneous network, macro base stations, small base stations, and UAV base stations exist at the same time. The hovering position of the UAV base station determines the communication rate between the UAV and the ground communication terminal, as well as the interference noise to other base stations in the communication system, and indirectly affects the communication load of the ground base station.

[0003] The current method for multi-UAV hovering position optimization is mainly a game equilibrium method. For ex...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G05B13/04, G05B13/02, G05D1/04, H04W64/00, H04W56/00
CPC: G05B13/027, G05B13/042, G05B13/048, G05D1/042, H04W56/001, H04W64/00
Inventors: 许文俊, 徐越, 吴思雷, 冯志勇, 张平, 林家儒
Owner BEIJING UNIV OF POSTS & TELECOMM