A UAV-based method for optimizing age of information and power in the Internet of Things
An IoT system model is constructed, and a deep reinforcement learning algorithm together with a directional antenna is used to optimize the drone's flight trajectory and scheduling. This addresses the joint optimization of drone data collection and energy transmission in IoT communication, minimizing the age of information (AoI) and energy consumption and improving system performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTH CHINA NORMAL UNIV
- Filing Date
- 2023-11-24
- Publication Date
- 2026-06-19
AI Technical Summary
Existing research has failed to fully leverage the dual advantages of drones in data collection and energy transmission in the field of IoT communication, and lacks comprehensive optimization methods for drone-assisted IoT devices.
An IoT system model is constructed, and a deep reinforcement learning algorithm is used to optimize the flight trajectory and scheduling of UAVs. Combined with directional antennas and K-means clustering algorithm, the energy and data transmission strategies of UAVs are optimized. The optimization problem is solved by deep Q-network (DQN).
It minimizes the information age and energy consumption of drones in IoT communication systems, improves data collection efficiency and energy transmission efficiency, and is superior to traditional GA and RW algorithms.
Figure CN117615386B_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the field of unmanned aerial vehicle (UAV) communication technology, and in particular to a UAV-based method for optimizing the age of information (AoI) and power in the Internet of Things (IoT).
Background Technology
[0002] In recent years, with the rapid development of the Internet of Things (IoT) and 5G technologies, drones have played a crucial role. In particular, due to their inherent attributes such as mobility, flexibility, and adaptive altitude, drones have several key potential applications in wireless communication systems. On the one hand, drones can be used as aerial base stations to enhance the coverage, capacity, reliability, and energy efficiency of wireless networks. On the other hand, drones can be operated as flying mobile terminals within cellular networks.
[0003] With the advancement of drone and sensor technologies, higher demands and expectations have been placed on research into drone-assisted IoT. However, most existing research addresses only one aspect, either data collection from IoT devices or energy transmission to them, and so fails to fully exploit the advantages of drones in IoT communication.
Summary of the Invention
[0004] The purpose of this invention is to solve the problems mentioned in the background art. To achieve the purpose of this invention, the following technical solution is adopted:
[0005] A method for optimizing the age and power of IoT information based on drones includes the following steps:
[0006] An Internet of Things (IoT) system model is constructed, which includes a rotary-wing UAV, a base station, and k IoT devices. In the system model, the UAV starts from the starting point and flies to the base station. During the flight, the UAV transmits energy to the IoT devices and collects data from the IoT devices. When it flies to the base station, it unloads the data to the base station.
[0007] Determine the channel model, energy model, and information age model;
[0008] Construct an optimization problem that minimizes the weighted sum of the average information age and average energy consumption of all IoT devices;
[0009] Deep reinforcement learning algorithms are used to solve optimization problems.
[0010] A further improvement lies in the fact that, in the system model, the IoT devices are randomly distributed in a grid world, and the k-th IoT device has coordinates $C_k = (X_k, Y_k)$. The IoT devices are served by a rotary-wing drone. The drone starts from the starting point $U_0 = (0, 0, H)$, and during flight it collects data from the IoT devices and transfers energy to them. The drone finally stops at the end point $U_f = (X_f, Y_f, H)$, where the data collected during flight is offloaded to the base station. Here H is the constant altitude of the UAV, and the UAV's position in time slot t is represented as $(X_t, Y_t, H)$. In the grid world, the UAV moves in one of four directions (east, west, north, or south) or hovers to maintain its position. The UAV's flight is discretized into time slots [τ, 2τ, ...], where τ is the time required for the UAV to move from one grid center to an adjacent grid center, that is, τ = d0 / V, the ratio of the distance d0 between two adjacent grid centers to the UAV's flight speed V.
[0011] A further improvement is that the channel model is as follows:
[0012] Let d(t) ∈ D = {1, 2, …, k} denote the scheduling variable, where d(t) = 1, 2, … indicates that IoT device 1, 2, … sends information to the drone in the t-th time slot. Assume that there is only line-of-sight (LOS) communication between the IoT devices and the drone. The channel gain between the drone and the k-th IoT device in the t-th time slot is:
[0013]
[0014] In equation (1), g0 is the channel gain at the reference distance of 1 m, H is the altitude of the UAV, and $d_{t,k}$ is the distance between the drone and the k-th IoT device.
[0015] A further improvement is that the energy model is as follows:
[0016] The power consumption of a drone while it is moving or hovering is expressed as follows:
[0017]
[0018] In equation (2), P0 and P1 are the blade profile power and the induced power of the UAV when hovering, respectively, $V_t$ is the speed of the drone, $S_{tip}$ is the tip speed of the rotor blade, $S_0$ is the mean rotor induced velocity when hovering, d0 is the fuselage drag coefficient, ρ is the air density, μ0 is the rotor solidity, and Z is the rotor disk area.
[0019] The drone is equipped with a directional antenna with adjustable beamwidth, whose azimuth and elevation half-power beamwidths are both equal to Θ radians. The antenna gain, as a function of the azimuth angle θ and the elevation angle, is modeled as follows:
[0020]
[0021] In equation (3), G0 is a fixed value of 2.2846, and the feasible range of the half-beamwidth is...
[0022] The energy harvested by the kth IoT device in the tth time slot is:
[0023]
[0024] In equation (4), σ represents the energy harvesting efficiency of the IoT device, and P is the transmission power of the UAV;
[0025] When d(t) = k, the corresponding IoT device needs to upload data to the drone in the corresponding time slot. All IoT devices collect data packets of the same size, M. The formula for calculating the transmission power of the kth IoT device is as follows:
[0026]
[0027] In equation (5), B is the signal bandwidth, $\sigma^2$ is the noise power, M is the data packet size, and $d_{t,k}$ is the horizontal distance between the drone and the k-th IoT device.
[0028] A further improvement is that the information age model is as follows:
[0029] IoT devices collect data at fixed time intervals, forming data packets. When the drone does not request data, these packets are stored in the IoT device's cache. When the cache is full, packets are handled on a first-in, first-out (FIFO) basis: the packet at the head of the queue is removed, the other packets move forward one position, and the newest packet is placed at the tail. All IoT devices update their data at the same time interval. The cache holds N data packets, each of size M. $U_k(t)$ tracks how long the latest data packet has existed in the k-th IoT device:
[0030]
[0031] In equation (6), T is the update interval of the IoT device;
[0032] The information age of the kth IoT device is:
[0033]
[0034] A further improvement is that the optimization problem is formulated as follows:
[0035]
[0036] P(t)>0 (8a)
[0037] $0 < E_k(t) < E_{\max}$ (8b)
[0038] In equation (8), d(t) represents the energy transfer and data collection scheduling of the UAV, C(t) represents the flight trajectory scheduling of the UAV, $\delta_k$ is the weight representing the importance of the k-th IoT device, P(t) is the remaining battery power of the drone, and $E_k(t)$ is the remaining power of the k-th IoT device. Constraint (8a) ensures that the drone has sufficient power to complete the entire flight, and constraint (8b) ensures that each IoT device has sufficient power to collect data and transmit it to the drone. λ is a weight that controls the trade-off between the average AoI and the average energy consumption of the IoT devices.
[0039] A further improvement is that the k-means algorithm is used in the system model to cluster IoT devices based on their locations to improve the efficiency of drone data collection.
[0040] A further improvement is that the k-means clustering of IoT devices by location proceeds as follows: all IoT devices are assigned to L clusters indexed by l ∈ {1, 2, ..., L}, and the l-th cluster contains $n_l$ IoT devices. The drone uses OFDMA to collect information from all IoT devices in a cluster simultaneously. After clustering, the drone scheduling variable becomes d(t) ∈ D = {1, 2, ..., L}, where d(t) = l means that all IoT devices in the l-th cluster send information to the drone.
[0041] A further improvement is that the method of using deep reinforcement learning algorithms to solve the optimization problem includes:
[0042] The optimization problem is formulated as a Markov decision process defined by the tuple <s, a, r, p>, where s is the state space, a is the action space, r is the reward function, and p is the state transition function. In each training episode, the UAV observes the current state s(t) and selects an action a(t) to execute. Once the action is executed, the UAV receives the corresponding reward r(t) and observes the state s(t+1) in the next time slot. The optimal trajectory and scheduling strategy are obtained when training converges, where:
[0043] State space: The system's state consists of the drone's state and the IoT devices' state, represented as s(t) = (c(t), P(t)), where c(t) contains the drone's position and remaining energy, represented as $c(t) = (c_u(t), c_e(t))$, and P(t) contains the remaining battery power and AoI of the IoT devices, expressed as $P(t) = (E_k(t), A_k(t))$;
[0044] Action Space: The system's actions include two aspects: the UAV's flight trajectory scheduling and the control scheduling of data uplink and energy downlink, denoted as q(t) and s(t) respectively, where:
[0045]
[0046] When s(t) = 0, the drone transmits energy to IoT devices within range; when s(t) = k, all IoT devices in the kth cluster will upload data to the drone.
[0047] Transition function: The transition function describes how the state evolves from one time slot to the next. The transition of $c_u(t)$ is given by Equation (9), and the transition of $A_k(t)$ is given by Equation (6). The drone's energy consumption depends on its own motion and on whether it transmits energy to the IoT devices: if s(t) = 0, then in addition to the energy consumed in flying or hovering, the drone consumes additional energy to charge the IoT devices. The remaining power of an IoT device is determined by the energy it receives from the drone and the energy it spends transmitting data to the drone. When s(t) = 0, the IoT devices within range receive energy from the drone; when s(t) = k, all IoT devices in the corresponding cluster consume energy to send data to the drone.
[0048] Reward Function: The reward function is defined as minimizing the weighted sum of the average information age and average energy consumption of all IoT devices. The instantaneous reward for the drone in the t-th time slot is defined as:
[0049]
[0050] Here, r1 represents the penalty when the remaining energy of the drone and IoT device is less than 0, and r2 represents the penalty when the drone flies out of the defined grid world.
[0051] A further improvement is that the method for solving the optimization problem using a deep reinforcement learning algorithm is a deep reinforcement learning algorithm based on a deep Q-network, and the steps include:
[0052] (1) Initialize state s0, experience replay buffer, online network and target network;
[0053] (2) Repeat the training until the predetermined number of training iterations is reached;
[0054] (3) For each time slot t, execute action $a_t$ in the given state $s_t$;
[0055] (4) Obtain the reward $r_t$ according to the state transition function and observe the new state $s_{t+1}$;
[0056] (5) Store $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer;
[0057] (6) Sample $(s_i, a_i, r_i, s_{i+1})$ from the buffer;
[0058] (7) Update the online network. After the online network has been updated O times, update the target network.
[0059] (8) Terminate training.
[0060] The beneficial effects of this invention are as follows:
[0061] This invention constructs an IoT system model in which a UAV assists IoT devices with both data collection and energy transmission, and jointly optimizes the UAV's flight trajectory and scheduling control to minimize the weighted sum of the average AoI in the communication system and the average energy consumption of the IoT devices. It uses the K-means clustering algorithm to cluster the IoT devices, enabling the UAV to receive information from more IoT devices simultaneously. Furthermore, it employs a directional antenna to concentrate radio frequency energy in a specific direction, increasing the energy that IoT devices harvest from the UAV. Simulation results show that the proposed solution outperforms baseline algorithms such as the GA and RW algorithms.
Attached Figure Description
[0062] Figure 1 is a flowchart of the UAV-based IoT AoI and power optimization method according to the present invention;
[0063] Figure 2 is a diagram of the system model;
[0064] Figure 3 is a model diagram of power transfer from the drone to IoT devices;
[0065] Figure 4 is a diagram of the information age model;
[0066] Figure 5 is a diagram of the DQN architecture and its interaction with the environment;
[0067] Figure 6 shows how the reward function of each algorithm changes with the weight λ for a fixed number of IoT devices;
[0068] Figure 7 shows how the reward function of each algorithm changes with the number of IoT devices for a fixed weight λ;
[0069] Figure 8 shows how the reward function of the algorithm changes with the number of IoT devices in various scenarios for a fixed weight λ.
Detailed Implementation
[0070] The accompanying drawings are for illustrative purposes only and should not be construed as limiting the scope of this patent. The technical solution of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.
[0071] Drones play a crucial role in many emerging Internet of Things (IoT) applications. In this invention, drones serve as both data collectors and energy transmitters within the IoT. Drones can act as mobile relays, transmitting data from IoT devices to base stations; they can also function as energy transmitters, charging low-power ground-based IoT devices via radio frequency technology. This invention addresses an optimization problem that, by considering drones' assistance in data collection and energy transmission for IoT devices, jointly optimizes the drone's flight trajectory and scheduling control to minimize the weighted sum of the average information age (AoI) of the information received by the base station and the average energy consumption of the IoT devices.
[0072] The technical solution of the present invention is described in detail below:
[0073] Referring to Figures 1 to 6, this invention proposes a UAV-based method for optimizing the age of information (AoI) and power in the IoT, comprising the following steps:
[0074] Step S1: Construct an Internet of Things (IoT) system model. The system model includes a rotary-wing UAV, a base station, and k IoT devices. In the system model, the UAV starts from the starting point and flies to the base station. During the flight, the UAV transmits energy to the IoT devices and collects data from the IoT devices. When it arrives at the base station, it unloads the data to the base station.
[0075] Step S2: Determine the channel model, energy model, and information age model.
[0076] Step S3: Construct an optimization problem that minimizes the weighted sum of the average information age and average energy consumption of all IoT devices.
[0077] Step S4: Solve the optimization problem using a deep reinforcement learning algorithm.
[0078] Specifically, in the system model, the IoT devices are randomly distributed in the grid world, and the k-th IoT device has coordinates $C_k = (X_k, Y_k)$. The IoT devices are served by a rotary-wing drone. The drone starts from the starting point $U_0 = (0, 0, H)$, and during flight it collects data from the IoT devices and transfers energy to them. The drone finally stops at the end point $U_f = (X_f, Y_f, H)$, where the data collected during flight is offloaded to the base station (the end point). Here H is the constant altitude of the UAV, and the UAV's position in time slot t is represented as $(X_t, Y_t, H)$. In the grid world, the UAV moves in one of four directions (east, west, north, or south) or hovers to maintain its position. The UAV's flight is discretized into time slots [τ, 2τ, ...], where τ is the time required for the UAV to move from one grid center to an adjacent grid center, that is, τ = d0 / V, the ratio of the distance d0 between two adjacent grid centers to the UAV's flight speed V.
[0079] The drone's position within each grid cell can therefore be approximated as a point: in each time slot, the drone is treated as hovering at the center of the square cell. In addition, to avoid interference between the uplink and the downlink, the drone cannot both transmit power to IoT devices and collect data from them in the same time slot. The system model is shown in Figure 2.
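The grid-world motion described above can be illustrated with a minimal Python sketch. The function and variable names are hypothetical, and cancelling a move that would leave the grid is an assumption made here for illustration; the source only states that leaving the grid is penalized. The example values (100 m cells, V = 20 m/s) are taken from the simulation setup later in the description.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]

# Unit displacements for the five possible moves (hypothetical names).
MOVES: Dict[str, Cell] = {
    "east": (1, 0),
    "west": (-1, 0),
    "north": (0, 1),
    "south": (0, -1),
    "hover": (0, 0),
}


def slot_duration(d0_m: float, speed_mps: float) -> float:
    """Slot length tau = d0 / V, in seconds."""
    return d0_m / speed_mps


def move(cell: Cell, action: str, cells_per_side: int) -> Tuple[Cell, bool]:
    """Apply one move and report whether it tried to leave the grid.

    Assumption: a move that would leave the grid is cancelled and flagged,
    so that a caller can apply an out-of-grid penalty.
    """
    dx, dy = MOVES[action]
    x, y = cell[0] + dx, cell[1] + dy
    inside = 0 <= x < cells_per_side and 0 <= y < cells_per_side
    return ((x, y) if inside else cell), (not inside)


# 100 m between adjacent grid centers, flight speed 20 m/s.
tau = slot_duration(d0_m=100.0, speed_mps=20.0)
```

With these values each time slot lasts 5 s, which is the slot length reused in the later sketches.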
[0080] Specifically, the channel model is as follows:
[0081] Let d(t) ∈ D = {1, 2, …, k} denote the scheduling variable, where d(t) = 1, 2, … indicates that IoT device 1, 2, … sends information to the drone in the t-th time slot. Since the drone flies at a fixed altitude H, it is assumed that there is only line-of-sight (LOS) communication between the IoT devices and the drone. The channel gain between the drone and the k-th IoT device in the t-th time slot is:
[0082]
[0083] In equation (1), g0 is the channel gain at the reference distance of 1 m, H is the altitude of the UAV, and $d_{t,k}$ is the distance between the drone and the k-th IoT device.
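Equation (1) itself is not reproduced in this text. As a hedged illustration, the sketch below assumes the common free-space LOS form g = g0 / (H² + d²), with d taken as the horizontal distance, which is consistent with the symbols defined above but is not confirmed by the source. The helper names are hypothetical.

```python
def db_to_linear(value_db: float) -> float:
    """Convert a power ratio from decibels to linear scale."""
    return 10.0 ** (value_db / 10.0)


def channel_gain(g0: float, altitude_m: float, horizontal_m: float) -> float:
    """Assumed LOS gain g0 / (H^2 + d^2), with g0 given in linear scale."""
    return g0 / (altitude_m ** 2 + horizontal_m ** 2)
```

Under this assumed form the gain is largest when the drone is directly above the device and falls off with the square of the slant distance.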
[0084] Specifically, the energy model is as follows:
[0085] The system's energy model primarily considers the energy consumption of the drone and the remaining power of the IoT devices. The drone's energy consumption mainly includes the energy consumed during flight (or hovering) and the energy consumed in transmitting power to the IoT devices. Therefore, the power consumption of the drone during movement or hovering is expressed as:
[0086]
[0087] In equation (2), P0 and P1 are the blade profile power and the induced power of the UAV when hovering, respectively, $V_t$ is the speed of the drone, $S_{tip}$ is the tip speed of the rotor blade, $S_0$ is the mean rotor induced velocity when hovering, d0 is the fuselage drag coefficient, ρ is the air density, μ0 is the rotor solidity, and Z is the rotor disk area.
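Equation (2) is not reproduced in this text. The symbols listed above match the widely used rotary-wing propulsion power model (blade profile term, induced term, and parasite drag term), so the sketch below assumes that form. All default parameter values are illustrative assumptions and do not come from the source.

```python
import math


def propulsion_power(
    v: float,
    p0: float = 79.86,     # blade profile power in hover, W (illustrative)
    p1: float = 88.63,     # induced power in hover, W (illustrative)
    s_tip: float = 120.0,  # rotor blade tip speed, m/s (illustrative)
    s0: float = 4.03,      # mean rotor induced velocity in hover, m/s (illustrative)
    d0: float = 0.6,       # fuselage drag coefficient (illustrative)
    rho: float = 1.225,    # air density, kg/m^3
    mu0: float = 0.05,     # rotor solidity (illustrative)
    z: float = 0.503,      # rotor disk area, m^2 (illustrative)
) -> float:
    """Assumed rotary-wing propulsion power in watts at speed v (m/s)."""
    blade = p0 * (1.0 + 3.0 * v ** 2 / s_tip ** 2)
    induced = p1 * math.sqrt(
        math.sqrt(1.0 + v ** 4 / (4.0 * s0 ** 4)) - v ** 2 / (2.0 * s0 ** 2)
    )
    parasite = 0.5 * d0 * rho * mu0 * z * v ** 3
    return blade + induced + parasite
```

At v = 0 the expression reduces to P0 + P1, the hover power. Multiplying the power by the slot length τ gives the propulsion energy spent in one time slot.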
[0088] In existing technologies, a common method for drones to transmit power to IoT devices is to broadcast energy using an omnidirectional antenna. While broadcasting is the simplest method, it suffers from significant energy loss and low energy efficiency for IoT devices located far from the drone. To improve energy efficiency, directional antennas are preferred over omnidirectional antennas for drones. Compared to omnidirectional antennas, directional antennas offer numerous advantages, including increased capacity, longer transmission time, and greater spatial reusability.
[0089] Specifically, in this embodiment, the UAV is equipped with a directional antenna with adjustable beamwidth, whose azimuth and elevation half-power beamwidths are both equal to Θ radians. The antenna gain, as a function of the azimuth angle θ and the elevation angle, is modeled as follows:
[0090]
[0091] In equation (3), G0 is a fixed value of 2.2846, and the feasible range of the half-beamwidth is... Because the drone uses a directional antenna, only IoT devices within a specific range can receive energy when the drone transmits it. This range is the circular area of radius R = H tan Θ centered on the drone's horizontal position. The energy model is shown in Figure 3.
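The coverage rule R = H tan Θ can be sketched as follows. Equation (3) is not reproduced in this text; the gain expression G0 / Θ² inside the main lobe and a negligible gain outside it is an assumption based on the commonly used beamwidth-adjustable antenna model with G0 = 2.2846. Function names are hypothetical.

```python
import math

G0 = 2.2846  # fixed antenna constant given in the description


def antenna_gain(theta_half: float, in_beam: bool, sidelobe: float = 0.0) -> float:
    """Assumed gain: G0 / Theta^2 inside the main lobe, a small constant outside."""
    return G0 / theta_half ** 2 if in_beam else sidelobe


def coverage_radius(altitude_m: float, theta_half: float) -> float:
    """Ground coverage radius R = H * tan(Theta)."""
    return altitude_m * math.tan(theta_half)


def is_covered(uav_xy, dev_xy, altitude_m: float, theta_half: float) -> bool:
    """True if the device lies within the circle of radius R under the UAV."""
    return math.dist(uav_xy, dev_xy) <= coverage_radius(altitude_m, theta_half)
```

The sketch makes the trade-off visible: a narrower beam raises the gain as 1/Θ² but shrinks the radius within which devices can be charged.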
[0092] Therefore, the energy harvested by the k-th IoT device in the t-th time slot is:
[0093]
[0094] In equation (4), σ represents the energy harvesting efficiency of the IoT device. After receiving the radio frequency signal from the drone, the IoT device converts the signal into energy, which results in energy loss. Additionally, some energy is used for the daily operation of the IoT device. Here, it is assumed that the energy harvesting efficiency of all IoT devices is the same and constant. P is the transmission power of the drone.
[0095] When d(t) = k, the corresponding IoT device needs to upload data to the drone in the corresponding time slot. All IoT devices collect data packets of the same size, M. The formula for calculating the transmission power of the kth IoT device is as follows:
[0096]
[0097] In equation (5), B is the signal bandwidth, $\sigma^2$ is the noise power, M is the data packet size, and $d_{t,k}$ is the horizontal distance between the drone and the k-th IoT device.
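Equation (5) is not reproduced in this text. One plausible reading, sketched below, is that a packet of M bits must be delivered within one slot of length τ, so the Shannon rate B log2(1 + P g / σ²) must equal M / τ. Solving for P gives P = (σ² / g)(2^(M / (Bτ)) - 1). This derivation and the function name are assumptions, not the source's stated formula.

```python
def required_tx_power(
    packet_bits: float,
    bandwidth_hz: float,
    slot_s: float,
    noise_w: float,
    gain: float,
) -> float:
    """Assumed uplink power: (sigma^2 / g) * (2^(M / (B * tau)) - 1), in watts."""
    spectral_efficiency = packet_bits / (bandwidth_hz * slot_s)  # bit/s/Hz
    return noise_w / gain * (2.0 ** spectral_efficiency - 1.0)
```

For example, a 5 MB packet (taken here as 40 × 10⁶ bits), a 1 MHz bandwidth, and a 5 s slot require a spectral efficiency of 8 bit/s/Hz, so the required power is 255 σ² / g.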
[0098] Specifically, the information age model is as follows:
[0099] For ease of analysis, we assume that the wake-up time and information sampling time of IoT devices are negligible compared to the information upload time.
[0100] In this embodiment, the IoT device does not generate data at arbitrary times but collects data at fixed time intervals, and forms data packets from the periodically collected data. When the drone does not request data, these packets are placed in the IoT device's cache. When the cache is full, packets are handled on a first-in, first-out (FIFO) basis: the packet at the head of the queue is removed, the other packets move forward one position, and the latest packet is placed at the tail of the queue. The latest data packet in the IoT device is therefore always at the tail of the cache. For ease of calculation, all IoT devices update their data at the same time interval. The cache holds N data packets, each of size M. $U_k(t)$ tracks how long the latest data packet has existed in the k-th IoT device:
[0101]
[0102] In equation (6), T represents the update interval of the IoT device. To quantify the timeliness and freshness of the data packets collected by the drone, this invention defines the AoI as the time elapsed since the IoT device last transmitted a data packet. If the IoT device sends a data packet to the drone, the AoI of that device is reset to $U_k(t)$; otherwise, the AoI of that device is incremented by one. The information age model is shown in Figure 4.
[0103] In Figure 4, the solid line represents the AoI of the data packets from the IoT device, while the dashed line represents how long the latest data packet has existed in the IoT device. Because IoT devices update their data periodically, the age of the latest data packet in the device is also reset periodically. When the IoT device uploads data to the drone, i.e., at the time indicated by the arrow in Figure 4, the AoI of the IoT device is not reset to 1; instead, it becomes the age of the latest data packet in the device at that moment. Therefore, the AoI of the k-th IoT device is expressed as:
[0104]
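Equations (6) and (7) are not reproduced in this text, but the update rules are described in words: the age of the newest buffered packet is reset periodically, and the AoI is reset to that age on upload and otherwise grows by one. The sketch below follows that description. Measuring the update interval in slots and resetting the packet age to 1 when a fresh packet arrives are assumptions, and the function names are hypothetical.

```python
from typing import List, Set


def next_packet_age(u: int, t: int, update_slots: int) -> int:
    """Age of the newest buffered packet at slot t + 1.

    Assumption: a fresh packet is generated whenever t + 1 is a multiple
    of the update interval, which resets the age to 1.
    """
    return 1 if (t + 1) % update_slots == 0 else u + 1


def next_aoi(a: int, u: int, uploaded: bool) -> int:
    """AoI at slot t + 1: reset to U_k(t) on upload, otherwise grow by one."""
    return u if uploaded else a + 1


def simulate(update_slots: int, upload_slots: Set[int], slots: int) -> List[int]:
    """Return the AoI trajectory of one device over the given number of slots."""
    u, a, history = 1, 1, []
    for t in range(slots):
        a = next_aoi(a, u, t in upload_slots)
        u = next_packet_age(u, t, update_slots)
        history.append(a)
    return history
```

For an update interval of 3 slots and a single upload in slot 4, `simulate(3, {4}, 6)` yields an AoI that climbs to 5 and then drops to 2 rather than 1, reflecting that the uploaded packet was already two slots old.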
[0105] In summary, the main objective of the UAVs in the system is to jointly minimize the weighted average AoI and the average transmission power of IoT devices. The optimization problem is formulated as follows:
[0106]
[0107] P(t)>0 (8a)
[0108] $0 < E_k(t) < E_{\max}$ (8b)
[0109] In equation (8), d(t) represents the energy transfer and data collection scheduling of the UAV, C(t) represents the flight trajectory scheduling of the UAV, $\delta_k$ is the weight representing the importance of the k-th IoT device, P(t) is the remaining battery power of the drone, and $E_k(t)$ is the remaining power of the k-th IoT device. Constraint (8a) ensures that the drone has sufficient power to complete the entire flight, and constraint (8b) ensures that each IoT device has sufficient power to collect data and transmit it to the drone. λ is a weight that controls the trade-off between the average AoI and the average energy consumption of the IoT devices: the larger the value of λ, the more weight the optimization places on the average energy consumption of the IoT devices, and vice versa.
[0110] The optimization problem in Equation (8) is a nonlinear integer programming problem whose complexity increases with the number of deployed IoT devices. Furthermore, the drone faces a high-dimensional state space that is almost continuous. To overcome the curse of dimensionality, this invention proposes a deep reinforcement learning algorithm based on deep Q-networks (DQN). The network acts as a function approximator that estimates the Q-function and solves the given problem effectively and feasibly.
[0111] Specifically, in the system model, the k-means algorithm is used to cluster IoT devices based on their locations to improve the efficiency of drone data collection.
[0112] The method of clustering IoT devices by location using the k-means algorithm is as follows: all IoT devices are assigned to L clusters indexed by l ∈ {1, 2, ..., L}, and the l-th cluster contains $n_l$ IoT devices. The drone uses OFDMA (Orthogonal Frequency Division Multiple Access, a multiple access technology in wireless communication systems) to collect information from all IoT devices in a cluster simultaneously. After clustering, the drone scheduling variable becomes d(t) ∈ D = {1, 2, ..., L}, where d(t) = l means that all IoT devices in the l-th cluster send information to the drone.
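The location-based clustering can be sketched with a plain implementation of Lloyd's k-means iteration using only the Python standard library. The source does not specify the initialization or stopping rule; choosing the initial centroids by sampling device locations and stopping when assignments no longer change are assumptions, and the function name is hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], num_clusters: int, iters: int = 50, seed: int = 0):
    """Cluster device locations; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, num_clusters)  # assumed initialization
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each device to its nearest centroid.
        new_labels = [
            min(range(num_clusters), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(num_clusters):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

The returned labels give the cluster index of each device, so scheduling cluster l in a slot means every device with that label uploads in the slot.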
[0113] The method of using deep reinforcement learning algorithms to solve the optimization problem includes:
[0114] The optimization problem is formulated as a Markov decision process (MDP) defined by the tuple <s, a, r, p>, where s is the state space, a is the action space, r is the reward function, and p is the state transition function. In each training episode, the agent (i.e., the drone) observes the current state s(t) and selects an action a(t) to execute. Once the action is executed, the agent receives the corresponding reward r(t) and observes the state s(t+1) in the next time slot. The optimal trajectory and scheduling strategy are obtained when training converges.
[0115] (1) State Space: The system's state consists of the UAV's state and the IoT devices' state, represented as s(t) = (c(t), P(t)), where c(t) contains the UAV's position and remaining energy, represented as $c(t) = (c_u(t), c_e(t))$, and P(t) contains the remaining battery power and AoI of the IoT devices, expressed as $P(t) = (E_k(t), A_k(t))$;
[0116] (2) Action Space: The system's actions include two aspects: the flight trajectory scheduling of the UAV and the control scheduling of data uplink and energy downlink, which are represented by q(t) and s(t) respectively, where:
[0117]
[0118] When s(t) = 0, the drone transmits energy to IoT devices within range; when s(t) = k, all IoT devices in the kth cluster upload data to the drone.
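Because a DQN outputs one Q-value per discrete action, the two action components can be flattened into a single index. The sketch below enumerates the joint action set under the assumption that q(t) is one of five unit displacements (Equation (9) is not reproduced in this text) and that s(t) ranges over 0 and the L cluster indices. The names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Assumed displacement set: hover, east, west, north, south.
MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

Action = Tuple[Tuple[int, int], int]


def build_actions(num_clusters: int) -> List[Action]:
    """All (q, s) pairs: 5 moves times (1 charging option + L clusters)."""
    return list(product(MOVES, range(num_clusters + 1)))


def describe(action: Action) -> str:
    """Human-readable meaning of one joint action."""
    (dx, dy), s = action
    link = "charge devices in range" if s == 0 else f"collect from cluster {s}"
    return f"move ({dx}, {dy}), {link}"
```

With L = 3 clusters this gives 20 joint actions, which would be the size of the network's output layer under this encoding.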
[0119] (3) Transition function: The transition function describes how the state evolves from one time slot to the next. The transition of $c_u(t)$ is given by Equation (9), and the transition of $A_k(t)$ is given by Equation (6). The drone's energy consumption depends on its own motion and on whether it transmits energy to the IoT devices: if s(t) = 0, then in addition to the energy consumed in flying or hovering, the drone consumes additional energy to charge the IoT devices. The remaining power of an IoT device is determined by the energy it receives from the drone and the energy it spends transmitting data to the drone. When s(t) = 0, the IoT devices within range receive energy from the drone. When s(t) = k, all IoT devices in the corresponding cluster consume energy to send data to the drone.
[0120] (4) Reward Function: The reward function is defined as minimizing the weighted sum of the average information age and average energy consumption of all IoT devices. In this invention, the instantaneous reward of the drone in the t-th time slot is defined as:
[0121]
[0122] Here, r1 represents the penalty when the remaining energy of the drone and IoT device is less than 0, and r2 represents the penalty when the drone flies out of the defined grid world.
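The reward equation is not reproduced in this text. A form consistent with the description is the negative of the weighted objective, minus the penalties r1 and r2 when the corresponding violations occur. The sketch below assumes exactly that; the penalty magnitudes and the function name are hypothetical.

```python
from typing import Sequence


def instant_reward(
    aoi: Sequence[float],
    tx_power: Sequence[float],
    weights: Sequence[float],
    lam: float,
    energy_violation: bool,
    out_of_grid: bool,
    r1: float = 100.0,  # hypothetical penalty for depleted energy
    r2: float = 100.0,  # hypothetical penalty for leaving the grid
) -> float:
    """Assumed per-slot reward: -(weighted mean AoI + lam * mean power) - penalties."""
    num_devices = len(aoi)
    avg_aoi = sum(w * a for w, a in zip(weights, aoi)) / num_devices
    avg_power = sum(tx_power) / num_devices
    reward = -(avg_aoi + lam * avg_power)
    if energy_violation:
        reward -= r1
    if out_of_grid:
        reward -= r2
    return reward
```

Maximizing this reward is then equivalent to minimizing the weighted sum while steering the agent away from infeasible states.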
[0123] (5) DQN Solution: DQN is an optimal decision network based on deep learning and reinforcement learning. DQN uses three techniques:
[0124] The first technique is the target network. DQN consists of two neural networks: the first (the online network) acts as the Q-function estimator, and the second (the target network) provides the target Q-values. The online network estimates the Q-value of the action taken in the current state, while the target network supplies the target value against which that estimate is trained. During gradient descent, the parameters of the online network are updated first, and the parameters of the target network are updated only after the online network has been updated O times.
[0125] The second technique is exploration, controlled by the exploration rate ε, which addresses the exploration-exploitation dilemma. Exploration means trying actions that have not been tried before in the hope of obtaining a higher reward; exploitation means taking the action currently known to yield the greatest reward. Because the number of attempts is limited, the two conflict: strengthening one inevitably weakens the other. To maximize the reward, a good balance must be struck between them, which is generally achieved with ε-greedy exploration. Early in training, actions are chosen at random with high probability; as the number of training episodes increases and the agent learns which actions are better, the probability of random exploration is reduced.
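A minimal sketch of ε-greedy selection with a decaying exploration rate follows. The exponential decay schedule and its constants are assumptions; the source only states that the probability of random exploration decreases as training proceeds.

```python
import random
from typing import Sequence


def epsilon_at(episode: int, eps_start: float = 1.0,
               eps_end: float = 0.05, decay: float = 0.995) -> float:
    """Assumed schedule: exponential decay from eps_start down to a floor."""
    return max(eps_end, eps_start * decay ** episode)


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """With probability epsilon explore at random, otherwise pick the best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Passing the random generator explicitly keeps runs reproducible, which helps when comparing training curves across settings.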
[0126] The third technique is experience replay. To break the correlations between training samples, an experience replay buffer stores the transitions (s(t), a(t), r(t), s(t+1)). When the buffer is full, the oldest data is discarded.
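The buffer behaviour described here (fixed capacity, oldest transitions discarded first, uniform random sampling) maps directly onto a bounded double-ended queue. The class below is an illustrative sketch with hypothetical names.

```python
import random
from collections import deque
from typing import Any, List, Tuple

Transition = Tuple[Any, int, float, Any]


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest item when a new one is appended.
        self._buffer: deque = deque(maxlen=capacity)

    def push(self, s: Any, a: int, r: float, s_next: Any) -> None:
        self._buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Draw a uniform random mini-batch without replacement."""
        return rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)
```

Sampling uniformly from the whole buffer, rather than taking the latest transitions, is what decorrelates consecutive time slots in a mini-batch.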
[0127] Specifically, the deep reinforcement learning algorithm used to solve the optimization problem is based on a deep Q-network (DQN). Figure 5 shows the interaction between the DQN architecture and its environment. The DQN training algorithm steps include:
[0128] (1) Initialize state s0, experience replay buffer, online network and target network;
[0129] (2) Repeat the training until the predetermined number of training iterations is reached;
[0130] (3) For each time slot t, for a given state s_t, execute action a_t based on Q (ε-greedy);
[0131] (4) Obtain feedback r_t based on the state transition function and obtain a new state s_{t+1};
[0132] (5) Store (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer;
[0133] (6) Sample batches (s_i, a_i, r_i, s_{i+1}) from the buffer;
[0134] (7) Update the online network. After updating the online network O times, update the target network;
[0135] (8) Terminate training.
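The order of steps (1) to (8) can be sketched on a toy grid world. This is an illustration under strong assumptions, not the simulated system: lookup tables stand in for the online and target neural networks, the 4 × 4 grid, the reward of -1 per time slot, the penalty `R2`, and all hyperparameters are hypothetical, and IoT scheduling, energy, and information age are omitted.

```python
import random
from collections import deque

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]  # east, west, north, south, hover
SIZE, GOAL = 4, (3, 3)   # hypothetical small grid and destination
R2 = 5.0                 # hypothetical penalty for leaving the grid

def step(state, action):
    """Toy transition: -1 per slot, extra penalty R2 if the move leaves the grid."""
    x, y = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= x < SIZE and 0 <= y < SIZE):
        return state, -1.0 - R2, False
    return (x, y), -1.0, (x, y) == GOAL

def train(episodes=200, horizon=50, sync_every=20, batch=8, capacity=500,
          gamma=0.95, lr=0.2, seed=0):
    rng = random.Random(seed)
    # (1) initialise the replay buffer, the online table and the target table
    states = [(x, y) for x in range(SIZE) for y in range(SIZE)]
    online = {s: [0.0] * len(ACTIONS) for s in states}
    target = {s: list(q) for s, q in online.items()}
    buffer, eps, updates = deque(maxlen=capacity), 1.0, 0
    for _ in range(episodes):                        # (2) fixed number of episodes
        s = (0, 0)
        for _ in range(horizon):
            if rng.random() < eps:                   # (3) epsilon-greedy action
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: online[s][i])
            s_next, r, done = step(s, a)             # (4) reward and next state
            buffer.append((s, a, r, s_next, done))   # (5) store the transition
            sample = rng.sample(list(buffer), min(batch, len(buffer)))  # (6)
            for bs, ba, br, bn, bd in sample:        # (7) update the online table
                y = br if bd else br + gamma * max(target[bn])
                online[bs][ba] += lr * (y - online[bs][ba])
            updates += 1
            if updates % sync_every == 0:            # copy to target every O updates
                target = {k: list(v) for k, v in online.items()}
            s = s_next
            if done:
                break
        eps = max(0.05, eps * 0.98)
    return online, target, buffer, updates           # (8) training ends

online, target, buffer, updates = train(episodes=20)
```

In a full implementation the two tables would be replaced by neural networks trained by gradient descent on the same targets, and the state would also carry the energy and information-age terms described above.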
[0136] Simulation results:
[0137] Numerical results are provided to verify the proposed DQN algorithm, which is compared with the Greedy Algorithm (GA) and the Random Walk Algorithm (RW). The simulation considers a 1000 m × 1000 m square region, uniformly divided into square grid cells with a side length of 100 m. The UAV flies from the starting station U0 = (0, 0) to the destination station UF = (10, 10). The specific simulation parameters are shown in Table 1 below.
[0138]
Parameter | Symbol | Value
Drone flight altitude | H | 100 m
Channel gain at a reference distance of 1 m | g0 | 30 dB
Drone flight speed | V | 20 m/s
Energy harvesting efficiency of IoT devices | σ | 0.8
Drone transmission power | P | 1 W
Data packet size | M | 5 MB
Bandwidth | B | 1 MHz
Noise power | σ² | -100 dBm
IoT device update interval | T | 3 s
[0139] Table 1
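A few quantities follow directly from these parameters and the grid description. The short sketch below derives them; the constant names are hypothetical, and the slot duration uses the definition of τ as the distance between adjacent grid centers divided by the flight speed V.

```python
GRID_SIDE_M = 1000.0        # side of the square region
CELL_M = 100.0              # side of one grid cell
SPEED_M_S = 20.0            # drone flight speed V
NOISE_DBM = -100.0          # noise power
UPDATE_INTERVAL_S = 3.0     # IoT device update interval T

cells_per_side = int(GRID_SIDE_M / CELL_M)       # number of cells along each axis
slot_s = CELL_M / SPEED_M_S                      # slot duration tau in seconds
noise_w = 10 ** ((NOISE_DBM - 30.0) / 10.0)      # dBm converted to watts
packets_per_slot = slot_s / UPDATE_INTERVAL_S    # average packets generated per slot

print(cells_per_side, slot_s, noise_w, packets_per_slot)
```

Under these values one time slot lasts 5 s, so each IoT device generates on average more than one packet per slot, which is why the cache and its FIFO rule matter for the information age.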
[0140] Figure 6 illustrates how the reward function changes with the weight λ when the number of IoT devices is fixed. The DQN algorithm proposed in this invention adapts more readily to changes in λ, so as λ increases, its reward decreases less than that of the other two algorithms. Figure 7 illustrates how the reward function changes with the number of IoT devices while the weight λ remains constant. As the number of IoT devices increases, the reward function decreases to varying degrees for all algorithms. This is because, as the number of IoT devices increases, they transmit data to the drone more frequently to reduce their average AoI, which increases the average energy consumption of the IoT devices, and vice versa. The results in Figure 6 and Figure 7 show that the DQN algorithm proposed in this invention finds better UAV trajectories and scheduling strategies than the RW and GA algorithms, thereby achieving higher information transmission performance and lower energy consumption.
[0141] Figure 8 illustrates how the reward function changes with the number of IoT devices under different conditions. Dividing the IoT devices into several clusters generally provides a greater performance improvement than using the new energy model. However, when the number of IoT devices in the system is small, clustering can actually decrease the overall system gain. This is because, while clustering allows the drone to collect data from more IoT devices simultaneously, it also significantly increases the energy consumption of those devices. Note that when directional antennas are used, the performance depends strongly on the drone's altitude and beamwidth; if these parameters are not properly adjusted, the results may be worse than with the original omnidirectional antenna.
[0142] This invention constructs an IoT system model in which a UAV assists IoT devices in data collection and energy transmission, and jointly optimizes the UAV's flight trajectory and scheduling control to minimize the weighted sum of the average information age in the communication system and the average energy consumption of the IoT devices. It uses a K-means clustering algorithm to cluster the IoT devices, enabling the UAV to receive information from more IoT devices simultaneously. Furthermore, it employs directional antennas to concentrate radio frequency energy in a specific direction, increasing the energy that IoT devices harvest from the UAV. Simulation results demonstrate that the proposed solution outperforms the baseline GA and RW algorithms.
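The location-based clustering step can be sketched with plain Lloyd's k-means. The device coordinates below are made up for illustration, the function name `kmeans` and its parameters are hypothetical, and initialising the centroids from randomly chosen devices is an assumption rather than a detail of this description.

```python
import random

def kmeans(points, n_clusters, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)  # assumed initialisation
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each device joins its nearest centroid.
        labels = [min(range(n_clusters),
                      key=lambda c: (p[0] - centroids[c][0]) ** 2
                                    + (p[1] - centroids[c][1]) ** 2)
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical device positions in metres inside the 1000 m x 1000 m region.
devices = [(50, 60), (80, 40), (60, 90), (900, 950), (920, 880), (870, 910)]
centroids, labels = kmeans(devices, n_clusters=2)
```

Each resulting label identifies the cluster whose devices would be scheduled together when the drone selects that cluster for data upload.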
[0143] Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the implementation of the present invention. Those skilled in the art can make other variations or modifications based on the above description. It is neither necessary nor possible to exhaustively describe all embodiments here. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the claims of the present invention.
Claims
1. A method for optimizing the age and power of IoT information based on unmanned aerial vehicles (UAVs), characterized in that it includes the following steps: An Internet of Things (IoT) system model is constructed, which includes a rotary-wing UAV, a base station, and k IoT devices. In the system model, the UAV starts from the starting point and flies to the base station. During the flight, the UAV transmits energy to the IoT devices and collects data from the IoT devices. When it reaches the base station, it unloads the data to the base station. The channel model, energy model, and information age model are determined; The information age model is as follows: IoT devices collect data at fixed time intervals, forming data packets. When no drone requests data, these packets are stored in the IoT device's cache. When the cache is full, following a first-in, first-out (FIFO) principle, the packet at the head of the queue is removed, the newest packet is placed at the tail, and the other packets are shifted forward one position. All IoT devices update data at the same time interval. The cache holds N data packets, and the data packet size is M. The lifespan of the latest data packet in an IoT device is tracked as follows: (6) In equation (6), T is the update interval of the IoT device; The information age of the k-th IoT device is: (7) An optimization problem is constructed that minimizes the weighted sum of the average information age and average energy consumption of all IoT devices; The optimization problem is described by equation (8) with constraints (8a) and (8b). The quantities in equation (8) are, in order: the power transfer and data collection scheduling of the drone; the scheduling of the drone's flight path; the weight representing the importance of the k-th IoT device; the drone's remaining battery power; and the remaining power of the k-th IoT device; Equation (8a) ensures that the drone has enough power to complete the entire flight, and Equation (8b) ensures that each IoT device has enough power to collect data and transmit data to the drone; λ is a variable used to control the trade-off between the average information age and the average energy consumption of the IoT devices; A deep reinforcement learning algorithm is used to solve the optimization problem.
2. The method for optimizing the age and power of IoT information based on unmanned aerial vehicles (UAVs) according to claim 1, characterized in that, in the system model, the IoT devices are randomly distributed in the grid world and each IoT device is given coordinates. The IoT devices are served by a rotary-wing drone, which departs from the starting point, collects data from the IoT devices and transfers energy to them during flight, and finally lands at its destination, where the data collected during flight is unloaded to the base station. H is the constant altitude of the UAV, and the UAV's position is represented by its coordinates. The grid world is divided into several square cells. In each time slot the drone moves in one of four directions (east, west, north, or south) or maintains its position by hovering. The drone's flight process is discretized into [τ, 2τ, ...], where τ is the time required for the drone to move from one grid center to an adjacent grid center; that is, the duration τ of a time slot is the ratio of the distance between two adjacent grid centers to the drone's flight speed V.
3. The method for optimizing the age and power of IoT information based on unmanned aerial vehicles (UAVs) according to claim 2, characterized in that the channel model is as follows: the IoT devices are indexed by the set D = {1, 2, ..., k}, and the scheduling variable indicates which IoT device sends information to the drone in the t-th time slot. Assuming only line-of-sight (LOS) communication exists between the IoT device and the drone, the channel gain between the drone and the k-th IoT device in the t-th time slot is: (1) In equation (1), g0 is the channel gain at a reference distance of 1 m, H is the altitude of the UAV, and the remaining term is the distance between the drone and the k-th IoT device.
4. The method for optimizing the age and power of IoT information based on unmanned aerial vehicles (UAVs) according to claim 3, characterized in that the energy model is as follows: The power consumption of the drone while it is moving or hovering is expressed as follows: (2) The quantities in equation (2) are, in order: the blade profile power and the induced power when the drone is hovering; the speed of the drone; the tip speed of the rotor blade; the mean rotor induced velocity during hovering; the drag coefficient of the fuselage; the air density; the rotor solidity; and the rotor disc area; The drone is equipped with a directional antenna with adjustable beamwidth, whose azimuth and elevation half-power beamwidths are both equal to Θ radians. The antenna gain in the direction given by the azimuth and elevation angles is modeled as follows: (3) In equation (3), the constant is a fixed value of 2.2846, and the half-beamwidth is restricted to its feasible range; The energy harvested by the k-th IoT device in the t-th time slot is: (4) In equation (4), σ is the energy harvesting efficiency of the IoT devices and P is the transmission power of the drone; when an IoT device is scheduled, it must upload data to the drone in the corresponding time slot. All IoT devices collect data packets of the same size M. The transmission power of the k-th IoT device is calculated as follows: (5) In equation (5), B is the signal bandwidth, σ² is the noise power, M is the data packet size, and the remaining term is the horizontal distance between the drone and the k-th IoT device.
5. The method for optimizing the age and power of IoT information based on unmanned aerial vehicles (UAVs) according to claim 4, characterized in that, In the system model, the k-means algorithm is used to cluster IoT devices based on their locations to improve the efficiency of drone data collection.
6. The method for optimizing the age and power of IoT information based on unmanned aerial vehicles (UAVs) according to claim 5, characterized in that the method of clustering IoT devices based on their location using the k-means algorithm is as follows: all IoT devices are assigned to L clusters, each containing a given number of IoT devices, and the L clusters are indexed by the set {1, 2, …, L}. The drone uses OFDMA to collect information from all IoT devices in a cluster simultaneously. After clustering with the k-means algorithm, the drone scheduling strategy is adjusted to d(t) ∈ D = {1, 2, …, L}, where d(t) = l means that all IoT devices in the l-th cluster must send information to the drone.
7. The method for optimizing the age and power of IoT information based on unmanned aerial vehicles (UAVs) according to claim 6, characterized in that the method of using a deep reinforcement learning algorithm to solve the optimization problem includes: The optimization problem is formulated as a Markov decision process described by a tuple consisting of the state space, the action space, the reward function, and the state transition function. In each training episode, the UAV observes the current state s(t) and then selects an action a(t) to execute. Once an action is selected, the UAV receives a corresponding reward r(t) and observes state s(t+1) in the next time slot. The optimal trajectory and scheduling policy are obtained when training converges, where: State space: The system's state consists of the state of the drone and the state of the IoT devices. The drone state c(t) contains the drone's location information and remaining energy, and the IoT device state contains the remaining battery power and information age of the IoT devices; Action space: The system's actions include two aspects, the UAV's flight trajectory scheduling and the control scheduling of data uplink and energy downlink, denoted as q(t) and s(t) respectively, where: (9) When s(t)=0, the drone transmits energy to IoT devices within range; when s(t)=k, all IoT devices in the k-th cluster upload data to the drone. Transition function: The transition function describes the transitions between states, which are given by equation (9) and equation (6). The energy consumption of the drone is determined by its own motion and by whether it transmits energy to the IoT devices. If s(t)=0, in addition to the energy consumed during flight or hovering, the drone consumes additional energy to charge the IoT devices. The remaining power of an IoT device is determined by the energy it receives from the drone and the energy it spends transmitting data to the drone. When s(t)=0, the IoT devices within range receive energy from the drone; when s(t)=k, all IoT devices in the corresponding cluster consume energy to send data to the drone. Reward function: The reward function is defined in terms of minimizing the weighted sum of the average information age and average energy consumption of all IoT devices. The instantaneous reward for the drone in the t-th time slot is defined as: (10) where r1 represents the penalty when the remaining energy of the drone and IoT device is less than 0, and r2 represents the penalty when the drone flies out of the defined grid world.
8. The method for optimizing the age and power of IoT information based on unmanned aerial vehicles (UAVs) according to claim 7, characterized in that the method for solving the optimization problem using a deep reinforcement learning algorithm is based on a deep Q-network, and the steps include: (1) Initialize state s0, the experience replay buffer, the online network, and the target network; (2) Repeat the training process until the predetermined number of training iterations is reached; (3) For each time slot t, for a given state s_t, execute action a_t; (4) Obtain feedback r_t based on the state transition function and obtain a new state s_{t+1}; (5) Store (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer; (6) Sample (s_i, a_i, r_i, s_{i+1}) from the buffer; (7) Update the online network; after updating the online network O times, update the target network; (8) Terminate training.