An IRS configuration optimization and AoI scheduling optimization method in a UAV-IRS assisted IoT network
By integrating a UAV with an IRS and optimizing the UAV's movement direction and the IRS reflection phase shift matrix, the method addresses the failure of UAV-assisted IoT networks to minimize AoI, achieving low-latency, reliable communication and extended flight endurance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NORTHWEST UNIV
- Filing Date
- 2023-03-16
- Publication Date
- 2026-06-26
AI Technical Summary
In existing UAV-assisted IoT networks, scheduling algorithms cannot dynamically adapt to the traffic generation patterns of different IoT devices, so AoI is not minimized effectively. Furthermore, the power consumed by UAVs limits their flight endurance, making low-latency, reliable communication difficult to achieve.
By integrating a UAV with an IRS, the UAV's movement direction is optimized with a pre-trained Q-table so that the signal-to-noise ratio of every served IoT device reaches the threshold, and the IRS reflection phase shift matrix is optimized with a tabu search algorithm to achieve effective signal relay. A pre-trained DQN model then selects the IoT device to be relayed in each time slot.
It achieves low-latency and reliable communication, reduces the AoI of IoT devices, improves signal transmission rate, and extends the flight endurance of UAVs.

Figure CN116436545B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of wireless communication technology and specifically relates to a method for optimizing IRS configuration and AoI scheduling in a UAV-IRS-assisted IoT network.
Background Technology
[0002] Unmanned Aerial Vehicles (UAVs), serving as aerial base stations or relays that assist wireless communication, offer advantages such as high mobility, easily established line-of-sight links, and low deployment costs, and can effectively solve the transmission problems of temporary communication hotspots. Intelligent Reflecting Surfaces (IRS), on the other hand, are a smart radio technology that has attracted significant attention in recent years. An IRS consists of numerous low-power, low-cost reconfigurable reflecting elements. Each element reflects the incident signal with an adjustable reflection phase; by jointly adjusting the phase shifts of all elements, beamforming can be achieved, focusing the signal at the receiver. IRS devices are lightweight enough to be mounted on UAVs, forming a UAV-IRS that provides flexible airborne signal enhancement for ground-based Internet of Things (IoT) devices. By optimizing the UAV's flight path and communication resource allocation, UAV-IRS-assisted communication has been widely applied in various scenarios to improve wireless communication quality.
[0003] The construction of smart cities requires a large number of IoT sensing devices to provide efficient, timely, and reliable real-time monitoring data. For example, intelligent transportation systems, industrial remote control systems, and intelligent environmental monitoring all require real-time monitoring and low-latency data transmission. Outdated data updates can lead to flawed decision-making. However, many IoT devices cannot communicate over long distances, or their data transmission back to the base station is hindered by obstacles such as buildings. Therefore, providing low-latency, reliable communication services for IoT devices is a challenging task.
[0004] Many existing studies on AoI-minimizing scheduling in IoT networks mainly use UAVs as airborne mobile relays. The substantial power that UAVs consume when processing relayed communication signals shortens their flight endurance. Therefore, there is still room to improve the performance gain of wireless communication.
[0005] In reality, the traffic generation patterns of IoT monitoring devices are closely related to the services they support. For example, weather monitoring devices and smart meters need to upload their data periodically, following a periodic traffic generation model in which devices upload updated data at fixed time intervals. In contrast, health monitoring or intelligent traffic control devices upload their data randomly and may generate a new data packet in any time slot. The scheduling algorithm does not know in advance which traffic generation pattern a device will follow. Therefore, if a scheduling algorithm cannot dynamically adapt to the different traffic generation patterns of different IoT devices, it will fail to minimize AoI effectively.
[0006] In other words, many existing methods still leave room for improving the performance gain of wireless communication; in addition, some existing scheduling algorithms cannot dynamically learn the traffic generation patterns of the devices and therefore fail to minimize AoI.
Summary of the Invention
[0007] To address the aforementioned problems in related technologies, this invention provides a method for optimizing IRS configuration and AoI scheduling in a UAV-IRS-assisted IoT network. The technical problem to be solved by this invention is achieved through the following technical solution:
[0008] This invention provides a method for optimizing IRS configuration and AoI scheduling in a UAV-IRS-assisted IoT network, applicable to drones equipped with IRS, wherein the drone corresponds to multiple IoT devices in the IoT network, and the method includes:
[0009] Obtain its own current location information;
[0010] The optimal movement direction is determined based on the current location information and the pre-trained Q-table. The Q-table stores multiple locations, the preset movement directions corresponding to each location, and the Q value of each location under each of its preset movement directions.
[0011] Move along the optimal direction to the next location;
[0012] At the next location, the reflection phase shift matrix of the IRS is determined by a combinatorial optimization method based on the number of surface array elements of the IRS, the preset phase resolution, and the signal amplitude of the IRS.
[0013] At the next location, the signal-to-noise ratio of each corresponding IoT device is determined based on the reflection phase shift matrix;
[0014] When the signal-to-noise ratio of each corresponding IoT device is greater than or equal to a preset threshold, the AoI of each corresponding IoT device in the current time slot is obtained, the AoI is input into the pre-trained DQN model, and the IoT devices to be relayed in the current time slot are output.
[0015] The information sent by the IoT device to be relayed is sent to the base station.
[0016] In some embodiments, the method further includes:
[0017] When the signal-to-noise ratio of at least one corresponding IoT device is less than the preset threshold, the optimal movement direction is re-determined based on the next location information;
[0018] Move along the newly determined optimal direction to the next location;
[0019] The reflection phase shift matrix of the IRS is re-determined at the next location;
[0020] At the next location, the signal-to-noise ratio of each corresponding IoT device is re-determined based on the re-determined reflection phase shift matrix of the IRS;
[0021] When the signal-to-noise ratio of each corresponding IoT device that is re-determined is greater than or equal to a preset threshold, the IoT device to be relayed is re-determined, and the information sent by the re-determined IoT device to be relayed is sent to the base station.
[0022] In some embodiments, the combinatorial optimization method is a tabu search algorithm; the step of determining, at the next location, the reflection phase shift matrix of the IRS through the combinatorial optimization method based on the number of surface array elements of the IRS, the preset phase resolution, and the signal amplitude of the IRS includes:
[0023] At the next location information, a reflection phase shift matrix is initialized based on the number of surface array elements of the IRS, the preset phase resolution, and the signal amplitude of the IRS;
[0024] The initial reflection phase shift matrix is used as the initial solution, the signal-to-noise ratio is used as the objective function, the tabu search algorithm is used to search for the optimal reflection phase shift matrix, and the optimal reflection phase shift matrix is used as the determined reflection phase shift matrix.
[0025] In some embodiments, before determining the optimal movement direction based on the current location information and the pre-trained Q-table, the method further includes:
[0026] Initialize the Q-table and the drone's position to obtain the initial Q-table and initial position information;
[0027] The i-th round of training is performed based on the initial Q-table and the initial position information to iteratively update the initial Q-table in the i-th round. The i-th round of training ends when the number of training iterations in the i-th round reaches E1 or the movement direction obtained in the z-th training iteration of the i-th round meets the preset conditions, and the updated Q-table of the i-th round is obtained. Each round of training includes at most E1 training iterations; z and E1 are both integers greater than or equal to 1.
[0028] Reinitialize the drone's position to obtain updated initial position information;
[0029] The (i+1)-th round of training is performed based on the Q-table updated in the i-th round and the updated initial position information, so as to further update that Q-table iteratively. This continues until the number of training rounds reaches E2, at which point the Q-table updated in the E2-th round is used as the pre-trained Q-table; E2 is an integer greater than 1.
[0030] In some embodiments, the step of training the i-th round based on the initial Q-table and the initial position information to iteratively update the initial Q-table in the i-th round, and ending the i-th round of training when the number of training iterations in the i-th round reaches E1 or the movement direction obtained in the z-th training iteration in the i-th round meets a preset condition, and obtaining the updated Q-table in the i-th round, includes:
[0031] During the c-th training iteration in the i-th training round, the c-th Q-table and the c-th position information of the UAV are obtained; c is an integer greater than or equal to 1; when c is 1, the c-th Q-table is the initial Q-table, and the c-th position information is the initial position information.
[0032] At the c-th location information, the sum of the transmission rates of all corresponding IoT devices is determined to obtain the first transmission rate value;
[0033] Select, from the multiple preset movement directions corresponding to the c-th position information in the c-th Q-table, the movement direction with the largest Q value;
[0034] When the selected movement direction is hovering, the i-th round of training ends, and the c-th Q-table is used as the updated Q-table of the i-th round;
[0035] If the selected direction of movement is not hovering, move along the selected direction to obtain the (c+1)th position information;
[0036] At the (c+1)th location information, determine the sum of the transmission rates of all corresponding IoT devices to obtain the second transmission rate value;
[0037] Based on the first transmission rate value, the second transmission rate value, the transmission rate of each corresponding IoT device at the (c+1)th location information, and the preset threshold, the cth reward value is determined;
[0038] Update the c-th Q-table based on the c-th reward value to obtain the (c+1)-th Q-table.
[0039] When c is less than E1, the (c+1)th training is performed based on the Q-table of the (c+1)th training and the (c+1)th position information. This process continues until the number of training iterations reaches E1 or the movement direction obtained from the z-th training iteration is hovering. At this point, the i-th round of training ends, and the updated Q-table of the i-th round is obtained.
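The training round of [0030] to [0039] can be sketched as tabular Q-learning. This is a minimal sketch under stated assumptions: the source does not give the Q-value update rule, so the standard Q-learning temporal-difference update with a hypothetical learning rate `lr` and discount factor `gamma` is assumed; `move` and `reward_fn` are hypothetical callbacks standing in for the UAV motion and for the reward of [0040] to [0044].

```python
from typing import Callable, Dict, Tuple

Position = Tuple[int, int, int]
QTable = Dict[Position, Dict[str, float]]

DIRECTIONS = ("up", "down", "left", "right", "forward", "backward", "hover")


def train_round(
    q_table: QTable,
    start: Position,
    move: Callable[[Position, str], Position],
    reward_fn: Callable[[Position, Position], float],
    e1: int,
    lr: float = 0.1,     # assumed learning rate
    gamma: float = 0.9,  # assumed discount factor
) -> QTable:
    """One training round: at most e1 iterations, ending early on 'hover'."""
    position = start
    for _ in range(e1):
        q_values = q_table.setdefault(position, {d: 0.0 for d in DIRECTIONS})
        # Greedy choice of the largest Q value, as described in the text.
        action = max(q_values, key=lambda d: q_values[d])
        if action == "hover":
            break  # hovering ends the round
        next_position = move(position, action)
        reward = reward_fn(position, next_position)
        next_q = q_table.setdefault(next_position, {d: 0.0 for d in DIRECTIONS})
        # Standard Q-learning update (an assumption; not stated in the source).
        target = reward + gamma * max(next_q.values())
        q_values[action] += lr * (target - q_values[action])
        position = next_position
    return q_table
```

The sketch follows the text in always taking the largest Q value. With an all-zero table, ties go to the first listed direction, so practical implementations commonly add epsilon-greedy exploration, which the source does not mention.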
[0040] In some embodiments, determining the c-th reward value based on the first transmission rate value, the second transmission rate value, the transmission rate of each corresponding IoT device at the (c+1)-th location information, and the preset threshold includes:
[0041] The preset threshold is compared with the transmission rate of each corresponding IoT device at the (c+1)th location information to obtain the number of transmission rates that are greater than or equal to the preset threshold;
[0042] A penalty value is obtained based on the relationship between the obtained quantity and the total number of IoT devices corresponding to the drone; wherein, when the obtained quantity is less than the total quantity, the penalty value is a preset value greater than 0; when the obtained quantity is equal to the total quantity, the penalty value is 0.
[0043] The difference between the second transmission rate value and the first transmission rate value is obtained;
[0044] The c-th reward value is obtained by subtracting the penalty value from the difference.
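Steps [0041] to [0044] reduce to a few lines. In this sketch the reward is read as the rate improvement minus the penalty, and `penalty` is a hypothetical preset value; as [0041] states, the threshold is compared against each device's transmission rate at the (c+1)-th position.

```python
from typing import Sequence


def round_reward(
    first_rate: float,
    rates_next: Sequence[float],
    threshold: float,
    penalty: float = 10.0,  # hypothetical preset penalty value (> 0)
) -> float:
    """c-th reward: rate improvement minus a penalty if any device misses the threshold."""
    second_rate = sum(rates_next)  # second transmission rate value
    n_ok = sum(1 for r in rates_next if r >= threshold)
    p = 0.0 if n_ok == len(rates_next) else penalty
    return (second_rate - first_rate) - p
```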
[0045] In some embodiments, determining the sum of the transmission rates of all corresponding IoT devices at the c-th location information to obtain a first transmission rate value includes:
[0046] At the c-th location information, the channel gain between the UAV and the base station is determined. Based on the determined channel gain, the preset reflection phase shift matrix, the preset noise variance, and the channel gain between each corresponding IoT device and the base station, the signal-to-noise ratio of each corresponding IoT device is determined.
[0047] Based on the signal-to-noise ratio of each corresponding IoT device at the c-th location information, determine the transmission rate of each corresponding IoT device at the c-th location information;
[0048] The first transmission rate value is obtained by calculating the sum of the transmission rates of each corresponding IoT device at the c-th location information.
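The first transmission rate value of [0045] to [0048] can be sketched as below. The source does not state how the rate is derived from the SNR, so this example assumes the Shannon formula $\log_2(1+\mathrm{SNR})$ per device, scaled by a hypothetical per-device bandwidth (each IoTD has its own FDMA band, per [0075]); SNRs are assumed to be in linear scale.

```python
import math
from typing import List, Sequence


def device_rates(snrs: Sequence[float], bandwidth_hz: float = 1.0) -> List[float]:
    """Per-device rate from linear SNR via the Shannon formula (an assumption)."""
    return [bandwidth_hz * math.log2(1.0 + s) for s in snrs]


def first_rate_value(snrs: Sequence[float], bandwidth_hz: float = 1.0) -> float:
    """Sum of the rates of all IoT devices served at the current position."""
    return sum(device_rates(snrs, bandwidth_hz))
```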
[0049] In some embodiments, before obtaining the AoI of each corresponding IoT device in the current time slot, inputting the AoI into a pre-trained DQN model, and outputting the IoT devices to be relayed in the current time slot, the method includes:
[0050] Initialize the DQN model to obtain the initial DQN model;
[0051] During the j-th iteration of training, N experience samples are acquired and placed into the experience replay pool. The d-th experience sample includes: the AoI of each corresponding IoT device in the d-th time slot, the IoT device to be relayed in the d-th time slot, the reward value in the d-th time slot, and the AoI of each corresponding IoT device in the (d+1)-th time slot after relaying the IoT device to be relayed in the d-th time slot; j is an integer greater than or equal to 1; d is an integer from 1 to N.
[0052] Experience samples are randomly obtained from the experience replay pool, and the initial DQN model is trained in the j-th round to obtain the DQN model trained in the j-th round.
[0053] During the (j+1)th iteration of training, N new experience samples are acquired and placed into the experience replay pool. Experience samples are then randomly acquired from the experience replay pool to perform the (j+1)th iteration of training on the DQN model trained in the jth round. This process continues until the preset number of training rounds is reached, at which point training stops, and the pre-trained DQN model is obtained.
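The experience replay pool of [0051] to [0053] can be sketched as a bounded queue with uniform random sampling. The source does not specify the pool capacity, the DQN architecture, or the loss, so only the pool is shown; `capacity` and the `Experience` field names are assumptions that mirror the four parts of the d-th experience sample.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Tuple


class Experience(NamedTuple):
    aoi: Tuple[int, ...]       # AoI of every device in slot d (state)
    device: int                # index of the device relayed in slot d (action)
    reward: float              # reward value of slot d
    next_aoi: Tuple[int, ...]  # AoI of every device in slot d+1 (next state)


class ReplayPool:
    """Fixed-capacity experience replay pool with uniform random sampling."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Experience] = deque(maxlen=capacity)

    def add(self, exp: Experience) -> None:
        self.buffer.append(exp)  # the oldest sample is dropped once full

    def sample(self, batch_size: int) -> List[Experience]:
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self) -> int:
        return len(self.buffer)
```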
[0054] In some embodiments, the reward value for the d-th time slot is calculated using the following formula:
[0055]
[0056] Where K is the total number of IoT devices corresponding to the drone itself, $A_k(d)$ represents the AoI of the k-th device in the d-th time slot, τ is the preset information age threshold, $P\{A_k(d) > \tau\}$ represents the probability that the AoI of the k-th device in the d-th time slot is greater than τ, and w is the preset amplification factor.
[0057] The present invention has the following beneficial technical effects:
[0058] This invention integrates a UAV with an IRS so that the information transmitted by remote IoT devices can be relayed to a base station for processing. This increases the freshness of the IoT devices' information, thereby minimizing their AoI (Age of Information) and enabling low-latency, reliable communication. The invention uses a Q-table obtained through reinforcement learning to bring the UAV to hover at an optimal position and determines the optimal IRS reflection phase shift matrix at that position. By combining UAV position optimization with IRS beamforming, the signal-to-noise ratio (SNR) of the uplink signal transmitted by each device exceeds a predefined threshold, ensuring successful data decoding at the base station receiver and maximizing the transmission rate. Furthermore, this invention uses a pre-trained DQN model to schedule which IoT device's information is relayed in each time slot, increasing the freshness of the devices' information and minimizing their AoI.
[0059] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.
Attached Figure Description
[0060] Figure 1 is an exemplary diagram of the interaction scenario among the UAV-IRS, BS, and IoTDs provided in an embodiment of the present invention;
[0061] Figure 2 is a flowchart of the IRS configuration optimization and AoI scheduling optimization method in a UAV-IRS-assisted IoT network provided in an embodiment of the present invention;
[0062] Figure 3 is a schematic diagram of an exemplary training process of the DQN model provided in an embodiment of the present invention;
[0063] Figure 4 shows exemplary training results obtained by applying the UAV position optimization algorithm of the present invention, provided in an embodiment of the present invention;
[0064] Figure 5 is a schematic diagram comparing the results of the proposed position optimization algorithm with those of a random position optimization algorithm, provided in an embodiment of the present invention;
[0065] Figure 6 is an exemplary schematic diagram comparing the performance of TS (tabu search)-based optimization of the IRS element phase shifts with random optimization of the phase shifts, provided in an embodiment of the present invention;
[0066] Figure 7 is an exemplary algorithm convergence graph provided in an embodiment of the present invention;
[0067] Figure 8 is a diagram comparing the impact of the number of IoTDs on the proposed DQN algorithm and the baseline scheduling algorithm, provided in an embodiment of the present invention;
[0068] Figure 9 compares the proposed DQN scheduling optimization algorithm with two other benchmark scheduling optimization algorithms under two traffic generation modes, provided in an embodiment of the present invention;
[0069] Figure 10 is a schematic diagram comparing the impact of the number of UAV-IRSs on the proposed DQN algorithm and the baseline scheduling algorithm, provided in an embodiment of the present invention.
Detailed Implementation
[0070] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.
[0071] In the description of this invention, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.
[0072] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. In addition, those skilled in the art can combine and integrate the different embodiments or examples described in this specification.
[0073] Although the invention has been described herein in conjunction with various embodiments, those skilled in the art will understand and implement other variations of the disclosed embodiments by reviewing the accompanying drawings, disclosure, and appended claims in carrying out the claimed invention. In the claims, the word "comprise" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit can implement several functions listed in the claims. While different dependent claims may recite certain measures, this does not mean that these measures cannot be combined to produce good results.
[0074] In this invention, an exemplary IoT network deploys a base station (BS), K Internet of Things devices (IoTDs), and M UAVs equipped with IRSs (i.e., UAV-IRSs), each IRS having R surface array elements. As shown in Figure 1, these IoTDs need to upload timestamped information to the BS for processing. However, due to energy limitations, limited wireless range, or environmental obstacles, a direct line-of-sight (LoS) link cannot be established between the IoTDs and the BS. Therefore, this invention deploys the M UAV-IRSs at appropriate locations in the air. Each UAV-IRS serves multiple IoTDs in the IoT network, with different UAV-IRSs serving different IoTDs, establishing a virtual line-of-sight link that relays the IoTD information to the BS. The deployable location range of each UAV-IRS in 3D space is a preset range, for example $x_{\min} < x < x_{\max}$, $y_{\min} < y < y_{\max}$, $z_{\min} < z < z_{\max}$, where $L_m(x, y, z)$ represents the deployment location coordinates of the m-th UAV-IRS.
[0075] For example, the network is based on FDMA multiplexing, where each IoTD operates on a different frequency band, thus eliminating interference between IoTDs.
[0076] Figure 2 is a flowchart of a method for optimizing IRS configuration and AoI scheduling in a UAV-IRS-assisted IoT network, provided in an embodiment of the present invention. The method is applied to a UAV equipped with an IRS. As shown in Figure 2, the method includes the following steps:
[0077] S101. Obtain its own current location information.
[0078] Here, the UAV-IRS location information is three-dimensional spatial information (x, y, z).
[0079] S102. Determine the optimal movement direction based on the current position information and the pre-trained Q-table. The Q-table stores multiple position information, the preset movement direction corresponding to each position information, and the Q value of each position information under each corresponding preset movement direction.
[0080] Here, the optimal movement direction is the one with the largest Q value among the multiple preset movement directions corresponding to the current position information in the Q-table.
[0081] Here, the preset movement directions corresponding to each location information include: up, down, left, right, forward, backward, and hover.
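The direction lookup in S102 can be sketched as follows. This is a minimal illustration, assuming positions are discretized to integer grid points and that each preset direction moves the UAV-IRS by one hypothetical unit `step` (the preset distance of S103); the names `DIRECTIONS`, `best_direction`, and `move` are illustrative and not from the source.

```python
from typing import Dict, Tuple

Position = Tuple[int, int, int]
QTable = Dict[Position, Dict[str, float]]

# Hypothetical unit offsets for the seven preset directions (hover = stay put).
DIRECTIONS: Dict[str, Tuple[int, int, int]] = {
    "up": (0, 0, 1),
    "down": (0, 0, -1),
    "left": (-1, 0, 0),
    "right": (1, 0, 0),
    "forward": (0, 1, 0),
    "backward": (0, -1, 0),
    "hover": (0, 0, 0),
}


def best_direction(q_table: QTable, position: Position) -> str:
    """Return the preset direction with the largest Q value at `position`."""
    q_values = q_table[position]
    return max(q_values, key=lambda d: q_values[d])


def move(position: Position, direction: str, step: int = 1) -> Position:
    """Advance by the preset distance `step` (an assumed value) along `direction`."""
    dx, dy, dz = DIRECTIONS[direction]
    x, y, z = position
    return (x + dx * step, y + dy * step, z + dz * step)
```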
[0082] S103. Move along the optimal direction to the next location.
[0083] Here, the UAV-IRS moves a preset distance each time, and this preset distance can be set arbitrarily.
[0084] S104. At the next location, the reflection phase shift matrix of the IRS is determined by a combinatorial optimization method based on the number of surface array elements of the IRS, the preset phase resolution, and the signal amplitude of the IRS.
[0085] Here, at the next location information, a reflection phase shift matrix can be initialized based on the number of surface array elements of the IRS, the preset phase resolution, and the signal amplitude of the IRS. The initialized reflection phase shift matrix is used as the initial solution, the signal-to-noise ratio is used as the objective function, and the tabu search algorithm is used to search for the optimal reflection phase shift matrix. The optimal reflection phase shift matrix is then used as the determined reflection phase shift matrix.
[0086] Here, the tabu search algorithm starts by randomly generating an initial solution *i* in the search space, setting the tabu list *H* to empty, and recording the current solution *i* as the historical best solution *s*. It then enters an iterative search. In each iteration, starting from the current solution *i* and subject to the current tabu list *H*, it constructs a neighborhood *A* of *i*, selects the solution *j* with the best fitness value in *A* to replace *i*, and updates *H*. If the new current solution *i* is better than *s*, *s* is replaced by *i*; otherwise, *s* remains unchanged. Because the best non-tabu neighbor is accepted even when it is worse than the current solution, the search can temporarily move to worse solutions and thereby escape local optima. The algorithm then returns to the start of the iteration and continues until it finds the optimal solution or reaches a preset number of iterations.
[0087] For example, assume the number of IRS array elements is R = 4 and the phase shift of each element is quantized with b = 1 bit (the phase resolution). The reflection phase shift matrix of the IRS array elements is ..., r = 1, 2, 3, ..., R, where b represents the phase resolution in bits. With a discrete phase shift of b = 1 bit and signal amplitude β = 1, $\theta_r[n] \in \{0, \pi\}$, $r \in \mathcal{R} = \{1, \dots, R\}$, is the phase shift of the r-th reflecting element in time slot n. Therefore, the element reflection coefficient $\beta e^{j\theta_r[n]}$ takes values in the set {1, -1}. First, a random IRS reflection matrix, here Θ = diag([1, 1, 1, 1]), is generated as the initial solution i, the historical best solution is set to s = i, and the objective function value f(s) is computed from the signal-to-noise ratio formula. The subsequent iteration steps are as follows:
[0088] 1) By flipping the reflection coefficient at each non-tabu position of the reflection matrix, the neighborhood solutions of the initial solution i are obtained: A = {[-1,1,1,1], [1,-1,1,1], [1,1,-1,1], [1,1,1,-1]}.
[0089] 2) Calculate the objective function value of each new solution in neighborhood A, select the best solution j to replace i (set i = j), and update the tabu list to H = {1, (3)}. For instance, if the solution [-1, 1, 1, 1] with index 1 in A (i.e., the solution at the first position of A) has the best objective function value, index 1 is placed in the tabu list. The 3 in H is the tabu tenure: the reflection coefficient at the first position will not be changed in the next three iterations.
[0090] 3) Compare f(s) and f(i), and update the optimal solution s. If the current optimal value does not change within the preset number of steps, the calculation can be terminated, and the last optimal solution s obtained can be taken as the final solution found.
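The tabu search of [0086] to [0090] can be sketched as below for a generic objective. Assumptions: the discrete phase set is taken to be the uniform grid $\theta \in \{2\pi q / 2^b\}$, which gives {0, π} and coefficients {1, -1} for b = 1 as in [0087]; the neighborhood changes one non-tabu element at a time (for b = 1 this is the flip neighborhood of step 1); `patience` is a hypothetical stand-in for "the optimal value does not change within the preset number of steps". In the method the objective would be the SNR; the usage below substitutes a toy objective.

```python
import cmath
import math
from typing import Callable, Dict, List, Sequence


def phase_levels(bits: int, amplitude: float = 1.0) -> List[complex]:
    """Coefficients beta*e^{j*theta}, theta = 2*pi*q/2^b (assumed uniform grid)."""
    return [amplitude * cmath.exp(2j * math.pi * q / 2 ** bits) for q in range(2 ** bits)]


def tabu_search(
    objective: Callable[[Sequence[complex]], float],
    initial: Sequence[complex],
    levels: Sequence[complex],
    tenure: int = 3,
    max_iters: int = 100,
    patience: int = 10,
) -> List[complex]:
    """Maximize `objective` over per-element reflection coefficients."""
    current = list(initial)
    best, best_val = list(current), objective(current)
    tabu: Dict[int, int] = {}  # element index -> remaining tabu iterations
    stall = 0
    for _ in range(max_iters):
        # Neighborhood: change one non-tabu element to another allowed level.
        neighbors = []
        for r in range(len(current)):
            if tabu.get(r, 0) > 0:
                continue
            for level in levels:
                if level != current[r]:
                    cand = list(current)
                    cand[r] = level
                    neighbors.append((objective(cand), r, cand))
        if not neighbors:
            break
        # Accept the best neighbor even if it is worse than the current one.
        val, r, current = max(neighbors, key=lambda t: t[0])
        tabu = {k: v - 1 for k, v in tabu.items() if v > 1}  # age the tabu list
        tabu[r] = tenure
        if val > best_val:
            best, best_val, stall = list(current), val, 0
        else:
            stall += 1
            if stall >= patience:
                break
    return best
```

With a toy objective $f(v) = |\sum_r h_r v_r|^2$, h = [1, -1, 1, -1], and the initial solution [1, 1, 1, 1], the search reaches the maximum value 16 in its second iteration.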
[0091] S105. At the next location, determine the signal-to-noise ratio of each corresponding IoT device based on the reflection phase shift matrix.
[0092] Here, the formula for calculating the signal-to-noise ratio of the signal transmitted from the k-th IoTD to the BS through the m-th UAV-IRS is:
[0093]
[0094]
[0095]
[0096] Where $p_k$ is the signal transmission power of the k-th IoTD (for example, all IoTDs can use the same transmission power), $\Theta_m$ is the reflection phase shift matrix of the m-th IRS, $\sigma^2$ is the preset noise variance, and the superscript H denotes the conjugate (Hermitian) transpose. For the channel gain between the BS and the m-th UAV-IRS, ρ is the path loss at the reference distance of 1 meter, α is the path loss exponent, $d_1$ is the 3D spatial distance between the BS and the m-th UAV-IRS, φ is the cosine of the signal angle of arrival (AoA), R is the number of IRS elements, λ is the signal wavelength, and d is the antenna spacing. For the channel gain between the m-th UAV-IRS and the k-th IoTD, $d_2$ represents the 3D spatial distance between the BS and the k-th IoTD.
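The SNR formula itself is not reproduced above. The sketch below assumes a common cascaded-IRS line-of-sight model that is consistent with the listed variables, not necessarily the patent's exact expression: $\mathrm{SNR} = p_k\,|\mathbf{h}_{\mathrm{BS}}^H \Theta_m \mathbf{h}_{\mathrm{dev}}|^2/\sigma^2$, where each channel is $\sqrt{\rho d_i^{-\alpha}}$ times the array response $[1, e^{-j2\pi d\varphi/\lambda}, \dots, e^{-j2\pi (R-1) d\varphi/\lambda}]^T$. The function names and the symbols $\mathbf{h}_{\mathrm{BS}}$, $\mathbf{h}_{\mathrm{dev}}$ are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def steering(num_elements: int, cos_aoa: float, spacing: float, wavelength: float) -> List[complex]:
    """Array response [1, e^{-j2*pi*d*phi/lambda}, ...] (assumed form)."""
    return [cmath.exp(-2j * math.pi * r * spacing * cos_aoa / wavelength)
            for r in range(num_elements)]


def los_channel(rho: float, dist: float, alpha: float, cos_aoa: float,
                num_elements: int, spacing: float, wavelength: float) -> List[complex]:
    """LoS channel: sqrt(rho * dist^-alpha) times the array response."""
    gain = math.sqrt(rho * dist ** (-alpha))
    return [gain * a for a in steering(num_elements, cos_aoa, spacing, wavelength)]


def snr(p_k: float, h_bs: Sequence[complex], h_dev: Sequence[complex],
        theta: Sequence[complex], noise_var: float) -> float:
    """p_k * |h_bs^H diag(theta) h_dev|^2 / sigma^2 (assumed cascaded IRS link)."""
    cascade = sum(hb.conjugate() * t * hd for hb, t, hd in zip(h_bs, theta, h_dev))
    return p_k * abs(cascade) ** 2 / noise_var
```

Because the diagonal reflection matrix acts element-wise, the sketch passes only its diagonal `theta` rather than a full R x R matrix.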
[0097] S106. When the signal-to-noise ratio of each corresponding IoT device is greater than or equal to the preset threshold, obtain the AoI of each corresponding IoT device in the current time slot, input the AoI into the pre-trained DQN model, and output the IoT devices to be relayed in the current time slot.
[0098] For example, suppose the total working time T is divided into N time slots of duration t. Let ... be an indicator variable that equals 1 if device k is selected for data upload in the scheduling of the m-th UAV-IRS in time slot n, and 0 otherwise. The following constraints must be met:
[0099]
[0100]
[0101]
[0102]
[0103] The second constraint guarantees that each IoTD is scheduled and served by only one UAV-IRS in each time slot. The third constraint guarantees that each UAV-IRS schedules (relays) only one IoTD in each time slot. That is, although there are multiple UAV-IRS and multiple IoTDs, the service between them is one-to-one during operation. The fourth constraint guarantees that the number of IoTDs scheduled and served in each time slot should be equal to the number of UAV-IRS, M.
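Because the constraint formulas themselves are not reproduced above, the following check encodes one reading of the prose in [0103] for a single time slot: a binary indicator matrix `x[k][m]` (a hypothetical name) in which every device is served by at most one UAV-IRS, every UAV-IRS relays exactly one device, and exactly M devices are scheduled.

```python
from typing import Sequence


def is_valid_schedule(x: Sequence[Sequence[int]]) -> bool:
    """Check one slot's indicator matrix x[k][m] (device k, UAV-IRS m)."""
    num_uavs = len(x[0])
    binary = all(v in (0, 1) for row in x for v in row)
    # Second constraint: each device is served by at most one UAV-IRS.
    one_uav_per_device = all(sum(row) <= 1 for row in x)
    # Third constraint: each UAV-IRS relays exactly one device.
    one_device_per_uav = all(sum(row[m] for row in x) == 1 for m in range(num_uavs))
    # Fourth constraint: M devices are scheduled in total.
    total_is_m = sum(map(sum, x)) == num_uavs
    return binary and one_uav_per_device and one_device_per_uav and total_is_m
```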
[0104] Here, AoI is defined as the time elapsed since the generation of the most recently received data. At time t, the age of an update packet with timestamp u is t - u; the smaller this value, the fresher the information. AoI increases linearly over time slots in which no data is received and drops when an update is received. Assuming that the data a device transmits in a time slot can be completely delivered within that slot, if the UAV-IRS selects device k for scheduling in time slot n, the AoI of device k in time slot n+1 equals 1; otherwise, if another device is selected in time slot n, the AoI of device k in time slot n+1 equals its AoI in time slot n plus 1. With $A_k(n+1)$ denoting the AoI of device k in time slot n+1, the AoI is calculated as:
$$A_k(n+1) = \begin{cases} 1, & \text{if device } k \text{ is scheduled in time slot } n, \\ A_k(n) + 1, & \text{otherwise.} \end{cases}$$
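The AoI recursion translates directly into code. In this sketch `scheduled` (a hypothetical parameter) holds the indices of the devices relayed in slot n, one per UAV-IRS.

```python
from typing import List, Sequence


def update_aoi(aoi: Sequence[int], scheduled: Sequence[int]) -> List[int]:
    """A_k(n+1) = 1 if device k was relayed in slot n, else A_k(n) + 1."""
    chosen = set(scheduled)
    return [1 if k in chosen else a + 1 for k, a in enumerate(aoi)]
```

For example, three devices served round-robin by one UAV-IRS, starting from AoI [1, 1, 1], end after three slots at [3, 2, 1].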
[0105] S107. Send the information sent by the IoT device to be relayed to the base station.
[0106] In some embodiments, after S105 described above, the method further includes:
[0107] S201. When the signal-to-noise ratio of at least one corresponding IoT device is less than the preset threshold, re-determine the optimal movement direction based on the next location information.
[0108] S202. Move along the re-determined optimal movement direction to the next location.
[0109] S203. Re-determine the reflection phase shift matrix of the IRS at the next location.
[0110] S204. At the next location, re-determine the signal-to-noise ratio of each corresponding IoT device based on the re-determined reflection phase shift matrix of the IRS.
[0111] S205. When the re-determined signal-to-noise ratio of every corresponding IoT device is greater than or equal to the preset threshold, re-determine the IoT device to be relayed and send the information sent by the re-determined IoT device to the base station.
[0112] Here, S201 to S205 are implemented on the same principle as S101 to S105.
[0113] In some embodiments, prior to S102, the method further includes:
[0114] S001. Initialize the Q-table and the drone's position to obtain the initial Q-table and initial position information.
[0115] S002. Based on the initial Q-table and initial position information, perform training for the i-th round to iteratively update the initial Q-table for the i-th round. When the number of training iterations in the i-th round reaches E1 or the movement direction obtained in the z-th training in the i-th round meets the preset conditions, the training of the i-th round ends, and the updated Q-table for the i-th round is obtained. Each round of training includes E1 training iterations; z and E1 are both integers greater than or equal to 1.
[0116] Here, the i-th round of training specifically includes the following steps:
[0117] S1. During the c-th training iteration in the i-th training round, obtain the c-th Q-table and the c-th position information of the UAV; c is an integer greater than or equal to 1; when c is 1, the c-th Q-table is the initial Q-table and the c-th position information is the initial position information.
[0118] S2. At the c-th location information, determine the sum of the transmission rates of all corresponding IoT devices to obtain the first transmission rate value.
[0119] Specifically, the signal-to-noise ratio (SNR) of each corresponding IoTD at the c-th location information can be calculated first. Then, based on the SNR of each IoTD, the transmission rate of that IoTD can be calculated. The sum of the transmission rates of K IoTDs is used as the first transmission rate value.
[0120] For example, according to Shannon's law, the formula for calculating the transmission rate of the signal transmitted from the k-th device to the BS through the m-th UAV-IRS is:
[0121] S3. From the c-th Q-table, select the movement direction with the largest Q value among the multiple preset movement directions corresponding to the c-th position information.
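[0122] Specifically, this can be expressed as
$$a=\arg\max_{a'}Q(s,a'),$$
where s is the state corresponding to the c-th position information and a' ranges over the preset movement directions corresponding to s.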
[0123] S4. When the selected movement direction is hovering, end the i-th round of training and use the c-th Q-table as the Q-table of the i-th round.
[0124] S5. When the selected movement direction is not hovering, move along the selected movement direction to obtain the (c+1)th position information.
[0125] S6. At the (c+1)th location information, determine the sum of the transmission rates of all corresponding IoT devices to obtain the second transmission rate value.
[0126] Specifically, the calculation principle for the second transmission rate value is the same as that for the first transmission rate value.
[0127] S7. Determine the c-th reward value based on the first transmission rate value, the second transmission rate value, the transmission rate of each corresponding IoT device at the (c+1)-th location information, and the preset threshold.
[0128] Specifically, the reward value can be calculated with the reward function
$$\text{reward}=\mathrm{Rate}_{S1}-\mathrm{Rate}_{S}-p_0,$$
where $\mathrm{Rate}_{S1}$ is the second transmission rate value, $\mathrm{Rate}_{S}$ is the first transmission rate value, and $p_0$ is the penalty value. Let D be the number of the K transmission rates of the K IoTDs at the (c+1)-th location information that are greater than or equal to the preset threshold; when D is less than K, $p_0$ is a preset value greater than 0, and when D equals K, $p_0$ is 0.
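The first and second transmission rate values and the S7 reward can be sketched as follows. This assumes Shannon's formula in the form B·log2(1 + SNR) with a hypothetical bandwidth parameter `bandwidth` (normalized to 1 by default) and SNRs in linear scale; the exact rate expression of the embodiment may differ, and the function names are illustrative only.

```python
import math
from typing import List, Sequence


def rates(snrs: Sequence[float], bandwidth: float = 1.0) -> List[float]:
    """Per-device Shannon rate B * log2(1 + SNR); SNRs are linear, not dB."""
    return [bandwidth * math.log2(1.0 + s) for s in snrs]


def reward(snrs_before: Sequence[float], snrs_after: Sequence[float],
           rate_threshold: float, penalty: float, bandwidth: float = 1.0) -> float:
    """reward = Rate_S1 - Rate_S - p0, with p0 > 0 unless every device meets the threshold."""
    rate_s = sum(rates(snrs_before, bandwidth))       # first transmission rate value
    after = rates(snrs_after, bandwidth)
    rate_s1 = sum(after)                              # second transmission rate value
    d = sum(1 for r in after if r >= rate_threshold)  # D: devices meeting the threshold
    p0 = 0.0 if d == len(after) else penalty
    return rate_s1 - rate_s - p0
```

For example, moving from SNRs [1, 1] to [3, 3] raises the sum rate from 2 to 4 with every device above a threshold of 1.5, so the reward is 2; moving to [3, 0] leaves the sum rate at 2 but one device misses the threshold, so the penalty is applied.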
[0129] S8. Update the c-th Q-table based on the c-th reward value to obtain the (c+1)-th Q-table.
[0130] Here, the Q-table can be updated with the following Q-value update function:
$$Q(s,a)\leftarrow Q(s,a)+\alpha\left[r+\gamma\max_{a'}Q(s',a')-Q(s,a)\right],$$
where r is the reward value, s is the state (the position information of the UAV-IRS), a is the action (the movement direction of the UAV-IRS), γ is the preset discount factor, s′ is the next state, a′ is the action with the largest Q value among the actions corresponding to the next state, and α is the learning rate.
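The update rule maps directly onto a dictionary-backed Q-table. A minimal sketch, assuming states (positions) and actions (movement directions) are encoded as hashable labels; the labels used below are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, s: Hashable, a: Hashable, r: float, s_next: Hashable,
             actions: Sequence[Hashable], alpha: float, gamma: float) -> float:
    """One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(s_next, a2)] for a2 in actions)  # unseen pairs default to 0.0
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


q: QTable = defaultdict(float)
q[("p1", "up")] = 2.0
q_update(q, "p0", "up", r=1.0, s_next="p1", actions=["up", "hover"], alpha=0.5, gamma=0.9)
# Q(p0, up) = 0 + 0.5 * (1.0 + 0.9 * 2.0 - 0), approximately 1.4
```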
[0131] S9. When c is less than E1, perform the (c+1)-th training iteration based on the (c+1)-th Q-table and the (c+1)-th position information. Continue in this way until the number of training iterations reaches E1 or the movement direction obtained in the z-th training iteration is hovering; the training of the i-th round then ends and the updated Q-table of the i-th round is obtained.
[0132] S003. Reinitialize the drone's position to obtain updated initial position information.
[0133] S004. Based on the Q-table updated in the i-th round and the updated initial position information, perform the (i+1)-th round of training to further update that Q-table. Continue in this way until the number of training rounds reaches E2, and use the Q-table updated in the E2-th round as the pre-trained Q-table; E2 is an integer greater than 1.
[0134] Here, every round of training follows the same principle as the i-th round of training described above.
[0135] The training process of obtaining the pre-trained Q-table is illustrated below with a specific example.
[0136] 1) Initialize the Q table used to store Q values, i.e., Q(s,a); initialize the loop counters E1=0, E2=0;
[0137] 2) Randomly initialize the position (X,Y,Z) of the UAV-IRS;
[0138] 3) Obtain the state s from the position (X,Y,Z) of the UAV-IRS, and then calculate the sum Rate_S of the transmission rates of all IoTDs corresponding to the UAV-IRS at that position;
[0139] 4) According to a = argmax_{a'} Q(s, a'), select the action a with the highest Q value for state s;
[0140] 5) After the UAV-IRS takes action a, it moves to a new position (X', Y', Z') and obtains a new state s'; then calculate the sum Rate_S1 of the transmission rates of all IoTDs corresponding to the UAV-IRS at the new position;
[0141] 6) Calculate the reward function value r using reward = Rate_S1 − Rate_S − p0;
[0142] 7) Update the Q value using Q(s,a) ← Q(s,a) + α[r + γ max_{a′} Q(s′,a′) − Q(s,a)], and set E1 = E1 + 1;
[0143] 8) If action a is hovering or E1 reaches the maximum number of iterations, proceed to step 9); otherwise, set (X,Y,Z) = (X',Y',Z'), return to step 3), and continue looping from step 3) to step 7).
[0144] 9) Set E2 = E2 + 1. If E2 reaches the maximum number of iterations, stop all calculations; otherwise, proceed to step 2) and continue looping from step 2) to step 8).
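Steps 1) to 9) can be sketched as a tabular Q-learning loop. Everything environment-specific here is hypothetical: the flight region is a small integer grid, the movement directions are six unit moves plus hovering, `toy_sum_rate` stands in for the real sum-rate calculation, and p0 is fixed at 0. Two further assumptions: ties in step 4) are broken at random (otherwise an all-zero initial Q-table would always return the same action), and the per-round iteration counter is reset at the start of each round, matching the E1 iterations per round in S002.

```python
import random
from collections import defaultdict

# Hypothetical discretisation: six unit moves along the axes plus hovering.
HOVER = (0, 0, 0)
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1), HOVER]
LOW, HIGH = 0, 5  # toy flight region, an analogue of the spatial constraints


def toy_sum_rate(pos, best=(3, 3, 2)):
    """Hypothetical stand-in for the sum rate: higher the closer pos is to `best`."""
    return -sum((p - b) ** 2 for p, b in zip(pos, best))


def move(pos, a):
    """Apply a movement direction and clamp the result to the flight region."""
    return tuple(min(HIGH, max(LOW, p + d)) for p, d in zip(pos, a))


def greedy(q, s, rng):
    """Step 4): action with the highest Q value, ties broken at random."""
    best = max(q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(s, a)] == best])


def train_q_table(max_rounds=200, max_iters=50, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                                   # step 1)
    for _ in range(max_rounds):                              # E2 loop, step 9)
        s = tuple(rng.randint(LOW, HIGH) for _ in range(3))  # step 2)
        for _ in range(max_iters):                           # E1 loop
            rate_s = toy_sum_rate(s)                         # step 3)
            a = greedy(q, s, rng)                            # step 4)
            s_next = move(s, a)                              # step 5)
            r = toy_sum_rate(s_next) - rate_s - 0.0          # step 6), p0 = 0 here
            best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])  # step 7)
            if a == HOVER:                                   # step 8)
                break
            s = s_next
    return q
```

After training, a deployed UAV would repeatedly take the greedy action from its current position until hovering is selected.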
[0145] In some embodiments, prior to S106 above, the method further includes:
[0146] S301. Initialize the DQN model to obtain the initial DQN model.
[0147] S302. During the j-th round of iterative training, N experience samples are obtained and placed into the experience replay pool. The d-th experience sample includes: the AoI of each corresponding IoT device in the d-th time slot, the IoT device to be relayed in the d-th time slot, the reward value in the d-th time slot, and the AoI of each corresponding IoT device in the (d+1)-th time slot after relaying the IoT device to be relayed in the d-th time slot; j is an integer greater than or equal to 1; d is an integer from 1 to N.
[0148] S303. Randomly obtain experience samples from the experience replay pool and perform the j-th round of iterative training on the initial DQN model to obtain the DQN model trained in the j-th round.
[0149] S304. During the (j+1)th iteration of training, N new experience samples are acquired and put into the experience replay pool. Experience samples are then randomly acquired from the experience replay pool to perform the (j+1)th iteration of training on the DQN model trained in the jth round. This process continues until the preset number of training rounds is reached, at which point training stops, and the pre-trained DQN model is obtained.
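The experience samples and replay pool of S302 to S304 can be sketched as a bounded buffer with uniform random mini-batch sampling. The pool capacity and the oldest-first eviction policy are assumptions of this sketch (the text does not specify them), and the class and field names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Tuple


class Experience(NamedTuple):
    aoi: Tuple[int, ...]        # AoI of each IoTD in slot d, i.e. s(d)
    relayed: int                # index of the IoTD relayed in slot d, i.e. a(d)
    reward: float               # reward r(d)
    next_aoi: Tuple[int, ...]   # AoI of each IoTD in slot d+1, i.e. s(d+1)


class ReplayPool:
    """Bounded experience replay pool; the oldest samples are evicted first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._buf: Deque[Experience] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, exp: Experience) -> None:
        self._buf.append(exp)

    def sample(self, batch_size: int) -> List[Experience]:
        """Draw a random mini-batch without replacement."""
        return self._rng.sample(list(self._buf), min(batch_size, len(self._buf)))

    def __len__(self) -> int:
        return len(self._buf)
```

In each training round, the N new samples would be added with `add`, and the network update would then draw its mini-batches with `sample`.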
[0150] With reference to the training process diagram shown in Figure 3, the training process of the DQN model is illustrated below through a specific example.
[0151] 1) Initialize the two neural networks required by the DQN model, the evaluation Q-network and the target Q-network, with parameters θ and θ⁻, respectively; let θ⁻ = θ; initialize the loop counter E11 = 0;
[0152] 2) Set time slot n=1 to obtain state s(n);
[0153] 3) Based on the ε-greedy strategy, generate a random number between 0 and 1. If the random number is less than ε, randomly select an action a(n); otherwise, select the action a(n) with the highest Q value according to Formula 15.
[0154] 4) Execute action a(n), calculate and update the AoI value of each corresponding IoTD, and obtain the new state s(n+1);
[0155] 5) Calculate the reward function value r(n), where K is the total number of IoT devices corresponding to the UAV-IRS, A_k(n) is the AoI of the k-th device in the n-th time slot, τ is the preset information age threshold, P{A_k(n) > τ} is the probability that the AoI of the k-th device in the n-th time slot exceeds τ, and w is the preset amplification factor;
[0156] Here, each IoTD has a predefined threshold τ (the thresholds may be the same or different for different IoTDs), and a counter Count can be defined. In the n-th time slot, Count is initialized to 0 and each IoTD is then checked in turn: if its AoI exceeds its threshold τ, Count is incremented by 1. For example, when a UAV-IRS corresponds to 10 IoTDs and the AoI of 3 of them exceeds the corresponding threshold τ in the n-th time slot, Count equals 3 and the resulting probability value is 3/10 = 0.3.
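The counter procedure above can be sketched directly. The `exceed_fraction` helper follows the Count/K estimate in the worked example. The `slot_reward` function is only a hypothetical illustration: it assumes the reward is the negative of the sum of AoI plus w times the exceedance fraction, which mirrors the two-term objective, but the exact reward formula of the embodiment is not given here and may differ.

```python
from typing import Sequence


def exceed_fraction(aoi: Sequence[int], thresholds: Sequence[int]) -> float:
    """Count/K: share of the K IoTDs whose AoI exceeds their own threshold tau."""
    count = sum(1 for a, tau in zip(aoi, thresholds) if a > tau)
    return count / len(aoi)


def slot_reward(aoi: Sequence[int], thresholds: Sequence[int], w: float) -> float:
    """Hypothetical reward: -(sum of AoI + w * Count/K); the exact form is assumed."""
    return -(sum(aoi) + w * exceed_fraction(aoi, thresholds))


aoi = [5, 1, 7, 2, 9, 1, 1, 3, 2, 1]   # 10 IoTDs, three of them above tau = 4
print(exceed_fraction(aoi, [4] * 10))  # 0.3
```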
[0157] 6) Save the experience samples {s(n), a(n), r(n), s(n+1)} to the experience replay pool;
[0158] 7) Randomly select a small batch of samples {s(n'),a(n'),r(n'),s(n'+1)} from the experience replay pool to train the network;
[0159] 8) Calculate the DQN target value y(n') = r(n') + γ max_{a'} Q(s(n'+1), a'; θ⁻), where γ is the discount factor;
[0160] 9) Perform gradient descent on the loss function (y(n') − Q(s(n'), a(n'); θ))² with respect to θ to update the parameters θ;
[0161] 10) Update the parameters θ⁻ of the target Q-network every 10 time slots, that is, let θ⁻ = θ;
[0162] 11) Set n = n + 1. If n has not reached the upper limit, proceed to step 3) and continue looping from step 3) to step 10); otherwise, proceed to step 12).
[0163] 12) Set E11 = E11 + 1. If E11 reaches the maximum number of iterations, terminate all calculations; otherwise, return to step 2) and continue looping from step 2) to step 11).
[0164] Here, s(n) is the state in time slot n, s(n) = (A(n)), where A(n) = (A_1(n), ..., A_k(n), ..., A_K(n)) and A_k(n) is the AoI of the k-th IoTD in the n-th time slot; when n = 1, the AoI of each IoTD is initialized to 1, i.e., A_k(1) = 1 for all k ∈ K. a(n) is the action in time slot n, a(n) = (ζ(n)), where ζ(n) = k means that in the n-th time slot the UAV-IRS decides to schedule and serve the k-th device, whose information is updated and uploaded to the BS.
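The action selection of step 3) and the target value of step 8) can be sketched without committing to a particular deep-learning library. Here `target_q` is a hypothetical stand-in for the target Q-network with parameters θ⁻: any callable that maps an AoI state to one Q value per IoTD will do. The discount factor `gamma` is the usual DQN discount and is an assumption of this sketch.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]                       # s(n) = (A_1(n), ..., A_K(n))
Transition = Tuple[State, int, float, State]  # (s(n), a(n), r(n), s(n+1))


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Step 3): with probability epsilon pick a random IoTD, else the argmax of Q(s, .)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda k: q_values[k])


def dqn_targets(batch: Sequence[Transition],
                target_q: Callable[[State], List[float]], gamma: float) -> List[float]:
    """Step 8): y(n') = r(n') + gamma * max_a' Q(s(n'+1), a'; theta^-) per transition."""
    return [r + gamma * max(target_q(s_next)) for (_s, _a, r, s_next) in batch]
```

The returned integer is ζ(n), the index of the IoTD to relay, and the targets feed the squared loss minimized in step 9).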
[0165] This invention integrates UAV and IRS so that the information transmitted by remote IoT devices can be relayed to the base station for processing. This increases the freshness of IoTD information, thereby minimizing the AoI (Age of Information) and supporting low-latency, reliable communication. The invention uses a Q-table obtained through reinforcement learning to make the UAV hover at an optimal position and determines the optimal IRS reflection phase shift matrix at that position. By combining UAV position optimization with IRS beamforming, the signal-to-noise ratio (SNR) of the uplink signal transmitted by each device exceeds a predefined threshold, ensuring successful data decoding at the base station receiver and maximizing the transmission rate of the devices' signals. Furthermore, this invention uses a pre-trained DQN model to schedule the relaying of IoTD information, increasing the freshness of IoTD information and minimizing its AoI.
[0166] The technical concept of the present invention will be further elaborated below:
[0167] This invention minimizes an objective function, which is the sum of two terms. The first term is the sum of the AoIs of all IoTDs corresponding to each UAV-IRS, taking into account ultra-reliable low-latency communication. The second term is designed as the sum of the probabilities that the AoI of each corresponding IoTD (i.e., device k) exceeds a predefined threshold in any time slot n. Therefore, the AoI minimization problem should consider a trade-off between minimizing the AoI of each corresponding IoTD and minimizing the probability that the AoI of each corresponding IoTD exceeds its predefined threshold. Thus, the optimization problem to be solved and the constraints of this invention can be expressed as follows:
[0168]
[0169]
[0170]
[0171]
[0172]
[0173]
[0174]
[0175]
[0176]
[0177] Here, constraint (a) uses the values 1 and 0 to indicate whether each IoTD is scheduled; constraints (b) and (c) state that each IoTD can be scheduled by only one UAV-IRS in each time slot and that each UAV-IRS can schedule and serve only one IoTD; constraint (d) requires the number of IoTDs scheduled and served in each time slot to equal the number of UAV-IRSs, M; constraints (e), (f), and (g) are the spatial constraints on UAV flight; and constraint (h) restricts the values of the phase shifts of the reflection array elements in the IRS beamforming design.
[0178] Specifically, this invention decomposes the above non-convex problem into three sub-problems: optimizing the deployment of the UAVs, optimizing the reflection phase shift matrix of the IRS, and optimizing the AoI-minimizing scheduling. A solution is then formulated for each sub-problem. First, when an IRS is installed on a ground building, its deployment range is limited to a straight line or a two-dimensional plane; when the IRS is mounted on a UAV, its deployment range expands to a three-dimensional space, which significantly increases the number of candidate deployment positions and makes deployment more flexible. Brute-force search can obtain the globally optimal solution, but its computational complexity is too high, while heuristic algorithms are faster but struggle to find the optimal solution. A solution that balances computational complexity and solution optimality is therefore needed, so this invention uses a reinforcement learning (Q-learning) model to optimize the deployment position of the UAV-IRS. Second, the beamforming design generally differs depending on whether the IRS array elements use continuous or discrete phase shifts. For an IRS with continuous phase shifts, a numerical optimization problem is typically considered, and optimizing the IRS phase shift matrix is often a non-convex problem. For an IRS with discrete phase shifts, the number of IRS elements and the element resolution are fixed, so the number of selectable phase shift matrices is finite. Moreover, existing literature shows that even low-resolution discrete phase shifts, such as 3-bit phase shifts, can approach the performance of elements with continuous phase shifts.
Therefore, this invention adopts an IRS with discrete phase shifts and proposes a beamforming scheme based on the tabu search (TS) algorithm over the finite set of phase shift matrices. Finally, compared with traditional data networks, IoT real-time monitoring systems have a distinctive characteristic, the Markov property: an existing state update can be completely replaced by a newly arrived one. This invention therefore formulates the IoT device data state update problem as a Markov decision process. Because the traffic generation patterns of IoT devices are unknown, the optimization problem is difficult to solve directly, so this invention proposes a scheduling algorithm based on the reinforcement learning DQN model to solve the AoI minimization scheduling problem.
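A tabu search over discrete phase shifts can be sketched as follows. The channel model is an assumption of this sketch: the SNR of one IoTD is taken as P·|β·Σ_i c_i·e^{jθ_i}|²/σ², where c_i is a hypothetical cascaded (IoTD to IRS element i to BS) channel coefficient and the direct link is omitted, as when buildings block it. The neighborhood (change one element's phase level), the tabu tenure, and the aspiration rule are common tabu search choices that the text does not specify.

```python
import cmath
import math
from collections import deque
from typing import List, Sequence, Tuple


def snr(levels: Sequence[int], cascade: Sequence[complex], bits: int,
        amplitude: float = 1.0, power: float = 1.0, noise: float = 1.0) -> float:
    """Assumed model: SNR = P * |beta * sum_i c_i * exp(j * theta_i)|^2 / sigma^2,
    with theta_i = 2 * pi * levels[i] / 2**bits."""
    step = 2 * math.pi / 2 ** bits
    field = sum(c * cmath.exp(1j * step * l) for c, l in zip(cascade, levels))
    return power * abs(amplitude * field) ** 2 / noise


def tabu_search(cascade: Sequence[complex], bits: int = 3, iters: int = 100,
                tenure: int = 3, **model) -> Tuple[List[int], float]:
    """Tabu search over discrete phase levels; a move changes one element's level."""
    n, q = len(cascade), 2 ** bits
    current = [0] * n                        # initial solution: all phases zero
    best, best_val = current[:], snr(current, cascade, bits, **model)
    tabu = deque(maxlen=tenure)              # recently changed element indices
    for _ in range(iters):
        candidates = []
        for i in range(n):
            for level in range(q):
                if level == current[i]:
                    continue
                neighbour = current[:]
                neighbour[i] = level
                val = snr(neighbour, cascade, bits, **model)
                # Tabu moves are allowed only if they beat the best so far (aspiration).
                if i not in tabu or val > best_val:
                    candidates.append((val, i, neighbour))
        if not candidates:
            break
        val, i, current = max(candidates, key=lambda t: t[0])  # best admissible move
        tabu.append(i)
        if val > best_val:
            best, best_val = current[:], val
    return best, best_val
```

The search always moves to the best admissible neighbor, even a worse one, which lets it escape local optima; the best solution seen is kept separately. With `bits=3`, each element chooses among 8 phase levels, as in the 3-bit example mentioned above.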
[0179] The following experimental data further illustrates the technical effects achievable by the method proposed in this invention.
[0180] (1) Evaluation of UAV position optimization algorithm
[0181] Here, 8 IoTDs are placed randomly within a 1000 m × 1000 m area, with the BS (base station) located at (0, 0, 50). The distance between the BS and the IoTDs is 500 m, and building obstacles block the line-of-sight link between the IoTDs and the BS. The reinforcement learning-based position optimization algorithm was simulated using the TensorFlow framework. Figure 4 shows the training results obtained with the algorithm of this invention over a total of 100 episodes (rounds). As the number of training rounds increases, the transmission rate increases significantly and eventually converges to a stable value. This shows that the reinforcement learning-based position optimization algorithm of this invention can achieve an optimized deployment of the UAV hovering position.
[0182] Figure 5 compares the proposed position optimization algorithm (Opt UAV-IRS) with random position deployment (Random UAV-IRS). In random deployment, 1000 hovering positions are generated randomly for each transmission power, the sum of the rates of all devices is calculated at each position, and the results are averaged to obtain the transmission rate. As the transmission power increases, the sum of the device rates increases for both deployment methods. The sum rate at the optimized deployment position is significantly higher than at random positions, and the performance gap between the two widens. This shows that finding the optimal UAV deployment position is worthwhile: it ensures that the SNR of the signal transmitted by each IoT device exceeds the predefined threshold γ_th while also increasing the devices' uplink transmission rate.
[0183] (2) Evaluation of IRS reflection phase shift matrix design method
[0184] Figure 6 illustrates the performance impact of optimizing the IRS element phase shifts with the TS-based algorithm proposed in this invention (Opt-phase shift) compared with random phase shifts (Random-phase shift), where the horizontal axis is the number of IRS elements. With the same number of elements, optimizing the reflection phase shifts brings a significant performance improvement.
[0185] (3) Evaluation of AoI Minimization Scheduling Optimization Algorithm
[0186] We implemented the proposed DQN-based AoI minimization scheduling optimization algorithm using the TensorFlow framework. In the simulation settings, the number of time slots is N = 100, the UAV hovers at the optimal position, and the IRS reflection phase shift matrix is set to the default optimal configuration.
[0187] First, Figure 7 verifies the convergence of the proposed DQN scheduling algorithm in a network with one UAV-IRS and eight IoTDs. With reasonably set neural network hyperparameters, the proposed algorithm converges to a stable point, and after training the model significantly reduces the overall AoI.
[0188] Furthermore, two benchmark scheduling algorithms are introduced for comparison with the DQN scheduling optimization algorithm of this invention to evaluate its effectiveness. The two benchmark algorithms are:
[0189] 1) Greedy scheduling optimization algorithm: This is a heuristic algorithm in which UAV-IRS always selects the IoTD with the largest current AoI in each time slot.
[0190] 2) Random scheduling optimization algorithm: The UAV-IRS randomly selects and schedules an IoTD in each time slot to deliver its state update information, exploring all possible actions, and therefore may perform some actions that reduce AoI.
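The two benchmark schedulers can be simulated in a few lines. A minimal sketch, assuming a single UAV-IRS, the random traffic generation mode (a fresh packet at every IoTD in every slot), and only the sum-of-AoI term of the objective; the helper name `average_sum_aoi` is hypothetical.

```python
import random


def average_sum_aoi(policy: str, num_devices: int, num_slots: int, seed: int = 0) -> float:
    """Average per-slot sum of AoI for one UAV-IRS under a baseline scheduler.

    Assumes every IoTD has a fresh packet in every slot (random traffic mode).
    """
    rng = random.Random(seed)
    aoi = [1] * num_devices                   # A_k(1) = 1
    total = 0
    for _ in range(num_slots):
        total += sum(aoi)
        if policy == "greedy":                # largest current AoI (lowest index on ties)
            k = max(range(num_devices), key=lambda i: aoi[i])
        elif policy == "random":              # uniformly random IoTD
            k = rng.randrange(num_devices)
        else:
            raise ValueError(f"unknown policy: {policy}")
        aoi = [1 if i == k else a + 1 for i, a in enumerate(aoi)]
    return total / num_slots
```

With three IoTDs, the greedy rule cycles through the devices in round-robin order and settles at a per-slot sum of 1 + 2 + 3 = 6, while random selection is higher on average.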
[0191] Figure 8 shows the impact of the number of IoTDs on the proposed DQN algorithm (dqn) and the baseline scheduling algorithms. As shown in Figure 8, as the number of IoTDs in the network increases, scheduling becomes more complex and the value of the objective function increases. The random scheduling algorithm performs worst, followed by the greedy scheduling algorithm. The DQN scheduling optimization algorithm proposed in this invention outperforms both benchmark algorithms, and its performance advantage widens as the number of devices increases. For every number of IoTDs tested, the proposed algorithm achieves the lowest objective function value.
[0192] Figure 9 compares the proposed DQN scheduling optimization algorithm with the two benchmark scheduling optimization algorithms under two traffic generation modes: a random traffic generation mode (each IoTD generates new data packets in each time slot) and a periodic traffic generation mode (each IoTD generates data packets at fixed time intervals). Under both modes, the objective function value increases with the number of IoTDs in the network. Moreover, the objective function value under the random traffic generation mode is lower than under the periodic mode, because in the random mode new data packets are generated in every time slot, keeping the data sent by the IoTDs fresh. Under both modes, the proposed DQN-based scheduling optimization algorithm performs better than the two benchmark scheduling algorithms. This indicates that, after training, the proposed model can learn the traffic generation patterns of the IoTDs even without prior knowledge of them, which the benchmark scheduling algorithms cannot do.
[0193] Figure 10 examines the impact of multiple UAV-IRSs on the sum of AoI, assuming 8 IoTDs evenly distributed among the UAV-IRSs. As shown in Figure 10, as the number of available UAV-IRSs increases, each IoTD has more opportunity to be scheduled in time, so the objective function value decreases. In every case, the DQN scheduling optimization algorithm proposed in this invention still outperforms the benchmark scheduling algorithms. The above description is a further detailed explanation of this invention in conjunction with specific preferred embodiments, and the specific implementation of this invention should not be considered limited to these descriptions. For those skilled in the art, several simple deductions or substitutions can be made without departing from the concept of this invention, and all such deductions or substitutions should be considered to fall within the protection scope of this invention.
Claims
1. A method for optimizing IRS configuration and AoI scheduling in a UAV-IRS-assisted IoT network, characterized in that, The method, applied to a drone equipped with a smart reflective surface (IRS) and corresponding to multiple IoT devices in an IoT network, includes: Obtain its own current location information; The optimal movement direction is determined based on the current location information and the pre-trained Q-table. The Q-table stores multiple location information, a preset movement direction corresponding to each location information, and the Q value of each location information under each corresponding preset movement direction. Move along the optimal direction to the next location; At the next location information, the reflection phase shift matrix of the IRS is determined by a combination optimization method based on the number of surface array elements of the IRS, the preset phase resolution, and the signal amplitude of the IRS; wherein, the combination optimization method is a tabu search algorithm. At the next location information, the signal-to-noise ratio of each corresponding IoT device is determined based on the reflection phase shift matrix; When the signal-to-noise ratio of each corresponding IoT device is greater than or equal to a preset threshold, the information age AoI of each corresponding IoT device in the current time slot is obtained, the AoI is input into the pre-trained DQN model, and the IoT devices to be relayed in the current time slot are output. The information sent by the IoT device to be relayed is sent to the base station.
2. The method for IRS configuration optimization and AoI scheduling optimization in a UAV-IRS-assisted IoT network according to claim 1, characterized in that, The method further includes: When the signal-to-noise ratio of at least one corresponding IoT device is less than the preset threshold, the optimal movement direction is re-determined based on the next location information; Move along the newly determined optimal direction to the next location; The reflection phase shift matrix of the IRS is redefined at the next location information; At the next location information, the signal-to-noise ratio of each corresponding IoT device is re-determined based on the reflection phase shift matrix of the re-determined IRS; When the signal-to-noise ratio of each corresponding IoT device that is re-determined is greater than or equal to a preset threshold, the IoT device to be relayed is re-determined, and the information sent by the re-determined IoT device to be relayed is sent to the base station.
3. The method for IRS configuration optimization and AoI scheduling optimization in a UAV-IRS-assisted IoT network according to claim 1, characterized in that, The step of determining the reflection phase shift matrix of the IRS at the next location information based on the number of surface array elements of the IRS, the preset phase resolution, and the signal amplitude of the IRS through a combined optimization method includes: At the next location information, a reflection phase shift matrix is initialized based on the number of surface array elements of the IRS, the preset phase resolution, and the signal amplitude of the IRS; The initial reflection phase shift matrix is used as the initial solution, the signal-to-noise ratio is used as the objective function, the tabu search algorithm is used to search for the optimal reflection phase shift matrix, and the optimal reflection phase shift matrix is used as the determined reflection phase shift matrix.
4. The method for IRS configuration optimization and AoI scheduling optimization in a UAV-IRS-assisted IoT network according to claim 1, characterized in that, Before determining the optimal movement direction based on the current location information and the pre-trained Q-table, the method further includes: Initialize the Q-table and the drone's position to obtain the initial Q-table and initial position information; The i-th round of training is performed based on the initial Q-table and the initial position information to iteratively update the initial Q-table in the i-th round. The i-th round of training ends when the number of training iterations in the i-th round reaches E1 or the movement direction obtained in the z-th training iteration in the i-th round meets the preset conditions, and the updated Q-table in the i-th round is obtained. Each round of training includes E1 training iterations; z and E1 are both integers greater than or equal to 1. Reinitialize the drone's position to obtain updated initial position information; Training is performed in the (i+1)th round based on the Q-table updated in the i-th round and the initial position information of the update, so as to perform the (i+1)th round iterative update of the Q-table updated in the i-th round. This continues until the training round reaches E2, at which point the Q-table updated in the E2th round is used as the pre-trained Q-table; E2 is an integer greater than 1.
5. The method for IRS configuration optimization and AoI scheduling optimization in a UAV-IRS-assisted IoT network according to claim 4, characterized in that, the step of performing the i-th round of training based on the initial Q-table and the initial position information to iteratively update the initial Q-table in the i-th round, ending the i-th round when the number of training iterations in the i-th round reaches E1 or the movement direction obtained in the z-th training iteration of the i-th round meets the preset condition, and obtaining the updated Q-table of the i-th round, includes: during the c-th training iteration in the i-th round, obtaining the c-th Q-table and the c-th position information of the UAV, where c is an integer greater than or equal to 1, and when c is 1, the c-th Q-table is the initial Q-table and the c-th position information is the initial position information; at the c-th position information, determining the sum of the transmission rates of all corresponding IoT devices to obtain the first transmission rate value; selecting, from the c-th Q-table, the movement direction with the largest Q value among the multiple preset movement directions corresponding to the c-th position information; when the selected movement direction is hovering, ending the i-th round of training and using the c-th Q-table as the Q-table of the i-th round; when the selected movement direction is not hovering, moving along the selected direction to obtain the (c+1)-th position information; at the (c+1)-th position information, determining the sum of the transmission rates of all corresponding IoT devices to obtain the second transmission rate value; determining the c-th reward value based on the first transmission rate value, the second transmission rate value, the transmission rate of each corresponding IoT device at the (c+1)-th position information, and the preset threshold; updating the c-th Q-table based on the c-th reward value to obtain the (c+1)-th Q-table; and, when c is less than E1, performing the (c+1)-th training iteration based on the (c+1)-th Q-table and the (c+1)-th position information, and so on, until the number of training iterations reaches E1 or the movement direction obtained in the z-th training iteration is hovering, at which point the i-th round of training ends and the updated Q-table of the i-th round is obtained.
6. The method for IRS configuration optimization and AoI scheduling optimization in a UAV-IRS-assisted IoT network according to claim 5, characterized in that, the step of determining the c-th reward value based on the first transmission rate value, the second transmission rate value, the transmission rate of each corresponding IoT device at the (c+1)-th location information, and the preset threshold includes: comparing the preset threshold with the transmission rate of each corresponding IoT device at the (c+1)-th location information to obtain the number of transmission rates that are greater than or equal to the preset threshold; obtaining a penalty value based on the relationship between the obtained number and the total number of IoT devices corresponding to the drone, wherein, when the obtained number is less than the total number, the penalty value is a preset value greater than 0, and when the obtained number equals the total number, the penalty value is 0; obtaining the difference between the second transmission rate value and the first transmission rate value; and obtaining the c-th reward value by subtracting the penalty value from the difference.
7. The method for IRS configuration optimization and AoI scheduling optimization in a UAV-IRS-assisted IoT network according to claim 5, characterized in that determining the sum of the transmission rates of all corresponding IoT devices at the c-th position information to obtain the first transmission rate value comprises: at the c-th position information, determining the channel gain between the UAV and the base station; determining the signal-to-noise ratio of each corresponding IoT device based on the determined channel gain, the preset reflection phase shift matrix, the preset noise variance, and the channel gain between each corresponding IoT device and the base station; determining the transmission rate of each corresponding IoT device at the c-th position information from its signal-to-noise ratio; and summing the transmission rates of the corresponding IoT devices at the c-th position information to obtain the first transmission rate value.
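Claim 7 lists the inputs to each device's signal-to-noise ratio but not the expression itself. The sketch below assumes the commonly used IRS cascaded-channel model: the reflected signal is the sum over IRS elements of (IRS-to-BS gain) × e^{jθ_n} × (device-to-IRS gain), optionally plus a direct device-to-BS link, and the rate is the Shannon rate log2(1 + SNR) per unit bandwidth. The device-to-IRS channels, the transmit power, and the names `device_rate` and `sum_rate` are assumptions not stated in the claim.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def device_rate(h_dev_irs: Sequence[complex], g_irs_bs: Sequence[complex],
                phases: Sequence[float], tx_power: float, noise_var: float,
                h_direct: complex = 0j) -> float:
    """Rate of one IoT device under an assumed cascaded IRS channel model."""
    # Reflected path: sum_n g_n * exp(j*theta_n) * h_n, i.e. g^T diag(e^{j theta}) h.
    reflected = sum(g * cmath.exp(1j * th) * h
                    for g, th, h in zip(g_irs_bs, phases, h_dev_irs))
    snr = tx_power * abs(h_direct + reflected) ** 2 / noise_var
    return math.log2(1.0 + snr)  # Shannon rate in bit/s/Hz (unit bandwidth assumed)


def sum_rate(devices: List[Tuple[Sequence[complex], complex]],
             g_irs_bs: Sequence[complex], phases: Sequence[float],
             tx_power: float, noise_var: float) -> float:
    """First transmission rate value: sum of per-device rates at one UAV position."""
    return sum(device_rate(h, g_irs_bs, phases, tx_power, noise_var, hd)
               for h, hd in devices)
```

Under this model the phase shift matrix matters directly: with two unit-gain elements, phases `[0.0, math.pi]` make the reflected components cancel (rate near 0), while equal phases make them add coherently (SNR of 4, rate log2(5)).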
8. The method for IRS configuration optimization and AoI scheduling optimization in a UAV-IRS-assisted IoT network according to claim 1, characterized in that, before obtaining the AoI of each corresponding IoT device in the current time slot, inputting the AoI into the pre-trained DQN model, and outputting the IoT devices to be relayed in the current time slot, the method comprises: initializing the DQN model to obtain an initial DQN model; during the j-th training iteration, acquiring N experience samples and placing them into the experience replay pool, where the d-th experience sample includes the AoI of each corresponding IoT device in the d-th time slot, the IoT device to be relayed in the d-th time slot, the reward value in the d-th time slot, and the AoI of each corresponding IoT device in the (d+1)-th time slot after that device has been relayed, j is an integer greater than or equal to 1, and d is an integer from 1 to N; randomly drawing experience samples from the experience replay pool and performing the j-th round of training on the initial DQN model to obtain the DQN model trained in the j-th round; during the (j+1)-th training iteration, acquiring N new experience samples, placing them into the experience replay pool, and randomly drawing experience samples from the pool to perform the (j+1)-th round of training on the DQN model trained in the j-th round; and continuing in this way until the preset number of training rounds is reached, at which point training stops and the pre-trained DQN model is obtained.
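The experience-replay training of claim 8 is sketched below in pure Python. To stay dependency-free it swaps the DQN's neural network for a linear Q-function approximator and omits a target network, so it illustrates the replay-pool mechanics rather than a full DQN. The AoI dynamics (the relayed device resets to 1, all others age by one slot), the placeholder reward (negative mean AoI, used because the claim 9 formula is not reproduced in this text), the feature cap, the exploration rate, and all names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

AoI = Tuple[int, ...]
# (AoI in slot d, relayed device in slot d, reward in slot d, AoI in slot d+1)
Experience = Tuple[AoI, int, float, AoI]

AOI_CAP = 10  # hypothetical cap used only to scale features into [0, 1]


def step_aoi(aoi: AoI, relayed: int) -> AoI:
    # Hypothetical dynamics: every device ages one slot; the relayed device resets to 1.
    return tuple(1 if k == relayed else a + 1 for k, a in enumerate(aoi))


def slot_reward(aoi_next: AoI) -> float:
    # Placeholder reward (negative mean AoI); NOT the claim 9 formula.
    return -sum(aoi_next) / len(aoi_next)


def features(aoi: AoI) -> List[float]:
    return [min(a, AOI_CAP) / AOI_CAP for a in aoi]


class LinearQ:
    """Q(s, a) = w_a . x(s) + b_a; a dependency-free stand-in for the DQN's network."""

    def __init__(self, n_devices: int, lr: float = 0.01) -> None:
        self.w = [[0.0] * n_devices for _ in range(n_devices)]
        self.b = [0.0] * n_devices
        self.lr = lr

    def q(self, aoi: AoI) -> List[float]:
        x = features(aoi)
        return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(self.w, self.b)]

    def update(self, aoi: AoI, action: int, target: float) -> None:
        # One semi-gradient step toward the TD target for the chosen action.
        x = features(aoi)
        err = target - self.q(aoi)[action]
        self.w[action] = [wi + self.lr * err * xi for wi, xi in zip(self.w[action], x)]
        self.b[action] += self.lr * err


def train_dqn(n_devices: int = 3, rounds: int = 50, n_samples: int = 20,
              batch: int = 8, gamma: float = 0.9, eps: float = 0.2,
              seed: int = 0) -> Tuple[LinearQ, Deque[Experience]]:
    rng = random.Random(seed)
    model = LinearQ(n_devices)
    pool: Deque[Experience] = deque(maxlen=10_000)  # experience replay pool (capacity assumed)
    aoi: AoI = tuple([1] * n_devices)
    for _ in range(rounds):  # j-th training round
        for _ in range(n_samples):  # acquire N new experience samples
            qs = model.q(aoi)
            a = rng.randrange(n_devices) if rng.random() < eps else qs.index(max(qs))
            nxt = step_aoi(aoi, a)
            pool.append((aoi, a, slot_reward(nxt), nxt))
            aoi = nxt
        # Train on a random minibatch drawn from the whole pool.
        for s, a, r, s2 in rng.sample(list(pool), min(batch, len(pool))):
            model.update(s, a, r + gamma * max(model.q(s2)))
    return model, pool
```

Sampling the minibatch from the whole pool, rather than only the N newest samples, is what decorrelates consecutive time slots in the training data.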
9. The method for IRS configuration optimization and AoI scheduling optimization in a UAV-IRS-assisted IoT network according to claim 8, characterized in that the reward value for the d-th time slot is calculated using the following formula: [formula not reproduced]; where the quantities involved are the total number of IoT devices associated with the UAV itself, the AoI of each such device in the d-th time slot, the preset information-age (AoI) threshold, the probability that a device's AoI in the d-th time slot exceeds that threshold, and a preset amplification factor.