A Multi-UAV Cooperative Semantic Communication Resource Allocation and Trajectory Optimization Method
The invention proposes a multi-UAV collaborative semantic communication resource allocation and trajectory optimization method that addresses the deep integration of UAVs with semantic communication systems. It allocates communication resources and optimizes trajectories efficiently in resource-constrained scenarios, improving the system's practical applicability and learning efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTHWEST JIAOTONG UNIV
- Filing Date
- 2025-04-28
- Publication Date
- 2026-06-30
AI Technical Summary
Existing approaches to deeply integrating UAVs with semantic communication systems face several problems: strong coupling among parameters caused by the UAVs' limited onboard resources and dynamic channel characteristics; a conflict between the timeliness of semantic information transmission and the accuracy of task execution; and partial observability and environmental non-stationarity in multi-UAV collaborative scenarios. They also lack a globally optimal solution.
The invention adopts a multi-UAV collaborative semantic communication resource allocation and trajectory optimization method. Semantic data is acquired to build a semantic communication model, from which a delay constraint model and semantic performance indicators are constructed and a nonlinear multi-constraint optimization model is established. Reinforcement learning combined with a pheromone mechanism then optimizes the UAV states and trajectories, achieving resource allocation and trajectory optimization.
The method addresses semantic communication in resource-constrained UAV scenarios, improves the efficiency of communication resource allocation, mitigates the computational complexity and long solution times of traditional optimization methods, and alleviates the sparse-reward and learning-efficiency problems of multi-agent collaboration so that training converges quickly.
Figure CN120475445B_ABST
Description
Technical Field
[0001] This invention belongs to the field of data processing technology, specifically relating to a method for multi-UAV collaborative semantic communication resource allocation and trajectory optimization.
Background Technology
[0002] Task-oriented semantic communication systems differ from traditional bit-centric communication systems. Semantic communication focuses on the "meaning" of information and its relevance to a specific task, rather than simple data transmission. This communication model focuses on optimizing the execution of intelligent tasks. By receiving, understanding, and utilizing the sender's semantic information, it supports more intelligent and efficient task processing, avoids unnecessary data transmission, improves communication efficiency, and better meets the quality of service requirements of intelligent tasks.
[0003] Existing research on task-oriented semantic communication systems largely focuses on leveraging deep learning techniques to enhance the information recognition, extraction, and parsing capabilities of semantic codecs, while neglecting the needs of resource-constrained devices. In scenarios such as the Industrial Internet of Things (IIoT) and smart wearable devices, low-power sensors and edge devices have limited computing and storage resources, making it difficult to independently deploy and run large-scale deep network models. Achieving efficient semantic communication under these conditions is a pressing issue. Furthermore, traditional fixed network architectures are insufficient for disaster emergency communications or communications in remote areas. Against this backdrop, a flexible communication architecture utilizing drones as edge servers and micro base stations emerges as a potential solution. Drones can not only act as edge computing devices, providing task-related semantic processing capabilities, but also serve as temporary base stations to enhance communication coverage, providing computing and communication support for resource-constrained devices. This architecture holds immense potential for application in complex scenarios.
[0004] However, the deep integration of UAVs and semantic communication systems still faces multiple challenges: First, the limited onboard resources and dynamic channel characteristics of UAVs lead to a strong coupling relationship between parameters such as semantic compression rate, flight trajectory, and task queue; second, there is an inherent conflict between the timeliness requirements of semantic information transmission and the accuracy of task execution, and traditional resource allocation methods lack explicit modeling of semantic value measurement; third, the partial observability and environmental non-stationarity in multi-UAV collaborative scenarios make it difficult for centralized optimization strategies to achieve global optimal efficiency. Existing research mainly focuses on semantic model optimization under static networks, and a systematic solution has not yet been formed for the collaborative optimization of key elements such as UAV kinematic constraints and multi-agent cooperation mechanisms.
Summary of the Invention
[0005] To address the issue of deep integration between unmanned aerial vehicles (UAVs) and semantic communication systems, this invention proposes a method for multi-UAV collaborative semantic communication resource allocation and trajectory optimization.
[0006] The technical solution of this invention is: a method for multi-UAV cooperative semantic communication resource allocation and trajectory optimization, comprising the following steps:
[0007] S1. Obtain semantic data and construct a semantic communication model;
[0008] S2. Based on the semantic communication model, construct the latency constraint model and the overall semantic performance index;
[0009] S3. Based on the time delay constraint model and the overall semantic performance index, construct a nonlinear multi-constraint optimization model for the UAV;
[0010] S4. Based on a nonlinear multi-constraint optimization model, determine the UAV's state information, observation information, action information, and reward function;
[0011] S5. Based on the UAV's state information, observation information, action information, and reward function, perform reinforcement learning to complete the UAV's communication resource allocation and trajectory optimization.
[0012] Furthermore, S1 includes the following sub-steps:
[0013] S11. Obtain semantic data and extract semantic features;
[0014] S12. Compress the semantic features to obtain the compressed semantic features;
[0015] S13. Channel coding is performed on the compressed semantic features to obtain symbols suitable for channel transmission;
[0016] S14. Transmit the symbols suitable for channel transmission to the signal receiver;
[0017] S15. The signal receiver performs channel decoding on the received signal to obtain the recovered semantics;
S16. The recovered semantics are input into the semantic decoder to obtain the classification result;
[0018] S17. Based on the classification results, construct a semantic communication model.
[0019] Furthermore, in S11, the semantic features $F$ are expressed as:
[0020] $F = S_\alpha(I)$;
[0021] In the formula, $I$ represents the semantic data and $S_\alpha(\cdot)$ represents a semantic coding network with parameter set $\alpha$; in S12, the compressed semantic feature $X$ is expressed as:
[0022] $X = C_o(F)$;
[0023] In the formula, $C_o(\cdot)$ represents the semantic compression function and $o$ represents the semantic compression ratio;
[0024] In S13, the symbol $M$ suitable for channel transmission is expressed as:
[0025] $M = Q_\sigma(X)$;
[0026] In the formula, $Q_\sigma(\cdot)$ denotes a channel encoder network with parameter set $\sigma$;
[0027] In S14, the signal $Y$ received by the signal receiver is expressed as:
[0028] $Y = hM + n$;
[0029] In the formula, $h$ represents the channel gain and $n$ represents Gaussian white noise;
[0030] In S15, the recovered semantics $X'$ are expressed as:
[0031] $X' = Q_\chi^{-1}(Y)$;
[0032] In the formula, $Q_\chi^{-1}(\cdot)$ represents a channel decoder network with parameter set $\chi$;
[0033] In S16, the classification result $p$ is expressed as:
[0034] $p = S_\beta^{-1}(X')$;
[0035] In the formula, $S_\beta^{-1}(\cdot)$ represents a semantic decoder with parameter set $\beta$;
[0036] In S17, the expression for semantic communication model A is:
[0037]
[0038] In the formula, $\alpha_1$ represents the first fitted variable, $\alpha_2$ represents the second fitted variable, and $\alpha_3$ represents the third fitted variable.
[0039] Furthermore, S2 includes the following sub-steps:
[0040] S21. Based on the semantic communication model, generate a semantic offloading task and determine the semantic feature size of the semantic offloading task;
[0041] S22. Determine the transmission delay based on the semantic feature size of the semantic offloading task;
[0042] S23. Determine semantic performance indicators;
[0043] S24. Construct a delay constraint model based on the transmission delay of the semantic offloading task;
[0044] S25. Determine the overall semantic performance index based on the semantic performance index.
[0045] Furthermore, in S21, the semantic feature size of the semantic offloading task completed by the nth UAV for the mth terminal device in time slot t is expressed as:
[0046]
[0047] In the formula, the quantities are the initial size of the semantic features to be transmitted before compression, and the semantic compression rate decision for all tasks that the nth UAV needs to transmit while scheduling the mth terminal device in time slot t.
[0048] In S22, the transmission delay for the nth UAV to complete the semantic offloading task to the mth terminal device in time slot t is expressed as:
[0049]
[0050] In the formula, $L_m$ represents the semantic offloading task of the mth terminal device and $P_u$ represents the transmission power; the formula also involves the scheduling strategy between the nth drone and the mth terminal device in time slot t and the transmission rate between the nth drone and the mth terminal device;
[0051] In S23, the semantic performance metric of the nth UAV in time slot t is expressed as:
[0052]
[0053] In the formula, $M$ represents the number of terminal devices; the formula also involves the actual task execution accuracy of a single semantic task.
[0054] In S24, the expression for the time delay constraint model is:
[0055]
[0056] In the formula, $T_{wait}$ represents the delay before the last device begins to be served, $T_{max}$ represents the maximum time delay constraint, $T$ represents the maximum flight time slot, and $N$ represents the number of drones.
[0057] In S25, the overall semantic performance index $S_{total}$ is expressed as:
[0058]
[0059] Furthermore, in S3, the nonlinear multi-constraint optimization model of the UAV is expressed as:
[0060]
[0061] In the formula, $S_{total}$ represents the overall semantic performance metric; the formula also involves the set of locations of all drones and the set of scheduling strategies between the drones and the terminal devices. The constraints are as follows: C1 constrains the flight angle; C2 constrains the flight speed; C3 constrains the flight boundary; C4 is the collision constraint for the UAVs; C5 requires that at most N terminal devices simultaneously receive the semantic offloading service provided by the UAVs in a time slot; C6 requires that the UAVs provide services to all terminal devices; C7 requires that the UAVs complete the task within $T_{max}$; and C8 requires that the semantic performance be greater than the threshold. The remaining quantities are: the horizontal flight direction of the nth UAV in time slot t; the flight speed of the nth drone in time slot t; $V_{max}$, the maximum speed; $C_L$, the upper boundary of the service area; the movement strategy of the nth drone in time slot t; $C_U$, the lower boundary of the service area; the movement strategy of the i-th drone in time slot t, where i is an iteration variable; $D_{min}$, the closest distance allowed during drone flight; the scheduling strategy between the nth drone and the mth terminal device in time slot t; the indicator of whether the semantic offloading task of the nth drone to the mth terminal device has been completed; the transmission delay of the nth UAV completing the semantic offloading task to the mth terminal device in time slot t; $T_{wait}$, the latency of the last device being served; $T_{max}$, the maximum delay constraint; the actual task execution accuracy of a single semantic task; $A_{min}$, the threshold; $T$, the maximum flight time slot; $N$, the number of drones; and $M$, the number of terminal devices.
[0062] Furthermore, in S4, the UAV's state information adopts a global state; the global state $s_t$ is expressed as:
[0063]
[0064] In the formula, the quantities are: the first intermediate variable; the indicator of whether the semantic offloading task of the nth drone to the mth terminal device has been completed; the second intermediate variable; the scheduling strategy between the nth drone and the mth terminal device in time slot t; the third intermediate variable; the movement strategy of the nth drone in time slot t; $M$, the number of terminal devices; the pheromones collected by the first drone in time slot t; the pheromones collected by the nth drone in time slot t; the pheromones collected by the nth drone in time slot t-1; $\kappa_{cov}$, a constant; $\omega$, the semantic performance weight; the semantic compression rate decision for all tasks that the nth UAV needs to transmit while scheduling the mth terminal device in time slot t; the semantic performance index of the nth UAV in time slot t; the transmission delay with which the nth UAV completes the semantic offloading task to the mth terminal device in time slot t; $\kappa_{dis}$, the penalty that a drone receives for each time slot it spends; $\rho_c$, the penalty for crossing the boundary; and $\rho_{ob}$, the collision penalty;
[0065] In S4, the observation information of the UAV adopts the local observation state; the local observation state of the nth UAV in time slot t is expressed as:
[0066]
[0067] In S4, the action information of the nth UAV in time slot t is expressed as:
[0068]
[0069] In the formula, the quantities are the flight speed of the nth drone in time slot t and the horizontal flight direction of the nth UAV in time slot t;
[0070] In S4, the reward function of the nth drone in time slot t is expressed as:
[0071]
[0072] In the formula, $T_{max}$ represents the maximum time delay constraint; $N_{re}$ represents the reward received by the drone when semantic offloading services have been provided to all terminal devices and the latency constraint is met; $T_{wait}$ represents the latency when the last device was served; $N$ represents the number of drones; and $r_{tanh}$ represents a normalized reward. The formula also involves the indicator of whether the semantic offloading task of the nth drone to the mth terminal device was completed in time slot l, and the transmission delay with which the nth UAV completes the semantic offloading task to the mth terminal device in time slot l.
[0073] Furthermore, S5 includes the following sub-steps:
[0074] S51. Based on the UAV's state information, observation information, action information, and reward function, form tuples;
[0075] S52. Store the tuple into the experience replay pool until the drone completes all semantic offloading tasks;
[0076] S53. When the UAV completes all semantic offloading tasks, randomly select several experience samples from the experience replay pool to update the parameters of the Critic network;
[0077] S54. Utilize the updated Critic network to complete the allocation of communication resources and trajectory optimization for the UAV.
[0078] Furthermore, in S51, the tuple is expressed as $(o_n^t, a_n^t, r_n^t, o_n^{t+1})$. In the formula, $o_n^t$ represents the local observation state of the nth UAV in time slot t, $a_n^t$ represents the action information of the nth drone in time slot t, $r_n^t$ represents the reward function of the nth drone in time slot t, and $o_n^{t+1}$ represents the local observation state of the nth UAV in time slot t+1.
[0079] The beneficial effects of this invention are:
[0080] (1) This invention utilizes UAVs as a supplement to the existing task-oriented semantic communication system architecture, which can effectively apply task-oriented semantic communication systems in scenarios with limited communication resources.
[0081] (2) This invention effectively completes the allocation of communication resources through dynamic decision-making on semantic compression rate, further enhancing the practical application capability of task-oriented semantic communication systems;
[0082] (3) This invention utilizes a multi-agent reinforcement learning algorithm, which can effectively address the challenges of computational complexity and solution time in traditional optimization methods.
[0083] (4) Based on the pheromone mechanism, this invention can effectively solve the problems of sparse rewards and learning efficiency in multi-agent reinforcement learning algorithms. The algorithm can converge and be applied quickly.
Attached Figure Description
[0084] Figure 1 is a flowchart of the method for multi-UAV collaborative semantic communication resource allocation and trajectory optimization;
[0085] Figure 2 is a diagram of the neural network structure;
[0086] Figure 3 is the first schematic diagram of the dynamic adjustment of flight trajectories and semantic compression rates for different numbers of drones;
[0087] Figure 4 is the second schematic diagram of the dynamic adjustment of flight trajectories and semantic compression rates for different numbers of drones;
[0088] Figure 5 is the third schematic diagram of the dynamic adjustment of flight trajectories and semantic compression rates for different numbers of drones;
[0089] Figure 6 is the fourth schematic diagram of the dynamic adjustment of flight trajectories and semantic compression rates for different numbers of drones.
Detailed Implementation
[0090] The embodiments of the present invention will be further described below with reference to the accompanying drawings.
[0091] As shown in Figure 1, this invention provides a method for multi-UAV cooperative semantic communication resource allocation and trajectory optimization, including the following steps:
[0092] S1. Obtain semantic data and construct a semantic communication model;
[0093] S2. Based on the semantic communication model, construct the latency constraint model and the overall semantic performance index;
[0094] S3. Based on the time delay constraint model and the overall semantic performance index, construct a nonlinear multi-constraint optimization model for the UAV;
[0095] S4. Based on a nonlinear multi-constraint optimization model, determine the UAV's state information, observation information, action information, and reward function;
[0096] S5. Based on the UAV's state information, observation information, action information, and reward function, perform reinforcement learning to complete the UAV's communication resource allocation and trajectory optimization.
[0097] This invention addresses the time-delay constraints of time-sensitive tasks in practical application scenarios such as disaster emergency response. It introduces a time constraint mechanism, proposing a nonlinear optimization problem with spatiotemporal constraints to ensure that UAVs complete area coverage and data collection within a time threshold, avoiding semantic timeliness loss due to task backlog. For the established multi-constraint nonlinear optimization problem, a multi-agent reinforcement learning algorithm is used to solve it, addressing the computational efficiency and practical feasibility issues of traditional optimization methods. This algorithm solves the resource allocation and trajectory coordination problem of TOSC (Task-Oriented Semantic Communication System) in multi-UAV collaborative scenarios. The algorithm does not rely on the precise location information of dynamic users, but rather drives the UAV to adaptively optimize trajectory and allocate resources to terminal devices based on real-time channel gain information. To address the sparse reward and learning efficiency issues in the training process of multi-agent reinforcement learning algorithms, an ant colony algorithm is proposed, utilizing the pheromone acquisition and evaporation mechanism.
[0098] In this embodiment of the invention, S1 includes the following sub-steps:
[0099] S11. Obtain semantic data and extract semantic features;
[0100] S12. Compress the semantic features to obtain the compressed semantic features;
[0101] S13. Channel coding is performed on the compressed semantic features to obtain symbols suitable for channel transmission;
[0102] S14. Transmit the symbols suitable for channel transmission to the signal receiver;
[0103] S15. The signal receiver performs channel decoding on the received signal to obtain the recovered semantics;
[0104] S16. Input the recovered semantics into the semantic decoder to obtain the classification result;
[0105] S17. Based on the classification results, construct a semantic communication model.
[0106] In this embodiment of the invention, in S11, the semantic features $F$ are expressed as:
[0107] $F = S_\alpha(I)$;
[0108] In the formula, $I$ represents the semantic data and $S_\alpha(\cdot)$ denotes a semantic coding network with parameter set $\alpha$;
[0109] In S12, the compressed semantic feature $X$ is expressed as:
[0110] $X = C_o(F)$;
[0111] In the formula, $C_o(\cdot)$ represents the semantic compression function and $o$ represents the semantic compression ratio;
[0112] In S13, the symbol $M$ suitable for channel transmission is expressed as:
[0113] $M = Q_\sigma(X)$;
[0114] In the formula, $Q_\sigma(\cdot)$ denotes a channel encoder network with parameter set $\sigma$;
[0115] In S14, the signal $Y$ received by the signal receiver is expressed as:
[0116] $Y = hM + n$;
[0117] In the formula, $h$ represents the channel gain and $n$ represents Gaussian white noise;
[0118] In S15, the recovered semantics $X'$ are expressed as:
[0119] $X' = Q_\chi^{-1}(Y)$;
[0120] In the formula, $Q_\chi^{-1}(\cdot)$ represents a channel decoder network with parameter set $\chi$;
[0121] In S16, the classification result $p$ is expressed as:
[0122] $p = S_\beta^{-1}(X')$;
[0123] In the formula, $S_\beta^{-1}(\cdot)$ represents a semantic decoder with parameter set $\beta$;
[0124] In S17, the expression for semantic communication model A is:
[0125]
[0126] In the formula, $\alpha_1$ represents the first fitted variable, $\alpha_2$ represents the second fitted variable, and $\alpha_3$ represents the third fitted variable.
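The S11 to S17 chain can be illustrated with a toy sketch. The functions below are hypothetical stand-ins and not the networks of the invention: the semantic encoder, channel encoder, and both decoders are replaced by trivial list operations, and the compression function is assumed to keep a fraction o of the features. Only the channel step follows the stated expression Y = hM + n directly.

```python
import random


def semantic_encode(data):
    """Toy stand-in for the semantic coding network: features are the raw values."""
    return list(data)


def compress(features, o):
    """Toy stand-in for the compression function: keep a fraction o (0 < o <= 1)."""
    keep = max(1, round(o * len(features)))
    return features[:keep]


def channel_encode(x):
    """Toy stand-in for the channel encoder: normalise to unit average power."""
    power = sum(v * v for v in x) / len(x)
    scale = power ** 0.5 or 1.0  # avoid dividing by zero for an all-zero input
    return [v / scale for v in x], scale


def awgn_channel(m, h, noise_std, rng):
    """Y = h*M + n, with n drawn as zero-mean Gaussian noise."""
    return [h * v + rng.gauss(0.0, noise_std) for v in m]


def channel_decode(y, h, scale):
    """Toy stand-in for the channel decoder: undo the gain and the normalisation."""
    return [v / h * scale for v in y]


def semantic_decode(x_rec):
    """Toy stand-in for the semantic decoder: class = index of the largest feature."""
    return max(range(len(x_rec)), key=lambda i: x_rec[i])


def run_pipeline(data, o, h=0.8, noise_std=0.0, seed=0):
    """Chain S11 to S16 and return the classification result."""
    rng = random.Random(seed)
    features = semantic_encode(data)                 # S11
    x = compress(features, o)                        # S12
    m, scale = channel_encode(x)                     # S13
    y = awgn_channel(m, h, noise_std, rng)           # S14
    x_rec = channel_decode(y, h, scale)              # S15
    return semantic_decode(x_rec)                    # S16
```

In this sketch a smaller o shortens the transmitted vector but may discard the feature that determines the class, which is the accuracy and compression trade-off that the fitted model in S17 is meant to capture.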
[0127] In this embodiment of the invention, S2 includes the following sub-steps:
[0128] S21. Based on the semantic communication model, generate a semantic offloading task and determine the semantic feature size of the semantic offloading task;
[0129] S22. Determine the transmission delay based on the semantic feature size of the semantic offloading task;
[0130] S23. Determine semantic performance indicators;
[0131] S24. Construct a delay constraint model based on the transmission delay of the semantic offloading task;
[0132] S25. Determine the overall semantic performance index based on the semantic performance index.
[0133] In this embodiment of the invention, in S21, the semantic feature size of the semantic offloading task completed by the nth UAV for the mth terminal device in time slot t is expressed as:
[0134]
[0135] In the formula, the quantities are the initial size of the semantic features to be transmitted before compression, and the semantic compression rate decision for all tasks that the nth UAV needs to transmit while scheduling the mth terminal device in time slot t.
[0136] In S22, the transmission delay for the nth UAV to complete the semantic offloading task to the mth terminal device in time slot t is expressed as:
[0137]
[0138] In the formula, $L_m$ represents the semantic offloading task of the mth terminal device and $P_u$ represents the transmission power; the formula also involves the scheduling strategy between the nth drone and the mth terminal device in time slot t and the transmission rate between the nth drone and the mth terminal device;
[0139] In S23, the semantic performance metric of the nth UAV in time slot t is expressed as:
[0140]
[0141] In the formula, $M$ represents the number of terminal devices; the formula also involves the actual task execution accuracy of a single semantic task.
[0142] In S24, the expression for the time delay constraint model is:
[0143]
[0144] In the formula, $T_{wait}$ represents the delay before the last device begins to be served, $T_{max}$ represents the maximum time delay constraint, $T$ represents the maximum flight time slot, and $N$ represents the number of drones.
[0145] In S25, the overall semantic performance index $S_{total}$ is expressed as:
[0146]
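The relationship between compression, transmission delay, and the delay constraint in S21 to S24 can be sketched as follows. Because the formulas themselves are not reproduced in the text, every model here is an assumption: the compressed size is taken as o times the initial size, the link rate is taken as a Shannon-capacity expression in the transmission power, and the constraint is taken as waiting time plus total transmission delay not exceeding the maximum delay. All function and parameter names are hypothetical.

```python
import math


def compressed_size_bits(initial_bits, o):
    """Assumed size model: compressed size = o * initial size, with 0 < o <= 1."""
    if not 0.0 < o <= 1.0:
        raise ValueError("compression ratio o must be in (0, 1]")
    return o * initial_bits


def link_rate_bps(bandwidth_hz, p_u, channel_gain, noise_power):
    """Assumed rate model: Shannon capacity with transmission power p_u."""
    return bandwidth_hz * math.log2(1.0 + p_u * channel_gain / noise_power)


def transmission_delay_s(initial_bits, o, rate_bps):
    """Delay of one offloading task: compressed size divided by link rate."""
    return compressed_size_bits(initial_bits, o) / rate_bps


def meets_delay_constraint(delays_s, t_wait, t_max):
    """Assumed constraint form: waiting time plus summed delays must not exceed t_max."""
    return t_wait + sum(delays_s) <= t_max
```

Under these assumptions, halving o halves the delay of a task at a fixed rate, which is why the compression rate decision acts as a resource allocation variable alongside the trajectory.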
[0147] In this embodiment of the invention, in S3, the nonlinear multi-constraint optimization model of the UAV is expressed as:
[0148]
[0149]
[0150] In the formula, $S_{total}$ represents the overall semantic performance metric; the formula also involves the set of locations of all drones and the set of scheduling strategies between the drones and the terminal devices. The constraints are as follows: C1 constrains the flight angle; C2 constrains the flight speed; C3 constrains the flight boundary; C4 is the collision constraint for the UAVs; C5 requires that at most N terminal devices simultaneously receive the semantic offloading service provided by the UAVs in a time slot; C6 requires that the UAVs provide services to all terminal devices; C7 requires that the UAVs complete the task within $T_{max}$; and C8 requires that the semantic performance be greater than the threshold. The remaining quantities are: the horizontal flight direction of the nth UAV in time slot t; the flight speed of the nth drone in time slot t; $V_{max}$, the maximum speed; $C_L$, the upper boundary of the service area; the movement strategy of the nth drone in time slot t; $C_U$, the lower boundary of the service area; the movement strategy of the i-th drone in time slot t, where i is an iteration variable; $D_{min}$, the closest distance allowed during drone flight; the scheduling strategy between the nth drone and the mth terminal device in time slot t; the indicator of whether the semantic offloading task of the nth drone to the mth terminal device has been completed; the transmission delay of the nth UAV completing the semantic offloading task to the mth terminal device in time slot t; $T_{wait}$, the latency of the last device being served; $T_{max}$, the maximum delay constraint; the actual task execution accuracy of a single semantic task; $A_{min}$, the threshold; $T$, the maximum flight time slot; $N$, the number of drones; and $M$, the number of terminal devices.
[0151] One term encourages drones to quickly establish connections with terminal devices to provide semantic offloading services, where $\kappa_{cov}$ is a constant. $\omega$ is a semantic performance weight used to balance the scale of the semantic term and to encourage drones to choose a higher compression rate. Another term constrains the drone to complete all tasks within the time-delay constraint. $\kappa_{dis}$ is a penalty that the drone incurs in every time slot it spends, which forces the drone to find the optimal path to traverse all users. $\rho_c$ represents the penalty for exceeding the boundary: if the drone's flight path does not meet C3, then $\rho_c > 0$; otherwise $\rho_c = 0$. $\rho_{ob}$ represents the collision penalty, covering collisions between drones and collisions between a drone and an obstacle: if the drone's flight path does not meet C4 or C5, then $\rho_{ob} > 0$; otherwise $\rho_{ob} = 0$.
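The boundary and collision penalties described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the formulation of the invention: positions are taken as 2D points, the service area is taken as a square with the same lower and upper bound on both axes, and the helper names (`step_position`, `boundary_penalty`, `collision_penalty`) and the unit penalty values are hypothetical.

```python
import math


def step_position(pos, speed, heading, dt=1.0):
    """Move a UAV for one time slot given its speed and horizontal heading (radians)."""
    x, y = pos
    return (x + speed * math.cos(heading) * dt,
            y + speed * math.sin(heading) * dt)


def clip_speed(speed, v_max):
    """Enforce the speed constraint by clipping the commanded speed to [0, v_max]."""
    return min(max(speed, 0.0), v_max)


def boundary_penalty(pos, lower, upper, rho_c=1.0):
    """Return rho_c if the position leaves the service area, otherwise 0."""
    inside = all(lower <= c <= upper for c in pos)
    return 0.0 if inside else rho_c


def collision_penalty(positions, d_min, rho_ob=1.0):
    """Return rho_ob if any two UAVs are closer than d_min, otherwise 0."""
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < d_min:
                return rho_ob
    return 0.0
```

Returning a positive penalty only on violation mirrors the stated rule that the penalty is greater than zero when the corresponding constraint is not met and zero otherwise.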
[0152] In this embodiment of the invention, in S4, the state information of the UAV adopts a global state; the global state $s_t$ is expressed as:
[0153]
[0154] In the formula, the quantities are: the first intermediate variable; the indicator of whether the semantic offloading task of the nth drone to the mth terminal device has been completed; the second intermediate variable; the scheduling strategy between the nth drone and the mth terminal device in time slot t; the third intermediate variable; the movement strategy of the nth drone in time slot t; $M$, the number of terminal devices; the pheromones collected by the first drone in time slot t; the pheromones collected by the nth drone in time slot t; the pheromones collected by the nth drone in time slot t-1; $\kappa_{cov}$, a constant; $\omega$, the semantic performance weight; the semantic compression rate decision for all tasks that the nth UAV needs to transmit while scheduling the mth terminal device in time slot t; the semantic performance index of the nth UAV in time slot t; the transmission delay with which the nth UAV completes the semantic offloading task to the mth terminal device in time slot t; $\kappa_{dis}$, the penalty that a drone receives for each time slot it spends; $\rho_c$, the penalty for crossing the boundary; and $\rho_{ob}$, the collision penalty;
[0155] In S4, the observation information of the UAV adopts the local observation state; the local observation state of the nth UAV in time slot t is expressed as:
[0156]
[0157] In S4, the action information of the nth UAV in time slot t is expressed as:
[0158]
[0159] In the formula, the quantities are the flight speed of the nth drone in time slot t and the horizontal flight direction of the nth drone in time slot t.
In the model established in this invention, each terminal device is assumed to hold some pheromones, which can also be regarded as special data that needs to be collected. When a drone visits a terminal device and provides the semantic offloading service, it collects the pheromones, which are transferred to the drone. At the same time, the pheromones on the drone evaporate continuously, and more pheromones evaporate when the drone's flight trajectory violates the constraints. To a certain extent, pheromones represent the interaction information between the drone and the environment. This invention therefore uses the pheromone concentration as a reference for rewards, guiding the drone to explore the environment better and keeping training from getting stuck in local optima. Furthermore, the pheromone evaporation mechanism transforms the originally sparse reward into a dense reward, enabling the agent to converge better.
In S4, the reward function of the nth drone in time slot t is expressed as:
[0160]
[0161] In the formula, $T_{max}$ represents the maximum time delay constraint; $N_{re}$ represents the reward received by the drone when semantic offloading services have been provided to all terminal devices and the latency constraint is met; $T_{wait}$ represents the latency when the last device was served; $N$ represents the number of drones; and $r_{tanh}$ represents a normalized reward. The formula also involves the indicator of whether the semantic offloading task of the nth drone to the mth terminal device was completed in time slot l, and the transmission delay with which the nth UAV completes the semantic offloading task to the mth terminal device in time slot l.
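The pheromone collection and evaporation mechanism can be sketched as a small state tracker. This is an illustrative assumption and not the reward function of the invention: the evaporation rates, the class name `PheromoneTracker`, and the choice to shape the reward as the tanh of the change in pheromone level between consecutive time slots are all hypothetical.

```python
import math


class PheromoneTracker:
    """Tracks the pheromone level carried by one UAV across time slots."""

    def __init__(self, evaporation=0.1, violation_evaporation=0.3):
        self.level = 0.0
        self.evaporation = evaporation                      # baseline loss per slot
        self.violation_evaporation = violation_evaporation  # extra loss on violation

    def step(self, collected, violated):
        """Add pheromone collected from served devices, then apply evaporation."""
        self.level += collected
        rate = self.evaporation + (self.violation_evaporation if violated else 0.0)
        self.level *= max(0.0, 1.0 - rate)
        return self.level


def shaped_reward(prev_level, level):
    """Dense reward signal: tanh-normalised change in pheromone level."""
    return math.tanh(level - prev_level)
```

Because the level changes in every time slot, even when no device is served, the agent receives a nonzero signal at each step, which is one way to read the statement that evaporation turns a sparse reward into a dense one.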
[0162] In this embodiment of the invention, S5 includes the following sub-steps:
[0163] S51. Based on the UAV's state information, observation information, action information, and reward function, form tuples;
[0164] S52. Store the tuples in the experience replay pool until the UAV completes all semantic offloading tasks;
[0165] S53. When the UAV completes all semantic offloading tasks, randomly sample several experience tuples from the experience replay pool to update the parameters of the Critic network;
[0166] S54. Utilize the updated Critic network to complete the allocation of communication resources and trajectory optimization for the UAV.
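Sub-steps S51 to S53 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of the invention: the names `Transition` and `ReplayPool`, the fixed capacity, and uniform sampling without replacement are all assumptions, and the Critic update itself is omitted.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One experience tuple formed in S51."""
    obs: Sequence[float]       # local observation in time slot t
    action: Sequence[float]    # (flight speed, horizontal direction)
    reward: float              # reward in time slot t
    next_obs: Sequence[float]  # local observation in time slot t+1


class ReplayPool:
    """Fixed-capacity experience replay pool with uniform random sampling."""

    def __init__(self, capacity: int = 10_000, seed: int = 0) -> None:
        # A bounded deque silently drops the oldest tuple when full.
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, transition: Transition) -> None:
        """S52: store one tuple in the pool."""
        self._buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """S53: draw a random mini-batch for the Critic update."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)


pool = ReplayPool(capacity=3)
for t in range(5):
    pool.store(Transition(obs=[float(t)], action=[1.0, 0.0],
                          reward=0.1 * t, next_obs=[float(t + 1)]))
batch = pool.sample(2)
```

Sampling at random from the pool, instead of replaying tuples in the order they were collected, reduces the correlation between consecutive training samples.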
[0167] In this embodiment of the invention, in S51, the tuple consists, in order, of the local observation state of the nth UAV in time slot t, the action information of the nth UAV in time slot t, the reward of the nth UAV in time slot t, and the local observation state of the nth UAV in time slot t+1.
[0168] Table 1 compares the semantic performance of each algorithm under different numbers of terminal devices. As the number of terminal devices increases from 10 to 30, the cumulative semantic performance of MATD3-P continuously increases, outperforming MADDPG-P, MASAC-P, and MADQN-P. In low-load scenarios (M=10), the performance differences among the algorithms are small (MATD3-P only leads MASAC-P by 1.85%), indicating that when resources are sufficient, various algorithms can approximate suboptimal solutions through greedy strategies. However, in medium-to-high-load scenarios (M≥15), the performance advantage of MATD3-P significantly expands. For example, when M=20, its semantic performance is 3.44% higher than MADDPG-P, and when M=30, the difference is 2.37%.
[0169] Table 1
[0170]
| Number of terminal devices | MATD3-P | MADDPG-P | MASAC-P | MADQN-P |
| --- | --- | --- | --- | --- |
| 10 | 761.93 | 744.85 | 748.02 | 725.96 |
| 15 | 1145.26 | 1134.56 | 1137.17 | None |
| 20 | 1578.35 | 1525.76 | 1514.70 | None |
| 25 | 1898.97 | 1853.97 | 1838.44 | None |
| 30 | 2183.63 | 2132.95 | 2129.84 | None |
[0171] Table 2 shows the cumulative semantic performance comparison of each algorithm under different numbers of drones. Experimental results show that increasing the number of drones significantly affects the system's semantic performance and collaborative efficiency, while MATD3-P exhibits stronger scalability and resource allocation capabilities in multi-drone collaborative scenarios. When the number of drones increases from 1 to 4, the cumulative semantic performance of MATD3-P improves from 1516.00 to 1664.39, an increase of 9.8%, which is better than MADDPG-P and MASAC-P. In single-drone scenarios, MATD3-P achieves a 2.2% improvement in semantic performance compared to MADDPG-P through dynamic compression rate adjustment and pheromone-driven progressive path planning. MADQN-P fails to generate continuous trajectories due to the limitation of discrete action space, resulting in the failure of all tasks.
[0172] Table 2
[0173]
| Number of drones | MATD3-P | MADDPG-P | MASAC-P | MADQN-P |
| --- | --- | --- | --- | --- |
| 1 | 1516.00 | 1482.89 | 1462.38 | None |
| 2 | 1549.43 | 1501.58 | 1475.42 | None |
| 3 | 1578.35 | 1525.76 | 1514.70 | None |
| 4 | 1664.39 | 1560.00 | 1552.09 | None |
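The percentage gains quoted in the discussion of Tables 1 and 2 can be checked with a short Python calculation. The helper name `relative_gain` is hypothetical, and the baseline-relative definition of the gain is an assumption; under that definition the results agree with the quoted figures to within about 0.05 percentage points.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100.0


# Table 1: MATD3-P against the named baseline at M = 10, 20, 30.
gain_m10_vs_masac = relative_gain(761.93, 748.02)      # quoted as 1.85%
gain_m20_vs_maddpg = relative_gain(1578.35, 1525.76)   # quoted as 3.44%
gain_m30_vs_maddpg = relative_gain(2183.63, 2132.95)   # quoted as 2.37%

# Table 2: MATD3-P from 1 to 4 UAVs, and against MADDPG-P with 1 UAV.
gain_n1_to_n4 = relative_gain(1664.39, 1516.00)        # quoted as 9.8%
gain_n1_vs_maddpg = relative_gain(1516.00, 1482.89)    # quoted as 2.2%

for name, value in [
    ("M=10 vs MASAC-P", gain_m10_vs_masac),
    ("M=20 vs MADDPG-P", gain_m20_vs_maddpg),
    ("M=30 vs MADDPG-P", gain_m30_vs_maddpg),
    ("N=1 to N=4", gain_n1_to_n4),
    ("N=1 vs MADDPG-P", gain_n1_vs_maddpg),
]:
    print(f"{name}: {value:.2f}%")
```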
[0174] As shown in Figure 2, the neural network structure of the MATD3-P algorithm includes 2N action (Actor) networks, namely an original action network and a target action network for each UAV, and 4N evaluation (Critic) networks, namely 2 original evaluation networks and 2 target evaluation networks for each UAV. Most dimensions of the state in the existing POMDP relate to the coverage identifiers of the terminal devices and therefore scale with the number of terminal devices M, whereas the UAV position has only two dimensions and the pheromone only one, which creates a dimensionality imbalance. An extension network is therefore established to expand the state dimensions: the low-dimensional states (UAV position and pheromone) first pass through a dense network that expands them to 2M dimensions, and the expanded states are then concatenated with the remaining states as input to the Actor and Critic networks.
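The state expansion step can be sketched in plain Python as below. This is an illustration under assumptions, not the network of Figure 2: the single dense layer, the ReLU activation, the untrained random weights, and the choice of exactly M coverage flags as the "remaining states" are all hypothetical, as are the names `dense_relu` and `build_network_input`.

```python
import random
from typing import List, Sequence


def dense_relu(x: Sequence[float], weight: Sequence[Sequence[float]],
               bias: Sequence[float]) -> List[float]:
    """One fully connected layer followed by ReLU (activation is assumed)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weight, bias)]


def build_network_input(low_dim: Sequence[float], coverage: Sequence[float],
                        weight: Sequence[Sequence[float]],
                        bias: Sequence[float]) -> List[float]:
    """Expand (x, y, pheromone) to 2M features, then append the other states."""
    expanded = dense_relu(low_dim, weight, bias)  # 3 -> 2M dimensions
    return expanded + list(coverage)              # 2M + M dimensions


M = 20                                  # number of terminal devices
rng = random.Random(0)
# Untrained placeholder weights with shape (2M, 3); a real network learns them.
weight = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(2 * M)]
bias = [0.0] * (2 * M)

low_dim = [0.3, 0.7, 0.5]               # hypothetical (x, y, pheromone) values
coverage = [0.0] * M                    # coverage identifiers, none served yet
network_input = build_network_input(low_dim, coverage, weight, bias)
```

After expansion, the position and pheromone contribute 2M input features against M coverage features, so the low-dimensional quantities are no longer outnumbered at the input of the Actor and Critic networks.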
[0175] Figures 3-6 show the flight trajectories and the dynamic adjustment of the semantic compression ratio under different numbers of UAVs. The experimental results show that the proposed MATD3-P algorithm plans reasonable flight paths and semantic compression ratio decisions for scenarios with different numbers of UAVs, providing stable and reliable semantic offloading services.
[0176] In the single-UAV scenario (N=1), as shown in Figure 3, the UAV adopts a high semantic compression rate in the initial stage because of the limits on communication bandwidth and latency, which reduces transmission latency and ensures that all terminal devices are covered within the limited time. Its flight trajectory exhibits progressive regional coverage, advancing along the spatial density gradient of the terminal devices to avoid the latency constraint violations that path detours would cause.
[0177] In multi-UAV scenarios (N≥2), as shown in Figures 4-6, the algorithm uses a lower semantic compression rate in the early stages of the task to increase the semantic performance weight, thereby optimizing the overall semantic performance $S_{\text{total}}$. As the remaining time slots decrease, the UAV swarm dynamically raises the compression rate to a high level, sacrificing local semantic performance in order to strictly meet the latency constraints. The swarm's trajectory planning exhibits collaborative partitioning: based on the real-time pheromone concentration distribution, the UAVs autonomously divide the service areas, reduce path overlap, and improve device coverage, which verifies the effectiveness of the pheromone mechanism in multi-agent collaboration.
[0178] Those skilled in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the invention, and it should be understood that the scope of protection of the invention is not limited to such specific statements and embodiments. Those skilled in the art can make various other specific modifications and combinations based on the technical teachings disclosed in this invention without departing from the spirit of the invention, and these modifications and combinations remain within the scope of protection of this invention.
Claims
1. A method for multi-UAV cooperative semantic communication resource allocation and trajectory optimization, characterized in that it includes the following steps:
S1. Obtain semantic data and construct a semantic communication model;
S2. Based on the semantic communication model, construct a delay constraint model and an overall semantic performance index;
S3. Based on the delay constraint model and the overall semantic performance index, construct a nonlinear multi-constraint optimization model for the UAV;
S4. Based on the nonlinear multi-constraint optimization model, determine the UAV's state information, observation information, action information, and reward function;
S5. Based on the UAV's state information, observation information, action information, and reward function, perform reinforcement learning to complete the UAV's communication resource allocation and trajectory optimization;
S1 includes the following sub-steps:
S11. Obtain semantic data and extract semantic features;
S12. Compress the semantic features to obtain compressed semantic features;
S13. Perform channel coding on the compressed semantic features to obtain symbols suitable for channel transmission;
S14. Transmit the symbols suitable for channel transmission to the signal receiver;
S15. Perform channel decoding on the received signal at the signal receiver to obtain the recovered semantics;
S16. Input the recovered semantics into the semantic decoder to obtain the classification result;
S17. Based on the classification result, construct the semantic communication model;
S2 includes the following sub-steps:
S21. Based on the semantic communication model, generate a semantic offloading task and determine the semantic feature size of the semantic offloading task;
S22. Determine the transmission delay based on the semantic feature size of the semantic offloading task;
S23. Determine the semantic performance index;
S24. Construct the delay constraint model based on the transmission delay of the semantic offloading task;
S25. Determine the overall semantic performance index based on the semantic performance index;
In S25, the semantic performance indices of each UAV in all time slots are summed to obtain the overall semantic performance index.
2. The multi-UAV cooperative semantic communication resource allocation and trajectory optimization method according to claim 1, characterized in that: in S11, the semantic features are given by an expression whose terms denote the semantic data and the semantic encoding network with its parameter set; in S12, the compressed semantic features are given by an expression whose terms denote the semantic compression function and the semantic compression ratio; in S13, the symbols suitable for channel transmission are given by an expression whose term denotes the channel encoder network with its parameter set; in S14, the signal received by the signal receiver is given by an expression whose terms denote the channel gain and Gaussian white noise; in S15, the recovered semantics are given by an expression whose term denotes the channel decoder network with its network parameters; in S16, the classification result is given by an expression whose term denotes the semantic decoder with its parameter set; in S17, the semantic communication model is given by an expression whose terms denote a first fitted variable, a second fitted variable, and a third fitted variable.
3. The multi-UAV cooperative semantic communication resource allocation and trajectory optimization method according to claim 1, characterized in that: in S21, the semantic feature size of the semantic offloading task completed by the nth UAV for the mth terminal device in time slot t is given by an expression whose terms denote the initial size of the semantic features to be transmitted before compression and the semantic compression rate decision, made by the nth UAV in time slot t, for all tasks to be transmitted while the mth terminal device is scheduled; in S22, the transmission latency of the semantic offloading task completed by the nth UAV for the mth terminal device in time slot t is given by an expression whose terms denote the semantic offloading task of the mth terminal device, the scheduling strategy between the nth UAV and the mth terminal device in time slot t, the transmission rate between the nth UAV and the mth terminal device in time slot t, and the transmission power; in S23, the semantic performance index of the nth UAV in time slot t is given by an expression whose terms denote the actual task execution accuracy of a single semantic task and the number of terminal devices; in S24, the delay constraint model is given by an expression whose terms denote the latency at which the last device begins to be served, the maximum delay constraint, the maximum flight time slot, and the number of UAVs; in S25, the overall semantic performance index is given by an expression.
4. The multi-UAV cooperative semantic communication resource allocation and trajectory optimization method according to claim 1, characterized in that: in S3, the nonlinear multi-constraint optimization model of the UAV is given by an expression in which the objective is the overall semantic performance index and the optimization variables are the set of locations of all UAVs, the set of scheduling strategies between the UAVs and the terminal devices, and the semantic features; C1 denotes the flight angle constraint; C2 denotes the flight speed constraint; C3 denotes the flight boundary constraint; C4 denotes the collision constraint for the UAVs; C5 denotes the constraint that, in each time slot, at most one terminal device at a time receives the semantic offloading service provided by a UAV; C6 denotes that the UAVs must provide services to all terminal devices; C7 denotes that the time required for the UAVs to complete the mission must be within the maximum delay constraint; C8 denotes the constraint that the semantic performance must be greater than the threshold; the remaining terms denote, respectively, the horizontal flight direction of the nth UAV in time slot t, the flight speed of the nth UAV in time slot t, the maximum speed, the upper boundary of the service area, the movement strategy of the nth UAV in time slot t, the lower boundary of the service area, the movement strategy in time slot t of the UAV indexed by the iteration variable, the closest distance allowed between UAVs during flight, the scheduling strategy between the nth UAV and the mth terminal device in time slot t, whether the semantic offloading task of the nth UAV for the mth terminal device has been completed, the transmission latency of the semantic offloading task completed by the nth UAV for the mth terminal device in time slot t, the latency when the last device is served, the maximum delay constraint, the actual task execution accuracy of a single semantic task, the threshold, the maximum flight time slot, the number of UAVs, and the number of terminal devices.
5. The multi-UAV cooperative semantic communication resource allocation and trajectory optimization method according to claim 1, characterized in that: in S4, the UAV's state information adopts a global state, and the global state is given by a set of expressions whose terms denote, respectively, a first intermediate variable; whether the semantic offloading task of the nth UAV for the mth terminal device has been completed; a second intermediate variable; the scheduling strategy between the nth UAV and the mth terminal device in time slot t; a third intermediate variable; the movement strategy of the nth UAV in time slot t; the number of terminal devices; three pheromone terms, for the first UAV and for further indexed UAVs, each denoting the pheromone collected by that UAV in a time slot; a constant; the semantic performance weight; the semantic compression rate decision, made by the nth UAV in time slot t, for all tasks to be transmitted while the mth terminal device is scheduled; the semantic performance index of the nth UAV in time slot t; the transmission latency of the semantic offloading task completed by the nth UAV for the mth terminal device in time slot t; the penalty the UAV receives for each time slot it occupies; the penalty for crossing the boundary; and the collision penalty;
in S4, the UAV's observation information adopts a local observation state, and the local observation state of the nth UAV in time slot t is given by an expression;
in S4, the action information of the nth UAV in time slot t is given by an expression whose terms denote the flight speed of the nth UAV in time slot t and the horizontal flight direction of the nth UAV in time slot t;
in S4, the reward function of the nth UAV in time slot t is given by an expression whose terms denote, respectively, the maximum delay constraint; the reward the UAV receives when it has provided semantic offloading services to all terminal devices and the latency constraints are met; the latency when the last device is served; whether the semantic offloading task of the nth UAV for the mth terminal device is completed in time slot l; the number of UAVs; the transmission latency of the semantic offloading task completed by the nth UAV for the mth terminal device in time slot l; and a normalized reward.
6. The multi-UAV cooperative semantic communication resource allocation and trajectory optimization method according to claim 1, characterized in that S5 includes the following sub-steps: S51. Based on the UAV's state information, observation information, action information, and reward function, form tuples; S52. Store the tuples in the experience replay pool until the UAV completes all semantic offloading tasks; S53. When the UAV completes all semantic offloading tasks, randomly sample several experience tuples from the experience replay pool to update the parameters of the Critic network; S54. Utilize the updated Critic network to complete the allocation of communication resources and trajectory optimization for the UAV.
7. The multi-UAV cooperative semantic communication resource allocation and trajectory optimization method according to claim 6, characterized in that, in S51, the tuple consists, in order, of the local observation state of the nth UAV in time slot t, the action information of the nth UAV in time slot t, the reward of the nth UAV in time slot t, and the local observation state of the nth UAV in time slot t+1.