Intelligent fast beam training method for low-altitude millimeter wave communication
By using a customized 3D beam codebook and an enhanced multi-agent deep Q-network algorithm, the method addresses the computational complexity and real-time performance issues of beam training in low-altitude UAV communication, achieving efficient and accurate beam alignment suited to dynamic low-altitude environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTHEAST UNIV
- Filing Date
- 2025-09-04
- Publication Date
- 2026-06-26
AI Technical Summary
In low-altitude UAV communication, the high maneuverability of UAVs leads to high computational complexity and training overhead for beam training, making it difficult for existing technologies to meet the requirements for real-time performance and convergence speed.
A customized 3D beam codebook design is combined with an enhanced multi-agent deep Q-network algorithm. Preliminary user localization narrows the beam search space, and collaborative learning accelerates the beam alignment process.
It significantly reduces the computational complexity and time delay of beam training, achieves high-precision, low-latency beam alignment, meets the real-time requirements of high-speed communication for UAVs, and improves the robustness and adaptability of the system.
Figure CN121124880B_ABST
Description
Technical Field
[0001] This invention relates to the field of wireless communication technology, and in particular to an intelligent fast beam training method for low-altitude users in a millimeter-wave cell-free massive MIMO system.
Background Technology
[0002] With the increasing application of drones in transportation, logistics, environmental monitoring, and other fields, low-altitude drone wireless communication has become an important application scenario for 6G. Millimeter-wave cell-free massive MIMO can provide reliable communication through the macro-diversity gain inherent in the cell-free architecture, while the millimeter-wave band makes the signals transmitted to drones more directional. Beamforming, which generates directional beams that compensate for the high path loss of millimeter waves and extend the communication range, is considered a key enabling technology for this scenario. However, the high maneuverability of drones complicates system beamforming design: the 3D coverage requirement expands the beam search space exponentially while demanding high alignment accuracy. At the same time, real-time service under high-speed movement strictly limits the time available for beam training. These constraints are even harder to satisfy for joint transmission in millimeter-wave cell-free massive MIMO, which faces high training overhead and computational complexity.
[0003] In existing research, deep reinforcement learning can accelerate beam training convergence, but it struggles to meet real-time requirements in millimeter-wave cell-free multi-node scenarios because of the high network dimensionality and the need for frequent online retraining in dynamic environments. Multi-agent reinforcement learning reduces dimensionality through centralized training with distributed decision-making, but its convergence is still not fast enough. Furthermore, the beam codebook design directly affects training performance and convergence time; an inappropriate design can enlarge the search space and reduce efficiency. Developing a codebook design method that uses local real-time information to balance beam accuracy against codebook size is therefore crucial for 3D beam training.
Summary of the Invention
[0004] Because of the high maneuverability of unmanned aerial vehicles (UAVs), existing approaches often fail to meet the real-time requirements of system beamforming in dynamic environments and generally converge slowly. To address these challenges, this invention proposes an intelligent fast beam training method for low-altitude millimeter-wave communication that reduces the computational complexity of system training and significantly accelerates training convergence. The scheme thereby addresses the high training overhead and computational complexity that UAV maneuverability imposes on system beamforming, enabling the system to meet the communication requirements of high-speed UAVs while achieving low-latency, high-precision beam alignment.
[0005] To achieve the above objectives, the present invention adopts the following technical solution:
[0006] An intelligent fast beam training method for low-altitude millimeter-wave communication includes the following steps:
[0007] Step 1: Establish the channel model and data transmission model of the low-altitude millimeter-wave cell-free massive multiple-input multiple-output system, and derive the expression for the downlink achievable rate.
[0008] Step 2: In the channel estimation stage, the communication channel is estimated using pilot signals, and the user's location is preliminarily estimated from the estimated channel.
[0009] Step 3: In the beam training phase, based on the user location information obtained from the preliminary positioning in Step 2, a customized three-dimensional beam codebook is designed.
[0010] Step 4: Using the enhanced multi-agent deep Q-network algorithm, beam search is performed in the customized three-dimensional beam codebook designed in Step 3 to achieve three-dimensional beam alignment.
[0011] Preferably, step 1 includes:
[0012] Step 101: Establish a scenario model of a millimeter-wave cell-free massive MIMO system managed by a central processing unit, in which M access nodes serve K users and each access node is equipped with N antennas arranged in a uniform planar array; then establish a channel model from the m-th access node to the k-th UAV user in each coherence time block.
[0013] Step 102: Based on the channel model, construct the received signal model of the kth UAV user, and derive the expression for the user's communication signal-to-interference-plus-noise ratio and achievable rate.
[0014] Preferably, step 2 includes:
[0015] Step 201: The user sends mutually orthogonal uplink pilot signals, the access node receives the pilot signals, and uses the least squares channel estimation method to obtain the estimated channel;
[0016] Step 202: Perform singular value decomposition on the received signal to separate the signal subspace and noise subspace. Use the 2D-MUSIC algorithm to construct the spatial spectrum function and search for its peak value to obtain the estimated values of the user's elevation angle and azimuth angle, thus completing the preliminary positioning of the user's location.
[0017] Preferably, in step 3, the design of the customized three-dimensional beam codebook specifically includes:
[0018] For the link from the m-th access node to the k-th UAV user, a finite number of phase shifts is used in the azimuth domain and a finite number of phase shifts is used in the elevation domain, and the beam codebook is constructed around the estimated user position.
[0019] Preferably, step 4 includes:
[0020] Step 401: A beam training optimization problem is established with maximizing the sum communication rate as the objective and the codebook indexes selected from the customized codebook as the optimization variables.
[0021] Step 402: Model the beam training optimization problem as a Markov decision process, where each access node is considered an independent agent; define the state, action, and reward of each agent;
[0022] Step 403: The enhanced multi-agent deep Q-network algorithm is used for training. Each agent uses the same convolutional neural network structure as a traditional deep Q-network, but the state space, action space, and reward mechanism are designed for multi-agent collaboration. Each agent selects an action based on its current state; after the actions are executed, the environment transitions to the next state and the agents receive a reward. Beam alignment is ultimately achieved by optimizing the policy through experience replay and network updates.
[0023] Preferably, in step 402, the local state observed by the m-th agent is defined as $s_m(t) = [s_{m,1}(t), \ldots, s_{m,K}(t)]$, where t is the step index of the Markov process, and $s_{m,1}(t)$ and $s_{m,K}(t)$ denote the states between the m-th access node (AP) and the 1st and K-th users, respectively. The state between each AP and each user contains the index $(p_{m,k}, q_{m,k})$ of the beam selected in the customized codebook together with the elevation and azimuth angles estimated in step 2, where $p_{m,k}$ is the position of the selected beam's elevation angle in the elevation set $\Theta_{m,k}$ of the beam codebook and $q_{m,k}$ is the position of its azimuth angle in the azimuth set $\Psi_{m,k}$.
[0024] The local action of the m-th agent at the t-th Markov step is defined as $a_m(t) = [a_{m,1}(t), \ldots, a_{m,K}(t)]$, where $a_{m,1}(t)$ and $a_{m,K}(t)$ are the beam selection actions between the m-th access node and the 1st and K-th users, respectively. Each access node chooses its beam selection action for each user from the agent's action set. The actions of all agents together form the global action $a(t) = [a_1(t), \ldots, a_M(t)]$, where $a_1(t)$ and $a_M(t)$ are the local actions of the 1st and M-th agents at the t-th Markov step.
[0025] All agents share the same reward, namely the sum of the users' communication rates. If an action selected from the action set would push a beam codebook index out of bounds, the reward at that time step is set to r(t) = 0.
[0026] Preferably, step 403 specifically includes:
[0027] 1) The training process includes multiple training rounds, each round consisting of multiple Markov steps; Q-evaluation network and Q-target network are initialized for each agent, both of which adopt a convolutional neural network structure; an experience replay buffer is initialized to store experience samples shared by all agents;
[0028] 2) In each training step, each agent selects an action based on its current state. After all agents have performed their actions, the environment transitions to the next state and generates a global reward. The experience tuple generated from this interaction is then stored in the common experience replay buffer.
3) Periodically sample a batch of historical experience data from the shared experience replay buffer, compute the loss, and update the parameters of the Q-evaluation network by gradient descent.
[0030] 4) The parameters of the Q target network are synchronized with the parameters of the Q evaluation network in a periodic manner independent of the gradient descent process.
[0031] 5) After each training round, record the state corresponding to the maximum reward of the current training round and use it as the initial state for the next training round.
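Items 2) to 4) amount to a replay buffer shared by all agents plus two update schedules, with the target network refreshed less often than the evaluation network. A minimal Python sketch of that bookkeeping follows; the `Experience` record, the `SharedReplayBuffer` class, the buffer capacity, and the demo intervals are hypothetical choices, not values taken from this specification, and the Q-networks themselves are left out.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence, Tuple


class Experience(NamedTuple):
    """One transition <s_m(t), a_m(t), r(t), s_m(t+1)> stored by agent m."""
    agent: int
    state: Sequence[float]
    action: int
    reward: float
    next_state: Sequence[float]


class SharedReplayBuffer:
    """FIFO replay buffer shared by all agents (capacity is an assumed value)."""

    def __init__(self, capacity: int = 10_000, seed: int = 0) -> None:
        self._buf: Deque[Experience] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, exp: Experience) -> None:
        self._buf.append(exp)  # the oldest entry is dropped once capacity is reached

    def sample(self, batch_size: int) -> List[Experience]:
        # Uniform mini-batch sampling without replacement
        return self._rng.sample(list(self._buf), min(batch_size, len(self._buf)))

    def __len__(self) -> int:
        return len(self._buf)


def update_schedule(total_steps: int, eval_interval: int,
                    target_interval: int) -> Tuple[List[int], List[int]]:
    """Steps at which the Q-evaluation net is trained and the Q-target net is synced.

    A target_interval larger than eval_interval makes the target network update
    less often, on a clock independent of the gradient-descent updates."""
    eval_steps = [s for s in range(1, total_steps + 1) if s % eval_interval == 0]
    sync_steps = [s for s in range(1, total_steps + 1) if s % target_interval == 0]
    return eval_steps, sync_steps


# Demo with hypothetical sizes: 4 agents push 5 steps of experience, then one mini-batch.
buffer = SharedReplayBuffer()
for t in range(5):
    for m in range(4):
        buffer.push(Experience(m, [float(t)], t % 5, float(t), [float(t + 1)]))
batch = buffer.sample(8)
eval_steps, sync_steps = update_schedule(60, eval_interval=5, target_interval=15)
```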
[0032] Preferably, when the agent selects an action based on the current state, it adopts an ε-greedy strategy, wherein the exploration rate ε decreases as the training process progresses.
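A minimal sketch of the ε-greedy rule with a decaying exploration rate. The start and end values (0.6 and 0.01) are those reported for the embodiment in Figure 2; the linear shape and the 300-round decay length are assumptions loosely based on the 300-round exploration phase mentioned there, and `epsilon_at` and `epsilon_greedy` are hypothetical helper names.

```python
import numpy as np


def epsilon_at(episode: int, eps0: float = 0.6, eps_fin: float = 0.01,
               decay_episodes: int = 300) -> float:
    """Exploration rate for a training round: linear decay, then held at eps_fin (assumed schedule)."""
    frac = min(episode / decay_episodes, 1.0)
    return eps0 + frac * (eps_fin - eps0)


def epsilon_greedy(q_values: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q = np.array([0.1, 0.9, 0.3, 0.2, 0.0])          # hypothetical Q-values for 5 candidate actions
greedy_action = epsilon_greedy(q, 0.0, rng)       # pure exploitation picks index 1
explore_actions = [epsilon_greedy(q, 1.0, rng) for _ in range(50)]   # pure exploration
```

Early rounds therefore explore heavily, while late rounds act almost greedily on the learned Q-values.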
[0033] Preferably, in the target network synchronization step, the update frequency of the Q target network parameters is lower than the update frequency of the Q evaluation network parameters.
[0034] Preferably, the global reward is the sum of the achievable rates of all users.
[0035] Beneficial effects:
[0036] The intelligent fast beam training method for low-altitude millimeter-wave communication proposed in this invention has the following significant advantages compared with existing technologies:
[0037] 1. By fusing preliminary user positioning information, a targeted, customized 3D beam codebook is generated that narrows the beam search space from the global angular domain to the region where the user is likely located, reducing search complexity exponentially. The Enhanced Multi-Agent Deep Q-Network (EMADQN) algorithm then performs an intelligent search; its optimized exploration mechanism and experience replay strategy significantly accelerate convergence, allowing the system to achieve high-precision beam alignment quickly, with far lower computational overhead and latency than exhaustive search and traditional deep reinforcement learning algorithms, and thus to meet the real-time communication needs of highly maneuverable UAVs. The EMADQN is an algorithm proposed in this invention specifically for beam training in low-altitude millimeter-wave communication; it differs from traditional MADQN as follows:
[0038] Difference 1: In traditional MADQN, the initial state is randomly generated at the beginning of each training round, whereas in the enhanced multi-agent deep Q-network, the state corresponding to the maximum reward in each training round is recorded and used as the initial state of the next round. Each round therefore builds on the best result found so far, while the ε-greedy strategy keeps training from getting stuck in local optima.
[0039] Difference 2: In traditional MADQN, each agent (access node) directly selects the codebook index $(p_{m,k}, q_{m,k})$ as its beam action for each user, where $p_{m,k}$ can take $N_\theta$ values and $q_{m,k}$ can take $|\Psi_{m,k}|$ values. Each agent therefore has $N_\theta|\Psi_{m,k}|$ candidate beam actions per user, and $(N_\theta|\Psi_{m,k}|)^K$ joint actions over all K users. In the enhanced multi-agent deep Q-network, each agent (access node) selects its beam action for each user from only 5 candidate options, so the action space of each AP has size $5^K$. By placing the codebook indexes in the state space, the enhanced multi-agent deep Q-network increases the state-space dimension by only 2 while reducing the action-space search dimension from $(N_\theta|\Psi_{m,k}|)^K$ to $5^K$. This reduces the network's search complexity and accelerates training convergence.
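The size comparison in Difference 2 can be checked numerically. In the sketch below, the 7 × 7 codebook is a hypothetical size chosen only for illustration, while K = 2 matches the embodiment; the function names are likewise hypothetical.

```python
def madqn_action_space(n_elev: int, n_azim: int, num_users: int) -> int:
    """Per-AP joint action count when each user's (p, q) index is chosen directly."""
    return (n_elev * n_azim) ** num_users


def emadqn_action_space(num_users: int, candidates_per_user: int = 5) -> int:
    """Per-AP joint action count when each user has 5 candidate actions."""
    return candidates_per_user ** num_users


K = 2                                   # number of UAV users, as in the embodiment
direct = madqn_action_space(7, 7, K)    # hypothetical 7 elevation x 7 azimuth candidates
reduced = emadqn_action_space(K)
print(direct, reduced)                  # 2401 25
```

The gap widens quickly with codebook size and user count, since the MADQN count grows with the codebook size while the EMADQN count does not.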
[0040] 2. The customized codebook design employed in this invention, based on real-time estimated user positions (elevation and azimuth), generates a beam vector set highly matched to the current channel spatial characteristics, providing a high-quality solution space for subsequent beam selection. Building upon this, the EMADQN algorithm can accurately find near-optimal beam pairs within this solution space through collaborative learning, thereby achieving stable and accurate three-dimensional beam main lobe alignment and interference nulling in complex and dynamic low-altitude environments, significantly improving the link quality and reliability of downlink communication.
[0041] 3. Simulation results show that the proposed solution significantly outperforms solutions based on DFT codebooks or traditional DQN / MADQN algorithms in key performance indicators such as system throughput and spectral efficiency, with performance approaching the upper bound of exhaustive search performance, which has extremely high computational complexity. Furthermore, this method exhibits stronger adaptability (robustness) to environmental changes and user mobility, avoiding the problem of frequent retraining required by traditional methods due to environmental dynamics. It also boasts high computational efficiency and is more suitable for large-scale practical deployment.
[0042] In summary, this invention closely integrates sensing (positioning) and communication (beam training) and adopts a two-level collaborative mechanism of "customized codebook to reduce the search space" and "enhanced intelligent algorithm to accelerate the search for the optimal solution," which effectively solves the beam training problem caused by high mobility in low-altitude millimeter-wave communication and achieves a better balance among training efficiency, alignment accuracy, and system practicality.
Attached Figure Description
[0043] The above and / or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:
[0044] Figure 1 is a flowchart of an intelligent fast beam training scheme for low-altitude millimeter-wave communication according to an example of the present invention;
[0045] Figure 2 compares the convergence of the method of this embodiment with that of an algorithm based on a multi-agent deep Q-network (MADQN);
[0046] Figure 3 compares the beam training performance of the method of this invention with that of exhaustive search (EXH) under four different codebook parameter configurations;
[0047] Figure 4 compares the cumulative distribution function (CDF) of throughput for the method of this embodiment and other beam alignment schemes.
Detailed Implementation
[0048] Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain the present invention, and should not be construed as limiting the present invention.
[0049] In view of the problems in the background art described above, this invention focuses on millimeter-wave cell-free massive MIMO systems for low-altitude UAV communication. It approaches the problem from two core directions, beam codebook design and learning-based beam optimization, and proposes a 3D beam alignment scheme that balances alignment accuracy and training efficiency. The aim is to overcome the bottlenecks of existing technologies in search space, convergence speed, and environmental adaptability, and to provide more efficient and reliable beam alignment for millimeter-wave communication with low-altitude UAVs.
[0050] Figure 1 is a flowchart of an intelligent fast beam training scheme for low-altitude millimeter-wave communication in a millimeter-wave cell-free massive MIMO system, provided by an example of the present invention.
[0051] As shown in Figure 1, the intelligent fast beam training method includes the following steps:
[0052] Step 1: Establish the channel model and data transmission model of the low-altitude millimeter-wave cell-free massive multiple-input multiple-output system, and derive the expression for the downlink achievable rate.
[0053] In one embodiment of the present invention, step 1 specifically includes:
[0054] Step 101: Establish a scenario model of a millimeter-wave cell-free massive MIMO system managed by a central processing unit, in which M access points (APs) jointly serve K users. Each AP is equipped with N antennas arranged as a uniform planar array, with $N_x$ and $N_y$ equally spaced antennas along the x-axis and y-axis, respectively ($N = N_x N_y$). A quasi-static block-fading model is assumed, in which each coherence block spans τ symbols. For any integer $m \in [1, M]$ and $k \in [1, K]$, the channel from the m-th AP to the k-th UAV user in the t-th coherence block is expressed in terms of the antenna gain g, the channel gain coefficient $\beta_{m,k}$, the steering vector of the planar array, the elevation angle $\theta_{m,k}$ and the azimuth angle between the UAV user and the AP, and a Doppler frequency shift term. The channel vector from the k-th user to all APs is $\mathbf{h}_k = [\mathbf{h}_{1,k}^T, \ldots, \mathbf{h}_{M,k}^T]^T$, where the superscript $[\cdot]^T$ denotes vector transposition and $\mathbf{h}_{1,k}$ and $\mathbf{h}_{M,k}$ are the channel vectors from the 1st and M-th APs to the k-th user.
[0055] Step 102: During downlink data transmission, the signal $y_k$ received by the k-th UAV user is formed from the transmissions to all users, where $\mathbf{w}_i$ is the hybrid beamforming vector for the i-th user, composed of a zero-forcing digital beamformer and an analog beamformer; $\rho_i$ is the transmit power for the i-th user; $\mathbf{h}_i$ is the channel vector from the i-th user to all access points (APs); $s_i$ is the baseband signal intended for the i-th user; and $n_k$ is additive white Gaussian noise. Separating the useful signal from the interference and noise in $y_k$ yields the signal-to-interference-plus-noise ratio $\gamma_k$ and the achievable rate $R_k$ of the k-th UAV user.
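The exact expression for $y_k$ is not reproduced above, so the sketch below uses the common textbook form as an assumption: the analog beams are folded into an effective channel, each user is served with a fully digital zero-forcing vector on the stacked channel, and $\gamma_k = \rho_k|\mathbf{h}_k^H\mathbf{w}_k|^2 / (\sum_{i\neq k}\rho_i|\mathbf{h}_k^H\mathbf{w}_i|^2 + \sigma^2)$ with $R_k = \log_2(1+\gamma_k)$. The function names, matrix sizes, and power and noise values in the demo are hypothetical.

```python
import numpy as np


def zf_precoders(H: np.ndarray) -> np.ndarray:
    """Unit-norm zero-forcing precoders; H is K x L with row k equal to h_k^H."""
    W = np.linalg.pinv(H)                      # L x K, H @ W = I for full-row-rank H
    return W / np.linalg.norm(W, axis=0, keepdims=True)


def sinr_and_rate(H: np.ndarray, W: np.ndarray, powers: np.ndarray,
                  noise_var: float):
    """gamma_k = p_k|h_k^H w_k|^2 / (sum_{i!=k} p_i|h_k^H w_i|^2 + sigma^2), R_k = log2(1 + gamma_k)."""
    G = np.abs(H @ W) ** 2                     # G[k, i] = |h_k^H w_i|^2
    signal = powers * np.diag(G)
    interference = G @ powers - signal         # total received power minus the desired part
    sinr = signal / (interference + noise_var)
    return sinr, np.log2(1.0 + sinr)


# Sanity check: orthogonal unit channels, unit powers, unit noise -> SINR 1, rate 1 bit/s/Hz
sinr_id, rate_id = sinr_and_rate(np.eye(2), np.eye(2), np.array([1.0, 1.0]), 1.0)

# Random effective channel: K = 2 users, L = 16 stacked antennas (hypothetical sizes)
rng = np.random.default_rng(0)
H = (rng.standard_normal((2, 16)) + 1j * rng.standard_normal((2, 16))) / np.sqrt(2)
W = zf_precoders(H)
sinr_zf, rate_zf = sinr_and_rate(H, W, np.array([0.5, 0.5]), 1e-2)
sum_rate = float(rate_zf.sum())                # the quantity the beam training maximizes
```

With zero forcing the inter-user terms vanish, so each $\gamma_k$ reduces to a signal-to-noise ratio; with imperfect analog beams this would no longer hold exactly.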
[0056] Step 2: In the channel estimation stage, the communication channel is estimated and the user's location is initially determined using pilot signals.
[0057] In one embodiment of the present invention, step 2 specifically includes:
[0058] Step 201: During the first $\tau_p$ symbols of each coherence block, the users send mutually orthogonal uplink pilot signals, and the m-th access point (AP) receives the pilot signal $\mathbf{Y}_m$. The estimated channel from the m-th AP to the k-th UAV user is obtained by least-squares (LS) channel estimation from $\mathbf{Y}_m$, the pilot transmit power $\rho_p$, and the pilot signal transmitted by the k-th user. The estimated channels from all APs to all UAV users are stacked into a combined estimated-channel matrix.
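The LS estimator is not written out above. Assuming the received pilot block has the form $\mathbf{Y}_m = \sqrt{\rho_p}\,\mathbf{H}_m\boldsymbol{\Phi}^T + \mathbf{N}_m$ with DFT pilot sequences satisfying $\boldsymbol\phi_j^H\boldsymbol\phi_k = \tau_p\delta_{jk}$, correlating with each user's pilot and rescaling gives $\hat{\mathbf h}_{m,k} = \mathbf{Y}_m\boldsymbol\phi_k^*/(\sqrt{\rho_p}\,\tau_p)$. The sketch below implements that assumed model; $\tau_p = 10$ and $\rho_p$ = 100 mW follow the embodiment, while the DFT pilots and the helper names are illustrative choices.

```python
import numpy as np


def orthogonal_pilots(num_users: int, tau_p: int) -> np.ndarray:
    """tau_p x K matrix of DFT pilot sequences with phi_j^H phi_k = tau_p * delta_jk (K <= tau_p)."""
    n = np.arange(tau_p)[:, None]
    k = np.arange(num_users)[None, :]
    return np.exp(2j * np.pi * n * k / tau_p)


def ls_estimate(Y_m: np.ndarray, pilots: np.ndarray, rho_p: float) -> np.ndarray:
    """LS channel estimates for one AP, assuming Y_m = sqrt(rho_p) H_m pilots^T + noise.

    Returns an N x K matrix whose k-th column estimates h_{m,k}."""
    tau_p = pilots.shape[0]
    return Y_m @ pilots.conj() / (np.sqrt(rho_p) * tau_p)


rng = np.random.default_rng(1)
N, K, tau_p, rho_p = 64, 2, 10, 0.1          # tau_p = 10 and rho_p = 100 mW as in the embodiment
H_true = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
Phi = orthogonal_pilots(K, tau_p)
Y_clean = np.sqrt(rho_p) * H_true @ Phi.T
H_hat = ls_estimate(Y_clean, Phi, rho_p)     # noiseless case: recovers H_true exactly
noise = 0.01 * (rng.standard_normal((N, tau_p)) + 1j * rng.standard_normal((N, tau_p)))
H_hat_noisy = ls_estimate(Y_clean + noise, Phi, rho_p)
```

Because the pilots are orthogonal, each user's contribution is isolated by a single correlation, and the residual error comes only from the projected noise.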
[0059] Step 202: The received signal $\mathbf{Y}_m$ is decomposed by singular value decomposition into a signal subspace and a noise subspace. Exploiting the orthogonality between these two subspaces, the 2D-MUSIC algorithm constructs a spatial spectrum function, and its peak is searched within a preset angular range to obtain estimates of the user's elevation and azimuth angles.
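A compact grid-search 2D-MUSIC sketch for a single user, written in Python with NumPy. It assumes a half-wavelength UPA whose steering vector factorizes into x- and y-direction phase progressions, with elevation measured from the array normal; the actual array geometry and angle conventions of this specification may differ. The 1° grid, the synthetic test signal, and the function names are hypothetical.

```python
import numpy as np


def upa_steering(elev, azim, nx=8, ny=8):
    """Unit-norm UPA steering vectors, shape (nx*ny, G).

    Assumed geometry: half-wavelength spacing, elevation measured from the array
    normal, azimuth in the array plane; angles in radians."""
    elev, azim = np.broadcast_arrays(np.atleast_1d(elev), np.atleast_1d(azim))
    u = np.pi * np.sin(elev) * np.cos(azim)          # phase step between x-neighbours
    v = np.pi * np.sin(elev) * np.sin(azim)          # phase step between y-neighbours
    ax = np.exp(1j * np.arange(nx)[:, None] * u)     # (nx, G)
    ay = np.exp(1j * np.arange(ny)[:, None] * v)     # (ny, G)
    return (ax[:, None, :] * ay[None, :, :]).reshape(nx * ny, -1) / np.sqrt(nx * ny)


def music_2d(Y, num_sources, nx=8, ny=8, step_deg=1.0):
    """Grid-search 2D-MUSIC on one AP's received pilot block Y (N x tau_p)."""
    U, _, _ = np.linalg.svd(Y)                        # columns sorted by singular value
    noise_sub = U[:, num_sources:]                    # noise subspace
    elev_grid = np.deg2rad(np.arange(0.0, 90.0 + step_deg, step_deg))
    azim_grid = np.deg2rad(np.arange(0.0, 360.0, step_deg))
    E, A = np.meshgrid(elev_grid, azim_grid, indexing="ij")
    steer = upa_steering(E.ravel(), A.ravel(), nx, ny)
    # Spectrum is large where the steering vector is (nearly) orthogonal to the noise subspace
    spectrum = 1.0 / (np.sum(np.abs(noise_sub.conj().T @ steer) ** 2, axis=0) + 1e-12)
    best = int(np.argmax(spectrum))
    return float(np.rad2deg(E.ravel()[best])), float(np.rad2deg(A.ravel()[best]))


# One UAV at elevation 40 deg, azimuth 60 deg; tau_p = 10 pilot snapshots (synthetic data)
rng = np.random.default_rng(0)
a_true = upa_steering(np.deg2rad(40.0), np.deg2rad(60.0))          # (64, 1)
s = (rng.standard_normal(10) + 1j * rng.standard_normal(10)) / np.sqrt(2)
noise = 1e-3 * (rng.standard_normal((64, 10)) + 1j * rng.standard_normal((64, 10)))
Y = a_true @ s[None, :] + noise
elev_hat, azim_hat = music_2d(Y, num_sources=1)
```

The resulting angle estimates play the role of the step-2 outputs that seed the customized codebook in step 3.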
[0060] Step 3: In the beam training phase, design a customized three-dimensional (3D) beam codebook based on the positioning information.
[0061] In one embodiment of the present invention, step 3 specifically includes:
[0062] Design a 3D analog beam codebook from the m-th access point (AP) to the k-th UAV user, using a finite number $N_\theta$ of phase shifts in the elevation domain and a finite number of phase shifts in the azimuth domain. The customized codebook consists of the steering vectors for every combination of an elevation angle θ in the elevation set $\Theta_{m,k}$ and an azimuth angle in the azimuth set $\Psi_{m,k}$. The two sets are built around the elevation and azimuth angles estimated in step 2, with adjacent candidates separated by the elevation and azimuth beam offset angles (Δθ for elevation).
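A sketch of how such a codebook could be generated, assuming a symmetric grid of elevation and azimuth candidates centered on the step-2 estimates and spaced by the beam offset angles. The 5 × 5 size and the 4° offsets in the demo are illustrative (4° echoes the granularity mentioned for Figure 3), and the half-wavelength UPA steering model and the function names are assumptions.

```python
import numpy as np


def upa_steering(elev, azim, nx=8, ny=8):
    """Unit-norm UPA steering vectors (assumed half-wavelength geometry), shape (nx*ny, G)."""
    elev, azim = np.broadcast_arrays(np.atleast_1d(elev), np.atleast_1d(azim))
    u = np.pi * np.sin(elev) * np.cos(azim)
    v = np.pi * np.sin(elev) * np.sin(azim)
    ax = np.exp(1j * np.arange(nx)[:, None] * u)
    ay = np.exp(1j * np.arange(ny)[:, None] * v)
    return (ax[:, None, :] * ay[None, :, :]).reshape(nx * ny, -1) / np.sqrt(nx * ny)


def customized_codebook(elev_hat, azim_hat, n_elev, n_azim, d_elev, d_azim, nx=8, ny=8):
    """Codebook centred on the estimated angles (assumed symmetric offset grid).

    Returns the elevation set Theta_{m,k}, the azimuth set Psi_{m,k}, and an
    (nx*ny) x (n_elev*n_azim) matrix whose column p*n_azim + q is beam (p, q)."""
    elev_set = elev_hat + (np.arange(n_elev) - (n_elev - 1) / 2) * d_elev
    azim_set = azim_hat + (np.arange(n_azim) - (n_azim - 1) / 2) * d_azim
    E, A = np.meshgrid(elev_set, azim_set, indexing="ij")
    return elev_set, azim_set, upa_steering(E.ravel(), A.ravel(), nx, ny)


elev_set, azim_set, W = customized_codebook(
    np.deg2rad(40.0), np.deg2rad(60.0), n_elev=5, n_azim=5,
    d_elev=np.deg2rad(4.0), d_azim=np.deg2rad(4.0))
```

Here the candidate elevations are 32°, 36°, 40°, 44°, and 48°, so the search is confined to a small neighborhood of the estimate instead of the full hemisphere a DFT codebook would cover.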
[0063] Step 4: Then, the Enhanced Multi-Agent Deep Q-Network (EMADQN) algorithm is used to efficiently perform beam search in a customized beam codebook, achieving accurate 3D beam alignment.
[0064] In one embodiment of the present invention, step 4 specifically includes:
[0065] Step 401: A beam training optimization problem is established with maximizing the sum communication rate as the objective and the codebook indexes selected from the customized codebook designed in Step 3 as the optimization variables.
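Written out, the problem is to choose one index pair $(p_{m,k}, q_{m,k})$ per AP-user link so as to maximize $\sum_{k=1}^{K} R_k$. The brute-force baseline (EXH) below makes the cost of that search explicit: it evaluates every joint assignment, that is, $(N_\theta|\Psi_{m,k}|)^{MK}$ candidates for equal-size codebooks. The `toy_sum_rate` objective is a hypothetical stand-in for the true sum rate, and the sizes are illustrative.

```python
import itertools
from typing import Callable, Sequence, Tuple

Index = Tuple[int, int]  # (p, q) codebook index of one AP-user link


def exhaustive_search(num_links: int, n_elev: int, n_azim: int,
                      sum_rate: Callable[[Sequence[Index]], float]):
    """Brute-force maximization of the sum rate over every joint (p, q) assignment."""
    per_link = list(itertools.product(range(n_elev), range(n_azim)))
    best_val, best_choice = float("-inf"), None
    for choice in itertools.product(per_link, repeat=num_links):
        val = sum_rate(choice)
        if val > best_val:
            best_val, best_choice = val, choice
    return best_choice, best_val, len(per_link) ** num_links


def toy_sum_rate(choice: Sequence[Index]) -> float:
    """Hypothetical stand-in for sum_k R_k that peaks at (p, q) = (2, 1) on every link."""
    return sum(1.0 / (1 + (p - 2) ** 2 + (q - 1) ** 2) for p, q in choice)


# M = 2 APs x K = 2 users = 4 links, 3 x 3 candidates per link (hypothetical sizes)
best_choice, best_val, n_evals = exhaustive_search(4, 3, 3, toy_sum_rate)
```

Even this tiny case needs 6561 objective evaluations, which illustrates why the learned search is preferred when codebooks and node counts grow.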
[0066] Step 402: Design a beam training algorithm based on an enhanced multi-agent deep Q-network; model the beam training optimization problem as a Markov decision process, where each access node (AP) is considered an independent agent;
[0067] The local state observed by the m-th agent is defined as $s_m(t) = [s_{m,1}(t), \ldots, s_{m,K}(t)]$, where t is the step index of the Markov process, and $s_{m,1}(t)$ and $s_{m,K}(t)$ denote the states between the m-th access node (AP) and the 1st and K-th users, respectively. The state between each AP and each user contains the index $(p_{m,k}, q_{m,k})$ of the beam selected in the customized codebook together with the elevation and azimuth angles estimated in step 2, where $p_{m,k}$ is the position of the selected beam's elevation angle in $\Theta_{m,k}$ and $q_{m,k}$ is the position of its azimuth angle in $\Psi_{m,k}$.
[0068] The local action of the m-th agent at the t-th Markov step is defined as $a_m(t) = [a_{m,1}(t), \ldots, a_{m,K}(t)]$, where $a_{m,1}(t)$ and $a_{m,K}(t)$ are the beam selection actions between the m-th access node (AP) and the 1st and K-th users, respectively. Each AP chooses its beam selection action for each user from the agent's action set. The actions of all agents together form the global action $a(t) = [a_1(t), \ldots, a_M(t)]$, where $a_1(t)$ and $a_M(t)$ are the local actions of the 1st and M-th agents at the t-th Markov step.
[0069] All agents share the same reward, namely the sum of the users' communication rates. If an action selected from the action set would push a beam codebook index out of bounds, the reward at that time step is set to r(t) = 0.
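A sketch of one environment transition under this MDP. The specification fixes the number of per-user candidate actions at 5 but does not list them here; the move set below (stay, or step one position in elevation or azimuth) is an assumption, as is leaving an index unchanged when its move would leave the codebook. The reward rule r(t) = 0 on an out-of-bounds move follows the text, and `toy_sum_rate` is a hypothetical stand-in for the sum rate.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Index = Tuple[int, int]  # (p, q): elevation and azimuth positions in the customized codebook

# Hypothetical 5-action set: the text fixes the count (5) but not the members.
MOVES: Dict[int, Tuple[int, int]] = {
    0: (0, 0),    # keep the current beam
    1: (1, 0),    # next elevation candidate
    2: (-1, 0),   # previous elevation candidate
    3: (0, 1),    # next azimuth candidate
    4: (0, -1),   # previous azimuth candidate
}


def env_step(indices: Sequence[Index], actions: Sequence[int], n_elev: int,
             n_azim: int, sum_rate: Callable[[List[Index]], float]):
    """Apply the global action to all AP-user links; return (next indices, shared reward)."""
    next_indices: List[Index] = []
    out_of_bounds = False
    for (p, q), a in zip(indices, actions):
        dp, dq = MOVES[a]
        p_new, q_new = p + dp, q + dq
        if not (0 <= p_new < n_elev and 0 <= q_new < n_azim):
            out_of_bounds = True
            p_new, q_new = p, q   # assumption: an invalid move leaves this index unchanged
        next_indices.append((p_new, q_new))
    reward = 0.0 if out_of_bounds else sum_rate(next_indices)   # r(t) = 0 on out-of-bounds
    return next_indices, reward


def toy_sum_rate(indices: List[Index]) -> float:
    """Hypothetical stand-in for the users' sum rate."""
    return float(sum(p + q for p, q in indices))


ok_next, ok_reward = env_step([(0, 0), (1, 1)], [1, 3], 3, 3, toy_sum_rate)
bad_next, bad_reward = env_step([(0, 0), (2, 2)], [1, 3], 3, 3, toy_sum_rate)
```

Because the codebook indexes live in the state, each action only nudges the current beam, which is what keeps the per-user action count at 5 regardless of codebook size.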
[0070] Step 403: Training is performed using the Enhanced Multi-Agent Deep Q-Network (EMADQN) algorithm. Each agent's Q-network uses the same convolutional neural network structure as a traditional deep Q-network, but the state, action, and reward mechanisms are designed for multi-agent collaboration. The training process consists of $N_e$ training rounds, each comprising $N_s$ Markov steps. Each agent has a Q-evaluation network for selecting local actions and a Q-target network used in network updates. At each MDP step, agent m feeds its current state $s_m(t)$ into its Q-evaluation network to select the action $a_m(t)$. After the global action $a(t)$ is executed, the environment transitions to the next state $s_m(t+1)$ and returns the global reward $r(t)$; agent m then records the complete experience tuple $e_m(t) = \langle s_m(t), a_m(t), r(t), s_m(t+1) \rangle$ and stores it in the experience replay buffer. Every $N_{up}$ steps, each agent samples a mini-batch from the experience pool and updates the Q-evaluation network by gradient descent, while the Q-target network is periodically synchronized with the Q-evaluation network.
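The gradient-descent step presumably minimizes the standard DQN temporal-difference loss (the loss is not spelled out above). The NumPy sketch below computes the targets $y = r + \delta\max_{a'}Q_{\text{target}}(s', a')$ and the mean squared error on a sampled mini-batch. The default δ = 0.99 matches the embodiment; omitting a terminal-state flag (rounds simply end after $N_s$ steps) is an assumption, and back-propagation itself would be handled by a deep learning framework.

```python
import numpy as np


def td_targets(rewards: np.ndarray, q_target_next: np.ndarray,
               gamma: float = 0.99) -> np.ndarray:
    """y = r + gamma * max_a' Q_target(s', a') for each sampled transition."""
    return rewards + gamma * q_target_next.max(axis=1)


def dqn_loss(q_eval: np.ndarray, actions: np.ndarray, targets: np.ndarray) -> float:
    """Mean squared error between the targets and Q_eval(s, a) for the actions taken."""
    q_taken = q_eval[np.arange(len(actions)), actions]
    return float(np.mean((targets - q_taken) ** 2))


# Mini-batch of two transitions with 2 actions each (hypothetical numbers)
targets = td_targets(np.array([1.0, 2.0]), np.array([[0.0, 1.0], [2.0, 0.0]]), gamma=0.5)
loss = dqn_loss(np.array([[1.0, 0.0], [0.0, 3.0]]), np.array([0, 1]), targets)
```

Taking the maximum from the slowly updated target network rather than the evaluation network is what stabilizes these targets between synchronizations.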
[0071] After each training round, record the state corresponding to the maximum reward of the current training round and use it as the initial state for the next training round.
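The best-state carry-over that distinguishes EMADQN from MADQN can be sketched independently of the networks. In this Python sketch, a random policy stands in for the ε-greedy Q-policy and the 1-D toy environment is hypothetical; treating each round's starting state as a candidate for the best state is an assumption that makes the recorded best reward non-decreasing from round to round.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")


def best_state_restart(initial_state: S, transition: Callable[[S, int], S],
                       reward_fn: Callable[[S], float], policy: Callable[[S, int], int],
                       num_episodes: int, steps_per_episode: int) -> List[float]:
    """Run rounds in which each one starts from the best-reward state of the previous one."""
    state = initial_state
    best_per_episode: List[float] = []
    for ep in range(num_episodes):
        # The start state is itself a candidate (assumption), so the best never decreases.
        best_state, best_reward = state, reward_fn(state)
        for _ in range(steps_per_episode):
            state = transition(state, policy(state, ep))
            r = reward_fn(state)
            if r > best_reward:
                best_state, best_reward = state, r
        best_per_episode.append(best_reward)
        state = best_state   # carry the best state into the next training round
    return best_per_episode


# Toy 1-D environment (hypothetical): index 0..9, reward peaks at index 7.
rng = random.Random(1)
history = best_state_restart(
    initial_state=0,
    transition=lambda x, a: min(max(x + a, 0), 9),
    reward_fn=lambda x: float(-abs(x - 7)),
    policy=lambda x, ep: rng.choice([-1, 1]),   # random stand-in for the epsilon-greedy Q-policy
    num_episodes=20,
    steps_per_episode=15,
)
```

In MADQN each round would instead restart from a random state, discarding the progress made in the previous round.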
[0072] The intelligent fast beam training scheme for low-altitude millimeter-wave communication of the present invention is described in detail below through a specific embodiment.
[0073] Consider a low-altitude millimeter-wave cell-free massive MIMO (multiple-input multiple-output) system covering an area of 300 m × 300 m and serving K = 2 UAV users at altitudes between 100 m and 300 m. There are M = 4 access points (APs), each at a height of 25 m and equipped with N = 64 antennas ($N_x = N_y = 8$). Both the UAV users and the APs are Poisson distributed. The channel gain coefficient is set to …, where $d_{m,k}$ is the distance between the m-th access node and the k-th user. The carrier frequency is $f_c$ = 28 GHz, the pilot transmit power is $\rho_p$ = 100 mW, and the AP transmit power is $P_{AP}$ = 1 W. The system uses an equal power allocation strategy. The Gaussian noise variance is …, and each coherence block contains τ = 200 symbols, of which $\tau_p$ = 10 symbols are dedicated to uplink pilot training.
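The embodiment's parameters can be collected in a configuration object, which also makes derived quantities explicit (64 antennas per AP, 190 data symbols per coherence block, 5% pilot overhead, a wavelength of about 10.7 mm at 28 GHz). The channel gain expression and the noise variance are not reproduced above and are therefore omitted; `SimConfig` and its field names are hypothetical, and reading the equal power allocation as $P_{AP}/K$ per user is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

SPEED_OF_LIGHT = 299_792_458.0  # m/s


@dataclass(frozen=True)
class SimConfig:
    """Simulation parameters listed in the embodiment (path loss and noise variance omitted)."""
    area_side_m: float = 300.0                     # 300 m x 300 m coverage area
    num_users: int = 2                             # K
    user_altitude_m: Tuple[float, float] = (100.0, 300.0)
    num_aps: int = 4                               # M
    ap_height_m: float = 25.0
    n_x: int = 8
    n_y: int = 8
    carrier_hz: float = 28e9                       # f_c
    pilot_power_w: float = 0.1                     # rho_p = 100 mW
    ap_power_w: float = 1.0                        # P_AP
    tau: int = 200                                 # symbols per coherence block
    tau_p: int = 10                                # pilot symbols per block

    @property
    def antennas_per_ap(self) -> int:
        return self.n_x * self.n_y

    @property
    def data_symbols(self) -> int:
        return self.tau - self.tau_p

    @property
    def pilot_overhead(self) -> float:
        return self.tau_p / self.tau

    @property
    def wavelength_m(self) -> float:
        return SPEED_OF_LIGHT / self.carrier_hz

    @property
    def power_per_user_w(self) -> float:
        # Assumption: equal power allocation read as an even split of P_AP over the K users
        return self.ap_power_w / self.num_users


cfg = SimConfig()
```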
[0074] Figure 2 compares the convergence of the method of this invention with that of an algorithm based on a multi-agent deep Q-network (MADQN). For the EMADQN-based beam training algorithm, the discount factor is δ = 0.99, the learning rate is 0.0001, the number of training rounds is $N_e$ = 1000, the number of steps per round is $N_s$ = 15, the network update interval is $c_{up}$ = 15, the initial exploration rate is $\epsilon_0$ = 0.6, and the final exploration rate is $\epsilon_{fin}$ = 0.01. During the initial exploration phase of 300 rounds, the average rewards of both algorithms fluctuate around their initial value. Once the ε-greedy exploration phase ends, however, the algorithm of this invention converges markedly faster, and its reward eventually converges to 28.4 bits/s/Hz, compared with 21.8 bits/s/Hz for the MADQN-based algorithm. EMADQN thus achieves a 30.3% performance improvement and also outperforms MADQN in convergence speed and stability. This improvement is attributed to its training mechanism. In existing MADQN algorithms, the continuous state transitions during action execution and the excessively large action space make it difficult for the network to stably retain the current optimal state and optimize further from it. In contrast, the EMADQN-based beam training algorithm retains the historical optimal state and simplifies the action space, enabling a fast neighborhood search near the global optimum.
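The reported 30.3% gain follows directly from the two converged rewards, and the hyperparameters fix the training budget. This short arithmetic check assumes that one network update happens every $c_{up}$ = 15 MDP steps, which is one reading of the "network update interval"; `relative_gain` is a hypothetical helper name.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


gain = relative_gain(28.4, 21.8)     # converged rewards of EMADQN and MADQN (bits/s/Hz)
n_e, n_s, c_up = 1000, 15, 15        # training rounds, steps per round, update interval
total_steps = n_e * n_s
num_updates = total_steps // c_up    # assuming one network update every c_up steps
print(f"{gain:.1%}", total_steps, num_updates)   # 30.3% 15000 1000
```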
[0075] Figure 3 compares the beam training performance of the method of this invention with that of exhaustive search (EXH) under four different codebook parameter configurations, labeled (a) to (d). When designing a customized codebook, the choice of codebook size and beam offset granularity directly affects beam coverage and pointing accuracy. For each codebook design, beam training is performed with both exhaustive search (EXH) and the EMADQN-based beam training algorithm of this invention. With the same codebook size, configuration (a) has lower beam accuracy than configuration (b), and therefore lower spectral efficiency (SE), when the beam offset granularity is fixed at 4°. Configuration (c) yields lower spectral efficiency than configuration (d), indicating that an excessively small codebook may exclude the optimal beam candidates. Although configuration (d) achieves a sum rate comparable to that of configuration (b), it significantly enlarges the selection space and increases training difficulty. Configuration (b) is therefore selected, as it covers all optimal beam candidates while keeping the computation tractable.
[0076] Figure 4 compares the cumulative distribution function (CDF) of throughput for the method of this invention and other beam alignment schemes. Five schemes are evaluated: exhaustive search with a discrete Fourier transform (DFT) codebook (EXH(DFT)), exhaustive search with a customized codebook (EXH), a method based on a deep Q-network (DQN), a method based on a multi-agent deep Q-network (MADQN), and the intelligent 3D beam alignment scheme based on the enhanced multi-agent deep Q-network (EMADQN) proposed in this invention. EXH(DFT) performs worst because its DFT codebook is uniformly distributed in the spatial domain and lacks a customized design based on user location information, resulting in insufficient beam selection accuracy. The other four schemes all use a customized codebook with configuration (b) but differ in their beam training algorithms. Over 100 Monte Carlo simulations, the proposed EMADQN scheme shows a significant performance advantage: at the 90% probability point, its system throughput is 1.21 times and 1.15 times that of the DQN and MADQN methods, respectively. Although EMADQN's performance is only 1.09% lower than that of exhaustive search (EXH) with the customized codebook, it is more computationally efficient in complex dynamic environments, avoiding the high time complexity and resource consumption of exhaustive search, and thus offers stronger practicality and robustness.
[0077] In summary, this invention addresses the challenges posed by the high maneuverability of UAVs in dynamic environments by proposing an intelligent, fast beam training scheme for low-altitude millimeter-wave communication. First, a deployment framework for a low-altitude millimeter-wave non-cellular massive multiple-input multiple-output (MIMO) system is constructed; for the specific scenario, channel and data transmission models are established and the expression for the downlink achievable rate is derived. In the channel estimation phase, pilot signals are used to estimate the channel and obtain a preliminary estimate of the user's location. In the beam training phase, a customized 3D beam codebook is designed based on this location information, and an enhanced multi-agent deep Q-network (EMADQN) algorithm then searches the customized codebook efficiently, ultimately achieving accurate 3D beam alignment. The proposed method significantly accelerates training convergence while ensuring effective 3D beam selection, which is of great significance for deploying UAV communication systems in low-altitude, high-speed scenarios.
[0078] Through the above system embodiments, each step of the method of the invention is mapped to specific hardware entities and software modules, realizing the intelligent fast beam training function and significantly reducing training overhead and computational complexity, so that the invention is suitable for highly mobile low-altitude communication scenarios.
[0079] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
[0080] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this invention, "N" means at least two, such as two, three, etc., unless otherwise explicitly specified.
[0081] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or N executable instructions for implementing custom logic functions or processes, and the scope of preferred embodiments of the invention includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which embodiments of the invention pertain.
Claims
1. An intelligent fast beam training method for low-altitude millimeter-wave communication, characterized in that it comprises the following steps: Step 1: Establish the channel model and data transmission model of the low-altitude millimeter-wave non-cellular massive multiple-input multiple-output system, and derive the expression for the downlink achievable rate. Step 2: In the channel estimation stage, estimate the communication channel using pilot signals, and obtain a preliminary estimate of the user's location from the estimated channel. Step 3: In the beam training stage, design a customized three-dimensional beam codebook based on the user location information obtained from the preliminary positioning in Step 2. Step 4: Using the enhanced multi-agent deep Q-network algorithm, perform beam search in the customized three-dimensional beam codebook designed in Step 3 to achieve three-dimensional beam alignment. Step 4 includes: Step 401: Establish a beam training optimization problem whose objective is to maximize the sum communication rate and whose optimization variables are the codebook indices selected from the customized codebook. Step 402: Model the beam training optimization problem as a Markov decision process in which each access node is treated as an independent agent, and define the state, action, and reward of each agent. Step 403: Train with the enhanced multi-agent deep Q-network algorithm. Each agent uses the same convolutional neural network structure as the traditional deep Q-network, but its state space, action space, and reward mechanism are designed for multi-agent collaboration. Each agent selects an action according to its current state; after execution, the environment transitions to the next state and the agent receives a reward. Beam alignment is finally achieved through experience replay and network update optimization.
In Step 402, the local state observed by the $m$-th agent at Markov step $t$ is defined as $s_m^{(t)}=\{s_{m,1}^{(t)},\dots,s_{m,K}^{(t)}\}$, where $t$ denotes the Markov step index and $s_{m,1}^{(t)}$ and $s_{m,K}^{(t)}$ denote the states between the $m$-th access node (AP) and the 1st and $K$-th users, respectively. The state between each AP and each user consists of the index of the beam selected from the customized codebook, $(i_{m,k}^{\mathrm{el}}, i_{m,k}^{\mathrm{az}})$, and the elevation and azimuth angles $(\hat{\theta}_{m,k}, \hat{\varphi}_{m,k})$ estimated in Step 2, where $i_{m,k}^{\mathrm{el}}$ is the index of the selected beam's elevation angle within the codebook's set of elevation angles and $i_{m,k}^{\mathrm{az}}$ is the index of its azimuth angle within the codebook's set of azimuth angles. The local action of the $m$-th agent at Markov step $t$ is defined as $a_m^{(t)}=\{a_{m,1}^{(t)},\dots,a_{m,K}^{(t)}\}$, where $a_{m,1}^{(t)}$ and $a_{m,K}^{(t)}$ denote the beam selection actions between the $m$-th access node and the 1st and $K$-th users, respectively; each beam selection action between an access node and a user is selected from the action set … . The local actions of all agents are combined into the global action $a^{(t)}=\{a_1^{(t)},\dots,a_M^{(t)}\}$, where $a_1^{(t)}$ and $a_M^{(t)}$ are the local actions of the 1st and $M$-th agents at Markov step $t$. All agents use the sum of the users' communication rates as the same reward. If an action selected from the action set would cause a beam codebook index to go out of bounds, the reward at the corresponding step is set to … .
Step 403 specifically includes:
1) The training process consists of multiple training rounds, each comprising multiple Markov steps. A Q-evaluation network and a Q-target network, both with a convolutional neural network structure, are initialized for each agent, and an experience replay buffer is initialized to store the experience samples shared by all agents.
2) In each training step, each agent selects an action based on its current state. After all agents have executed their actions, the environment transitions to the next state and generates a global reward, and the resulting experience tuple is stored in the experience replay buffer.
3) A batch of historical experience is periodically sampled from the experience replay buffer to compute the loss, and the parameters of the Q-evaluation network are updated by gradient descent.
4) The parameters of the Q-target network are periodically synchronized with those of the Q-evaluation network, independently of the gradient descent process.
5) After each training round, the state corresponding to the maximum reward of that round is recorded and used as the initial state of the next training round.
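The training procedure of Step 403 can be sketched at toy scale to show how its five parts fit together: per-agent evaluation and target weights, a replay buffer shared by all agents, a single global reward, periodic target synchronization, and restarting each round from the best state of the previous one. This is a minimal sketch under loudly stated assumptions, not the EMADQN implementation: each AP steers one beam toward a single user (K = 1), the five-move action set, the gain function, and the out-of-bounds penalty value are hypothetical, and a linear Q-function over one-hot index features stands in for the convolutional network.

```python
import numpy as np
from collections import deque

L = 5                                                # indices per angular domain (hypothetical)
MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # hypothetical action set: stay / step el / step az
OOB_PENALTY = -1.0                                   # placeholder; the claim's penalty value is not given


def one_hot(idx):
    """Encode a codebook index pair (el, az) as a one-hot feature vector."""
    v = np.zeros(L * L)
    v[idx[0] * L + idx[1]] = 1.0
    return v


class ToyBeamEnv:
    """Toy environment: M APs (agents) each steer one beam toward a single user."""

    def __init__(self, best, snr=10.0):
        self.best = best          # hypothetical optimal index pair for each agent
        self.snr = snr

    def gain(self, m, idx):
        dist = abs(idx[0] - self.best[m][0]) + abs(idx[1] - self.best[m][1])
        return 1.0 / (1.0 + dist)                    # hypothetical gain, peaks at the optimal beam

    def step(self, state, actions):
        nxt, out_of_bounds = [], False
        for (i, j), a in zip(state, actions):
            ni, nj = i + MOVES[a][0], j + MOVES[a][1]
            if not (0 <= ni < L and 0 <= nj < L):
                out_of_bounds, ni, nj = True, i, j   # invalid move: keep the old index
            nxt.append((ni, nj))
        if out_of_bounds:
            return nxt, OOB_PENALTY
        total_gain = sum(self.gain(m, s) for m, s in enumerate(nxt))
        return nxt, float(np.log2(1.0 + self.snr * total_gain))  # shared global reward


def train(env, rounds=40, steps=20, batch=32, gamma=0.9, lr=0.1, sync_every=50, seed=0):
    rng = np.random.default_rng(seed)
    M, n_act = len(env.best), len(MOVES)
    w_eval = [np.zeros((L * L, n_act)) for _ in range(M)]  # linear stand-in for the CNN
    w_tgt = [w.copy() for w in w_eval]
    buffer = deque(maxlen=2000)                            # replay buffer shared by all agents
    state, eps, t = [(L // 2, L // 2)] * M, 1.0, 0
    for _ in range(rounds):
        best_r, best_s = -np.inf, state
        for _ in range(steps):
            acts = [int(rng.integers(n_act)) if rng.random() < eps
                    else int(np.argmax(one_hot(s) @ w_eval[m]))
                    for m, s in enumerate(state)]
            nxt, r = env.step(state, acts)
            buffer.append((state, acts, r, nxt))
            if r > best_r:
                best_r, best_s = r, nxt
            state, t = nxt, t + 1
            if len(buffer) >= batch:                       # sample a batch and take TD steps
                for k in rng.choice(len(buffer), size=batch, replace=False):
                    s, a, rew, s2 = buffer[int(k)]
                    for m in range(M):
                        x, x2 = one_hot(s[m]), one_hot(s2[m])
                        td = rew + gamma * np.max(x2 @ w_tgt[m]) - x @ w_eval[m][:, a[m]]
                        w_eval[m][:, a[m]] += lr * td * x  # semi-gradient step on squared TD error
            if t % sync_every == 0:                        # periodic target synchronization
                w_tgt = [w.copy() for w in w_eval]
        eps = max(0.05, eps * 0.95)                        # decaying exploration rate
        state = best_s                                     # next round starts from the best state
    return w_eval, state
```

Calling `train(ToyBeamEnv(best=[(3, 1), (0, 4)]))` runs the loop for two agents; swapping the linear weights for a convolutional network and the toy gain for the rate expression of Step 1 would bring the sketch closer to the claimed method.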
2. The method according to claim 1, characterized in that Step 1 comprises: Step 101: Establish a scenario model of a millimeter-wave non-cellular massive MIMO system managed by a central processing unit, in which M distributed access nodes serve K users and each access node is equipped with N antennas arranged as a uniform planar array; establish the channel model from the m-th access node to the k-th UAV user in each coherence time block. Step 102: Based on the channel model, construct the received signal model of the k-th UAV user, and derive the expressions for the user's communication signal-to-interference-plus-noise ratio and achievable rate.
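Step 102 can be made concrete with the generic joint-transmission form of the SINR, in which every access node contributes to every user's stream. The claim does not reproduce the exact expression, so the model below is an assumption: user k receives the coherent sum over APs of $h_{mk}^H w_{mj}$ for each stream j, treats the other streams as interference, and achieves $\log_2(1+\mathrm{SINR}_k)$. The array layout and the function name `achievable_rates` are hypothetical.

```python
import numpy as np

def achievable_rates(H, W, noise_power):
    """Per-user SINR and achievable rate under an assumed joint-transmission model.

    H: (M, K, N) complex, H[m, k] is the channel from AP m to user k.
    W: (M, K, N) complex, W[m, j] is AP m's beamforming vector for user j's stream.
    """
    # G[k, j] = sum_m h_{mk}^H w_{mj}: effective gain of stream j at user k
    G = np.einsum("mkn,mjn->kj", H.conj(), W)
    power = np.abs(G) ** 2
    desired = np.diag(power)                     # own-stream power at each user
    interference = power.sum(axis=1) - desired   # power of all other streams
    sinr = desired / (interference + noise_power)
    return sinr, np.log2(1.0 + sinr)
```

Summing the returned per-user rates gives the sum rate that serves as the shared reward in Step 402.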
3. The method according to claim 1, characterized in that Step 2 comprises: Step 201: The users send mutually orthogonal uplink pilot signals; the access node receives the pilot signals and uses the least-squares channel estimation method to obtain the estimated channel. Step 202: Perform singular value decomposition on the received signal to separate the signal subspace from the noise subspace, use the 2D-MUSIC algorithm to construct the spatial spectrum function, and search for its peak to obtain estimates of the user's elevation and azimuth angles, thus completing the preliminary localization of the user.
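Steps 201 and 202 can be sketched for a single user with a single propagation path. Assumptions, all hypothetical: a 4 × 4 half-wavelength UPA whose steering phase is $\pi(n_x \sin\theta\cos\varphi + n_y \sin\theta\sin\varphi)$, a unit-modulus pilot, white Gaussian noise, and a 1° search grid. The least-squares estimate projects the received block onto the pilot, and 2D-MUSIC takes the SVD of the received block, keeps the left singular vectors beyond the signal dimension as the noise subspace, and searches the pseudo-spectrum for its peak.

```python
import numpy as np

def upa_steering(theta, phi, nx, ny):
    """Half-wavelength UPA steering vectors (assumed convention), one column per angle pair."""
    theta, phi = np.atleast_1d(theta), np.atleast_1d(phi)
    ix = np.repeat(np.arange(nx), ny)   # x-index of each antenna element
    iy = np.tile(np.arange(ny), nx)     # y-index of each antenna element
    ux, uy = np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi)
    return np.exp(1j * np.pi * (np.outer(ix, ux) + np.outer(iy, uy)))

def ls_estimate(Y, pilot):
    """Least-squares channel estimate for one user: h_hat = Y x^* / (x^H x)."""
    return Y @ pilot.conj() / np.vdot(pilot, pilot).real

def music_2d(Y, n_paths, nx, ny, step_deg=1.0):
    """2D-MUSIC: SVD of the received block, noise subspace, grid search for the peak."""
    U, _, _ = np.linalg.svd(Y)
    Un = U[:, n_paths:]                                              # noise subspace
    th = np.deg2rad(np.arange(0.0, 90.0 + step_deg, step_deg))      # elevation grid
    ph = np.deg2rad(np.arange(-180.0, 180.0, step_deg))             # azimuth grid
    TH, PH = np.meshgrid(th, ph, indexing="ij")
    A = upa_steering(TH.ravel(), PH.ravel(), nx, ny)
    spectrum = 1.0 / np.sum(np.abs(Un.conj().T @ A) ** 2, axis=0)  # pseudo-spectrum
    best = int(np.argmax(spectrum))
    return np.rad2deg(TH.ravel()[best]), np.rad2deg(PH.ravel()[best])

# Simulated single-user, single-path uplink (all parameters hypothetical)
rng = np.random.default_rng(0)
nx, ny, T = 4, 4, 64
h = 0.8 * upa_steering(np.deg2rad(30.0), np.deg2rad(45.0), nx, ny)[:, 0]
pilot = np.exp(2j * np.pi * rng.random(T))                          # unit-modulus pilot
noise = 0.1 * (rng.standard_normal((nx * ny, T)) + 1j * rng.standard_normal((nx * ny, T)))
Y = np.outer(h, pilot) + noise
h_hat = ls_estimate(Y, pilot)
theta_hat, phi_hat = music_2d(Y, n_paths=1, nx=nx, ny=ny)
print(theta_hat, phi_hat)
```

With K users and orthogonal pilots, the same projection would be repeated with each user's pilot; multipath would require a larger `n_paths`.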
4. The method according to claim 1, characterized in that, in Step 3, the customized three-dimensional beam codebook is designed as follows: for the m-th access node and the k-th UAV user, a beam codebook is constructed around the estimated user position using a finite number of phase shifts in the azimuth domain and a finite number of phase shifts in the elevation domain.
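One way to read claim 4 is sketched below, under the assumption that the finite phase shifts correspond to finite angular offsets around the elevation and azimuth angles estimated in Step 2. Each codeword is a unit-norm UPA steering vector (the same assumed half-wavelength convention as above) toward one offset direction. The function names and the centred, evenly spaced offsets are hypothetical choices.

```python
import numpy as np

def upa_steering(theta, phi, nx, ny):
    """Half-wavelength UPA steering vectors (assumed convention), one column per angle pair."""
    theta, phi = np.atleast_1d(theta), np.atleast_1d(phi)
    ix = np.repeat(np.arange(nx), ny)
    iy = np.tile(np.arange(ny), nx)
    ux, uy = np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi)
    return np.exp(1j * np.pi * (np.outer(ix, ux) + np.outer(iy, uy)))

def custom_codebook(theta_hat, phi_hat, n_el, n_az, step_el, step_az, nx, ny):
    """Beams at finite elevation/azimuth offsets around the estimated direction (radians)."""
    off_el = (np.arange(n_el) - (n_el - 1) / 2) * step_el
    off_az = (np.arange(n_az) - (n_az - 1) / 2) * step_az
    TH, PH = np.meshgrid(theta_hat + off_el, phi_hat + off_az, indexing="ij")
    F = upa_steering(TH.ravel(), PH.ravel(), nx, ny) / np.sqrt(nx * ny)  # unit-norm beams
    return TH, PH, F  # F[:, i * n_az + j] is the beam for elevation index i, azimuth index j
```

The pair (i, j) of elevation and azimuth indices is exactly the codebook index that the agents manipulate in Step 402.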
5. The method according to claim 1, characterized in that, when selecting an action based on its current state, each agent adopts an ε-greedy strategy in which the exploration rate ε decreases as training progresses.
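Claim 5 only requires that ε decrease during training; the exponential per-round decay with a floor used below is an assumed schedule, and `epsilon_at` and `epsilon_greedy` are hypothetical names.

```python
import numpy as np

def epsilon_at(round_idx, eps_start=1.0, eps_min=0.05, decay=0.95):
    """Exponentially decaying exploration rate with a floor (schedule is an assumption)."""
    return max(eps_min, eps_start * decay ** round_idx)

def epsilon_greedy(q_values, eps, rng):
    """Explore uniformly with probability eps, otherwise pick the largest Q-value."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```

Early rounds therefore explore the customized codebook broadly, while later rounds mostly exploit the learned Q-values.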
6. The method according to claim 1, characterized in that, in the target network synchronization step, the Q-target network parameters are updated less frequently than the Q-evaluation network parameters.
7. The method according to claim 1, characterized in that the global reward is the sum of the achievable rates of all users.