A distributed time synchronization method suitable for a drone swarm

By combining a hardware clock model with reinforcement learning, the method addresses time synchronization errors and wasted communication resources in UAV swarms operating in high-mobility environments, achieving high-precision synchronization and resource optimization.

CN122294232A · Pending · Publication Date: 2026-06-26 · SOUTHEAST UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SOUTHEAST UNIV
Filing Date
2026-04-03
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies struggle to overcome synchronization errors caused by time-varying propagation delays in highly mobile environments and lack the ability to dynamically adapt to system states, leading to deterioration in synchronization accuracy and waste of communication resources.

Method used

A distributed time synchronization method based on a hardware clock model is adopted, which combines multiple broadcasts and bidirectional synchronization strategies to compensate for propagation delay. Furthermore, the synchronization interaction period is adaptively selected through reinforcement learning to dynamically adjust the synchronization accuracy and communication overhead.

Benefits of technology

It achieves high-precision time synchronization in highly mobile environments, improving synchronization accuracy to the microsecond level, while reducing communication overhead by 36%.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention provides a distributed time synchronization method suitable for UAV swarms, comprising: Step 1, synchronization information exchange between adjacent UAV nodes; Step 2, propagation delay compensation, in which the synchronization interaction information received from neighbors within the synchronization period of Step 1 is used to compensate the timestamps for the time-varying propagation delay; Step 3, logical clock adjustment parameter update, in which the previously obtained synchronization information is combined to estimate and update the logical clock parameters, completing the final time synchronization; Step 4, reinforcement learning-based adaptive selection of the synchronization period, in which an adaptive period decision mechanism based on reinforcement learning dynamically outputs the optimal synchronization broadcast period for the current state. The method overcomes the synchronization error caused by time-varying propagation delay in high-mobility environments and adaptively adjusts the synchronization interaction period according to the actual clock state of the swarm, effectively reducing the system's communication overhead and energy consumption while ensuring high-precision time synchronization.

Description

Technical Field

[0001] This invention relates to the field of wireless communication and UAV swarm collaborative control technology, specifically to a high-precision distributed time synchronization method for large-scale, highly dynamic, decentralized UAV swarms.

Background Technology

[0002] With the rapid development of Unmanned Aerial Vehicle (UAV) swarm technology and Wireless Ad-Hoc Network (WANET) technology, UAV swarms are increasingly being used in collaborative tasks such as wide-area reconnaissance and distributed formation. In a distributed swarm without a central node, each node achieves global state consistency through local information exchange. Establishing and maintaining a high-precision global time reference at the microsecond level for the entire network is a fundamental prerequisite for ensuring the accurate execution of various spatial collaborative tasks.

[0003] Currently, existing distributed time synchronization methods (such as the typical average time synchronization mechanism, AverageTimeSync, ATS) mainly rely on periodic timestamp broadcasting and information exchange between nodes, and use parameter estimation algorithms such as least squares fitting to gradually correct the frequency offset and phase offset of the local logical clock.

[0004] However, when applying the aforementioned existing technologies to large-scale, highly dynamic, decentralized drone swarms, the following technical drawbacks exist:

[0005] On the one hand, existing technologies struggle to overcome synchronization errors caused by time-varying propagation delays in highly maneuverable scenarios. Most existing distributed synchronization algorithms are based on the ideal assumption that nodes are relatively stationary and that propagation delays between nodes are fixed or symmetrical. However, in actual high-speed maneuvering flight, the rapid and irregular changes in the relative positions of UAVs introduce significant delay jitter. Directly using timestamps containing highly dynamic time-varying measurement noise for parameter estimation introduces huge observation errors, leading to severe distortion in the estimation of frequency and phase offsets of node logic clocks, and consequently causing a rapid deterioration in the overall network synchronization accuracy.

[0006] On the other hand, existing technologies generally employ fixed synchronization interaction cycles and lack the ability to adapt dynamically to the evolution of system states. In complex actual flight missions, the rate at which clock error accumulates in a UAV swarm changes dynamically. Existing fixed-period timestamp broadcasting mechanisms face an inherent contradiction: when the swarm topology and clock state are relatively stable, high-frequency fixed-period interactions generate a large amount of communication redundancy, causing wireless channel collisions and excessively consuming the limited energy resources of the UAVs; conversely, when the swarm performs high-maneuvering actions that cause the clock error to diverge sharply, low-frequency interaction cycles cannot provide enough data in time for correction. Existing methods cannot achieve a dynamic trade-off between maximizing synchronization accuracy and minimizing communication overhead based on the current clock error variance of the entire network and its evolution trend.

Summary of the Invention

[0007] Technical Problem: To address the shortcomings of existing technologies, the present invention aims to provide an efficient distributed time synchronization method suitable for UAV swarms. This method overcomes synchronization errors caused by time-varying propagation delays in highly maneuverable environments without requiring frequent bidirectional synchronization; and it can adaptively adjust the synchronization interaction cycle according to the actual clock state of the swarm, thereby effectively reducing system communication overhead and energy consumption while ensuring high-precision time synchronization.

[0008] Technical Solution: The UAV node clock model is explained as follows. A UAV node is typically equipped with a hardware clock whose circuit consists of a crystal oscillator and a counter. The crystal oscillator, based on the piezoelectric effect, continuously outputs a reference pulse signal at a fixed frequency. The counter detects and accumulates the rising edges of this signal and outputs a step trigger signal. The hardware clock register responds to the step signal, updating its internal second, minute, and hour values sequentially according to the set carry modulus, thereby driving the linear increase of the system hardware clock. The hardware clock of each node at a given real time is therefore typically modeled as follows:

[0009] $$H_i(t) = \frac{1}{f_0}\int_{t_0}^{t} f_i(\tau)\,\mathrm{d}\tau + H_i(t_0) \qquad (1),$$

[0010] where $i \in V$ and $V$ is the set of all nodes in the UAV swarm; $f_0$ is the nominal frequency of the crystal oscillator of node $i$; $f_i(t)$ is the actual frequency of the crystal oscillator of UAV node $i$ at time $t$; $t_0$ is the real time at which timing began; and $H_i(t_0)$ is the hardware time at the start of timing. The values of $f_0$ and $f_i(t)$ are determined by the physical characteristics and manufacturing process of the crystal oscillator.

[0011] Since the frequency of a crystal oscillator changes very little in a short period of time, we can assume that the frequency remains constant during synchronization, thus obtaining a linear model of the hardware clock:

[0012] $$H_i(t) = \alpha_i t + \beta_i \qquad (2),$$

[0013] where $\alpha_i$ is the frequency offset (skew) of node $i$ and $\beta_i$ is its phase offset. Their values are determined by the physical characteristics of node $i$'s crystal oscillator and by the hardware clock value at the start of synchronization.

[0014] In time synchronization, the hardware time usually cannot be changed directly, so each node maintains a logical time that compensates the hardware time:

[0015] $$L_i(t) = \hat{\alpha}_i H_i(t) + \hat{\beta}_i \qquad (3),$$

[0016] where $\hat{\alpha}_i$ and $\hat{\beta}_i$ are the frequency compensation value and the phase compensation value of node $i$, respectively. These are the parameters that time synchronization must estimate, and they are calculated by the time synchronization algorithm.

[0017] The purpose of time synchronization is to ensure that the logical clocks of different nodes have a consistent clock rate and logical clock value after compensation.
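
The linear hardware clock and the compensated logical clock described above can be illustrated with a small sketch. It assumes the linear model holds exactly; the class name `NodeClock`, its field names, and the helper `ideal_compensation` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeClock:
    """Linear clock model: hardware = skew * t + phase (illustrative names)."""
    skew: float               # hardware frequency offset, close to 1.0
    phase: float              # hardware phase offset in seconds
    skew_comp: float = 1.0    # logical frequency compensation value
    phase_comp: float = 0.0   # logical phase compensation value

    def hardware(self, t: float) -> float:
        # Hardware time read at real time t.
        return self.skew * t + self.phase

    def logical(self, t: float) -> float:
        # Logical time: the hardware time after compensation.
        return self.skew_comp * self.hardware(t) + self.phase_comp


def ideal_compensation(clock: NodeClock) -> tuple[float, float]:
    """Compensation values that map this node's logical clock onto real time."""
    return 1.0 / clock.skew, -clock.phase / clock.skew


a = NodeClock(skew=1 + 30e-6, phase=0.4)
b = NodeClock(skew=1 - 20e-6, phase=-0.7)
for node in (a, b):
    node.skew_comp, node.phase_comp = ideal_compensation(node)
```

With these compensation values both logical clocks advance at the same rate and read the same value, which is the goal stated in [0017]. A real node does not know its own skew and phase, which is why the method estimates the compensation values from neighbor timestamps.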

[0018] The technical solution of the present invention includes the following steps:

[0019] Step 1: Synchronization information exchange between adjacent nodes of the drone.

[0020] Consider a distributed UAV swarm whose topology is modeled as an undirected graph consisting of the set of UAV nodes and the set of valid communication links between them. Each UAV node internally maintains a local hardware clock whose value is given by formula (2). During the initial synchronization phase, every UAV node uses its own hardware clock as the reference and periodically exchanges synchronization information with its neighboring nodes. The synchronization period is determined by both the size of the UAV swarm and the required synchronization time, and is positively correlated with both; in this method, a value of 2 seconds is recommended. If all nodes initiated their synchronization broadcasts simultaneously, the result would be a severe broadcast storm and channel collapse. Because the initial phase offset of any node approximately follows a uniform distribution owing to its physical properties, an interleaved startup strategy based on unique node numbers is introduced. The initial transmission time of each node is obtained from formula (4):

[0021] (4),

[0022] in, Indicates the synchronization period. Represents a node The number satisfies .
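
The idea of an interleaved startup can be sketched as follows. This is only an illustration under a stated assumption: nodes are spread evenly over one synchronization period in order of their unique numbers. The expression actually used in formula (4) may differ, and the function name `staggered_start_times` is hypothetical.

```python
def staggered_start_times(node_ids, period_s):
    """Assign each node a distinct first-transmission offset within one period.

    Assumption for illustration: offsets are evenly spaced, ordered by node number.
    """
    slot = period_s / len(node_ids)
    return {nid: rank * slot for rank, nid in enumerate(sorted(node_ids))}


# Four nodes sharing a 2-second synchronization period.
starts = staggered_start_times([3, 1, 2, 4], period_s=2.0)
```

Because every node gets its own slot, no two nodes begin their first broadcast at the same instant, which is the purpose of the staggering.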

[0023] For a node performing synchronization communication with its neighbors, this step includes the following process:

[0024] (1) When the node's hardware clock reaches the transmission time of the current synchronization round, the node initiates multiple broadcasts at a fixed interval to the neighboring nodes within its communication range. The synchronization round is a positive integer counting the number of times the node has initiated synchronization from the start of synchronization up to the current moment. The multi-broadcast interval and the number of broadcasts per round are determined by the required synchronization accuracy: the higher the required accuracy, the larger they need to be. In this method, the suggested multi-broadcast interval ranges from 20 ms to 50 ms, and 5 broadcasts per round are recommended. Each broadcast carries the node's hardware time at the moment of that broadcast, its own logical clock adjustment parameters, and the current synchronization round.

[0025] (2) If a neighboring node is not receiving the synchronization broadcast of this node for the first time, and the difference between the current reception time and the time it last received a message from this node is less than a preset time threshold, the neighboring node demodulates the broadcast information, records its own hardware time at each received broadcast, and saves both locally. The time threshold is determined by the noise level of the timestamp measurements and is inversely correlated with it; in this method, a value between 40 s and 50 s is recommended.

[0026] (3) If a neighboring node receives the synchronization broadcast of this node for the first time, or the difference between the current reception time and the time it last received a message from this node is greater than or equal to the preset time threshold, the neighboring node clears its stored historical information for this node, demodulates the latest broadcast information, records its own hardware time at each received broadcast, and saves both locally. The time threshold takes the same value as in (2). The neighboring node then initiates a bidirectional synchronization process with this node:

[0027] (a) The neighboring node sends a two-way synchronization request to the broadcasting node and records its own hardware time at the moment of transmission.

[0028] (b) The broadcasting node records the arrival time of the request and, at a later local time, sends a reply message whose content includes the recorded timestamps.

[0029] (c) The neighboring node demodulates the reply message and records its arrival time.

[0030] (4) After completing synchronization communication with all neighbors, the node receives synchronization broadcasts from its neighboring nodes during the following time window and proceeds to Step 2 when the synchronization period ends.

[0031] Step 2: Propagation delay compensation.

[0032] This step aims to use the synchronization interaction information received from neighbors during the synchronization period in step one to compensate for the time-varying propagation delay of the timestamp, eliminate errors caused by the mobility of the drone swarm, and improve the accuracy of subsequent logic clock parameter estimation.

[0033] The propagation delay compensation process is illustrated for one node that has received synchronization interaction information from a neighboring node.

[0034] (1) If the two nodes exchanged information using both multi-broadcast and two-way synchronization, the receiving node demodulates the synchronization information to obtain the neighbor's hardware time at each broadcast and the four timestamps of the bidirectional message exchange, and also records its local hardware clock value at the arrival of each broadcast message. The propagation delay between the two nodes in the baseline round is calculated according to formula (5):

[0035] (5),

[0036] After the calculation is completed, the time information obtained from the synchronous interaction and the initial propagation delay are stored locally.
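
The four-timestamp exchange is the classic two-way message exchange. The sketch below shows the textbook estimate, assuming the propagation delay is symmetric during the exchange and that both clocks run at nearly the same rate over its short duration. Formula (5) may differ in detail, and the names `t1` to `t4` and `two_way_delay_and_offset` are illustrative.

```python
def two_way_delay_and_offset(t1, t2, t3, t4):
    """Textbook two-way exchange estimate.

    t1: requester's clock when the request is sent
    t2: responder's clock when the request arrives
    t3: responder's clock when the reply is sent
    t4: requester's clock when the reply arrives
    Returns (one-way delay, responder clock minus requester clock).
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2.0
    offset = ((t2 - t1) - (t4 - t3)) / 2.0
    return delay, offset


# Synthetic exchange: 3 microsecond one-way delay, responder 0.25 s ahead.
true_delay, true_offset = 3e-6, 0.25
t1 = 10.0
t2 = t1 + true_delay + true_offset
t3 = t2 + 0.001                       # responder processing time
t4 = (t3 - true_offset) + true_delay
delay, offset = two_way_delay_and_offset(t1, t2, t3, t4)
```

The responder's processing time cancels out because it enters `t3` and `t4` equally, which is why only the two flight legs contribute to the delay estimate.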

[0037] (2) If the two nodes exchanged information by multi-broadcast only, the receiving node demodulates the synchronization information to obtain the neighbor's hardware time at each broadcast and records its local hardware clock value at the arrival of each broadcast message.

[0038] Because the UAV nodes move with high dynamics, the propagation delay in the current round can differ significantly from that in the baseline round. To achieve high-precision time synchronization, the time-varying propagation delay must be calculated and compensated. Since each round of information exchange is very short, the propagation delay between the two nodes can be considered approximately constant within a single round of multi-broadcast. When the transmission and reception times of the reference round and the current round are plotted as points in a Cartesian coordinate system, the effect of the time-varying propagation delay is that the straight line fitted to the point set of the latest round is vertically offset as a whole from that of the reference round. Taking the line through the mean points of the two rounds as the baseline, the point set of the latest round is shifted vertically. The sum of the squared vertical distances from all points to the baseline can be expressed as:

[0039] (6),

[0040] in

[0041] ,

[0042] ,

[0043] ,

[0044] ,

[0045] ,

[0046] ,

[0047] where the symbols denote, respectively, the number of broadcasts per round, the reference synchronization round, and the current synchronization round; all other time information is obtained through the information exchange in Step 1. The difference in propagation delay between the two rounds caused by relative motion corresponds to the vertical shift at which this sum of squares takes its minimum value. Therefore:

[0048] (7),

[0049] in,

[0050] ,

[0051] ,

[0052] All parameters take the same values as in formula (6).

[0053] Therefore, the propagation delay of the current synchronization round can be calculated using formula (8):

[0054] (8),
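
The geometric construction described for formulas (6) to (8) can be sketched as follows. This is one reading of the text, not necessarily the patent's exact expressions: transmit times go on the horizontal axis and receive times on the vertical axis, a common slope is fitted to both rounds after centering each round on its own mean point, and the vertical shift is whatever gap between the two mean points that slope does not explain. The function name `delay_shift` and all variable names are hypothetical.

```python
def delay_shift(ref_tx, ref_rx, cur_tx, cur_rx):
    """Estimate the change in propagation delay between two broadcast rounds.

    Returns (common slope, vertical shift of the current round).
    """
    def mean(values):
        return sum(values) / len(values)

    xr, yr = mean(ref_tx), mean(ref_rx)
    xc, yc = mean(cur_tx), mean(cur_rx)
    # Least-squares slope pooled over both rounds, each centered on its mean.
    num = (sum((x - xr) * (y - yr) for x, y in zip(ref_tx, ref_rx))
           + sum((x - xc) * (y - yc) for x, y in zip(cur_tx, cur_rx)))
    den = (sum((x - xr) ** 2 for x in ref_tx)
           + sum((x - xc) ** 2 for x in cur_tx))
    slope = num / den
    # Vertical gap between the mean points not explained by the slope.
    shift = (yc - yr) - slope * (xc - xr)
    return slope, shift


# Noise-free synthetic data: relative skew 1.00002, delay grows by 1.5 us.
skew, phase, d_ref, d_cur = 1.00002, 0.3, 2.0e-6, 3.5e-6
ref_tx = [10.0 + 0.03 * m for m in range(5)]
cur_tx = [12.0 + 0.03 * m for m in range(5)]
ref_rx = [skew * x + phase + d_ref for x in ref_tx]
cur_rx = [skew * x + phase + d_cur for x in cur_tx]
slope, shift = delay_shift(ref_tx, ref_rx, cur_tx, cur_rx)
current_delay = d_ref + shift   # reference delay plus the estimated change
```

Only the reference round needs a two-way exchange. Later rounds recover their delay from one-way broadcasts, which is how the method avoids frequent bidirectional synchronization.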

[0055] Step 3: Update logic clock adjustment parameters.

[0056] Step two calculates the time-varying propagation delay without frequent bidirectional synchronization. This step combines the synchronization information obtained above to estimate and update the logic clock parameters, thus completing the final time synchronization.

[0057] In distributed time synchronization algorithms, a node uses the clock information of its neighbors to update its own logical clock adjustment parameters. The update formulas for the logical clock frequency compensation parameter and the phase compensation parameter are as follows:

[0058] (9),

[0059] (10),

[0060] where the symbols denote, in order: the set of numbers of the neighboring nodes from which the node received synchronization information within the given synchronization cycle; the number of such neighbors in that cycle; the propagation delay between the two nodes during synchronization communication; the logical clock values of the two nodes in that synchronization round; and the relative hardware clock frequency offset between the two nodes.

[0061] For a node and any of its neighboring nodes, the relative frequency offset between the two is calculated:

[0062] ,

[0063] In addition, the logical clock values are corrected using the propagation delay obtained in Step 2, which serves as the basis for updating the logical clock parameters.

[0064] This step includes the following process:

[0065] (a) If the two nodes exchanged information using both multi-broadcast and two-way synchronization, the reference propagation delay is calculated using formula (5) and the relative frequency offset is calculated using formula (11):

[0066] (11),

[0067] in

[0068] ,

[0069] ,

[0070] where the symbols denote the number of broadcasts per round and the current synchronization round.

[0071] (b) If the two nodes exchanged information by multi-broadcast only, the propagation delay of the current round is calculated using formulas (7) and (8), and the relative frequency offset is calculated using formula (12):

[0072] (12),

[0073] in,

[0074] ,

[0075] ,

[0076] where the symbols denote the reference synchronization round, in which multi-broadcast and two-way synchronization were last performed, and the current synchronization round.

[0077] Once the relative frequency offsets between the node and all of its neighbors, together with the corresponding propagation delays, have been calculated for the current synchronization period, the node's logical clock parameters are updated using formulas (9) and (10), completing the time synchronization process.

[0078] Step 4: Adaptive selection of synchronization cycle based on reinforcement learning.

[0079] The aforementioned delay compensation mechanism has achieved initial high-precision synchronization. However, due to the temperature drift and aging effects of the hardware crystal oscillator, the residual clock frequency offset will continue to integrate over time. If interaction is stopped at this point, the error will diverge. On the other hand, maintaining synchronization with a fixed high-frequency period will waste communication resources and generate a large amount of unnecessary communication overhead.

[0080] This step centers on an adaptive period decision mechanism based on reinforcement learning. Through continuous trial-and-error exploration and value iteration (Q-Learning), the mechanism enables each node to perceive its current synchronization state and to dynamically and adaptively output the optimal synchronization broadcast period for that state.

[0081] This step includes the following process:

[0082] (1) Nonlinear quantization and dimensionality-reduction mapping of the state space.

[0083] The synchronization state of a node in the current synchronization round is measured by the variance of the local logical clock differences in the previous round and by the current synchronization trend. To ensure convergence within a finite time, this invention discretizes the variance dimension of the state space non-uniformly: the continuous variance space is divided into a fixed number of discrete state levels, with frequently visited states partitioned more finely. In addition, the temporal evolution trend of the variance is introduced, with three values indicating whether the current variance is worsening, improving, or remaining stable relative to the previous period. Orthogonally combining the one-dimensional variance level with the one-dimensional trend yields a discrete state space whose size is only three times the number of variance levels, which effectively avoids the curse of dimensionality. The number of variance levels is determined by the maximum allowable synchronization error of the UAV swarm: the larger the maximum allowable synchronization error, the larger the number of levels. In this method, an integer between 10 and 20 is recommended.
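
A minimal sketch of such a state mapping is shown below. It assumes geometrically spaced variance bin edges (finest near zero, on the assumption that small variances are visited most often) and a 5% tolerance for deciding that the variance is stable. The edge spacing, the tolerance, and all function names are illustrative choices, not values from the source.

```python
import bisect

TRENDS = ("improving", "stable", "worsening")


def make_edges(max_variance, levels, ratio=2.0):
    """Ascending bin edges; bins get finer as the variance approaches zero."""
    return [max_variance / ratio ** (levels - 1 - k) for k in range(1, levels)]


def trend_index(variance, prev_variance, tol=0.05):
    """0 = improving, 1 = stable, 2 = worsening (relative to the last round)."""
    if variance < prev_variance * (1 - tol):
        return 0
    if variance > prev_variance * (1 + tol):
        return 2
    return 1


def state_index(variance, prev_variance, edges):
    """Combine the variance level and the trend into one discrete state id."""
    level = bisect.bisect_right(edges, variance)   # 0 .. len(edges)
    return level * len(TRENDS) + trend_index(variance, prev_variance)


edges = make_edges(max_variance=1e-10, levels=10)   # 10 levels -> 30 states
```

With 10 variance levels and 3 trend values the Q-table needs only 30 rows, which illustrates the dimensionality reduction the paragraph describes.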

[0084] (2) Action space initialization

[0085] The action that a UAV node must decide in each state is the time interval until it next broadcasts a synchronization message. Taking into account the flight characteristics of the swarm and the control frequency requirements, the action space is defined as a set of discrete absolute time periods, in seconds. The number of actions is determined by the required synchronization accuracy: the higher the required accuracy, the larger the number of actions, with 5 being the recommended value in this method. Smaller action values imply more frequent state interactions and suit the rapid error-correction phase; larger action values correspond to a longer node silence period and suit the high-safety region where errors are minimal, so as to conserve communication resources as much as possible.

[0086] (3) Reward function design

[0087] The design of the reward mechanism determines the form to which the policy finally converges. To guide the system to maximize the communication period within the safety domain while strictly avoiding the risk of leaving it, the algorithm introduces a normalized variance defined from the node's local variance in the current synchronization round and the maximum allowable variance of UAV swarm synchronization. The reward that the node obtains in a given round from the current swarm synchronization state can be expressed as:

[0088] (13),

[0089] In the formula, the symbols denote: the synchronization period selected in the previous round; the accuracy weight and the period gain weight; the synchronization-trend feedforward compensation term; the cliff penalty constant; and the linear penalty constant for out-of-bounds deviation. The constants in the reward function are empirical values calibrated from numerous simulation experiments; with the values recommended in this method, the reinforcement learning algorithm achieves better overall performance in UAV swarm time synchronization.

[0090] Since the state transition probability matrix is unknown in a highly dynamic environment, this invention employs the Q-Learning method, which is based on temporal difference (TD) learning, to approximate the optimal policy. The update of the value function follows the sampled iteration of the Bellman optimality equation:

[0091] (14),

[0092] In the formula, the value function gives the value of selecting a given action in a given state. The learning rate adjusts the single-step update magnitude of the action value table (Q-Table): the larger the learning rate, the larger the single-step update. The discount factor characterizes how much the agent values maintaining high-precision synchronization over the long term: a larger discount factor places greater emphasis on future synchronization accuracy. The learning rate and discount factor are empirical values calibrated from numerous simulation experiments; the values recommended in this method effectively balance the convergence speed of the algorithm against long-term synchronization accuracy.
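
The standard tabular Q-Learning update, which is the usual sampled form of the Bellman optimality equation, can be sketched as follows. The table size (30 states, 5 actions) and the learning rate and discount factor used in the example are placeholders, not the values recommended in the source, and the function name `q_update` is hypothetical.

```python
def q_update(q_table, state, action, reward, next_state, learning_rate, discount):
    """One temporal-difference update of Q(state, action); returns the new value."""
    best_next = max(q_table[next_state])          # greedy value of the next state
    td_target = reward + discount * best_next     # sampled Bellman target
    td_error = td_target - q_table[state][action]
    q_table[state][action] += learning_rate * td_error
    return q_table[state][action]


# Illustrative table: 30 discrete states, 5 candidate synchronization periods.
q_table = [[0.0] * 5 for _ in range(30)]
q_table[4][2] = 1.0
new_value = q_update(q_table, state=3, action=1, reward=0.5,
                     next_state=4, learning_rate=0.1, discount=0.9)
```

In this example the target is 0.5 + 0.9 × 1.0 = 1.4, and a learning rate of 0.1 moves the stored value one tenth of the way from 0 toward it.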

[0093] (4) Iteration of reinforcement learning

[0094] After the reinforcement learning modeling described above is complete, the iterative process of adaptive synchronization period selection begins. At the end of each round of synchronization information exchange, the node calculates the variance of its local logical clock differences and the current synchronization trend and maps them to the discrete state space according to process (1). It also calculates the reward of the action selected in the previous synchronization round according to formula (13), and then uses formula (14) to update the value of the corresponding state-action pair in the Q-table. Finally, based on the current Q-table, it uses an ε-greedy policy to select the period of the next synchronization information exchange. Here ε is the exploration rate parameter, which takes values in [0, 1] and balances exploration against exploitation: the larger its value, the more likely the agent is to select a random action to probe unknown environmental states. In this method, the recommended value is 0.05.
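
Action selection with an exploration rate can be sketched as below, assuming an ε-greedy rule. The candidate periods in `PERIODS_S` are placeholders, since the source does not list the action values here, and the function name `select_action` is hypothetical. Only the exploration rate of 0.05 comes from the text.

```python
import random

# Placeholder action space: candidate synchronization periods in seconds.
PERIODS_S = [1.0, 2.0, 4.0, 8.0, 16.0]


def select_action(q_row, epsilon=0.05, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                    # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])   # exploit


q_row = [0.1, 0.7, 0.3, 0.0, 0.2]
greedy_action = select_action(q_row, epsilon=0.0)
next_period = PERIODS_S[greedy_action]
```

With the recommended rate, about one decision in twenty is exploratory, so the node mostly follows its learned values while still sampling other periods occasionally.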

[0095] Beneficial Effects: The UAV swarm distributed time synchronization method based on delay compensation and reinforcement learning proposed in this invention has the following advantages. Compared with distributed algorithms such as ATS, this invention compensates for time-varying propagation delays in real time without introducing frequent bidirectional information exchange, thereby improving synchronization accuracy to the microsecond level. Furthermore, this invention uses reinforcement learning to let each node adaptively select the period of its next synchronization information exchange based on its current synchronization state, reducing communication overhead by 36% without compromising synchronization accuracy.

Attached Figure Description

[0096] Figure 1 is a flowchart of the multi-broadcast and distributed delay compensation process;

[0097] Figure 2 is a flowchart of adaptive synchronization information exchange period selection using reinforcement learning.

Detailed Implementation

[0098] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to specific embodiments.

[0099] This embodiment constructs a swarm of high-performance tactical UAVs carrying out a distributed collaborative detection mission within a bounded airspace. Each UAV node is equipped with a high-frequency oven-controlled crystal oscillator (OCXO) as its local clock reference. The hardware clock of every node has an initial frequency offset drawn from a uniform distribution between -50 and 50 ppm (parts per million) and a phase offset drawn from a uniform distribution between -1 and 1 second.

[0100] The base cruising speed of all UAVs within the swarm is set within a highly dynamic range. To simulate the maneuvering states caused by airflow disturbances and autonomous obstacle avoidance in actual flight, the speed evolution of each UAV follows a Markov smooth mobility model with Gaussian random perturbation, in which the speed change per unit time follows a normal distribution with a mean of 0 and a standard deviation of 2. Furthermore, an effective omnidirectional communication radius is set for each node; within this communication range, UAVs exchange time information via multiple broadcasts with the configured multi-broadcast interval, number of broadcasts per round, and initial synchronization period.

[0101] The technical solution of this embodiment includes the following steps:

[0102] Step 1: Synchronization information exchange between adjacent nodes of the drone.

[0103] Consider a UAV swarm in which, during the initial synchronization phase, every UAV node uses its own hardware clock as the reference and periodically exchanges synchronization information with its neighboring nodes. If all nodes initiated their synchronization broadcasts simultaneously, the result would be a severe broadcast storm and channel collapse. Because the initial phase offset of any node approximately follows a uniform distribution owing to its physical properties, an interleaved startup strategy based on unique node numbers is introduced. The initial transmission time of each node is calculated using formula (15):

[0104] (15),

[0105] in, Indicates the synchronization period. Represents a node The number satisfies .

[0106] Taking one node's synchronization communication with its neighbors as an example, this step includes the following process:

[0107] (1) When the node's hardware clock reaches the transmission time of the current synchronization round, the node initiates multiple broadcasts at a fixed interval to the neighboring nodes within its communication range. The synchronization round is a positive integer counting the number of times the node has initiated synchronization from the start of synchronization up to the current moment. In this example, the interval is 2 seconds and the number of broadcasts is 5. Each broadcast carries the node's hardware time at the moment of that broadcast, its own logical clock adjustment parameters, and the current broadcast round.

[0108] (2) If a neighboring node is not receiving the synchronization broadcast of this node for the first time, and the difference between the current reception time and the time it last received a message from this node is less than a preset time threshold, the neighboring node demodulates the broadcast information, records its own hardware time at each received broadcast, and saves both locally. The time threshold is determined by the noise level of the timestamp measurements and is inversely correlated with it. In this embodiment, the value is 15.

[0109] (3) If a neighboring node receives the synchronization broadcast of this node for the first time, or the difference between the current reception time and the time it last received a message from this node is greater than or equal to the preset time threshold, the neighboring node clears its stored historical information for this node, demodulates the latest broadcast information, records its own hardware time at each received broadcast, and saves both locally. The neighboring node then initiates a bidirectional synchronization process with this node:

[0110] (a) The neighboring node sends a two-way synchronization request to the broadcasting node and records its own hardware time at the moment of transmission.

[0111] (b) The broadcasting node records the arrival time of the request and, at a later local time, sends a reply message whose content includes the recorded timestamps.

[0112] (c) The neighboring node demodulates the reply message and records its arrival time.

[0113] (4) After completing synchronization communication with all neighbors, the node receives synchronization broadcasts from its neighboring nodes during the following time window and proceeds to Step 2 when the synchronization period ends.

[0114] Step 2: Propagation delay compensation.

[0115] This step uses the synchronization interaction information received from neighbors during the synchronization period of step one to compensate the timestamps for the time-varying propagation delay, eliminating the errors caused by the changing geometry of the drone swarm and improving the accuracy of the logical clock parameter estimation in step three.

[0116] The propagation delay compensation process is explained below, taking as an example a node that has received synchronization interaction information from a neighboring node.

[0117] (1) If the node and the neighboring node exchanged information using both multiple broadcasts and bidirectional synchronization, the node demodulates the synchronization information to obtain the neighbor's hardware time at each broadcast and the four timestamps of the bidirectional message exchange. At the same time, the node records its local hardware clock value when each broadcast message arrives. The propagation delay between the two nodes in the reference round is then calculated according to formula (16):

[0118] (16),

[0119] After the calculation is completed, the time information obtained from the synchronous interaction and the initial propagation delay are stored locally.

[0120] (2) If the node and the neighboring node exchanged information using multiple broadcasts only, the node demodulates the synchronization information to obtain the neighbor's hardware time at each broadcast and records its local hardware clock value when each broadcast message arrives. The compensation value for the propagation delay can be expressed as:

[0121] (17),

[0122] where,

[0123] ,

[0124] ,

[0125] ,

[0126] ,

[0127] ,

[0128] .

[0129] The propagation delay of the current synchronization round can be calculated using formula (18):

[0130] (18),
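For case (1) above, the four timestamps of a bidirectional exchange allow a delay estimate. The sketch below shows the classic two-way estimate under the assumptions of a symmetric path and negligible clock drift during the exchange; it is offered only as orientation and is not claimed to be formula (16), which is not reproduced in this text. The function name and the argument names `t1` to `t4` are hypothetical.

```python
def two_way_delay_and_offset(t1, t2, t3, t4):
    """Classic two-way timestamp estimate (assumed symmetric path).

    t1: request sent, read on the requester's clock
    t2: request received, read on the responder's clock
    t3: reply sent, read on the responder's clock
    t4: reply received, read on the requester's clock

    Returns (one_way_delay, responder_minus_requester_offset).
    """
    # Round-trip time minus the responder's processing time, halved.
    delay = ((t4 - t1) - (t3 - t2)) / 2.0
    # Clock offset of the responder relative to the requester.
    offset = ((t2 - t1) - (t4 - t3)) / 2.0
    return delay, offset
```

For example, with a true one-way delay of 2 ms and a responder clock running 0.5 s ahead, the timestamps `(10.0, 10.502, 10.602, 10.104)` give back approximately those two values.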

[0131] Step 3: Update logical clock adjustment parameters.

[0132] Step two calculates the time-varying propagation delay without frequent bidirectional synchronization. This step combines the synchronization information obtained above to estimate and update the logical clock parameters, thus completing the final time synchronization.

[0133] In the distributed time synchronization algorithm, take as an example a node that updates its own logical clock adjustment parameters using the clock information of its neighbors. The update formulas for the logical clock frequency compensation parameter and phase compensation parameter are as follows:

[0134] (19),

[0135] (20),

[0136] where the symbols denote, in order: the set of IDs of the neighboring nodes from which the node received synchronization information within a given synchronization cycle; the number of such neighbors in that cycle; the propagation delay between the two nodes during synchronization communication; and the logical clock values of the two nodes in that synchronization round.

[0137] For the node and any of its neighboring nodes, the relative frequency offset between the two is calculated:

[0138] ,

[0139] Furthermore, the propagation delay obtained in step two is used to correct the logical clock value, which serves as the basis for updating the logical clock parameters.

[0140] This step includes the following process:

[0141] (a) If the node and the neighboring node exchanged information using both multiple broadcasts and bidirectional synchronization, and the reference propagation delay has been calculated using formula (16), the relative frequency offset is calculated using formula (21):

[0142] (21),

[0143] where,

[0144] ,

[0145] ,

[0146] The two remaining symbols denote, respectively, the number of broadcasts per round and the current synchronization round. In this embodiment, the number of broadcasts per round is 5.

[0147] (b) If the node and the neighboring node exchanged information using multiple broadcasts only, and the propagation delay of the current round has been calculated using formulas (17) and (18), the relative frequency offset is calculated using formula (22):

[0148] (22),

[0149] where,

[0150] ,

[0151] ,

[0152] The two round indices denote, respectively, the reference synchronization round in which multi-broadcast and bidirectional synchronization was last performed, and the current synchronization round.

[0153] Once the node has calculated, for the current synchronization period, its relative frequency offsets with all neighbors and the corresponding propagation delays, it updates its logical clock adjustment parameters using formulas (19) and (20), completing the final synchronization.
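As a rough illustration of how a relative frequency offset can be read off a round of multi-broadcast timestamps, the sketch below fits a least-squares line through (sender hardware time, receiver hardware time) pairs. The slope approximates the relative clock rate, and the intercept, after subtracting a propagation delay, approximates the relative phase. This is a generic estimator chosen for illustration; formulas (21) and (22) are not reproduced in this text, and the function names are hypothetical.

```python
def fit_clock_line(tx_times, rx_times):
    """Least-squares line rx = slope * tx + intercept."""
    n = len(tx_times)
    if n < 2 or n != len(rx_times):
        raise ValueError("need at least two matched timestamp pairs")
    mean_tx = sum(tx_times) / n
    mean_rx = sum(rx_times) / n
    den = sum((t - mean_tx) ** 2 for t in tx_times)
    if den == 0:
        raise ValueError("sender timestamps must not all be equal")
    num = sum((t - mean_tx) * (r - mean_rx)
              for t, r in zip(tx_times, rx_times))
    slope = num / den
    intercept = mean_rx - slope * mean_tx
    return slope, intercept


def estimate_relative_clock(tx_times, rx_times, delay):
    """Return (relative_rate, relative_phase) with the delay removed.

    The delay is assumed to be constant within the round and expressed
    in the receiver's time base.
    """
    slope, intercept = fit_clock_line(tx_times, rx_times)
    return slope, intercept - delay
```

A constant delay shifts every point by the same amount, so it changes only the intercept and leaves the slope untouched. That is why the delay estimate from step two matters for the phase parameter but a within-round fit can still recover the rate.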

[0154] Step 4: Adaptive selection of synchronization cycle based on reinforcement learning.

[0155] The aforementioned delay compensation mechanism has achieved initial high-precision synchronization. However, due to the temperature drift and aging effects of the hardware crystal oscillator, the residual clock frequency offset will continue to integrate over time. If interaction is stopped at this point, the error will diverge. On the other hand, maintaining synchronization with a fixed high-frequency period will waste communication resources and generate a large amount of unnecessary communication overhead.

[0156] This step introduces an adaptive period decision mechanism based on reinforcement learning. The mechanism enables nodes to perceive their current synchronization state and, through continuous trial-and-error exploration and value iteration (Q-Learning), dynamically and adaptively output the optimal synchronization broadcast period for that state.

[0157] This step includes the following process:

[0158] (1) Nonlinear quantization and dimensionality-reduction mapping of the state space.

[0159] The synchronization state of a node in the current synchronization round is measured by the variance of its local logical clock differences in the previous round together with the current synchronization trend. To ensure convergence in finite time, the algorithm discretizes the variance non-uniformly: the continuous variance space is divided into a number of discrete state levels, with frequently visited regions partitioned more finely. In addition, the temporal trend of the variance is introduced, taking three values that indicate whether the current variance is worsening, improving, or remaining stable compared with the previous round. Combining the one-dimensional variance level orthogonally with the one-dimensional trend yields a discrete state space whose size is three times the number of variance levels, which avoids the curse of dimensionality. The number of variance levels is determined by the maximum allowable synchronization error of the drone swarm; the larger the allowable maximum synchronization error, the larger the value.

[0160] In this embodiment, the continuous variance values are discretized into 10 state levels. The division is as follows: a variance below the first threshold is state one; the range between the first and second thresholds is split at a fixed step into states two through seven; the next range is state eight; the range after that is state nine; and any variance above the final threshold is state ten. Each variance level has three possible cases according to the current variance trend, giving a discrete state space of 30 states in total.
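The mapping from (variance, trend) to one of the 30 discrete states can be sketched as below. The threshold values in `VARIANCE_EDGES` and the stability tolerance are hypothetical placeholders in arbitrary units, since the embodiment's actual thresholds are not reproduced in this text; only the shape (one low bin, six equal-step bins, two wider bins, one overflow bin) follows the description.

```python
from bisect import bisect_right

# Hypothetical bin edges (arbitrary units): 9 edges give 10 levels.
# Levels 2-7 use a fixed step; levels 8 and 9 are wider.
VARIANCE_EDGES = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 10.0, 20.0]

TREND_WORSE, TREND_STABLE, TREND_BETTER = 0, 1, 2
N_TRENDS = 3


def trend_of(prev_variance, cur_variance, tol):
    """Classify the variance trend; tol is an assumed stability band."""
    if cur_variance > prev_variance + tol:
        return TREND_WORSE
    if cur_variance < prev_variance - tol:
        return TREND_BETTER
    return TREND_STABLE


def state_index(variance, trend):
    """Map a variance and a trend to a state index in 0..29."""
    level = bisect_right(VARIANCE_EDGES, variance)  # 0..9
    return level * N_TRENDS + trend
```

Using a sorted edge list with `bisect_right` keeps the non-uniform partition in one place, so changing the resolution of the frequently visited region only means editing the list.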

[0161] (2) Action space definition

[0162] The action a drone node must decide in each state is the time interval until its next synchronization broadcast. Taking into account the flight characteristics of the swarm and the control frequency requirements, the action space is defined as a set of discrete absolute time periods, in seconds. A smaller action value means more frequent state interaction and suits the rapid error-correction phase; a larger action value extends the node's quiet period and suits the case where the error is very small and well inside the safe region, so as to conserve communication resources as far as possible. The size of the action set is determined by the required synchronization precision; the higher the required precision, the larger the value.

[0163] In this embodiment, the action space contains 5 actions, each a synchronization period in seconds.

[0164] (3) Design of the piecewise hybrid reward function

[0165] The design of the reward mechanism determines the form to which the policy finally converges. To guide the system to maximize the communication period inside the safe region while strictly avoiding the risk of leaving it, the algorithm introduces a normalized variance, defined from the node's local variance in the current synchronization round and the maximum allowable variance for drone swarm synchronization. The reward function can be expressed as:

[0166] (23),

[0167] In the formula, the symbols denote the selected synchronization period, the accuracy and period gain weights, the synchronization-trend feedforward compensation term, the cliff penalty constant, and the linear penalty constant for out-of-bounds deviation. The constant terms in the reward function are empirical values calibrated through extensive simulation experiments.

[0168] Specifically, in this embodiment, the maximum variance is [value missing], and the accuracy and period gain weights are 10 and 1, respectively. The trend reward is -3 when the variance increases (a negative feedback penalty), 3 when the variance decreases (a positive reward), and 0 when the variance is approximately constant. The cliff penalty constant and the deviation linear penalty constant are 20 and 50, respectively.

[0169] Since the state transition probability matrix is unknown in a highly dynamic environment, this invention employs a Q-Learning algorithm based on temporal difference (TD) learning to approximate the optimal policy. The update of the value function follows the sampled iteration of the Bellman optimality equation:

[0170] (24),

[0171] In the formula, the learning rate adjusts the single-step update magnitude of the action-value table (Q-table): the larger the learning rate, the larger the single-step update. The discount factor characterizes how much weight the agent places on maintaining high-precision synchronization over the long term: the larger the discount factor, the greater the emphasis on future synchronization accuracy. The learning rate and discount factor are empirical values calibrated through extensive simulation experiments.

[0172] In this embodiment, the learning rate is set to 0.1 and the discount factor is set to 0.85.
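A tabular Q-Learning update with the embodiment's learning rate (0.1) and discount factor (0.85) might look like the sketch below. It uses the textbook temporal-difference update rule, which is assumed here to correspond to formula (24); the table dimensions (30 states, 5 actions) come from the embodiment, while the function names are hypothetical.

```python
ALPHA = 0.1    # learning rate from the embodiment
GAMMA = 0.85   # discount factor from the embodiment
N_STATES = 30  # 10 variance levels x 3 trends
N_ACTIONS = 5  # number of candidate synchronization periods


def make_q_table():
    """Create a zero-initialized Q-table (zero init is an assumption)."""
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def q_update(q, state, action, reward, next_state):
    """Apply one temporal-difference update and return the new value."""
    # Bootstrapped target: reward plus discounted best next-state value.
    td_target = reward + GAMMA * max(q[next_state])
    q[state][action] += ALPHA * (td_target - q[state][action])
    return q[state][action]
```

With a table this small (150 entries) each node can hold and update its own Q-table locally, which fits the distributed setting described in the text.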

[0173] (4) Iteration of reinforcement learning

[0174] After the reinforcement learning model described above is established, the iterative process of adaptive synchronization period selection begins. Taking a node as an example: at the end of a round of synchronization information exchange, the node calculates the variance of its local logical clock differences and the current synchronization trend, and maps them to the discrete state space according to process (1). It also calculates, according to formula (23), the reward for the action selected in the previous synchronization round, and then uses formula (24) to update the value of the corresponding state-action pair in the Q-table. Finally, based on the current Q-table, it uses the ε-greedy strategy to select the period for the next synchronization information exchange.

[0175] In this embodiment, an asynchronous update method is used: the reward for the previous synchronization action is calculated from the current synchronization state of the drone swarm and written into the Q-table, after which the ε-greedy strategy, applied to the current Q-table, selects the period for the next synchronization information exchange. Here ε is the exploration rate parameter, with a value range of [value missing]; it balances exploration and exploitation. The larger its value, the more likely the agent is to select a random action in order to explore unknown environmental states.

[0176] In this embodiment, the exploration rate is 0.05: with 95% probability the action with the largest Q value in the current state is selected as the next synchronization information exchange period, and with 5% probability an action is chosen at random from all actions, balancing exploration and exploitation.
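The 95% / 5% selection rule described above can be sketched as an ε-greedy choice over one row of the Q-table. The function name and the injectable `rng` argument are hypothetical conveniences; the returned value is an index into the action set, whose actual period values are not reproduced in this text.

```python
import random


def select_action(q_row, epsilon=0.05, rng=random):
    """Pick an action index from one Q-table row.

    With probability epsilon a uniformly random action is explored;
    otherwise the action with the largest Q value is exploited
    (the first one wins on ties).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible, which helps when calibrating the reward constants empirically as the text describes.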

[0177] The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art to which this invention pertains may make various modifications or additions to the described specific embodiments or use similar methods to substitute them, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. A distributed time synchronization method suitable for drone swarms, characterized in that it includes the following steps: Step 1: synchronization information exchange between adjacent drone nodes; Step 2: propagation delay compensation, in which the synchronization interaction information received from neighbors during the synchronization period of Step 1 is used to compensate the timestamps for the time-varying propagation delay, eliminating the error caused by the mobility of the drone swarm and improving the accuracy of the subsequent logical clock parameter estimation, the time-varying propagation delay being calculated without frequent bidirectional synchronization; Step 3: logical clock adjustment parameter update, in which the previously obtained synchronization information is combined to estimate and update the logical clock parameters, completing the final time synchronization; Step 4: adaptive selection of the synchronization period based on reinforcement learning, in which an adaptive period decision mechanism enables nodes to perceive the current synchronization state and, through continuous trial-and-error exploration and value iteration (Q-Learning), dynamically and adaptively output the optimal synchronization broadcast period for the current state.

2. The distributed time synchronization method for UAV swarms according to claim 1, characterized in that the hardware clock of each node at a given real time is modeled as: (1), where the symbols denote the set of all nodes in the drone swarm, the nominal frequency of the node's crystal oscillator, the actual frequency of the node's crystal oscillator at a given time, the real time at which timing began, and the hardware time at which timing began, these values being determined by the physical characteristics and manufacturing process of the crystal oscillator; since the frequency of a crystal oscillator changes very little over a short time, the frequency is assumed to remain constant during synchronization, which gives a linear model of the hardware clock: (2), where the two parameters are the node's frequency offset and phase offset, whose values are determined by the physical characteristics of the node's crystal oscillator and by the hardware clock value at the start of synchronization; in time synchronization the hardware time usually cannot be changed directly, so a logical time is maintained to compensate the hardware time: (3), where the two parameters are the node's frequency compensation value and phase compensation value, which are the parameters to be estimated for time synchronization and can be calculated by the time synchronization algorithm; the purpose of time synchronization is to ensure that, after compensation, the logical clocks of different nodes have a consistent clock rate and logical clock value.

3. The distributed time synchronization method for UAV swarms according to claim 2, characterized in that the first step, synchronization information exchange between adjacent drone nodes, is as follows: consider a distributed drone swarm whose topology is modeled as an undirected graph consisting of the set of drone nodes and the set of valid communication links between nodes; each drone node internally maintains a local hardware clock whose value is given by formula (2); during the initial synchronization phase, every drone node uses its own hardware clock as the reference and periodically exchanges synchronization information with its neighboring nodes; the value of the synchronization period is determined by both the size of the drone swarm and the required synchronization time, and is positively correlated with both; if all nodes started synchronization broadcasting simultaneously, a severe broadcast storm would result, leading to channel collapse; considering that the initial phase offset of any node is approximately uniformly distributed as a physical characteristic, a staggered startup strategy based on unique node numbers is introduced, and the initial transmission time of a node is obtained from formula (4): (4), where the symbols denote the synchronization period and the node's number, the latter subject to the constraint given with the formula.

4. The distributed time synchronization method for UAV swarms according to claim 3, characterized in that the synchronization information exchange between adjacent drone nodes, when a node performs synchronization communication with its neighbors, includes the following process: (1) when the hardware clock of the node reaches the transmission time of the current synchronization round, the node sends a series of broadcasts, at a fixed interval and a fixed total count, to the neighboring nodes within its communication range, the synchronization round being a positive integer that counts how many times the node has initiated synchronization from the start of synchronization to the current moment; the number of broadcasts is determined by the required synchronization accuracy, and the higher the required accuracy, the larger the value; each broadcast contains the node's hardware time at that broadcast, its own logical clock adjustment parameters, and the current synchronization round; (2) if a neighboring node is not receiving the node's synchronization broadcast for the first time, and the difference between the current reception time and the time it last received a message from this node is less than a preset time threshold, the neighboring node demodulates the broadcast information, records its hardware time at each received broadcast, and saves it locally; the threshold is determined by the timestamp measurement noise level and is inversely correlated with it; (3) if a neighboring node receives the node's synchronization broadcast for the first time, or the difference between the current reception time and the time it last received a message from this node is greater than or equal to the preset time threshold, the neighboring node clears its stored history for this node, demodulates the latest broadcast information, records its hardware time at each received broadcast, and saves it locally, the threshold being the same as in (2), and then initiates a bidirectional synchronization process with the broadcasting node; (4) after completing synchronization communication with all neighbors, the node receives the synchronization broadcasts of its neighboring nodes during the following time interval and proceeds to the next step when the synchronization period ends.

5. The distributed time synchronization method for UAV swarms according to claim 4, characterized in that the bidirectional synchronization process initiated with the broadcasting node is as follows: (a) the receiving node sends a two-way synchronization request to the broadcasting node and records its own hardware time at the moment of transmission; (b) the broadcasting node records the arrival time of the request and, at a later local time, sends a reply message whose content includes the values of these recorded timestamps; (c) the receiving node demodulates the reply message and records its arrival time; (4) after completing synchronization communication with all neighbors, the node receives synchronization broadcasts from its neighboring nodes during the following time interval and proceeds to the next step when the synchronization period ends.

6. The distributed time synchronization method for UAV swarms according to claim 5, characterized in that the second step, propagation delay compensation, proceeds as follows when a node has received synchronization interaction information from a neighboring node: (1) if the node and the neighboring node exchanged information using both multiple broadcasts and bidirectional synchronization, the node demodulates the synchronization information to obtain the neighbor's hardware time at each broadcast and the four timestamps of the bidirectional message exchange, and records its local hardware clock value when each broadcast message arrives; the propagation delay between the two nodes in the reference round is calculated according to formula (5): (5), and after the calculation is completed, the time information obtained from the synchronization interaction and the initial propagation delay are stored locally; (2) if the node and the neighboring node exchanged information using multiple broadcasts only, the node demodulates the synchronization information to obtain the neighbor's hardware time at each broadcast and records its local hardware clock value when each broadcast message arrives; because the drone nodes move in a highly dynamic manner, the propagation delay of the reference round and that of the current round differ significantly; when the transmission and reception times of the reference round and the current round are mapped to points in a Cartesian coordinate system, the effect of the time-varying propagation delay is that the straight line fitted to the points acquired in the latest round is offset vertically as a whole relative to the points of the reference round; taking the line connecting the mean points of the two rounds as a baseline, the point set of the latest round is shifted vertically, and the sum of the squares of the vertical distances of all points to the baseline is expressed as: (6), together with its auxiliary definitions, where the remaining symbols denote the number of broadcasts per round, the reference synchronization round, and the current synchronization round, and all other time information was obtained during the information exchange of the first step; the propagation delay difference between the two rounds caused by relative motion corresponds to the value of the vertical shift at which this function takes its minimum; therefore: (7), where the parameters have the same values as in formula (6); the propagation delay of the current synchronization round can then be calculated using formula (8): (8).

7. The distributed time synchronization method for UAV swarms according to claim 6, characterized in that the third step combines the previously obtained synchronization information to estimate and update the logical clock parameters, completing the final time synchronization: in the distributed time synchronization algorithm, a node updates its own logical clock adjustment parameters using the clock information of its neighbors, and the update formulas for the logical clock frequency compensation parameter and phase compensation parameter are as follows: (9), (10), where the symbols denote, in order: the set of numbers of the neighboring nodes from which the node received synchronization information within a given synchronization cycle; the number of such neighbors in that cycle; the propagation delay between the two nodes during synchronization communication; the logical clock values of the two nodes in that synchronization round; and the hardware clock frequency offset between the two nodes; for the node and any of its neighboring nodes, the relative frequency offset between the two is calculated, and the node then uses the propagation delay obtained in the second step to correct the logical clock value, which serves as the basis for updating the logical clock parameters.

8. The distributed time synchronization method for UAV swarms according to claim 7, characterized in that the third step, which combines the previously obtained synchronization information to estimate and update the logical clock parameters, includes the following process: (a) if the node and the neighboring node exchanged information using both multiple broadcasts and bidirectional synchronization, and the reference propagation delay has been calculated using formula (5), the relative frequency offset is calculated using formula (11): (11), together with its auxiliary definitions, where the remaining symbols denote the number of broadcasts per round and the current synchronization round; (b) if the node and the neighboring node exchanged information using multiple broadcasts only, and the propagation delay of the current round has been calculated using formulas (7) and (8), the relative frequency offset is calculated using formula (12): (12), together with its auxiliary definitions, where the remaining symbols denote the reference synchronization round in which multi-broadcast and bidirectional synchronization was last performed and the current synchronization round; once the node has calculated, for the current synchronization period, its relative frequency offsets with all neighbors and the corresponding propagation delays, it updates its logical clock adjustment parameters using formulas (9) and (10), completing the time synchronization process.

9. The distributed time synchronization method for UAV swarms according to claim 8, characterized in that the fourth step, adaptive selection of the synchronization period based on reinforcement learning, includes the following process: (1) nonlinear quantization and dimensionality-reduction mapping of the state space: the synchronization state of a node in the current synchronization round is measured by the variance of its local logical clock differences in the previous round together with the current synchronization trend; to ensure convergence in finite time, the variance is discretized non-uniformly, the continuous variance space being divided into a number of discrete state levels such that frequently visited regions are partitioned more finely; in addition, the temporal trend of the variance is introduced, taking three values that indicate whether the current variance is worsening, improving, or remaining stable compared with the previous round; combining the one-dimensional variance level orthogonally with the one-dimensional trend yields a discrete state space whose size is three times the number of variance levels, which avoids the curse of dimensionality; the number of variance levels is determined by the maximum allowable synchronization error of the drone swarm, and the larger the allowable maximum synchronization error, the larger the value; (2) action space initialization: the action a drone node must decide in each state is the time interval until its next synchronization broadcast; taking into account the flight characteristics of the swarm and the control frequency requirements, the action space is defined as a set of discrete absolute time periods, in seconds; the size of the set is determined by the required synchronization precision, and the higher the required precision, the larger the value; a smaller action value means more frequent state interaction and suits the rapid error-correction phase, while a larger action value extends the node's quiet period and suits the case where the error is very small and well inside the safe region, so as to conserve communication resources as far as possible; (3) reward function design: the design of the reward mechanism determines the form to which the policy finally converges; to guide the system to maximize the communication period inside the safe region while strictly avoiding the risk of leaving it, the algorithm introduces a normalized variance, defined from the node's local variance in the current synchronization round and the maximum allowable variance for drone swarm synchronization; the reward that the node obtains in a given round from the current synchronization state of the drone swarm can be expressed as: (13), where the symbols denote the synchronization period selected in the previous round, the accuracy and period gain weights, the synchronization-trend feedforward compensation term, the cliff penalty constant, and the linear penalty constant for out-of-bounds deviation, the constant terms of the reward function being empirical values calibrated through extensive simulation experiments; since the state transition probability matrix is unknown in a highly dynamic environment, this invention employs a Q-Learning method based on temporal difference (TD) learning to approximate the optimal policy, and the update of the value function follows the sampled iteration of the Bellman optimality equation: (14), where the value function gives the value of selecting a given action in a given state; the learning rate adjusts the single-step update magnitude of the action-value table (Q-table), a higher learning rate giving a larger single-step update; the discount factor characterizes how much weight the agent places on maintaining high-precision synchronization over the long term, a larger discount factor indicating a greater emphasis on future synchronization accuracy; the learning rate and discount factor are empirical values calibrated through extensive simulation experiments; (4) iteration of reinforcement learning: after the reinforcement learning model described above is established, the iterative process of adaptive synchronization period selection begins; at the end of a round of synchronization information exchange, the node calculates the variance of its local logical clock differences and the current synchronization trend and maps them to the discrete state space according to process (1); it also calculates, according to formula (13), the reward for the action selected in the previous synchronization round, and updates the value of the corresponding state-action pair in the Q-table using formula (14); finally, based on the current Q-table, it uses the ε-greedy strategy to select the period for the next synchronization information exchange, where ε is the exploration rate parameter, with a value range of [value missing], which balances exploration and exploitation; the larger its value, the more likely the agent is to select a random action in order to explore unknown environmental states.