Fuzzy reward function aided local graph attention unmanned aerial vehicle ad hoc network routing method

The local graph attention-based UAV ad hoc network routing method assisted by fuzzy reward function solves the network problems caused by rapid topology changes in UAV ad hoc networks, achieves efficient data transmission and stable routing selection, adapts to different topologies, and reduces network congestion.

CN116723531B · Active · Publication Date: 2026-06-12 · SOUTH CHINA UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SOUTH CHINA UNIV OF TECH
Filing Date
2023-05-11
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing drone ad hoc network routing protocols suffer from high network overhead, high end-to-end latency, and network congestion when faced with frequent changes in network topology caused by rapid movement. Furthermore, traditional deep reinforcement learning methods are difficult to effectively generalize to unseen network topologies, and the high complexity of reward function design leads to unstable agent behavior.

Method used

A local graph attention-based UAV ad hoc network routing method with fuzzy reward function assistance is proposed. By constructing a multi-agent distributed routing strategy with local interaction, the method utilizes the SLGAT network to calculate the predicted scores of neighboring nodes, and processes the reward function through fuzzy logic to achieve adaptation to dynamic environments and stable route selection.

Benefits of technology

It improves the adaptability of UAV self-organizing networks to dynamic environments, reduces network congestion, enhances data transmission efficiency, has good generalization and scalability, and simplifies the implementation process of routing protocols.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a local graph attention unmanned aerial vehicle ad hoc network routing method assisted by a fuzzy reward function, and comprises the following steps: initializing an unmanned aerial vehicle cluster network model; constructing a task-oriented HELLO packet; when a node has a data packet forwarding task, the matching degree of a neighbor node to the routing task is calculated, the prediction score of the neighbor unmanned aerial vehicle node to the routing task is calculated in turn, and the neighbor node with the maximum prediction score is selected as the next hop for routing of data packet forwarding; position, speed and data packet arrival rate information about the node in a routing feedback HELLO packet is extracted, fuzzy processing is performed, and a fuzzy reward is calculated; a multi-agent distributed routing strategy updating framework based on local interaction is constructed, the evaluation network parameter of the node is updated, and the parameter of the evaluation network is assigned to a target network. The application improves the adaptability of the unmanned aerial vehicle self-organizing network to a dynamic environment, reduces network congestion, and improves the data transmission efficiency of networking.

Description

Technical Field

[0001] This invention relates to the field of unmanned aerial vehicle (UAV) self-organizing network technology, and specifically to a fuzzy reward function-assisted local graph attention UAV self-organizing network routing method.

Background Technology

[0002] Thanks to significant advancements in sensors, high-precision molds, and communication technologies, flying ad-hoc networks (FANETs) comprised of multiple unmanned aerial vehicles (UAVs) have been widely adopted in emergency communications, smart agriculture, and remote sensing due to their high mobility, ease of deployment, and scalability. In FANETs, UAV nodes forward data packets using a multi-hop approach, and communication between nodes does not rely on any other infrastructure. Therefore, routing between nodes in a FANET is a key factor limiting network performance, posing a significant challenge to collaborative communication and networking. In recent years, routing algorithms for UAV networking have become a hot topic in the field of multi-UAV communication.

[0003] However, in a FANET the rapid movement of drones causes frequent changes in network topology. Protocols that rely on pre-built routing tables therefore incur significant network overhead, while traditional reactive routing protocols (such as AODV) require frequent route discovery, leading to high end-to-end latency. Existing technologies incorporate geographic location information into routing protocols, as in the classic Greedy Perimeter Stateless Routing (GPSR) protocol. However, GPSR considers only the geographic location of nodes, and a node always forwards greedily to the neighbor closest to the destination. This can lead to network congestion and packet loss when network traffic increases.

[0004] In recent years, deep reinforcement learning (DRL) has achieved remarkable results in solving decision-making and intelligent control problems. Therefore, many studies have focused on introducing DRL technology into the routing optimization problem of FANET, such as: implementing DRL using deep neural networks (DNNs) and realizing routing decisions through a centralized training and distributed execution approach; improving geographic routing protocols using DRL technology and using the Proximal Policy Optimization (PPO) algorithm to minimize the number of hops while avoiding routing holes; and introducing LSTM into DRL to design a multi-agent routing protocol to collaboratively consider the minimum hop count and network congestion.

[0005] However, although computer networks can be naturally represented as graph structures, existing DRL-based routing protocols typically use traditional neural networks (such as DNNs and recurrent neural networks), which cannot learn the structural features of graphs. They are therefore difficult to generalize effectively to a network topology that has never been seen before. In addition, DRL algorithms place strict requirements on the design of the reward function: subtle changes in the reward value can lead to abnormal agent behavior, which reduces the reliability and stability of the routing protocol.

Summary of the Invention

[0006] To overcome the defects and shortcomings of existing technologies, this invention provides a fuzzy reward function-assisted local graph attention UAV ad hoc network routing method. This invention improves the adaptability of UAV ad hoc networks to dynamic environments, reduces network congestion, and enhances the data transmission efficiency of the network.

[0007] To achieve the above objectives, the present invention adopts the following technical solution:

[0008] A fuzzy reward function-assisted local graph attention UAV ad hoc network routing method includes the following steps:

[0009] Initialize the drone swarm network model;

[0010] Construct task-oriented HELLO packets, including neighbor discovery HELLO packets and route feedback HELLO packets;

[0011] SLGAT-based routing: When a node has a packet forwarding task, it calculates the matching degree of the neighboring nodes for the routing task, calculates the prediction score of the neighboring drone nodes for the routing task in turn, and selects the neighboring node with the highest prediction score as the next hop for packet forwarding.

[0012] Extract information about node location, speed, and data packet arrival rate from the HELLO packets in the routing feedback, perform fuzzification processing, and calculate fuzzy rewards;

[0013] A multi-agent distributed routing policy update framework based on local interaction is constructed to update the evaluation network parameters of the nodes. When the update time of the target network arrives, the parameters of the evaluation network are assigned to the target network.

[0014] As a preferred technical solution, the initialization of the UAV swarm network model specifically includes:

[0015] A cluster of multiple drones is initialized in three-dimensional space. Each drone node is regarded as an independent intelligent agent and maintains an evaluation network and a target network respectively. The evaluation network and the target network are randomly initialized with the same parameters. The evaluation network is used to select the next hop, and the target network is used to optimize the routing strategy. Each drone node has an experience pool, which is used to store the experience value generated by the node for each forward.

[0016] As a preferred technical solution, the content of the neighbor discovery HELLO packet includes: the HELLO packet function indicator "0", the HELLO packet sequence number, the ID of the node that sent the HELLO packet, its own position coordinates $(x_i, y_i, z_i)$, speed, and data packet arrival rate $R_i$, and, for each node $k$ in its set of first-order neighbors, the neighbor's ID, position coordinates $(x_k, y_k, z_k)$, speed, and packet arrival rate $R_k$;

[0017] The data packet arrival rate $R_k$ is calculated as:

[0018]

[0019] where $t_i$ denotes the time elapsed from startup to the present, and PR denotes the number of data packets received by the node during $t_i$.
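A minimal Python sketch of this bookkeeping follows. It assumes the rate is simply the packet count divided by the elapsed time, $R_i = PR / t_i$, which is one reading consistent with the definitions of $t_i$ and PR above. The class name `ArrivalRateTracker`, its `clock` parameter, and the fallback to 0.0 when no time has elapsed are hypothetical and are not taken from the patent.

```python
import time


class ArrivalRateTracker:
    """Tracks packets received since startup and the resulting arrival rate."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable time source (hypothetical helper)
        self._start = clock()        # startup time
        self.packets_received = 0    # PR
        self.rate = 0.0              # R_i

    def on_packet(self) -> float:
        """Recompute the rate each time a data packet is received."""
        self.packets_received += 1
        elapsed = self._clock() - self._start  # t_i
        # Assumed formula: R_i = PR / t_i, with 0.0 when no time has elapsed.
        self.rate = self.packets_received / elapsed if elapsed > 0 else 0.0
        return self.rate
```

Passing a custom `clock` keeps the class testable without waiting on real time; a deployed node would use the default monotonic clock.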

[0020] As a preferred technical solution, the content of the routing feedback HELLO packet includes: the HELLO packet's function indicator "1", the position $(x_j, y_j, z_j)$ of node j, its speed, its packet arrival rate $R_j$, the routing task completion indicator $f_j$, and the component $Q_j$ used to update the network parameters of node i;

[0021] The routing task completion indicator $f_j$ is calculated as:

[0022]

[0023] where $u_0$ denotes the ID of the target node and $u_j$ denotes the ID of the node that sends the HELLO packet.
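One reading of the indicator that is consistent with these definitions, and with its use as a terminal flag in the target $y_i = r_i + \gamma Q_j \times (1 - f_j)$, is an equality test on node IDs. This is an assumption rather than the patent's stated formula:

```latex
f_j =
\begin{cases}
1, & u_j = u_0 \\
0, & \text{otherwise}
\end{cases}
```

Under this reading, $f_j = 1$ removes the bootstrapped term $\gamma Q_j$ once the packet has reached its destination.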

[0024] As a preferred technical solution, the steps of constructing a first-order neighbor table and constructing a second-order neighbor table are also included.

[0025] The steps for constructing a first-order neighbor table include:

[0026] When a drone node j receives a HELLO packet from a neighbor, it checks the ID of the node that initiated the HELLO packet. If the ID is not in its own neighbor table, it adds the information of node i to its own first-order neighbor table. If a record of node i already exists in the first-order neighbor table, it compares the sequence number of the HELLO packet in that record with the sequence number of the currently received HELLO packet. If the sequence number in the record is smaller, it updates the information of node i in the first-order neighbor table.

[0027] The steps for constructing a second-order neighbor table include:

[0028] Check the neighbor node information carried in the HELLO packet. If the neighbor node ID in the HELLO packet is equal to the node's own ID, skip the construction of the second-order neighbor table entry corresponding to the neighbor's ID. If they are not equal, check if there is a record of the neighbor's ID in the node's own second-order neighbor table. If not, add the node's information to the second-order neighbor table. If there is a record of node k in the second-order neighbor table, determine whether to update the information based on the sequence number of the HELLO packet.
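The two table-building procedures can be sketched in Python as follows. This is an illustrative sketch only: the names `NeighborEntry`, `upsert`, and `process_hello` are hypothetical, the tables are modeled as dictionaries keyed by node ID, and each neighbor record listed in a HELLO packet is assumed to carry that packet's sequence number.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class NeighborEntry:
    node_id: int
    position: Vec3
    velocity: Vec3
    arrival_rate: float
    seq: int  # sequence number of the HELLO packet that carried this record


def upsert(table: Dict[int, NeighborEntry], entry: NeighborEntry) -> bool:
    """Add a new record, or replace one that has a smaller sequence number."""
    known = table.get(entry.node_id)
    if known is None or known.seq < entry.seq:
        table[entry.node_id] = entry
        return True
    return False  # the stored record is at least as fresh, so keep it


def process_hello(self_id: int,
                  first_order: Dict[int, NeighborEntry],
                  second_order: Dict[int, NeighborEntry],
                  sender: NeighborEntry,
                  sender_neighbors: Iterable[NeighborEntry]) -> None:
    """Update both neighbor tables from one neighbor discovery HELLO packet."""
    upsert(first_order, sender)
    for neighbor in sender_neighbors:
        if neighbor.node_id == self_id:
            continue  # the receiver is not its own second-order neighbor
        upsert(second_order, neighbor)
```

Comparing sequence numbers before overwriting means a delayed HELLO packet cannot replace fresher position and congestion data.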

[0029] As a preferred technical solution, the input layer of the SLGAT network is a fully connected layer used to obtain the hidden representation of the feature states of neighboring nodes. The parameter W of the fully connected layer has a dimension of h×f, where h is the number of node states and f is the feature dimension of the hidden layer. A single-layer three-headed graph attention network is connected after the fully connected layer. The output layer is a fully connected layer used to transform the feature states of the neighbor into the corresponding task prediction scores.

[0030] As a preferred technical solution, the SLGAT-based routing selection involves the following steps:

[0031] When node i has a packet forwarding task, it starts observing the environment and obtains the state $s_i$, specifically expressed as:

[0032]

[0033] where $S_i$ is a vector representing the normalized state of node i, specifically expressed as:

[0034]

[0035] where $x_{max}$, $y_{max}$, and $z_{max}$ denote the maximum distances that a drone node is allowed to reach in the x, y, and z directions, respectively. The remaining symbols denote the maximum speed of a drone node in the x, y, and z directions, the set of first-order neighbors of node i, and the maximum packet arrival rate among the neighboring nodes of node i;
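The normalization described here can be sketched as below. The exact layout of $S_i$ is an assumption: the sketch divides each position component by the corresponding maximum distance, each velocity component by the corresponding maximum speed, and the packet arrival rate by the maximum rate among the neighbors. The function name `normalize_state` and the seven-element ordering are hypothetical.

```python
from typing import List, Sequence


def normalize_state(position: Sequence[float],
                    velocity: Sequence[float],
                    arrival_rate: float,
                    max_position: Sequence[float],
                    max_velocity: Sequence[float],
                    max_arrival_rate: float) -> List[float]:
    """Return an assumed normalized state [x, y, z, vx, vy, vz, R]."""
    pos = [p / p_max for p, p_max in zip(position, max_position)]
    vel = [v / v_max for v, v_max in zip(velocity, max_velocity)]
    # Guard against an idle neighborhood where no packets have arrived yet.
    rate = arrival_rate / max_arrival_rate if max_arrival_rate > 0 else 0.0
    return pos + vel + [rate]
```

For example, `normalize_state((50, 100, 25), (5, -10, 0), 2.0, (100, 200, 50), (10, 10, 10), 4.0)` yields a vector whose position and rate entries all equal 0.5.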

[0036] The experience is stored in an experience pool, and a single experience is randomly selected from the pool for training. The specific process includes:

[0037] With state $s_i$ as input, the evaluation network of node i processes each element of the state in sequence to obtain the task prediction scores corresponding to node i and its neighboring nodes. Each neighbor of node i is processed serially, so that the prediction scores of all neighbors of node i are obtained in turn. Action $a_i$ is then selected according to the greedy strategy;

[0038] Data packet p is forwarded from node i to node j, and node j calculates the reward $r_j$. The state at the next moment becomes node j's observation of its neighboring nodes. At the same time, node j sends a route feedback HELLO packet back to node i.

[0039] As a preferred technical solution, the calculation of the fuzzy reward includes the following specific steps:

[0040] After neighbor node $u_j$ is selected as the next hop, the position coordinates $(x_j, y_j, z_j)$, speed, and packet arrival rate $R_j$ of node $u_j$ are obtained from the node's own neighbor table;

[0041] The distance difference, link duration, and congestion level are fuzzified;

[0042] Construct fuzzy rules;

[0043] The fuzzy reward calculated by UAV node i is represented as follows:

[0044]

[0045] where L denotes the number of fuzzy rules, $r_{max}$ denotes the maximum reward, $r_{min}$ denotes the minimum reward, $\omega_l$ denotes the strength of the $l$-th fuzzy rule, and $r_l$ denotes the pre-defined reward associated with that rule;

[0046] If the next hop is the destination, the maximum reward is given; if the next hop is a local minimum, the minimum reward is given; otherwise, a fuzzy reward is calculated.
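The three-way reward rule can be sketched as follows. The weighted average $\sum_l \omega_l r_l / \sum_l \omega_l$ used for the fuzzy branch is an assumed, commonly used defuzzification and may differ from the patent's exact expression. The function name `fuzzy_reward`, the boolean flags, and the fallback to $r_{min}$ when no rule fires are hypothetical.

```python
from typing import Sequence


def fuzzy_reward(strengths: Sequence[float],
                 rule_rewards: Sequence[float],
                 r_max: float,
                 r_min: float,
                 next_hop_is_destination: bool = False,
                 next_hop_is_local_minimum: bool = False) -> float:
    """Return r_max, r_min, or a rule-strength-weighted reward."""
    if next_hop_is_destination:
        return r_max
    if next_hop_is_local_minimum:
        return r_min
    total = sum(strengths)
    if total == 0:
        return r_min  # assumption: treat "no rule fired" as the worst case
    # Assumed defuzzification: average of r_l weighted by rule strength w_l.
    return sum(w * r for w, r in zip(strengths, rule_rewards)) / total
```

Because the output is a blend of pre-defined rule rewards, small changes in the inputs shift the reward gradually instead of switching it abruptly.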

[0047] As a preferred technical solution, fuzzifying the distance difference, link duration, and congestion level specifically includes:

[0048] The formula for calculating the distance difference is as follows:

[0049] $\mathrm{DDIFF} = d_{nexthop} - d_{myself}$

[0050] $d_{myself} = \sqrt{(x_i - x_0)^2 + (y_i - y_0)^2 + (z_i - z_0)^2}$

[0051] $d_{nexthop} = \sqrt{(x_j - x_0)^2 + (y_j - y_0)^2 + (z_j - z_0)^2}$

[0052] where DDIFF denotes the distance difference, $d_{myself}$ denotes the distance between the current node and the destination node, $d_{nexthop}$ denotes the distance between the next-hop node and the destination node, $(x_0, y_0, z_0)$ denotes the position coordinates of the destination node, $(x_i, y_i, z_i)$ denotes the position coordinates of the node itself, and $(x_j, y_j, z_j)$ denotes the position coordinates of the next-hop node;

[0053] Three fuzzy sets are defined to indicate that the next-hop node is close to, at a medium distance from, or far from the target node. The membership functions corresponding to the three fuzzy sets are as follows:

[0054] $f_C(d) = \mathrm{clip}[-10d, 0, 1]$

[0055] $f_M(d) = \mathrm{clip}[\min(10d + 1, -10d + 1), 0, 1]$

[0056] $f_F(d) = \mathrm{clip}[10d, 0, 1]$

[0057] Where clip represents the truncation function;
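A short Python sketch of the distance difference and its three membership functions follows. The membership formulas are taken directly from the text, and distances are assumed to be Euclidean. The scale of d is not stated here, so the sketch applies the formulas to whatever value it is given. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Truncate x to the interval [lo, hi]."""
    return max(lo, min(x, hi))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def ddiff(current: Sequence[float],
          next_hop: Sequence[float],
          destination: Sequence[float]) -> float:
    """DDIFF = d_nexthop - d_myself; negative means the next hop is closer."""
    return euclidean(next_hop, destination) - euclidean(current, destination)


def ddiff_memberships(d: float) -> Dict[str, float]:
    """Degrees to which d belongs to the Close, Middle, and Far sets."""
    return {
        "C": clip(-10 * d, 0, 1),
        "M": clip(min(10 * d + 1, -10 * d + 1), 0, 1),
        "F": clip(10 * d, 0, 1),
    }
```

With these formulas, d = 0 belongs fully to the Middle set, and the Close and Far sets saturate at d = -0.1 and d = 0.1 respectively.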

[0058] The formula for calculating link duration is as follows:

[0059]

[0060] where

[0061] Normalization operation:

[0062] Three fuzzy sets are defined to indicate whether the link duration between the next-hop node and the current node is long, medium, or short. The membership functions corresponding to the three fuzzy sets are as follows:

[0063] $g_L(l) = \mathrm{clip}[-2l + 1, 0, 1]$

[0064] $g_M(l) = \mathrm{clip}[\min(2l, -2l + 2), 0, 1]$

[0065] $g_S(l) = \mathrm{clip}[2l - 1, 0, 1]$

[0066] The packet arrival rate $R_j$ carried in the route feedback HELLO packet is used as a measure of congestion. Three fuzzy sets are defined to indicate low, medium, and high levels of congestion at the next-hop node. The membership functions corresponding to the three fuzzy sets are as follows:

[0067] $h_L(R_j) = \mathrm{clip}[-2.5R_j + 1.25, 0, 1]$

[0068] $h_M(R_j) = \mathrm{clip}[\min(2.5R_j - 0.25, -2.5R_j + 2.25), 0, 1]$

[0069] $h_H(R_j) = \mathrm{clip}[2.5R_j - 1.25, 0, 1]$.
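The link-duration and congestion membership functions translate directly into code. The sketch below copies the formulas as written and assumes that both l and $R_j$ have already been normalized to a range on which these slopes are meaningful. The function names are hypothetical.

```python
from typing import Dict


def clip(x: float, lo: float, hi: float) -> float:
    """Truncate x to the interval [lo, hi]."""
    return max(lo, min(x, hi))


def ldt_memberships(l: float) -> Dict[str, float]:
    """Memberships g_L, g_M, g_S of the normalized link duration l."""
    return {
        "L": clip(-2 * l + 1, 0, 1),
        "M": clip(min(2 * l, -2 * l + 2), 0, 1),
        "S": clip(2 * l - 1, 0, 1),
    }


def congestion_memberships(r: float) -> Dict[str, float]:
    """Memberships h_L, h_M, h_H of the packet arrival rate R_j."""
    return {
        "L": clip(-2.5 * r + 1.25, 0, 1),
        "M": clip(min(2.5 * r - 0.25, -2.5 * r + 2.25), 0, 1),
        "H": clip(2.5 * r - 1.25, 0, 1),
    }
```

Both families peak in the middle set at an input of 0.5, which gives a quick sanity check when tuning the breakpoints.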

[0070] As a preferred technical solution, a multi-agent distributed routing policy update framework based on local interaction is constructed, and the specific steps include:

[0071] After node i completes the forwarding of the data packet and obtains relevant experience information, it stores the experience in the experience pool. During the training phase, node i randomly selects n mini-batch samples from the experience pool to train the network.

[0072] The loss function is defined as follows:

[0073]

[0074] where $y_i = r_i + \gamma Q_j \times (1 - f_j)$ and $\gamma$ is the discount factor;

[0075] After node j receives the data packet forwarded by node i, it updates the task completion indicator $f_j$ and the network-parameter update component $Q_j$, and sends them back to node i in the route feedback HELLO packet;

[0076] The parameter update process of the evaluation network for node i is as follows:

[0077]

[0078] Where β is the learning rate;

[0079] When the target network's update time arrives, the parameters of the evaluation network are assigned to the target network:

[0080]

[0081] where the two parameter symbols denote, in order, the target network parameters and the evaluation network parameters.
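The update cycle can be sketched in plain Python as follows. The target $y_i = r_i + \gamma Q_j (1 - f_j)$ is taken from the text. The mean-squared-error loss over the mini-batch and the plain gradient-descent step with learning rate β are assumptions in the style of standard deep Q-learning. All function names are hypothetical, and parameters are modeled as flat lists of floats.

```python
from typing import List, Sequence


def td_target(reward: float, q_next: float, f_done: int, gamma: float) -> float:
    """y_i = r_i + gamma * Q_j * (1 - f_j); the bootstrap term vanishes when f_j = 1."""
    return reward + gamma * q_next * (1 - f_done)


def mse_loss(predicted_q: Sequence[float], targets: Sequence[float]) -> float:
    """Assumed loss: mean squared error between targets and predicted scores."""
    return sum((y - q) ** 2 for q, y in zip(predicted_q, targets)) / len(targets)


def gradient_step(params: Sequence[float],
                  grads: Sequence[float],
                  beta: float) -> List[float]:
    """Assumed update: move each parameter against its gradient, scaled by beta."""
    return [p - beta * g for p, g in zip(params, grads)]


def sync_target(eval_params: Sequence[float]) -> List[float]:
    """Copy the evaluation network parameters into the target network."""
    return list(eval_params)
```

Returning a copy in `sync_target` keeps later updates to the evaluation parameters from leaking into the target network between synchronizations.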

[0082] Compared with the prior art, the present invention has the following advantages and beneficial effects:

[0083] (1) The input of existing graph attention networks is usually the features of all nodes in the graph. For the UAV routing problem, since it is a partially observable Markov decision problem, each node can only obtain the state of its neighboring nodes. This invention uses a deep reinforcement learning method assisted by a local graph attention network to mine the local graph structure information in the network through the graph attention mechanism. It has good generalization ability, can be applied to different topologies and can adapt to the rapid movement of UAV nodes. At the same time, it gets rid of the requirement of obtaining the global node state vector of traditional graph attention networks, which is conducive to the deployment of routing algorithms in practical applications.

[0084] (2) In existing methods, the output layer dimension of the neural network is equal to the number of all nodes in the network. However, if a new node is added to the network, the network structure needs to be modified and retrained. This invention uses a serial method to calculate the task prediction score of neighboring nodes, which solves the problem that the traditional routing protocol based on deep neural networks requires modification of the network structure and retraining due to the fixed output layer dimension of the neural network. It can adapt to the changes in the scale of UAV networking and has good scalability and extensibility.

[0085] (3) The present invention calculates the reward function through fuzzy logic, which can effectively reduce the design difficulty of the reward function and enable the agent to avoid abnormal behavior caused by the difference in reward value.

[0086] (4) This invention realizes a fully decentralized multi-agent routing strategy construction in a highly dynamic UAV self-organizing network. Compared with the traditional centralized training and distributed execution framework, this invention greatly simplifies routing implementation, and decentralized training is more in line with the current self-organizing network environment.

Attached Figure Description

[0087] Figure 1 is a schematic diagram of the implementation framework of the fuzzy reward function-assisted local graph attention UAV ad hoc network routing method of the present invention;

[0088] Figure 2 is a flowchart of the fuzzy reward function-assisted local graph attention UAV ad hoc network routing method of the present invention;

[0089] Figure 3 is a performance graph of the packet transmission rate of the present invention under different drone node speeds;

[0090] Figure 4 is a performance graph of the average end-to-end latency of the present invention under different drone node speeds;

[0091] Figure 5 is a performance graph of the throughput of the present invention under different drone node speeds.

Detailed Implementation

[0092] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0093] Example

[0094] As shown in Figure 1, this embodiment provides a fuzzy reward function-assisted local graph attention UAV ad hoc network routing method, including the following steps:

[0095] Step 1) Initialize the drone swarm network model:

[0096] Initialize a cluster of K drones in three-dimensional space. The boundary points of this three-dimensional space are $(x_{max}, 0, 0)$, $(0, y_{max}, 0)$, and $(0, 0, z_{max})$. For each drone node i, the position coordinates are denoted $(x_i, y_i, z_i)$, and the node also has a speed, a set of first-order neighbors, and a set of second-order neighbors. Each drone node is considered an independent intelligent agent that maintains two SLGAT networks, referred to as the "evaluation network" and the "target network", each with its own parameters. Both networks are randomly initialized with the same parameters. The evaluation network is used to select the next hop, while the target network is used to optimize the routing strategy. Each node also has an experience pool with a capacity of $C_{max}$. The experience pool stores the experience generated by the node on each forwarding, which improves the stability and efficiency of SLGAT network training.

[0097] Step 2) Construct a task-oriented HELLO packet. The header of the HELLO packet uses 0 and 1 to distinguish its purpose: 0 indicates the packet is used for neighbor discovery, and 1 indicates it is used for policy update feedback. In this embodiment, the construction of the task-oriented HELLO packet is divided into the construction of a neighbor discovery HELLO packet and the construction of a route feedback HELLO packet, specifically including:

[0098] (2a): All drone nodes broadcast a neighbor discovery HELLO packet at a fixed period T. The content of the neighbor discovery HELLO packet includes: the HELLO packet function indicator "0", the sequence number of the HELLO packet, the ID $u_i$ of the node that sent the HELLO packet, its own position coordinates $(x_i, y_i, z_i)$, speed, and data packet arrival rate $R_i$, and, for each of its neighbors, the neighbor's ID $u_k$, position coordinates $(x_k, y_k, z_k)$, speed, and packet arrival rate $R_k$. The data packet arrival rate reflects the current congestion status of the node, and it is calculated as follows:

[0099]

[0100] where $t_i$ denotes the time elapsed from startup to the present, and PR denotes the number of data packets received by this node during $t_i$. $R_i$ is calculated in real time: the drone node updates $R_i$ as soon as it receives a data packet;

[0101] (2b): Construction of the first-order neighbor table: a drone node that receives a neighbor discovery HELLO packet first checks the ID $u_i$ of the node that initiated the HELLO packet. If $u_i$ is not in its own neighbor table, the information of node i (ID, position, speed, $R_i$, and the sequence number of the HELLO packet) is added to the receiving node's own first-order neighbor table. If the first-order neighbor table already contains a record of node i, the sequence number of the HELLO packet in the record is compared with the sequence number of the currently received HELLO packet. If the sequence number in the record is smaller, the information of node i in the first-order neighbor table is updated; otherwise, the HELLO packet is ignored.

[0102] (2c): Construction of the second-order neighbor table: the node next checks the neighbor node information carried in the HELLO packet. If a "neighbor node ID" $u_k$ in the HELLO packet equals the node's own ID $u_j$, the second-order neighbor entry corresponding to $u_k$ is skipped. If they are not equal, the node checks whether a record of $u_k$ exists in its second-order neighbor table. If no record exists, the information of node k is added to the second-order neighbor table (including the ID $u_i$ of node k's first-order neighbor i, $u_k$, $(x_k, y_k, z_k)$, $R_k$, and the sequence number of the HELLO packet). If a record for node k already exists in the second-order neighbor table, the sequence number of the HELLO packet determines whether to update the information.

[0103] (2d): Construction of the route feedback HELLO packet: the route feedback HELLO packet is constructed and fed back to the previous-hop node only when the current node j receives a data packet transmitted from the previous-hop node i. The content of the route feedback HELLO packet includes: the HELLO packet function indicator "1", the position $(x_j, y_j, z_j)$ of node j, its speed, its packet arrival rate $R_j$, the routing task completion indicator $f_j$, and the component $Q_j$ used to update the network parameters of node i.

[0104] In this embodiment, the routing task completion indicator $f_j$ is calculated as follows:

[0105]

[0106] where $u_0$ denotes the ID of the target node.

[0107] In this embodiment, $Q_j$ is calculated using $Q(\cdot)$, the SLGAT target network of node j itself with its own parameters, and $s_j$, node j's observation of the environment.

[0108] In this embodiment, node i updates its own SLGAT evaluation network and the target network through the HELLO packet returned by node j, thereby optimizing the policy.
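The two task-oriented HELLO packet types described in Step 2 can be modeled as simple records. The sketch below is illustrative only: the field list follows the text, while the class names, field names, and in-memory layout are hypothetical and say nothing about the on-air encoding.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class NodeInfo:
    node_id: int
    position: Vec3
    velocity: Vec3
    arrival_rate: float  # R, the data packet arrival rate


@dataclass
class NeighborDiscoveryHello:
    seq: int                                  # HELLO packet sequence number
    sender: NodeInfo                          # the broadcasting node's own state
    neighbors: List[NodeInfo] = field(default_factory=list)
    function_indicator: int = 0               # 0 = neighbor discovery


@dataclass
class RouteFeedbackHello:
    position: Vec3                            # position of node j
    velocity: Vec3
    arrival_rate: float                       # R_j
    task_complete: int                        # f_j
    q_component: float                        # Q_j, used by node i in its update
    function_indicator: int = 1               # 1 = policy update feedback


def is_route_feedback(packet) -> bool:
    """Dispatch on the header flag, as a receiver would."""
    return packet.function_indicator == 1
```

Keeping the function indicator as a defaulted field lets a receiver branch on a single header value before touching the rest of the packet.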

[0109] Step 3) SLGAT-based route selection: When a node has a packet forwarding task, calculate the matching degree of the neighboring nodes for the routing task, then calculate the prediction score of the neighboring drone nodes for the routing task in turn, and select the neighboring node with the highest prediction score as the next hop for packet forwarding:

[0110] The specific structure of the SLGAT network in this embodiment is as follows:

[0111] The input layer of the SLGAT network is a fully connected layer used to obtain hidden representations of the feature states of neighboring nodes. The parameter W of the fully connected layer has a dimension of h×f, where h is the number of node states and f is the feature dimension of the hidden layer. This is followed by a single-layer three-headed graph attention network. The final output layer is a fully connected layer with weight $W_2$ and bias b, which transforms the feature state of the neighbor into the corresponding task prediction score.

[0112] (3a): When node i has a packet forwarding task, it starts observing the environment and obtains the state $s_i$. The state $s_i$ is a list in which each element is a matrix. The list contains the states of node i and its neighbors, and the states of each neighboring node of i and that node's neighbors. The format of $s_i$ is as follows:

[0113]

[0114] where $S_i$ is a vector representing the normalized state of node i, with the following form:

[0115]

[0116] where $x_{max}$, $y_{max}$, and $z_{max}$ denote the maximum distances that a drone node is allowed to reach in the x, y, and z directions, respectively. The remaining symbols denote the maximum speed of a drone node in the x, y, and z directions, the set of first-order neighbors of node i, and the maximum packet arrival rate among i's neighboring nodes.

[0117] The experience is stored in an experience pool, and a single experience is randomly selected from the pool for training. The specific process includes:

[0118] With state $s_i$ as input, the evaluation network of node i processes each element of the state in sequence, yielding the task prediction scores for node i and its neighboring nodes. Taking the calculation of the prediction score for node j as an example, the steps are as follows:

[0119] First, obtain the hidden layer representation of the state of j and its neighboring nodes:

[0120] $h_k = W S_k, \quad k = j, i, \ldots, f$

[0121] Next, we calculate the attention coefficients between j and its neighboring nodes:

[0122]

[0123] where "||" represents the concatenation operation, a is a trainable weight vector, and σ(·) is the LeakyReLU activation function.

[0124] Therefore, the output feature of node j is:

[0125]

[0126] Finally, the prediction score for that node is obtained by inputting $x'_j$ into the output layer of SLGAT: $Q_j = W_2 x'_j + b$;
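The scoring steps for node j can be sketched in dependency-free Python. Several details are assumptions in the style of a standard graph attention layer: attention logits of the form σ(aᵀ[h_j || h_k]), softmax normalization over j and its neighbors, a LeakyReLU slope of 0.2, and a single attention head instead of three. The sketch stores W with one row per hidden feature so that $W S_k$ is a matrix-vector product. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def softmax(xs: Sequence[float]) -> Vector:
    top = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(states: Sequence[Sequence[float]],
                   W: Sequence[Sequence[float]],
                   a: Sequence[float]) -> Vector:
    """states[0] is node j; the rest are its neighbors. Returns x'_j."""
    hidden = [matvec(W, s) for s in states]          # h_k = W S_k
    h_j = hidden[0]
    # Assumed logit: LeakyReLU(a^T [h_j || h_k]) for every k, including j.
    logits = [leaky_relu(sum(w * x for w, x in zip(a, h_j + h_k)))
              for h_k in hidden]
    alphas = softmax(logits)
    return [sum(alpha * h_k[d] for alpha, h_k in zip(alphas, hidden))
            for d in range(len(h_j))]


def prediction_score(states, W, a, W2: Sequence[float], b: float) -> float:
    """Q_j = W2 x'_j + b, with W2 as a single output row."""
    x_j = attention_head(states, W, a)
    return sum(w * x for w, x in zip(W2, x_j)) + b
```

Because the function scores one node at a time from a variable-length list of neighbor states, the same weights can be reused when nodes join or leave the neighborhood.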

[0127] By processing each neighbor of node i sequentially according to the above steps, the predicted scores of all neighbors of node i are obtained in turn. Finally, action $a_i$ is selected according to the ε-greedy strategy: among the first-order neighbors of node i, the neighbor with the highest predicted score is selected as the next hop with probability 1-ε, and a neighbor is selected at random with probability ε.
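The ε-greedy selection over first-order neighbors can be sketched as follows. The function name `select_next_hop`, the dictionary of scores keyed by neighbor ID, and the injectable `rng` argument are hypothetical.

```python
import random
from typing import Dict, Hashable


def select_next_hop(scores: Dict[Hashable, float],
                    epsilon: float,
                    rng=random) -> Hashable:
    """Pick a random neighbor with probability epsilon, else the best-scored one."""
    if not scores:
        raise ValueError("node has no first-order neighbors to forward to")
    if rng.random() < epsilon:
        return rng.choice(list(scores))   # explore
    return max(scores, key=scores.get)    # exploit the highest predicted score
```

Setting `epsilon` to 0 gives purely greedy forwarding, which is the usual choice once training has finished.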

[0128] (3b): Subsequently, data packet p is forwarded from node i to node j, and node j calculates the reward $r_j$ through step 3. The state at the next moment becomes node j's observation of its neighboring nodes. Simultaneously, node j constructs a route feedback HELLO packet according to step 2(2d) and sends it back to node i.

[0129] Step 4) Node i extracts the location, speed, and packet arrival rate information of node j from the received routing feedback HELLO packet, performs fuzzification processing, and calculates the fuzzy reward:

[0130] The calculation of fuzzy rewards includes fuzzification of state information, construction of fuzzy rules, and calculation of fuzzy rewards.

[0131] (4a): State information fuzzification: after neighbor node $u_j$ is selected as the next hop, the position coordinates $(x_j, y_j, z_j)$, speed, and packet arrival rate $R_j$ of node $u_j$ can be obtained from the node's own neighbor table.

[0132] Furthermore, this embodiment considers three metrics in the calculation of the reward value: distance difference (DDIFF), link duration (LDT), and congestion level (RC).

[0133] First, DDIFF is fuzzified. To calculate DDIFF, first calculate the distance between the current node and the destination node:

[0134] $d_{myself} = \sqrt{(x_i - x_0)^2 + (y_i - y_0)^2 + (z_i - z_0)^2}$

[0135] The destination node's coordinates are $(x_0, y_0, z_0)$. Next, the distance between the next-hop node and the destination node is calculated:

[0136] $d_{nexthop} = \sqrt{(x_j - x_0)^2 + (y_j - y_0)^2 + (z_j - z_0)^2}$

[0137] Therefore, $\mathrm{DDIFF} = d_{nexthop} - d_{myself}$, abbreviated as d. Define three fuzzy sets, C (Close), M (Middle), and F (Far), to indicate that the next-hop node is close to, at a medium distance from, or far from the target node. The membership functions corresponding to the three fuzzy sets C, M, and F are as follows:

[0138] $f_C(d) = \mathrm{clip}[-10d, 0, 1]$

[0139] $f_M(d) = \mathrm{clip}[\min(10d + 1, -10d + 1), 0, 1]$

[0140] $f_F(d) = \mathrm{clip}[10d, 0, 1]$

[0141] where $\mathrm{clip}(x, x_{min}, x_{max})$ denotes the truncation function with input parameters $x$, $x_{min}$, and $x_{max}$: when $x < x_{min}$, the output is $x_{min}$; when $x > x_{max}$, the output is $x_{max}$; and when $x_{min} \le x \le x_{max}$, the output is $x$ itself. By inputting DDIFF into the membership functions, the probability that it belongs to each of the three fuzzy sets C, M, and F is obtained, thereby achieving the fuzzification of DDIFF.

[0142] Next, the LDT is fuzzified. The LDT is calculated as follows:

[0143]

[0144] where,

[0145] Normalization is also required. The LDT is abbreviated as $l$.

[0146] Define three fuzzy sets, L (Large), M (Middle), and S (Small), to indicate that the link duration between the next-hop node and the current node is long, medium, or short. The membership functions corresponding to L, M, and S are as follows:

[0147] $g_L(l) = \operatorname{clip}(-2l + 1,\ 0,\ 1)$

[0148] $g_M(l) = \operatorname{clip}(\min(2l,\ -2l + 2),\ 0,\ 1)$

[0149] $g_S(l) = \operatorname{clip}(2l - 1,\ 0,\ 1)$

[0150] The congestion level RC is set directly equal to $R_j$. Similarly, define three fuzzy sets, L (Low), M (Middle), and H (High), to indicate a low, medium, or high congestion level at the next-hop node. The membership functions corresponding to L, M, and H are:

[0151] $h_L(R_j) = \operatorname{clip}(-2.5R_j + 1.25,\ 0,\ 1)$

[0152] $h_M(R_j) = \operatorname{clip}(\min(2.5R_j - 0.25,\ -2.5R_j + 2.25),\ 0,\ 1)$

[0153] $h_H(R_j) = \operatorname{clip}(2.5R_j - 1.25,\ 0,\ 1)$

[0154] Applying the position-related membership functions to $d$ yields the probabilities that $d$ belongs to the three fuzzy sets C, M, and F. Similarly, the probabilities that the link duration LDT belongs to the three fuzzy sets L, M, and S, and the probabilities that the congestion level $R_j$ belongs to the three fuzzy sets L, M, and H, can be obtained.
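The membership functions for link duration and congestion level given above can be sketched in the same way. The sketch below is illustrative only; the function names `fuzzify_ldt` and `fuzzify_congestion` and the dictionary layout are assumptions, and the coefficients are copied from the membership functions as stated.

```python
def clip(x, x_min, x_max):
    """Truncation function: limit x to the interval [x_min, x_max]."""
    return max(x_min, min(x, x_max))


def fuzzify_ldt(l):
    """Memberships of the link duration l in L, M, and S, as stated above."""
    return {
        "L": clip(-2 * l + 1, 0, 1),
        "M": clip(min(2 * l, -2 * l + 2), 0, 1),
        "S": clip(2 * l - 1, 0, 1),
    }


def fuzzify_congestion(r_j):
    """Memberships of the congestion level R_j in L, M, and H."""
    return {
        "L": clip(-2.5 * r_j + 1.25, 0, 1),
        "M": clip(min(2.5 * r_j - 0.25, -2.5 * r_j + 2.25), 0, 1),
        "H": clip(2.5 * r_j - 1.25, 0, 1),
    }
```

For example, `fuzzify_ldt(0.25)` gives 0.5 for L, 0.5 for M, and 0 for S, while `fuzzify_congestion(0.5)` gives 0 for L, 1 for M, and 0 for H.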

[0155] (4b): Fuzzy rule construction: based on the fuzzy sets constructed in (4a) for DDIFF, $LDT_{ij}$, and $R_j$, rules of the following form can be defined:

[0156] IF DDIFF is X1 AND $LDT_{ij}$ is X2 AND $R_j$ is X3, THEN Reward is $r_l$.

[0157] Here, X1 is a fuzzy set for DDIFF, representing one of C, M, and F; X2 is a fuzzy set for $LDT_{ij}$, representing one of L, M, and S; and X3 is a fuzzy set for $R_j$, representing one of L, M, and H. Using the membership functions, the probabilities that DDIFF belongs to C, M, and F can be calculated; similarly, the probabilities that $LDT_{ij}$ belongs to L, M, and S and the probabilities that $R_j$ belongs to L, M, and H can be obtained.

[0158] Taking X1 = C, X2 = L, and X3 = H as an example, the strength of rule l can be expressed as:

[0159]

[0160] The reward $r_l$ associated with this rule is specified manually.
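The rule-strength formula is not reproduced in this text. As an illustration only, the sketch below assumes the common product form, in which the strength of a rule is the product of the three memberships named in its premise; a minimum-based form would be an equally plausible reading. All names (`rule_strengths`, `DDIFF_SETS`, and so on) are hypothetical.

```python
from itertools import product

# Fuzzy-set labels for each input, as defined in step (4a).
DDIFF_SETS = ("C", "M", "F")
LDT_SETS = ("L", "M", "S")
RC_SETS = ("L", "M", "H")


def rule_strengths(p_ddiff, p_ldt, p_rc):
    """Map every premise (X1, X2, X3) to a rule strength.

    Each argument is a dict from fuzzy-set label to membership value.
    ASSUMPTION: strength = product of the three memberships; the original
    formula is not available in this text.
    """
    return {
        (x1, x2, x3): p_ddiff[x1] * p_ldt[x2] * p_rc[x3]
        for x1, x2, x3 in product(DDIFF_SETS, LDT_SETS, RC_SETS)
    }
```

With three fuzzy sets per input this enumerates 27 premises. Under the assumed product form, memberships of 0.5 in C, 1 in L (Large), and 0.5 in H give the rule with X1 = C, X2 = L, X3 = H a strength of 0.25.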

[0161] (4c): Assuming the system has L rules, the fuzzy reward calculated by UAV node i can be expressed as:

[0162]

[0163] If the next hop is the destination, the maximum reward is given; if the next hop is a local minimum (including when there are no neighbors other than node i), the minimum reward is given; otherwise, a fuzzy reward is calculated.
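The three cases above can be sketched as follows. The fuzzy reward formula itself is not reproduced in this text, so the sketch assumes a strength-weighted average of the rule rewards, which is one common defuzzification choice and not necessarily the one used here. The function name, the default values of `r_max` and `r_min`, and the fallback when no rule fires are all hypothetical.

```python
def fuzzy_reward(strengths, rule_rewards, is_destination, is_local_minimum,
                 r_max=1.0, r_min=-1.0):
    """Return the reward for one forwarding decision.

    strengths:    dict mapping a rule key to its strength.
    rule_rewards: dict mapping the same keys to the manually set reward r_l.
    r_max, r_min: hypothetical maximum and minimum reward values.
    """
    if is_destination:
        return r_max  # next hop is the destination: maximum reward
    if is_local_minimum:
        return r_min  # next hop is a local minimum: minimum reward
    total = sum(strengths.values())
    if total == 0:
        return r_min  # ASSUMPTION: no rule fires, fall back to r_min
    # ASSUMPTION: strength-weighted average of the rule rewards.
    weighted = sum(strengths[k] * rule_rewards[k] for k in strengths)
    return weighted / total
```

For instance, two active rules with strengths 0.75 and 0.25 and rewards 1.0 and 0.0 give a fuzzy reward of 0.75 under this assumed form.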

[0164] Step 5): Construct a multi-agent distributed routing policy update framework based on local interactions.

[0165] After node $i$ finishes forwarding the data packet and obtains the relevant experience information, it stores the experience $(s_i, a_i, r_i, Q_j, f_j)$ in the experience pool. During the training phase, node $i$ randomly selects a mini-batch of $n$ samples from the experience pool for network training, and the loss function is defined as follows:

[0166]

[0167] Here, $y_i = r_i + \gamma Q_j (1 - f_j)$, where $\gamma$ is the discount factor. Because fully distributed training is used, node $i$ does not need to obtain the network parameters of the next-hop node $u_j$. Instead, after node $j$ receives the data packet forwarded from node $i$, it sends the completion status indicator $f_j$ and the component $Q_j$ used for updating the network parameters back to node $i$ in the routing feedback HELLO packet.
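The target value can be sketched as below. The target formula follows the text; the loss formula is not reproduced in this text, so the mean squared error used here is an assumption, as are the function names, the default discount factor, and the reading that $f_j = 1$ marks a completed routing task.

```python
def td_target(r_i, q_j, f_j, gamma=0.9):
    """Target value y_i = r_i + gamma * Q_j * (1 - f_j).

    gamma=0.9 is a hypothetical default, not a value from the text.
    """
    return r_i + gamma * q_j * (1 - f_j)


def mse_loss(batch, q_values, gamma=0.9):
    """ASSUMED loss: mean squared error between targets and predictions.

    batch:    list of (r_i, Q_j, f_j) tuples taken from the experience pool.
    q_values: evaluation-network outputs for the chosen actions,
              in the same order as batch.
    """
    errors = [
        (td_target(r_i, q_j, f_j, gamma) - q_sa) ** 2
        for (r_i, q_j, f_j), q_sa in zip(batch, q_values)
    ]
    return sum(errors) / len(errors)
```

When $f_j = 1$, the factor $(1 - f_j)$ removes the $Q_j$ term and the target reduces to the immediate reward $r_i$, so node $i$ only ever needs the two scalar values carried in the feedback packet.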

[0168] Therefore, the parameter update process of the evaluation network for node i is as follows:

[0169]

[0170] Where β is the learning rate.

[0171] When the target network's update time arrives, the parameters of the evaluation network are assigned to the target network:

[0172]

[0173] where the two symbols denote the target network parameters and the evaluation network parameters, respectively.
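The periodic assignment of evaluation network parameters to the target network can be sketched as follows. This is a minimal illustration; the class name `Agent`, the step-count trigger, and the `sync_interval` value are assumptions, since the text states only that the assignment happens when the target network's update time arrives.

```python
import copy


class Agent:
    """Hypothetical per-node container for the two parameter sets."""

    def __init__(self, params, sync_interval=100):
        self.eval_params = params                    # evaluation network
        self.target_params = copy.deepcopy(params)   # target network
        self.sync_interval = sync_interval           # assumed update period
        self.steps = 0

    def after_training_step(self):
        """Count one training step; copy parameters when the period elapses."""
        self.steps += 1
        if self.steps % self.sync_interval == 0:
            # Assign the evaluation network parameters to the target network.
            self.target_params = copy.deepcopy(self.eval_params)
            return True
        return False
```

A deep copy is used so that later updates to the evaluation network do not silently change the target network between assignments.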

[0174] To further illustrate the technical content of this embodiment, the simulation results are provided as follows:

[0175] The maximum speed of the drone nodes was gradually increased from 10 m/s to 50 m/s. Three indicators were considered, namely packet arrival rate, average end-to-end latency, and network throughput, and the results were averaged over 10 sets of random tests.

[0176] Figure 3 shows the packet transmission rate of the routing protocols at different maximum flight speeds of the drone nodes. The present invention achieves the highest packet transmission rate at every node speed. Because the GPSR protocol selects the neighbor nearest to the destination node as the next hop based on the current neighbor table, the neighbor table information becomes increasingly stale as node speed increases, which leads to packet loss. Although routing protocols based on Independent Q-Learning (IQL) perform better than the GPSR protocol, their deep neural networks cannot learn the feature information of neighbor nodes or the network structure information in a multi-agent environment. In contrast, the present invention captures the features of neighbor nodes through a local attention mechanism, and its strategy is forward-looking. Furthermore, the fuzzy reward function effectively avoids abnormal agent behavior, so a higher packet transmission rate is achieved. Figure 4 shows the impact of node movement speed on end-to-end latency. As node movement speed increases, the network topology changes more frequently and hole regions emerge. Because it does not consider link stability metrics, the GPSR protocol frequently resorts to perimeter forwarding to bypass these hole regions, resulting in higher end-to-end latency. IQL routing achieves lower end-to-end latency than GPSR, but the difficulty of extracting graph structure information with fully connected networks makes it hard for trained models to generalize to unfamiliar network topologies, which to some extent degrades the end-to-end latency of the IQL routing protocol. Figure 5 shows how throughput varies with the movement speed of the drone nodes. For all four routing protocols, throughput decreases continuously as node movement speed increases, but the network using this embodiment of the invention shows a relatively smaller decrease. When the maximum movement speed of the drone nodes is 50 m/s, the throughput under GPSR is only 244.4 kbps, while the throughput under this embodiment is 468.9 kbps, an improvement of 91.85%.

[0177] The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above embodiments. Any changes, modifications, substitutions, combinations, or simplifications made without departing from the spirit and principle of the present invention shall be considered equivalent substitutions and shall be included within the protection scope of the present invention.

Claims

1. A fuzzy reward function-assisted local graph attention UAV ad hoc network routing method, characterized in that it comprises the following steps: initializing the drone swarm network model; constructing task-oriented HELLO packets, including neighbor discovery HELLO packets and routing feedback HELLO packets; performing SLGAT-based routing: when a node has a packet forwarding task, calculating the degree to which its neighboring nodes match the routing task by calculating in turn the prediction score of each neighboring drone node for the routing task, and selecting the neighboring node with the highest prediction score as the next hop for packet forwarding; extracting the node location, speed, and data packet arrival rate information from the routing feedback HELLO packet, performing fuzzification processing, and calculating the fuzzy reward; and constructing a multi-agent distributed routing policy update framework based on local interaction to update the evaluation network parameters of the nodes, wherein when the update time of the target network arrives, the parameters of the evaluation network are assigned to the target network.

2. The fuzzy reward function-assisted local graph attention UAV ad hoc network routing method according to claim 1, characterized in that the initialization of the drone swarm network model specifically includes: initializing a cluster of multiple drones in three-dimensional space, wherein each drone node is regarded as an independent agent and maintains an evaluation network and a target network; the evaluation network and the target network are randomly initialized with the same parameters; the evaluation network is used to select the next hop, and the target network is used to optimize the routing strategy; and each drone node has an experience pool for storing the experience generated by the node at each forwarding.

3. The fuzzy reward function-assisted local graph attention UAV ad hoc network routing method according to claim 1, characterized in that the neighbor discovery HELLO packet contains: the HELLO packet function indicator "0", the HELLO packet sequence number, the ID of the node sending the HELLO packet, that node's own position coordinates, speed, and data packet arrival rate, the IDs of that node's neighboring nodes, and the position coordinates, speed, and packet arrival rate of each of those neighboring nodes, the neighboring nodes being the members of the node's set of first-order neighbors; and the packet arrival rate is calculated from the time elapsed from startup to the present and the number of data packets received by the node within that time.

4. The fuzzy reward function-assisted local graph attention UAV ad hoc network routing method according to claim 1, characterized in that the routing feedback HELLO packet contains: the HELLO packet function indicator "1", the position, speed, and data packet arrival rate of the node sending the packet, the routing task completion indicator, and the component provided to the previous-hop node for updating its network parameters; and the routing task completion indicator is calculated from the ID of the destination node and the ID of the node sending the HELLO packet.

5. The fuzzy reward function-assisted local graph attention UAV ad hoc network routing method according to claim 1, characterized in that it further comprises the steps of constructing a first-order neighbor table and constructing a second-order neighbor table; constructing the first-order neighbor table includes: a drone node that receives a neighbor discovery HELLO packet checks the ID of the node that sent the HELLO packet; if that ID is not in its own neighbor table, it adds the sending node's information to its own first-order neighbor table; if the first-order neighbor table already contains a record for the sending node, it compares the HELLO packet sequence number in that record with the sequence number of the currently received HELLO packet, and if the sequence number in the record is smaller, it updates the sending node's information in the first-order neighbor table; constructing the second-order neighbor table includes: checking the neighbor node information carried in the HELLO packet; if a neighbor node ID in the HELLO packet equals the node's own ID, skipping construction of the second-order neighbor table entry for that ID; if they are not equal, checking whether the node's own second-order neighbor table contains a record for that neighbor ID; if not, adding that node's information to the second-order neighbor table; and if the second-order neighbor table already contains a record for that node, updating the record based on the HELLO packet sequence number.

6. The fuzzy reward function-assisted local graph attention UAV ad hoc network routing method according to claim 1, characterized in that the input layer of the SLGAT network is a fully connected layer used to obtain hidden representations of the feature states of neighboring nodes, the dimension of the parameters of this fully connected layer being determined by the number of node states and the feature dimension of the hidden layer; the fully connected layer is followed by a single-layer, three-head graph attention network; and the output layer is a fully connected layer used to transform the feature states of the neighbors into the corresponding task prediction scores.

7. The fuzzy reward function-assisted local graph attention UAV ad hoc network routing method according to claim 1, characterized in that the SLGAT-based routing includes the following steps: when a node has a packet forwarding task, it observes the environment to obtain its state, which is composed of vectors of normalized node states; the normalization uses the maximum distances that a drone node is allowed to reach in the x, y, and z directions, the maximum speed of a drone node in the x, y, and z directions, and the maximum packet arrival rate among the nodes in the node's set of first-order neighbors; the experience is stored in an experience pool, and a single experience is randomly selected from the pool for training; the specific process includes: taking the state as input, the node's evaluation network processes each element of the state in turn to obtain the task prediction score of each neighboring node; after all neighbors have been processed serially, an action is selected by a greedy strategy based on the prediction scores of all neighbors; the data packet is forwarded from the node to the selected next-hop node; the node calculates the reward; the state at the next moment becomes the next-hop node's observation of its own neighboring nodes; and, at the same time, the next-hop node sends the routing feedback HELLO packet back to the forwarding node.

8. The fuzzy reward function-assisted local graph attention UAV ad hoc network routing method according to claim 1, characterized in that the specific steps for calculating the fuzzy reward include: after a neighbor node is selected as the next hop, obtaining that node's position coordinates, speed, and packet arrival rate from the node's own neighbor table; fuzzifying the distance difference, link duration, and congestion level; constructing fuzzy rules; and calculating the fuzzy reward of the UAV node from the number of fuzzy rules L, the maximum reward, the minimum reward, the strength of each fuzzy rule, and the predefined reward associated with each rule; wherein if the next hop is the destination, the maximum reward is given; if the next hop is a local minimum, the minimum reward is given; otherwise, the fuzzy reward is calculated.

9. The fuzzy reward function-assisted local graph attention UAV ad hoc network routing method according to claim 8, characterized in that fuzzifying the distance difference, link duration, and congestion level specifically includes: calculating the distance difference as $\mathrm{DDIFF} = d_{\text{nexthop}} - d_{\text{myself}}$, where $d_{\text{myself}}$ is the distance between the current node and the destination node and $d_{\text{nexthop}}$ is the distance between the next-hop node and the destination node, these distances being determined from the position coordinates of the destination node, of the node itself, and of the next-hop node; defining three fuzzy sets to indicate that the next-hop node is close to, at a medium distance from, or far from the destination node, with the corresponding membership functions $f_C(d) = \operatorname{clip}(-10d, 0, 1)$, $f_M(d) = \operatorname{clip}(\min(10d + 1, -10d + 1), 0, 1)$, and $f_F(d) = \operatorname{clip}(10d, 0, 1)$, where $\operatorname{clip}$ denotes the truncation function; calculating the link duration and normalizing it; defining three fuzzy sets to indicate that the link duration between the next-hop node and the current node is long, medium, or short, with the corresponding membership functions $g_L(l) = \operatorname{clip}(-2l + 1, 0, 1)$, $g_M(l) = \operatorname{clip}(\min(2l, -2l + 2), 0, 1)$, and $g_S(l) = \operatorname{clip}(2l - 1, 0, 1)$; and taking the packet arrival rate $R_j$ carried in the routing feedback HELLO packet as the measure of congestion, and defining three fuzzy sets to indicate low, medium, and high congestion at the next-hop node, with the corresponding membership functions $h_L(R_j) = \operatorname{clip}(-2.5R_j + 1.25, 0, 1)$, $h_M(R_j) = \operatorname{clip}(\min(2.5R_j - 0.25, -2.5R_j + 2.25), 0, 1)$, and $h_H(R_j) = \operatorname{clip}(2.5R_j - 1.25, 0, 1)$.

10. The fuzzy reward function-assisted local graph attention UAV ad hoc network routing method according to claim 1, characterized in that the specific steps for constructing the multi-agent distributed routing policy update framework based on local interaction include: after a node forwards a data packet and obtains the relevant experience information, it stores the experience in the experience pool; during the training phase, a mini-batch of samples is randomly selected from the experience pool to train the network; a loss function is defined in terms of the target value $y_i = r_i + \gamma Q_j (1 - f_j)$, where $\gamma$ is the discount factor; after the next-hop node receives the data packet forwarded from the node, it sends the completion status indicator and the component used for updating network parameters back to the node in the routing feedback HELLO packet; the parameters of the node's evaluation network are updated using the learning rate $\beta$; and when the target network's update time arrives, the parameters of the evaluation network are assigned to the target network.