An underwater mobile node routing method, device, equipment and storage medium
By constructing a time-varying graph model and making Q-learning decisions based on it, the method addresses the poor adaptability of existing routing protocols in underwater mobile sensor networks, providing a highly reliable, low-latency, and energy-efficient underwater mobile node network that adapts to dynamic changes in complex underwater environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JILIN UNIVERSITY
- Filing Date
- 2023-06-29
- Publication Date
- 2026-06-23
AI Technical Summary
Existing routing protocols are mainly designed for networks of fixed underwater sensor nodes and do not adapt well to highly dynamic mobile sensor networks. Their connectivity prediction models mainly account for the impact of water flow on node positions, which leads to low network reliability and efficiency.
A time-varying graph model of the target node is constructed. Q-learning decision-making is used to calculate the Q-function to select the best action for data forwarding. The time-varying graph model is updated to adapt to dynamic changes in the network, including dynamic updates of neighbor nodes, link quality, and energy information.
It achieves high reliability, low latency, and high energy efficiency in underwater mobile node networks, extends network lifetime, and adapts to the needs of highly dynamic environments.
Figure CN116709461B_ABST (abstract figure)
Description
Technical Field
[0001] This invention relates to the field of underwater communication, and in particular to an underwater mobile node routing method, apparatus, device, and computer-readable storage medium.
Background Technology
[0002] Underwater acoustic sensor networks deploy sensor nodes in the monitored waters, and the nodes work collaboratively through acoustic communication. However, due to factors such as water flow, monitoring blind spots or network failures can easily occur. Underwater mobile wireless sensor networks, which tightly integrate underwater robots and underwater wireless sensor networks, are self-organized from a large number of ordinary sensor nodes and autonomous underwater vehicles. These networks provide a good technical means for applications such as underwater environmental monitoring, marine data collection, seabed resource development, and maritime search and rescue.
[0003] Existing routing protocols are primarily designed for networks of fixed underwater sensor nodes. Their connectivity prediction models mainly focus on the impact of water flow on node locations, and are not suitable for highly dynamic mobile sensor networks.
Summary of the Invention
[0004] The purpose of this invention is to provide an underwater mobile node routing method, apparatus, device, and computer-readable storage medium, applicable to the field of underwater communication. The method of this invention achieves high reliability, low latency, high energy efficiency, and extended network lifetime of underwater mobile node networks by constructing a time-varying graph model of the nodes and making Q-learning decisions based on the time-varying graph model. It avoids the problem that the connection prediction models of existing routing protocols are mainly based on the influence of water flow on node positions and are not suitable for highly dynamic mobile sensor networks.
[0005] To address the aforementioned technical problems, this invention provides an underwater mobile node routing method, comprising:
[0006] Construct a time-varying graph model of the target node, and update the time-varying graph model every first preset time interval;
[0007] The Q-values of the target node performing different actions in the current state are obtained by calculating the Q-function based on the time-varying graph model.
[0008] Select the largest Q value from the Q values, and determine the action corresponding to the largest Q value as the target action of the target node;
[0009] The target action is executed to cause the target node to forward data to the next-hop routing node.
[0010] Optionally, the construction of the time-varying graph model of the target node includes:
[0011] Determine the set of neighboring nodes, the set of links, and the second preset time for the target node; wherein the first preset time is an equal division of the second preset time;
[0012] Determine the set of connectivity duration and link quality for any link within the second preset time period;
[0013] Determine the remaining energy set of neighboring nodes;
[0014] Construct a time-varying graph model of the target node based on the set of neighboring nodes, the set of links, the set of connectivity durations, and the set of remaining energy: $G_n = (N, E, T, D_T, Q_T, En)$;
[0015] where $G_n$ is the time-varying graph model of the target node n, N is the set of neighboring nodes, E is the set of links, T is the second preset time, $D_T$ is the set of connectivity durations of any of the links within T, $Q_T$ is the set of link qualities of any of the links within T, and $En$ is the set of remaining energy.
[0016] Optionally, determining the set of connectivity durations of any link within the second preset time period includes:
[0017] The link connectivity duration is calculated based on the speed of the target node and the speed of the neighboring nodes in any given link.
[0018] The sum of the link connectivity duration and the link hover duration is determined as the link duration.
[0019] The set of connectivity durations within the second preset time period is determined based on the total duration of all the links.
[0020] Optionally, determining the set of link quality for any link at the second preset time includes:
[0021] The link quality of any of the links at the first preset time is obtained based on the historical packet reception rate, transmission probability, and success probability.
[0022] The set of link qualities within the second preset time period is determined based on all the link qualities.
[0023] Optionally, the step of calculating the Q-value of the target node performing different actions in the current state based on the time-varying graph model includes:
[0024] Calculate the state transition probability and reward function based on the time-varying graph model;
[0025] The state transition probability and the reward function are input into the Q function to obtain the Q value of the target node performing different actions in the current state.
[0026] Optionally, calculating the state transition probability based on the time-varying graph model includes:
[0027] The link quality of any link in the time-varying graph model is input into the first model to calculate the state transition probability corresponding to that link. The expression of the first model is:
[0028]
[0029] where the first model gives the state transition probability of a successful transmission and the state transition probability of a failed transmission, computed from the link quality from the target node n to the neighbor node m within time t and the link quality from the neighbor node m to the target node n within t; s′ is the next state of the current state s, S is the state space set, and rcv is the successful transmission state.
[0030] Optionally, calculating the reward function based on the time-varying graph model includes:
[0031] Based on the time-varying graph model, construct the energy function, link quality function, delay function, and identification function;
[0032] The energy function, the link quality function, the delay function, and the identification function are input into the second model to calculate the reward function. The expression of the second model is:
[0033] $R_n(s,a) = (\alpha_1 E_n(s,a) + \alpha_2 p_n(s,a) + \alpha_3 rt_n(s,a)) \cdot F_L(s,a)$;
[0034] where $R_n(s,a)$ is the reward function, $E_n(s,a)$ is the energy function, $p_n(s,a)$ is the link quality function, $rt_n(s,a)$ is the delay function, $F_L(s,a)$ is the identification function, $\alpha_1$ is the first weight, $\alpha_2$ is the second weight, and $\alpha_3$ is the third weight.
[0035] To address the aforementioned technical problems, the present invention also provides an underwater mobile node routing device, comprising:
[0036] The time-varying graph model module is used to construct a time-varying graph model of the target node and update the time-varying graph model every first preset time interval;
[0037] The Q-value calculation module is used to calculate the Q-value of the target node performing different actions in the current state based on the time-varying graph model.
[0038] An action selection module is used to select the maximum Q value from the Q values and determine the action corresponding to the maximum Q value as the target action of the target node.
[0039] The routing and forwarding module is used to execute the target action so that the target node forwards data to the next-hop routing node.
[0040] To address the aforementioned technical problems, the present invention also provides underwater mobile node routing equipment, comprising:
[0041] Memory, used to store computer programs;
[0042] A processor, configured to implement any of the aforementioned underwater mobile node routing methods when executing the computer program.
[0043] To address the aforementioned technical problems, the present invention also provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, implement any of the aforementioned underwater mobile node routing methods.
[0044] As can be seen, the method of the present invention achieves high reliability, low latency, high energy efficiency, and extended network lifetime of underwater mobile node networks by constructing a time-varying graph model of nodes and making Q-learning decisions based on the time-varying graph model. This avoids the problem that the connection prediction models of existing routing protocols are mainly based on the influence of water flow on node positions and are not suitable for highly dynamic mobile sensor networks.
Attached Figure Description
[0045] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0046] Figure 1 is a flowchart of an underwater mobile node routing method provided in an embodiment of the present invention;
[0047] Figure 2 is an example diagram of a time-varying graph model provided in an embodiment of the present invention;
[0048] Figure 3 is a structural block diagram of an underwater mobile node routing device provided in an embodiment of the present invention.
Detailed Implementation
[0049] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0050] Referring to Figure 1, which shows a flowchart of an underwater mobile node routing method provided in an embodiment of the present invention, the method may include:
[0051] S101: Construct a time-varying graph model for the target node and update the time-varying graph model every first preset time interval.
[0052] The underwater environment is complex, and underwater acoustic channels are characterized by long latency, narrow bandwidth, and high bit error rate. Furthermore, the mobility of AUVs (Autonomous Underwater Vehicles) leads to link instability, causing dynamic changes in network topology. Therefore, underwater mobile networks exhibit characteristics of DTN (Delay Tolerant Networks), where nodes can communicate via a store-and-forward mechanism. This overcomes the network instability caused by AUV mobility and the problems associated with underwater acoustic channels, ensuring the reliability and efficiency of the underwater mobile ad hoc network.
[0053] A time-varying graph is a graph in which the nodes and edges change in topology and properties over time. It is commonly used to describe dynamic changes in complex systems. In network research, time-varying graphs are widely used to describe the spatiotemporal transformation characteristics of network topology, facilitating the analysis and design of communication protocols and algorithms within networks.
[0054] This invention can construct a time-varying graph model of mobile nodes in a DTN. In this embodiment, any node can be set as the target node. This invention does not limit the specific form of the time-varying graph model, which can generally include network resources such as link duration, link quality, remaining node energy, and number of neighboring nodes. For example, in this embodiment, the time-varying graph model of the target node n can be represented as: $G_n = (N, E, T, D_T, Q_T, En)$;
[0055] where $G_n$ is the time-varying graph model of the target node n, N is the set of neighboring nodes, E is the link set, T is the second preset time, $D_T$ is the set of connectivity durations of any link within T, $Q_T$ is the set of link qualities of any link within T, and $En$ is the set of remaining energy. In this embodiment, $D_T$, $Q_T$ and $En$ can also be expressed as:
[0056]
[0057] For a given node S, its time-varying graph model is illustrated in Figure 2. The set of neighboring nodes of node S includes nodes A and B. The given time range T is divided into multiple time periods, four in this example (t1, t2, t3, t4). Node S records the link duration of link (S, A), and likewise the link durations for all neighboring nodes. It records the link quality of link (S, A), and likewise the link qualities for all neighboring nodes. Node S also records the remaining energy of its neighboring nodes: for node B the remaining energy is $En_B = \{120\}$, and for all neighboring nodes $En = \{En_A, En_B\}$.
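The tuple $G_n = (N, E, T, D_T, Q_T, En)$ can be pictured as a small per-node record. The following is a minimal sketch only, not the patented implementation: the class name `TimeVaryingGraph`, its field names, the method `update_from_meta`, and the choice of storing one value per time period are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (target node, neighbor node)


@dataclass
class TimeVaryingGraph:
    """Hypothetical container for G_n = (N, E, T, D_T, Q_T, En)."""
    node: str
    neighbors: List[str]            # N, set of neighboring nodes
    links: List[Link]               # E, set of links
    horizon: float                  # T, the second preset time
    slots: int                      # number of equal periods t inside T
    duration: Dict[Link, List[float]] = field(default_factory=dict)  # D_T
    quality: Dict[Link, List[float]] = field(default_factory=dict)   # Q_T
    energy: Dict[str, float] = field(default_factory=dict)           # En

    @property
    def slot_length(self) -> float:
        """First preset time t, an equal division of T."""
        return self.horizon / self.slots

    def update_from_meta(self, neighbor: str, slot: int,
                         energy: float, quality: float) -> None:
        """Apply the energy and link quality carried by one Meta packet."""
        self.energy[neighbor] = energy
        link = (self.node, neighbor)
        self.quality.setdefault(link, [0.0] * self.slots)[slot] = quality


# Node S with neighbors A and B and four periods, as in the Figure 2 example.
# The horizon of 40.0 and the quality 0.8 are made-up illustrative values.
g = TimeVaryingGraph("S", ["A", "B"], [("S", "A"), ("S", "B")],
                     horizon=40.0, slots=4)
g.update_from_meta("B", slot=0, energy=120.0, quality=0.8)
```

Keeping the per-link lists indexed by time period is one way to reflect that the model is refreshed every first preset time t.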
[0058] In this embodiment, for the highly dynamic underwater acoustic sensor network, most underwater nodes are AUVs (Autonomous Underwater Vehicles), and the rest are static nodes. The path plan of each AUV is known before it enters the water. Nodes calculate link connectivity and link duration based on the path planning information. Energy and link quality information are acquired through communication. To reduce unnecessary communication overhead, each node calculates its time-varying graph when sending data for the first time and first broadcasts a Meta (metadata) packet to all neighboring nodes, containing the node's energy information and link quality status. Once all neighboring nodes have received the Meta packet, the broadcast stops. In this embodiment, the time-varying graph model can be updated by obtaining the Meta data of neighboring nodes.
[0059] In this embodiment, due to the high dynamism of the network, the connectivity status of nodes is constantly changing. Some links may be disconnected during time period t1 but connected during time period t2. Therefore, the time-varying graph needs to be updated every first preset time period t, where the first preset time period t is an equal part of the second preset time period T, i.e., t = t1, ..., tn. This helps increase the number of links that can be connected later. In this embodiment, the remaining energy changes continuously with the transmission of data packets, so the time-varying graph is updated to maintain its accuracy. This embodiment is not limited to a specific update method; generally, updates can be performed by periodically broadcasting Meta packets, or by implicit or explicit acknowledgments in each communication.
[0060] In this embodiment, the link connectivity duration can be calculated from the target node speed and the neighbor node speed in any link. The sum of the link connectivity duration and the link hovering duration is determined as the link duration. The set of connectivity durations within a second preset time period is determined based on the total link duration.
[0061] The AUV link duration $T_{exp}$ includes the hovering time $T_{hang}$ of the node within the communication range and the duration $T_{move}$ for which connectivity is maintained while the two nodes move. The link connectivity duration can be calculated using path planning information from the AUV, including coordinates, pitch angle, heading angle, and speed.
[0062] In this embodiment, to predict connectivity duration more accurately, the influence of ocean currents on node velocity is considered during the calculation. For the target node n with velocity $v_n = (v_x, v_y)$, the node velocity after accounting for ocean currents is as follows:
[0063]
[0064] where the velocity of the target node n is $(v_x, v_y)$, the pitch angle is $\alpha_n$, the heading angle is $\beta_n$, and the ocean current velocity is $(v_{cx}, v_{cy})$.
[0065] In this embodiment, the distance traveled by the target node within the first preset time t is $d_n = v_n \times t$. If the current coordinates of the target node n are $(x'_n, y'_n, z'_n)$, then its coordinates $(x_n, y_n, z_n)$ after time t can be expressed as:
[0066] $$x_n = x'_n + v_n t \sin\alpha_n \cos\beta_n, \quad y_n = y'_n + v_n t \sin\alpha_n \sin\beta_n, \quad z_n = z'_n + v_n t \cos\alpha_n$$
[0067] where $(x_n, y_n, z_n)$ are the coordinates of the target node after time t, $(x'_n, y'_n, z'_n)$ are the current coordinates of the target node, $v_n$ is the target node velocity, $\alpha_n$ is the pitch angle, and $\beta_n$ is the heading angle.
[0068] Let $D_p$ be the predicted distance between the target node n and a neighbor node m after time t. The predicted distance can be calculated using the following formula:
[0069] $$D_p = \sqrt{\begin{aligned}&\left(x'_n - x'_m + t\,(v_n \sin\alpha_n \cos\beta_n - v_m \sin\alpha_m \cos\beta_m)\right)^2 \\ &+ \left(y'_n - y'_m + t\,(v_n \sin\alpha_n \sin\beta_n - v_m \sin\alpha_m \sin\beta_m)\right)^2 \\ &+ \left(z'_n - z'_m + t\,(v_n \cos\alpha_n - v_m \cos\alpha_m)\right)^2\end{aligned}}$$
[0070] In the formula, $(x'_n, y'_n, z'_n)$ are the current coordinates of the target node, $(x'_m, y'_m, z'_m)$ are the current coordinates of the neighbor node, $\alpha_n$ and $\beta_n$ are the pitch angle and heading angle of the target node, $\alpha_m$ and $\beta_m$ are the pitch angle and heading angle of the neighbor node, t is the first preset time, $v_n$ is the velocity of the target node, $v_m$ is the velocity of the neighbor node, and $D_p$ is the predicted distance between the target node n and its neighbor node m.
[0071] In this embodiment, let $x = x'_n - x'_m$, $y = y'_n - y'_m$, and $z = z'_n - z'_m$, and let $e = v_n \sin\alpha_n \cos\beta_n - v_m \sin\alpha_m \cos\beta_m$, $f = v_n \sin\alpha_n \sin\beta_n - v_m \sin\alpha_m \sin\beta_m$, and $g = v_n \cos\alpha_n - v_m \cos\alpha_m$.
[0072] Then $D_p$ can be further expressed as:
[0073] $D_p^2 = t^2(e^2 + f^2 + g^2) + t(2ex + 2fy + 2gz) + x^2 + y^2 + z^2$;
[0074] In this embodiment, when the distance between the two nodes reaches the maximum communication range R after time t, $D_p = R$, and the link connection duration during movement can be calculated by solving the resulting quadratic in t:
[0075] $$T_{move} = \frac{-(2ex + 2fy + 2gz) \pm \sqrt{(2ex + 2fy + 2gz)^2 - 4(e^2 + f^2 + g^2)(x^2 + y^2 + z^2 - R^2)}}{2(e^2 + f^2 + g^2)}$$
[0076] where $T_{move}$ is the link connectivity duration, taken as the one of the two roots that is greater than 0, and R is the maximum communication range between the two nodes.
[0077] After calculating the link connectivity duration, adding the link hovering duration to the link connectivity duration yields the link duration:
[0078] $T_{exp} = T_{move} + T_{hang}$;
[0079] where $T_{move}$ is the link connectivity duration, $T_{hang}$ is the link hovering duration, and $T_{exp}$ is the link duration.
[0080] In this embodiment, after the link duration is calculated, the link duration $T_{exp}$ can be written into the time-varying graph model of the target node n. In this way, the time-varying graph model of the target node is constructed.
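The link duration calculation can be sketched in a few lines. This is an illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, angles are taken in radians, ocean current correction is omitted, and the handling of nodes that are already out of range (duration 0) or that share the same velocity (infinite duration) is an assumption the text does not specify.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def velocity_components(speed: float, pitch: float, heading: float) -> Vec3:
    """Cartesian velocity from speed, pitch angle and heading angle,
    using the same terms that make up e, f and g in [0071]."""
    return (speed * math.sin(pitch) * math.cos(heading),
            speed * math.sin(pitch) * math.sin(heading),
            speed * math.cos(pitch))


def move_duration(pos_n: Vec3, vel_n: Vec3, pos_m: Vec3, vel_m: Vec3,
                  comm_range: float) -> float:
    """Positive root of D_p^2 = R^2: time until the nodes drift out of range."""
    x, y, z = (pn - pm for pn, pm in zip(pos_n, pos_m))
    e, f, g = (vn - vm for vn, vm in zip(vel_n, vel_m))
    a = e * e + f * f + g * g
    b = 2 * (e * x + f * y + g * z)
    c = x * x + y * y + z * z - comm_range ** 2
    if c > 0:
        return 0.0        # assumption: already out of range, no connectivity
    if a == 0:
        return math.inf   # assumption: equal velocities never separate
    # With c <= 0 the discriminant is non-negative and this root is >= 0.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


def link_duration(t_move: float, t_hang: float) -> float:
    """T_exp = T_move + T_hang."""
    return t_move + t_hang


# Illustrative values: n hovers at the origin, m starts 100 m away and moves
# along +x at 2 m/s, with a 500 m communication range.
vel_m = velocity_components(2.0, math.pi / 2, 0.0)
t_move = move_duration((0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                       (100.0, 0.0, 0.0), vel_m, 500.0)
t_exp = link_duration(t_move, t_hang=30.0)
```

In this example the neighbor needs 400 m of separation at 2 m/s, so `t_move` comes out at about 200 s and `t_exp` at about 230 s.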
[0081] Link quality is one of the indicators that the sender considers when determining whether a neighboring node can be used as the next hop. Link quality is mainly affected by the historical packet reception rate from the target node n to its neighboring node m, the predicted distance from the target node n to its neighboring node m, and the impact of node density distribution on collisions.
[0082] In this embodiment, the link quality of any link in the first preset time period can be obtained based on the historical packet reception rate, transmission probability, and success probability. The link quality set in the second preset time period can be determined based on the quality of all links. The link quality of a certain link can be calculated using the following example.
[0083] In this embodiment, the historical packet reception rate from the target node n to its neighbor node m can be calculated:
[0084]
[0085] where $PDR_{nm}$ is the historical packet reception rate from the target node n to its neighbor node m, the numerator is the number of packets successfully transmitted from the target node n to its neighbor node m within time t, and the denominator is the total number of packets transmitted from the target node n to its neighbor node m within time t.
[0086] In this embodiment, the transmission probability is the probability of correctly transmitting a data packet of size τ bits over the distance $d_{nm}$ between the two nodes within time t, when the transmission power is P and the communication frequency is ft. It is calculated as follows:
[0087] $p_a^t(d_{nm}, ft, P, \tau) = (1 - BER(d_{nm}(t), ft, P))^{\tau}$;
[0088] where $p_a^t(d_{nm}, ft, P, \tau)$ is the transmission probability and $BER(d_{nm}(t), ft, P)$ is the bit error rate.
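As a small worked illustration of the formula in [0087], the sketch below evaluates the transmission probability for a given bit error rate. The function name and the sample values are assumptions; how the bit error rate itself is obtained is left open, as in the text.

```python
def transmission_probability(bit_error_rate: float, packet_bits: int) -> float:
    """p_a^t = (1 - BER)^tau for a packet of tau bits."""
    if not 0.0 <= bit_error_rate <= 1.0:
        raise ValueError("bit error rate must lie in [0, 1]")
    if packet_bits < 0:
        raise ValueError("packet size must be non-negative")
    return (1.0 - bit_error_rate) ** packet_bits


# Illustrative values: a 1000-bit packet at a bit error rate of 1e-4.
p_a = transmission_probability(1e-4, 1000)
```

For these values the probability is roughly 0.905, which shows how quickly longer packets erode the chance of an error-free transmission.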
[0089] This embodiment does not limit the calculation method of the bit error rate. Generally, given the underwater path propagation attenuation model $A(d_{nm}, ft)$ and the ambient noise $N(ft)$, the signal-to-noise ratio is calculated using the following formula:
[0090]
[0091] where P is the transmission power, $A(d_{nm}, ft)$ is the underwater path propagation attenuation model, $N(ft)$ is the environmental noise, ft is the communication frequency, and $SNR(d_{nm}, ft, P)$ is the signal-to-noise ratio.
[0092] In this embodiment, the MAC (Medium Access Control) protocol used can be Slotted-Aloha, where all frames are the same size, time is divided into equal-length time slots, and each time slot can transmit one frame. Nodes can only send frames at the beginning of a time slot, and clocks are synchronized among nodes. If two or more nodes send frames in the same time slot, a collision is detected. When a node has a new frame, it must send it in the next time slot. If there is no collision, the node can continue sending new frames in the next time slot; if a collision occurs, the node retransmits the frame in the next time slot with probability λ until successful.
[0093] In this embodiment, the success probability refers to the probability that the target node n successfully transmits to its neighbor node m without collision at any time. The set of neighbor nodes within the transmission range of the target node n can be represented as $N_n = \{m : d_{nm} \le R_t\}$, where $R_t$ is the transmission range and $d_{nm}$ is the distance between the target node n and its neighboring node m.
[0094] In this embodiment, the success probability can be calculated using the following formula:
[0095]
[0096] where $p_c^t(n, m, \lambda)$ is the success probability, λ is the packet sending rate, the formula also uses the number of neighboring nodes of the target node within time t, and t is the first preset time.
[0097] In this embodiment, the link quality of any link at the first preset time can be obtained based on historical packet reception rate, transmission probability, and success probability. The specific calculation method is shown in the following formula:
[0098]
[0099] In the formula, the result is the link quality from target node n to neighbor node m within time t, μ is the weight of the past link quality, (1-μ) is the weight of the most recent link quality, $p_a^t(d_{nm}, ft, P, \tau)$ is the transmission probability, and $p_c^t(n, m, \lambda)$ is the success probability.
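The exact combining formula is not reproduced in this text, so the sketch below rests on an explicit assumption: the link quality is taken as a weighted sum in which μ weights the historical packet reception rate and (1-μ) weights the product of the transmission probability and the success probability. The function name and sample values are likewise hypothetical.

```python
def link_quality(pdr_history: float, p_transmit: float,
                 p_no_collision: float, mu: float) -> float:
    """Assumed form: mu * PDR_nm + (1 - mu) * p_a^t * p_c^t.

    This weighting is an illustrative guess at the missing formula,
    based only on the listed inputs and the weights mu and (1 - mu).
    """
    inputs = (("pdr_history", pdr_history), ("p_transmit", p_transmit),
              ("p_no_collision", p_no_collision), ("mu", mu))
    for name, value in inputs:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return mu * pdr_history + (1.0 - mu) * p_transmit * p_no_collision


# Illustrative values only.
quality = link_quality(pdr_history=0.8, p_transmit=0.9,
                       p_no_collision=0.5, mu=0.5)
```

Under this assumption a larger μ makes the estimate smoother but slower to react to a link that has just degraded.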
[0100] In this embodiment, the set of link quality within a second preset time T can be determined based on the link quality of all links.
[0101] S102: Based on the time-varying graph model, use the Q-function to calculate the Q-values of the target node performing different actions in the current state.
[0102] Q-learning is one of the main reinforcement learning techniques used in machine learning, where a system iteratively learns from experience to achieve a control problem objective. It can handle problems with stochastic transitions and rewards without adjustment and approximates the Q-value through iteration. The Q-learning process can be viewed as a Markov decision process, described as a set of states, actions, state transition probabilities, and rewards.
[0103] In this embodiment, each node processing data packets is in a state that represents the number of failed transmissions of the data packet. State s = k indicates that the node has transmitted the data packet k times, with a maximum of K transmissions. The value of K in this embodiment can be set according to the actual application scenario. When a data packet sent by a node is successfully received by a neighboring node, the node's state changes to rcv. When transmission fails, the state becomes s = k + 1; once the maximum number of transmissions is reached, the data packet is dropped and the node's state changes to drop. In this embodiment, the state space can be represented as S = {0, ..., K-1} ∪ {drop, rcv}.
[0104] In this embodiment, the target node determines the forwarding node for each data packet retransmission. When the target node's state is s = 0, ..., K-1, the target node's action set is defined over $t_H$, the operation time of the HOLD action, $t_F$, the operation time of the FORWARD action, and $N_n(t)$, the neighboring nodes of the target node n within time t. When s ∈ {drop, rcv}, no action can be executed. In this embodiment, the action set of the target node can be represented as $A_n(s)$, where a denotes the action to be performed.
[0105] In this embodiment, a reward function can be constructed based on a time-varying graph model. For the target node n, the reward function can be:
[0106] $R_n(s,a) = (\alpha_1 E_n(s,a) + \alpha_2 p_n(s,a) + \alpha_3 rt_n(s,a)) \cdot F_L(s,a)$;
[0107] where $R_n(s,a)$ is the reward function, $E_n(s,a)$ is the energy function, $p_n(s,a)$ is the link quality function, $rt_n(s,a)$ is the delay function, $F_L(s,a)$ is the identification function, $\alpha_1$ is the first weight, $\alpha_2$ is the second weight, and $\alpha_3$ is the third weight.
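The reward expression in [0106] is a weighted sum scaled by the identification function. A minimal sketch follows; the function name, the sample weights, and the treatment of $F_L$ as a 0/1 gate in the example are assumptions made for illustration.

```python
def reward(energy: float, quality: float, delay: float,
           forward_flag: float, alpha1: float, alpha2: float,
           alpha3: float) -> float:
    """R_n(s,a) = (alpha1*E_n + alpha2*p_n + alpha3*rt_n) * F_L."""
    weighted = alpha1 * energy + alpha2 * quality + alpha3 * delay
    return weighted * forward_flag


# Illustrative values: the same candidate with the gate open and closed.
r_open = reward(0.5, 0.8, -0.2, 1.0, alpha1=0.4, alpha2=0.4, alpha3=0.2)
r_closed = reward(0.5, 0.8, -0.2, 0.0, alpha1=0.4, alpha2=0.4, alpha3=0.2)
```

Because the identification function multiplies the whole sum, a neighbor for which it evaluates to zero earns no reward regardless of its energy, link quality, or delay terms.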
[0108] In this embodiment, the node forwarding identifier can be calculated based on the link duration, as shown in the following formula:
[0109]
[0110] where the result is the node forwarding identifier, $V_{trans}$ is the data transmission rate, L is the data packet length, $d_{nm}$ is the distance between the target node n and its neighboring node m, and v is the speed of sound.
[0111] In this embodiment, the relationship between the node forwarding identifier and the identifier function is as follows:
[0112]
[0113] where $F_L(s,a)$ is the identification function, obtained from the node forwarding identifier, m is the neighbor node, ty is the delay time, and a is the target node's action.
[0114] In this embodiment, the relationship between the link quality function and link quality can be expressed as:
[0115]
[0116] where the input is the link quality from target node n to neighbor node m within time t, $p_n(s,a)$ is the link quality function, m is the neighbor node, ty is the delay time, and a is the action of the target node.
[0117] In this embodiment, the energy distribution reward value can be calculated using the following formula:
[0118]
[0119] where $E_n(s,a)$ is the energy function, $E(m,n,t)$ is the residual energy of the target node n and its neighbor node m within time t, $E_d(n,t)$ is the energy distribution reward value of the target node n's neighboring nodes within time t, ty is the delay time, and a is the target node's action.
[0120] Wherein, E(m,n,t) can be expressed as:
[0121] $E(m,n,t) = c_e(n,t) + c_e(m,t)$;
[0122] where $E(m,n,t)$ is the residual energy of the target node n and its neighbor node m within time t, $c_e(n,t)$ is the remaining energy reward function of the target node n within time t, and $c_e(m,t)$ is the remaining energy reward function of the neighbor node m within time t.
[0123] When communication is successful:
[0124]
[0125] where $c_e(n,t)$ is the remaining energy reward function of the target node n within time t, $c_e(m,t)$ is the remaining energy reward function of the neighbor node m within time t, $E(n)$ is the remaining energy of target node n, $E(m)$ is the remaining energy of neighbor node m, $E_{nm}$ is the energy consumed in sending a message from target node n to neighbor node m, $E_{mn}$ is the energy consumed in sending a message from neighbor node m to target node n, $E_{init}(n)$ is the initial energy of the target node n, and $E_{init}(m)$ is the initial energy of the neighbor node m.
[0126] When communication fails, the target node n needs to retransmit the data:
[0127]
[0128] where $c_e(n,t)$ is the remaining energy reward function of the target node n within time t, $E(n)$ is the remaining energy of the target node n, $E_{nm}$ is the energy consumed by the target node n in sending data to its neighbor node m, $E_{init}(n)$ is the initial energy of the target node n, and the formula also uses the number of retransmissions.
[0129] In this embodiment, the number of retransmissions can be calculated using the following formula:
[0130]
[0131] where the result is the number of retransmissions, $t_r$ is an auxiliary parameter, K is the maximum number of transmissions, and the formula uses the link quality from target node n to neighbor node m within time t.
[0132] In this embodiment, $E_d(n,t)$ can be calculated as shown in the following formula:
[0133]
[0134] where $E_d(n,t)$ is the energy distribution reward value of the target node n's neighboring nodes within time t, $E(n)$ is the remaining energy of the target node n, and the formula also uses the average energy of the set of neighboring nodes of the target node n.
[0135] In this embodiment, the delay function can be expressed as:
[0136] $rt_n(s,a) = rt_n(n,m,t), \; m \in a, \; ty \in a$;
[0137] where $rt_n(s,a)$ is the delay function, $rt_n(n,m,t)$ is the action reward, m is the neighboring node, ty is the delay time, and a is the action of the target node.
[0138] Routing decisions involve two operations: forwarding packets to neighboring nodes (FORWARD), or storing packets for later, in which case they are not forwarded (HOLD). $t_H$ and $t_F$ are the operation times of the HOLD and FORWARD actions from the target node to the neighbor node, respectively. $T_h$ and $T_f$ are the HOLD and FORWARD operation times from the neighbor node to the next-hop node k, respectively.
[0139] In this embodiment, to minimize latency, the reward for taking the action HOLD or FORWARD can be expressed as:
[0140]
[0141] where $rt_n(n,m,t)$ is the action reward, $rt_n(n,m,t_H)$ is the direct reward for the HOLD action, $rt_n(n,m,t_F)$ is the direct reward for the FORWARD action, $T_h$ is the HOLD operation time from the neighbor node to the next-hop node k, and $T_d$ is the communication duration from the target node n to the neighbor node m. The formula also uses the duration of the link from target node n to neighbor node m within time t and the minimum delay for the neighbor node m to take HOLD or FORWARD to the next-hop node k.
[0142] In this embodiment, after constructing the reward function through the time-varying graph model, the state transition probability can also be calculated according to the time-varying graph model.
[0143] The link quality of any link in the time-varying graph model is input into the first model to calculate the state transition probability corresponding to that link. For the target node n and the neighbor node m, the link quality between them is substituted into the first model to calculate the state transition probability. For each state s = 0, ..., K - 1, two cases are considered according to whether the transmission succeeds: either the target node n transitions from state s to state rcv, or the data packet is not successfully sent. An unsuccessful transmission is further divided into two cases: if s < K - 1, only the number of retransmissions is increased, and the state s transitions to s′ = s + 1; otherwise, the maximum number of retransmissions has been reached (s = K - 1), the data packet is discarded, and s′ = drop. The transition probabilities of successful and failed transmission can be calculated by the first model, whose expression is:
[0144]
[0145] The terms of the first model are the state transition probability of successful transmission, the state transition probability of unsuccessful transmission, the link quality from the target node n to the neighbor node m at time t, and the link quality from the neighbor node m to the target node n at time t; s′ is the next state of the current state s, S is the state space set, and rcv is the successful transmission state.
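The retransmission state machine described above can be illustrated with a minimal Python sketch. It assumes that the success probability for the link has already been obtained from the first model and is passed in as `p_success`; the function name, the parameter names, and the string labels `"rcv"` and `"drop"` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Union

# A state is a retransmission count 0..K-1, or one of the labels "rcv" / "drop".
State = Union[int, str]


def transition_probabilities(s: int, p_success: float, K: int) -> Dict[State, float]:
    """Return {next_state: probability} for one transmission attempt in state s.

    p_success is assumed to come from the first model (not reproduced here).
    """
    if not 0 <= s <= K - 1:
        raise ValueError("s must lie in 0..K-1")
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must lie in [0, 1]")
    # On failure the retransmission count grows by one; at the retry limit
    # (s == K - 1) the packet is discarded instead.
    failure_state: State = s + 1 if s < K - 1 else "drop"
    return {"rcv": p_success, failure_state: 1.0 - p_success}
```

For example, with K = 3 a failed attempt in state 0 leads to state 1, while a failed attempt in state 2 leads to `"drop"`.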
[0146] In this embodiment, the Q function can be expressed as:
[0147]
[0148] where $Q_n(s,a)$ is the Q value of node n selecting action a in state s, $R_n(s,a)$ is the reward function, γ is the discount factor, the transition term is the state transition probability from state s to state s′, $V_n(s′)$ is the maximum of the Q values over all actions a executable in state s′, and S is the state space.
[0149] In this embodiment, the target node n can obtain the Meta packets from its neighboring nodes at every time interval t to update the time-varying graph model. Based on the time-varying graph model, the Q function is calculated to obtain the Q values of all actions $a \in A_n(s)$ that the target node n can perform in the current state s, where $A_n(s)$ is the set of actions of the target node n.
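As an illustration of how the quantities defined above combine, the following Python sketch assumes the standard Bellman-type form, that is, the reward plus the discounted expectation of the next-state V value. This form is an assumption consistent with the listed definitions, not a reproduction of the expression itself. The function and parameter names are hypothetical, and treating a next state with no stored V value as 0 is likewise an assumption.

```python
from typing import Dict, Hashable


def q_value(reward: float,
            gamma: float,
            transitions: Dict[Hashable, float],
            v: Dict[Hashable, float]) -> float:
    """Assumed form: Q_n(s, a) = R_n(s, a) + gamma * sum_{s'} P(s' | s, a) * V_n(s').

    transitions maps each next state s' to its probability under action a;
    v maps states to their current V values (missing states count as 0.0).
    """
    expected_next = sum(p * v.get(s_next, 0.0) for s_next, p in transitions.items())
    return reward + gamma * expected_next


# Usage: reward 1.0, discount 0.5, equal chance of "rcv" (V = 4.0) and "drop" (V = 0.0).
q = q_value(1.0, 0.5, {"rcv": 0.5, "drop": 0.5}, {"rcv": 4.0, "drop": 0.0})
```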
[0150] S103: Select the maximum Q value from the Q values and determine the action corresponding to the maximum Q value as the target action of the target node.
[0151] S104: Execute the target action to cause the target node to forward data to the next-hop routing node.
[0152] In this embodiment, the largest Q value among all Q values can be selected, and the action corresponding to the largest Q value can be determined as the target action of the target node. The target node executes the target action, including the selected next-hop node and the HOLD or FORWARD operation.
[0153] In this embodiment, after the target node performs the target action, its state s will change to state s′. The target node makes a decision on the action to take through Q learning and forwards the data to the next routing node.
[0154] Generally, when calculating the Q function, the V function can also be updated. This embodiment does not limit the specific operation of updating the V function. Generally, the maximum value of the Q value in the corresponding state can be updated to the V value of that state, as shown in the following formula:
[0155] $V_n(s) = \max_{a \in A_n(s)} Q_n(s,a)$
[0156] where $V_n(s)$ is the V value in state s, $A_n(s)$ is the set of actions of the target node n, and $Q_n(s,a)$ is the Q value of node n when it chooses action a in state s.
[0157] This invention achieves high reliability, low latency, high energy efficiency, and extended network lifetime for underwater mobile node networks by constructing a time-varying graph model of nodes and making Q-learning decisions based on the time-varying graph model. This avoids the problem that existing routing protocols, whose connection prediction models are mainly based on the influence of water flow on node positions, are not suitable for highly dynamic mobile sensor networks.
[0158] The following is a specific embodiment provided by the present invention, which may include:
[0159] Get the Meta packet of each neighbor node in the set of neighbor nodes, and update the time-varying graph model of the target node;
[0160] Calculate the actions of the target node in the current state according to the time-varying graph model, obtain the Q value corresponding to each action, and update the V function.
[0161] Select the largest Q value from all Q values, and determine the action corresponding to the largest Q value as the target action to be executed by the target node;
[0162] The target node executes the target action to forward the data to the next hop routing node.
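The selection and V-update steps of this embodiment can be sketched as follows. The sketch assumes that the Q values for the current state have already been calculated from the time-varying graph model and are supplied as a dictionary. Representing an action as a pair of next-hop node identifier and operation name is a hypothetical choice, as are all function and variable names.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical action encoding: (next-hop node id, "HOLD" or "FORWARD").
Action = Tuple[str, str]


def select_target_action(q_values: Dict[Action, float],
                         v_table: Dict[Hashable, float],
                         state: Hashable) -> Action:
    """Pick the action with the largest Q value and record that value as V(state)."""
    if not q_values:
        raise ValueError("no candidate actions for this state")
    target = max(q_values, key=q_values.get)  # ties resolve to the first entry
    v_table[state] = q_values[target]         # V(s) = max over a of Q(s, a)
    return target


# Usage with made-up Q values for two neighbors m1 and m2.
v_table: Dict[Hashable, float] = {}
q_values = {("m1", "FORWARD"): 0.7, ("m1", "HOLD"): 0.2, ("m2", "FORWARD"): 0.9}
target_action = select_target_action(q_values, v_table, state=0)
```

The returned pair identifies both the next-hop node and whether the packet is held or forwarded, which the target node then executes.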
[0163] The following description refers to Figure 3, which is a structural block diagram of an underwater mobile node communication device provided in an embodiment of the present invention. The device may include:
[0164] The time-varying graph model module 100 is used to construct a time-varying graph model of the target node and update the time-varying graph model every first preset time interval.
[0165] Q-value calculation module 200 is used to calculate the Q-value of the target node performing different actions in the current state based on the time-varying graph model;
[0166] The action selection module 300 is used to select the maximum Q value from the Q values and determine the action corresponding to the maximum Q value as the target action of the target node.
[0167] The routing forwarding module 400 is used to execute the target action so that the target node forwards data to the next-hop routing node.
[0168] Based on the above embodiments, the present invention constructs a time-varying graph model of nodes and makes Q-learning decisions based on the time-varying graph model, thereby achieving high reliability, low latency, high energy efficiency, and extending network lifetime of underwater mobile node networks. This avoids the problem that existing routing protocols, whose connection prediction models are mainly designed for the influence of water flow on node positions, are not suitable for highly dynamic mobile sensor networks.
[0169] Based on the above embodiments, the time-varying graph model module 100 may include:
[0170] The first determining submodule is used to determine the set of neighboring nodes, the set of links, and the second preset time of the target node; wherein the first preset time is an equal division of the second preset time;
[0171] The second determining submodule is used to determine the set of connectivity durations and the set of link qualities of any link within the second preset time;
[0172] The third determining submodule is used to determine the remaining energy set of neighboring nodes;
[0173] The construction submodule is used to construct a time-varying graph model of the target node based on the set of neighboring nodes, the set of links, the set of connectivity durations, the set of link qualities, and the set of remaining energy: $G_n = (N, E, T, D_T, Q_T, En)$;
[0174] where $G_n$ is the time-varying graph model of the target node n, N is the set of neighboring nodes, E is the set of links, T is the second preset time, $D_T$ is the set of connectivity durations of any link within T, $Q_T$ is the set of link qualities of any link within T, and En is the set of remaining energy.
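One possible in-memory representation of the tuple $G_n = (N, E, T, D_T, Q_T, En)$ is sketched below in Python. The class name, the field names, and the `update_from_meta` method with its parameters are hypothetical; the contents of a Meta packet are assumed here to be a link duration, a link quality, and a residual energy value for one neighbor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (target node id, neighbor node id)


@dataclass
class TimeVaryingGraph:
    """Sketch of G_n = (N, E, T, D_T, Q_T, En) for a single target node."""
    neighbors: Set[str] = field(default_factory=set)                   # N
    links: Set[Link] = field(default_factory=set)                      # E
    horizon: float = 0.0                                               # T
    durations: Dict[Link, List[float]] = field(default_factory=dict)   # D_T
    qualities: Dict[Link, List[float]] = field(default_factory=dict)   # Q_T
    energy: Dict[str, float] = field(default_factory=dict)             # En

    def update_from_meta(self, target: str, neighbor: str, duration: float,
                         quality: float, residual_energy: float) -> None:
        """Fold one (assumed) Meta packet from a neighbor into the model."""
        link = (target, neighbor)
        self.neighbors.add(neighbor)
        self.links.add(link)
        self.durations.setdefault(link, []).append(duration)
        self.qualities.setdefault(link, []).append(quality)
        self.energy[neighbor] = residual_energy  # keep only the latest report


# Usage: one update for the link between target "n" and neighbor "m".
graph = TimeVaryingGraph(horizon=60.0)
graph.update_from_meta("n", "m", duration=12.5, quality=0.8, residual_energy=3.2)
```

Storing one list entry per update interval reflects the statement that the first preset time is an equal division of the second preset time T.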
[0175] Based on the above embodiments, the second determining submodule may include:
[0176] The first unit is used to calculate the link connectivity duration of any link based on the speed of the target node and the speed of the neighboring node on that link;
[0177] The second unit is used to determine the sum of the link connectivity duration and the link hover duration as the link duration;
[0178] The third unit is used to determine the set of connectivity durations within the second preset time based on the link durations of all the links.
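The calculation performed by the first and second units can be sketched as follows. The text states only that the connectivity duration depends on the speeds of the two nodes, so the geometric model used here is an assumption: both nodes move with constant two-dimensional velocity, and the link stays connected while their separation does not exceed a communication range `comm_range`. All names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def connectivity_duration(p_n: Vec, v_n: Vec, p_m: Vec, v_m: Vec,
                          comm_range: float) -> float:
    """Time until nodes n and m drift out of range (assumed constant velocities)."""
    dx, dy = p_m[0] - p_n[0], p_m[1] - p_n[1]      # relative position
    dvx, dvy = v_m[0] - v_n[0], v_m[1] - v_n[1]    # relative velocity
    c = dx * dx + dy * dy - comm_range * comm_range
    if c > 0:
        return 0.0                                 # already out of range
    a = dvx * dvx + dvy * dvy
    if a == 0:
        return math.inf                            # no relative motion
    b = 2 * (dx * dvx + dy * dvy)
    # Larger root of a*t^2 + b*t + c = 0; the discriminant is >= 0 because c <= 0.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


def link_duration(connectivity: float, hover: float) -> float:
    """Link duration = link connectivity duration + link hover duration."""
    return connectivity + hover


# Usage: neighbor starts 3 m away and recedes at 1 m/s, range 5 m.
t_conn = connectivity_duration((0.0, 0.0), (0.0, 0.0), (3.0, 0.0), (1.0, 0.0), 5.0)
total = link_duration(t_conn, hover=1.5)
```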
[0179] Based on the above embodiments, the second determining submodule may include:
[0180] The fourth unit is used to obtain the link quality of any of the links at the first preset time based on the historical packet reception rate, transmission probability and success probability;
[0181] The fifth unit is used to determine the set of link qualities within the second preset time period based on all the link qualities.
[0182] Based on the above embodiments, the Q-value calculation module 200 may include:
[0183] The calculation submodule is used to calculate the state transition probability and reward function based on the time-varying graph model;
[0184] The execution submodule is used to input the state transition probability and the reward function into the Q function to obtain the Q value of the target node performing different actions in the current state.
[0185] Based on the above embodiments, the calculation submodule may include:
[0186] The first model unit is used to input the link quality of any link in the time-varying graph model into the first model to calculate the state transition probability corresponding to the link. The expression of the first model is:
[0187]
[0188] The terms of the first model are the state transition probability of successful transmission, the state transition probability of failed transmission, the link quality from the target node n to the neighbor node m within time t, and the link quality from the neighbor node m to the target node n within time t; s′ is the next state of the current state s, S is the state space set, and rcv is the successful transmission state.
[0189] Based on the above embodiments, the calculation submodule may also include:
[0190] The function construction unit is used to construct an energy function, a link quality function, a delay function, and an identification function based on the time-varying graph model.
[0191] The second model unit is used to input the energy function, the link quality function, the delay function, and the identification function into the second model to calculate the reward function. The expression of the second model is:
[0192]
[0193] where $R_n(s,a)$ is the reward function, $E_n(s,a)$ is the energy function, $p_n(s,a)$ is the link quality function, $rt_n(s,a)$ is the delay function, $F_L(s,a)$ is the identification function, α1 is the first weight, α2 is the second weight, and α3 is the third weight.
[0194] Based on the above embodiments, the present invention also provides a device that may include a memory and a processor. The memory stores a computer program, and when the processor invokes the computer program in the memory, it can implement the steps provided in the above embodiments. Of course, the device may also include various necessary network interfaces, a power supply, and other components.
[0195] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by an execution terminal or processor, can implement the method provided in the embodiments of the present invention; the storage medium may include various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
[0196] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.
[0197] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0198] The above provides a detailed description of an underwater mobile node routing method, apparatus, device, and storage medium provided by the present invention. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. A routing method for underwater mobile nodes, characterized by comprising:
constructing a time-varying graph model of a target node, and updating the time-varying graph model every first preset time interval;
calculating a Q function based on the time-varying graph model to obtain the Q values of the target node performing different actions in the current state;
selecting the maximum Q value from the Q values, and determining the action corresponding to the maximum Q value as the target action of the target node;
executing the target action to cause the target node to forward data to the next-hop routing node;
wherein constructing the time-varying graph model of the target node comprises: determining the set of neighboring nodes, the set of links, and the second preset time of the target node, wherein the first preset time is an equal division of the second preset time; determining the set of connectivity durations and the set of link qualities of any link within the second preset time; determining the set of remaining energy of the neighboring nodes; and constructing the time-varying graph model of the target node based on the set of neighboring nodes, the set of links, the set of connectivity durations, the set of link qualities, and the set of remaining energy: $G_n = (N, E, T, D_T, Q_T, En)$, where $G_n$ is the time-varying graph model of the target node n, N is the set of neighboring nodes, E is the set of links, T is the second preset time, $D_T$ is the set of connectivity durations of any link within T, $Q_T$ is the set of link qualities of any link within T, and En is the set of remaining energy;
wherein calculating the Q function based on the time-varying graph model to obtain the Q values of the target node in the current state comprises: calculating the state transition probability and the reward function based on the time-varying graph model; and inputting the state transition probability and the reward function into the Q function to obtain the Q values of the target node performing different actions in the current state;
wherein calculating the state transition probability based on the time-varying graph model comprises: inputting the link quality of any link in the time-varying graph model into a first model to calculate the state transition probability corresponding to that link, the first model being expressed in terms of the state transition probability of successful transmission, the state transition probability of failed transmission, the link quality from the target node n to the neighbor node m within time t, and the link quality from the neighbor node m to the target node n within time t, where s′ is the next state of the current state s, S is the state space set, and rcv is the successful transmission state;
wherein calculating the reward function based on the time-varying graph model comprises: constructing an energy function, a link quality function, a delay function, and an identification function based on the time-varying graph model; and inputting the energy function, the link quality function, the delay function, and the identification function into a second model to calculate the reward function, the second model being expressed in terms of the reward function $R_n(s,a)$, the energy function $E_n(s,a)$, the link quality function $p_n(s,a)$, the delay function $rt_n(s,a)$, the identification function $F_L(s,a)$, the first weight α1, the second weight α2, and the third weight α3;
wherein the Q function is expressed in terms of $Q_n(s,a)$, the Q value of node n choosing action a in state s; $R_n(s,a)$, the reward function; γ, the discount factor; the state transition probability; and $V_n(s′)$, the maximum Q value among all actions a in state s′; S being the state space.
2. The underwater mobile node routing method according to claim 1, characterized in that determining the set of connectivity durations of any link within the second preset time comprises: calculating the link connectivity duration based on the speed of the target node and the speed of the neighboring node on any link; determining the sum of the link connectivity duration and the link hover duration as the link duration; and determining the set of connectivity durations within the second preset time based on the link durations of all the links.
3. The underwater mobile node routing method according to claim 1, characterized in that determining the set of link qualities of any link within the second preset time comprises: obtaining the link quality of any link at the first preset time based on the historical packet reception rate, the transmission probability, and the success probability; and determining the set of link qualities within the second preset time based on all the link qualities.
4. An underwater mobile node routing device, characterized by comprising:
a time-varying graph model module, used to construct a time-varying graph model of a target node and update the time-varying graph model every first preset time interval;
a Q-value calculation module, used to calculate a Q function based on the time-varying graph model to obtain the Q values of the target node performing different actions in the current state;
an action selection module, used to select the maximum Q value from the Q values and determine the action corresponding to the maximum Q value as the target action of the target node;
a routing and forwarding module, used to execute the target action so that the target node forwards data to the next-hop routing node;
wherein constructing the time-varying graph model of the target node comprises: determining the set of neighboring nodes, the set of links, and the second preset time of the target node, wherein the first preset time is an equal division of the second preset time; determining the set of connectivity durations and the set of link qualities of any link within the second preset time; determining the set of remaining energy of the neighboring nodes; and constructing the time-varying graph model of the target node based on the set of neighboring nodes, the set of links, the set of connectivity durations, the set of link qualities, and the set of remaining energy: $G_n = (N, E, T, D_T, Q_T, En)$, where $G_n$ is the time-varying graph model of the target node n, N is the set of neighboring nodes, E is the set of links, T is the second preset time, $D_T$ is the set of connectivity durations of any link within T, $Q_T$ is the set of link qualities of any link within T, and En is the set of remaining energy;
wherein calculating the Q function based on the time-varying graph model to obtain the Q values of the target node in the current state comprises: calculating the state transition probability and the reward function based on the time-varying graph model; and inputting the state transition probability and the reward function into the Q function to obtain the Q values of the target node performing different actions in the current state;
wherein calculating the state transition probability based on the time-varying graph model comprises: inputting the link quality of any link in the time-varying graph model into a first model to calculate the state transition probability corresponding to that link, the first model being expressed in terms of the state transition probability of successful transmission, the state transition probability of failed transmission, the link quality from the target node n to the neighbor node m within time t, and the link quality from the neighbor node m to the target node n within time t, where s′ is the next state of the current state s, S is the state space set, and rcv is the successful transmission state;
wherein calculating the reward function based on the time-varying graph model comprises: constructing an energy function, a link quality function, a delay function, and an identification function based on the time-varying graph model; and inputting the energy function, the link quality function, the delay function, and the identification function into a second model to calculate the reward function, the second model being expressed in terms of the reward function $R_n(s,a)$, the energy function $E_n(s,a)$, the link quality function $p_n(s,a)$, the delay function $rt_n(s,a)$, the identification function $F_L(s,a)$, the first weight α1, the second weight α2, and the third weight α3;
wherein the Q function is expressed in terms of $Q_n(s,a)$, the Q value of node n choosing action a in state s; $R_n(s,a)$, the reward function; γ, the discount factor; the state transition probability; and $V_n(s′)$, the maximum Q value among all actions a in state s′; S being the state space.
5. A fault diagnosis device, characterized by comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the underwater mobile node routing method according to any one of claims 1 to 3.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the underwater mobile node routing method according to any one of claims 1 to 3.