A routing method suitable for urban rail self-organizing network

By employing differentiated route discovery, static route configuration, and dynamic route selection methods in urban rail ad hoc networks, combined with reinforcement learning algorithms, the problem of route selection under high-speed train operation was solved, achieving low-latency and high-reliability data transmission and meeting the high real-time requirements of urban rail transit.

CN117041912B · Active · Publication Date: 2026-06-19 · BEIJING JIAOTONG UNIV

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING JIAOTONG UNIV
Filing Date: 2023-07-20
Publication Date: 2026-06-19

Patent Timeline: 20 Jul 2023, application; 19 Jun 2026, publication (CN117041912B)

IPC: H04W4/42; H04W40/02; H04W40/20; H04W40/12; H04W40/22; G06N20/00; H04W84/18

AI Tagging (Application Domain): Particular environment based services; Network topologies

Similar Technology Patents

A sports fitness training cloud data collection system and method
CN122209042A · Telemetry/telecontrol selection arrangements; Gymnastic exercising
Vehicle gateway with power management system
WO2025128612A8 · Registering/indicating working of vehicles; Particular environment based services
Systems and Methods for Autonomous Vehicle Communication
US20260164219A1 · Particular environment based services; Free-space transmission
Method for establishing a ship monitoring network based on magnetic-field communication
US20260164273A1 · Near-field transmission; Particular environment based services
A magnetic field triggered low power wireless current detection system
CN122193662A · Programme control; Computer control

AI Technical Summary

Technical Problem

Existing routing protocols perform poorly in train autonomous operation systems: they incur high latency and cope badly with high-speed mobility and large network scale, where rapidly changing topology leads to wasted bandwidth and unreliable transmission.

Method used

By differentiating between route discovery, static route configuration, and dynamic route selection, and combining reinforcement learning algorithms, the path selection is optimized by estimating link lifetime and communication latency, thereby achieving low-latency and high-reliability transmission.

Benefits of technology

Stable path selection and low-latency data transmission were achieved under high-speed train operation, meeting the high real-time requirements of urban rail transit and improving transmission reliability and bandwidth utilization efficiency.

✦ Generated by Eureka AI based on patent content.

Figure CN117041912B_ABST

Abstract

This invention provides a routing method suitable for urban rail ad hoc networks, including step S1, route discovery; step S2, static route configuration; and step S3, dynamic route selection. It can achieve the following: 1. Under high-speed train operation, due to the train's dynamic decision-making, the route can be changed in real time to select the optimal path for information transmission. 2. This invention achieves low-latency data transmission between trains. 3. When a train selects the first trackside relay node, it will select multiple nodes simultaneously, and each trackside access point has a route to the destination node, thereby achieving multi-path concurrent transmission and effectively ensuring reliability; in addition, the cluster head periodically monitors the status of nodes within the cluster, effectively avoiding transmission failures caused by abnormal node status.

Description

Technical Field

[0001] This invention relates to the field of rail transit technology, and more specifically to a routing method applicable to urban rail self-organizing networks.

Background Technology

[0002] The Train Autonomous Circulation System (TACS), based on train-to-train communication, has become an important development direction for next-generation urban rail transit. Unlike traditional train control, which is centered on wayside equipment, TACS takes the intelligent onboard controller as its core to achieve autonomous and safe train operation control, further improving operational efficiency and reliability. At the same time, it places high demands on real-time performance, bandwidth, and reliability for both train-to-train and train-to-ground wireless communication networks.

[0003] In recent years, ad hoc networks have rapidly developed as a new type of network communication technology. As a decentralized network, they have been widely applied in fields such as vehicle-to-everything (V2X) and satellite communications. The functionality of routing protocols is primarily implemented at the network layer; any network requiring data transmission needs routing support. In urban rail ad hoc networks, due to the high mobility of trains and issues such as attenuation and interference in wireless channels, the routing problem is far more complex than in networks with infrastructure.

[0004] Based on when routes are acquired, existing routing methods can be divided into proactive and on-demand routing protocols. Both have the following drawbacks: (1) Large latency. In on-demand routing protocols, when the source node needs to communicate but has no route to the destination node, a route search and creation process is required, and data transmission is delayed until the route has been created. (2) Unsuitability for high-speed mobile scenarios. When existing routing protocols are applied to urban rail transit, the high-speed movement of trains makes the topology change rapidly. Keeping routes up to date then requires frequent flooding, which generates a large amount of overhead and wastes bandwidth. (3) Unsuitability for large-scale networks. Because routing overhead is large, once the network reaches a certain scale, route-maintenance control traffic occupies a large share of the bandwidth, which hinders expansion of the network service area.

Summary of the Invention

[0005] The present invention aims to provide a routing method suitable for urban rail self-organizing networks to solve the above problems.

[0006] The technical solution of this invention is: a routing method suitable for urban rail self-organizing networks, comprising...

[0007] Step S1, route discovery;

[0008] Step S2, Static route configuration;

[0009] Step S3, dynamic route selection.

[0010] Preferably, step S1 specifically includes:

[0011] S11, distinguishing between two routing discovery scenarios:

[0012] Scenario 1: If a node detects activity from its neighbor in the previous communication cycle, it considers the neighbor to be in a normal state.

[0013] Scenario 2: If a node does not detect any activity from its neighbors in the previous cycle, it generates an ASK packet to inquire about the neighbor's status. If it receives a Re packet, it considers the neighbor's status to be normal; otherwise, it marks it as abnormal.

[0014] S12, after completing route discovery, each trackside relay node reports its own information and neighbor status information to the cluster head node.

[0015] Preferably, step S2 specifically includes:

[0016] S21. After collecting the information reported by all trackside relay nodes, the cluster head node performs static route calculation and generates a global routing table.

[0017] S22, the cluster head node sends the global routing table to the nearest trackside relay node in both the uplink and downlink directions; each trackside relay node that receives it forwards it to the next node in turn, so that all trackside relay nodes are notified.

[0018] S23, after receiving the global routing table, each trackside relay node extracts the routing entries that contain itself and reorganizes them into a local routing table.

[0019] Preferably, the calculation method for static routes in S21 specifically includes:

[0020] S211 specifies that the nodes selected in a static path do not intersect;

[0021] S212, Before sending information to the destination train, a positional constraint is applied to the distance of the next hop node to make it closer to the destination train; the positional constraint is shown in the following formula:

[0022]

[0023] Here, the two position variables denote the position of the next-hop node and the position $x$ of the current node, respectively; $L_{ch}$ denotes the position of the cluster head node, and $L_D$ denotes the position of the destination train;

[0024] S213 specifies the transmission order of nodes between adjacent profiles, and defines the interference constraints of trackside relay nodes on two adjacent profiles as follows:

[0025]

[0026] Here, $\Pi(v_{xi}, v_{(x+1)j}) = 1$ indicates that the forwarding activities of the previous profile have been completed, and $\Pi(v_{(x-1)i}, v_{xj}) = 0$ indicates that a node on the current profile can forward to the next hop; $V_x$ denotes the set of trackside relay nodes on the profile at horizontal coordinate $x$.

[0027] S214, constrain the transmission priority of nodes on the same profile; the priority constraint is as follows:

[0028]

[0029] Here, $p$ denotes the transmission priority order of nodes on the profile at horizontal coordinate $x$, and the index $n$ refers to the sending order of the $n$-th trackside node.

[0030] Preferably, step S3 specifically includes:

[0031] S31, Before the train arrives, the cluster head node sends the configured static path set to the train in advance through cross-cluster communication;

[0032] S32, the train dynamically selects the complete path based on its own status information and static path set.

[0033] Preferably, the dynamic selection of the complete path in step S32 specifically includes:

[0034] S321, Calculate link lifetime:

[0035]

[0036] where $x$ is the horizontal distance from the train to the trackside relay node, $h$ is the height of the trackside relay node above the ground, and $R$ is the communication radius of the node. The minimum received power that meets the communication requirements is $P_r$; if the received power falls below $P_r$, the link is considered dead. The train speed is $v$.

[0037] S322, Calculate the average multipath delay: First, calculate the signal-to-noise ratio (SNR), which is the ratio of the signal received by the receiver to the noise. Then, calculate the channel capacity according to Shannon's formula, as shown in formula (5). Next, calculate the single-hop delay, which mainly consists of processing delay and transmission delay, as shown in formula (6). Based on the single-hop delay, calculate the single-path delay as shown in formula (7). Finally, obtain the average multipath delay, as shown in formula (8).

[0038]

[0039]

[0040]

[0041]

[0042] In formula (5), $R_t$ is the channel capacity;

[0043] $P_s$ is the received signal power, $P_d$ is the transmit power, $d$ is the distance between the transmitter and the receiver, and $K$ is a constant related to the transmit and receive gains;

[0044] the interference power is the sum of the interference caused by other nodes communicating within the interference range of the data-transmitting node, where fSet is the set of nodes within that interference range, TSet is the set of nodes transmitting simultaneously at time $t$, and each interfering node $i$ contributes according to its transmit power and its distance $d_i$;

[0045] $N$ denotes the ambient noise;

[0046] When nodes communicate, a higher signal-to-noise ratio is better;

[0047] Formula (6) gives the single-hop delay between nodes in the presence of interference; its two $L$ terms denote the node's processing latency and the length of the data packet, respectively;

[0048] Formula (7) gives the single-path delay;

[0049] In formula (8), $T_{avr}$ is the average delay of multipath transmission, pathset is the set of paths, and $N$ is the number of paths;

[0050] S323, dynamic route selection by reinforcement learning: Q-learning is adopted, with the train acting as the agent that learns the optimal policy π* for each state so as to maximize the total discounted expected return.

[0051] Preferably, the reinforcement learning method in step S323 specifically includes:

[0052] (1) Model the routing problem as a triplet $\langle S, A, R\rangle$: the train's action A is to select a trackside relay node within its transmission range to send data packets; the state space S includes the train's position, speed, and static path set; and the reward R consists of the link lifetime and the multipath average delay, as shown in the following formula:

[0053] $R_t = \omega_1 T_{avr} + \omega_2 LT$ (9)

[0054] Here, $T_{avr}$ is the multipath average delay given by formula (8), and $LT$ is the link lifetime given by formula (4).

[0055] (2) For each state s and action a, initialize Q(s,a) to 0 and generate a Q table, which stores the Q value corresponding to the action taken in each state.

[0056] (3) The train selects the next hop node to send according to the Q table.

[0057] (4) The train receives a reward $R_t$ and observes a new state $s'$.

[0058] (5) Update the Q table using the Q value update formula (10).

[0059]

[0060] Here, $(1-\alpha)Q(s_t, a_t)$ represents the current Q value; $R_t$ is the reward obtained by taking action $a_t$ in state $s_t$; and the final term represents the maximum reward that can be obtained from state $s'$.

[0061] (6) Set the new state s′ as the current state.

[0062] (7) Return to step (2).

[0063] The beneficial effects of this invention are as follows:

[0064] 1. This invention solves the routing problem under high-speed train operation. By incorporating train position and speed information into the routing decision, this invention predicts link lifetime and communication latency in advance, enabling the train to choose a more stable path for information transmission. Therefore, under high-speed train operation, due to the train's dynamic decision-making, the path can be changed in real time, selecting the optimal path to complete information transmission.

[0065] 2. This invention achieves low-latency data transmission between vehicles. When optimizing latency, this invention considers the interference caused by concurrent transmission along multiple paths and utilizes reinforcement learning to select the path combination with the minimum latency, thus meeting the latency requirements of urban rail transit.

[0066] 3. This invention solves the problem of transmission reliability. In this invention, when the train selects the first trackside relay node, it selects multiple nodes simultaneously, and each trackside access point has a route to the destination node, thereby realizing multi-path concurrent transmission and effectively ensuring reliability. In addition, the cluster head periodically monitors the status of nodes within the cluster, effectively avoiding transmission failures caused by abnormal node status.

Attached Figure Description

[0067] Figure 1 is a schematic diagram of a routing method applicable to urban rail self-organizing networks provided by an embodiment of the present invention;

[0068] Figure 2 is a schematic diagram of route discovery in the routing method applicable to urban rail self-organizing networks provided by an embodiment of the present invention;

[0069] Figure 3 is a schematic diagram of the static route configuration process in the routing method applicable to urban rail self-organizing networks provided by an embodiment of the present invention;

[0070] Figure 4 is a schematic diagram of the static route calculation steps in the routing method applicable to urban rail self-organizing networks provided by an embodiment of the present invention;

[0071] Figure 5 is a schematic diagram of the dynamic route selection process in the routing method applicable to urban rail self-organizing networks provided by an embodiment of the present invention;

[0072] Figure 6 is a schematic diagram of the dynamic route calculation steps in the routing method applicable to urban rail self-organizing networks provided by an embodiment of the present invention;

[0073] Figure 7 is a schematic diagram of the steps of the reinforcement learning method in the routing method applicable to urban rail self-organizing networks provided by an embodiment of the present invention.

Detailed Implementation

[0074] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention. The embodiments of the present invention are not limited thereto.

[0075] Example 1

[0076] As shown in Figure 1, a routing method suitable for urban rail ad hoc networks includes...

[0077] Step S1, route discovery;

[0078] Step S2, Static route configuration;

[0079] Step S3, dynamic route selection.

[0080] As shown in Figure 2, step S1 specifically includes:

[0081] S11, distinguishing between two routing discovery scenarios:

[0082] Scenario 1: If a node detects activity from its neighbor in the previous communication cycle, it considers the neighbor to be in a normal state.

[0083] Scenario 2: If a node does not detect any activity from its neighbors in the previous cycle, it generates an ASK packet to inquire about the neighbor's status. If it receives a Re packet, it considers the neighbor's status to be normal; otherwise, it marks it as abnormal.

[0084] S12, after completing route discovery, each trackside relay node reports its own information and neighbor status information to the cluster head node.
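The neighbor-status logic of S11 and the report of S12 can be sketched in Python as below. This is a minimal illustration under assumptions, not the patented implementation: the `RelayNode` class, the `send_ask` callback (standing in for sending an ASK packet and waiting for a Re packet), and the report fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class RelayNode:
    """Hypothetical model of a trackside relay node."""
    node_id: str
    neighbors: List[str]
    # Neighbors from which any activity was observed in the previous cycle.
    heard_last_cycle: Set[str] = field(default_factory=set)


def discover_neighbor_status(
    node: RelayNode, send_ask: Callable[[str, str], bool]
) -> Dict[str, str]:
    """S11: classify each neighbor as 'normal' or 'abnormal'.

    send_ask(src, dst) is an assumed hook: it sends an ASK packet and returns
    True if a Re packet comes back before some timeout.
    """
    status: Dict[str, str] = {}
    for nb in node.neighbors:
        if nb in node.heard_last_cycle:
            status[nb] = "normal"        # Scenario 1: activity already seen
        elif send_ask(node.node_id, nb):
            status[nb] = "normal"        # Scenario 2: ASK answered with Re
        else:
            status[nb] = "abnormal"      # Scenario 2: no Re received
    return status


def build_report(node: RelayNode, status: Dict[str, str]) -> dict:
    """S12: message sent to the cluster head (field names are assumptions)."""
    return {"node": node.node_id, "neighbor_status": status}


# Usage: n0 was active last cycle, n2 answers the ASK, n3 stays silent.
asked: List[str] = []


def fake_ask(src: str, dst: str) -> bool:
    asked.append(dst)                    # record which neighbors were probed
    return dst == "n2"


n1 = RelayNode("n1", ["n0", "n2", "n3"], heard_last_cycle={"n0"})
report = build_report(n1, discover_neighbor_status(n1, fake_ask))
print(report)
```

Only silent neighbors trigger an ASK, so in this sketch the extra control traffic grows with the number of quiet links rather than with every neighbor in every cycle.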

[0085] As shown in Figure 3, step S2 specifically includes:

[0086] S21. After collecting the information reported by all trackside relay nodes, the cluster head node performs static route calculation and generates a global routing table.

[0087] S22, the cluster head node sends the global routing table to the nearest trackside relay node in both the uplink and downlink directions; each trackside relay node that receives it forwards it to the next node in turn, so that all trackside relay nodes are notified.

[0088] S23, after receiving the global routing table, each trackside relay node extracts the routing entries that contain itself and reorganizes them into a local routing table.
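Steps S22 and S23 can be illustrated with a short Python sketch. It assumes that the relay nodes of one cluster lie in a single line along the track with the cluster head among them, and that the global routing table maps a path identifier to its ordered list of relay nodes; these data structures, the function names, and the "TRAIN" placeholder are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical global routing table: path id -> relay nodes in forwarding order.
GlobalTable = Dict[str, List[str]]


def dissemination_order(line: List[str], cluster_head: str) -> Tuple[List[str], List[str]]:
    """S22: order in which relay nodes receive the global table.

    The cluster head hands the table to its nearest neighbor on each side,
    and each node then forwards it to the next node further out.
    """
    i = line.index(cluster_head)
    one_side = line[i + 1:]          # e.g. the uplink direction
    other_side = line[:i][::-1]      # e.g. the downlink direction
    return one_side, other_side


def local_table(global_table: GlobalTable, me: str) -> Dict[str, str]:
    """S23: keep only entries containing this node, as path id -> next hop."""
    table: Dict[str, str] = {}
    for path_id, hops in global_table.items():
        if me in hops:
            k = hops.index(me)
            # "TRAIN" is a placeholder meaning: hand over to the destination train.
            table[path_id] = hops[k + 1] if k + 1 < len(hops) else "TRAIN"
    return table


line = ["a", "b", "CH", "c", "d"]
print(dissemination_order(line, "CH"))   # (['c', 'd'], ['b', 'a'])
g = {"p1": ["a", "b", "c"], "p2": ["d", "e"]}
print(local_table(g, "b"))               # {'p1': 'c'}
print(local_table(g, "c"))               # {'p1': 'TRAIN'}
```

Keeping only the entries that contain the node itself means each relay stores a next hop per path it serves, rather than the whole cluster's table.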

[0089] As shown in Figure 4, the calculation method for static routes in S21 specifically includes:

[0090] S211 specifies that nodes in a static path must not intersect; node intersections can lead to buffer queuing, resulting in additional queuing delays. Furthermore, multiple nodes simultaneously sending data to a single node can cause information collisions and packet loss. Therefore, to meet the requirements of low latency and high reliability in inter-train communication, we stipulate that nodes in paths must not intersect.

[0091] S212, Before sending information to the destination train, a positional constraint is applied to the distance of the next hop node to make it closer to the destination train; the positional constraint is shown in the following formula:

[0092]

[0093] Here, the two position variables denote the position of the next-hop node and the position $x$ of the current node, respectively; $L_{ch}$ denotes the position of the cluster head node, and $L_D$ denotes the position of the destination train;

[0094] S213 specifies the transmission order of nodes between adjacent profiles, and defines the interference constraints of trackside relay nodes on two adjacent profiles as follows:

[0095]

[0096] Here, $\Pi(v_{xi}, v_{(x+1)j}) = 1$ indicates that the forwarding activities of the previous profile have been completed, and $\Pi(v_{(x-1)i}, v_{xj}) = 0$ indicates that a node on the current profile can forward to the next hop; $V_x$ denotes the set of trackside relay nodes on the profile at horizontal coordinate $x$.

[0097] Since the nodes are arranged in a ring around the tunnel wall, the "profile" in S213 refers to the tunnel cross-section on which the nodes are arranged.

[0098] S214, constrain the transmission priority of nodes on the same profile; the priority constraint is as follows:

[0099]

[0100] Here, $p$ denotes the transmission priority order of nodes on the profile at horizontal coordinate $x$, and the index $n$ refers to the sending order of the $n$-th trackside node.
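The four static-route constraints S211 to S214 can be sketched in Python as two path checks and a transmission schedule. Assumptions, not taken from the source: node positions and the destination are 1-D coordinates along the track; the S212 constraint is modelled as "every hop strictly reduces the distance to the destination train" (the patent's own formula also involves the cluster-head position and may differ); and a smaller priority value means earlier transmission within a profile. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def node_disjoint(paths: List[List[str]]) -> bool:
    """S211: no relay node may appear in more than one static path."""
    return all(not set(p) & set(q) for p, q in combinations(paths, 2))


def approaches_destination(path: List[str], pos: Dict[str, float], dest: float) -> bool:
    """S212 (assumed form): each hop must strictly reduce the distance along
    the track between the forwarding node and the destination train."""
    dists = [abs(pos[n] - dest) for n in path]
    return all(nxt < cur for cur, nxt in zip(dists, dists[1:]))


def forwarding_rounds(nodes: Dict[str, Tuple[float, int]],
                      ascending: bool = True) -> List[List[str]]:
    """S213/S214: one round per profile, profiles taken in travel order;
    within a profile, smaller priority values send first (assumption).

    nodes maps node id -> (profile coordinate x, priority p).
    """
    by_profile: Dict[float, List[Tuple[int, str]]] = defaultdict(list)
    for node_id, (x, prio) in nodes.items():
        by_profile[x].append((prio, node_id))
    xs = sorted(by_profile, reverse=not ascending)
    return [[nid for _, nid in sorted(by_profile[x])] for x in xs]


pos = {"a": 0.0, "b": 150.0, "c": 300.0, "d": 160.0, "e": 310.0}
print(node_disjoint([["a", "b", "c"], ["d", "e"]]))         # True
print(node_disjoint([["a", "b", "c"], ["a", "d", "e"]]))    # False
print(approaches_destination(["a", "b", "c"], pos, 500.0))  # True
print(approaches_destination(["b", "a", "c"], pos, 500.0))  # False
nodes = {"r1": (0.0, 2), "r2": (0.0, 1), "r3": (100.0, 1),
         "r4": (100.0, 3), "r5": (100.0, 2)}
print(forwarding_rounds(nodes))  # [['r2', 'r1'], ['r3', 'r5', 'r4']]
```

Rounds are returned profile by profile, so a caller would start round k + 1 only once every node in round k has finished forwarding, which is the ordering S213 describes.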

[0101] As shown in Figure 5, step S3 specifically includes:

[0102] S31, Before the train arrives, the cluster head node sends the configured static path set to the train in advance through cross-cluster communication;

[0103] S32, the train dynamically selects the complete path based on its own status information and static path set.

[0104] As shown in Figure 6, the dynamic selection of the complete path in step S32 specifically includes:

[0105] S321, Calculate link lifetime:

[0106]

[0107] where $x$ is the horizontal distance from the train to the trackside relay node, $h$ is the height of the trackside relay node above the ground, and $R$ is the communication radius of the node. The minimum received power that meets the communication requirements is $P_r$; if the received power falls below $P_r$, the link is considered dead. The train speed is $v$.
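Formula (4) itself is not reproduced in this text. The Python sketch below shows one plausible way to compute a link lifetime from the quantities named (x, h, R, v), assuming simple straight-line geometry: a relay node at height h reaches the train while the slant distance stays within R, so its horizontal coverage extends sqrt(R^2 - h^2) on either side. Treating x as a signed offset (positive while the node is still ahead of the train) and v as constant are assumptions; the patent's own formula (4) may differ.

```python
import math


def link_lifetime(x: float, h: float, R: float, v: float) -> float:
    """Remaining time (s) before the train-relay link dies, under assumed geometry.

    x: signed horizontal offset (m) of the relay node ahead of the train
       (negative once the train has passed it).
    h: height (m) of the relay node above the ground.
    R: communication radius (m), i.e. the slant range at which the received
       power falls to the threshold P_r.
    v: train speed (m/s), assumed constant and in the direction of travel.
    """
    if v < 0:
        raise ValueError("v must be non-negative in this sketch")
    if h >= R:
        return 0.0                          # node can never reach the train
    half_width = math.sqrt(R * R - h * h)   # horizontal coverage on each side
    if abs(x) > half_width:
        return 0.0                          # train currently outside coverage
    if v == 0:
        return math.inf                     # stationary train: no expiry from motion
    # The link dies once the train is half_width beyond the node.
    return (x + half_width) / v


print(link_lifetime(x=200.0, h=300.0, R=500.0, v=20.0))   # 30.0
print(link_lifetime(x=-400.0, h=300.0, R=500.0, v=20.0))  # 0.0
```

With R = 500 m and h = 300 m the horizontal coverage is 400 m on each side, so a train 200 m short of the node at 20 m/s keeps the link for (200 + 400) / 20 = 30 s.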

[0108] S322, Calculate the average multipath delay: First, calculate the signal-to-noise ratio (SNR), which is the ratio of the signal received by the receiver to the noise. Then, calculate the channel capacity according to Shannon's formula, as shown in formula (5). Next, calculate the single-hop delay, which mainly consists of processing delay and transmission delay, as shown in formula (6). Based on the single-hop delay, calculate the single-path delay as shown in formula (7). Finally, obtain the average multipath delay, as shown in formula (8).

[0109]

[0110]

[0111]

[0112]

[0113] In formula (5), $R_t$ is the channel capacity;

[0114] $P_s$ is the received signal power, $P_d$ is the transmit power, $d$ is the distance between the transmitter and the receiver, and $K$ is a constant related to the transmit and receive gains;

[0115] the interference power is the sum of the interference caused by other nodes communicating within the interference range of the data-transmitting node, where fSet is the set of nodes within that interference range, TSet is the set of nodes transmitting simultaneously at time $t$, and each interfering node $i$ contributes according to its transmit power and its distance $d_i$;

[0116] $N$ denotes the ambient noise;

[0117] When nodes communicate, a higher signal-to-noise ratio is better;

[0118] Formula (6) gives the single-hop delay between nodes in the presence of interference; its two $L$ terms denote the node's processing latency and the length of the data packet, respectively;

[0119] Formula (7) gives the single-path delay;

[0120] In formula (8), $T_{avr}$ is the average delay of multipath transmission, pathset is the set of paths, and $N$ is the number of paths;
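Formulas (5) to (8) are not reproduced in this text, but the chain described in S322 (SNR with interference, Shannon capacity, single-hop delay as processing plus transmission delay, path delay, and the average over paths) can be sketched in Python. Assumptions not taken from the source: a channel bandwidth (Shannon's formula needs one), a path-loss model K * P / d**alpha with exponent alpha = 2, a path delay equal to the sum of its hop delays, and a multipath average that is a plain mean over the N paths. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def received_power(K: float, p_tx: float, d: float, alpha: float = 2.0) -> float:
    """Assumed path-loss model K * P_tx / d**alpha (alpha is an assumption)."""
    return K * p_tx / d ** alpha


def capacity(bandwidth_hz: float, p_s: float,
             interferers: Sequence[Tuple[float, float]],
             K: float, noise: float, alpha: float = 2.0) -> float:
    """Formula (5) sketch: Shannon capacity with interference added to noise.

    interferers holds (transmit power, distance) pairs for the nodes that are
    both in fSet and in TSet at this time.
    """
    interference = sum(received_power(K, p, d, alpha) for p, d in interferers)
    sinr = p_s / (interference + noise)
    return bandwidth_hz * math.log2(1.0 + sinr)


def hop_delay(proc_delay_s: float, packet_bits: float, capacity_bps: float) -> float:
    """Formula (6) sketch: processing delay plus transmission delay."""
    return proc_delay_s + packet_bits / capacity_bps


def path_delay(hop_delays: Sequence[float]) -> float:
    """Formula (7) sketch: a path's delay is the sum of its hop delays."""
    return sum(hop_delays)


def average_multipath_delay(paths: Sequence[Sequence[float]]) -> float:
    """Formula (8) sketch: mean path delay over the N paths in pathset."""
    return sum(path_delay(p) for p in paths) / len(paths)


c = capacity(1e6, p_s=3.0, interferers=[], K=1.0, noise=1.0)
print(c)                                   # 2000000.0 (SINR = 3)
h = hop_delay(0.001, 12_000, c)            # 1 ms processing + 6 ms transmission
print(average_multipath_delay([[h, h], [h]]))
```

The sketch also shows why a higher SINR is preferable: adding an interferer lowers the capacity and therefore lengthens every hop's transmission delay.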

[0121] S323, dynamic route selection by reinforcement learning: Q-learning is adopted, with the train acting as the agent that learns the optimal policy π* for each state so as to maximize the total discounted expected return.

[0122] As shown in Figure 7, the reinforcement learning method in step S323 specifically includes:

[0123] (1) Model the routing problem as a triplet $\langle S, A, R\rangle$: the train's action A is to select a trackside relay node within its transmission range to send data packets; the state space S includes the train's position, speed, and static path set; and the reward R consists of the link lifetime and the multipath average delay, as shown in the following formula:

[0124] $R_t = \omega_1 T_{avr} + \omega_2 LT$ (9)

[0125] Here, $T_{avr}$ is the multipath average delay given by formula (8), and $LT$ is the link lifetime given by formula (4).

[0126] (2) For each state s and action a, initialize Q(s,a) to 0 and generate a Q table, which stores the Q value corresponding to the action taken in each state.

[0127] (3) The train selects the next hop node to send according to the Q table.

[0128] (4) The train receives a reward $R_t$ and observes a new state $s'$.

[0129] (5) Update the Q table using the Q value update formula (10).

[0130]

[0131] Here, $(1-\alpha)Q(s_t, a_t)$ represents the current Q value; $R_t$ is the reward obtained by taking action $a_t$ in state $s_t$; and the final term represents the maximum reward that can be obtained from state $s'$.

[0132] (6) Set the new state s′ as the current state.

[0133] (7) Return to step (2).
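Steps (1) to (7), together with the reward of formula (9), can be sketched as tabular Q-learning in Python. Formula (10) is not reproduced in this text; the update in the code uses the standard Q-learning rule Q <- (1 - alpha) * Q + alpha * (R_t + gamma * max Q(s', .)), which appears consistent with the terms described in [0131]. Assumptions: the reward weights w1 and w2, the discount factor value, epsilon-greedy exploration, and the toy one-hop environment are illustrative, and `q_learning`, `toy_step`, and `metrics` are hypothetical names. The loop keeps the Q table between iterations and returns to action selection, which is the conventional Q-learning arrangement.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

State = Any
Action = Any


def reward(t_avr: float, lt: float, w1: float = -1.0, w2: float = 0.1) -> float:
    """Formula (9): R_t = w1 * T_avr + w2 * LT.

    The default weights are illustrative assumptions: a negative w1 penalizes
    delay and a positive w2 rewards longer-lived links.
    """
    return w1 * t_avr + w2 * lt


def q_learning(
    start: State,
    actions: Callable[[State], List[Action]],
    step: Callable[[State, Action], Tuple[float, State, bool]],
    episodes: int = 200,
    alpha: float = 0.5,      # learning rate
    gamma: float = 0.9,      # discount factor (value is an assumption)
    epsilon: float = 0.1,    # exploration rate (assumption)
    seed: int = 0,
) -> Dict[Tuple[State, Action], float]:
    rng = random.Random(seed)
    q: Dict[Tuple[State, Action], float] = defaultdict(float)  # step (2): Q(s, a) = 0
    for _ in range(episodes):
        s, done = start, False
        while not done:
            acts = actions(s)
            # Step (3): choose the next-hop relay from the Q table (epsilon-greedy).
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda act: q[(s, act)])
            r, s_next, done = step(s, a)           # step (4): reward and new state s'
            best_next = 0.0 if done else max(q[(s_next, b)] for b in actions(s_next))
            # Step (5): Q <- (1 - alpha) * Q + alpha * (R_t + gamma * max_a' Q(s', a'))
            q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
            s = s_next                              # step (6)
    return dict(q)


# Toy usage: from state "t0" the train can hand a packet to relay "A" or "B";
# the hypothetical link metrics make A the better choice.
metrics = {"A": (0.005, 30.0), "B": (0.020, 5.0)}   # (T_avr in s, LT in s)


def toy_actions(s: State) -> List[Action]:
    return ["A", "B"]


def toy_step(s: State, a: Action) -> Tuple[float, State, bool]:
    return reward(*metrics[a]), "delivered", True


Q = q_learning("t0", toy_actions, toy_step)
best = max(["A", "B"], key=lambda a: Q.get(("t0", a), 0.0))
print(best)   # A
```

Because the reward mixes a delay (seconds to milliseconds) with a lifetime (tens of seconds), the relative scale of w1 and w2 decides which term dominates; in practice these weights would need tuning.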

[0134] Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of one embodiment, and the processes depicted in the drawings are not necessarily essential for implementing the present invention.

Claims

1. A routing method suitable for urban rail self-organizing networks, characterized in that it comprises: step S1, route discovery; step S2, static route configuration; and step S3, dynamic route selection; wherein step S3 specifically comprises: S31, before the train arrives, the cluster head node sends the configured static path set to the train in advance through cross-cluster communication; and S32, the train dynamically selects the complete path based on its own status information and the static path set; wherein the dynamic selection of the complete path in step S32 specifically comprises:
S321, calculating the link lifetime according to formula (4), in which $x$ is the horizontal distance from the train to the trackside relay node and $h$ is the height of the trackside relay node above the ground; the minimum received power that meets the communication requirements is $P_r$, and if the received power falls below $P_r$ the link is considered dead; the train speed is $v$;
S322, calculating the average multipath delay: first calculating the signal-to-noise ratio (SNR), i.e., the ratio of the signal received by the receiver to the noise; then calculating the channel capacity according to Shannon's formula, as shown in formula (5); then calculating the single-hop delay, which mainly consists of processing delay and transmission delay, as shown in formula (6); then, based on the single-hop delay, calculating the single-path delay as shown in formula (7); and finally obtaining the average multipath delay as shown in formula (8); wherein in formula (5), $R_t$ is the channel capacity, $P_s$ is the received signal power, $P_d$ is the transmit power, $d$ is the distance between the transmitter and the receiver, and $K$ is a constant related to the transmit and receive gains; the interference power is the sum of the interference caused by other nodes communicating within the interference range of the data-transmitting node, where fSet is the set of nodes within that interference range, TSet is the set of nodes transmitting simultaneously at time $t$, and each interfering node $i$ contributes according to its transmit power and its distance $d_i$; $N$ is the ambient noise; when nodes communicate, a higher signal-to-noise ratio is better; formula (6) gives the single-hop delay between nodes in the presence of interference, and its two $L$ terms denote the node's processing latency and the length of the data packet, respectively; formula (7) gives the single-path delay; and in formula (8), $T_{avr}$ is the average delay of multipath transmission, pathset is the set of paths, and $N$ is the number of paths;
S323, dynamically selecting routes by reinforcement learning: using Q-learning, the train, as the agent, learns the optimal policy for each state so as to maximize the total discounted expected return; comprising: (1) modeling the routing problem as a triplet $\langle S, A, R\rangle$, where the train's action A is to select a trackside relay node within its transmission range to send data packets, the state space S includes the train's position, speed, and static path set, and the reward R consists of the link lifetime and the multipath average delay, as shown in formula (9): $R_t = \omega_1 T_{avr} + \omega_2 LT$ (9), where $T_{avr}$ is the average multipath delay given by formula (8) and $LT$ is the link lifetime given by formula (4).

2. The routing method for urban rail self-organizing network according to claim 1, characterized in that, Step S1 specifically includes: S11, distinguishing between two routing discovery scenarios: Scenario 1: If a node detects activity from its neighbor in the previous communication cycle, it considers the neighbor to be in a normal state. Scenario 2: If a node does not detect any activity from its neighbors in the previous cycle, it generates an ASK packet to inquire about the neighbor's status. If it receives a Re packet, it considers the neighbor's status to be normal; otherwise, it marks it as abnormal. S12, after completing route discovery, each trackside relay node reports its own information and neighbor status information to the cluster head node.

3. The routing method applicable to urban rail ad hoc networks according to claim 1, characterized in that, Step S2 specifically includes: S21. After collecting the information reported by all trackside relay nodes, the cluster head node performs static route calculation and generates a global routing table. S22, the cluster head node sends the global routing table to the nearest trackside relay node in the uplink and downlink directions. After receiving it, the trackside relay node transmits it downwards in order to ensure that all trackside relay nodes are notified. S23, after receiving the global routing table, the trackside relay node reorganizes the routing entries containing its own nodes to generate a local routing table.

4. The routing method for urban rail self-organizing networks according to claim 3, characterized in that the calculation method for static routes in S21 specifically includes: S211, the nodes selected in the static paths do not intersect; S212, before information is sent to the destination train, a positional constraint is applied to the next-hop node so that it is closer to the destination train, as shown in formula (1), where the two position variables denote the position of the next-hop node and the position $x$ of the current node, respectively, $L_{ch}$ represents the position of the cluster head node, and $L_D$ represents the position of the destination train; S213, the transmission order of nodes between adjacent profiles is specified, and the interference constraints of trackside relay nodes on two adjacent profiles are defined by formula (2), where $\Pi(v_{xi}, v_{(x+1)j}) = 1$ indicates that the forwarding activities of the previous profile have been completed, $\Pi(v_{(x-1)i}, v_{xj}) = 0$ indicates that a node on the current profile can forward to the next hop, and $V_x$ is the set of trackside relay nodes on the cross-section at horizontal coordinate $x$; S214, the transmission priority on the same profile is constrained by formula (3), where $p$ represents the transmission priority order of nodes on the profile at horizontal coordinate $x$, and the index $n$ refers to the sending order of the $n$-th trackside node.

5. The routing method suitable for urban rail self-organizing networks according to claim 1, characterized in that the reinforcement learning method in step S323 specifically includes: (2) for each state s and action a, initializing Q(s,a) to 0 and generating a Q table that stores the Q value of each action in each state; (3) the train selecting the next-hop node for sending according to the Q table; (4) the train obtaining a reward $R_t$ and observing a new state $s'$; (5) updating the Q table using the Q-value update formula (10), in which $(1-\alpha)Q(s_t, a_t)$ represents the current Q value, $R_t$ is the reward obtained by taking action $a_t$ in state $s_t$, and the final term represents the maximum reward that can be obtained from state $s'$; (6) setting the new state $s'$ as the current state; (7) returning to step (2).