A method and apparatus for data flow scheduling in a time-sensitive network, and a medium

By optimizing routing and scheduling in time-sensitive networks using redundant candidate algorithms and reinforcement learning, the problem of balancing low latency and high reliability requirements in the network is solved, thereby optimizing network load and efficiently utilizing resources.

CN116527573B · Active · Publication Date: 2026-06-23 · SOUTH CHINA UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SOUTH CHINA UNIV OF TECH
Filing Date
2023-03-10
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In existing time-sensitive networks, it is difficult to solve routing and scheduling problems in a way that balances low-latency and high-reliability requirements. Redundant flows consume network resources and increase the waiting time in network devices.

Method used

Multiple redundant routes are generated using a redundancy candidate algorithm, and routing decisions are made in conjunction with reinforcement learning methods. Time slots are allocated to each TSN data stream through an early scheduling method to optimize network load balancing.

Benefits of technology

In simulations, it improved network load balancing by an average of 23.1% compared with benchmark algorithms and reduced the load on network bottleneck links.

✦ Generated by Eureka AI based on patent content.

Figure CN116527573B_ABST

Abstract

Disclosed are a data flow scheduling method and device in a time-sensitive network, and a medium. The method comprises: obtaining a plurality of TSN data flows; performing serialization processing on the obtained TSN data flows; generating a plurality of redundant routes for each TSN data flow; making a redundant route decision for each TSN data flow by using reinforcement learning; after obtaining the route of a TSN data flow, obtaining a scheduling table by using an early scheduling method and allocating a time slot for each TSN data flow; and determining whether the current TSN data flow is the last TSN data flow to be processed: if so, transmitting the TSN data flows according to the allocation result; otherwise, returning to the step of making a redundant route decision for each TSN data flow by using reinforcement learning. The application first determines a candidate set of redundant paths and then uses reinforcement learning to learn, from these candidates, a selection strategy for multi-route flows. The strategy dynamically adapts to the network state to reduce the load on network bottleneck links, and the method can be widely applied in the technical field of communication flow scheduling.

Description

Technical Field

[0001] This invention relates to the field of communication flow scheduling technology, and in particular to a data flow scheduling method, apparatus and medium in time-sensitive networks.

Background Technology

[0002] Traditional Ethernet, as originally specified, did not consider real-time communication and could only reduce end-to-end latency to approximately ten milliseconds. With the rise of 5G networks, an increasing number of applications, such as autonomous driving, augmented reality, and industrial system automation, require ultra-reliable and low-latency (ULL) communication with end-to-end latency of a few milliseconds. Although some real-time Ethernet technologies (such as PROFINET, EtherCAT, and SERCOS III) were proposed early on, most were dedicated to industrial systems and offered poor compatibility and scalability. The IEEE 802.1 Working Group (WG) therefore developed Time-Sensitive Networking (TSN). Today, TSN has become the most advanced Ethernet standard for real-time communication, designed to meet reliability and real-time requirements.

[0003] Routing and scheduling in TSNs have been discussed extensively in numerous studies, but they remain challenging because of the low-latency and high-reliability requirements of time-sensitive networks. Enhancing the reliability of TSN flows requires additional, redundant flows: if a TSN flow fails to travel along a given path, a large number of consecutive TSN flows in real-time services are affected, leading to retransmissions that exceed the total latency budget. However, redundant flows consume more network resources and increase latency in network devices. Routing and scheduling must therefore strike a balance between latency and reliability requirements.

[0004] Most existing works build mathematical models from the characteristics of data frames, end-to-end requirements, and physical links, and then solve the routing and scheduling problems with optimization methods. These methods can only guarantee that flows arrive under low-latency constraints; they do not consider reliability. Some works use heuristic methods to address reliability. However, when the network architecture changes, heuristic methods inevitably have to search for strategies again, and an excessively large search space leads to unacceptable runtime.

Summary of the Invention

[0005] In order to at least partially solve one of the technical problems existing in the prior art, the present invention aims to provide a data flow scheduling method, apparatus and medium in time-sensitive networks.

[0006] The technical solution adopted in this invention is:

[0007] A data flow scheduling method in time-sensitive networks includes the following steps:

[0008] Retrieve multiple TSN data streams;

[0009] The obtained TSN data streams are serialized;

[0010] A redundancy candidate algorithm is used to generate multiple redundant routes for each TSN data stream;

[0011] The routing and scheduling problem in the TSN network is treated as an NP-hard problem, and reinforcement learning is used to make redundant routing decisions for each TSN data stream to meet the reliability requirements of time-sensitive networks.

[0012] After obtaining the routes of TSN data streams, the scheduling table is obtained through the early scheduling method, and time slots are allocated for each TSN data stream.

[0013] Determine if the current TSN data stream is the last TSN data stream to be processed. If so, transmit the TSN data stream according to the allocation result; otherwise, return to the step of using reinforcement learning to make redundant routing decisions for each TSN data stream.

[0014] Furthermore, the working principle of the redundant candidate algorithm is as follows:

[0015] Given a flow $f_n$ with source $src_n$ and destination $dst_n$, the algorithm outputs a candidate set $R_n$ according to the redundant path count parameter $M$;

[0016] When a TSN data stream arrives, its shortest path is calculated and taken as the first path in the candidate set;

[0017] All nodes other than the starting point and the ending point are considered as deviation points, and other paths from the deviation points to the ending point are obtained using Dijkstra's algorithm.

[0018] Repeat this process for all deviation points until the required number of redundant paths is met.

[0019] Furthermore, obtaining alternative paths from the deviation point to the destination includes:

[0020] In the process of searching for other paths, there are two constraints on the next hop from the current deviation point:

[0021] 1) Do not select the successor node of the initial shortest path, otherwise the same path will be generated;

[0022] 2) Do not select nodes from the set of preceding nodes, otherwise a loop path will be generated.

[0023] Furthermore, the reinforcement learning works as follows:

[0024] The TSN data stream and the current network state are used as inputs to the reinforcement learning model, and the output of the reinforcement learning model is a route combination action. The reinforcement learning model determines the path combination in the online network environment by observing the TSN data streams and dynamic network resources, and updates its policy network parameters based on the reward.

[0025] The TSN data stream is input into the reinforcement learning model in batch processing mode, and the behavior of the TSN data stream includes:

[0026] 1) Determine the data size, transmission period, and delay deadline of the TSN data stream, sort the data streams according to their data size, and route and schedule the TSN data streams from largest to smallest;

[0027] 2) For a single path of a unicast stream, select the path and schedule it directly, then perform the same operation on the other redundant paths;

[0028] 3) After routing and scheduling the unicast stream, execute the other unicast streams until all unicast streams have been executed.

[0029] Furthermore, the state definition of the reinforcement learning model is as follows:

[0030] The state, as the input to the reinforcement learning model, is defined as:

[0031] s = [F, N]

[0032] The flow side F includes the source and destination of the flow, the cycle frequency, the amount of data transmitted, the maximum permissible arrival delay, and the route candidate set of the TSN flow; the network side N includes the dynamic load on each network link.

[0033] Furthermore, the action definition of the reinforcement learning model is as follows:

[0034] The action is the output of the reinforcement learning model and specifies a combination of paths, selected from the path candidate set, for routing. The path candidate set may include the shortest path, the longest path, and other paths that share nodes and links.

[0035] Furthermore, the reward of the reinforcement learning model is defined as follows:

[0036] The reward is the feedback that the reinforcement learning model receives from the environment after performing an action. The reward consists of two parts:

[0037] 1) The balance of the current network in each training round can be represented by r1 = -(Umax - Umin);

[0038] 2) The reward r2 is the penalty for the failure rate in each step, which is used to guide the model to quickly explore better strategies in the early training phase.

[0039] Furthermore, the allocation of time slots for each TSN data stream includes:

[0040] A TSN data stream is allocated time slices immediately after each step in which it is assigned a set of routes, instead of waiting for all streams to be routed before time slices are allocated.

[0041] Another technical solution adopted in this invention is:

[0042] A data flow scheduling device for time-sensitive networks, comprising:

[0043] At least one processor;

[0044] At least one memory for storing at least one program;

[0045] When the at least one program is executed by the at least one processor, the at least one processor implements the method described above.

[0046] Another technical solution adopted in this invention is:

[0047] A computer-readable storage medium storing a processor-executable program, which, when executed by a processor, is used to perform the method described above.

[0048] The beneficial effects of this invention are as follows: the invention first determines a set of redundant path candidates and then uses reinforcement learning to learn a multi-route flow selection strategy from the candidates. This strategy dynamically adapts to the network state to reduce the load on network bottleneck links.

Attached Figure Description

[0049] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the following description is provided with accompanying drawings of the relevant technical solutions in the embodiments of the present invention or the prior art. It should be understood that the accompanying drawings described below are only for the purpose of clearly illustrating some embodiments of the technical solutions of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0050] Figure 1 is a flowchart of a data flow scheduling method in a time-sensitive network according to an embodiment of the present invention.

Detailed Implementation

[0051] The embodiments of the present invention are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention. The step numbers in the following embodiments are set only for ease of explanation, and there is no limitation on the order between the steps. The execution order of each step in the embodiments can be adaptively adjusted according to the understanding of those skilled in the art.

[0052] In the description of this invention, it should be understood that the orientation descriptions, such as up, down, front, back, left, right, etc., are based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limiting this invention.

[0053] In the description of this invention, "several" means one or more and "multiple" means two or more; "greater than," "less than," and "exceeding" are understood to exclude the stated number, while "above," "below," and "within" are understood to include the stated number. The use of "first" and "second" in the description is merely for distinguishing technical features and should not be construed as indicating or implying relative importance, or implicitly indicating the number of indicated technical features, or implicitly indicating the order of the indicated technical features.

[0054] In the description of this invention, unless otherwise explicitly defined, terms such as "set up," "install," and "connect" should be interpreted broadly, and those skilled in the art can reasonably determine the specific meaning of the above terms in this invention in conjunction with the specific content of the technical solution.

[0055] To address existing technical challenges, this invention proposes a novel reinforcement learning (RL)-based method for redundant data flow routing and scheduling, aiming to achieve load balancing on network links while satisfying reliability and latency constraints. The method first utilizes a simple heuristic algorithm to determine a candidate set of redundant paths, then combines state-of-the-art RL to learn a multi-route flow selection strategy from these candidates. This strategy dynamically understands the network state to reduce the load on bottleneck links. Simulation results show that, compared to benchmark algorithms, the proposed solution can improve network load balancing by an average of 23.1%.

[0056] As shown in Figure 1, this embodiment provides a data flow scheduling method in a time-sensitive network.

[0057] The method includes the following steps:

[0058] S1. Obtain multiple TSN data streams.

[0059] TSN data streams awaiting processing arrive in one of two modes: batch arrival or streaming arrival.

[0060] S2. Serialize the obtained TSN data streams.

[0061] For the arriving data streams, the first step is to determine the route and scheduling order for each stream. Considering the characteristics of the streams, the arriving streams are serialized according to their data size. Larger streams are executed first, followed by smaller streams.
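The serialization in step S2 amounts to a descending sort on data size. The sketch below illustrates this under stated assumptions: the `Flow` record and its field names are hypothetical, and ties are assumed to keep arrival order (Python's `sorted` is stable even with `reverse=True`).

```python
from dataclasses import dataclass


@dataclass
class Flow:
    # Hypothetical flow record; field names are illustrative, not from the patent.
    name: str
    src: str
    dst: str
    size_bytes: int   # data transmitted per period
    period_us: int    # transmission period
    deadline_us: int  # maximum permissible delay


def serialize_flows(flows):
    """Order flows for routing/scheduling: largest data size first."""
    # sorted() is stable with reverse=True, so equal-size flows keep arrival order.
    return sorted(flows, key=lambda f: f.size_bytes, reverse=True)


flows = [
    Flow("f1", "A", "D", 500, 1000, 800),
    Flow("f2", "B", "D", 1500, 2000, 1500),
    Flow("f3", "A", "C", 500, 500, 400),
]
print([f.name for f in serialize_flows(flows)])  # ['f2', 'f1', 'f3']
```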

[0062] S3. A redundant candidate algorithm is used to generate multiple redundant routes for each TSN data flow. The redundant candidates will be used to generate the corresponding route set for the sorted flow to ensure reliability.

[0063] S4. The routing and scheduling problem in the TSN network is treated as an NP-hard problem, and reinforcement learning is used to make redundant routing decisions for each TSN data stream to meet the reliability requirements of time-sensitive networks.

[0064] Routing and scheduling problems in TSN networks are NP-hard and subject to multiple constraints; therefore, this embodiment separates routing and scheduling into two optimization problems. Reinforcement learning applied to TSN flow routing can then obtain a multi-routing strategy that maximizes network load balancing. The RL agent (i.e., the reinforcement learning model) selects multiple appropriate paths for transmission from the path candidate set generated in the previous step.

[0065] S5. After obtaining the route of the TSN data stream, obtain the scheduling table through the early scheduling method and allocate time slots for each TSN data stream.

[0066] After obtaining the route for a flow, the flow obtains an accurate scheduling table through early scheduling. In this process, the early scheduling method allocates time slots for each flow along all its paths.

[0067] S6. Determine whether the current TSN data stream is the last TSN data stream to be processed.

[0068] The routing and scheduling steps above are executed repeatedly until all arriving flows have been processed: if the current flow is the last one, proceed to the next step; otherwise, return to step S4.

[0069] S7. Transmit the data streams across the network according to the allocation results.
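Steps S2 to S7 can be read as a single driver loop. The following is a minimal sketch, assuming flows are plain dictionaries with `name` and `size` keys; `generate_candidates`, `choose_paths`, `allocate_slots`, and `transmit` are hypothetical placeholders for the redundancy candidate algorithm, the RL agent, early scheduling, and transmission respectively, not functions defined by the patent.

```python
def run_pipeline(flows, generate_candidates, choose_paths, allocate_slots, transmit):
    """Hypothetical driver for steps S2-S7; the four callables are placeholders."""
    # S2: serialize, largest data size first.
    ordered = sorted(flows, key=lambda f: f["size"], reverse=True)
    # S3: redundant route candidates for every flow.
    candidates = {f["name"]: generate_candidates(f) for f in ordered}
    allocation = {}
    # S4-S6: route and schedule one flow at a time until the last flow is done.
    for flow in ordered:
        paths = choose_paths(flow, candidates[flow["name"]])    # S4: RL decision
        allocation[flow["name"]] = allocate_slots(flow, paths)  # S5: schedule at once
    transmit(allocation)                                        # S7
    return allocation


# Usage with trivial stand-ins.
order, sent = [], []


def pick_all(flow, cands):
    order.append(flow["name"])  # record processing order
    return cands                # stand-in for the RL agent: keep every candidate


result = run_pipeline(
    [{"name": "f1", "size": 300}, {"name": "f2", "size": 900}],
    generate_candidates=lambda f: [["A", "B", "D"], ["A", "C", "D"]],
    choose_paths=pick_all,
    allocate_slots=lambda f, paths: {tuple(p): 0 for p in paths},
    transmit=sent.append,
)
```

Scheduling inside the loop, right after each routing decision, mirrors the "allocate immediately" design of step S5 rather than a two-phase route-all-then-schedule approach.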

[0070] In this embodiment, the reinforcement learning algorithm can converge quickly because the separate determination of routing and scheduling reduces the learning space of reinforcement learning.

[0071] The following provides a detailed explanation of the steps involved in the above method.

[0072] (1) Redundant candidate generation

[0073] The more routing paths are deployed, the higher the reliability of TSN data flow transmission; however, too many paths waste network bandwidth. Before selecting suitable routing paths for each flow, the primary task is to determine the number of redundant paths and generate a set of candidate paths to choose from. Inspired by the K Shortest Paths (KSP) algorithm, we propose a redundancy candidate algorithm. Given a flow $f_n$ with source $src_n$ and destination $dst_n$, the algorithm outputs a candidate set $R_n$ according to the redundant path count parameter $M$. When the flow arrives, its shortest path is calculated and taken as the first path in the candidate set. Then, all nodes except the source and destination are treated as deviation points, and alternative paths from each deviation point to the destination are obtained using Dijkstra's algorithm. This process is repeated for all deviation points until the required number of redundant paths is reached. During the search for alternative paths, there are two restrictions on the next hop from the current deviation point: 1) do not select the successor node on the initial shortest path, otherwise the same path will be generated; 2) do not select a node from the set of preceding nodes, otherwise a loop will be generated.
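The redundancy candidate algorithm can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes a directed graph stored as a dict of `{neighbor: weight}` dicts, and it assumes the deviation points are the interior nodes of the initial shortest path. The two next-hop restrictions are enforced by banning the successor edge and the preceding nodes in each Dijkstra search.

```python
import heapq


def dijkstra(graph, src, dst, banned_nodes=frozenset(), banned_edges=frozenset()):
    """Shortest path src->dst avoiding banned nodes/edges; returns a node list or None."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if v in banned_nodes or (u, v) in banned_edges:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return None


def redundant_candidates(graph, src, dst, m):
    """Return up to m candidate paths (the set R_n) for one flow."""
    first = dijkstra(graph, src, dst)
    if first is None:
        return []
    candidates = [first]
    # Deviation points: nodes of the initial shortest path except src and dst (assumption).
    for i in range(1, len(first) - 1):
        if len(candidates) >= m:
            break
        dev = first[i]
        spur = dijkstra(
            graph, dev, dst,
            banned_nodes=frozenset(first[:i]),              # rule 2: no preceding nodes
            banned_edges=frozenset({(dev, first[i + 1])}),  # rule 1: not the old successor
        )
        if spur is not None:
            path = first[:i] + spur
            if path not in candidates:
                candidates.append(path)
    return candidates


graph = {
    "S": {"A": 1},
    "A": {"S": 1, "B": 1, "C": 2},
    "B": {"A": 1, "D": 1, "C": 1},
    "C": {"A": 2, "B": 1, "D": 2},
    "D": {"B": 1, "C": 2},
}
print(redundant_candidates(graph, "S", "D", 3))
# [['S', 'A', 'B', 'D'], ['S', 'A', 'C', 'D'], ['S', 'A', 'B', 'C', 'D']]
```

Note that the generated paths deliberately share a prefix with the shortest path rather than being fully disjoint, which is consistent with the later remark that completely disjoint paths are not deployed.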

[0074] (2) TSN flow routing decision

[0075] This embodiment employs reinforcement learning to optimize multi-path routing. The RL agent takes the input flow and the current network state as input and outputs a route combination action. The agent observes the incoming flows and dynamic network resources to determine path combinations in the online network environment. After performing an action, it receives a reward and then continuously updates its policy network parameters.
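The patent does not name the RL algorithm, so the following is only a stand-in that shows the shape of a reward-driven policy update. It is a deliberately simplified, state-free REINFORCE-style update (the gradient-bandit form) over a fixed set of path-combination actions, with a softmax preference table in place of a policy network and a running-mean reward baseline. `train_policy` and its hyperparameters are hypothetical.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(reward_fn, n_actions, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style softmax policy over a fixed action set (no state), with a baseline."""
    rng = random.Random(seed)
    prefs = [0.0] * n_actions  # policy parameters: one preference per action
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        a = rng.choices(range(n_actions), weights=probs)[0]  # sample an action
        r = reward_fn(a)
        baseline += (r - baseline) / t                       # running-mean reward
        # d log pi(a) / d prefs[b] = 1[b == a] - pi(b)
        for b in range(n_actions):
            grad = (1.0 if b == a else 0.0) - probs[b]
            prefs[b] += lr * (r - baseline) * grad
    return softmax(prefs)


# Toy rewards for three path combinations (e.g. r = r1 + r2 observed after each action).
rewards = [-1.0, 0.0, -0.5]
probs = train_policy(lambda a: rewards[a], n_actions=3)
```

In the described method the policy would additionally be conditioned on the state s = [F, N]; this sketch omits the state purely to keep the update rule visible.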

[0076] 2.1 Flow Behavior

[0077] Before introducing the states, actions, and rewards of the RL agent, we first describe how flows are processed. We assume that data streams arrive in batch mode. The streams are processed as follows:

[0078] a) Considering the order of routing and scheduling, it is necessary to determine the criteria for measuring this order. At the initial design stage, we referenced three criteria: the stream data size, the transmission period, and the delay deadline. All three aspects are related to the routing and scheduling results of TSN streams. The design methodology first sorts the streams according to their data size, and then routes and schedules the TSN streams from largest to smallest.

[0079] b) For a single path in a unicast stream, select the path and schedule it directly. Then, perform the same operation on other redundant paths.

[0080] c) After routing and scheduling the unicast stream, execute the other unicast streams until all unicast streams have been executed.

[0081] 2.2 State Observation

[0082] The state is the input to the RL agent. In this embodiment, the state is defined as s = [F, N] and consists of two parts. The flow side F includes the source and destination of the flow, the cycle frequency, the amount of data transmitted, the maximum permissible arrival delay, and the route candidate set of the TSN flow. The network side N includes the dynamic load on each network link. The candidate set in the state is the output of Algorithm 1, so the state contains all specific redundant path information for each flow. Sufficient flow-side information helps the RL agent make decisions. As traffic continues to flow through the network, the agent also observes the current load on each link, i.e., the usage of time slices, which helps it adjust its decisions at each step.
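One way to turn s = [F, N] into a fixed-length input vector is sketched below. The encoding is a hypothetical choice, not specified by the patent: node and link indices come from fixed lists, each candidate path becomes a link-incidence row (zero-padded up to an assumed `max_paths`, with extra candidates dropped), and N is the per-link time-slice usage.

```python
def build_state(flow, candidates, link_load, nodes, links, max_paths):
    """Encode s = [F, N] as one flat list of numbers (hypothetical encoding)."""
    node_id = {n: i for i, n in enumerate(nodes)}
    link_id = {l: i for i, l in enumerate(links)}
    # Flow side F: endpoints, period, data size, deadline.
    F = [node_id[flow["src"]], node_id[flow["dst"]],
         flow["period"], flow["size"], flow["deadline"]]
    # Candidate set R_n: one link-incidence row per path, zero-padded to max_paths.
    for k in range(max_paths):
        row = [0] * len(links)
        if k < len(candidates):
            path = candidates[k]
            for u, v in zip(path, path[1:]):
                row[link_id[(u, v)]] = 1
        F.extend(row)
    # Network side N: current load (time-slice usage) of every directed link.
    N = [link_load.get(l, 0.0) for l in links]
    return F + N


nodes = ["S", "A", "D"]
links = [("S", "A"), ("A", "D"), ("S", "D")]
flow = {"src": "S", "dst": "D", "period": 1000, "size": 1500, "deadline": 800}
s = build_state(flow, [["S", "A", "D"], ["S", "D"]], {("S", "A"): 0.4},
                nodes, links, max_paths=3)
```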

[0083] 2.3 Actions

[0084] Actions are the output of the RL agent and determine a set of paths to route. The path candidate set can include the shortest path, the longest path, and other paths that share nodes and links. In the short run, paths with fewer hops are chosen with higher probability so that each flow is transmitted as quickly as possible; in the long run, the agent considers balancing the overall network resource load after acting on each flow. In existing technologies, the action space is the set of neighboring nodes of the current node, and the action is to select the next hop from the qualified neighbors. If this embodiment took the same approach, the agent would have to determine not only the set of paths but also the specific nodes on each path, and the action space would grow exponentially as traffic increases. For the Traffic Engineering (TE) problem, actions are designed to divert traffic to different paths. Therefore, the action space is redefined as $a = R_n$: an action selects some paths from the candidate set as a combination, and the TSN flow is transmitted synchronously over that path combination. This action design allows the RL agent to focus on path selection.
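Defining actions over the candidate set $R_n$ gives a small discrete action space whose size depends only on the number of candidates, not on hop-by-hop node choices. A minimal sketch follows; capping combinations at `max_k` paths is an assumption introduced here for illustration.

```python
from itertools import combinations


def combination_actions(num_candidates, max_k):
    """List every non-empty combination of at most max_k candidate indices."""
    actions = []
    for k in range(1, max_k + 1):
        actions.extend(combinations(range(num_candidates), k))
    return actions


def decode_action(action_index, actions, candidates):
    """Map a discrete action index back to the list of paths it selects."""
    return [candidates[i] for i in actions[action_index]]


candidates = [["S", "A", "D"], ["S", "B", "D"], ["S", "A", "B", "D"], ["S", "C", "D"]]
actions = combination_actions(len(candidates), max_k=2)
print(len(actions))                           # 10 = C(4,1) + C(4,2)
print(decode_action(4, actions, candidates))  # [['S', 'A', 'D'], ['S', 'B', 'D']]
```

The policy then only needs one output per entry of `actions`, which is what lets the agent "focus on path selection".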

[0085] 2.4 Rewards

[0086] The reward is the feedback the RL agent receives from the environment after performing an action. This embodiment designs a reward that guides the RL agent toward optimal decisions. The reward consists of two parts: 1) the balance of the current network in each training round, represented by r1 = -(Umax - Umin); 2) the reward r2, a penalty for the failure rate at each step, which guides the agent to quickly explore better strategies in the early training phase. Overall, the reward is defined as r = r1 + r2, and the total reward guides the agent to learn the optimal strategy.
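The two-part reward can be computed as below. This sketch assumes that Umax and Umin are the largest and smallest per-link utilizations (for example, the fraction of time slices in use), and that r2 is the negative failure rate scaled by a hypothetical `penalty_weight`; neither detail is fixed by the text.

```python
def reward(link_utilization, failed, total, penalty_weight=1.0):
    """r = r1 + r2 with r1 = -(Umax - Umin) and r2 a failure-rate penalty (assumed form)."""
    u = list(link_utilization.values())
    r1 = -(max(u) - min(u))  # balance term: a smaller utilization spread is better
    r2 = (-penalty_weight * failed / total) if total else 0.0  # penalize failed flows
    return r1 + r2


print(reward({"l1": 0.8, "l2": 0.2, "l3": 0.5}, failed=1, total=4))  # approximately -0.85
```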

[0087] (3) As-early-as-possible scheduling

[0088] After a flow undergoes a set of routing operations at each step, it is scheduled directly, that is, time slices are allocated immediately rather than after all flows have been routed. Before scheduling multiple paths for a flow, the scheduling priority of the paths must be considered. Because the paths of a flow have different lengths, and considering the effects of long and short paths, the long paths of each flow are scheduled first, followed by the short paths, so as to reuse time slots on the paths as much as possible and reduce the waste of network resources. We apply an advance allocation policy: if there are enough free time slices in the currently available time window, the earliest time slice is allocated immediately; otherwise, the allocation is postponed within the available time window. Note that delaying time slices may save network resources, because the same flow transmits the same data on its different paths. Here we use frame replication and frame elimination from IEEE 802.1CB Frame Replication and Elimination for Reliability (FRER). Frame elimination avoids the side effects of loops, prevents data frames from circulating endlessly, and improves routing flexibility. Therefore, time slices on short paths can be delayed so that they are allocated together with those of long paths. This is also why, to achieve load balancing, we do not deploy completely disjoint paths in redundancy candidate generation.
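A simplified model of the as-early-as-possible allocation is sketched below, under several assumptions that are not in the patent: time is a cycle of `n_slots` discrete slots, hop j of a path occupies slot `start + j` on its directed link, the deadline is expressed in slots, and a slot already held by the same flow can be shared by another of its paths (the identical frame, replicated and later eliminated as in FRER). Long paths are scheduled first, and a flow's reservations are rolled back if any of its paths cannot meet the deadline.

```python
def schedule_flow(flow_id, paths, slot_table, n_slots, deadline):
    """As-early-as-possible slot allocation for one flow over all of its redundant paths.

    slot_table maps a directed link (u, v) to {slot_index: flow_id}.
    Returns {path_tuple: start_slot}, or None if some path cannot meet the deadline.
    """
    plan, reserved = {}, []
    for path in sorted(paths, key=len, reverse=True):  # long paths first
        hops = list(zip(path, path[1:]))
        chosen = None
        for start in range(n_slots - len(hops) + 1):
            if start + len(hops) - 1 > deadline:        # would miss the deadline
                break
            # Usable if free, or already held by this flow (same frame, FRER duplicate).
            if all(slot_table.get(hop, {}).get(start + j) in (None, flow_id)
                   for j, hop in enumerate(hops)):
                chosen = start                          # earliest feasible start
                break
        if chosen is None:
            for hop, s in reserved:                     # roll back this flow
                del slot_table[hop][s]
            return None
        for j, hop in enumerate(hops):
            table = slot_table.setdefault(hop, {})
            if table.get(chosen + j) is None:
                table[chosen + j] = flow_id
                reserved.append((hop, chosen + j))
        plan[tuple(path)] = chosen
    return plan


table = {}
p1 = schedule_flow("f1", [["S", "A", "D"], ["S", "A", "B", "D"]], table, n_slots=10, deadline=5)
p2 = schedule_flow("f2", [["S", "A", "D"]], table, n_slots=10, deadline=5)
p3 = schedule_flow("f3", [["S", "A", "D"]], table, n_slots=10, deadline=0)
```

Here `f1`'s short path reuses the slot its long path already holds on link (S, A), `f2` is pushed to the next free start slot, and `f3` is rejected because no start can satisfy its deadline.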

[0089] This embodiment also provides a data flow scheduling device in a time-sensitive network, including:

[0090] At least one processor;

[0091] At least one memory for storing at least one program;

[0092] When the at least one program is executed by the at least one processor, the at least one processor implements the method shown in Figure 1.

[0093] This embodiment of a data flow scheduling device in a time-sensitive network can execute a data flow scheduling method in a time-sensitive network provided in the method embodiment of the present invention. It can execute any combination of implementation steps of the method embodiment and has the corresponding functions and beneficial effects of the method.

[0094] This application also discloses a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium and execute the computer instructions, causing the computer device to perform the method shown in Figure 1.

[0095] This embodiment also provides a storage medium storing instructions or programs that can execute the data flow scheduling method in a time-sensitive network provided by the method embodiment of the present invention. When the instructions or programs are run, any combination of implementation steps of the method embodiment can be executed, and the method has the corresponding functions and beneficial effects.

[0096] In some alternative embodiments, the functions / operations mentioned in the block diagrams may not occur in the order shown in the operation diagrams. For example, depending on the functions / operations involved, two consecutively shown blocks may actually be executed substantially simultaneously, or the blocks may sometimes be executed in reverse order. Furthermore, the embodiments presented and described in the flowcharts of this invention are provided by way of example to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is altered and sub-operations described as part of a larger operation are executed independently.

[0097] Furthermore, although the invention has been described in the context of functional modules, it should be understood that, unless otherwise stated, one or more of the described functions and / or features may be integrated into a single physical device and / or software module, or one or more functions and / or features may be implemented in a separate physical device or software module. It is also understood that a detailed discussion of the actual implementation of each module is unnecessary for understanding the invention. Rather, given the properties, functions, and internal relationships of the various functional modules in the apparatus disclosed herein, the actual implementation of the module will be understood within the scope of conventional skill of an engineer. Therefore, those skilled in the art can implement the invention as set forth in the claims using ordinary techniques without excessive experimentation. It is also understood that the specific concepts disclosed are merely illustrative and not intended to limit the scope of the invention, which is determined by the full scope of the appended claims and their equivalents.

[0098] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0099] The logic and / or steps represented in the flowchart or otherwise described herein, for example, can be considered as a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a processor-included system, or other system that can fetch and execute instructions from, an instruction execution system, apparatus, or device). For the purposes of this specification, "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device.

[0100] More specific examples of computer-readable media (a non-exhaustive list) include: electrical connections (electronic devices) having one or more wires, portable computer disk drives (magnetic devices), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Furthermore, computer-readable media can even be paper or other suitable media on which the program is printed, since the program can be obtained electronically, for example, by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing as necessary, and then stored in computer memory.

[0101] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0102] In the foregoing description of this specification, references to terms such as "one embodiment," "another embodiment," or "some embodiments" indicate that a specific feature, structure, material, or characteristic described in connection with an embodiment or example is included in at least one embodiment or example of the present invention. In this specification, illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0103] Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

[0104] The above is a detailed description of the preferred embodiments of the present invention. However, the present invention is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention. All such equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims

1. A data flow scheduling method in a time-sensitive network, characterized in that it comprises the following steps: obtaining multiple TSN data streams; serializing the obtained TSN data streams; generating multiple redundant routes for each TSN data stream using a redundancy candidate algorithm; formulating the routing and scheduling problem in the TSN network as an NP-hard problem, and using reinforcement learning to make a redundant routing decision for each TSN data stream so as to meet the reliability requirements of the time-sensitive network; after the route of a TSN data stream is obtained, obtaining a scheduling table through an early scheduling method and allocating time slots to each TSN data stream; and determining whether the current TSN data stream is the last TSN data stream to be processed, and if so, transmitting the TSN data streams according to the allocation result, otherwise returning to the step of using reinforcement learning to make a redundant routing decision for each TSN data stream. The reinforcement learning mechanism works as follows: the TSN data stream and the current network state are used as inputs to the reinforcement learning model, and the output of the reinforcement learning model is a route combination action; the reinforcement learning model determines the path combination in the online network environment by observing the TSN data streams and the dynamic network resources, and updates its network parameters according to the reward so as to update its policy.
The TSN data streams are input into the reinforcement learning model in batches, and the processing of the TSN data streams includes: 1) determining the data size, transmission period, and delay deadline of each TSN data stream, sorting the data streams by data size, and routing and scheduling the TSN data streams from largest to smallest; 2) for a single path of a unicast stream, selecting that path and scheduling it directly; 3) after a unicast stream has been routed and scheduled, processing the remaining unicast streams in turn until all unicast streams have been processed. The action of the reinforcement learning model is defined as follows: the action, as the output of the reinforcement learning model, is a path candidate set for routing; the path candidate set includes the shortest path, the longest path, and other paths with overlapping nodes and links.
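As a rough illustration of the per-flow loop in claim 1, the Python sketch below serializes the flows by sorting on data size (largest first) and routes them one at a time until the last flow has been processed. It is a minimal sketch under stated assumptions: the trained reinforcement learning policy is replaced by a hypothetical greedy rule (`choose_path`) that picks the candidate whose busiest link is least loaded, link load is tracked simply as the sum of assigned data sizes, and time-slot allocation is left out. The names `Flow`, `choose_path`, and `route_all` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    name: str
    size: int        # data size per period (e.g. bytes)
    period: int      # transmission period
    deadline: int    # delay deadline
    candidates: list = field(default_factory=list)  # candidate paths as node lists


def path_links(path):
    """Directed links (u, v) traversed by a node path."""
    return list(zip(path, path[1:]))


def choose_path(flow, link_load):
    """Hypothetical stand-in for the RL policy: pick the candidate whose
    most-loaded link currently carries the least data."""
    return min(
        flow.candidates,
        key=lambda p: max(link_load.get(link, 0) for link in path_links(p)),
    )


def route_all(flows):
    """Route flows one by one, largest data size first, until the last flow."""
    link_load = {}
    decisions = {}
    for flow in sorted(flows, key=lambda f: f.size, reverse=True):
        path = choose_path(flow, link_load)
        decisions[flow.name] = path
        # Update the dynamic network state seen by the next decision.
        for link in path_links(path):
            link_load[link] = link_load.get(link, 0) + flow.size
    return decisions, link_load
```

Because the larger flow is placed first, later and smaller flows see its load and are steered away from the links it occupies, which is the bottleneck-avoiding behaviour the learned policy is meant to achieve.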

2. The data flow scheduling method in a time-sensitive network according to claim 1, characterized in that the redundancy candidate algorithm works as follows: determining the source of a flow, its destination, and the redundant path number parameter M, the algorithm outputting a candidate set; when a TSN data stream arrives, calculating the shortest path of the TSN data stream as the first path in the candidate set; taking every node on that path other than the start and end points as a deviation point, and obtaining other paths from each deviation point to the end point using Dijkstra's algorithm; and repeating this process for all deviation points until the required number of redundant paths is reached.

3. The data flow scheduling method in a time-sensitive network according to claim 2, characterized in that the process of obtaining alternative paths from a deviation point to the destination includes: while searching for other paths, the next hop from the current deviation point is subject to two constraints: 1) it must not be the successor node of the deviation point on the initial shortest path, otherwise the same path would be generated; 2) it must not be a node from the set of preceding nodes, otherwise a looped path would be generated.
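Claims 2 and 3 together describe a deviation-point search over the initial shortest path, close in spirit to Yen's k-shortest-paths algorithm. The sketch below is one possible reading, assuming a weighted directed graph stored as `{node: {neighbor: weight}}` with positive weights; `dijkstra` and `redundant_candidates` are hypothetical names. Constraint 1 is enforced by banning the edge from the deviation point to its successor on the shortest path; constraint 2 is enforced by excluding all preceding nodes from the whole spur search, which is slightly stronger than a next-hop-only check and keeps the combined path loop-free. Following the claim text, the source itself is not used as a deviation point.

```python
import heapq


def dijkstra(graph, src, dst, banned_nodes=frozenset(), banned_edges=frozenset()):
    """Shortest src->dst path in {u: {v: w}}, skipping banned nodes/edges.
    Returns a node list, or None if dst is unreachable."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if v in banned_nodes or (u, v) in banned_edges:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def redundant_candidates(graph, src, dst, m):
    """Shortest path first, then one detour per deviation point, up to m paths."""
    shortest = dijkstra(graph, src, dst)
    if shortest is None:
        return []
    candidates = [shortest]
    # Deviation points: nodes on the shortest path except src and dst.
    for i in range(1, len(shortest) - 1):
        if len(candidates) >= m:
            break
        dev = shortest[i]
        prefix = shortest[:i]          # nodes preceding the deviation point
        successor = shortest[i + 1]    # next hop on the initial shortest path
        spur = dijkstra(
            graph, dev, dst,
            banned_nodes=frozenset(prefix),   # constraint 2: no loops
            banned_edges={(dev, successor)},  # constraint 1: no duplicate path
        )
        if spur is not None:
            path = prefix + spur
            if path not in candidates:
                candidates.append(path)
    return candidates
```

For example, if the shortest path is S, A, B, T and there is a detour A, D, T, deviating at A yields the second candidate S, A, D, T.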

4. The data flow scheduling method in a time-sensitive network according to claim 1, characterized in that the state of the reinforcement learning model is defined as follows: the state, as the input to the reinforcement learning model, consists of a flow side F and a network side N, wherein the flow side F includes the source and destination of the flow, the cycle frequency, the amount of data transmitted, the maximum permissible arrival time (deadline), and the route candidate set for the TSN flow, and the network side N includes the dynamic load on each network link.
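The state in claim 4 has to reach the model as a vector of numbers. One hypothetical encoding, sketched below, flattens the flow side F (source and destination indices, period, data size, deadline, plus a 0/1 link-incidence vector for each route candidate, zero-padded to M candidates) and appends the network side N as the load on each link. The feature order, the incidence encoding, the dictionary keys, and the `encode_state` name are all assumptions, not the patent's specification.

```python
def encode_state(flow, candidates, m, links, link_load, node_index):
    """Flatten flow side F and network side N into one feature list.

    flow: dict with hypothetical keys src, dst, period, size, deadline.
    candidates: candidate paths (node lists); at most m are encoded.
    links: fixed ordering of the directed links (u, v) in the network.
    """
    # Flow side F: endpoints, period, data size, deadline.
    state = [node_index[flow["src"]], node_index[flow["dst"]],
             flow["period"], flow["size"], flow["deadline"]]
    # Route candidate set: one 0/1 link-incidence vector per candidate.
    for path in candidates[:m]:
        used = set(zip(path, path[1:]))
        state.extend(1.0 if link in used else 0.0 for link in links)
    # Zero-pad so the state length does not depend on how many candidates exist.
    state.extend([0.0] * len(links) * (m - min(len(candidates), m)))
    # Network side N: current load on each link.
    state.extend(float(link_load.get(link, 0.0)) for link in links)
    return state
```

A fixed-length layout like this suits a standard feed-forward policy network; a graph neural network over the topology would be an alternative design.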

5. The data flow scheduling method in a time-sensitive network according to claim 1, characterized in that the reward of the reinforcement learning model is defined as follows: the reward is the feedback that the reinforcement learning model receives from the environment after performing an action, and consists of two parts: 1) a reward expressing the current network balance level in each training round; 2) a reward r2, which is a penalty on the failure rate in each step and guides the model to quickly explore better strategies during the training phase.
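Claim 5 does not give formulas for the two reward terms. Purely for illustration, the sketch below assumes that the balance term is the negative population standard deviation of link utilizations (0 for a perfectly balanced network), that r2 is a weighted negative failure rate, and that the two terms are summed. The `reward` function and its `penalty_weight` parameter are hypothetical.

```python
import statistics


def reward(link_util, failed, attempted, penalty_weight=1.0):
    """Illustrative two-part reward (assumed forms, not the patent's formulas).

    link_util: utilization of each link after the action (e.g. 0.0 to 1.0).
    failed / attempted: flows that could not be routed or scheduled in this step.
    """
    # Balance term (assumed): negative spread of link utilization; 0 when all links are equal.
    r1 = -statistics.pstdev(link_util) if link_util else 0.0
    # r2: penalty proportional to the failure rate in this step.
    failure_rate = failed / attempted if attempted else 0.0
    r2 = -penalty_weight * failure_rate
    return r1 + r2
```

Raising `penalty_weight` makes failures dominate early training, which matches the stated purpose of r2: pushing the model quickly away from strategies that drop flows.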

6. The data flow scheduling method in a time-sensitive network according to claim 1, characterized in that allocating time slots to each TSN data stream includes: allocating time slots to a TSN data stream directly after each routing step is performed.
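Claim 6 allocates slots immediately after each routing step. The sketch below is a minimal early (as-soon-as-possible) scheduler under several assumptions not stated in the patent: time is divided into slots over a hyperperiod that is a multiple of the flow period, each hop uses one slot, and with store-and-forward the hop-k transmission occupies slot `offset + k`. It reserves the earliest offset whose slots are free on every link of the chosen path in every period instance and that still meets the deadline. `earliest_offset` and the `busy` table are hypothetical names.

```python
def earliest_offset(path_links, period, deadline, hyperperiod, busy):
    """Reserve the earliest feasible offset for one flow on a routed path.

    path_links: directed links (u, v) of the chosen route, in order.
    busy: dict link -> set of occupied slot indices in [0, hyperperiod); updated in place.
    Returns the chosen offset, or None if no offset meets the deadline.
    """
    hops = len(path_links)
    for offset in range(period):
        if offset + hops > deadline:
            break  # later offsets would also miss the deadline
        # Slots needed on every hop in every period instance within the hyperperiod.
        slots = [(link, (offset + k + r * period) % hyperperiod)
                 for k, link in enumerate(path_links)
                 for r in range(hyperperiod // period)]
        if all(s not in busy.get(link, set()) for link, s in slots):
            for link, s in slots:
                busy.setdefault(link, set()).add(s)
            return offset
    return None
```

Calling it once per flow, right after that flow's route is chosen, mirrors the route-then-schedule step of this claim; a `None` result marks a flow that could not be scheduled on that path.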

7. A data flow scheduling device for a time-sensitive network, characterized in that it comprises: at least one processor; and at least one memory for storing at least one program; wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the method according to any one of claims 1-6.

8. A computer-readable storage medium storing a processor-executable program, characterized in that the processor-executable program, when executed by a processor, is used to perform the method according to any one of claims 1-6.