Contrastive imitation learning scheduling method for parallel flow coordinated transmission in industrial time-sensitive networks
The contrastive imitation learning scheduling method analyzes parallel traffic transmission patterns, constructs a multicast and multi-source data transmission model, and trains an agent to guide scheduling. This addresses the efficient scheduling of parallel traffic transmission in industrial time-sensitive networks and improves scheduling efficiency and data delivery synchronization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TIANJIN UNIV
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
Existing parallel traffic transmission scheduling methods in industrial time-sensitive networks introduce numerous constraints on parallel traffic coordination, leading to a shrinking of the feasible solution space and increased scheduling complexity, making it difficult to achieve high-efficiency parallel traffic transmission.
A contrastive imitation learning scheduling method is adopted. By analyzing the data acquisition patterns of industrial applications, a multicast and multi-source data transmission model is constructed. Under the branch-and-bound paradigm, an imitation learning scheduling framework is built to train an agent that guides the solution of the scheduling problem. An incremental training method is set up to adapt the agent to large-scale problems.
The method significantly reduces the size of the solution tree, improves the scheduling efficiency of parallel traffic transmission, satisfies the coupling jitter constraints, and enhances data delivery synchronization and network resource utilization, making it suitable for complex industrial systems.
Figure CN122247938A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of industrial Internet of Things (IIoT), and more particularly to a contrastive imitation learning scheduling method for parallel stream coordinated transmission in industrial time-sensitive networks.
Background Technology
[0002] Time-Sensitive Networking (TSN) is a family of protocols that enables deterministic transmission over non-deterministic Ethernet. Developed by the IEEE 802.1 working group, TSN defines a time-sensitive mechanism for Ethernet data transmission to ensure real-time data transmission and low-latency communication quality. It has become a key technology for achieving integrated communication, computing, and control in the Industrial Internet of Things (IIoT), meeting the stringent reliability and deterministic requirements of automation applications in IIoT for critical data transmission.
[0003] With the rise of distributed industrial applications such as robot swarm control and multimodal defect detection, these applications require coordinated parallel time-sensitive traffic for data acquisition. This means ensuring that the data delivery time difference (i.e., the jitter of transmission completion) remains below a specific time difference threshold. Otherwise, if the data delivery time difference exceeds this threshold, the application may be unable to reliably obtain the data necessary for its normal operation. Therefore, to ensure that time-sensitive networks (also known as industrial time-sensitive networks) can effectively support industrial applications, the data delivery time difference (i.e., the aforementioned jitter) constraint must be incorporated into traffic transmission management to achieve the scheduling of parallel traffic transmission.
[0004] However, existing parallel traffic transmission scheduling methods in industrial time-sensitive networks have shortcomings: for parallel traffic scheduling problems under the combinatorial optimization framework, parallel traffic coordination introduces a large number of constraints to couple the parallel data transmission process, which will reduce the feasible solution space and make the search for feasible solutions more complicated, thus weakening the effectiveness of existing parallel traffic transmission scheduling.
[0005] Therefore, how to achieve efficient scheduling of parallel traffic transmission in industrial time-sensitive networks, and thereby improve the quality of data transmission services, has become a pressing technical problem in the field of industrial Internet of Things.
Summary of the Invention
[0006] The technical problem to be solved by this invention, in view of the above limitations of the prior art, is to provide a contrastive imitation learning scheduling method for parallel flow coordinated transmission in industrial time-sensitive networks that achieves efficient scheduling of parallel traffic transmission through contrastive imitation learning.
[0007] The technical solution adopted by this invention to solve the above technical problem is a contrastive imitation learning scheduling method for parallel stream coordinated transmission in industrial time-sensitive networks, characterized by comprising the following steps:
Step 1: Analyze and process the data acquisition patterns of each industrial application in the industrial time-sensitive network to obtain the data transmission pattern corresponding to each industrial application, where the data transmission pattern is either a multicast data transmission pattern or a multi-source data transmission pattern.
Step 2: Model the time-sensitive traffic in the industrial time-sensitive network and construct a multicast and multi-source data transmission model pair, which includes a multicast transmission model and a multi-source transmission model. Multicast is the distribution of traffic from one host to multiple hosts, and multi-source is the aggregation of traffic from multiple hosts to one host.
Step 3: Under the branch-and-bound paradigm and based on the multicast and multi-source data transmission model pair, construct the basic imitation learning scheduling framework, and use this framework as the parallel stream coordinated transmission scheduling model for industrial time-sensitive networks.
Step 4: Based on the parallel stream coordinated transmission scheduling model, train an agent to imitate the trajectory that leads to the optimal solution of the scheduling problem, so that the agent can guide the branching direction during the solution process.
Step 5: Construct a contrastive imitation learning method for the coordinated transmission scheduling model, so that the feature embeddings of samples of the same class become more compact and the separation between samples of different classes is maximized.
Step 6: Set up an incremental training method for the constructed contrastive imitation learning method, allowing the agent to generalize from small-scale scheduling problems to large-scale problems, and thereby realize the scheduling of parallel stream coordinated transmission in industrial time-sensitive networks.
[0008] As an improvement, in step 1 of the above scheduling method, the analysis includes mining the characteristics of distributed applications in the industrial Internet of Things, where a distributed application spans multiple devices and the data it generates in real time is transmitted to a designated host as traffic.
[0009] As a further improvement, in the above scheduling method, the multicast and multi-source data transmission model pair is constructed as follows:
Define the set of time-sensitive traffic in the industrial time-sensitive network. This set is denoted $F$ and contains $N$ flows; the $i$-th flow in $F$ is denoted $f_i$, $1 \le i \le N$, with $f_i = \{src_i, dst_i, pth_i, sz_i\}$, where $src_i$ is the transmission source of flow $f_i$, $dst_i$ is its transmission destination, $pth_i$ is its transmission path, and $sz_i$ is the size of the data packets it carries.
Set a first constraint for the multicast transmission mode and a second constraint for the multi-source transmission mode. The first constraint is: for any $i \in [1, |F|]$, $i' \in [1, |F|]$, $i \ne i'$: $src_i = src_{i'}$ and $dst_i \ne dst_{i'}$. The second constraint is: for any $i \in [1, |F|]$, $i' \in [1, |F|]$, $i \ne i'$: $src_i \ne src_{i'}$ and $dst_i = dst_{i'}$.
Define the number of available time slots on each switch port in the current time-sensitive network, and calculate the time slot in which each flow completes data transmission. In a time-sensitive network performing cyclic queuing and forwarding, the time slot in which flow $f_i$ completes data transmission is denoted $p_i$: [equation missing in source], with $j \in [1, J]$. For flow $f_i$, $x_{i,j}$ is a binary variable indexed by $i$ and $j$ that indicates whether $f_i$ is assigned the $j$-th time slot for transmission, and $J$ is the number of available time slots on each switch port in the current time-sensitive network.
Set coupling jitter constraints to ensure the coordination of traffic data transmission: for any $i \in [1, |F|]$, $i' \in [1, |F|]$, $i \ne i'$: $0 \le \beta_{i,i'} \le jtr$, $p_i - p_{i'} \le \beta_{i,i'}$, $p_{i'} - p_i \le \beta_{i,i'}$, where $jtr$ is the jitter limit for globally coordinated transmission of traffic data and $\beta_{i,i'}$ is an auxiliary real-valued variable that linearizes the coupling jitter constraint.
The optimization objective of the time slot allocation scheme, which allocates time slots for the transmission of each flow, is: [equation missing in source], where Maximize denotes finding the maximum value.
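The first and second constraints above can be read as a pairwise test on a flow set. The following is a minimal sketch of that test, not the patent's implementation; the `Flow` type, the function name `transmission_pattern`, and the host names in the usage note are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple, Sequence, Tuple


class Flow(NamedTuple):
    """One time-sensitive flow f_i = {src_i, dst_i, pth_i, sz_i} (hypothetical container)."""
    src: str
    dst: str
    pth: Tuple[str, ...]
    sz: int


def transmission_pattern(flows: Sequence[Flow]) -> str:
    """Classify a flow set by checking the first/second constraint on every pair i != i'."""
    pairs = [(a, b) for k, a in enumerate(flows) for b in flows[k + 1:]]
    if not pairs:
        return "neither"  # the pairwise constraints need at least two flows
    # First constraint: same source, different destinations.
    if all(a.src == b.src and a.dst != b.dst for a, b in pairs):
        return "multicast"
    # Second constraint: different sources, same destination.
    if all(a.src != b.src and a.dst == b.dst for a, b in pairs):
        return "multi-source"
    return "neither"
```

For example, two flows from a controller `"plc"` to `"r1"` and `"r2"` classify as `"multicast"`, while flows from `"cam1"`, `"cam2"`, and `"cam3"` to a single `"server"` classify as `"multi-source"`.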
[0010] Furthermore, in step 3 of the above scheduling method, the branch-and-bound paradigm is set as follows:
Define multiple scheduling problem instances to characterize data stream scheduling scenarios under different transmission conditions, and calculate the optimal scheduling solution of each defined instance. Any scheduling problem instance is denoted $I$, its optimal scheduling solution is denoted $U^*$, and the value of scheduling decision variable $v$ in $U^*$ is denoted $U^*[v]$. The scheduling decision variable $v$ is an element of the decision variable set of instance $I$ and describes the scheduling decisions for parallel time-sensitive flows; it includes the time slot allocation variable $x_{i,j}$, which indicates whether the $i$-th flow is assigned the $j$-th time slot for transmission, and its equivalent representations.
Define the relaxed problem instance at each node of the binary solution tree organized by the branch-and-bound method, and the relaxed solution corresponding to each relaxed problem instance. The relaxed problem instance at node $N$ is denoted $I'_N$ and its relaxed solution is denoted $U'_N$; a subset of $U'_N$ [symbol missing in source] captures the values of the variables that are no longer relaxed at node $N$.
Divide all nodes in the binary tree into expert nodes and non-expert nodes as follows: for any $v$, if the variable values in this subset match the values of the corresponding variables in $U^*$, node $N$ is considered an expert node and is included in the expert trajectory; if they do not match, node $N$ is considered a non-expert node.
Train the agent to retain expert nodes and discard non-expert nodes, thereby imitating the expert trajectory. For node feature $o$, the agent with policy $\pi$ outputs [symbol missing in source] to determine whether to retain or discard the node during sampling; $\pi^*$ denotes the expert policy that guides the expert trajectory, and $z_o$ denotes the label that $\pi^*$ assigns to node feature $o$.
Calculate the agent's training loss for node feature $o$, denoted [symbol missing in source]: [equation missing in source].
[0011] Compared with the prior art, the advantages of the present invention are as follows: First, this invention analyzes data acquisition patterns in industrial applications, distinguishes between multicast and multi-source parallel transmission scenarios, and constructs a unified data transmission model pair. It explicitly embeds parallel flow coordination constraints into the scheduling model, so that the collaborative requirements between parallel flows are captured at the modeling stage. On this basis, the invention introduces a contrastive imitation learning mechanism within the branch-and-bound solution framework. This enables the agent to learn the branch selection patterns along the trajectory to the optimal scheduling solution, retaining high-potential nodes and discarding low-quality branches during the search. This significantly reduces the size of the solution tree and improves the efficiency of solving the combinatorial optimization problem.
[0012] Secondly, this invention employs an incremental training method, enabling the trained agent to gradually generalize from small-scale scheduling instances to large-scale industrial time-sensitive network scenarios. This improves the applicability and engineering deployability of the contrastive imitation learning scheduling method in complex industrial systems. Therefore, this invention can achieve highly efficient coordinated scheduling of multicast and multi-source parallel time-sensitive streams while satisfying the jitter constraint of parallel stream coupling, improving data delivery synchronization and network resource utilization, thereby effectively ensuring the comprehensive real-time and deterministic requirements of distributed industrial applications.
[0013] Finally, the scheduling method proposed in this invention does not require fundamental changes to the protocol architecture of existing industrial time-sensitive networks. It can be deployed in controllers or network management units while remaining compatible with the existing IEEE 802.1 standard framework, demonstrating good engineering adaptability. By replacing traditional scheduling strategies that rely on manual experience or static rules with intelligent methods, it reduces the cost of scheduling strategy design and parameter configuration, minimizes invalid time slot allocation and link idle time, improves overall network service quality, and enhances the system's adaptability in complex industrial scenarios. This invention not only improves the coordination and scheduling efficiency of parallel time-sensitive flows but also enhances the real-time performance, reliability, and engineering deployment feasibility of the Industrial Internet of Things (IIoT) at the system level, giving it broad application value.
Attached Figure Description
[0014] Figure 1 is a schematic diagram of the contrastive imitation learning scheduling method for parallel stream coordinated transmission in an industrial time-sensitive network according to an embodiment of the present invention.
Detailed Implementation
[0015] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.
[0016] This embodiment provides a contrastive imitation learning scheduling method for parallel stream coordinated transmission in industrial time-sensitive networks. As shown in Figure 1, the method in this embodiment includes the following steps 1 to 6:
Step 1: Analyze and process the data acquisition patterns of each industrial application in the industrial time-sensitive network to obtain the data transmission pattern corresponding to each industrial application, where the data transmission pattern is either a multicast data transmission pattern or a multi-source data transmission pattern. Specifically, the analysis in step 1 includes mining the characteristics of distributed applications in the industrial Internet of Things, where a distributed application spans multiple devices and the data it generates in real time is transmitted to a designated host as traffic.
Step 2: Model the time-sensitive traffic in the industrial time-sensitive network and construct a multicast and multi-source data transmission model pair, which includes a multicast transmission model and a multi-source transmission model. Multicast is the distribution of traffic from one host to multiple hosts, and multi-source is the aggregation of traffic from multiple hosts to one host.
Step 3: Under the branch-and-bound paradigm and based on the multicast and multi-source data transmission model pair, construct the basic imitation learning scheduling framework, and use this framework as the parallel stream coordinated transmission scheduling model for industrial time-sensitive networks.
Step 4: Based on the parallel stream coordinated transmission scheduling model, train an agent to imitate the trajectory that leads to the optimal solution of the scheduling problem, so that the agent can guide the branching direction during the solution process.
Step 5: Construct a contrastive imitation learning method for the coordinated transmission scheduling model, so that the feature embeddings of samples of the same class become more compact and the separation between samples of different classes is maximized. The contrastive imitation learning method lets the representations of the few positive-class samples stand out in the feature space, thereby mitigating the adverse effect of sample imbalance on imitation learning.
Step 6: Set up an incremental training method for the constructed contrastive imitation learning method, allowing the agent to generalize from small-scale scheduling problems to large-scale problems, and thereby realize the scheduling of parallel stream coordinated transmission in industrial time-sensitive networks. Specifically, this incremental training method preserves the agent's scheduling experience during generalization, relaxes the quantity and quality requirements on training data for large-scale scheduling problem instances, and makes the agent suitable for large-scale parallel flow coordination scheduling problems, so that the real-time data acquisition capability of distributed industrial applications in the Industrial Internet of Things (IIoT) enabled by Time-Sensitive Networking (TSN) can be fully exploited.
[0017] Specifically, in step 2 of this embodiment, the multicast and multi-source data transmission model pair is constructed in the following steps a1 to a5:
Step a1: Define the set of time-sensitive traffic in the industrial time-sensitive network. This set is denoted $F$ and contains $N$ flows; the $i$-th flow in $F$ is denoted $f_i$, $1 \le i \le N$, with $f_i = \{src_i, dst_i, pth_i, sz_i\}$, where $src_i$ is the transmission source of flow $f_i$, $dst_i$ is its transmission destination, $pth_i$ is its transmission path, and $sz_i$ is the size of the data packets it carries.
Step a2: Set a first constraint for the multicast transmission mode and a second constraint for the multi-source transmission mode. The first constraint is: for any $i \in [1, |F|]$, $i' \in [1, |F|]$, $i \ne i'$: $src_i = src_{i'}$ and $dst_i \ne dst_{i'}$. The second constraint is: for any $i \in [1, |F|]$, $i' \in [1, |F|]$, $i \ne i'$: $src_i \ne src_{i'}$ and $dst_i = dst_{i'}$.
Step a3: Define the number of available time slots on each switch port in the current time-sensitive network, and calculate the time slot in which each flow completes data transmission. In a time-sensitive network performing Cyclic Queuing and Forwarding (CQF), the time slot in which flow $f_i$ completes data transmission is denoted $p_i$: [equation missing in source], with $j \in [1, J]$. For flow $f_i$, $x_{i,j}$ is a binary variable indexed by $i$ and $j$ that indicates whether $f_i$ is assigned the $j$-th time slot for transmission, and $J$ is the number of available time slots on each switch port in the current time-sensitive network.
Step a4: Set coupling jitter constraints to ensure the coordination of traffic data transmission: for any $i \in [1, |F|]$, $i' \in [1, |F|]$, $i \ne i'$: $0 \le \beta_{i,i'} \le jtr$, $p_i - p_{i'} \le \beta_{i,i'}$, $p_{i'} - p_i \le \beta_{i,i'}$, where $jtr$ is the jitter limit for globally coordinated transmission of traffic data and $\beta_{i,i'}$ is an auxiliary real-valued variable that linearizes the coupling jitter constraint.
Step a5: Construct the optimization objective of the time slot allocation scheme, which allocates time slots for the transmission of each flow. The optimization objective is: [equation missing in source], where Maximize denotes finding the maximum value. By assigning more flows to time slots and enabling their transmission, the overall transmission level of the time-sensitive network is improved while the coordinated transmission of parallel traffic is ensured.
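To make steps a3 to a5 concrete, the sketch below enumerates slot assignments for a tiny multicast instance and keeps the feasible one that schedules the most flows. Several pieces are assumptions, because the corresponding equations are not preserved in the text: the completion slot is taken as injection slot plus hop count (one hop per slot under CQF), the objective is taken as the number of scheduled flows, and the per-slot capacity `SLOT_CAP`, the flow data, and all function names are hypothetical. Exhaustive enumeration stands in for the branch-and-bound solver and is only viable at this toy size.

```python
from itertools import product

# Hypothetical toy multicast instance: (src, dst, hop_count, size).
FLOWS = [
    ("plc", "robot_a", 1, 1),
    ("plc", "robot_b", 2, 1),
    ("plc", "robot_c", 3, 1),
]
J = 4         # available time slots per switch port
JTR = 1       # jitter limit jtr, in slots
SLOT_CAP = 1  # ASSUMPTION: data units one slot can carry at the shared source port


def completion_slot(start_slot, hops):
    # ASSUMPTION: under CQF a frame advances one hop per slot.
    return start_slot + hops


def is_feasible(assignment):
    """assignment[i] is the 1-based injection slot of flow i, or None if unscheduled."""
    used = {}
    for i, slot in enumerate(assignment):
        if slot is None:
            continue
        used[slot] = used.get(slot, 0) + FLOWS[i][3]
        if used[slot] > SLOT_CAP:
            return False
    done = [completion_slot(s, FLOWS[i][2])
            for i, s in enumerate(assignment) if s is not None]
    # Coupling jitter constraint: |p_i - p_i'| <= jtr for every scheduled pair.
    return all(abs(a - b) <= JTR for a in done for b in done)


def best_schedule():
    """Return the first feasible assignment that schedules the most flows."""
    options = [None] + list(range(1, J + 1))
    best, best_count = None, -1
    for assignment in product(options, repeat=len(FLOWS)):
        if not is_feasible(assignment):
            continue
        count = sum(s is not None for s in assignment)
        if count > best_count:
            best, best_count = assignment, count
    return best, best_count
```

Calling `best_schedule()` on this instance returns `((3, 2, 1), 3)`: the flow with the longest path is injected first, so all three deliveries complete in slot 4 and the jitter is zero.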
[0018] It should be noted that, when solving a problem instance of P, the branch-and-bound algorithm organizes the relaxed subproblems of the instance as nodes of a binary tree; the deeper a node, the fewer integer variables remain relaxed. Although the branch-and-bound algorithm can find the global optimum by evaluating all nodes and exploring all branches, doing so incurs significant computational overhead. The search nevertheless reveals the efficient computational path to the optimum, a unique trajectory from the root node to a leaf node, which is called the expert trajectory. To enhance the branch-and-bound algorithm, the agent is trained through imitation learning to replicate the expert trajectory, so that it can identify and prune nodes that do not lead to the optimal solution, thereby meeting the reliable data acquisition requirements of industrial applications in the TSN-enabled Industrial Internet of Things. In step 3 of this embodiment, the branch-and-bound paradigm is set in the following steps b1 to b5:
Step b1: Define multiple scheduling problem instances for scheduling data stream transmission, to characterize data stream scheduling scenarios under different transmission conditions, and calculate the optimal scheduling solution of each defined instance. Any scheduling problem instance is denoted $I$, its optimal scheduling solution is denoted $U^*$, and the value of scheduling decision variable $v$ in $U^*$ is denoted $U^*[v]$. The scheduling decision variable $v$ is an element of the decision variable set of instance $I$ and describes the scheduling decisions for parallel time-sensitive flows; it includes the time slot allocation variable $x_{i,j}$, which indicates whether the $i$-th flow is assigned the $j$-th time slot for transmission, and its equivalent representations.
Step b2: Define the relaxed problem instance at each node of the binary solution tree organized by the branch-and-bound method, and the relaxed solution corresponding to each relaxed problem instance. The relaxed problem instance at node $N$ is denoted $I'_N$ and its relaxed solution is denoted $U'_N$; a subset of $U'_N$ [symbol missing in source] captures the values of the variables that are no longer relaxed at node $N$.
Step b3: Divide all nodes in the binary tree into expert nodes and non-expert nodes as follows: for any $v$, if the variable values in this subset match the values of the corresponding variables in $U^*$, node $N$ is considered an expert node and is included in the expert trajectory; if they do not match, node $N$ is considered a non-expert node.
Step b4: Train the agent to retain expert nodes and discard non-expert nodes, thereby imitating the expert trajectory. For node feature $o$, the agent with policy $\pi$ outputs [symbol missing in source] to determine whether to retain or prune the node during sampling; $\pi^*$ denotes the expert policy that guides the expert trajectory, and $z_o$ denotes the label that $\pi^*$ assigns to node feature $o$. Agent training is essentially a supervised learning process of node classification.
Step b5: Calculate the agent's training loss for node feature $o$, denoted [symbol missing in source]: [equation missing in source].
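The expert/non-expert split in step b3 can be illustrated with a short sketch. It assumes each node is represented only by the variables it has fixed (the no-longer-relaxed subset of $U'_N$) and that branching fixes binary variables in a fixed order; the names `is_expert_node` and `enumerate_nodes` and the variable names in the usage note are hypothetical.

```python
from itertools import product


def is_expert_node(fixed, optimal):
    """A node is an expert node if every variable it has fixed agrees with U*."""
    return all(optimal[v] == value for v, value in fixed.items())


def enumerate_nodes(variables):
    """Yield the fixed-variable dict of every node in a full binary branching tree.

    ASSUMPTION: branching fixes the variables in list order, one per tree level.
    """
    for depth in range(len(variables) + 1):
        for values in product((0, 1), repeat=depth):
            yield dict(zip(variables[:depth], values))


def expert_trajectory(variables, optimal):
    """Collect the nodes labelled as expert nodes, ordered from root to leaf."""
    return [n for n in enumerate_nodes(variables) if is_expert_node(n, optimal)]
```

With `variables = ["x11", "x12", "x21"]` and `optimal = {"x11": 1, "x12": 0, "x21": 0}`, the full tree has 15 nodes and `expert_trajectory` returns 4 of them, one per depth, which matches the description of the expert trajectory as a single root-to-leaf path.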
[0019] In addition, in step 4 of this embodiment, the imitation learning process includes the following steps c1 to c7:
Step c1: Retrieve the defined scheduling problem instances; the branch-and-bound method is invoked to solve these traffic scheduling problem instances and collect the expert trajectories leading to their optimal solutions.
Step c2: Initialize the agent's policy, denoted $\pi$, and obtain the agent's behavior policy corresponding to that policy.
Step c3: Set up a dataset for the agent's behavior policy, denoted $O$, and in training round $r$ examine each scheduling problem instance in turn.
Step c4: Add the fully relaxed root node of the binary solution tree that the branch-and-bound method organizes for the scheduling problem instance to a temporary queue, denoted queue.
Step c5: Retrieve nodes from the temporary queue one at a time, extract the features of each retrieved node, and store these features together with the label determined by the expert policy in the dataset $O$. Each node retrieved from the temporary queue is denoted $m$, and the extracted feature of node $m$ is denoted $o$.
Step c6: Based on the agent's behavior policy, decide whether to retain the current node retrieved from the temporary queue. If the behavior policy retains the current node, the child nodes generated by branching on it are added to the temporary queue for later inspection; if the current node is pruned, the next retrieved node is processed in the same way. Once the temporary queue is empty, proceed to step c7.
Step c7: After all scheduling problem instances have been examined, train the agent's policy on the dataset $O$, and update the agent's behavior policy as a mixture of the trained policy and the expert policy. The updated behavior policy is denoted $\pi'$: $\pi' = \kappa_r \pi + (1 - \kappa_r)\pi^*$, where $\kappa_r$ controls the degree of mixing between the agent's policy and the expert policy and depends on the training round $r$.
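Steps c4 to c7 resemble a queue-driven data collection loop with a mixed behavior policy. The sketch below is one possible reading, not the patented procedure: it interprets $\pi' = \kappa_r \pi + (1 - \kappa_r)\pi^*$ as a mixture of keep-probabilities, represents a node feature by the node's fixed variables, and takes `kappa` as a plain argument because the source does not state how $\kappa_r$ varies with $r$. All function names are hypothetical.

```python
import random
from collections import deque


def mixed_keep_probability(agent_prob, expert_label, kappa):
    """Keep-probability under pi' = kappa * pi + (1 - kappa) * pi*."""
    return kappa * agent_prob + (1.0 - kappa) * expert_label


def collect_round(variables, optimal, agent_prob_fn, kappa, rng):
    """Run one instance through steps c4 to c6 and return (feature, label) samples."""
    dataset = []
    queue = deque([{}])  # step c4: the root node has no fixed variables
    while queue:
        fixed = queue.popleft()  # step c5: retrieve the next node
        label = 1 if all(optimal[v] == val for v, val in fixed.items()) else 0
        dataset.append((tuple(sorted(fixed.items())), label))
        # Step c6: the behavior policy decides whether the node is retained.
        p_keep = mixed_keep_probability(agent_prob_fn(fixed), label, kappa)
        if rng.random() < p_keep and len(fixed) < len(variables):
            nxt = variables[len(fixed)]  # ASSUMPTION: branch in list order
            for val in (0, 1):
                queue.append({**fixed, nxt: val})
    return dataset
```

With `kappa = 0` the behavior policy is the pure expert policy, so only expert nodes are expanded: for three binary variables the loop records 7 samples (the root plus two children at each of three levels), 4 of which carry the expert label. Larger `kappa` values let the agent's own decisions add off-trajectory samples, which is the source of the class imbalance discussed next.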
[0020] Furthermore, it should be noted that during imitation learning the agent's behavior policy inevitably introduces negative samples that deviate from the expert trajectory. Too many negative samples cause an imbalance between positive and negative samples, which drives the trained agent to favor pruning. Nodes that might lead to better solutions are then eliminated prematurely, weakening the scheduling effectiveness of parallel traffic coordination. Therefore, in step 5 of this embodiment, contrastive learning is introduced into imitation learning. By making the feature embeddings of samples of the same class more compact and maximizing the separation between samples of different classes, the embedding representations of the few positive samples become more prominent in the feature space, which mitigates the negative impact of sample imbalance. In step 5 of this embodiment, the construction of the contrastive imitation learning method includes the following steps d1 to d5:
Step d1: Define any node feature in the dataset as a sample, and define the embedding representation of this sample after processing by the agent's neural network. Any such node feature is denoted $o$, and its embedding after processing by the agent's neural network is denoted [symbol missing in source].
Step d2: Define the similarity between the embedding representations of two samples. The similarity between the embedding of sample $o$ and the embedding of a second sample is given by [symbols and equation missing in source], where $\lambda$ is a temperature parameter and the second sample is a node feature in the dataset other than $o$.
Step d3: Taking one of the two samples as the standard sample, and using membership in the same class as the standard sample as the classification criterion, divide the samples in the dataset into a set of similar samples and a set of dissimilar samples. The set of similar samples consists of the samples in the dataset that belong to the same class as the standard sample, and the set of dissimilar samples consists of the samples in the dataset that do not. The node feature serving as the standard sample is denoted $o$; the set of samples in the same class as $o$ is denoted $A_O(o)$ [definition missing in source]; and the set of samples not in the same class as $o$ is denoted $B_O(o)$, with $B_O(o) = O \setminus \{o\}$.
Step d4: Based on the resulting sets of similar and dissimilar samples, the agent calculates the contrastive loss corresponding to the standard sample $o$, denoted [symbol missing in source]: [equation missing in source], where $a$ is any sample in $A_O(o)$, $b$ is any sample in $B_O(o)$, $s_{o,a}$ is the feature similarity between sample $o$ and sample $a$, and $s_{o,b}$ is the feature similarity between sample $o$ and sample $b$.
Step d5: Calculate the contrastive imitation learning loss function for the standard sample from its contrastive loss and the training loss obtained in imitation learning. The contrastive imitation learning loss function is denoted [symbol missing in source]: [equation missing in source].
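Because the similarity and contrastive-loss equations are not preserved in the text, the sketch below substitutes a common supervised-contrastive form as a stand-in: $s_{o,a} = \exp(\cos(e_o, e_a)/\lambda)$ and a loss of $-\frac{1}{|A_O(o)|}\sum_{a}\log\big(s_{o,a} / \sum_{b \in B_O(o)} s_{o,b}\big)$ with $B_O(o) = O \setminus \{o\}$. This form is an assumption and may differ from the patent's actual formula; it is shown only to illustrate how same-class compactness and cross-class separation enter one loss. Function names and the example embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(u, v, lam):
    # ASSUMPTION: temperature-scaled exponential of cosine similarity.
    return math.exp(cosine(u, v) / lam)


def contrastive_loss(anchor, embeddings, labels, lam=0.5):
    """Supervised-contrastive-style loss for the sample at index `anchor`."""
    others = [k for k in range(len(embeddings)) if k != anchor]  # B_O(o) = O \ {o}
    same = [k for k in others if labels[k] == labels[anchor]]    # A_O(o)
    if not same:
        return 0.0
    denom = sum(similarity(embeddings[anchor], embeddings[b], lam) for b in others)
    terms = (math.log(similarity(embeddings[anchor], embeddings[a], lam) / denom)
             for a in same)
    return -sum(terms) / len(same)
```

As a quick check of the intended behavior, take embeddings `[(1, 0), (1, 0.1), (0, 1), (0.1, 1)]`: with labels `[1, 1, 0, 0]` the anchor's same-class neighbor is close in embedding space and the loss is small, whereas with labels `[1, 0, 1, 0]` the same-class neighbor is far away and the loss is larger.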
[0021] It should be noted that, in this embodiment, the contrastive imitation learning loss function drives the agent to cluster samples of the same class while distinguishing samples of different classes, ensuring that sample features are correctly aligned with their respective labels.
[0022] As the number of variables in the scheduling problem grows, the solution space expands exponentially, and running the branch-and-bound algorithm to collect enough expert trajectories for agent training becomes prohibitively expensive. In this embodiment, step 6 of the scheduling method therefore applies an incremental training method to contrastive imitation learning. By generalizing an agent trained on small-scale problems to large-scale problems, this incremental training method relaxes the requirements on the quantity and quality of training data for large-scale problems while preserving the agent's prior branch-pruning experience.
[0023] Specifically, in step 6 of this embodiment, the incremental training method for the constructed contrastive imitation learning method comprises the following steps e1 to e5:
Step e1: Define a supplementary dataset for the existing dataset of the large-scale problem instance and obtain the complete dataset of that instance. The existing dataset is denoted O, the supplementary dataset is denoted O′, and the complete dataset is denoted O*, with O* = O ∪ O′;
Step e2: Using membership in the same class as the standard sample o as the classification criterion, divide the data in the supplementary dataset into a set of similar samples and a set of dissimilar samples. The set of similar samples consists of the samples in O′ that belong to the same class as o, and the set of dissimilar samples consists of the samples in O′ that do not. With the node features of the standard sample denoted o, the set of samples in O′ in the same class as o is denoted A_O′(o), and the set of samples in O′ not in the same class as o is denoted B_O′(o), with B_O′(o) = O′ \ {o} = O′;
Step e3: Using the standard sample as the reference, compute the overall distinguishability among the feature embeddings of all samples in the complete dataset. The overall distinguishability is denoted ρ_o. In its formula, s_{o,b} denotes the embedding similarity between the standard sample o and sample b, and s_{o,b′} denotes the embedding similarity between the standard sample o and sample b′, both computed with the similarity formula defined earlier; sample b′ belongs to the supplementary dataset O′. During incremental training, as the agent policy initially trained on O is generalized to larger-scale problems, the changing value of ρ_o reflects the agent's evolving understanding of the distinguishability between samples from O and O′;
Step e4: Compute the amount of sample supplementation of the supplementary dataset relative to the existing dataset, denoted γ: γ = γ_o, γ_o = |A_O′(o)| / (|A_O(o)| + |A_O′(o)|); |A_O(o)| = ε· / (|O| − 1), |A_O′(o)| = ε· / (|O′| − 1); where |A_O′(o)| is the total number of samples in A_O′(o), |A_O(o)| is the total number of samples in A_O(o), |O| is the total number of samples in dataset O, and |O′| is the total number of samples in dataset O′;
Step e5: From the overall distinguishability and the amount of sample supplementation, compute the incremental contrastive imitation learning loss of the standard sample o. This loss includes the agent's contrastive imitation loss for sample o on the supplementary dataset O′, in which a′ is a sample in A_O′(o) and b′ is a sample in B_O′(o). The contrastive imitation loss encourages the agent to align the feature embedding of sample o with the feature embeddings of similar samples in O′, which facilitates generalizing the agent's policy from smaller to larger problem sizes. The second and third terms of the incremental contrastive imitation learning loss formula introduce complementary biases that capture the agent's current understanding of the distinguishability between samples of the existing dataset O and the supplementary dataset O′.
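The ratio γ_o in step e4 can be illustrated with a short Python sketch. It uses only the relation γ_o = |A_O′(o)| / (|A_O(o)| + |A_O′(o)|) stated above; the list-of-class-labels representation, the function names, and the choice to return 0.0 when neither dataset contains a similar sample are assumptions made for illustration.

```python
def similar_count(labels, standard_label):
    """Count the samples whose class label equals that of the standard sample."""
    return sum(1 for label in labels if label == standard_label)


def supplement_ratio(existing_labels, supplementary_labels, o_index):
    """Compute gamma_o for the standard sample at position o_index in O.

    existing_labels:      class labels of the samples in the existing dataset O
    supplementary_labels: class labels of the samples in the supplementary dataset O'
    """
    standard_label = existing_labels[o_index]
    # |A_O(o)|: same-class samples in O, excluding the standard sample itself.
    n_existing = similar_count(existing_labels, standard_label) - 1
    # |A_O'(o)|: same-class samples in O' (o is assumed not to be in O').
    n_supplementary = similar_count(supplementary_labels, standard_label)
    total = n_existing + n_supplementary
    # Assumed convention: gamma_o is 0.0 when there are no similar samples at all.
    return n_supplementary / total if total else 0.0
```

For example, with class labels [1, 1, 0, 1] in O, labels [1, 0, 0] in O′, and the first sample of O as the standard sample, the sketch counts two similar samples in O and one in O′, so γ_o is one third.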
[0024] Although preferred embodiments of the present invention have been described in detail above, it should be clearly understood that various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A comparative imitation learning scheduling method for coordinated transmission of parallel flows in industrial time-sensitive networks, characterized in that it includes the following steps:
Step 1: Analyze and process the data acquisition patterns of each industrial application in the industrial time-sensitive network to obtain the data transmission pattern of each industrial application, wherein the data transmission pattern of an industrial application is either a multicast data transmission pattern or a multi-source data transmission pattern;
Step 2: Model the time-sensitive traffic in the industrial time-sensitive network and construct a multicast and multi-source data transmission model pair, which includes a multicast transmission model and a multi-source transmission model, wherein multicast is the distribution of traffic from one host to multiple hosts and multi-source is the aggregation of traffic from multiple hosts to one host;
Step 3: Under the branch-and-bound paradigm and based on the multicast and multi-source data transmission model pair, construct a basic imitation learning scheduling framework, and use this framework as the parallel flow coordinated transmission scheduling model for industrial time-sensitive networks;
Step 4: Based on the parallel flow coordinated transmission scheduling model, train an agent by imitation learning on the trajectory of the optimal solution of the scheduling problem, so that the agent can guide the branching direction while the scheduling problem is being solved;
Step 5: Construct a contrastive imitation learning method for the coordinated transmission scheduling model, so that the feature embeddings of samples of the same class become more compact and the separation between samples of different classes is maximized;
Step 6: Set up an incremental training method for the constructed contrastive imitation learning method, allowing the agent to generalize from small-scale scheduling problems to large-scale problems, and thereby realize the scheduling of parallel flow coordinated transmission in industrial time-sensitive networks.
2. The method of claim 1, wherein, in step 1, the analysis and processing includes mining the characteristics of distributed applications in the Industrial Internet of Things, wherein a distributed application spans multiple devices and the data it generates in real time is transmitted to a designated host as traffic flows.
3. The method of claim 1, wherein the multicast and multi-source data transmission model pair is constructed as follows:
Define the set of time-sensitive flows in the industrial time-sensitive network, denoted F. The flow set F includes N flows, and the i-th flow in F is denoted f_i, 1 ≤ i ≤ N; f_i = {src_i, dst_i, pth_i, sz_i}, where src_i is the transmission start point of flow f_i, dst_i is its transmission end point, pth_i is its transmission path, and sz_i is the size of the data packets it carries;
Set a first constraint for the multicast transmission mode and a second constraint for the multi-source transmission mode. The first constraint is: for any i ∈ [1, |F|], i′ ∈ [1, |F|], i ≠ i′: src_i = src_i′, dst_i ≠ dst_i′. The second constraint is: for any i ∈ [1, |F|], i′ ∈ [1, |F|], i ≠ i′: src_i ≠ src_i′, dst_i = dst_i′;
Define the number of available time slots on each switch port in the current time-sensitive network and calculate the time slot in which each flow completes data transmission. In a time-sensitive network performing cyclic queuing and forwarding, the time slot in which flow f_i completes data transmission is denoted p_i, with j ∈ [1, J]; for flow f_i, x_{i,j} is a binary variable indexed by i and j that indicates whether flow f_i is assigned to the j-th time slot for transmission, and J is the number of available time slots on each switch port in the current time-sensitive network;
Set coupling jitter constraints to ensure coordinated transmission of flow data: for any i and i′ with i ∈ [1, |F|], i′ ∈ [1, |F|], i ≠ i′: 0 ≤ β_{i,i′} ≤ jtr, p_i − p_i′ ≤ β_{i,i′}, p_i′ − p_i ≤ β_{i,i′}, where jtr is the jitter limit for globally coordinated transmission of flow data and β_{i,i′} is an auxiliary real-valued variable that linearizes the coupling jitter constraint;
The time slot allocation scheme, which allocates time slots to each data flow, is optimized under a maximization objective, where Maximize denotes finding the maximum value of the objective.
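The constraints of claim 3 can be illustrated with a minimal Python sketch. It checks the first (multicast) and second (multi-source) constraints on a group of flows and tests the coupling jitter constraint; because the smallest feasible β_{i,i′} is |p_i − p_i′|, the three inequalities hold exactly when |p_i − p_i′| ≤ jtr. The Flow class, the function names, and the use of precomputed completion slots p_i (the formula for p_i is not reproduced in this text) are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Flow:
    """One time-sensitive flow f_i = {src_i, dst_i, pth_i, sz_i}."""
    src: str
    dst: str
    pth: tuple
    sz: int


def is_multicast(flows):
    """First constraint: every pair shares a source and differs in destination."""
    return all(a.src == b.src and a.dst != b.dst for a, b in combinations(flows, 2))


def is_multi_source(flows):
    """Second constraint: every pair differs in source and shares a destination."""
    return all(a.src != b.src and a.dst == b.dst for a, b in combinations(flows, 2))


def jitter_ok(completion_slots, jtr):
    """Coupling jitter constraint: |p_i - p_i'| <= jtr for every pair of flows.

    completion_slots holds the values p_i, assumed to be computed elsewhere.
    """
    return all(abs(p - q) <= jtr for p, q in combinations(completion_slots, 2))
```

With a jitter limit of 2, completion slots 3, 4, and 5 satisfy the constraint, while slots 1 and 5 do not.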
4. The method of claim 3, wherein, in step 3, the branch-and-bound paradigm is set as follows:
Define multiple scheduling problem instances that characterize data flow scheduling scenarios under different transmission conditions, and compute the optimal scheduling solution of each defined instance. Any scheduling problem instance is denoted I, its optimal scheduling solution is denoted U*, and the value of scheduling decision variable v in U* is denoted U*[v]. The scheduling decision variable v is an element of the decision variable set of instance I and describes the scheduling decisions for parallel time-sensitive flows; the scheduling decision variables include the time slot allocation variables x_{i,j}, which indicate whether the i-th flow is assigned to the j-th time slot for transmission, and their equivalent representations;
Define the relaxed problem instance at each node of the solution binary tree organized by the branch-and-bound method, and the relaxed solution of each relaxed problem instance. The relaxed problem instance at node N is denoted I′_N and its relaxed solution is denoted U′_N; a subset of U′_N captures the values of the variables that are no longer relaxed at node N;
Divide all nodes in the binary tree into expert nodes and non-expert nodes as follows: if, for every v, the variable values in this subset match the values of the corresponding variables in U*, node N is regarded as an expert node and included in the expert trajectory; if the variable values in this subset do not match the values of the corresponding variables in U*, node N is regarded as a non-expert node;
Train the agent to retain expert nodes and discard non-expert nodes, thereby imitating the expert trajectory. For node features o, the agent with policy π produces an output that determines whether the node is retained or discarded during sampling; π* denotes the expert policy that guides the expert trajectory, and z_o denotes the label that the expert policy π* assigns to node features o;
Compute the agent's training loss for node features o.
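The expert and non-expert node division of claim 4 can be sketched in a few lines of Python. A node is treated as an expert node when every variable that is no longer relaxed at that node takes the same value as in the optimal solution U*. The dictionary representation keyed by (i, j) index pairs and the function name are assumptions made for illustration; under this sketch a root node with no fixed variables counts as an expert node.

```python
def is_expert_node(fixed_values, optimal_solution):
    """Return True when all fixed variables at a node agree with U*.

    fixed_values:     {variable: value} for variables no longer relaxed at node N
    optimal_solution: {variable: value} giving U*[v] for every decision variable v
    """
    return all(optimal_solution[v] == value for v, value in fixed_values.items())


# Hypothetical instance: two flows, two time slots, keys are (i, j) for x_{i,j}.
U_STAR = {(1, 1): 1, (1, 2): 0, (2, 1): 0, (2, 2): 1}
```

A node that has fixed x_{1,1} = 1 stays on the expert trajectory for this instance, whereas a node that has fixed x_{1,1} = 0 is a non-expert node that the trained agent should discard.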
5. The method of claim 4, wherein, in step 4, the imitation learning process includes the following steps c1 to c7:
Step c1: Retrieve the defined scheduling problem instances;
Step c2: Initialize the agent's policy and obtain the agent behavior policy corresponding to it, wherein the initialized agent policy is denoted π;
Step c3: Set up the dataset for the agent behavior policy and, in training round r, examine each scheduling problem instance in turn, wherein the dataset is denoted O;
Step c4: For the scheduling problem instance under examination, add the fully relaxed root node of the solution binary tree organized by the branch-and-bound method to a temporary queue, denoted queue;
Step c5: Retrieve nodes from the temporary queue one at a time, extract the features of each retrieved node, and store the extracted node features, together with their labels determined by the expert policy, in the dataset. Each retrieved node is denoted m, and the features extracted from node m are denoted o;
Step c6: Decide, according to the agent behavior policy, whether to retain the current node retrieved from the temporary queue. If the node is retained, the child nodes generated by branching on it are added to the temporary queue for later inspection; if the node is pruned, the next retrieved node is processed in the same way. When the temporary queue is empty, proceed to step c7;
Step c7: After all scheduling problem instances have been examined, train the agent's policy on the dataset O and update the original agent behavior policy by mixing the agent's policy with the expert policy to obtain the updated agent behavior policy, denoted π′: π′ = κ_r π + (1 − κ_r) π*; κ_r is a control value that determines the degree of mixing between the agent's policy and the expert policy, and κ_r depends on the training round r.
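The policy update π′ = κ_r π + (1 − κ_r) π* in step c7 can be read as a probabilistic mixture: with probability κ_r the behavior policy follows the agent's decision, otherwise it follows the expert's. The Python sketch below illustrates that reading. The claim states only that κ_r depends on the training round r, so the schedule in mixing_weight, its decay parameter, and the function names are hypothetical.

```python
import random


def mixing_weight(r, decay=0.5):
    """Hypothetical schedule for kappa_r: 0 in round 0, approaching 1 as r grows."""
    return 1.0 - decay ** r


def behavior_decision(agent_keep, expert_keep, kappa_r, rng):
    """Sample a retain/discard decision from kappa_r * pi + (1 - kappa_r) * pi*.

    agent_keep:  decision of the agent policy pi (True means retain the node)
    expert_keep: decision of the expert policy pi* for the same node
    rng:         a random.Random instance, passed in for reproducibility
    """
    # rng.random() lies in [0, 1), so kappa_r = 1 always follows the agent
    # and kappa_r = 0 always follows the expert.
    return agent_keep if rng.random() < kappa_r else expert_keep


rng = random.Random(0)
decision = behavior_decision(True, False, mixing_weight(3), rng)
```

Under this assumed schedule the first round collects data purely under the expert policy, and later rounds rely increasingly on the agent's own decisions.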