An IoT cloud-edge collaborative microservice scheduling method
The method models microservice scheduling as a directed acyclic graph and a Markov decision process and applies a deep deterministic policy gradient algorithm to optimize cloud-edge collaborative computing. It addresses the low task scheduling efficiency of cloud-edge collaborative computing architectures and achieves efficient, adaptive scheduling decisions and load balancing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 山东怡然信息技术有限公司
- Filing Date
- 2026-04-13
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, cloud-edge collaborative computing architectures schedule the tasks of microservice applications inefficiently, especially when there are many nodes and complex task dependencies. They adapt poorly to dynamic environmental changes, which results in high latency and poor resource utilization.
The microservice scheduling is modeled as a directed acyclic graph. Combined with the cloud-edge collaborative computing infrastructure model, a Markov decision process is constructed. Through the deep deterministic policy gradient algorithm, node selection is decomposed into node group selection and intra-node selection. First and second schedulers are designed to optimize the total task time and load balancing.
It significantly reduces the learning difficulty of the scheduling model, improves the convergence performance when the number of nodes increases, has good scalability and robustness, can adapt to bandwidth fluctuations and changes in node load, accurately describes the dependencies between subtasks, and provides accurate scheduling decision basis.
Figure CN122019213B_ABST
Description
Technical Field
[0001] This invention relates to the field of collaborative scheduling for cloud computing and edge computing, and in particular to an IoT cloud-edge collaborative microservice scheduling method.
Background Technology
[0002] The statements in this section are merely background information relating to this disclosure and do not necessarily constitute prior art.
[0003] While traditional cloud computing offers powerful centralized computing capabilities, its reliance on remote data centers results in high data transmission latency and bandwidth consumption, making it difficult to meet the time-sensitive needs of microservice applications. A complete microservice application process typically involves a series of sequentially dependent computational tasks, which vary significantly in computational load, memory requirements, and data transmission volume. To address the real-time requirements of microservice applications, the cloud-edge collaborative computing architecture has emerged. This architecture deploys computational tasks on edge nodes closer to the data source or on cloud nodes with greater computing power, balancing latency and computational efficiency through distributed resource utilization. Cloud-edge collaborative computing extends computing, storage, and network resources from the central cloud data center to edge nodes closer to end users and devices, forming a geographically distributed, heterogeneous, and dynamically changing continuous pool of computing resources. In such an environment, user requests must traverse complex microservice call relationships, traffic patterns change significantly over time, and cross-node network conditions such as latency, bandwidth, and packet loss rate fluctuate due to resource contention, background traffic, and routing changes. These intertwined dynamic factors directly impact the end-to-end latency, throughput, and reliability of microservice applications deployed on container orchestration platforms. How to schedule tasks efficiently so as to minimize completion time has become an extremely challenging problem.
[0004] In existing technologies, deep reinforcement learning methods, such as the deep deterministic policy gradient algorithm, have been incorporated into task scheduling schemes. However, the action space of existing reinforcement learning methods expands exponentially with the number of nodes, leading to problems such as low exploration efficiency, slow model convergence, and poor scalability. These methods are particularly ill-suited to scenarios involving a large number of heterogeneous cloud-edge nodes and complex task dependencies. Therefore, designing a method that can fully utilize the characteristics of a cloud-edge layered architecture, efficiently process tasks, and adapt to dynamic environmental changes is a pressing technical problem to be solved in this field.
Summary of the Invention
[0005] To solve the above-mentioned technical problems, or at least partially solve them, the present invention provides an IoT cloud-edge collaborative microservice scheduling method.
[0006] This invention provides an IoT cloud-edge collaborative microservice scheduling method, comprising:
[0007] The process of scheduling several microservices to perform tasks is modeled as a directed acyclic graph;
[0008] A cloud-edge collaborative computing infrastructure model is constructed, which models all computing devices at the edge and in the cloud as computing nodes with set computing capabilities and memory capacity, groups the nodes and configures the bandwidth and communication latency between the node groups;
[0009] Based on the aforementioned directed acyclic graph and cloud-edge collaborative computing infrastructure model, a scheduling optimization objective is constructed;
[0010] The scheduling process is modeled as a Markov decision process, which is decomposed into multiple decision steps. Each decision step handles a ready subtask. The state space, action space, state transition function, and reward function are defined according to the scheduling optimization objective.
[0011] A scheduling model for Markov decision processes is constructed and trained. The scheduling model includes a first scheduler and a second scheduler based on deep deterministic policy gradient. The first scheduler, based on deep deterministic policy gradient, selects the node group to execute the subtask according to the current global state and the subtask status. The second scheduler selects computing nodes from the nodes in the node group selected by the first scheduler according to the local state to achieve load balancing within the node group.
[0012] When applied, scheduling decisions are made using a scheduling model.
[0013] Furthermore, the process of scheduling several microservices to perform a task is modeled as a directed acyclic graph (DAG). The vertices of the DAG represent the subtasks implemented by the microservices, and the directed edges between subtasks represent the dependencies between them: a downstream subtask can be executed only after its upstream subtasks are completed. Each vertex of the DAG carries the attributes that govern how its subtask is scheduled, including: the workload of the subtask, the memory requirement of the subtask, and the amount of output data of the subtask.
[0014] Furthermore, the edge can be further divided into several sub-edges, or the cloud can be further divided into several sub-clouds, wherein the node group is any one of the cloud, the sub-clouds subdivided by the cloud, the edge, or the sub-edges subdivided by the edge.
[0015] Furthermore, the goal of scheduling optimization is to obtain a scheduling scheme under allocation constraints that minimizes the total task time.
[0016] The allocation constraints include: during the entire task processing, each subtask of the directed acyclic graph is assigned to any node of the cloud-edge collaborative computing infrastructure model once, and during allocation, the memory requirements of the subtask must not exceed the memory capacity of the assigned node.
[0017] Furthermore, the calculation process for the total task time is as follows:
[0018] Based on the workload $w_v$ of subtask v and the computing power of node n, the computation time is:
[0019] $T^{\mathrm{comp}}_{v,n} = \dfrac{w_v}{c_n}$;
[0020] where $c_n$ is the computing power of node n;
[0021] If subtask v requires communication for processing, the communication time is the maximum transmission time for receiving the task output data from all upstream subtasks:
[0022] $T^{\mathrm{comm}}_{v} = \max_{u \in \mathrm{pred}(v)} \left( \dfrac{d_u}{B_{g_u,g_v}} + L_{g_u,g_v} \right)$;
[0023] where u is an upstream subtask of subtask v; $\mathrm{pred}(v)$ is the set of upstream subtasks of subtask v; $d_u$ is the amount of task output data of upstream subtask u; $B_{g_u,g_v}$ is the bandwidth from the node group $g_u$, in which the node of upstream subtask u is located, to the node group $g_v$, in which the node of subtask v is located; and $L_{g_u,g_v}$ is the delay from node group $g_u$ to node group $g_v$;
[0024] The total time spent by subtask v on node n is:
[0025] $T_{v,n} = T^{\mathrm{comm}}_{v} + T^{\mathrm{comp}}_{v,n}$;
[0026] Considering the total time of all upstream subtasks, the total time of subtask v is:
[0027] $T_v = \max_{u \in \mathrm{pred}(v)} T_u + T_{v,n}$;
[0028] where $\max_{u \in \mathrm{pred}(v)} T_u$ is the maximum total time taken by all upstream subtasks of subtask v. For an entry subtask with no upstream subtasks, $\max_{u \in \mathrm{pred}(v)} T_u = 0$;
[0029] For the entire task, the total time taken is the maximum of the total times taken by all subtasks, that is:
[0030] $T_{\mathrm{total}} = \max_{v} T_v$, where the maximum is taken over all subtasks v of the directed acyclic graph.
[0031] Furthermore, the state space of the Markov decision process includes a global state space and a local state space. The global state space contains the feature vector of the currently ready subtask set, the aggregation information of each node group, and the global dependency information. The feature vector of the currently ready subtask set in the global state includes: the computational workload, memory requirements, output data size, and number of ready subtasks for all ready subtasks. The aggregation information of each node group includes: the average computational load of each node group, the aggregate available memory capacity, and the average communication latency from each node group to all upstream subtasks. The global dependency information includes: the number of unscheduled subtasks in the task and the estimated remaining execution time. The estimated remaining execution time is estimated by calculating the longest path length from the currently unscheduled node to the exit node in the directed acyclic graph.
[0032] The local state of each node group includes the remaining memory, computing power, and expected available time of each computing node in the node group; the expected available time of each computing node in the local state is determined by estimating the time point when the node completes all currently assigned tasks based on the sum of the computing time and communication time of all tasks in the node's current task queue.
[0033] Furthermore, the state transition process includes the following updates:
[0034] Resource status update: The subtask is assigned to a node, and the remaining memory of the node is reduced by the memory requirement of the subtask;
[0035] The subtask is added to the node's task queue, and the node's expected completion time is updated to its current completion time plus the total time spent by the subtask on the node;
[0036] Ready subtask set update: The assigned subtask is removed from the current ready subtask set, and newly ready subtasks are added;
[0037] State variable update: Calculate new aggregate statistics based on the updated node resources to form a new global state, and form a new local state based on the updated node resources.
[0038] Furthermore, the reward function is defined as follows:
[0039] $r = -T_{v,n} + \alpha R_{\mathrm{done}} + \beta R_{\mathrm{util}}$
[0040] where $T_{v,n}$ is the total time spent by the current subtask v on the selected node n; $R_{\mathrm{done}}$ is a set positive reward given when all subtasks of the entire task have been scheduled, in order to guide the scheme to complete the scheduling process of the entire task; $R_{\mathrm{util}}$ is a resource utilization efficiency reward, used to encourage efficient resource use; and $\alpha$ and $\beta$ are the weighting coefficients.
[0041] Furthermore, the training process for the first and second schedulers includes the following steps:
[0042] Step S501: Initialize all network parameters using the Xavier initialization method; initialize the target network parameters to be the same as the main network;
[0043] Step S502: Initialize the experience replay buffer D;
[0044] Step S503: Start the training round loop. Each training round corresponds to a complete task scheduling process. In each round, the number of subtasks, computational requirements, memory requirements, and the amount of subtask output data are randomly generated from a preset distribution. The directed acyclic graph structure of the task is also randomly generated.
[0045] Step S504: Initialize the current round, reset the state, generate an initial set of ready subtasks, and initialize the global and local states;
[0046] Step S505: Determine if the ready subtask set is empty. If it is not empty, continue execution; if it is empty and all tasks have been scheduled, end the current round.
[0047] Step S506: Select a subtask from the set of ready subtasks based on path priority;
[0048] Step S507: The first scheduler selects the target node group, adding noise disturbance during the selection process;
[0049] Step S508: The second scheduler selects the target node from the target node group, adding noise disturbance during the selection process;
[0050] Step S509: Perform the allocation action: assign the subtask to the selected node and update the node's remaining memory and task queue; then, according to the subtask's dependencies, add to the ready subtask set each of its downstream subtasks whose predecessor subtasks have all been scheduled.
[0051] Step S510: Calculate the immediate reward based on the reward function;
[0052] Step S511: Construct an experience tuple and store it in the experience replay buffer. The experience tuple includes the global state and all local states, the action of selecting a node group and the action of selecting a node in the node group, the immediate reward, and the updated global and local states.
[0053] Step S512: Determine whether the training conditions have been met. If the number of samples in the experience replay buffer reaches the preset threshold, perform network update; otherwise, return to step S505 to continue collecting samples.
[0054] Step S513: Randomly sample a small batch of samples from the experience replay buffer;
[0055] Step S514: Calculate the target value and loss of the first critic network using the samples; minimize the loss of the first critic network using the Adam optimizer;
[0056] Step S515: Calculate the target value and loss for each second critic, and minimize the network loss for each second critic using the Adam optimizer;
[0057] Step S516: Calculate the policy gradient of the first actor network and update the parameters of the first actor network along the gradient ascent direction using the Adam optimizer.
[0058] Step S517: Calculate the policy gradient of each second actor network and update the parameters of each second actor network using the Adam optimizer;
[0059] Step S518: Soft update target network parameters;
[0060] Step S519: If the first actor network and each of the second actor networks have converged, or the maximum number of rounds has been reached, then the training ends; otherwise, return to step S504 to continue to the next round.
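By way of illustration only, the supporting pieces of steps S502, S512 to S515 and S518 can be sketched as follows. This is a minimal sketch and not the claimed implementation: the `done` flag in the experience tuple, the discount factor `gamma`, the soft-update rate `tau`, and all class and function names are assumptions that do not appear in this description, and network parameters are represented as plain lists of floats.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer D; the oldest experience tuples are dropped first."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # `done` is an assumed flag marking the last decision step of a round.
        self.items.append((state, action, reward, next_state, done))

    def ready(self, threshold):
        # Step S512: update the networks only once enough samples are stored.
        return len(self.items) >= threshold

    def sample(self, batch_size, rng=random):
        # Step S513: uniform random mini-batch without replacement.
        return rng.sample(list(self.items), batch_size)


def td_target(reward, next_q, done, gamma=0.99):
    """Critic target value: reward plus the discounted target-network Q-value."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, main_params, tau=0.005):
    """Step S518: move each target parameter a fraction tau toward the main one."""
    return [tau * m + (1.0 - tau) * t for t, m in zip(target_params, main_params)]


buffer = ReplayBuffer(capacity=3)
for step in range(4):
    buffer.add(state=step, action=(0, 0), reward=-1.0, next_state=step + 1, done=(step == 3))
```

With a capacity of 3 and four insertions, the first tuple is evicted, which mirrors how a bounded buffer keeps only recent scheduling experience.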
[0061] Furthermore, selecting a subtask based on path priority includes: for each ready subtask, calculating the length of the longest path from that subtask to the exit subtask, and scheduling first the ready subtask whose longest path length is the greatest.
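The path-priority rule can be illustrated with a short sketch. The description does not state how path length is measured, so this sketch assumes it is the sum of subtask workloads along the path; the function names and the four-subtask example graph are hypothetical.

```python
from functools import lru_cache


def make_path_length(workload, succs):
    """Return a memoized function giving the longest path length to an exit subtask."""

    @lru_cache(maxsize=None)
    def path_length(v):
        # Assumption: path length is the sum of workloads along the path.
        return workload[v] + max((path_length(s) for s in succs[v]), default=0.0)

    return path_length


def pick_by_path_priority(ready, workload, succs):
    """Pick the ready subtask with the longest path to the exit (ties: by name)."""
    path_length = make_path_length(workload, succs)
    return max(sorted(ready), key=path_length)


# Hypothetical DAG: a feeds b and c, which both feed the exit subtask d.
workload = {"a": 4.0, "b": 6.0, "c": 3.0, "d": 5.0}
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
chosen = pick_by_path_priority({"b", "c"}, workload, succs)
```

Here subtask b lies on a path of length 11 and subtask c on a path of length 8, so b is scheduled first.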
[0062] The technical solutions provided in the embodiments of the present invention have the following advantages compared with the prior art:
[0063] This invention decomposes the large-scale node selection action space of a cloud-edge system into the selection of node groups and the selection of nodes within node groups. This significantly reduces the dimensionality of the action space of each decision network in the scheduling model, lowers the learning difficulty for each decision network, and allows the algorithm to maintain high convergence performance and good scalability even as the number of nodes increases. The first scheduler selects node groups based on computation and communication, while the second scheduler selects nodes based on load balancing. Faced with uncertainties such as bandwidth fluctuations and changes in node load, this invention can adaptively adjust the scheduling strategy, exhibiting stronger robustness.
[0064] This invention applies directed acyclic graphs to microservice scheduling modeling for tasks, enabling precise description of many-to-many dependencies between subtasks. It supports providing the output of an upstream subtask to multiple downstream subtasks, and a downstream subtask receiving the output of multiple upstream subtasks. The scheduling optimization objective established based on this can accurately calculate task readiness time and communication overhead, providing a more accurate basis for scheduling decisions.
[0065] The method of this invention can be adapted to computing nodes of different sizes. When the infrastructure expands, only a corresponding second scheduler needs to be added, without redesigning the entire scheduling architecture.
Brief Description of the Drawings
[0066] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
[0067] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0068] Figure 1 is a flowchart of an IoT cloud-edge collaborative microservice scheduling method provided in an embodiment of the present invention;
[0069] Figure 2 is a schematic diagram of a directed acyclic graph provided in an embodiment of the present invention;
[0070] Figure 3 is a schematic diagram of a grouped cloud-edge collaborative computing infrastructure model provided in an embodiment of the present invention;
[0071] Figure 4 is a schematic diagram of another grouped cloud-edge collaborative computing infrastructure model provided in an embodiment of the present invention;
[0072] Figure 5 is a schematic diagram of an IoT cloud-edge collaborative microservice scheduling device provided in an embodiment of the present invention.
Detailed Implementation
[0073] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0074] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0075] Example 1
[0076] As shown in Figure 1, this invention provides an IoT cloud-edge collaborative microservice scheduling method, which includes:
[0077] S100 models the process of scheduling several microservices to perform a task as a directed acyclic graph G=(V,E), where V is the set of vertices and E is the set of dependency edges.
[0078] As shown in Figure 2, the vertices of the directed acyclic graph (DAG) represent the subtasks implemented by the microservices, and the directed edges between vertices represent the dependencies between these subtasks. Each vertex of the DAG carries the attributes that govern how its subtask is scheduled, including the subtask workload $w_v$, the subtask memory requirement $m_v$, and the amount of subtask output data $d_v$. After an upstream microservice completes its subtask, it transmits the data to downstream microservices, allowing the downstream microservices' subtasks to execute. The output of an upstream subtask can be provided to multiple downstream subtasks, and a downstream subtask can receive output from multiple upstream subtasks. For a vertex with multiple outgoing edges, its task output data must be transmitted to each downstream subtask separately; for a vertex with multiple incoming edges, the subtask must wait for the outputs of all upstream subtasks to arrive before it can begin execution.
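As a non-limiting illustration, the directed acyclic graph and the notion of a ready subtask can be sketched as follows. The class and function names and the four-subtask example are hypothetical; only the three vertex attributes (workload, memory requirement, output data volume) come from this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    workload: float     # computation required by the subtask
    memory: float       # memory the subtask needs on its node
    output_data: float  # data volume sent to each downstream subtask


def upstream_map(vertices, edges):
    """Map each vertex to the set of its upstream (predecessor) vertices."""
    preds = {v: set() for v in vertices}
    for u, v in edges:
        preds[v].add(u)
    return preds


def ready_subtasks(vertices, edges, scheduled):
    """Unscheduled subtasks whose upstream subtasks have all been scheduled."""
    preds = upstream_map(vertices, edges)
    return sorted(v for v in vertices if v not in scheduled and preds[v] <= scheduled)


# Hypothetical task: a feeds b and c (fan-out), and d waits for both (fan-in).
tasks = {
    "a": Subtask(4.0, 1.0, 2.0),
    "b": Subtask(6.0, 2.0, 1.0),
    "c": Subtask(3.0, 1.0, 1.0),
    "d": Subtask(5.0, 2.0, 0.5),
}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
initial_ready = ready_subtasks(tasks, edges, scheduled=set())
```

Subtask d becomes ready only after both b and c have been scheduled, which is the many-to-one dependency described above.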
[0079] S200 constructs a cloud-edge collaborative computing infrastructure model. Microservices run at the edge or in the cloud as needed. This application's cloud-edge collaborative computing infrastructure model models all computing devices at the edge and in the cloud as computing nodes with defined computing capabilities and memory capacities, groups the nodes, and configures the bandwidth and communication latency between node groups.
[0080] As shown in Figure 3, the edge is a set of edge computing nodes. For ease of description, let $n^{E}_{e}$ denote the e-th edge computing node. Edge computing nodes obtain data directly from the data source, but their computing resources are limited, so they are suitable for low-latency, lightweight microservices. Let the number of edge computing nodes be $N_1$. For any edge computing node $n^{E}_{e}$, the computing power is expressed as $c^{E}_{e}$ and the memory capacity is expressed as $M^{E}_{e}$.
[0081] The cloud is a set of cloud nodes that rely on high-performance computing servers. For ease of description, let $n^{C}_{c}$ denote the c-th cloud node. Cloud nodes are used to execute computationally complex and computationally intensive microservices. Although the cloud supports such subtasks, communication between the cloud and the edge computing nodes takes time, so the completion time of subtasks executed in the cloud must take communication time into account. Let the number of cloud nodes be $N_2$. For any cloud node $n^{C}_{c}$, the computing power is expressed as $c^{C}_{c}$ and the memory capacity is expressed as $M^{C}_{c}$.
[0082] During task processing, edge computing nodes and cloud nodes can communicate, allowing different subtasks of the complete task to be handled by edge computing nodes and cloud computing nodes respectively. In the cloud-edge collaborative computing infrastructure model in this example, the communication latency and bandwidth between cloud nodes and edge computing nodes are configured to model the communication time between the cloud and the edge.
[0083] In another implementation, to address the growth of the edge action space in the subsequent decision-making process caused by edge expansion, the edge is further divided into several sub-edges, as shown in Figure 4. Just as scheduling between the cloud and the edge must account for the communication time between the cloud and the edge, scheduling between sub-edges must account for the communication time between sub-edges. Therefore, the communication latency and bandwidth between the sub-edges must additionally be configured.
[0084] Similarly, in order to cope with the expansion of the cloud action space in the subsequent decision-making process brought about by cloud expansion, the cloud can be divided into several sub-clouds, referring to the edge grouping approach.
[0085] For ease of expression, the cloud, the sub-clouds subdivided by the cloud, the edge, or the sub-edges subdivided by the edge are collectively referred to as node groups.
[0086] Once cross-layer or cross-group scheduling occurs, a comprehensive trade-off between computing power and communication time is required.
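For illustration, a grouped infrastructure model of this kind might be represented as below. This is a sketch under stated assumptions: links are treated as symmetric, transfers inside one node group are treated as free, and all names, units, and numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    group: str      # node group: cloud, sub-cloud, edge, or sub-edge
    compute: float  # computing power
    memory: float   # memory capacity


class Infrastructure:
    def __init__(self, nodes, links):
        self.nodes = {n.name: n for n in nodes}
        self.links = links  # {(group_a, group_b): (bandwidth, latency)}

    def group_members(self, group):
        return [n.name for n in self.nodes.values() if n.group == group]

    def link(self, src_group, dst_group):
        """Return (bandwidth, latency) between two node groups."""
        if src_group == dst_group:
            return (float("inf"), 0.0)  # assumption: intra-group transfer is free
        key = (src_group, dst_group)
        # Assumption: a link configured in one direction applies in both.
        return self.links[key] if key in self.links else self.links[(dst_group, src_group)]

    def transfer_time(self, data, src_node, dst_node):
        bandwidth, latency = self.link(self.nodes[src_node].group, self.nodes[dst_node].group)
        return data / bandwidth + latency


infra = Infrastructure(
    nodes=[
        Node("e1", "sub-edge-1", 2.0, 4.0),
        Node("e2", "sub-edge-2", 2.0, 4.0),
        Node("c1", "cloud", 10.0, 64.0),
    ],
    links={
        ("sub-edge-1", "sub-edge-2"): (20.0, 0.05),
        ("sub-edge-1", "cloud"): (5.0, 0.2),
        ("sub-edge-2", "cloud"): (5.0, 0.2),
    },
)
edge_to_cloud = infra.transfer_time(10.0, "e1", "c1")
```

Moving the same data between the two sub-edges is cheaper than moving it to the cloud in this example, which is the kind of computation-versus-communication trade-off that cross-group scheduling has to weigh.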
[0087] S300, based on the directed acyclic graph and cloud-edge collaborative computing infrastructure model, constructs a scheduling optimization objective.
[0088] During the entire task processing, each subtask of the directed acyclic graph is assigned to any node of the cloud-edge collaborative computing infrastructure model once, and the memory requirements of the subtask must not exceed the memory capacity of the assigned node during the assignment.
[0089] The goal of scheduling optimization is to obtain a scheduling scheme that minimizes the total task time under the above allocation constraints.
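The allocation constraints can be checked mechanically. The following sketch is illustrative only; the function name and the example values are hypothetical. Representing the assignment as a mapping from subtask to node makes "assigned once" hold by construction, so the check covers missing assignments and memory capacity.

```python
def check_allocation(assign, mem_req, mem_cap):
    """Return a list of constraint violations (empty if the allocation is feasible).

    assign:  {subtask: node}, at most one node per subtask by construction
    mem_req: {subtask: memory requirement}
    mem_cap: {node: memory capacity}
    """
    problems = []
    for v in mem_req:
        if v not in assign:
            problems.append(f"{v}: not assigned")
        elif assign[v] not in mem_cap:
            problems.append(f"{v}: unknown node {assign[v]}")
        elif mem_req[v] > mem_cap[assign[v]]:
            problems.append(f"{v}: memory requirement exceeds capacity of {assign[v]}")
    return problems


mem_req = {"a": 1.0, "b": 2.0, "c": 8.0}
mem_cap = {"e1": 4.0, "c1": 64.0}
feasible = check_allocation({"a": "e1", "b": "e1", "c": "c1"}, mem_req, mem_cap)
infeasible = check_allocation({"a": "e1", "c": "e1"}, mem_req, mem_cap)
```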
[0090] The calculation process for the total task time is as follows:
[0091] Define the total time spent by subtask v on node n as $T_{v,n}$. It is the sum of the longest communication time spent receiving data from all upstream subtasks and the computation time spent on node n.
[0092] Specifically, based on the workload $w_v$ of subtask v and the computing power of node n, the computation time is:
[0093] $T^{\mathrm{comp}}_{v,n} = \dfrac{w_v}{c_n}$;
[0094] where $c_n$ is the computing power of node n, which, depending on the type of node n, is the computing power $c^{E}_{e}$ of an edge computing node or the computing power $c^{C}_{c}$ of a cloud node.
[0095] If subtask v requires communication for processing, the communication time is the maximum transmission time for receiving the task output data from all upstream subtasks:
[0096] $T^{\mathrm{comm}}_{v} = \max_{u \in \mathrm{pred}(v)} \left( \dfrac{d_u}{B_{g_u,g_v}} + L_{g_u,g_v} \right)$;
[0097] where u is an upstream subtask of subtask v; $\mathrm{pred}(v)$ is the set of upstream subtasks of subtask v; $d_u$ is the amount of task output data of upstream subtask u; $B_{g_u,g_v}$ is the bandwidth from the node group $g_u$, in which the node of upstream subtask u is located, to the node group $g_v$, in which the node of subtask v is located; and $L_{g_u,g_v}$ is the delay from node group $g_u$ to node group $g_v$.
[0098] The total time spent by subtask v on node n is:
[0099] $T_{v,n} = T^{\mathrm{comm}}_{v} + T^{\mathrm{comp}}_{v,n}$;
[0100] Considering the total time of all upstream subtasks, the total time of subtask v is:
[0101] $T_v = \max_{u \in \mathrm{pred}(v)} T_u + T_{v,n}$;
[0102] where $\max_{u \in \mathrm{pred}(v)} T_u$ is the maximum total time taken by all upstream subtasks of subtask v. For an entry subtask with no upstream subtasks, $\max_{u \in \mathrm{pred}(v)} T_u = 0$. The above equation provides an iterative method for calculating the total time of any subtask.
[0103] Furthermore, for the entire task, the total time consumed is the maximum of the total times consumed by all subtasks, that is:
[0104] $T_{\mathrm{total}} = \max_{v \in V} T_v$.
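As a worked illustration of the calculation above, the following sketch evaluates the total task time for a fixed assignment. It assumes that the per-transfer time is output data divided by bandwidth plus delay, and that transfers inside one node group cost nothing; the function name, argument layout, and all numbers are hypothetical.

```python
def total_task_time(workload, output, preds, assign, compute, group, bandwidth, latency):
    """Return ({subtask: total time}, total task time) for a fixed assignment."""
    finish = {}

    def comm_time(v):
        # Longest transfer among upstream outputs; zero for entry subtasks.
        times = []
        for u in preds[v]:
            gu, gv = group[assign[u]], group[assign[v]]
            if gu == gv:
                times.append(0.0)  # assumption: intra-group transfer is free
            else:
                times.append(output[u] / bandwidth[(gu, gv)] + latency[(gu, gv)])
        return max(times, default=0.0)

    def total(v):
        if v not in finish:
            upstream = max((total(u) for u in preds[v]), default=0.0)
            on_node = comm_time(v) + workload[v] / compute[assign[v]]
            finish[v] = upstream + on_node
        return finish[v]

    for v in workload:
        total(v)
    return finish, max(finish.values())


workload = {"a": 4.0, "b": 6.0, "c": 3.0, "d": 5.0}
output = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 0.5}
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
assign = {"a": "e1", "b": "c1", "c": "e1", "d": "c1"}
compute = {"e1": 2.0, "c1": 10.0}
group = {"e1": "edge", "c1": "cloud"}
bandwidth = {("edge", "cloud"): 4.0, ("cloud", "edge"): 4.0}
latency = {("edge", "cloud"): 0.5, ("cloud", "edge"): 0.5}

per_subtask, makespan = total_task_time(
    workload, output, preds, assign, compute, group, bandwidth, latency
)
```

In this example subtask a takes 2.0 on the edge node, subtask b then takes 1.0 of communication plus 0.6 of computation in the cloud (total 3.6), subtask c finishes at 3.5, and the exit subtask d finishes at 3.6 + 0.75 + 0.5 = 4.85, which is the total task time.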
[0105] S400 models the scheduling process as a Markov decision process, decomposing it into multiple decision steps, each handling a ready subtask. A ready subtask is one whose upstream subtasks have all been scheduled. The state space, action space, state transition function, and reward function are defined.
[0106] In this application, the state space of the Markov decision process includes a global state space and a local state space. The global state includes the feature vectors of the currently ready subtask set, the aggregation information of each node group, and the global dependency information; the local state of each node group includes the remaining memory, computing power, and expected available time of each computing node in that node group.
[0107] The feature vector of the currently ready subtask set in the global state includes: the computational workload, memory requirements, output data size, and number of ready subtasks for all ready subtasks; the aggregated information for each node group includes: the average computational load of each node group, the aggregated available memory capacity, and the average communication latency from each node group to all upstream subtasks; the global dependency information includes: the number of unscheduled subtasks in the task and the estimated remaining execution time. The estimated remaining execution time is estimated by calculating the longest path length from the currently unscheduled node to the exit node in the directed acyclic graph.
[0108] The expected available time for each computing node in the local state is determined by estimating the time point when the node completes all currently assigned tasks based on the sum of the computing time and communication time of all tasks in the node's current task queue.
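By way of example, the local state and one possible group-level aggregate could be computed as follows. This is a sketch: the dictionary layout is hypothetical, and measuring "average computational load" as the mean queued time per node is an assumption not fixed by this description.

```python
def expected_available_time(now, queue):
    """Time at which a node finishes everything already assigned to it.

    queue: list of (compute_time, comm_time) pairs for the queued tasks.
    """
    return now + sum(comp + comm for comp, comm in queue)


def local_state(nodes, now=0.0):
    """One row per node: [remaining memory, computing power, expected available time]."""
    return [
        [n["free_mem"], n["compute"], expected_available_time(now, n["queue"])]
        for n in nodes
    ]


def group_aggregate(nodes):
    """Group features for the global state: [mean queued time, total free memory]."""
    mean_load = sum(sum(c + m for c, m in n["queue"]) for n in nodes) / len(nodes)
    return [mean_load, sum(n["free_mem"] for n in nodes)]


edge_group = [
    {"free_mem": 4.0, "compute": 2.0, "queue": [(1.0, 0.5), (2.0, 0.0)]},
    {"free_mem": 8.0, "compute": 2.0, "queue": []},
]
rows = local_state(edge_group)
summary = group_aggregate(edge_group)
```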
[0109] Since nodes in the cloud and at the edge are divided into node groups, the action space is decoupled into actions for selecting node groups and actions for selecting nodes within those groups. This action decoupling design breaks down the original large action space (all nodes) into two smaller action spaces, significantly reducing the learning difficulty for subsequent decision-making models.
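The effect of this decoupling on output dimensions can be seen in a small worked example. The group sizes below are hypothetical, and the helper names are introduced only for illustration.

```python
def flat_dim(group_sizes):
    """Output size of a single network that chooses among all nodes directly."""
    return sum(group_sizes)


def decomposed_dims(group_sizes):
    """Output size of the first scheduler and of each per-group second scheduler."""
    return len(group_sizes), list(group_sizes)


def to_flat_index(group_sizes, group, node):
    """Map a (node group, node within group) pair back to a global node index."""
    return sum(group_sizes[:group]) + node


sizes = [20, 20, 20, 40]  # hypothetical: three sub-edges and one cloud group
first_dim, second_dims = decomposed_dims(sizes)
```

With these sizes a single network would need 100 outputs, whereas the first scheduler needs 4 and the largest second scheduler needs 40.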
[0110] The state transition function describes how, in a Markov decision process, starting from the current state and performing an action, the process deterministically transitions to the next state.
[0111] In this application, the state includes a global state and the local states of all node groups. The state transition process involves the following updates:
[0112] Resource status update: The subtask is assigned to a node, and the node's remaining memory is reduced by the memory requirement of the subtask.
[0113] The subtask is added to the node's task queue, and the node's expected completion time is updated to its current completion time plus the total time the subtask takes on that node.
[0114] Ready subtask set update: Remove the assigned subtask from the current ready subtask set. For each downstream subtask of the assigned subtask, check whether all of its upstream subtasks have been scheduled. If so, add that downstream subtask to the ready subtask set.
[0115] State variable update: Calculate new aggregate statistics based on the updated node resources (remaining memory, task queue length) to form a new global state. Form a new local state based on the updated resources of each node.
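The updates listed above can be sketched as a single deterministic transition. The state layout, function name, and example values are hypothetical, and the aggregate statistics of the global state are left out for brevity.

```python
def apply_assignment(state, subtask, node, time_on_node, mem, preds, succs):
    """Update `state` in place after assigning `subtask` to `node`."""
    n = state["nodes"][node]
    if mem[subtask] > n["free_mem"]:
        raise ValueError("subtask does not fit in the node's remaining memory")
    n["free_mem"] -= mem[subtask]      # resource status update
    n["queue"].append(subtask)         # task queue update
    n["finish_time"] += time_on_node   # expected completion time update
    state["scheduled"].add(subtask)
    state["ready"].discard(subtask)    # ready subtask set update
    for d in succs[subtask]:
        if all(u in state["scheduled"] for u in preds[d]):
            state["ready"].add(d)
    return state


preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
mem = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 2.0}
state = {
    "nodes": {"e1": {"free_mem": 4.0, "queue": [], "finish_time": 0.0}},
    "scheduled": set(),
    "ready": {"a"},
}
apply_assignment(state, "a", "e1", 2.0, mem, preds, succs)
```

After the entry subtask a is assigned, both b and c become ready, while d stays blocked until b and c have both been scheduled.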
[0116] Through the state transition function, the subsequent scheduler can predict the consequences of decisions, thereby learning the optimal policy.
[0117] This application defines the reward function based on the scheduling optimization objective as follows:
[0118] $r = -T_{v,n} + \alpha R_{\mathrm{done}} + \beta R_{\mathrm{util}}$
[0119] where $T_{v,n}$ is the total time spent by the current subtask v on the selected node n. The time cost of the current subtask is penalized directly; this time cost already includes communication and computation overhead and is directly related to the final completion time of the task in the scheduling optimization objective. $R_{\mathrm{done}}$ is a set positive reward given when all subtasks of the entire task have been scheduled, in order to guide the scheme to complete the scheduling process of the entire task. $R_{\mathrm{util}}$ is a resource utilization efficiency reward, used to encourage efficient resource utilization; for example, a positive reward is given when the remaining memory of the selected node matches the memory requirement of the subtask, or when the selected node is the node with the lightest current load. $\alpha$ and $\beta$ are the weighting coefficients; $\alpha$ is usually set to 0.1 and $\beta$ to 0.01, so that the scheduling optimization objective remains dominant.
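A possible reading of this reward can be sketched in code. The exact form of the formula is not recoverable from the text, so the sketch assumes a direct time penalty plus a completion bonus weighted by 0.1 and a utilization bonus weighted by 0.01; the bonus magnitude of 10.0 and all parameter names are hypothetical.

```python
def reward(time_on_node, all_scheduled, utilization_bonus,
           completion_bonus=10.0, alpha=0.1, beta=0.01):
    """Immediate reward for one scheduling decision.

    time_on_node:      total time of the current subtask on the chosen node
    all_scheduled:     True once every subtask of the task has been scheduled
    utilization_bonus: resource-efficiency term, for example 1.0 when the
                       lightest-loaded node was chosen and 0.0 otherwise
    """
    r = -time_on_node                  # direct penalty on the time cost
    if all_scheduled:
        r += alpha * completion_bonus  # assumed weighting of the completion reward
    return r + beta * utilization_bonus


mid_round = reward(2.0, all_scheduled=False, utilization_bonus=1.0)
last_step = reward(2.0, all_scheduled=True, utilization_bonus=0.0)
```

Because the weights are small, the time penalty dominates both auxiliary terms, consistent with keeping the scheduling optimization objective dominant.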
[0120] S500 constructs and trains a scheduling model for Markov decision processes, which includes a first scheduler and a second scheduler based on deep deterministic policy gradients.
[0121] The first scheduler consists of a first actor network and a first critic network. The first actor network takes the global state as input and outputs a node group decision vector whose dimension corresponds to the number of node groups. This node group decision vector, after passing through a softmax layer, represents the probability distribution of selecting each node group. The structure of the first actor network includes: an input layer, a fully connected layer, ReLU activation, another fully connected layer, ReLU activation, an output layer, and a softmax activation layer. The first critic network takes the global state and the encoded index of the selected node group as input and outputs a scalar Q-value, representing the expected cumulative reward for selecting that node group given the global state. The structure of the first critic network is: an input layer, a fully connected layer, ReLU activation, another fully connected layer, ReLU activation, an output layer, and a linear activation layer.
[0122] Each node group corresponds to a second scheduler, which includes a second actor network and a second critic network. The input of the second actor network is the local state, and the output is a node decision vector whose dimension equals the number of nodes in the node group. After passing through a softmax layer, the node decision vector represents the probability distribution of selecting each node in that node group. The structure of the second actor network is the same as that of the first actor network, except for the output dimension. The input of the second critic network is the local state and the encoding of the selected node n, and the output is a scalar Q-value representing the expected cumulative reward for selecting that node given the local state.
[0123] The training process for the first and second schedulers includes the following steps:
[0124] Step S501: Initialize all network parameters using the Xavier initialization method.
[0125] Initialize the target network parameters to be the same as the main network.
[0126] Step S502: Initialize the experience replay buffer D, setting its capacity to 100,000. A prioritized experience replay mechanism is adopted, which assigns priorities to samples based on their TD error to increase the sampling probability of important samples.
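The buffer in step S502 could be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the class name `PrioritizedReplayBuffer`, the exponent `alpha`, and the small constant `eps` are assumptions introduced here, and a practical version would normally use a sum tree instead of linear-time sampling.

```python
import random
from collections import deque


class PrioritizedReplayBuffer:
    """Fixed-capacity buffer that samples transitions in proportion to priority."""

    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-6):
        # deque(maxlen=...) evicts the oldest entry once capacity is reached.
        self.transitions = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.alpha = alpha  # assumed exponent controlling how strongly TD error skews sampling
        self.eps = eps      # assumed constant that keeps zero-error samples reachable

    def add(self, transition, td_error):
        """Store a transition with a priority derived from its absolute TD error."""
        self.transitions.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, rng=random):
        """Draw batch_size transitions with probability proportional to priority."""
        indices = rng.choices(range(len(self.transitions)),
                              weights=self.priorities, k=batch_size)
        return [self.transitions[i] for i in indices]

    def __len__(self):
        return len(self.transitions)
```

Samples with a large TD error are drawn far more often than samples the critic already predicts well, which is the effect the step describes.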
[0127] Step S503: Begin the training round cycle. Each training round corresponds to a complete task scheduling process. Set the maximum number of rounds to 5000. In each round, the number of subtasks, computational requirements, memory requirements, and subtask output data volume are randomly generated from a preset distribution. The directed acyclic graph structure of the task is also randomly generated.
[0128] Step S504: Initialize the current round. Reset the state, generate an initial set of ready subtasks, and initialize the global and local states.
[0129] Step S505: Determine if the ready subtask set is empty. If not empty, continue execution; if empty and all tasks have been scheduled, end the current round.
[0130] Step S506: Select a subtask from the set of ready subtasks. The order in which subtasks are selected affects scheduling performance. This embodiment adopts a path priority-based strategy: calculate the path length of each ready subtask (the longest path length from the subtask to the exit subtask), and prioritize scheduling the subtask with the longest path length. This strategy helps to process subtasks that are more likely to cause long task completion times as early as possible, avoiding an increase in overall completion time due to subtask delays.
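The path-priority rule of step S506 can be illustrated with a short sketch. The function names, the `successors` adjacency mapping, and the use of a per-subtask `cost` (for example, workload) as the path weight are assumptions made for illustration; setting every cost to 1 would instead count hops.

```python
from functools import lru_cache


def longest_path_to_exit(successors, cost):
    """Return {subtask: longest path length from that subtask to an exit subtask}.

    successors maps each subtask to its downstream subtasks; cost maps each
    subtask to its own weight (an assumed measure of path length).
    """
    @lru_cache(maxsize=None)
    def length(v):
        downstream = successors.get(v, ())
        # An exit subtask has no downstream subtasks, so its length is its own cost.
        return cost[v] + max((length(u) for u in downstream), default=0)

    return {v: length(v) for v in cost}


def pick_next(ready, path_length):
    """Pick the ready subtask with the longest remaining path."""
    return max(ready, key=lambda v: path_length[v])


# Diamond-shaped task: a -> {b, c} -> d
successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 2, "b": 5, "c": 1, "d": 3}
lengths = longest_path_to_exit(successors, cost)
print(lengths)                         # {'a': 10, 'b': 8, 'c': 4, 'd': 3}
print(pick_next({"b", "c"}, lengths))  # b
```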
[0131] Step S507: The first scheduler selects the target node group. The current global state is input into the first actor network of the first scheduler to obtain the probability distribution over node groups. Gaussian noise with a mean of 0 is then added to this distribution, and the node group with the highest noisy probability is selected using the argmax operation. The noise standard deviation decays during training: it starts at 0.3 and is multiplied by 0.95 every 100 rounds, which balances exploration and exploitation.
[0132] Step S508: The second scheduler selects the target node from the target node group. The current local state of the target node group is obtained and input into the second actor network of the corresponding second scheduler to obtain the probability distribution vector over the nodes in the group. Gaussian noise with a mean of 0 is added to this vector, and a memory feasibility mask is then applied: for any node whose remaining memory is smaller than the memory requirement of the subtask, the probability is set to negative infinity so that the node cannot be selected by the subsequent argmax operation. The node with the highest resulting value is then selected. The standard deviation of this Gaussian noise also decays during training.
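One reading of the noise schedule in step S507 is a stepwise exponential decay, sketched below. Treating 0.95 as a multiplicative factor applied every 100 rounds is an interpretation of the text, and the function name `noise_std` is hypothetical.

```python
def noise_std(round_index, initial=0.3, decay=0.95, interval=100):
    """Standard deviation of the exploration noise for a given training round.

    Assumes the value is multiplied by `decay` once per `interval` rounds.
    """
    return initial * decay ** (round_index // interval)


for r in (0, 99, 100, 1000, 4999):
    print(r, round(noise_std(r), 4))
```

Under this reading the noise remains at 0.3 for the first 100 rounds and has shrunk to roughly 0.024 by the last of the 5000 rounds.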
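The masked selection in step S508 might look like the sketch below. The function name `select_node` and its parameters are hypothetical, and returning `None` when no node in the group has enough memory is an assumption, since the text does not say how that case is handled.

```python
import math
import random


def select_node(probs, remaining_memory, required_memory, sigma, rng=random):
    """Add Gaussian noise, mask memory-infeasible nodes, then take the argmax.

    probs: actor output for the nodes of one group.
    remaining_memory: remaining memory of each node, in the same order.
    Returns the index of the chosen node, or None if no node is feasible
    (an assumed convention, not stated in the text).
    """
    noisy = [p + rng.gauss(0.0, sigma) for p in probs]
    masked = [score if mem >= required_memory else -math.inf
              for score, mem in zip(noisy, remaining_memory)]
    best = max(range(len(masked)), key=masked.__getitem__)
    return None if masked[best] == -math.inf else best


# Node 0 has the highest probability but too little memory, so node 1 wins.
print(select_node([0.7, 0.2, 0.1], [1, 8, 8], required_memory=4, sigma=0.0))  # 1
```

Applying the mask after the noise, as the step specifies, guarantees that noise can never push an infeasible node above a feasible one.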
[0133] Step S509: Perform the allocation action. Assign the subtask to the selected node and update the node's remaining memory and task queue. Based on the subtask's dependencies, add to the ready subtask set each downstream subtask whose predecessor subtasks have all been scheduled.
[0134] Step S510: Calculate the immediate reward based on the reward function. If all tasks are completed once the current subtask is finished, r1 takes its set positive value; otherwise, r1 is zero. The resource utilization efficiency reward r2 is calculated based on the quality of the selected node. For example, if the selected node n is the most lightly loaded node at the current level, then r2=1; otherwise, r2=0.
[0135] Step S511: Construct the experience tuple. Store the global state and all local states, the node group selection action and the node selection action within that group, the immediate reward, and the updated global and local states in the experience replay buffer D.
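The ready-set update in step S509 can be sketched as follows. The function name and the `successors` / `predecessors` adjacency mappings are illustrative assumptions; the update of the node's remaining memory and task queue is omitted for brevity.

```python
def update_ready_set(ready, scheduled, assigned, successors, predecessors):
    """Mark `assigned` as scheduled and release newly ready downstream subtasks."""
    scheduled.add(assigned)
    ready.discard(assigned)
    for child in successors.get(assigned, ()):
        # A downstream subtask becomes ready only when all predecessors are scheduled.
        if child not in scheduled and all(p in scheduled
                                          for p in predecessors.get(child, ())):
            ready.add(child)
    return ready


# Diamond-shaped task: a -> {b, c} -> d
successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
predecessors = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
ready, scheduled = {"a"}, set()
for task in ("a", "b", "c"):
    update_ready_set(ready, scheduled, task, successors, predecessors)
    print(task, sorted(ready))
```

In this example, subtask d enters the ready set only after both b and c have been scheduled.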
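A minimal sketch of the reward computation in step S510 follows. It assumes the reward is the negative time cost plus the two weighted bonus terms, with the weights 0.1 and 0.01 applied to r1 and r2 respectively; the value of `completion_bonus` is a hypothetical placeholder, because the text only states that r1 is a set positive reward.

```python
def immediate_reward(total_time, all_done, is_lightest_loaded,
                     completion_bonus=10.0, w1=0.1, w2=0.01):
    """Immediate reward for one scheduling decision.

    total_time: total time spent by the current subtask on the selected node.
    completion_bonus: hypothetical value of the set positive reward r1.
    w1, w2: weighting coefficients (assumed mapping to r1 and r2).
    """
    r1 = completion_bonus if all_done else 0.0
    r2 = 1.0 if is_lightest_loaded else 0.0
    return -total_time + w1 * r1 + w2 * r2


print(immediate_reward(2.0, all_done=False, is_lightest_loaded=True))
print(immediate_reward(2.0, all_done=True, is_lightest_loaded=False))
```

Because both weights are small, the time penalty dominates the reward, in line with the stated intent of keeping the scheduling optimization objective dominant.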
[0136] Step S512: Determine if the training conditions have been met. When the number of samples in the experience replay buffer reaches a preset threshold, perform a network update; otherwise, return to step S505 to continue collecting samples.
[0137] Step S513: Randomly sample a small batch of samples from the experience replay buffer, with a batch size of B=64.
[0138] Step S514: Calculate the target value and loss of the first critic network.
[0139] For each sample i, the optimal node group for the next state is selected using the first target actor network, and the global TD target is calculated: $y_i = r_i + \gamma Q_1'(s_i', a_i')$, where $\gamma$ is the discount factor, with a value of 0.99; $Q_1'$ is the first target critic network; $r_i$ is the immediate reward of sample i; $s_i'$ is the next global state; and $a_i'$ is the optimal node group given by the first target actor network based on the next global state.
[0140] The loss of the first critic network is calculated based on the global TD target:
[0141] $L_1 = \frac{1}{B} \sum_{i=1}^{B} \left( y_i - Q_1(s_i, a_i) \right)^2$,
[0142] where B is the batch size, $Q_1$ is the first critic network, $s_i$ is the current global state, and $a_i$ is the currently selected node group.
[0143] Minimize the first critic network loss using the Adam optimizer with a learning rate of 0.001.
[0144] Step S515: Calculate the target value and loss for each second critic network.
[0145] Each node group participates in the calculation only if it was actually selected in sample i. The optimal node for the next state is selected using the second target actor network, and the local TD target is calculated: $y_i^{k} = r_i + \gamma Q_{2,k}'(l_i', n_i')$, where $\gamma$ is the discount factor; $Q_{2,k}'$ is the second target critic network of node group k; $r_i$ is the immediate reward of sample i; $l_i'$ is the next local state; and $n_i'$ is the optimal node given by the second target actor network based on the next local state.
[0146] Calculate the second critic network loss based on the local TD target:
[0147] $L_{2,k} = \frac{1}{B} \sum_{i=1}^{B} \mathbb{1}[a_i = k] \left( y_i^{k} - Q_{2,k}(l_i, n_i) \right)^2$;
[0148] where $Q_{2,k}$ is the second critic network of node group k, $l_i$ is the current local state, $n_i$ is the currently selected node, and $\mathbb{1}[a_i = k]$ is an indicator function that equals 1 when node group k is the node group selected in sample i.
[0149] Minimize the loss of each second critic network using the Adam optimizer.
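The target and loss calculations of steps S514 and S515 can be sketched with plain lists. The mean-squared-error form and the normalization of the per-group loss by the full batch size are assumptions, and the function names are hypothetical; in practice these quantities would be computed on tensors inside a deep learning framework.

```python
def td_targets(rewards, next_q_values, gamma=0.99):
    """TD target per sample: immediate reward plus discounted target-critic value."""
    return [r + gamma * q for r, q in zip(rewards, next_q_values)]


def critic_loss(targets, q_values):
    """Mean squared TD error over the batch (assumed loss form)."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def group_critic_loss(targets, q_values, selected_groups, group):
    """Loss for one second critic: only samples that selected `group` contribute."""
    terms = [(y - q) ** 2
             for y, q, g in zip(targets, q_values, selected_groups) if g == group]
    return sum(terms) / len(targets)  # assumed normalization by batch size


targets = td_targets(rewards=[1.0, 0.0], next_q_values=[2.0, 1.0])
print(targets)
print(critic_loss(targets, q_values=[2.98, 0.0]))
print(group_critic_loss([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [0, 1, 0], group=0))
```

The indicator in the per-group loss is realized here by the `if g == group` filter, so a second critic receives no gradient from samples in which its node group was not chosen.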
[0150] Step S516: Calculate the policy gradient of the first actor network. Update the parameters of the first actor network along the gradient ascent direction using the Adam optimizer.
[0151] Step S517: Calculate the policy gradient for each second actor network. Update the parameters of each second actor network using the Adam optimizer.
[0152] Step S518: Soft update target network parameters. Update all target networks using the Polyak averaging method.
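Polyak averaging in step S518 blends each target parameter toward its main-network counterpart. The sketch below works on flat lists of floats; the mixing rate `tau` is not given in the text, so the default of 0.005 is an assumed, commonly used value.

```python
def soft_update(target_params, main_params, tau=0.005):
    """Return new target parameters: tau * main + (1 - tau) * target."""
    return [tau * m + (1.0 - tau) * t
            for t, m in zip(target_params, main_params)]


print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

A small `tau` makes the target networks change slowly, which keeps the TD targets stable while the main networks are being updated.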
[0153] Step S519: Determine if the training termination condition has been met. If the first actor network and each of the second actor networks have converged (e.g., the average completion time no longer decreases over 100 consecutive rounds), or the maximum number of rounds has been reached, then training ends; otherwise, return to step S504 to continue to the next round.
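The example convergence criterion in step S519 can be checked with a simple patience test. This sketch interprets "no longer decreases over 100 consecutive rounds" as "no value in the most recent 100 rounds improves on the best earlier value"; the function name and the `min_delta` tolerance are assumptions.

```python
def has_converged(avg_completion_times, patience=100, min_delta=0.0):
    """True if the last `patience` rounds brought no improvement over earlier rounds."""
    if len(avg_completion_times) <= patience:
        return False  # not enough history to judge
    best_before = min(avg_completion_times[:-patience])
    best_recent = min(avg_completion_times[-patience:])
    return best_recent >= best_before - min_delta


improving_then_flat = [100 - i for i in range(50)] + [51] * 100
still_improving = [300 - i for i in range(200)]
print(has_converged(improving_then_flat))  # True
print(has_converged(still_improving))      # False
```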
[0154] Step S600: Deploy the converged networks as the scheduling model, which is then used for scheduling decisions during application. The online scheduling execution process of the scheduling model includes:
[0155] Receive a new task, parse its directed acyclic graph structure, and obtain the workload, memory requirements, and output data volume of each subtask.
[0156] Initialize the state and determine the initial set of ready subtasks.
[0157] Check if the set of ready subtasks is empty. If it is not empty, continue; if it is empty, the scheduling is complete, and the scheduling result is output.
[0158] Select a subtask from the set of ready subtasks according to path priority.
[0159] Obtain the current global state and input it into the first target actor network to obtain the selected target node group.
[0160] The local state of the target node group is obtained, input into the corresponding second target actor network, and memory feasibility masking is applied to select the target node. Memory feasibility masking is used to verify whether the remaining memory of a node meets the memory requirements of the subtask when selecting a node. If the memory requirements of the subtask are not met, the node is masked from the action space to ensure that the scheduling model does not select nodes with insufficient memory.
[0161] Perform the assignment action, update the status and ready subtask set.
[0162] Record the scheduling decision and return to continue processing the next ready subtask.
[0163] Example 2
[0164] As shown in Figure 5, this embodiment of the invention provides an IoT cloud-edge collaborative microservice scheduling device, comprising at least one processing unit connected to a storage unit via a bus unit. The storage unit serves as a computer-readable storage medium and can be used to store software programs, computer-executable programs, and modules, such as those corresponding to the IoT cloud-edge collaborative microservice scheduling method in this embodiment of the invention. The processing unit implements the aforementioned IoT cloud-edge collaborative microservice scheduling method by running the software programs, computer-executable programs, and modules stored in the storage unit.
[0165] Of course, the computer program stored in the memory of the IoT cloud-edge collaborative microservice scheduling device provided in the embodiments of the present invention is not limited to the method operation described above, but can also execute related operations in the IoT cloud-edge collaborative microservice scheduling method provided in any embodiment of the present invention.
[0166] Example 3
[0167] This invention provides a computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed, it implements the IoT cloud-edge collaborative microservice scheduling method.
[0168] In the embodiments provided by this invention, it should be understood that the disclosed structures and methods can be implemented in other ways. The structural embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, structures, or units, and may be electrical, mechanical, or of other forms.
[0169] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0170] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0171] The above description is merely a specific embodiment of the present invention, enabling those skilled in the art to understand or implement the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.
Claims
1. An IoT cloud-edge collaborative microservice scheduling method, characterized in that it comprises: modeling the process of scheduling several microservices to perform a task as a directed acyclic graph; constructing a cloud-edge collaborative computing infrastructure model, which models all computing devices at the edge and in the cloud as computing nodes with set computing capabilities and memory capacities, groups the nodes, and configures the bandwidth and communication latency between the node groups; constructing a scheduling optimization objective based on the directed acyclic graph and the cloud-edge collaborative computing infrastructure model; modeling the scheduling process as a Markov decision process, in which the scheduling process is divided into multiple decision steps, each decision step processes one ready subtask, and a state space, an action space, a state transition function, and a reward function are defined according to the scheduling optimization objective; wherein the reward function is defined as: $r = -T(v, n) + \lambda_1 r_1 + \lambda_2 r_2$, where $T(v, n)$ is the total time spent by the current subtask $v$ on the selected node $n$; $r_1$ is a set positive reward given when all subtasks of the entire task have been scheduled, in order to guide the scheme to complete the scheduling process of the entire task; $r_2$ is a resource utilization efficiency reward used to encourage efficient resource use; and $\lambda_1$ and $\lambda_2$ are weighting coefficients; constructing and training a scheduling model for the Markov decision process, the scheduling model including a first scheduler and a second scheduler based on deep deterministic policy gradient, wherein the first scheduler selects the node group to execute the subtask according to the current global state and the subtask status, and the second scheduler selects a computing node from the nodes in the node group selected by the first scheduler according to the local state, to achieve load balancing within the node group.
The training process for the first and second schedulers includes the following steps: Step S501: Initialize all network parameters using the Xavier initialization method, and initialize the target network parameters to be the same as those of the main networks; Step S502: Initialize the experience replay buffer; Step S503: Start the training round loop, where each training round corresponds to a complete task scheduling process; in each round, the number of subtasks, computational requirements, memory requirements, and subtask output data volume are randomly generated from a preset distribution, and the directed acyclic graph structure of the task is also randomly generated; Step S504: Initialize the current round, reset the state, generate an initial set of ready subtasks, and initialize the global and local states; Step S505: Determine whether the ready subtask set is empty; if it is not empty, continue execution; if it is empty and all tasks have been scheduled, end the current round; Step S506: Select a subtask from the set of ready subtasks based on path priority; Step S507: The first scheduler selects the target node group, adding noise disturbance during the selection process; Step S508: The second scheduler selects the target node from the target node group, adding noise disturbance during the selection process; Step S509: Perform the allocation action, assign the subtask to the selected node, and update the node's remaining memory and task queue; according to the subtask's dependencies, add to the ready subtask set each downstream subtask whose predecessor subtasks have all been scheduled; Step S510: Calculate the immediate reward based on the reward function; Step S511: Construct an experience tuple and store it in the experience replay buffer, the experience tuple including the global state and all local states, the node group selection action and the node selection action within that group, the immediate reward, and the updated global and local states;
Step S512: Determine whether the training conditions have been met; if the number of samples in the experience replay buffer reaches the preset threshold, perform a network update; otherwise, return to step S505 to continue collecting samples; Step S513: Randomly sample a batch of samples from the experience replay buffer; Step S514: Calculate the target value and loss of the first critic network using the samples, and minimize the loss of the first critic network using the Adam optimizer; Step S515: Calculate the target value and loss of each second critic network using the samples, and minimize the loss of each second critic network using the Adam optimizer; Step S516: Calculate the policy gradient of the first actor network and update the parameters of the first actor network along the gradient ascent direction using the Adam optimizer; Step S517: Calculate the policy gradient of each second actor network and update the parameters of each second actor network using the Adam optimizer; Step S518: Soft update the target network parameters; Step S519: If the first actor network and each of the second actor networks have converged, or the maximum number of rounds has been reached, training ends; otherwise, return to step S504 to continue to the next round. In application, scheduling decisions are made using the trained scheduling model.
2. The IoT cloud-edge collaborative microservice scheduling method according to claim 1, characterized in that, The task of scheduling several microservices is modeled as a directed acyclic graph (DAG). The vertices of the DAG represent the subtasks implemented by the microservices, and the directed edges between subtasks represent the dependencies between them: a downstream subtask can only be executed after its upstream subtasks are completed. Each vertex of the DAG is configured with the attributes that determine how the subtask is scheduled, including: the workload of the subtask, the memory requirement of the subtask, and the amount of output data of the subtask.
3. The IoT cloud-edge collaborative microservice scheduling method according to claim 1, characterized in that, The edge is further divided into several sub-edges, or the cloud is further divided into several sub-clouds, wherein the node group is any one of the cloud, the sub-clouds subdivided by the cloud, the edge, or the sub-edges subdivided by the edge.
4. The IoT cloud-edge collaborative microservice scheduling method according to claim 1, characterized in that, The goal of scheduling optimization is to obtain a scheduling scheme that minimizes the total task time under allocation constraints. The allocation constraints include: during the entire task processing, each subtask of the directed acyclic graph is assigned to any node of the cloud-edge collaborative computing infrastructure model once, and during allocation, the memory requirements of the subtask must not exceed the memory capacity of the assigned node.
5. The IoT cloud-edge collaborative microservice scheduling method according to claim 4, characterized in that, The calculation process for the total task time is as follows: The computation time is determined by the workload $w_v$ of subtask v and the computing power of node n: $T_{comp}(v, n) = w_v / c_n$, where $c_n$ is the computing power of node n; If subtask v requires communication for processing, the communication time is the maximum transmission time for receiving the output data of all upstream subtasks: $T_{comm}(v, n) = \max_{u \in P(v)} \left( d_u / b_{g(u), g(v)} + \ell_{g(u), g(v)} \right)$, where $u$ is an upstream subtask of subtask v; $P(v)$ is the set of upstream subtasks of subtask v; $d_u$ is the amount of data output by upstream subtask $u$; $b_{g(u), g(v)}$ is the bandwidth from the node group $g(u)$ in which the node of upstream subtask $u$ is located to the node group $g(v)$ in which the node of subtask v is located; and $\ell_{g(u), g(v)}$ is the delay between those two node groups; The total time spent by subtask v on node n is: $T(v, n) = T_{comp}(v, n) + T_{comm}(v, n)$; Considering the total time of all upstream subtasks, the total time of subtask v is: $F(v) = \max_{u \in P(v)} F(u) + T(v, n)$, where $\max_{u \in P(v)} F(u)$ is the maximum total time taken by all upstream subtasks of subtask v; for an entry subtask with no upstream subtasks, $\max_{u \in P(v)} F(u) = 0$; For the entire task, the total time taken is the maximum of the total times taken by all subtasks, that is: $T_{total} = \max_{v} F(v)$.
6. The IoT cloud-edge collaborative microservice scheduling method according to claim 1, characterized in that, The state space of the Markov decision process includes a global state space and a local state space. The global state space contains the feature vector of the currently ready subtask set, the aggregation information of each node group, and the global dependency information. The feature vector of the currently ready subtask set in the global state includes: the computational workload, memory requirements, and output data size of all ready subtasks, and the number of ready subtasks. The aggregation information of each node group includes: the average computational load of each node group, the aggregate available memory capacity, and the average communication latency from each node group to all upstream subtasks. The global dependency information includes: the number of unscheduled subtasks in the task and the estimated remaining execution time. The estimated remaining execution time is estimated by calculating the longest path length from the currently unscheduled node to the exit node in the directed acyclic graph. The local state of each node group includes the remaining memory, computing power, and expected available time of each computing node in the node group; the expected available time of each computing node in the local state is determined by estimating the time point at which the node completes all currently assigned tasks, based on the sum of the computing time and communication time of all tasks in the node's current task queue.
7. The IoT cloud-edge collaborative microservice scheduling method according to claim 1, characterized in that, The state transition process includes the following updates: Resource status update: when a subtask is assigned to a node, the node's remaining memory is reduced by the memory requirement of the subtask; the subtask is added to the node's task queue, and the node's expected completion time is updated to its current completion time plus the total time spent by the subtask on the node; Ready subtask set update: remove the assigned subtask from the current ready subtask set and add the newly ready subtasks; State variable update: calculate new aggregate statistics based on the updated node resources to form a new global state, and form a new local state based on the updated node resources.
8. The IoT cloud-edge collaborative microservice scheduling method according to claim 1, characterized in that, Selecting a subtask based on path priority includes: calculating the path length of each ready subtask, the path length being the longest path length from that subtask to the exit subtask, and prioritizing the scheduling of the subtask with the longest path length.