A highly scalable serverless computing distributed scheduling system
By introducing a local-centralized distributed scheduling method into the serverless computing platform, combining local probing and global scheduling, the scheduling latency and bottleneck issues of the existing platform under high concurrency requests are solved, achieving efficient instance scheduling and scaling capabilities, and supporting the migration of larger-scale applications.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TIANJIN UNIV
- Filing Date
- 2022-09-26
- Publication Date
- 2026-06-30
AI Technical Summary
Existing serverless computing platforms suffer from high latency in centralized schedulers and head-of-line blocking and communication delays in distributed schedulers when handling sudden high concurrency requests, resulting in low scheduling efficiency and difficulty in meeting the response time requirements of latency-sensitive applications.
A distributed scheduling method combining local and centralized scheduling is adopted. The local scheduler prioritizes the processing of requests and forwards them to the global scheduler when necessary. By combining local probing and global scheduling, the priority queue uses the shortest remaining time algorithm to reduce scheduling latency and bottleneck issues.
It significantly improves the instance scheduling throughput and elastic scaling capability of the serverless platform, reducing scheduling latency by 45 times and increasing scheduling throughput by 1.6 times. It supports scheduling 100,000 function instances per second, promoting the migration of more application scenarios to serverless computing.
Abstract
Description
Technical fields:
[0001] This invention belongs to the field of cloud computing technology, and in particular relates to a highly scalable serverless computing distributed scheduling system.
Background technology:
[0002] With the maturation of serverless computing, more types of applications, such as latency-sensitive social networks and e-commerce applications, are expected to migrate to serverless platforms in the future. This poses a more severe challenge to the platform's elastic scaling capabilities. Due to the unpredictability and burstiness of user workloads, and the platform's one-to-one mapping strategy—each function call request is scheduled to a separate function instance—to achieve the promised elastic scaling capabilities, the platform must be able to launch multiple concurrent function instances in real time to handle user requests. For example, publicly available user data from Microsoft shows that some functions are called at a rate of more than 3,000 times per second, resulting in the concurrent launch of thousands of instances. Furthermore, data from the world's leading video-sharing social service platform shows that some functions even receive tens of millions of concurrent requests per second.
[0003] Elastic scaling is divided into two phases: instance scheduling and cold start. Existing work has mitigated the cold start problem by optimizing instance startup overhead, but scalable instance scheduling techniques have not yet been explored. Current open-source serverless platforms, such as OpenFaaS and OpenWhisk, still rely on centralized schedulers for instance scheduling. OpenFaaS can only schedule about 30 instances with a 99th percentile latency of less than 100 milliseconds, and scheduling 2000 function instances takes tens of seconds, which is far from meeting the requirement of latency-sensitive applications with response times within 100 milliseconds.
[0004] Existing research on alternatives to centralized schedulers mainly includes: (1) Distributed schedulers, where multiple parallel schedulers without a global view of the cluster rely solely on probes for scheduling. A scheduler forwards a request only after a worker node has responded to its probe. However, this method suffers from low scheduling efficiency due to head-of-line blocking and high communication latency between the scheduler and nodes. (2) Bottom-up schedulers, where the cluster has a global scheduler and a local scheduler on each node. Requests are forwarded directly to the local scheduler and, when local resources are insufficient, are forwarded to the global scheduler for optimal decision-making. However, under sudden workloads, local scheduling fails frequently, so the global scheduler can easily become a bottleneck, leading to low scheduling efficiency.
Summary of the Invention:
[0005] To address the problems of existing technologies, this invention provides a highly scalable serverless computing distributed scheduling system. It employs a distributed scheduling method combining "local-centralized" and local probing. The local probing phase increases the probability of a request being successfully scheduled on the local scheduler, and a global scheduler is used to avoid repeated probing after a local probing failure. Specifically, it adopts a "local-centralized" technique that prioritizes local scheduling and forwards requests to the global scheduler when necessary, while also mitigating the bottleneck of global scheduling through local probing. This distributed scheduling technology comprises three phases: local inspection, local probing, and global scheduling.
[0006] The present invention solves its practical problem by adopting the following technical solution:
[0007] A highly scalable serverless computing distributed scheduling system includes a gateway, a central layer, and a local layer; the central layer further includes an automatic scaling engine and a global scheduler; the local layer includes a local scheduler on each node; wherein:
[0008] The automatic scaling engine is used to process user function call requests issued by the gateway and determine whether to generate a scaling task.
[0009] The local scheduler receives the expansion task and processes it through the local inspection phase, the local probing phase, and the global scheduling phase.
[0010] The global scheduler issues an expansion task during the global scheduling phase, and finds the optimal node among all nodes in the cluster to schedule the task.
[0011] Furthermore, the local scheduler includes a registration repository unit, a priority queue unit, and a task processing unit; wherein:
[0012] The registration repository unit is used to store the function configuration files, instance configurations, and resource usage status of the local node.
[0013] The priority queue unit selects tasks to be executed first based on the shortest remaining time of the expansion task; the queues inside the priority queue unit are implemented using a heap data structure, and the queues use the shortest remaining time algorithm to determine the priority of the tasks.
[0014] The task processing unit is used to check the relationship between the resources of this node and the resources required for the expansion task. The task processing unit implements an interface for receiving and processing expansion tasks, so as to receive task requests sent from the automatic scaling engine, other local schedulers or global schedulers.
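The heap-backed priority queue unit described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and field names are hypothetical, and "remaining time" is interpreted here as a per-task number of milliseconds supplied by the caller, which is an assumption because the description does not define how it is measured.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class ScalingTask:
    # Hypothetical task schema; only remaining_ms and seq take part in ordering.
    remaining_ms: float                                  # assumed meaning: time left for the task
    seq: int                                             # arrival order, used to break ties
    function: str = field(compare=False, default="")
    cpu_millis: int = field(compare=False, default=0)


class PriorityQueueUnit:
    """Min-heap ordered by shortest remaining time."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, function, remaining_ms, cpu_millis=0):
        task = ScalingTask(remaining_ms, next(self._counter), function, cpu_millis)
        heapq.heappush(self._heap, task)   # O(log n) insertion
        return task

    def pop(self):
        # Removes and returns the task with the shortest remaining time.
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)
```

A heap keeps both insertion and removal at logarithmic cost, which matters when many scaling tasks arrive in a burst. The arrival counter is a design choice of this sketch so that tasks with equal remaining time are served in arrival order.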
[0015] Furthermore, the local scheduler receives the expansion task and processes it through the local inspection phase, the local probing phase, and the global scheduling phase;
[0016] During the local inspection phase: the priority queue unit selects the task to execute first based on the shortest remaining time of the expansion tasks issued by the automatic scaling engine; when the expansion task is retrieved from the priority queue, the task processing unit compares the resources required by the task with the remaining resources of the local node to determine whether the task can be scheduled on the local node.
[0017] If the remaining resources on this node are greater than the resources required for the expansion task, the expansion task will be scheduled on this node, and the scheduling process will end; if the remaining resources on this node are insufficient, the scheduling process will enter the local probing phase.
[0018] During the local probing phase: the task processing unit randomly selects two other local schedulers and sends each a probe request to determine whether it has sufficient resources to complete the expansion task; each of the two local schedulers that receive the request checks, through its task processing unit, whether the resources of its node are sufficient and returns "success" or "failure" to the local scheduler that issued the request; the expansion task is scheduled to the node of the first local scheduler that responds with "success", ending the entire scheduling process; if the local schedulers on both probed nodes respond with "failure", the scheduling process enters the global scheduling phase;
[0019] During the global scheduling phase: when both query requests issued during the local probing phase return "failure", the local scheduler will forward the expansion task to the global scheduler; the global scheduler will select the priority task to execute based on the shortest remaining time of the expansion task; when the expansion task is taken out from the priority queue, the global scheduler will find the optimal node among all nodes in the cluster to schedule the task, and the entire scheduling process ends.
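The three phases can be traced end to end with a short Python sketch. It is a simplified illustration under stated assumptions: resources are reduced to a single hypothetical CPU figure, probes are evaluated one after another rather than over the network, and "optimal node" is taken to mean the node with the most free CPU, a criterion the description does not specify.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: int  # hypothetical single resource dimension (millicores)

    def has_room(self, need):
        # Follows the wording above: remaining resources must exceed the requirement.
        return self.free_cpu > need


def schedule(task_cpu, local, peers, cluster, rng=random):
    """Return (phase, node) for one expansion task; node is None if nothing fits."""
    # Phase 1: local inspection on the node that received the task.
    if local.has_room(task_cpu):
        local.free_cpu -= task_cpu
        return "local", local

    # Phase 2: local probing of two randomly selected other nodes.
    for peer in rng.sample(peers, k=min(2, len(peers))):
        if peer.has_room(task_cpu):
            peer.free_cpu -= task_cpu
            return "probe", peer

    # Phase 3: global scheduling with a view of the whole cluster.
    candidates = [n for n in cluster if n.has_room(task_cpu)]
    if not candidates:
        return "global", None
    best = max(candidates, key=lambda n: n.free_cpu)  # assumed optimality criterion
    best.free_cpu -= task_cpu
    return "global", best
```

The probing step is bounded at two probes, so a task that fails locally costs at most two extra messages before the global scheduler takes over.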
[0020] Furthermore, the automatic scaling engine includes a request receiving unit and a calculation unit; wherein:
[0021] The request receiving unit is used to receive user requests sent by the gateway.
[0022] The calculation unit determines whether a scaling task is needed based on the number of requests sent by the user and the number of existing instances in the cluster.
[0023] Furthermore, the global scheduler includes a priority queue unit and a logic processing unit that interacts with the local scheduling module.
[0024] Beneficial effects:
[0025] To reduce scheduling latency under sudden workloads, previous studies have proposed distributed scheduling methods based on probing by multiple parallel schedulers and bottom-up scheduling methods that use a central mechanism to compensate for local scheduling failures. However, distributed schedulers suffer from head-of-line blocking and communication overhead, while bottom-up schedulers still face centralized bottlenecks. To overcome the shortcomings of these solutions, this invention combines "local-centralized" scheduling with highly scalable distributed instance scheduling technology based on local probing, significantly improving the instance scheduling throughput and elastic scaling capabilities of serverless platforms.
[0026] Compared to existing work, this invention identifies and avoids their bottlenecks: By employing local probing techniques, this invention increases the probability of successful scheduling on the local scheduler, avoiding the centralized scheduling bottleneck caused by frequent local scheduling failures and the accumulation of requests to the global scheduler under resource constraints. Simultaneously, by introducing a global scheduler, this invention avoids the continuous reprobing required by distributed scheduling methods due to probe failures, significantly improving the overall scheduling throughput efficiency of the platform.
[0027] This invention offers significant advantages over existing work. Compared with OpenFaaS, an open-source serverless computing platform that uses centralized scheduling, this invention reduces the 99th-percentile scheduling latency by a factor of 45. Under the same experimental conditions, this invention can schedule over 100,000 instances per second, increasing scheduling throughput by 1.6 times and 1.0 times compared with distributed scheduling methods and bottom-up scheduling methods, respectively.
[0028] In this invention, a time-sensitive priority queue is added to both the local and global schedulers. This queue uses a shortest remaining time first algorithm to adjust the scheduling order, thereby reducing tail latency. Results show that this invention supports scheduling over 100,000 function instances within seconds, facilitating the migration of more application scenarios to serverless computing.
Attached image description:
[0029] Figure 1 is a schematic diagram of a highly scalable serverless computing distributed scheduling system according to the present invention.
[0030] Figure 2 is a schematic diagram of the interaction process of a highly scalable serverless computing distributed scheduling system according to the present invention.
Detailed Implementation:
[0031] This invention proposes a highly scalable serverless computing distributed scheduling system and method. Its implementation is described in further detail below in conjunction with Figure 1.
[0032] System Architecture
[0033] The core of this invention is a collaborative scheduling method between local schedulers and a global scheduler, which significantly reduces the scheduling latency of instance scaling tasks in high-concurrency call scenarios and thereby improves the elastic scaling capability of serverless platforms. Figure 1 shows the overall architecture of this invention. It adopts a "local first, then central" architecture, with a local scheduler added to each cluster node and an automatic scaling engine introduced at the central layer. The scheduling process of an instance scaling task is divided into three phases: (1) the local inspection phase, (2) the local probing phase, and (3) the global scheduling phase. The instance scaling task is first forwarded to a local scheduler for the local inspection phase; the local probing phase follows when local resources are insufficient; and the global scheduling phase is initiated when probing fails.
[0034] Specifically, when a user's function call request reaches the gateway, the gateway forwards it to the automatic scaling engine. The engine's request receiving unit accepts the request, and its calculation unit determines whether an instance scaling task is needed based on the number of user requests and the number of existing instances in the cluster. If so, the scaling task is forwarded to a randomly chosen local scheduler and enters that scheduler's priority queue unit, where the processing order is determined by a shortest remaining time first algorithm. When the task is retrieved from the head of the queue, the scheduling process enters the local inspection phase. The local scheduler's task processing unit compares the resources required by the scaling task with the resources of its local node. If the local node has sufficient resources, scheduling is completed there. Otherwise, scheduling enters the local probing phase. Figure 2 shows the local probing process of this invention. The local scheduler selects two other nodes and sends probe requests to their respective local schedulers. The task processing unit in the local scheduler of each probed node compares its own resources with the resources required by the scaling task and responds with "success" or "failure." The task is then scheduled to the node that first responds with "success," and the scheduling process is completed. If both local schedulers respond with "failure," the scaling task is forwarded to the priority queue of the global scheduler. The global scheduler selects the task with the shortest remaining time in the priority queue and makes a scheduling decision based on the resource status information of the entire cluster, thus completing the task scheduling.
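The rule that the task goes to the node that first responds with "success" can be sketched with concurrent probes. This is an illustrative sketch only: the probe is a hypothetical stand-in for a network call, the artificial delays simulate response times, and nodes are plain dictionaries with an assumed free_cpu field.

```python
import asyncio


async def probe(peer, need, delay):
    """Hypothetical probe call: the peer answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return peer["name"], peer["free_cpu"] > need


async def first_success(peers, need, delays):
    """Send all probes at once and return the name of the first peer that
    answers "success", or None if every probe reports "failure"."""
    tasks = [asyncio.create_task(probe(p, need, d)) for p, d in zip(peers, delays)]
    winner = None
    try:
        # as_completed yields results in the order the responses arrive.
        for finished in asyncio.as_completed(tasks):
            name, ok = await finished
            if ok:
                winner = name
                break
    finally:
        for t in tasks:
            t.cancel()  # outstanding probes are no longer needed
    return winner
```

A return value of None corresponds to both probes failing, the case in which the task would be handed to the global scheduler.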
[0035] System Implementation
[0036] This invention is implemented in OpenFaaS, an open-source serverless computing platform built on Kubernetes. This invention follows the overall architecture of OpenFaaS, primarily modifying the global scheduler in the central layer and adding an automatic scaling engine module to the platform's central layer, as well as a local scheduler module on each cluster node in the platform's local layer.
[0037] Local Scheduler: The local scheduler is implemented as an independent Kubernetes daemon on each node of the cluster. It includes: (1) Registration Repository Unit: stores the function configuration files, instance configurations and resource usage status of the local node so that scheduling decisions can be made efficiently and reasonably. (2) Priority Queue Unit: the queue inside the unit is implemented using a heap data structure and uses the shortest remaining time algorithm to determine the priority of tasks. (3) Task Processing Unit: this unit compares the resources of the local node with the resources required for the scaling task. It also implements the interface for receiving and processing scaling tasks, so that it can receive task requests sent from the automatic scaling engine, other local schedulers or the global scheduler. When the scheduling process for an instance scaling task is completed, this unit sets the nodeName field in the instance Pod's YAML file and directly creates a Kubernetes Pod locally to complete the instance scaling.
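Setting the node name on the Pod can be illustrated by building the manifest as a Python dictionary. This is a sketch rather than the manifest used by the invention: spec.nodeName is the standard Kubernetes Pod field that binds a Pod to a node without going through the default scheduler, while the function name, image, label and resource values below are hypothetical.

```python
def pinned_pod_manifest(function, image, node_name, cpu="100m", memory="128Mi"):
    """Build a Pod manifest bound to `node_name` via spec.nodeName."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "generateName": f"{function}-",   # let the API server pick a unique suffix
            "labels": {"app": function},      # hypothetical label
        },
        "spec": {
            "nodeName": node_name,            # placement already decided by the local scheduler
            "containers": [
                {
                    "name": function,
                    "image": image,
                    "resources": {"requests": {"cpu": cpu, "memory": memory}},
                }
            ],
        },
    }
```

For example, pinned_pod_manifest("resize", "example.com/resize:latest", "worker-3") yields a manifest that could be serialized to YAML or JSON and submitted to the Kubernetes API.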
[0038] Automatic scaling engine: This invention adds an automatic scaling engine module to the platform's central layer. This includes: (1) a request receiving unit: This unit implements an interface for receiving gateway requests and can receive user requests forwarded by the gateway. (2) a calculation unit: This unit determines whether instance scaling is needed based on the number of user requests and the number of existing instances in the cluster. Furthermore, this unit stores a list of IP addresses for all nodes in the cluster. When it determines that instance scaling is necessary, it forwards the instance scaling task to a random local scheduler via IP address to begin the scheduling process.
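The calculation unit's decision and the random forwarding step can be sketched as follows. The sketch makes two assumptions that the description does not state explicitly: the number of scaling tasks is the shortfall between pending requests and existing instances (consistent with the one-to-one mapping mentioned in the background), and the transport is abstracted as a hypothetical send(ip, task) callback.

```python
import random


class AutoScalingEngine:
    def __init__(self, node_ips, send, rng=None):
        self.node_ips = list(node_ips)      # IP addresses of all cluster nodes
        self.send = send                    # hypothetical transport: send(ip, task)
        self.rng = rng or random.Random()

    def tasks_needed(self, requests, instances):
        # Assumed rule: one instance per request, so scale by the shortfall.
        return max(0, requests - instances)

    def handle(self, function, requests, instances):
        """Forward one scaling task per missing instance to a random local scheduler."""
        targets = []
        for _ in range(self.tasks_needed(requests, instances)):
            ip = self.rng.choice(self.node_ips)
            self.send(ip, {"function": function})
            targets.append(ip)
        return targets
```

Choosing the target at random needs no shared state in the engine, which keeps the central layer from tracking per-node load on the request path.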
[0039] Global Scheduler: This invention modifies the default global scheduler of the serverless platform OpenFaaS by adding a priority queue unit. The queue uses the shortest remaining time algorithm to determine task priority. The global scheduler retrieves the task with the shortest remaining time from the queue and schedules instance scaling tasks based on the cluster's global resource view.
[0040] This invention is not limited to the embodiments described above. The above description of specific embodiments is intended to illustrate and explain the technical solutions of this invention. The specific embodiments described above are merely illustrative and not restrictive. Without departing from the spirit and scope of the claims, those skilled in the art can make many specific modifications based on the teachings of this invention, and these modifications all fall within the scope of protection of this invention.
Claims
1. A highly scalable serverless computing distributed scheduling system, the scheduling system comprising a gateway, a central layer, and a local layer; characterized in that: The central layer also includes an automatic scaling engine and a global scheduler; the local layer includes a local scheduler on each node; the local scheduler includes a registration repository unit, a priority queue unit, and a task processing unit; wherein: The automatic scaling engine is used to process user function call requests issued by the gateway and determine whether to generate a scaling task. The local scheduler receives the expansion task and processes it through a local inspection phase, a local probing phase, and a global scheduling phase; including: During the local inspection phase: the priority queue unit selects the task to execute first based on the shortest remaining time of the expansion tasks issued by the automatic scaling engine; when the expansion task is retrieved from the priority queue, the task processing unit compares the resources required by the task with the remaining resources of the local node to determine whether the task can be scheduled on the local node. If the remaining resources on this node are greater than the resources required for the expansion task, the expansion task will be scheduled on this node, and the scheduling process will end; if the remaining resources on this node are insufficient, the scheduling process will enter the local probing phase.
During the local probing phase: the task processing unit will randomly select two other local schedulers and send a probe request to determine whether there are sufficient resources to complete the expansion task; the two local schedulers that receive the probe request will check whether the resources of their node are sufficient through their task processing unit and provide "success" or "failure" feedback to the probing local scheduler; the expansion task will be scheduled to the node where the first local scheduler that provides "success" feedback is located, ending the entire scheduling process; if the local schedulers on the two probed nodes both provide "failure" feedback, the scheduling process will enter the global scheduling phase; During the global scheduling phase: When both probe requests issued during the local probing phase return "failure," the local scheduler forwards the expansion task to the global scheduler. The global scheduler selects the priority task to execute based on the shortest remaining time of the expansion task. When the expansion task is retrieved from the priority queue, the global scheduler finds the optimal node among all nodes in the cluster to schedule the task, and the entire scheduling process ends. The global scheduler issues an expansion task during the global scheduling phase, and finds the optimal node among all nodes in the cluster to schedule the task.
2. The highly scalable serverless computing distributed scheduling system according to claim 1, characterized in that: The registration repository unit is used to store the function configuration files, instance configurations, and resource usage status of local nodes. The priority queue unit selects tasks to be executed first based on the shortest remaining time of the expansion task; the queues inside the priority queue unit are implemented using a heap data structure, and the queues use the shortest remaining time algorithm to determine the priority of the tasks. The task processing unit is used to check the relationship between the resources of this node and the resources required for the expansion task. The task processing unit implements an interface for receiving and processing expansion tasks, so as to receive task requests sent from the automatic scaling engine, other local schedulers or global schedulers.
3. A highly scalable serverless computing distributed scheduling system according to any one of claims 1-2, characterized in that: The automatic scaling engine includes a request receiving unit and a calculation unit; wherein: The request receiving unit is used to receive user requests sent by the gateway. The calculation unit determines whether a scaling task is needed based on the number of requests sent by the user and the number of existing instances in the cluster.
4. The highly scalable serverless computing distributed scheduling system according to claim 1, characterized in that: The global scheduler includes a priority queue unit and a logic processing unit that interacts with the local scheduling module.