A cluster task scheduling method, system and readable storage medium

By using priority sorting and a reserved resource list, the problems of task blocking and priority guarantee are solved, achieving efficient cluster resource utilization and avoiding resource waste.

CN115033354B · Active · Publication Date: 2026-06-16 · GUANGZHOU WERIDE TECH LTD CO

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GUANGZHOU WERIDE TECH LTD CO
Filing Date
2022-03-01
Publication Date
2026-06-16

Abstract

The application relates to the technical field of cluster resource scheduling, in particular to a cluster task scheduling method and system and a readable storage medium. The method comprises the following steps: performing priority sorting on received tasks to generate a priority queue; acquiring an available resource list for the current scheduling period of a cluster and initializing a reserved resource list; acquiring a current task from the priority queue, highest priority first, and performing resource scheduling on the acquired current task according to the available resource list; if the scheduling fails, moving the current task out of the priority queue, reserving resources in the reserved resource list, and then continuing the resource scheduling; after all tasks with the same priority as the current task have been scheduled, updating the available resource list according to the reserved resource list; and when the priority queue or the available resource list is empty, exiting the current scheduling period. The application solves the technical problem that cluster resources are wasted because avoiding task blocking and guaranteeing task priority cannot be achieved simultaneously.

Description

Technical Field

[0001] This application relates to the technical field of cluster resource scheduling, and in particular to a cluster task scheduling method, system, and readable storage medium.

Background Technology

[0002] A cluster typically consists of several master nodes and worker nodes. A cluster can integrate host machine resources and manage resources such as network, storage, CPU, and memory. Currently, in industry standards, containers are usually built within lightweight and highly scalable container scheduling clusters. The cluster resource scheduler needs to schedule user-submitted computing tasks to run on several nodes within the cluster, based on the cluster's resource availability and the task's requested resources.

[0003] In current clusters, there are many types of tasks, such as deep learning training tasks, data ETL tasks, and high-precision map creation tasks. Cluster resource scheduling has always been a problem that distributed systems need to solve. Different solutions are often used for different business scenarios, and tasks have priorities. Existing scheduling strategies are mainly divided into first-come, first-served (FIFO) and fair scheduling. The drawbacks of these two solutions are: 1. FIFO: For two tasks with the same priority, the earlier task blocks the later task. 2. Fair scheduling: Does not support priority scheduling.

[0004] In the process of developing this application, the inventors discovered that the prior art has at least the following problem: it cannot simultaneously avoid task blocking and guarantee task priority, resulting in a waste of cluster resources.

Summary of the Invention

[0005] To address this issue, embodiments of this application provide a cluster task scheduling method, system, computer device, and readable storage medium, which can solve the technical problem of cluster resources being wasted because task blocking cannot be avoided while task priority is guaranteed. The specific technical solution is as follows:

[0006] In a first aspect, embodiments of this application provide a cluster task scheduling method, the method comprising:

[0007] Prioritize the received tasks and generate a priority queue;

[0008] Get the list of available resources for the current scheduling period of the cluster, and initialize the list of reserved resources;

[0009] The current task is retrieved from the priority queue in order of highest priority, and resource scheduling is performed on the retrieved current task according to the list of available resources;

[0010] If scheduling fails, the current task is removed from the priority queue, and available resources are reserved in the reserved resource list before resource scheduling continues. After all tasks with the same priority as the current task have been scheduled, the available resource list is updated according to the reserved resource list.

[0011] Exit the current scheduling cycle when the priority queue or available resource list is empty.

[0012] Preferably, after all tasks with the same priority as the current task have been scheduled, updating the available resource list according to the reserved resource list includes:

[0013] If the current task is the last task of its corresponding priority, then the available resource list is updated according to the reserved resource list.

[0014] Preferably, the priority queue is divided into multiple scheduling sub-queues according to the priority level, and the reserved resource list is divided into reserved sub-lists corresponding one-to-one with the scheduling sub-queues according to the priority level.

[0015] If the current task is the last task of its corresponding priority, then updating the available resource list according to the reserved resource list includes:

[0016] If the current task is the last task in the corresponding scheduling sub-queue, then all resources in the reserved sub-list corresponding to the current scheduling sub-queue are subtracted from the available resource list, and the available resource list is updated.

[0017] Preferably, a backoff queue is preset, which holds tasks that fail to be scheduled in the current scheduling period and / or tasks added after the start of the current scheduling period.

[0018] Preferably, the placed tasks are sorted in the backoff queue;

[0019] The step of exiting the current scheduling cycle when the priority queue or available resource list is empty also includes:

[0020] The priority queue is replaced by the backoff queue of the current scheduling cycle, and the next scheduling cycle begins.

[0021] Preferably, the step of prioritizing the received tasks and generating a priority queue is as follows:

[0022] Tasks with the same priority are sorted by task creation time, and tasks with different priorities are sorted by priority from highest to lowest, generating a priority queue.

[0023] Preferably, if scheduling fails, removing the current task from the priority queue and reserving available resources in the reserved resource list before continuing resource scheduling includes:

[0024] Identify the resource types required for the current task, match the required resource types with the resource types in the available resource list, and reserve the resources in the reserved resource list that successfully match the types in the available resource list.

[0025] Preferably, the method further includes:

[0026] If the list of available resources is not empty and the priority queue is not empty, then resource scheduling continues.

[0027] In a second aspect, embodiments of this application provide a cluster task scheduling system, the system comprising:

[0028] The sorting module is used to prioritize received tasks and generate a priority queue.

[0029] The list update module is used to obtain the list of available resources in the current scheduling cycle of the cluster and initialize the list of reserved resources;

[0030] The resource scheduling module is used to retrieve the current task from the priority queue according to the highest priority, and to perform resource scheduling on the retrieved current task according to the available resource list.

[0031] The list update module is also used to remove the current task from the priority queue if scheduling fails, and to reserve available resources in the reserved resource list before continuing resource scheduling; after all tasks with the same priority as the current task have been scheduled, the available resource list is updated according to the reserved resource list.

[0032] The control module is used to exit the current scheduling cycle when the priority queue or the list of available resources is empty.

[0033] In a third aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the cluster task scheduling method described in any of the preceding claims.

[0034] In summary, compared with the prior art, the beneficial effects of the technical solution provided in this application include at least the following:

[0035] Tasks are scheduled according to priority, and a reserved resource list is set up to reserve resources for tasks that fail to be scheduled. A task that fails to be scheduled is removed from the priority queue, and scheduling moves on to the next task. For tasks of the same priority, the available resource list is updated according to the reserved resource list only after all of those tasks have been scheduled. This satisfies priority scheduling by preventing low-priority tasks from occupying the resources needed by high-priority tasks that failed to be scheduled. It also prevents later tasks of the same priority from being blocked by the failure of an earlier task. This solves the problem of cluster resources being wasted because task blocking and the task priority guarantee could not be handled together.

Attached Figure Description

[0036] Figure 1 is a flowchart of a cluster task scheduling method provided in one embodiment of this application.

[0037] Figure 2 is the first flowchart of a cluster task scheduling method provided in another embodiment of this application.

[0038] Figure 3 is the second flowchart of a cluster task scheduling method provided in another embodiment of this application.

[0039] Figure 4 is the third flowchart of a cluster task scheduling method provided in another embodiment of this application.

[0040] Figure 5 is the fourth flowchart of a cluster task scheduling method provided in another embodiment of this application.

[0041] Figure 6 is the fifth flowchart of a cluster task scheduling method provided in another embodiment of this application.

Detailed Implementation

[0042] This specific embodiment is merely an explanation of this application and is not intended to limit it. After reading this specification, those skilled in the art can make modifications to this embodiment without contributing any inventive step, but such modifications are protected by patent law as long as they fall within the scope of the claims of this application.

[0043] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0044] Furthermore, the term "and / or" in this application is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this application, unless otherwise specified, generally indicates that the preceding and following related objects have an "or" relationship.

[0045] The embodiments of this application will now be described in further detail with reference to the accompanying drawings.

[0046] Referring to Figure 1, in one embodiment of this application, a cluster task scheduling method is provided, the main steps of which are described below:

[0047] S1: Prioritize the received tasks and generate a priority queue;

[0048] S2: Obtain the list of available resources for the current scheduling period of the cluster and initialize the list of reserved resources;

[0049] S3: Retrieve the current task from the priority queue according to the highest priority, and schedule resources for the retrieved current task according to the available resource list;

[0050] S4: If scheduling fails, remove the current task from the priority queue, reserve available resources in the reserved resource list, and continue resource scheduling; after all tasks with the same priority as the current task have been scheduled, update the available resource list according to the reserved resource list.

[0051] S5: Exit the current scheduling cycle when the priority queue or available resource list is empty.

[0052] Specifically, in this embodiment, tasks are assigned different priorities. Tasks with the same priority are ordered by the amount of the main resource they require: the more of the main resource a task requires, the later it is placed.

[0053] For example, in one implementation of this embodiment, the priorities are production priority, automation task priority, regular priority, and experimental priority, with production priority > automation task priority > regular priority > experimental priority. The priority queue is sorted according to the priority of each task. Task 1 and Task 2 are both experimental priority, while Task 3 and Task 4 are both production priority. In this resource scheduling, GPU resources are the primary resource. Tasks are first sorted by priority, so Task 3 and Task 4 come before Task 1 and Task 2. Task 3 requires 2 GPU resources and Task 4 requires 1 GPU resource, so Task 4 is sorted before Task 3. Task 1 requires 3 GPU resources and Task 2 requires 1 GPU resource, so Task 2 is sorted before Task 1. Therefore, in this example, the priority queue is sorted as: Task 4 > Task 3 > Task 2 > Task 1. In other examples of this application, tasks with the same priority can be sorted according to task creation time, total resource usage, etc., which will not be elaborated here. In other embodiments of this application, priorities can be set according to other logic, which is not limited here.
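
The ordering in this example can be sketched as a two-part sort key. The following is a minimal illustration only; the `Task` class, the `PRIORITY_RANK` table, and the function name are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

# Hypothetical rank table: a lower number means a higher priority.
PRIORITY_RANK = {"production": 0, "automation": 1, "regular": 2, "experimental": 3}


@dataclass
class Task:
    name: str
    priority: str
    gpus: int  # amount of the primary resource (GPU) requested


def build_priority_queue(tasks):
    # Higher priority first; within one priority, the smaller GPU request first.
    return sorted(tasks, key=lambda t: (PRIORITY_RANK[t.priority], t.gpus))


tasks = [
    Task("Task 1", "experimental", 3),
    Task("Task 2", "experimental", 1),
    Task("Task 3", "production", 2),
    Task("Task 4", "production", 1),
]
queue = build_priority_queue(tasks)
print([t.name for t in queue])  # ['Task 4', 'Task 3', 'Task 2', 'Task 1']
```

Sorting by creation time or total resource usage instead would only change the second element of the key.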

[0054] Specifically, in this embodiment, at the start of the current scheduling period, the available resources of the device are acquired in real time to form an available resource list. In cluster applications, the available resource list contains the current available resource types and sizes of each available node. At the beginning of the current scheduling period, the reserved resource list is initialized to be empty.

[0055] After the current scheduling period begins, tasks are retrieved from the priority queue according to the principle of high priority first, and resource scheduling is performed according to the list of available resources.

[0056] If task scheduling succeeds, the allocated resources are subtracted from the available resource list, and the next task is retrieved from the priority queue for resource scheduling. If task scheduling fails (in this embodiment, the reason for failure is defined as insufficient available resources), the current task is removed from the priority queue, and the resources required by the current task are recorded in the reserved resource list. If there are still tasks with the same priority, resource scheduling continues until all tasks with that priority have been scheduled. Then, the reserved resources in the reserved resource list are subtracted from the available resource list.

[0057] When the priority queue is empty, it means that all tasks that need to be scheduled have been scheduled, the current scheduling cycle ends, and the next scheduling cycle begins; if the available resource list is empty, it means that there are no resources available for scheduling in the current scheduling cycle, the current scheduling cycle ends, and the next scheduling cycle begins.

[0058] Optionally, at the start of the scheduling period, it can be determined in advance whether the list of available resources or the priority queue is empty. If so, the current scheduling period can be ended directly.

[0059] With this arrangement, resources are scheduled according to priority within a scheduling cycle. For tasks of the same priority, if the first task fails to be scheduled, its resources are reserved in the reserved resource list, and scheduling continues with the second task of the same priority. If the second task also fails, the resources it requires are reserved in the reserved resource list, and scheduling continues with the third task of the same priority. If the third task is scheduled successfully, the resources allocated to it are deducted from the available resource list. Once all tasks of the current priority have been scheduled, the resources reserved in the reserved resource list are deducted from the updated available resource list. In this way, the failure of an earlier task does not affect the scheduling of later tasks of the same priority. After a task fails to be scheduled, resources in the available resource list are held back through the reserved resource list so that high-priority tasks are served first. Even if a high-priority task is not scheduled within a scheduling cycle, a low-priority task cannot occupy the resources it requires. As occupied resources are released, they can accumulate until they meet the needs of the high-priority task, which is then executed first.
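
Steps S1 to S5 can be sketched as a single loop. The sketch below is a simplified illustration under stated assumptions, not the claimed implementation: the cluster is modelled as one pooled resource dictionary rather than per-node entries, priorities are integers (lower is higher), subtraction of reservations is floored at zero, and all names (`run_cycle`, `H1`, `H2`, `L1`) are hypothetical.

```python
def run_cycle(queue, available):
    """Run one scheduling cycle.

    queue: list of (name, priority, demand), already sorted; lower number = higher priority.
    available: dict mapping resource type to free amount (one pooled list, an assumption).
    """
    queue = list(queue)
    available = dict(available)
    reserved = {}                      # S2: the reserved resource list starts empty
    scheduled, failed = [], []
    # S5: stop when the priority queue or the available resources run out
    while queue and any(v > 0 for v in available.values()):
        name, prio, demand = queue.pop(0)           # S3: highest priority first
        if all(available.get(r, 0) >= n for r, n in demand.items()):
            for r, n in demand.items():
                available[r] = available.get(r, 0) - n
            scheduled.append(name)
        else:                                       # S4: record the demand, move on
            failed.append(name)
            for r, n in demand.items():
                reserved[r] = reserved.get(r, 0) + n
        if (not queue or queue[0][1] != prio) and reserved:
            # Last task of this priority: subtract the reservations (floored at zero)
            for r, n in reserved.items():
                available[r] = max(0, available.get(r, 0) - n)
            reserved = {}
    leftover = [name for name, _, _ in queue]
    return scheduled, failed, leftover, available


result = run_cycle(
    [("H1", 0, {"gpu": 8}), ("H2", 0, {"gpu": 2}), ("L1", 1, {"gpu": 1})],
    {"gpu": 4},
)
print(result)  # (['H2'], ['H1'], ['L1'], {'gpu': 0})
```

In this run, the failure of H1 does not block H2, which has the same priority, while the reservation made for H1 keeps the lower-priority L1 from taking the remaining GPUs.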

[0060] Referring to Figure 2, optionally, in another embodiment, S4 includes:

[0061] S41: If scheduling fails, remove the current task from the priority queue, reserve available resources in the reserved resource list, and continue resource scheduling;

[0062] S42: If the current task is the last task of the corresponding priority, then update the available resource list according to the reserved resource list.

[0063] Specifically, in this embodiment, when prioritizing tasks, the number of tasks of each priority within the current scheduling period is recorded. When scheduling resources, the number of tasks scheduled for the current priority is counted. If it is the last task of the current priority, it means that the tasks that need to be scheduled for the current priority within the current scheduling period have been scheduled, and the available resource list can be updated according to the reserved resource list. In other embodiments of this application, the methods for calculating that the current task is the last task of its priority in the current scheduling period may include: labeling the last task of each priority after prioritizing, or searching the current priority queue after scheduling the current task to determine whether there are any other tasks with the same priority, etc., which will not be elaborated here.

[0064] The implementation of this application sets the operation of updating the available resource list based on the reserved resource list to be triggered by the last task, simplifying the available resource list update logic. During the update, it is only necessary to determine whether the current task is the last task corresponding to the current priority. In this implementation, after updating the available resource list based on the reserved resource list, the reserved resource list is initialized.

[0065] In this implementation, the available resource list is updated according to the reserved resource list only after all tasks of a given priority have been scheduled, and only if some task at that priority failed to be scheduled. If every task of a given priority is scheduled successfully, no update based on the reserved resource list is needed once that priority has been processed, which saves processing time.
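
The counting approach for recognizing the last task of a priority can be sketched as follows. This is only an illustration; the function name `last_task_flags` and the priority labels are hypothetical.

```python
from collections import Counter


def last_task_flags(priorities):
    """Given task priorities in queue order, yield (priority, is_last_of_priority)."""
    remaining = Counter(priorities)   # number of tasks per priority, recorded at sort time
    for prio in priorities:
        remaining[prio] -= 1          # one more task of this priority has been processed
        yield prio, remaining[prio] == 0


flags = list(last_task_flags(["production", "production", "regular"]))
print(flags)  # [('production', False), ('production', True), ('regular', True)]
```

The update of the available resource list would then be triggered only where the flag is true and the reserved resource list is not empty.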

[0066] Referring to Figure 3, optionally, in another embodiment, step S1 is replaced by S1': sorting tasks of the same priority according to their creation time, and sorting tasks of different priorities according to their priority level, to generate a priority queue.

[0067] Sorting tasks of the same priority by creation time simplifies the sorting logic.

[0068] Referring to Figure 4, optionally, in another embodiment, the priority queue is divided into multiple scheduling sub-queues according to the priority level, and the reserved resource list is divided into reserved sub-lists corresponding one-to-one with the scheduling sub-queues according to the priority level.

[0069] S42 is replaced by S42': If the current task is the last task in the corresponding scheduling sub-queue, then all resources in the reserved sub-list corresponding to the current scheduling sub-queue are subtracted from the available resource list, and the available resource list is updated.

[0070] In this embodiment, the priority queue is divided into multiple levels based on the number of priority levels. Each level in the priority queue is a scheduling sub-queue corresponding to that priority level, and tasks within the scheduling sub-queue are then sorted. In this embodiment, the hierarchical division of the priority queue is implemented using a table. Other implementation methods may be used in other embodiments of this application, which will not be elaborated here. Hierarchical division of the priority queue facilitates the differentiation of tasks with different priorities. The priority queue is formed by combining multiple scheduling sub-queues according to their priorities.

[0071] The reserved resource list is divided into different reserved sublists according to the priority level. The level division of the reserved sublists corresponds one-to-one with the scheduling subqueues. For example, the scheduling subqueues include: production priority scheduling subqueue, automation task priority scheduling subqueue, etc., and the reserved sublists include: production priority reserved sublist, automation task priority reserved sublist.

[0072] Assuming that task 5 fails to be scheduled in the production priority scheduling subqueue, the available resources in the available resource list are reserved in the production priority reserved sublist. After all tasks in the production priority scheduling subqueue have been scheduled, the available resource list is updated according to the production priority reserved sublist. This eliminates the need to initialize the reserved resource list multiple times in a single scheduling cycle, and the reserved resource list can also be used to determine which type of resource is more needed for different priorities.
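
One way to picture the per-priority sub-queues and reserved sublists is a pair of dictionaries keyed by priority. The sketch below assumes that reservations are recorded as whole node names and that "subtracting" a reserved node means zeroing its entry; the function names, node names, and amounts are hypothetical.

```python
PRIORITY_ORDER = ["production", "automation", "regular", "experimental"]


def split_into_subqueues(tasks):
    """tasks: iterable of (name, priority). Returns one scheduling sub-queue per priority."""
    subqueues = {p: [] for p in PRIORITY_ORDER}
    for name, prio in tasks:
        subqueues[prio].append(name)
    return subqueues


def settle_priority(priority, reserved_sublists, available):
    """Clear every node reserved at this priority from the available resource list."""
    for node in reserved_sublists[priority]:
        available[node] = {res: 0 for res in available[node]}
    return available


subqueues = split_into_subqueues([("task 5", "production"), ("task 8", "regular")])
reserved_sublists = {p: [] for p in PRIORITY_ORDER}   # initialized once per cycle
available = {"node1": {"GPU": 1, "CPU": 20}, "node2": {"GPU": 0, "CPU": 30}}

# Task 5 fails in the production sub-queue, so node1 is reserved for production.
reserved_sublists["production"].append("node1")
# Task 5 is the last production task, so the production sublist is applied.
settle_priority("production", reserved_sublists, available)
print(available)  # {'node1': {'GPU': 0, 'CPU': 0}, 'node2': {'GPU': 0, 'CPU': 30}}
```

Because each priority has its own sublist, nothing has to be reset between priorities, and the sublists remain available for inspection at the end of the cycle.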

[0073] Referring to Figure 5, optionally, in another embodiment, step S41 includes:

[0074] S411: If scheduling fails, remove the current task from the priority queue;

[0075] S412: Identify the resource types required by the current task, match the required resource types with the resource types in the available resource list, and reserve the resources in the reserved resource list that successfully match the types in the available resource list.

[0076] Specifically, in this embodiment, the available resource list contains different types of resources for different nodes, and different tasks require different resources.

[0077] After a task fails to be scheduled, the resource types required by the current task are matched against the current available resource list, and all available resources of matching type are written to the reserved resource list. Resources that do not match are not added to the reserved resource list, which avoids reserving useless resources and further improves the utilization of cluster resources.

[0078] For example, if task 6 fails to schedule, and the resource types required by task 6 are GPU, CPU, and MEM, and in the current list of available resources, node 1 includes GPU and CPU, and node 2 includes GPU, CPU, and MEM, then the resources of node 2 are reserved, and the resources of node 1 are not reserved, because node 1 cannot provide the resources required by the current task and is therefore a useless resource for the current task.

[0079] For example, if task 7 fails to schedule, and the resource types required for task 7 are GPU and CPU, in the current list of available resources, node 1 includes GPU and CPU, and node 2 includes GPU and MEM, then the resources of node 1 are reserved, but the resources of node 2 are not reserved, because node 2 cannot provide the resources required by the current task and is therefore a useless resource for the current task.

[0080] With this embodiment, fewer useless resources are reserved, and resources that are useless to the task that failed to be scheduled remain available for scheduling subsequent tasks.
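
The type matching of S412 can be sketched as a subset test over each node's resource types. This is a minimal illustration; the function name and the resource amounts are hypothetical, and a type is treated as present on a node only if its amount is greater than zero.

```python
def nodes_to_reserve(required_types, available):
    """Return the nodes that offer every resource type the failed task requires."""
    required = set(required_types)
    return [
        node
        for node, resources in available.items()
        if required <= {r for r, amount in resources.items() if amount > 0}
    ]


# Task 6 requires GPU, CPU and MEM: only node 2 offers all three types.
available_6 = {
    "node 1": {"GPU": 1, "CPU": 8},
    "node 2": {"GPU": 1, "CPU": 8, "MEM": 32},
}
print(nodes_to_reserve(["GPU", "CPU", "MEM"], available_6))  # ['node 2']

# Task 7 requires GPU and CPU: only node 1 offers both types.
available_7 = {
    "node 1": {"GPU": 1, "CPU": 8},
    "node 2": {"GPU": 1, "MEM": 32},
}
print(nodes_to_reserve(["GPU", "CPU"], available_7))  # ['node 1']
```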

[0081] Referring to Figure 6, the method further includes S7: if the list of available resources is not empty and the priority queue is not empty, then resource scheduling continues.

[0082] Specifically, even if some high-priority tasks fail to be scheduled, scheduling continues as long as neither the available resource list nor the priority queue is empty. This allows resources that were useless to the high-priority tasks that failed to be scheduled to be used for subsequent scheduling, thereby improving the utilization of cluster resources.

[0083] Optionally, in another embodiment, a backoff queue is preset, which holds tasks that fail to be scheduled in the current scheduling period and / or tasks added after the start of the current scheduling period.

[0084] After the current scheduling cycle begins, if new tasks are added, a backoff queue is used to place the newly added tasks as well as the tasks that failed to be scheduled in the current scheduling cycle, so as to facilitate task storage.

[0085] Referring to Figure 6, furthermore, the tasks placed in the backoff queue are sorted, and S6 is included after S5.

[0086] S6: Replace the priority queue with the backoff queue of the current scheduling cycle and start the next scheduling cycle.

[0087] Specifically, in this embodiment, tasks for the next scheduling period are sorted within the current scheduling period, so that tasks do not need to be sorted again at the beginning of the next scheduling period, and the next scheduling period can be started quickly.

[0088] Furthermore, the backoff queue is also divided into multiple levels of backoff sub-queues according to priority. The backoff sub-queues correspond one-to-one with the scheduling sub-queues. During the current scheduling cycle, tasks that fail to be scheduled in the scheduling sub-queue are directly placed in the corresponding backoff sub-queue.
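
The interplay between the priority queue and the backoff queue can be sketched as below. The class and method names (`QueuePair`, `defer`, `end_cycle`) and the priority assigned to task E are hypothetical; the sketch keeps the backoff queue sorted on insertion so that S6 is a plain swap.

```python
PRIORITY_RANK = {"production": 0, "automation": 1, "regular": 2, "experimental": 3}


class QueuePair:
    """Holds the priority queue of the current cycle and the backoff queue."""

    def __init__(self, tasks):
        self.priority_queue = sorted(tasks, key=self._key)
        self.backoff_queue = []

    @staticmethod
    def _key(task):
        _name, priority = task
        return PRIORITY_RANK[priority]

    def defer(self, task):
        """Place a failed or late-arriving task in the backoff queue, kept sorted."""
        self.backoff_queue.append(task)
        self.backoff_queue.sort(key=self._key)  # stable sort: ties keep insertion order

    def end_cycle(self):
        """S6: the backoff queue becomes the priority queue of the next cycle."""
        self.priority_queue, self.backoff_queue = self.backoff_queue, []


queues = QueuePair([("A", "production"), ("B", "automation")])
queues.defer(("E", "experimental"))   # added after the cycle started
queues.defer(("B", "automation"))     # failed to be scheduled in this cycle
queues.end_cycle()
print(queues.priority_queue)  # [('B', 'automation'), ('E', 'experimental')]
```

Since the backoff queue is already ordered when the cycle ends, the next cycle can start without sorting again.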

[0089] A specific example of this embodiment is as follows:

[0090] When an engineer submits a task to the cluster, the task manager sends a resource request to the scheduler, which then places the request in a priority queue based on its priority.

[0091] Assume the priority queue and backoff queue are as follows:

[0092]

[0093]

[0094] The order is: A>B>C>D.

[0095] The current list of available resources is as follows:

[0096] List of available resources:
Node 1 (GPU: 1, CPU: 20, MEM: 80)
Node 2 (GPU: 1, CPU: 40, MEM: 100)
Node 3 (GPU: 0, CPU: 30, MEM: 60)
Node 4 (GPU: 8, CPU: 160, MEM: 320)

[0097] Initialized list of reserved resources:

[0098] Reserved resource list:
Production priority: []
Automated task priority: []
Regular priority: []
Experiment priority: []

[0099] If a new task E is added at this time, the priority queue and backoff queue will be as follows:

[0100]

[0101] 1. Task A is retrieved from the priority queue. Task A is scheduled successfully and is assigned to node 4. The available resources on node 4 are reduced by the resources used by task A, and task A is removed from the priority queue. The updated list of available resources is as follows:

[0102] List of available resources:
Node 1 (GPU: 1, CPU: 20, MEM: 80)
Node 2 (GPU: 1, CPU: 40, MEM: 100)
Node 3 (GPU: 0, CPU: 30, MEM: 60)
Node 4 (GPU: 0, CPU: 0, MEM: 0)

[0103] 2. Task B is then retrieved from the priority queue. Since no node meets the GPU resource requirements of task B, scheduling fails and task B enters the backoff queue. Nodes 1 and 2 are recorded in the automated task priority reserved sub-list. Node 3 is not reserved, because it has no GPU resources and task B could therefore never be assigned to it. Task B is the last automated task priority task in the priority queue, and the reserved sub-list for automated tasks is not empty. Therefore, based on the reserved resource list, the available resources of nodes 1 and 2 are cleared from the available resource list. At this point:

[0104] The priority queue and backoff queue are as follows:

[0105]

[0106] The list of available resources and the list of reserved resources are as follows:

[0107] List of available resources:
Node 1 (GPU: 0, CPU: 0, MEM: 0)
Node 2 (GPU: 0, CPU: 0, MEM: 0)
Node 3 (GPU: 0, CPU: 30, MEM: 60)
Node 4 (GPU: 0, CPU: 0, MEM: 0)
Reserved resource list:
Production priority: []
Automated task priority: [Node 1, Node 2]
Regular priority: []
Experiment priority: []

[0108] 3. Task C fails to be scheduled because no node meets its resource requirements. Task C enters the regular priority backoff sub-queue, and node 3 is added to the regular priority reserved sub-list. Task C is not the last regular priority task in the priority queue, so the available resource list is not updated yet.

[0109]

[0110]

[0111] List of available resources:
Node 1 (GPU: 0, CPU: 0, MEM: 0)
Node 2 (GPU: 0, CPU: 0, MEM: 0)
Node 3 (GPU: 0, CPU: 30, MEM: 60)
Node 4 (GPU: 0, CPU: 0, MEM: 0)
Reserved resource list:
Production priority: []
Automated task priority: [Node 1, Node 2]
Regular priority: [Node 3]
Experiment priority: []

[0112] 4. Task D is scheduled successfully and placed on node 3. Task D leaves the priority queue, and the available resources of node 3 are updated. Task D is the last regular priority task in the priority queue, so the available resources of node 3 in the available resource list are cleared according to the reserved resource list.

[0113]

[0114] List of available resources:
Node 1 (GPU: 0, CPU: 0, MEM: 0)
Node 2 (GPU: 0, CPU: 0, MEM: 0)
Node 3 (GPU: 0, CPU: 0, MEM: 0)
Node 4 (GPU: 0, CPU: 0, MEM: 0)
Reserved resource list:
Production priority: []
Automated task priority: [Node 1, Node 2]
Regular priority: [Node 3]
Experiment priority: []

[0115] 5. Since the priority queue is empty at this time (the available resource list is also empty), the scheduling cycle ends. All tasks in the backoff queue are returned to the priority queue, awaiting scheduling in the next scheduling cycle.

[0116]

[0117]
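
The five steps of this example can be replayed with a short simulation. The node capacities are taken from the lists above, but the resource requests of tasks A to D are not given in the text and are assumed here purely so that each step behaves as described. The function names are hypothetical, task E is omitted, and a node counts as matching a task only if it has a non-zero amount of every requested type.

```python
PRIORITY_ORDER = ["production", "automation", "regular", "experimental"]


def fits(demand, node):
    return all(node.get(r, 0) >= n for r, n in demand.items())


def replay_cycle(queue, available):
    """queue: list of (name, priority, demand), already sorted by priority."""
    queue = list(queue)
    available = {node: dict(res) for node, res in available.items()}
    reserved = {p: [] for p in PRIORITY_ORDER}
    placed, backoff = {}, []
    while queue and any(n > 0 for res in available.values() for n in res.values()):
        name, prio, demand = queue.pop(0)
        target = next((nd for nd, res in available.items() if fits(demand, res)), None)
        if target is not None:
            for r, n in demand.items():
                available[target][r] -= n
            placed[name] = target
        else:
            backoff.append(name)
            for node, res in available.items():
                # Reserve only nodes that offer every requested resource type
                if all(res.get(r, 0) > 0 for r in demand) and node not in reserved[prio]:
                    reserved[prio].append(node)
        if not queue or queue[0][1] != prio:      # last task of this priority
            for node in reserved[prio]:
                available[node] = {r: 0 for r in available[node]}
    backoff.extend(name for name, _, _ in queue)
    return placed, backoff, reserved, available


NODES = {
    "Node 1": {"GPU": 1, "CPU": 20, "MEM": 80},
    "Node 2": {"GPU": 1, "CPU": 40, "MEM": 100},
    "Node 3": {"GPU": 0, "CPU": 30, "MEM": 60},
    "Node 4": {"GPU": 8, "CPU": 160, "MEM": 320},
}
QUEUE = [  # demands are assumptions, not values from the text
    ("A", "production", {"GPU": 8, "CPU": 160, "MEM": 320}),
    ("B", "automation", {"GPU": 2, "CPU": 20, "MEM": 40}),
    ("C", "regular", {"CPU": 40, "MEM": 60}),
    ("D", "regular", {"CPU": 10, "MEM": 20}),
]
placed, backoff, reserved, available = replay_cycle(QUEUE, NODES)
print(placed)    # {'A': 'Node 4', 'D': 'Node 3'}
print(backoff)   # ['B', 'C']
print(reserved)  # automation: Node 1 and Node 2; regular: Node 3
```

Under these assumed requests, the final state matches the lists in step 4: every node is cleared, tasks B and C wait in the backoff queue, and the reserved sub-lists show which priorities were short of which nodes.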

[0118] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0119] In one embodiment of this application, a cluster task scheduling system is provided, which corresponds one-to-one with the cluster task scheduling methods in the above embodiments. The cluster task scheduling system includes:

[0120] The sorting module is used to prioritize received tasks and generate a priority queue.

[0121] The list update module is used to obtain the available resource list for the current scheduling cycle of the cluster and to initialize the reserved resource list.

[0122] The resource scheduling module is used to retrieve the current task from the priority queue in order from highest priority to lowest, and to perform resource scheduling on the retrieved current task according to the available resource list.

[0123] The list update module is also used to remove the current task from the priority queue if scheduling fails, and to reserve available resources in the reserved resource list before continuing resource scheduling; after all tasks with the same priority as the current task have been scheduled, the available resource list is updated according to the reserved resource list.

[0124] The control module is used to exit the current scheduling cycle when the priority queue or the list of available resources is empty.

[0125] Furthermore, the list update module is also used to update the available resource list based on the reserved resource list if the current task is the last task of the corresponding priority.

[0126] Furthermore, the priority queue is divided into multiple scheduling sub-queues according to the priority level. The list update module is also used to subtract all resources in the reserved sub-list corresponding to the current scheduling sub-queue from the available resource list if the current task is the last task in the corresponding scheduling sub-queue, and then update the available resource list.
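
A minimal Python sketch of this subtraction is given below. It assumes that "subtracting" a reserved node means setting all of its available resources to zero, which is how the example tables present a cleared node; the function name `subtract_reserved` and the dictionary keys are hypothetical.

```python
def subtract_reserved(available, reserved_sublists, priority):
    """Return a copy of `available` with the nodes reserved at `priority` zeroed."""
    blocked = set(reserved_sublists.get(priority, []))
    return {
        node: ({res: 0 for res in free} if node in blocked else dict(free))
        for node, free in available.items()
    }


# State taken from the table in paragraph [0111].
available = {
    "Node 1": {"GPU": 0, "CPU": 0, "MEM": 0},
    "Node 2": {"GPU": 0, "CPU": 0, "MEM": 0},
    "Node 3": {"GPU": 0, "CPU": 30, "MEM": 60},
    "Node 4": {"GPU": 0, "CPU": 0, "MEM": 0},
}
reserved_sublists = {
    "production": [],
    "automated": ["Node 1", "Node 2"],
    "normal": ["Node 3"],
    "experiment": [],
}

# Called when the last task of the "normal" scheduling sub-queue has been handled.
updated = subtract_reserved(available, reserved_sublists, "normal")
```

Lower-priority sub-queues then see node 3 as empty, so they cannot consume resources that a higher-priority task is waiting for.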

[0127] Furthermore, a backoff queue is preset, which holds tasks that fail to be scheduled in the current scheduling period and / or tasks added after the start of the current scheduling period.

[0128] Furthermore, the tasks placed in the backoff queue are sorted; the control module is also used to replace the priority queue with the backoff queue of the current scheduling cycle and start the next scheduling cycle.
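
The hand-over between the two queues could look like the following Python sketch. The class `QueuePair` and its method names are hypothetical, tasks are modelled as `(priority, created, name)` tuples with a larger priority number assumed to be more urgent, and tasks that were scheduled successfully are assumed to have already left the priority queue.

```python
class QueuePair:
    """Priority queue for the current cycle plus a backoff queue for the next."""

    def __init__(self, tasks=()):
        self.priority_queue = list(tasks)
        self.backoff_queue = []

    def mark_failed(self, task):
        # A task that could not be scheduled moves to the backoff queue.
        self.priority_queue.remove(task)
        self.backoff_queue.append(task)

    def add_late(self, task):
        # A task submitted after the cycle started waits in the backoff queue.
        self.backoff_queue.append(task)

    def start_next_cycle(self):
        # Sort the backoff queue, then let it replace the priority queue.
        self.backoff_queue.sort(key=lambda t: (-t[0], t[1]))
        self.priority_queue, self.backoff_queue = self.backoff_queue, []


queues = QueuePair([(2, 1, "A"), (1, 2, "B")])
queues.priority_queue.remove((2, 1, "A"))  # A was scheduled successfully
queues.mark_failed((1, 2, "B"))            # B failed in this cycle
queues.add_late((3, 5, "E"))               # E arrived mid-cycle
queues.start_next_cycle()
```

After the swap, the late but higher-priority task E is ahead of the previously failed task B in the new priority queue.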

[0129] Furthermore, the sorting module is also used to sort tasks of the same priority according to their creation time, and to sort tasks of different priorities according to their priority level, thereby generating a priority queue.
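
This two-level ordering maps onto a composite sort key. The Python sketch below is illustrative only: the ranking in `PRIORITY_RANK` assumes the order in which the priority levels appear in the example tables (production first, experiment last), and tasks with an earlier creation time are assumed to come first within a level.

```python
from datetime import datetime

# Assumed ranking: a lower number is scheduled earlier.
PRIORITY_RANK = {"production": 0, "automated": 1, "normal": 2, "experiment": 3}


def build_priority_queue(tasks):
    """Sort by priority level first, then by creation time within a level."""
    return sorted(tasks, key=lambda t: (PRIORITY_RANK[t["priority"]], t["created"]))


tasks = [
    {"name": "B", "priority": "normal", "created": datetime(2022, 3, 1, 9, 5)},
    {"name": "A", "priority": "automated", "created": datetime(2022, 3, 1, 9, 10)},
    {"name": "C", "priority": "normal", "created": datetime(2022, 3, 1, 9, 0)},
]
queue = build_priority_queue(tasks)
```

Here task A comes first because of its priority level, and task C precedes task B because it was created earlier at the same level.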

[0130] Furthermore, the list update module is also used to identify the resource types required by the current task, match the required resource types with the resource types in the available resource list, and reserve the resources in the reserved resource list that successfully match the types in the available resource list.
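
One possible reading of this type matching is sketched below in Python. The rule used here is an assumption: a node matches when it still has a non-zero amount of every resource type the failed task requests. The function name `nodes_to_reserve` and the task demands are hypothetical; the node figures come from the table in paragraph [0111].

```python
def nodes_to_reserve(demand, available):
    """Return the nodes whose free resource types cover the task's required types."""
    needed = {res for res, amt in demand.items() if amt > 0}
    return [
        node
        for node, free in available.items()
        if all(free.get(res, 0) > 0 for res in needed)
    ]


available = {
    "Node 1": {"GPU": 0, "CPU": 0, "MEM": 0},
    "Node 2": {"GPU": 0, "CPU": 0, "MEM": 0},
    "Node 3": {"GPU": 0, "CPU": 30, "MEM": 60},
    "Node 4": {"GPU": 0, "CPU": 0, "MEM": 0},
}

# A CPU/MEM task that is too large for node 3 still reserves it.
cpu_task_nodes = nodes_to_reserve({"CPU": 40, "MEM": 20}, available)
# A GPU task finds no node with free GPU, so nothing is reserved.
gpu_task_nodes = nodes_to_reserve({"GPU": 1}, available)
```

Reserving only type-compatible nodes keeps a failed GPU task from locking up nodes that could never serve it.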

[0131] Furthermore, the control module is also used to continue resource scheduling if the list of available resources is not empty and the priority queue is not empty.

[0132] The modules of the aforementioned cluster task scheduling system can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in the processor of a computer device in hardware form or independent of it, or they can be stored in the memory of the computer device in software form, so that the processor can call and execute the corresponding operations of each module.

[0133] In one embodiment of this application, a computer device is provided, which may be a server. The computer device includes a processor, memory, and a network interface connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device can be implemented using any type of volatile or non-volatile storage device or a combination thereof. Volatile or non-volatile storage devices include, but are not limited to: magnetic disks, optical disks, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM (Erasable Programmable Read Only Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), magnetic storage, flash memory, and PROM (Programmable Read-Only Memory). The memory of the computer device provides an environment for the operation of the operating system and computer programs stored within it. The network interface of the computer device is used for communication with external terminals via a network connection. When the computer program is executed by the processor, it implements the cluster task scheduling method steps described in the above embodiments.

[0134] In one embodiment of this application, a computer-readable storage medium is provided, which stores a computer program. When executed by a processor, the computer program implements the cluster task scheduling method steps described in the above embodiment. The computer-readable storage medium includes ROM (Read-Only Memory), RAM (Random-Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic disk, floppy disk, etc.

[0135] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the system described in this application can be divided into different functional units or modules to complete all or part of the functions described above.

Claims

1. A cluster task scheduling method, characterized in that the method includes:
prioritizing the received tasks and generating a priority queue;
presetting a backoff queue, which is used to place tasks that fail to be scheduled in the current scheduling period and/or tasks added after the start of the current scheduling period;
obtaining the list of available resources for the current scheduling period of the cluster, and initializing the reserved resource list to be empty;
retrieving the current task from the priority queue in order of highest priority, and performing resource scheduling on the retrieved current task according to the list of available resources;
if scheduling fails, removing the current task from the priority queue and moving it to the backoff queue, identifying the resource type required by the current task, matching the required resource type with each resource type in the available resource list, reserving in the reserved resource list the resources that successfully match the type in the available resource list, and continuing resource scheduling;
after all tasks with the same priority as the current task have been scheduled, if the current task is the last task of its corresponding priority, updating the available resource list according to the reserved resource list;
when the priority queue or the available resource list is empty, terminating the current scheduling cycle, replacing the priority queue with the backoff queue of the current scheduling cycle, and starting the next scheduling cycle.

2. The cluster task scheduling method according to claim 1, characterized in that the priority queue is divided into multiple scheduling sub-queues according to priority level, and the reserved resource list is divided into reserved sub-lists corresponding one-to-one with the scheduling sub-queues according to priority level; updating the available resource list according to the reserved resource list if the current task is the last task of its corresponding priority includes: if the current task is the last task in the corresponding scheduling sub-queue, subtracting all resources in the reserved sub-list corresponding to the current scheduling sub-queue from the available resource list, thereby updating the available resource list.

3. The cluster task scheduling method according to claim 1, characterized in that the step of prioritizing the received tasks and generating a priority queue includes: sorting tasks with the same priority by task creation time, and sorting tasks with different priorities by priority from highest to lowest, thereby generating the priority queue.

4. The cluster task scheduling method according to claim 1, characterized in that the method further includes: if the list of available resources is not empty and the priority queue is not empty, continuing resource scheduling.

5. A cluster task scheduling system, characterized in that the system is used to perform the cluster task scheduling method as described in claim 1 above, the system comprising:
a sorting module, used to prioritize the received tasks and generate a priority queue, and also used to preset a backoff queue, which is used to place tasks that failed to be scheduled in the current scheduling period and/or tasks added after the start of the current scheduling period;
a list update module, used to obtain the list of available resources in the current scheduling cycle of the cluster and initialize the reserved resource list to be empty;
a resource scheduling module, used to retrieve the current task from the priority queue in order of highest priority, and to perform resource scheduling on the retrieved current task according to the available resource list;
the list update module being also used to remove the current task from the priority queue if scheduling fails, identify the resource type required by the current task, match the required resource type with each resource type in the available resource list, reserve in the reserved resource list the resources that match the type in the available resource list, and continue resource scheduling; and, after all tasks with the same priority as the current task have been scheduled, if the current task is the last task of the corresponding priority, to update the available resource list according to the reserved resource list;
a control module, used to exit the current scheduling cycle when the priority queue or the list of available resources is empty, and to replace the priority queue with the backoff queue of the current scheduling cycle to start the next scheduling cycle.

6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the cluster task scheduling method according to any one of claims 1-4.