Task scheduling methods, schedulers, systems, devices, media, and program products

CN122309093A · Pending · Publication Date: 2026-06-30 · MOORE THREADS TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: MOORE THREADS TECH CO LTD
Filing Date: 2026-06-02
Publication Date: 2026-06-30

Application Information

Patent Timeline

02 Jun 2026: Application

30 Jun 2026: Publication (CN122309093A)

IPC: G06F9/48; G06F9/50

AI Tagging

Technology Topics

Computer architecture; Asynchronous communication


AI Technical Summary

Technical Problem

In existing technologies, communication overhead has become a key bottleneck in the efficiency of distributed task execution, and existing communication masking strategies are strongly bound to model structures, lacking flexibility and versatility.

Method used

When an asynchronous communication task is initiated by the first hardware unit of the computing device, the computing tasks associated with that task are suspended, and the independent computing tasks are scheduled to be executed by the second hardware unit. The parallel execution of tasks and the switching of control are achieved by using a coroutine mechanism, and the computing tasks are resumed by using task handles and event notification mechanisms.

Benefits of technology

It decouples computational and communication tasks, enables parallel execution, hides communication latency, improves hardware resource utilization and task execution efficiency, and is suitable for various application scenarios, especially significantly improving training efficiency and hardware utilization in model training tasks.

✦ Generated by Eureka AI based on patent content.


Figure CN122309093A_ABST


Abstract

This disclosure provides a task scheduling method, scheduler, system, device, medium, and program product, relating to the field of artificial intelligence technology. The task scheduling method includes: in response to receiving a communication request, controlling a first hardware unit of a computing device to initiate an asynchronous communication task corresponding to the communication request; suspending a first computing task currently being executed on a second hardware unit of the computing device, and scheduling a second computing task independent of the asynchronous communication task to be executed on the second hardware unit; wherein the first computing task is associated with the asynchronous communication task; and in response to the asynchronous communication task completing execution on the first hardware unit, resuming execution of the suspended first computing task on the second hardware unit. This method improves hardware resource utilization and task execution efficiency.


Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence technology, specifically to a task scheduling method, scheduler, system, device, medium, and program product.

Background Technology

[0002] In distributed tasks, communication overhead is often a key bottleneck restricting the overall task execution efficiency. To improve hardware utilization, the industry commonly adopts communication and computation overlap technology, also known as compute masking.

Summary of the Invention

[0003] This disclosure provides a task scheduling method, scheduler, system, device, medium, and program product.

[0004] In a first aspect, embodiments of this disclosure propose a task scheduling method, comprising: in response to receiving a communication request, controlling a first hardware unit of a computing device to initiate an asynchronous communication task corresponding to the communication request; suspending a first computing task currently being executed on a second hardware unit of the computing device, and scheduling a second computing task independent of the asynchronous communication task to be executed on the second hardware unit; wherein the first computing task is associated with the asynchronous communication task; and in response to the asynchronous communication task completing execution on the first hardware unit, resuming execution of the suspended first computing task on the second hardware unit.

[0005] In some embodiments, a first computing task is pre-encapsulated as a first computing coroutine task, and a second computing task is pre-encapsulated as a second computing coroutine task. Suspending the first computing task currently being executed on the second hardware unit of the computing device and scheduling the second computing task, which is independent of the asynchronous communication task, to be executed on the second hardware unit includes: suspending the first computing task by triggering the first computing coroutine task currently being executed on the second hardware unit of the computing device to release execution control; and scheduling the second computing task to be executed on the second hardware unit by switching execution control to the second computing coroutine task, which is independent of the asynchronous communication task.

[0006] In some embodiments, in response to the completion of an asynchronous communication task on a first hardware unit, resuming execution of a suspended first computing task on a second hardware unit includes: when the asynchronous communication task is completed on the first hardware unit, sending a completion signal of the asynchronous communication task to the second hardware unit through the task handle of the asynchronous communication task, and triggering the second hardware unit to resume execution of the suspended first computing task based on the completion signal, wherein the task handle is associated with the first computing task.

[0007] In some embodiments, the first computing task is executed on multiple computing devices; in response to receiving a communication request, controlling a first hardware unit of the computing device to initiate an asynchronous communication task corresponding to the communication request includes: in response to receiving a communication request, controlling the first hardware units of the multiple computing devices to respectively initiate asynchronous communication tasks corresponding to the communication request; suspending the first computing task being executed on a second hardware unit of the computing device includes: in response to receiving a first feedback message from the multiple computing devices indicating that communication has been initiated, broadcasting a wait event to the multiple computing devices to put the first computing task in a waiting state; wherein the wait event is used to trigger the second hardware units of the multiple computing devices to suspend the first computing task being executed.

[0008] In some embodiments, in response to the completion of the asynchronous communication task on the first hardware unit, resuming the execution of the suspended first computing task on the second hardware unit includes: in response to receiving second feedback information from multiple computing devices indicating that communication has been completed, broadcasting a ready event to the multiple computing devices to put the first computing task into a ready state; wherein the ready event is used to trigger the second hardware unit to resume the execution of the suspended first computing task.
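The multi-device coordination in the two paragraphs above can be sketched as follows. This is only an illustration: the `Coordinator` class, its method names, and the choice to broadcast only after every device has reported are assumptions, not details given in the disclosure.

```python
class Coordinator:
    """Collects per-device feedback and broadcasts wait/ready events (hypothetical)."""

    def __init__(self, device_ids):
        self.devices = set(device_ids)
        self.started = set()       # devices that reported "communication initiated"
        self.finished = set()      # devices that reported "communication completed"
        self.events = []           # broadcast log: (event name, target devices)

    def _broadcast(self, event):
        self.events.append((event, sorted(self.devices)))

    def on_started(self, device_id):
        """First feedback message: a device has initiated its communication."""
        if device_id in self.started:
            return                              # ignore duplicate reports
        self.started.add(device_id)
        if self.started == self.devices:
            self._broadcast("wait")             # suspend the first computing task

    def on_finished(self, device_id):
        """Second feedback message: a device has completed its communication."""
        if device_id in self.finished:
            return
        self.finished.add(device_id)
        if self.finished == self.devices:
            self._broadcast("ready")            # resume the first computing task
```

In this sketch the wait event is emitted once all devices have started communicating, and the ready event once all have finished, so every device suspends and resumes the first computing task at the same logical point.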

[0009] In some embodiments, after suspending the first computing task being executed on the second hardware unit, the method further includes: adding the first computing task to a waiting task queue according to the task information of the first computing task; and resuming the execution of the suspended first computing task on the second hardware unit in response to the completion of the asynchronous communication task on the first hardware unit, including: adding the first computing task in the waiting task queue to a ready task queue in response to the completion of the asynchronous communication task on the first hardware unit, and resuming the execution of the suspended first computing task on the second hardware unit based on the ready task queue.
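A minimal sketch of the waiting and ready task queues described above, assuming tasks are identified by name and the waiting queue is keyed by a communication identifier. The class and method names are hypothetical.

```python
from collections import deque


class QueueScheduler:
    """Tracks suspended tasks in a waiting queue and runnable tasks in a ready queue."""

    def __init__(self):
        self.waiting = {}          # comm_id -> task suspended on that communication
        self.ready = deque()       # tasks that may run on the compute unit

    def suspend(self, task, comm_id):
        """Record that `task` is waiting for communication `comm_id`."""
        self.waiting[comm_id] = task

    def on_comm_complete(self, comm_id):
        """Move the task waiting on `comm_id` from the waiting to the ready queue."""
        task = self.waiting.pop(comm_id, None)
        if task is not None:
            self.ready.append(task)

    def next_task(self):
        """Return the next ready task, or None if nothing is runnable."""
        return self.ready.popleft() if self.ready else None
```

Keying the waiting queue by communication identifier is a design choice made here so that a completion notice can find its dependent task in constant time; the disclosure only requires that the task be added according to its task information.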

[0010] In some embodiments, the method is applied to a model training task, which is divided into multiple batch tasks according to data batches, each batch task including a computation task and an asynchronous communication task; wherein, the first computation task is the computation task in the current batch task, and the second computation task is the computation task in the next batch task of the current batch task.

[0011] In a second aspect, embodiments of this disclosure provide a task scheduler, the scheduler comprising: a task initiation module, configured to, in response to receiving a communication request, control a first hardware unit of a computing device to initiate an asynchronous communication task corresponding to the communication request; a scheduling module, configured to suspend a first computing task currently being executed on a second hardware unit of the computing device, and schedule a second computing task independent of the asynchronous communication task to be executed on the second hardware unit; wherein the first computing task is associated with the asynchronous communication task; and a recovery module, configured to, in response to the completion of the asynchronous communication task on the first hardware unit, resume the execution of the suspended first computing task on the second hardware unit.

[0012] In a third aspect, embodiments of this disclosure provide a task scheduling system, the system including at least one computing device, the computing device including: a first hardware unit for executing asynchronous communication tasks; a second hardware unit for executing computing tasks, wherein the computing tasks include a first computing task associated with the asynchronous communication tasks and a second computing task independent of the asynchronous communication tasks; and a scheduler for executing the method described in any implementation of the first aspect, to schedule the execution of asynchronous communication tasks and computing tasks on the first hardware unit and the second hardware unit.

[0013] In a fourth aspect, embodiments of this disclosure provide an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to implement the task scheduling method described in any implementation of the first aspect.

[0014] In a fifth aspect, embodiments of this disclosure provide a non-transitory computer-readable storage medium storing computer instructions that enable a computer to implement the task scheduling method described in any implementation of the first aspect.

[0015] In a sixth aspect, embodiments of this disclosure provide a computer program product, including a computer program that, when executed by a processor, implements the steps of the task scheduling method as described in any implementation of the first aspect.

[0016] The task scheduling method provided in this disclosure, upon receiving a communication request, initiates and executes an asynchronous communication task on a first hardware unit of a computing device, and suspends a first computing task associated with the asynchronous communication task. Furthermore, while initiating the asynchronous communication task, the computing device does not enter a blocking waiting state, but instead schedules a second computing task that has no data dependency on the asynchronous communication task and is independent of it to a second hardware unit for continued execution. This fully utilizes the computing resources of the second hardware unit during the execution of the asynchronous communication task, achieving parallel execution of the computing and communication tasks, hiding communication latency, and improving hardware resource utilization and task execution efficiency. The task scheduling method provided in this disclosure decouples computing and communication tasks, making it applicable to various application scenarios and improving versatility and flexibility.

Attached Figure Description

[0017] Other features, objects, and advantages of this disclosure will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

Figure 1 is a flowchart of a task scheduling method provided in an embodiment of this disclosure;

Figure 2 is a structural block diagram of a task scheduler provided in an embodiment of this disclosure;

Figure 3 is a structural block diagram of a task scheduling system provided in an embodiment of this disclosure;

Figure 4 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of this disclosure.

Detailed Implementation

[0018] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description. It should be noted that, unless otherwise specified, the embodiments and features described in this disclosure can be combined with each other.

[0019] For example, for distributed tasks that include asynchronous communication and computation tasks, communication operations can be tightly embedded in a specific model computation graph to achieve general computation masking. However, this approach results in deep coupling between communication operations and algorithmic logic.

[0020] Tensor Parallelism (TP) operator-level overlap schemes: Represented by Megatron-LM (a large-model training framework), this approach carefully designs how model operators are split (for example, column parallelism and row parallelism) and the order in which they execute, so that AllReduce communication is interleaved with subsequent matrix multiplication computations on the hardware pipeline. Furthermore, communication is hidden by merging communication and computation kernels into a larger unified kernel. The drawback of these schemes is that the optimization strategies are strongly bound to specific model structures (such as Transformer layers) and operator implementations, resulting in poor generalization; any modification to the model architecture may disrupt the overlap effect.

[0021] The pipeline orchestration scheme in Expert Parallelism (EP) specifically includes: for all-to-all communication in Mixture of Experts (MoE) models, further dividing the model computation blocks into finer data blocks and rearranging the execution order of computation and communication operators to form a cyclic pipeline. The disadvantages of this scheme are: it requires intrusive reconstruction of the computation graph, introduces complex pipeline bubbles and management overhead, and lacks flexibility in strategy adjustment.

[0022] Schedulable communication schemes in Context Parallelism (CP) include ring attention, which supports training with very long contexts by overlapping attention computation with ring communication for long sequences. The drawback of this scheme is that it is designed for one particular sequence-parallelism mode and is difficult to transfer to other communication modes (such as TP, EP, etc.) or other computation modes.

[0023] Therefore, it can be seen that the communication masking strategy, which tightly embeds communication operations into a specific model computation graph to achieve general computation masking, is strongly bound to the model structure, parallel strategy and algorithm implementation, and lacks a general, flexible resource scheduling layer that is decoupled from the computation logic.

[0024] Based on this, this disclosure proposes a task scheduling method.

[0025] Figure 1 is a flowchart illustrating a task scheduling method provided in an embodiment of the present disclosure. The method can be applied to a scheduler in a computing device. The computing device is used to execute computing tasks and asynchronous communication tasks associated with the computing tasks. The computing device further includes a first hardware unit and a second hardware unit, the first hardware unit being used to execute the asynchronous communication tasks, and the second hardware unit being used to execute the computing tasks.

[0026] Specifically, the computing device is an electronic computing device with dual hardware resources for hardware computation and hardware communication, internally integrating a scheduler, a first hardware unit, and a second hardware unit. The computing device can be a server, a system-on-a-chip, a processor, etc. For example, the computing device can be a graphics processing unit (GPU), but this disclosure does not limit it to this.

[0027] As shown in Figure 1, the task scheduling method specifically includes the following steps: Step 101: In response to receiving a communication request, control a first hardware unit of the computing device to initiate an asynchronous communication task corresponding to the communication request.

[0028] Specifically, when the scheduler in the computing device receives a communication request for the first computing task, it can first determine at least one piece of communication information, including the data to be transmitted, the communication method, and the hardware node information involved in the communication, by parsing the communication request. Then, it constructs and encapsulates the corresponding asynchronous communication task based on the communication information. Subsequently, it issues a communication execution instruction to the first hardware unit of the computing device to control the first hardware unit to execute the asynchronous communication task asynchronously.

[0029] The communication tasks include blocking collective communication tasks, which can employ at least one of the following: all-reduce (AllReduce) communication, all-to-all communication, and all-gather communication. The asynchronous communication task is obtained by converting the blocking collective communication task to asynchronous execution.
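As an illustration of the parsing and encapsulation described for step 101, the sketch below turns a raw request into a task object carrying the three pieces of communication information named above. The request layout (a dictionary with `op`, `data`, and `nodes` keys) and all class and function names are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class CollectiveOp(Enum):
    """Communication methods named in the text."""
    ALL_REDUCE = "all_reduce"
    ALL_TO_ALL = "all_to_all"
    ALL_GATHER = "all_gather"


@dataclass(frozen=True)
class AsyncCommTask:
    """Encapsulated asynchronous communication task (hypothetical structure)."""
    op: CollectiveOp           # communication method
    payload: bytes             # data to be transmitted
    nodes: tuple               # hardware nodes involved in the communication


def build_comm_task(request: dict) -> AsyncCommTask:
    """Parse a raw request into an AsyncCommTask; raises on an unknown method."""
    return AsyncCommTask(
        op=CollectiveOp(request["op"]),
        payload=bytes(request["data"]),
        nodes=tuple(request["nodes"]),
    )
```

For example, `build_comm_task({"op": "all_reduce", "data": b"\x01", "nodes": [0, 1]})` yields a task whose `op` is `CollectiveOp.ALL_REDUCE`; the scheduler would then hand such an object to the first hardware unit.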

[0030] Step 102: Suspend the first computing task currently being executed on the second hardware unit of the computing device, and schedule the second computing task, which is independent of the asynchronous communication task, to be executed on the second hardware unit.

[0031] The first computation task is associated with the asynchronous communication task.

[0032] Specifically, the association between the first computation task and the asynchronous communication task means that there is a data dependency between them. The essence of this data dependency is that the output data of one task is the input data of another, causing them to be executed in a specific order. In this embodiment, the first computation task depends on the asynchronous communication task; it requires the communication result data from the asynchronous communication task as input. Therefore, the first computation task needs to be suspended and must wait for the asynchronous communication task to complete before its execution can resume.

[0033] Because the first computational task has a data dependency with the asynchronous communication task initiated this time, the first computational task cannot continue its subsequent computation process until the asynchronous communication task is completed. Therefore, after initiating the asynchronous communication task, the first computational task currently being executed on the second hardware unit of the computing device can be suspended, halting its computation on the second hardware unit. Simultaneously, the scheduler selects a second computational task that has no data dependency on the asynchronous communication task and is independent of it, scheduling the second computational task to continue execution on the second hardware unit. This allows for full utilization of the computing resources of the second hardware unit during the execution of the asynchronous communication task, achieving parallel execution of the computational and communication tasks and improving hardware resource utilization.

[0034] Step 103: In response to the completion of the asynchronous communication task on the first hardware unit, the suspended first computing task is resumed on the second hardware unit.

[0035] Specifically, when the scheduler in the computing device detects that the asynchronous communication task has been completed on the first hardware unit, it indicates that the communication data required by the first computing task associated with the asynchronous communication task has been prepared, and the first computing task meets the conditions for continued execution. At this time, the second computing task is suspended on the second hardware unit, and the previously suspended first computing task is resumed on the second hardware unit, so that the first computing task can continue to complete the subsequent computing process from the suspended position, thereby realizing the pipelined collaborative execution of computing tasks and communication tasks.
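Steps 101 to 103 can be sketched with Python's `asyncio`, under a strong simplification: a single event loop stands in for the scheduler, and `asyncio.sleep` stands in for work on the two hardware units. The function names and timings are assumptions for illustration only.

```python
import asyncio


async def communicate(log):
    """Stands in for the asynchronous communication task on the first hardware unit."""
    log.append("comm:start")
    await asyncio.sleep(0.05)                      # simulated transfer time
    log.append("comm:done")
    return 42


async def first_task(log):
    """Computation that depends on the communication result."""
    log.append("first:begin")
    comm = asyncio.create_task(communicate(log))   # step 101: initiate without blocking
    result = await comm                            # step 102: suspend until completion
    log.append("first:resume")                     # step 103: resumed with the result
    return result + 1


async def second_task(log):
    """Independent computation that fills the waiting period."""
    for i in range(3):
        log.append(f"second:{i}")
        await asyncio.sleep(0.01)


async def main():
    log = []
    result, _ = await asyncio.gather(first_task(log), second_task(log))
    return result, log
```

Running `asyncio.run(main())` shows the second task making progress while the first task is suspended on the communication, after which the first task resumes from its suspension point.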

[0036] This disclosure provides a task scheduling method. Upon receiving a communication request, an asynchronous communication task is initiated and executed on a first hardware unit of a computing device, while a first computing task associated with the asynchronous communication task is suspended. Furthermore, while initiating the asynchronous communication task, the computing device does not enter a blocking waiting state. Instead, a second computing task, which has no data dependency on the asynchronous communication task and is independent of it, is scheduled to be executed on a second hardware unit. This fully utilizes the computing resources of the second hardware unit during the execution of the asynchronous communication task, achieving parallel execution of the computing and communication tasks. This hides communication latency and improves hardware resource utilization and task execution efficiency. The task scheduling method provided in this disclosure decouples computing and communication tasks, making it applicable to various application scenarios and improving versatility and flexibility.

[0037] The task scheduling method disclosed herein can be applied to any scenario where the first computation task and the asynchronous communication task are executed on different hardware units, and where there is a data dependency between the first computation task and the asynchronous communication task. Specifically, this task scheduling method can be applied to model training tasks.

[0038] In related technologies, for model training tasks, most overlap occurs only within a single batch, making it impossible to schedule computing resources across batches to fill idle periods caused by communication waiting. However, the task scheduling method provided in this disclosure, when applied to model training tasks, exhibits a particularly significant masking effect between computation and communication tasks, greatly improving training efficiency and hardware utilization.

[0039] In some embodiments, the task scheduling method can be applied to model training tasks, which are divided into multiple batch tasks according to data batches. Each batch task includes a computation task and an asynchronous communication task. The first computation task is the computation task in the current batch task, and the second computation task is the computation task in the next batch task of the current batch task.

[0040] Specifically, in model training tasks, the data processing of different batches is logically independent, and therefore, the computational tasks within different batches are independent of each other. When this task scheduling method is applied to model training tasks, batches can be used as scheduling units. A scheduler on the computing device can switch between computational tasks corresponding to multiple batches. When a batch's computational task is suspended due to an asynchronous communication task, the computing resources of the second hardware unit can be allocated to other executable batches.
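A hedged sketch of batch-level scheduling: each batch is a coroutine that computes, waits on a simulated collective, then computes again, and the event loop switches to another batch whenever one is waiting. The stage names and the sleep-based stub are assumptions, not part of the disclosure.

```python
import asyncio


async def collective_stub(batch, log):
    """Placeholder for an asynchronous collective; sleeps instead of communicating."""
    await asyncio.sleep(0.02)
    log.append(f"batch{batch}:comm_done")


async def run_batch(batch, log):
    """One batch task: compute, wait for communication, compute again."""
    log.append(f"batch{batch}:compute_before_comm")
    await collective_stub(batch, log)      # the batch gives up the compute unit here
    log.append(f"batch{batch}:compute_after_comm")


async def run_batches(num_batches):
    """Schedule all batch tasks concurrently and return the execution log."""
    log = []
    await asyncio.gather(*(run_batch(b, log) for b in range(num_batches)))
    return log
```

With two batches, the log shows batch 1 starting its computation before batch 0's communication has finished, which is the cross-batch filling of idle periods described above.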

[0041] The embodiments provided in this disclosure, for model training tasks, enable rapid switching between different batches of computation tasks, thereby fully utilizing the independence between batches to fill communication bubbles. This disclosure provides a comprehensive computational masking method that enables dynamic scheduling at the batch level and possesses low-overhead cross-process synchronization capabilities, thus more thoroughly hiding communication latency and improving the overall resource utilization of distributed training and inference.

[0042] To support the decoupled execution of computation tasks and asynchronous communication tasks, the suspension and resumption of the first computation task can be achieved through at least one of the following methods: coroutines, threads, state machines, and context saving and restoration. Specifically, implementing it through threads allows user-space management of the execution context and stack, enabling task suspension, switching, and resumption without entering the kernel, with overhead approaching that of coroutines. Implementing it through a state machine involves dividing the first computation task into multiple interruptible execution segments. After each segment executes, its execution state is recorded. When encountering communication waiting, the state is saved and execution is paused. Once communication is complete, subsequent segments can continue execution based on the current state. Implementing it through context saving and restoration involves explicitly saving the computation task's registers, program counter, data pointers, and other execution context to achieve suspension. After communication is complete, the context is restored, allowing the task to resume execution from the breakpoint.
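The state-machine variant described above can be illustrated as follows. The task is split into a segment before communication and a segment after it, with the intermediate state saved between them. The class, the stage names, and the arithmetic are placeholders chosen for this example.

```python
from enum import Enum, auto


class Stage(Enum):
    PRE_COMM = auto()      # segment before the communication point
    WAIT_COMM = auto()     # paused, waiting for the communication result
    DONE = auto()


class SegmentedTask:
    """First computing task split into interruptible segments (hypothetical)."""

    def __init__(self):
        self.stage = Stage.PRE_COMM
        self.partial = None    # state saved when execution pauses
        self.output = None

    def step(self, comm_result=None):
        """Run at most one segment; return True if the task made progress."""
        if self.stage is Stage.PRE_COMM:
            self.partial = 10                  # placeholder computation
            self.stage = Stage.WAIT_COMM       # record state, then pause
            return True
        if self.stage is Stage.WAIT_COMM and comm_result is not None:
            self.output = self.partial + comm_result   # continue from saved state
            self.stage = Stage.DONE
            return True
        return False                           # still waiting, or already done
```

A scheduler would call `step()` repeatedly; while it returns `False` in the waiting stage, the compute unit is free to run another task.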

[0043] Coroutines are used to achieve lightweight task switching and execution control management.

[0044] For step 102 of Figure 1 above, a specific implementation using coroutines is described below.

[0045] In some embodiments, the first computing task is pre-packaged as a first computing coroutine task, and the second computing task is pre-packaged as a second computing coroutine task.

[0046] Furthermore, the steps of suspending the first computing task currently being executed on the second hardware unit of the computing device and scheduling the second computing task, which is independent of the asynchronous communication task, to be executed on the second hardware unit include: suspending the first computing task by triggering the first computing coroutine task currently being executed on the second hardware unit of the computing device to release execution control; and scheduling the second computing task to be executed on the second hardware unit by switching execution control to the second computing coroutine task, which is independent of the asynchronous communication task.

[0047] Specifically, a coroutine, also known as a microthread or user-mode lightweight execution unit, is a program execution entity that can actively suspend and save its execution context during execution, and can resume execution from the breakpoint at any time later. It belongs to a non-preemptive, cooperative concurrency mechanism, runs inside a single thread, and is autonomously controlled and scheduled by the program logic. It does not require the operating system kernel to participate in thread switching and has the characteristics of low context switching overhead and low resource consumption.

[0048] In other words, coroutines can be suspended, paused, resumed, and redirected by the program itself, achieving concurrent and cooperative task execution with minimal memory and scheduling overhead. Coroutine scheduling is achieved by the coroutine voluntarily relinquishing control, rather than by the system forcibly scheduling it.

[0049] In related technologies, the scheduling of tasks (including computational tasks and communication tasks) requires thread scheduling through the operating system kernel, which increases memory and scheduling overhead. This disclosure pre-encapsulates computational tasks (including a first computational task, a second computational task, and other computational tasks to be scheduled) into corresponding computational coroutines by calling coroutine functions, and pre-encapsulates standard, blocking collective communication operations (which become the asynchronous communication tasks) into communication coroutines that can be awaited asynchronously. Specifically, this disclosure can wrap the interface of the underlying communication library, transforming it from a synchronous interface into an asynchronous interface. Therefore, based on the wrapped interface of the underlying communication library, after initiating a communication request, the first computational coroutine corresponding to the first computational task running on the second hardware unit can be immediately suspended, while the communication coroutine corresponding to the asynchronous communication task executes independently on the first hardware unit.
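One way to picture turning a synchronous communication interface into an awaitable one is shown below. The blocking function is a stand-in, not a real communication-library call, and the worker thread merely plays the role of the first hardware unit in this sketch.

```python
import asyncio
import time


def blocking_all_reduce(values):
    """Stand-in for a blocking collective call in a communication library."""
    time.sleep(0.05)                   # simulated network latency
    return sum(values)


async def all_reduce_async(values):
    """Awaitable wrapper: runs the blocking call off the event loop thread."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_all_reduce, values)


async def demo():
    progress = []

    async def independent_compute():
        for i in range(3):
            progress.append(i)
            await asyncio.sleep(0)     # yield control back to the event loop

    total, _ = await asyncio.gather(
        all_reduce_async([1, 2, 3]), independent_compute()
    )
    return total, progress
```

The caller of `all_reduce_async` is suspended at `await` while `independent_compute` keeps running, which mirrors the suspension of the first computational coroutine described above.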

[0050] When asynchronous communication causes the first computation task to wait, the first computation coroutine corresponding to the first computation task can proactively relinquish execution control, allowing the first computation task to be efficiently suspended. By switching this execution control to a second computation coroutine task that is independent of the asynchronous communication task, the second computation task is scheduled to be executed by the second hardware unit. After the asynchronous communication task is completed, this control is quickly switched back from the second computation coroutine to the first computation coroutine, enabling the first computation coroutine to quickly resume execution without the overhead of operating system intervention in thread or process switching.
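The voluntary hand-over of execution control can be sketched with plain Python generators, which are cooperative and run inside a single thread. Here the communication unit is simulated by a tick counter advanced between switches, and the first task polls a completion flag each time it is scheduled; both are simplifying assumptions, as are all names.

```python
from collections import deque


def first_compute(comm, log):
    """Depends on the communication result; yields while it is not available."""
    log.append("first: compute before communication")
    while not comm["done"]:
        yield                          # release execution control voluntarily
    log.append("first: compute with " + comm["result"])


def second_compute(log):
    """Independent of the communication; runs whenever it holds control."""
    for step in range(2):
        log.append(f"second: step {step}")
        yield


def run(log):
    comm = {"ticks_left": 2, "done": False, "result": None}
    ready = deque([first_compute(comm, log), second_compute(log)])
    while ready:
        task = ready.popleft()
        try:
            next(task)                 # run until the coroutine yields control
            ready.append(task)
        except StopIteration:
            pass                       # coroutine finished
        # Simulated progress of the communication unit between switches.
        if not comm["done"]:
            comm["ticks_left"] -= 1
            if comm["ticks_left"] == 0:
                comm["done"], comm["result"] = True, "reduced-data"
    return log
```

Calling `run([])` interleaves the two tasks: the first task pauses, the second runs a step, and the first then resumes with the communication result, all without any operating system thread switch.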

[0051] The embodiments provided in this disclosure, through the coroutine mechanism, enable the concurrent scheduling of multiple computing tasks (including the first computing task and the second computing task) on the second hardware execution unit of the computing device. This allows for seamless switching to execute other independent computing tasks (such as the second computing task) during idle periods while waiting for asynchronous communication tasks to complete. This achieves mutual masking of communication waiting time and computing execution time, thereby improving the utilization of computing resources and the overall training throughput.

[0052] Furthermore, in some embodiments, in response to the completion of the asynchronous communication task on the first hardware unit, resuming the execution of the suspended first computing task on the second hardware unit includes: when the asynchronous communication task is completed on the first hardware unit, sending a completion signal of the asynchronous communication task to the second hardware unit through the task handle of the asynchronous communication task, and triggering the second hardware unit to resume the execution of the suspended first computing task based on the completion signal, wherein the task handle is associated with the first computing task.

[0053] Specifically, after initiating an asynchronous communication task, a task handle can be assigned to the asynchronous communication task. The task handle serves as the unique identifier of the asynchronous communication task and is used to identify, track, and manage the execution status of the asynchronous communication task.

[0054] After initiating an asynchronous communication task, the task handle corresponding to the asynchronous communication task is associated and bound to the first computing task awaiting its execution result, thereby establishing a correspondence between the asynchronous communication task and the first computing task. When the asynchronous communication task is completed on the first hardware unit, a completion signal indicating the completion of the asynchronous communication task is sent to the second hardware unit through the aforementioned associated task handle, thereby triggering the second hardware unit to resume the execution of the previously suspended first computing task, allowing it to continue the subsequent computing process from its suspended position.
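
The binding between a task handle and the computing task that awaits it can be sketched as follows. This is a minimal sketch under stated assumptions: `HandleRegistry`, `bind`, and `complete` are hypothetical names, integer handles are used as unique identifiers, and an `asyncio` future stands in for the completion signal.

```python
import asyncio
import itertools

class HandleRegistry:
    """Maps each communication task handle to the compute task awaiting it."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._waiters = {}  # handle -> (task name, future carrying the result)

    def bind(self, task_name):
        # Allocate a unique handle and associate it with the waiting task.
        handle = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._waiters[handle] = (task_name, future)
        return handle, future

    def complete(self, handle, result):
        # Completion signal: wake exactly the task bound to this handle.
        task_name, future = self._waiters.pop(handle)
        future.set_result(result)
        return task_name

async def demo():
    registry = HandleRegistry()
    handles = {}
    resumed = []

    async def compute(name):
        handle, done = registry.bind(name)
        handles[name] = handle
        result = await done                 # suspended until the handle completes
        resumed.append((name, result))

    tasks = [asyncio.create_task(compute(n)) for n in ("task_a", "task_b")]
    await asyncio.sleep(0)                  # let both tasks bind and suspend
    woke_b = registry.complete(handles["task_b"], "result_b")
    woke_a = registry.complete(handles["task_a"], "result_a")
    await asyncio.gather(*tasks)
    return woke_a, woke_b, resumed
```

Because each completion is routed through its own handle, each task receives only its own result even when communications finish out of order.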

[0055] In some embodiments, associating the task handle of an asynchronous communication task with a first computation task can be achieved through an event.

[0056] Specifically, when initiating an asynchronous communication task, a corresponding communication handle can be allocated for the communication operation, and this communication handle can be associated with a preset asynchronous waiting event. The asynchronous communication task is executed independently and asynchronously by the first hardware unit. When the asynchronous communication task is completed, the asynchronous waiting event is triggered through a pre-configured event notification mechanism, thereby waking up the first computing task that is in a suspended state and waiting for the event, so that the first computing task can resume execution on the second hardware unit. The event notification mechanism may include callback functions or semaphores, etc., which are not limited in this disclosure.
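
An event notification mechanism based on a callback can be sketched as follows. The sketch rests on assumptions: `start_communication` is a hypothetical launcher whose worker thread models the first hardware unit, and an `asyncio.Event` models the preset asynchronous waiting event. Since the callback fires on the communication thread, the event is set through `loop.call_soon_threadsafe`.

```python
import asyncio
import threading
import time

def start_communication(on_complete):
    """Hypothetical launcher: runs the transfer on its own thread and
    invokes the callback when the transfer finishes."""
    def worker():
        time.sleep(0.02)        # models transfer latency
        on_complete()
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

async def demo():
    loop = asyncio.get_running_loop()
    comm_done = asyncio.Event()  # the asynchronous waiting event
    log = []

    # The callback runs on the communication thread, so the set() call is
    # handed back to the event loop thread.
    thread = start_communication(
        lambda: loop.call_soon_threadsafe(comm_done.set)
    )

    async def first_compute():
        log.append("first: suspended on event")
        await comm_done.wait()
        log.append("first: resumed")

    async def second_compute():
        log.append("second: executing")

    await asyncio.gather(first_compute(), second_compute())
    thread.join()
    return log
```

A semaphore could replace the callback in the same structure; the callback form is shown only because it maps directly onto the event loop.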

[0057] In some embodiments, the first computing task may also be pre-encapsulated as a corresponding first computing coroutine, so that the first computing coroutine can quickly resume execution after the asynchronous communication task is completed, without the operating system incurring the overhead of thread or process switching.

[0058] The embodiments provided in this disclosure associate the task handle of an asynchronous communication task with a first computing task, so that when the asynchronous communication task is completed, the task handle can be accurately matched and the resumption of execution of the corresponding first computing task can be triggered, thereby avoiding task confusion and ensuring the correctness of scheduling.

[0059] This disclosure provides a task scheduling method applicable to distributed scenarios where multiple computing devices collaborate on execution. The following detailed description, in conjunction with specific implementation methods, provides further details.

[0060] It should be noted that this distributed task includes computational tasks and asynchronous communication tasks, and the computational and communication tasks must be executed on different hardware units. This distributed task can be the model training task described above, or other distributed tasks that meet the conditions; this disclosure does not limit its scope.

[0061] In some embodiments, the first computing task is executed on multiple computing devices; in response to receiving a communication request, controlling a first hardware unit of the computing device to initiate an asynchronous communication task corresponding to the communication request includes: in response to receiving a communication request, controlling the first hardware units of the multiple computing devices to respectively initiate asynchronous communication tasks corresponding to the communication request.

[0062] Furthermore, suspending the first computing task currently being executed on the second hardware unit of the computing device includes: in response to receiving a first feedback message from multiple computing devices indicating that communication has been initiated, broadcasting a wait event to the multiple computing devices to put the first computing task into a waiting state. This wait event is used to trigger the second hardware units of the multiple computing devices to suspend the first computing task currently being executed.

[0063] Specifically, in a distributed multi-computing device scenario, the first computing task can be divided into multiple computing fragments according to a preset parallel strategy and distributed to multiple computing devices for parallel execution. The preset parallel strategy can be any one of data parallelism, model parallelism, or tensor parallelism.
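
For the data-parallel case, dividing a task into fragments can be sketched as follows. This is an illustrative assumption only: `shard_batch` is a hypothetical helper that splits a list of samples into near-equal contiguous fragments, one per device. Model and tensor parallelism partition the model rather than the data and are not shown.

```python
def shard_batch(samples, num_devices):
    """Split samples into num_devices near-equal contiguous fragments."""
    base, extra = divmod(len(samples), num_devices)
    shards, start = [], 0
    for rank in range(num_devices):
        # The first `extra` devices receive one additional sample.
        size = base + (1 if rank < extra else 0)
        shards.append(samples[start:start + size])
        start += size
    return shards
```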

[0064] In this scenario, multiple computing devices can be globally synchronized to ensure that all processes involved in the computation maintain strict consistency in the scheduling status of the first computing task.

[0065] Specifically, the scheduler of any one of the multiple computing devices can act as a central coordinator. Upon receiving a communication request, it controls the multiple computing devices to initiate their respective asynchronous communication tasks. After initiating its asynchronous communication task, each computing device returns first feedback information to the central coordinator. Once the central coordinator has received the first feedback information from all participating computing devices, it determines that all of them have initiated communication and that their respective first computing tasks can be suspended. The central coordinator then broadcasts, to all participating computing devices, a wait event indicating that the first computing task enters a waiting state, triggering the second hardware units of all participating computing devices to suspend the first computing tasks they are currently executing.

[0066] Taking model training as an example, the first computation task can be the current batch task. This ensures that, in a distributed multi-device environment, all processes participating in the computation maintain strict consistency in their scheduling status for the current batch task.

[0067] This disclosure presents a lightweight distributed collaborative control protocol. The protocol comprises two key components: synchronization point barriers and event broadcasting. During the synchronization point barrier phase, when a device initiates asynchronous communication for the current batch of tasks X, it sends a "communication initiated" notification to the central coordinator. The central coordinator maintains a global state table, and once it confirms that communication requests for the current batch of tasks X have been initiated on all relevant devices, it broadcasts a unified "current batch of tasks X enters waiting" event to all devices. Upon receiving the event, each device atomically and synchronously sets the coroutine state corresponding to the current batch of tasks X to "suspended." Furthermore, it can move the coroutine to a local waiting queue, thereby ensuring that all devices synchronously enter the waiting state.

[0068] In the embodiments provided in this disclosure, the scheduler controls the first hardware units of multiple computing devices to initiate asynchronous communication tasks corresponding to the communication request after receiving a communication request; and only after the multiple computing devices have fed back the first feedback information that the communication has been initiated to the scheduler, does the scheduler broadcast the waiting event that puts the first computing task in a waiting state to the multiple computing devices, thereby ensuring that in a distributed multi-device environment, all processes participating in the computing maintain a strictly consistent scheduling state for the first computing task.

[0069] Furthermore, in some embodiments, in response to the completion of the asynchronous communication task on the first hardware unit, resuming the execution of the suspended first computing task on the second hardware unit includes: in response to receiving second feedback information from multiple computing devices indicating that communication has been completed, broadcasting a ready event to the multiple computing devices to put the first computing task into a ready state. This ready event is used to trigger the second hardware unit to resume the execution of the suspended first computing task.

[0070] Specifically, in a distributed training scenario, the scheduler monitors the execution status of the asynchronous communication tasks on each computing device in real time. When it has received second feedback information from all participating computing devices indicating that their local communication tasks are complete, it determines that the global collective communication operation is complete. At this point, a ready event is broadcast to the multiple computing devices to switch the suspended first computing task to an executable state. Upon receiving this ready event, each computing device determines that the suspended first computing task meets the conditions for continued execution and triggers its second hardware unit to resume the previously suspended first computing task, which continues the subsequent computation from the point of suspension.

[0071] Taking the model training task as an example, during the event broadcast phase, devices that have completed communication send a "communication completed" notification to the central coordinator. The coordinator maintains a global completion status table. When it confirms that all communication required by the current batch task X has been completed, it broadcasts a unified "current batch task X is ready" event to all devices. Upon receiving this event, each device atomically and synchronously sets the status of all coroutines waiting for the current batch task to "ready" and re-queues them into its local ready queue.
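
The two phases of the coordination protocol described in [0067] and [0071] can be sketched as follows. This is a single-process sketch under stated assumptions: `Device`, `Coordinator`, and their methods are hypothetical names, broadcast is modeled as a direct method call, and duplicate notifications and device failures are not handled.

```python
from collections import deque

class Device:
    def __init__(self, device_id):
        self.device_id = device_id
        self.state = {}               # batch -> coroutine state
        self.waiting_queue = deque()
        self.ready_queue = deque()

    def on_event(self, kind, batch):
        if kind == "wait":
            self.state[batch] = "suspended"
            self.waiting_queue.append(batch)
        elif kind == "ready":
            self.state[batch] = "ready"
            self.waiting_queue.remove(batch)
            self.ready_queue.append(batch)

class Coordinator:
    def __init__(self, devices):
        self.devices = devices
        self.all_ids = {d.device_id for d in devices}
        self.initiated = {}   # global state table: batch -> devices that started
        self.completed = {}   # global completion table: batch -> devices that finished

    def _broadcast(self, kind, batch):
        for device in self.devices:
            device.on_event(kind, batch)

    def notify_initiated(self, device_id, batch):
        # Synchronization point barrier: broadcast only once every device
        # has reported "communication initiated" for this batch.
        seen = self.initiated.setdefault(batch, set())
        seen.add(device_id)
        if seen == self.all_ids:
            self._broadcast("wait", batch)

    def notify_completed(self, device_id, batch):
        # Event broadcast: mark the batch ready only once every device
        # has reported "communication completed".
        seen = self.completed.setdefault(batch, set())
        seen.add(device_id)
        if seen == self.all_ids:
            self._broadcast("ready", batch)
```

In this sketch no device changes the state of a batch until the coordinator has heard from all devices, which is what keeps the per-device states consistent.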

[0072] In this embodiment of the disclosure, by broadcasting a ready event used to put the first computing task into a ready state to multiple computing devices, the state of the multiple computing devices is ensured to remain strictly consistent.

[0073] In addition, the scheduler of each computing device can maintain its own task queue, which includes a ready queue and a waiting queue, etc., so that the scheduler can select target tasks from the ready queue and schedule them to be executed on the second hardware unit to improve scheduling efficiency. A specific implementation method is given below.

[0074] In some embodiments, after suspending the first computing task being executed on the second hardware unit, the scheduling method further includes: adding the first computing task to a waiting task queue according to the task information of the first computing task.

[0075] Furthermore, in response to the completion of the asynchronous communication task on the first hardware unit, the execution of the suspended first computing task is resumed on the second hardware unit, including: in response to the completion of the asynchronous communication task on the first hardware unit, adding the first computing task in the waiting task queue to the ready task queue, and resuming the execution of the suspended first computing task on the second hardware unit based on the ready task queue.

[0076] Specifically, after suspending the first computing task currently executing on the second hardware unit, the first computing task can proactively submit at least one piece of task information, such as its corresponding task identifier, execution context, and suspension position, to the scheduler. Then, the scheduler adds the suspended first computing task to a preset waiting task queue for maintenance, ensuring that the first computing task is in a state of waiting for communication completion. Simultaneously, the scheduler schedules other computing tasks independent of the asynchronous communication task from the ready task queue to the second hardware unit for execution.

[0077] The waiting task queue stores tasks in the waiting state. Such tasks are not eligible for execution because their data dependencies or resource conditions are not yet satisfied; they can be scheduled only after the corresponding triggering event has occurred. The ready task queue stores tasks in the ready state. Such tasks have satisfied their data dependencies and hardware resource conditions and can be scheduled directly by the scheduler onto the corresponding hardware unit for execution.

[0078] Furthermore, when the scheduler detects that the asynchronous communication task has been completed, it can also retrieve the first computing task corresponding to the asynchronous communication task from the waiting task queue and add it to the ready task queue, so that the first computing task is converted into a ready state that can be scheduled for execution. Subsequently, the scheduler schedules the previously suspended first computing task from the ready task queue to the second hardware unit to resume execution, so that the first computing task can continue to execute the subsequent computing process from the suspended position, thereby realizing rapid switching and time reuse of computing resources on a single computing device.
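
The movement of a task between the waiting task queue and the ready task queue can be sketched as follows. The names `TaskInfo`, `TaskQueues`, and `comm_handle` are hypothetical, and keying the waiting tasks by communication handle is an assumption made so that a completed communication finds its task directly.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class TaskInfo:
    task_id: str
    context: dict = field(default_factory=dict)  # saved execution context
    resume_position: int = 0                      # where to continue from

class TaskQueues:
    def __init__(self):
        self.ready = deque()  # tasks eligible to run on the second hardware unit
        self.waiting = {}     # comm handle -> task waiting for that communication

    def suspend(self, info, comm_handle):
        # The suspended task submits its information and enters the waiting queue.
        self.waiting[comm_handle] = info

    def on_comm_complete(self, comm_handle):
        # Communication finished: move the matching task to the ready queue.
        info = self.waiting.pop(comm_handle)
        self.ready.append(info)

    def next_ready(self):
        # Select the next task to schedule, or None if nothing is runnable.
        return self.ready.popleft() if self.ready else None
```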

[0079] The first computation task and the asynchronous communication task can be pre-encapsulated as corresponding coroutines to achieve lightweight communication and reduce additional overhead.

[0080] Taking model training as an example, a global, centralized coroutine scheduler is established on each computing device. This scheduler manages all active computing tasks uniformly, using data batches as the granularity. The forward and backward propagation processes of each batch are encapsulated as independent coroutines. The scheduler maintains multiple logical queues, including ready queues and waiting queues, and continuously listens for scheduling events initiated by coroutines. Its core workflow follows an event-driven preemptive scheduling strategy.

[0081] Specifically, when any running batch coroutine reaches an asynchronous communication operation, it proactively suspends (yields) and submits its own identifier and the communication event it is waiting for to the scheduler. The scheduler then changes its status from "running" to "waiting for communication" and moves it to the waiting queue. Immediately afterwards, the scheduler selects the next batch coroutine from the ready queue and resumes its execution, thereby achieving rapid switching and time reuse of computing resources on a single device.
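
The per-device batch scheduling loop can be sketched with Python generators as coroutines. This is a simulation under stated assumptions: `batch_coroutine` and `run` are hypothetical names, each batch reaches exactly one communication operation, and a pending communication is treated as complete, in issue order, whenever no batch is runnable.

```python
from collections import deque

def batch_coroutine(batch_id, log):
    log.append(f"{batch_id}: forward/backward compute")
    # Reaching the communication operation: yield the awaited event.
    yield f"allreduce-{batch_id}"
    log.append(f"{batch_id}: post-communication compute")

def run(batch_ids, log):
    ready = deque((b, batch_coroutine(b, log)) for b in batch_ids)
    waiting = {}                      # event -> (batch id, coroutine)
    pending_events = deque()          # communications in flight, in issue order
    status = {b: "ready" for b in batch_ids}
    while ready or waiting:
        if ready:
            batch_id, coro = ready.popleft()
            status[batch_id] = "running"
            try:
                event = next(coro)    # run until the coroutine yields or ends
            except StopIteration:
                status[batch_id] = "done"
                continue
            status[batch_id] = "waiting for communication"
            waiting[event] = (batch_id, coro)
            pending_events.append(event)
        else:
            # Assumption: the oldest pending communication completes now.
            event = pending_events.popleft()
            batch_id, coro = waiting.pop(event)
            status[batch_id] = "ready"
            ready.append((batch_id, coro))
    return status
```

With two batches, the compute phase of the second batch runs while the first batch waits for its communication, which is the cross-batch filling of communication waiting time described above.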

[0082] The embodiments provided in this disclosure encapsulate collective communication operations into independent coroutine tasks that can be awaited asynchronously, thereby preventing communication scheduling from intruding into the model computation graph and achieving logical decoupling between communication and computation. This approach is applicable to any distributed training or inference scenario that employs collective communication, including tensor parallelism (TP), expert parallelism (EP), context parallelism (CP), and any combination thereof, and therefore generalizes better. It also supports dynamic scheduling across batches. Using batches as the scheduling unit, a coroutine scheduler switches between the computational streams corresponding to multiple batches on a single computing device. When a batch is suspended due to communication, computational resources are immediately allocated to other executable batches. Through cross-batch scheduling, the independence between batches can be used to fill communication bubbles, achieving more thorough communication hiding than overlap within a single batch and thus improving device utilization.

[0083] Furthermore, this disclosure ensures distributed state consistency by providing a lightweight, low-overhead cross-process synchronization control mechanism. This ensures atomic consistency of suspend and wake-up operations on the same batch of computation streams across all devices, preventing data errors or deadlocks. Finally, through finer-grained scheduling, communication bubbles are masked more effectively, improving the utilization of computing devices and thus enhancing training and inference throughput and system resource utilization.

[0084] This disclosure innovatively separates scheduling logic from computation logic, thus creating a pluggable, general-purpose infrastructure that significantly improves adaptability and maintainability. For the first time, this disclosure systematically proposes and implements cross-stream scheduling at the data batch level, fully leveraging the inherent parallelism between batches in deep learning tasks. It explores deeper resource reuse potential from a temporal perspective, theoretically achieving superior communication masking. The synchronization point barrier and event broadcasting mechanism designed in this disclosure is a lightweight consensus protocol tailored for scheduling consistency. It is more efficient and accurate than traditional global barriers or complex distributed state machines, minimizing the overhead of distributed scheduling while ensuring correctness.

[0085] Based on the same inventive concept as the above-described task scheduling method, this disclosure also provides a task scheduler.

[0086] Figure 2 is a structural block diagram of a task scheduler provided in an embodiment of the present disclosure.

[0087] With further reference to Figure 2, as an implementation of the methods shown in the above figures, this scheduler embodiment corresponds to the method embodiment shown in Figure 1, and the scheduler can be applied to various electronic devices.

[0088] As shown in Figure 2, the scheduler includes a task initiation module 201, a scheduling module 202, and a recovery module 203. The task initiation module 201, in response to receiving a communication request, controls a first hardware unit of the computing device to initiate an asynchronous communication task corresponding to the communication request. The scheduling module 202 suspends the first computing task currently being executed on a second hardware unit of the computing device and schedules a second computing task, independent of the asynchronous communication task, to be executed on the second hardware unit; wherein the first computing task is associated with the asynchronous communication task. The recovery module 203, in response to the completion of the asynchronous communication task on the first hardware unit, resumes the execution of the suspended first computing task on the second hardware unit.

[0089] In this embodiment, for the specific processing of the task initiation module 201, the scheduling module 202, and the recovery module 203 in the scheduler, and the resulting technical effects, reference may be made to the descriptions of steps 101-103 in the embodiment corresponding to Figure 1, which will not be repeated here.

[0090] In the scheduler provided in this embodiment, after receiving a communication request, the task initiation module 201 initiates and executes an asynchronous communication task on the first hardware unit of the computing device, and suspends the first computing task associated with the asynchronous communication task. Furthermore, while initiating the asynchronous communication task, the scheduling module 202 does not enter a blocking waiting state on the computing device. Instead, it schedules a second computing task, which has no data dependency on the asynchronous communication task and is independent of it, to the second hardware unit for continued execution. This fully utilizes the computing resources of the second hardware unit during the execution of the asynchronous communication task, achieving parallel execution of the computing and communication tasks, hiding communication latency, and improving hardware resource utilization and task execution efficiency. Further, after the asynchronous communication task is completed, the recovery module 203 resumes the execution of the suspended first computing task on the second hardware unit. The task scheduling method provided in this disclosure can decouple computing and communication tasks, making it applicable to various application scenarios and improving versatility and flexibility.

[0091] In some embodiments, the first computing task is pre-encapsulated as a first computing coroutine task, and the second computing task is pre-encapsulated as a second computing coroutine task. The scheduling module 202 is specifically configured to: suspend the first computing task by triggering the first computing coroutine task currently executing on the second hardware unit of the computing device to release execution control; and schedule the second computing task to be executed on the second hardware unit by switching execution control to the second computing coroutine task, which is independent of the asynchronous communication task.

[0092] In some embodiments, the recovery module 203 is specifically configured to: when the asynchronous communication task is completed on the first hardware unit, send a completion signal of the asynchronous communication task to the second hardware unit through the task handle of the asynchronous communication task, and trigger the second hardware unit to resume execution of the suspended first computing task based on the completion signal, wherein the task handle is associated with the first computing task.

[0093] In some embodiments, the first computing task is executed on multiple computing devices; the task initiation module 201 is specifically used to: in response to receiving a communication request, control the first hardware units of the multiple computing devices to respectively initiate asynchronous communication tasks corresponding to the communication request.

[0094] Furthermore, the scheduling module 202 is specifically used to broadcast a waiting event to the multiple computing devices in response to receiving a first feedback message that communication has been initiated from multiple computing devices; wherein the waiting event is used to trigger the second hardware unit of the multiple computing devices to suspend the first computing task that is being executed.

[0095] In some embodiments, the recovery module 203 is further configured to: in response to receiving a second feedback message from multiple computing devices indicating that communication has been completed, broadcast a ready event to the multiple computing devices to put the first computing task into a ready state; wherein the ready event is used to trigger the second hardware unit to resume execution of the suspended first computing task.

[0096] In some embodiments, the scheduler may further include a submission module, which is specifically used to: add the first computing task to the waiting task queue according to the task information of the first computing task.

[0097] In some embodiments, the submission module is further configured to: add the first computation task in the waiting task queue to the ready task queue in response to the completion of the asynchronous communication task on the first hardware unit. Further, the recovery module 203 is further configured to resume the execution of the suspended first computation task on the second hardware unit based on the ready task queue.

[0098] In some embodiments, the scheduler can be used to schedule model training tasks, which are divided into multiple batch tasks according to data batches. Each batch task includes a computation task and an asynchronous communication task. The first computation task is the computation task in the current batch task, and the second computation task is the computation task in the batch task that follows the current batch task.

[0099] The specific implementation details and technical effects of the scheduler embodiments provided in this disclosure are the same as the implementation details and technical effects of the task scheduling method embodiments described above, and will not be repeated here.

[0100] This embodiment is a device embodiment corresponding to the above method embodiment. The task scheduler provided in this embodiment, upon receiving a communication request, initiates and executes an asynchronous communication task on the first hardware unit of the computing device, and suspends the first computing task associated with the asynchronous communication task. Furthermore, while initiating the asynchronous communication task, the computing device does not enter a blocking waiting state, but instead schedules a second computing task that has no data dependency on the asynchronous communication task and is independent of it to the second hardware unit for continued execution. This fully utilizes the computing resources of the second hardware unit during the execution of the asynchronous communication task, achieving parallel execution of the computing and communication tasks, hiding communication latency, and improving hardware resource utilization and task execution efficiency. The task scheduling method provided in this disclosure can decouple computing and communication tasks, making it applicable to various application scenarios and improving versatility and flexibility.

[0101] Furthermore, this disclosure also provides a task scheduling system.

[0102] Figure 3 is a structural block diagram of a task scheduling system provided in an embodiment of the present disclosure.

[0103] As shown in Figure 3, the task scheduling system 300 includes at least one computing device 301, which includes a first hardware unit 311, a second hardware unit 321, and a scheduler 331.

[0104] The first hardware unit 311 is used to execute asynchronous communication tasks. The second hardware unit 321 is used to execute computation tasks. The computation tasks include a first computation task associated with the asynchronous communication tasks and a second computation task independent of the asynchronous communication tasks. The scheduler 331 is used to execute any of the task scheduling methods described in the above embodiments to schedule the asynchronous communication tasks and computation tasks on the first hardware unit 311 and the second hardware unit 321.

[0105] The specific implementation details and technical effects of the task scheduling system embodiments provided in this disclosure are the same as the implementation details and technical effects of the task scheduling method embodiments described above, and will not be repeated here.

[0106] According to embodiments of the present disclosure, the present disclosure also provides an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to implement the task scheduling method described in any of the above embodiments.

[0107] Figure 4 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of this disclosure. As shown in Figure 4, the electronic device 400 of this embodiment includes a processor 401 and a memory 402, wherein the memory 402 is used to store computer-executable instructions, and the processor 401 is used to execute the computer-executable instructions stored in the memory 402 to implement the steps performed by the electronic device in the above embodiments. For details, please refer to the relevant descriptions in the foregoing method embodiments. For example, the electronic device 400 can be a general-purpose processor, a graphics processing device, a neural network computing device, or a graph neural network computing device.

[0108] In some embodiments, the memory 402 can be either standalone or integrated with the processor 401.

[0109] When the memory 402 is set up independently, the electronic device also includes a bus 403 for connecting the memory 402 and the processor 401.

[0110] It should be understood that the processor 401 described above can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this disclosure can be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules within the processor.

[0111] The memory 402 may include high-speed RAM memory, and may also include non-volatile memory NVM, such as at least one disk storage device, and may also be a USB flash drive, portable hard drive, read-only memory, disk or optical disc, etc.

[0112] Bus 403 can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of illustration, the buses shown in the accompanying drawings are not limited to a single bus or a single type of bus.

[0113] This disclosure also provides a computer storage medium storing computer execution instructions, which, when executed by a processor, implement the steps of the task scheduling method in any of the above method embodiments.

[0114] This disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the task scheduling method according to any of the above embodiments.

[0115] In the several embodiments provided in this disclosure, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple modules may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or other forms.

[0116] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to implement the solution of this embodiment according to actual needs.

[0117] Furthermore, the functional modules in the various embodiments of this disclosure can be integrated into one processing unit, each module can exist physically on its own, or two or more modules can be integrated into one unit. The integrated unit can be implemented in hardware, or in a combination of hardware and software functional units.

[0118] The integrated modules described above, when implemented as software functional modules, can be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, server, network device, etc.) or a processor to execute some of the steps of the methods in the various embodiments of this disclosure.

[0119] The aforementioned storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk. The storage medium can be any available medium that can be accessed by a general-purpose or special-purpose computer.

[0120] An exemplary storage medium is coupled to a processor, enabling the processor to read information from and write information to the storage medium. Alternatively, the storage medium can be an integral part of the processor. Both the processor and the storage medium can reside in application-specific integrated circuits (ASICs). Alternatively, the processor and storage medium can exist as discrete components in an electronic device or host device.

[0121] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.

[0122] In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments. The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described. However, any combination of these technical features that is not self-contradictory should be considered within the scope of this specification.

[0123] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of this disclosure can be achieved; no limitation is imposed herein.

[0124] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.

Claims

1. A task scheduling method, characterized in that the method comprises:
in response to receiving a communication request, controlling a first hardware unit of a computing device to initiate an asynchronous communication task corresponding to the communication request;
suspending a first computing task currently being executed on a second hardware unit of the computing device, and scheduling a second computing task that is independent of the asynchronous communication task to be executed on the second hardware unit, wherein the first computing task is associated with the asynchronous communication task; and
in response to completion of the asynchronous communication task on the first hardware unit, resuming execution of the suspended first computing task on the second hardware unit.

2. The method according to claim 1, characterized in that the first computing task is pre-packaged as a first computing coroutine task, and the second computing task is pre-packaged as a second computing coroutine task;
the step of suspending the first computing task currently being executed on the second hardware unit of the computing device and scheduling the second computing task, which is independent of the asynchronous communication task, to be executed on the second hardware unit comprises:
suspending the first computing task by releasing execution control of the first computing coroutine task being executed on the second hardware unit of the computing device; and
scheduling the second computing task to be executed on the second hardware unit by switching the execution control to the second computing coroutine task, which is independent of the asynchronous communication task.
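The release and switch of execution control described in claim 2 can be illustrated with plain Python generators. This is only a minimal sketch under stated assumptions: the names `first_compute`, `second_compute`, and `run`, the log strings, and the simulated communication result are hypothetical and do not come from the disclosure, and no real hardware units are involved.

```python
def first_compute(log):
    """Hypothetical first computing coroutine task; it depends on the communication result."""
    log.append("first: compute before communication")
    # Yielding releases execution control until the scheduler sends the result back.
    result = yield "wait_for_comm"
    log.append(f"first: resumed with {result}")


def second_compute(log):
    """Hypothetical second computing coroutine task; independent of the communication."""
    log.append("second: independent compute")
    yield "done"


def run(log):
    first = first_compute(log)
    second = second_compute(log)
    # The first coroutine runs until it yields, i.e. until it gives up control.
    assert next(first) == "wait_for_comm"
    # Control switches to the independent coroutine while communication is in flight.
    next(second)
    # Communication finished (simulated here): pass the result in and resume the first task.
    try:
        first.send("gradients")
    except StopIteration:
        pass
    return log
```

Calling `run([])` returns the three log entries in the order first, second, first, which shows the independent task filling the gap left by the suspended one.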

3. The method according to claim 1, characterized in that the step of resuming execution of the suspended first computing task on the second hardware unit in response to completion of the asynchronous communication task on the first hardware unit comprises:
when the asynchronous communication task is completed on the first hardware unit, sending a completion signal of the asynchronous communication task to the second hardware unit through a task handle of the asynchronous communication task, and triggering the second hardware unit to resume execution of the suspended first computing task based on the completion signal, wherein the task handle is associated with the first computing task.
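One possible reading of the task handle in claim 3 is an object that carries the completion signal from the communication side to the waiting computing task. The sketch below uses Python's `asyncio.Event` for that signal; the class `TaskHandle`, its methods, and the two coroutines are hypothetical names chosen for illustration, and `asyncio.sleep` merely stands in for a transfer on the first hardware unit.

```python
import asyncio


class TaskHandle:
    """Hypothetical handle linking one asynchronous communication task to the computing task waiting on it."""

    def __init__(self, waiting_task_name):
        self.waiting_task_name = waiting_task_name
        self._done = asyncio.Event()
        self.result = None

    def signal_completion(self, result):
        # Called by the communication side when the transfer finishes.
        self.result = result
        self._done.set()

    async def wait(self):
        await self._done.wait()
        return self.result


async def communication_unit(handle):
    await asyncio.sleep(0.01)  # stands in for the transfer on the first hardware unit
    handle.signal_completion("payload")


async def first_computing_task(handle, log):
    log.append("suspended")
    payload = await handle.wait()  # stays suspended until the completion signal arrives
    log.append(f"resumed with {payload}")


async def main():
    log = []
    handle = TaskHandle("first_computing_task")
    await asyncio.gather(first_computing_task(handle, log), communication_unit(handle))
    return log


def demo():
    return asyncio.run(main())
```

Because the handle stores the name of the waiting task, a scheduler holding several handles could tell which suspended task each completion signal belongs to.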

4. The method according to claim 1, characterized in that the first computing task is executed on a plurality of computing devices;
the step of controlling, in response to receiving a communication request, the first hardware unit of the computing device to initiate an asynchronous communication task corresponding to the communication request comprises:
in response to receiving the communication request, controlling the first hardware units of the plurality of computing devices to respectively initiate asynchronous communication tasks corresponding to the communication request;
the step of suspending the first computing task currently being executed on the second hardware unit of the computing device comprises:
in response to receiving, from the plurality of computing devices, a first feedback message indicating that communication has been initiated, broadcasting a wait event to the plurality of computing devices to put the first computing task into a waiting state, wherein the wait event is used to trigger the second hardware units of the plurality of computing devices to suspend the first computing task being executed.

5. The method according to claim 4, characterized in that the step of resuming execution of the suspended first computing task on the second hardware unit in response to completion of the asynchronous communication task on the first hardware unit comprises:
in response to receiving, from the plurality of computing devices, a second feedback message indicating that communication has been completed, broadcasting a ready event to the plurality of computing devices to put the first computing task into a ready state, wherein the ready event is used to trigger the second hardware unit to resume execution of the suspended first computing task.
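The wait and ready broadcasts of claims 4 and 5 can be pictured as a small state machine. The sketch below is illustrative only: the classes `Device` and `Coordinator`, the device names, and the string-valued events are hypothetical, and the choice to broadcast only after every device has reported is an assumption of this sketch rather than something the claims spell out.

```python
class Device:
    """Hypothetical computing device tracking the state of the first computing task."""

    def __init__(self, name):
        self.name = name
        self.state = "running"

    def on_event(self, event):
        if event == "wait" and self.state == "running":
            self.state = "waiting"  # second hardware unit suspends the task
        elif event == "ready" and self.state == "waiting":
            self.state = "ready"    # second hardware unit may resume the task


class Coordinator:
    """Hypothetical scheduler-side component that collects feedback and broadcasts events."""

    def __init__(self, devices):
        self.devices = devices
        self.initiated = set()
        self.completed = set()

    def _broadcast(self, event):
        for device in self.devices:
            device.on_event(event)

    def on_feedback(self, device_name, kind):
        # Assumption: an event is broadcast only once all devices have reported.
        if kind == "initiated":
            self.initiated.add(device_name)
            if len(self.initiated) == len(self.devices):
                self._broadcast("wait")
        elif kind == "completed":
            self.completed.add(device_name)
            if len(self.completed) == len(self.devices):
                self._broadcast("ready")


def demo():
    devices = [Device("dev0"), Device("dev1")]
    coordinator = Coordinator(devices)
    snapshots = []
    coordinator.on_feedback("dev0", "initiated")
    snapshots.append([d.state for d in devices])
    coordinator.on_feedback("dev1", "initiated")
    snapshots.append([d.state for d in devices])
    coordinator.on_feedback("dev0", "completed")
    coordinator.on_feedback("dev1", "completed")
    snapshots.append([d.state for d in devices])
    return snapshots
```

Broadcasting a single event to all devices keeps the state of the first computing task consistent across them, so no device resumes before its peers have finished communicating.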

6. The method according to claim 1, characterized in that, after suspending the first computing task being executed on the second hardware unit of the computing device, the method further comprises:
adding the first computing task to a waiting task queue based on task information of the first computing task;
the step of resuming execution of the suspended first computing task on the second hardware unit in response to completion of the asynchronous communication task on the first hardware unit comprises:
in response to completion of the asynchronous communication task on the first hardware unit, adding the first computing task in the waiting task queue to a ready task queue, and resuming execution of the suspended first computing task on the second hardware unit based on the ready task queue.
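The waiting and ready task queues of claim 6 can be sketched as follows. The class `QueueScheduler`, its method names, and the identifier `allreduce-0` are hypothetical; keying the waiting structure by a communication identifier is an assumption made here so that a completion can find its task directly, whereas the claim only requires that the task be queued based on its task information.

```python
from collections import deque


class QueueScheduler:
    """Hypothetical scheduler holding a waiting structure and a ready queue."""

    def __init__(self):
        self.waiting = {}     # communication id -> name of the suspended task
        self.ready = deque()  # tasks that may run on the second hardware unit

    def submit(self, task_name):
        self.ready.append(task_name)

    def suspend(self, task_name, comm_id):
        # The suspended task waits for the communication identified by comm_id.
        self.waiting[comm_id] = task_name

    def on_comm_complete(self, comm_id):
        # Move the task from the waiting structure to the ready queue.
        self.ready.append(self.waiting.pop(comm_id))

    def next_task(self):
        return self.ready.popleft() if self.ready else None


def demo():
    scheduler = QueueScheduler()
    scheduler.submit("second")
    scheduler.suspend("first", "allreduce-0")
    order = [scheduler.next_task(), scheduler.next_task()]
    scheduler.on_comm_complete("allreduce-0")
    order.append(scheduler.next_task())
    return order
```

In `demo`, the second call to `next_task` returns `None` because the first task is still waiting; it becomes schedulable only after `on_comm_complete` moves it to the ready queue.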

7. The method according to any one of claims 1-6, characterized in that the method is applied to a model training task, which is divided into multiple batch tasks according to data batches, each batch task including a computing task and an asynchronous communication task;
wherein the first computing task is a computing task in the current batch task, and the second computing task is a computing task in the batch task that follows the current batch task.
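The batch-level overlap in claim 7 can be illustrated with `asyncio`. This is a sketch of the ordering only, under stated assumptions: `communicate`, `train`, and the log strings are hypothetical, `asyncio.sleep` stands in for the asynchronous communication, and the "update" entry represents the resumed part of the current batch's computing task.

```python
import asyncio


async def communicate(batch, log):
    log.append(f"comm start {batch}")
    await asyncio.sleep(0.01)  # stands in for the asynchronous communication task
    log.append(f"comm done {batch}")


async def train(num_batches):
    log = []
    pending = None  # (batch index, in-flight communication task)
    for batch in range(num_batches):
        # Computing for this batch runs while the previous batch's communication is in flight.
        log.append(f"compute {batch}")
        if pending is not None:
            prev, comm = pending
            await comm
            log.append(f"update {prev}")  # resumed part of the previous batch's computing task
        pending = (batch, asyncio.create_task(communicate(batch, log)))
        await asyncio.sleep(0)  # yield once so the communication task can start
    if pending is not None:
        prev, comm = pending
        await comm
        log.append(f"update {prev}")
    return log


def demo():
    return asyncio.run(train(2))
```

In the resulting log, "compute 1" appears between "comm start 0" and "comm done 0", which is the overlap the claim describes.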

8. A task scheduler, characterized in that the scheduler comprises:
a task initiation module configured to, in response to receiving a communication request, control a first hardware unit of a computing device to initiate an asynchronous communication task corresponding to the communication request;
a scheduling module configured to suspend a first computing task being executed on a second hardware unit of the computing device, and to schedule a second computing task that is independent of the asynchronous communication task to be executed on the second hardware unit, wherein the first computing task is associated with the asynchronous communication task; and
a recovery module configured to resume execution of the suspended first computing task on the second hardware unit in response to completion of the asynchronous communication task on the first hardware unit.

9. A task scheduling system, characterized in that the system comprises at least one computing device, the computing device comprising:
a first hardware unit configured to execute asynchronous communication tasks;
a second hardware unit configured to execute computing tasks, wherein the computing tasks include a first computing task associated with the asynchronous communication task and a second computing task independent of the asynchronous communication task; and
a scheduler configured to perform the method according to any one of claims 1-7 to schedule the asynchronous communication task and the computing tasks to be executed on the first hardware unit and the second hardware unit.

10. An electronic device, characterized in that it comprises:
at least one processor; and
a memory communicatively connected to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the task scheduling method according to any one of claims 1-7.

11. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to execute the task scheduling method according to any one of claims 1-7.

12. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the steps of the task scheduling method according to any one of claims 1-7.