Task preemption methods, command stream parsers, systems, chips, devices, storage media, and application products.

By implementing a preemption mechanism using three sets of dedicated registers in the GPU's command stream parser, the problem of high-priority tasks waiting due to low-priority tasks occupying resources is solved, thus improving the GPU's processing efficiency.

CN122309067A · Pending · Publication Date: 2026-06-30 · GLENFLY TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GLENFLY TECH CO LTD
Filing Date
2026-03-23
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In a graphics processing unit (GPU), when low-priority tasks consume resources, high-priority tasks have to wait too long to obtain those resources, resulting in reduced processing efficiency.

Method used

By using three sets of dedicated registers with separate functions in the processor's command stream parser, preemption signal synchronization, command stream jump control, and precise progress recording are achieved. This allows for the identification and response to thread group preemption, enabling high-priority tasks to immediately stop the execution of low-priority task thread groups and avoid waiting for the current task to complete.

Benefits of technology

This significantly reduces the waiting time for high-priority tasks and improves the processing efficiency of the GPU.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122309067A_ABST
Patent Text Reader

Abstract

This application relates to a task preemption method, a command stream parser, a system, a chip, a device, a storage medium, and a program product. The method, applied to the command stream parser of a processor, includes: when the status flag in a first preemption register is a preset flag and the command master parser sends the current scheduling command of a first task to the scheduling command thread group splitter, the command master parser sends a stop signal to the scheduling command thread group splitter and receives from it the splitting status of the current scheduling command's thread groups; when the splitting status is recorded in a second preemption register, the position of the current scheduling command in the sub-command sequence packet is recorded in a third preemption register, and the command master parser clears the status flag in the first preemption register; the command master parser then reads the address of a second task from the command queue register and executes the second task. This method can improve task processing efficiency.

Description

Technical Field

[0001] This application relates to the field of graphics processing architecture technology, and in particular to a task preemption method, a command stream parser, a system, a chip, a computer device, a computer-readable storage medium, and a computer program product.

Background Technology

[0002] As the demands placed on graphics processing units (GPUs) grow more complex (including but not limited to real-time rendering, computation, AI inference, and background resource loading), a GPU without a preemption mechanism may have its resources occupied for a long time by low-priority tasks (such as AI inference, where a single general compute command may contain tens of thousands of thread groups, each of which may contain thousands of threads).

[0003] When a GPU is executing a low-priority computation task that includes multiple DMA buffers, a traditional GPU must wait for the entire current task (or even the entire command queue) to complete before switching, making it unable to respond promptly to suddenly arriving high-priority tasks. Since a single DMA buffer may contain massive computations (tens of thousands of thread groups), high-priority tasks may have to wait excessively for resources, leading to reduced GPU processing efficiency.

Summary of the Invention

[0004] Therefore, it is necessary to provide a task preemption method, command stream parser, system, chip, computer device, computer-readable storage medium, and computer program product that can improve task processing efficiency in response to the above-mentioned technical problems.

[0005] In a first aspect, this application provides a task preemption method applied to a processor's command stream parser, the method comprising:

[0006] When the status flag in the first preemption register is a preset flag, and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter, the command master parser sends a stop signal to the scheduling command thread group splitter and receives the splitting status of the thread group of the current scheduling command from the scheduling command thread group splitter; wherein, the preset flag indicates that the priority of the second task is higher than the priority of the first task; the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task;

[0007] When the splitting status is recorded in the second preemption register, the position of the current scheduling command in the sub-command sequence packet is recorded in the third preemption register, and the command master parser clears the status flag in the first preemption register;

[0008] The command master parser reads the address of the second task from the command queue register and executes the second task; wherein the address of the second task is sent to the command queue register of the command stream parser after the software driver layer detects that the status flag has been cleared.
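The parser-side handshake of paragraphs [0006] to [0008] can be sketched as a small software model. This is only an illustration under stated assumptions, not the hardware design: the names `PreemptionRegisters`, `SplitterStub`, `handle_preemption`, `PRESET_FLAG`, and `CLEARED`, the flag encoding, and the dictionary used for the splitting status are all hypothetical.

```python
from dataclasses import dataclass, field

PRESET_FLAG = 1  # hypothetical encoding: a higher-priority second task is waiting
CLEARED = 0      # hypothetical encoding of the cleared status flag


@dataclass
class PreemptionRegisters:
    status_flag: int = CLEARED                            # first preemption register
    splitting_status: dict = field(default_factory=dict)  # second preemption register
    command_position: int = -1                            # third preemption register


class SplitterStub:
    """Stand-in for the scheduling command thread group splitter."""

    def __init__(self, total_groups, groups_sent):
        self.total_groups = total_groups
        self.groups_sent = groups_sent
        self.stopped = False

    def stop(self):
        # On the stop signal, halt splitting and report the splitting status.
        self.stopped = True
        return {"groups_sent": self.groups_sent, "total_groups": self.total_groups}


def handle_preemption(regs, splitter, command_position):
    """Parser-side steps: stop the splitter, record state, clear the flag."""
    if regs.status_flag != PRESET_FLAG:
        return False                           # no preemption requested
    regs.splitting_status = splitter.stop()    # record in the second register
    regs.command_position = command_position   # record in the third register
    regs.status_flag = CLEARED                 # signals the driver to proceed
    return True


regs = PreemptionRegisters(status_flag=PRESET_FLAG)
splitter = SplitterStub(total_groups=20000, groups_sent=137)
preempted = handle_preemption(regs, splitter, command_position=5)
```

In this model, clearing the flag is the point at which the software driver layer would place the address of the second task in the command queue register for the command master parser to read.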

[0009] In one embodiment, after recording the position of the current scheduling command in the sub-command sequence packet via a third preemption register, the method further includes:

[0010] The task counter updates its current value to obtain an updated value;

[0011] The command data storage cache receives return data from memory; wherein the return data carries the value recorded by the task counter;

[0012] If the task counter value carried in the return data is the same as the updated value, the return data is stored in the command data storage cache.

[0013] In one embodiment, the method further includes:

[0014] If the task counter value carried in the return data differs from the updated value, the return data is not stored in the command data storage cache.

[0015] In one embodiment, prior to receiving the returned data from memory via the command data storage cache, the method further includes:

[0016] The command data storage cache is cleared of all data belonging to the first task.
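The filtering described in paragraphs [0009] to [0016] might be modeled as follows. This is a sketch under assumptions: the class `CommandDataCache`, its method names, the increment-by-one update rule, and the 8-bit counter width are hypothetical; the text only states that the counter is updated and that return data carrying a different value is not stored.

```python
from dataclasses import dataclass, field


@dataclass
class CommandDataCache:
    task_counter: int = 0
    entries: list = field(default_factory=list)

    def switch_task(self):
        """Update the counter and drop everything cached for the old task."""
        self.task_counter = (self.task_counter + 1) % 256  # assumed 8-bit counter
        self.entries.clear()

    def on_return_data(self, tag, payload):
        """Store return data only if its tag matches the updated counter."""
        if tag != self.task_counter:
            return False  # stale read issued on behalf of the preempted task
        self.entries.append(payload)
        return True


cache = CommandDataCache()
stale_tag = cache.task_counter   # a read issued for the first task carries this tag
cache.entries.append("first-task command")
cache.switch_task()              # preemption: counter updated, cache cleared
fresh_tag = cache.task_counter
```

A read that was in flight when the switch happened returns with `stale_tag` and is dropped, so data of the first task cannot be mixed into the cache contents of the second task.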

[0017] In one embodiment, before clearing the status flag of the first preemption register via the command master parser, the method further includes:

[0018] The command master parser executes the stored command sequence pre-stored in the command data storage cache, and saves the state of the command stream parser that is executing the first task to the memory.

[0019] In one embodiment, the splitting status includes the preemption boundary type and the thread group progress; recording the splitting status in the second preemption register includes:

[0020] The second preemption register records the preemption boundary type as the thread group boundary, and records the thread group progress.

[0021] After reading the address of the second task from the command queue register via the command master parser and executing the second task, the method further includes:

[0022] The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register;

[0023] The command master parser obtains the preemption boundary type recorded in the second preemption register as the thread group boundary, and continues to execute the command sequence after the partial recovery command sequence in the recovery command sequence to restore the state of the command stream parser;

[0024] The command grabber retrieves the resubmitted current scheduling command from memory within the sub-command sequence packet of the first task;

[0025] The current scheduling command is obtained through the command master parser;

[0026] The command master parser passes the thread group progress, recorded as part of the splitting status in the second preemption register, to the scheduling command thread group splitter. The scheduling command thread group splitter determines the breakpoint from the thread group progress and resumes splitting and sending from that breakpoint.
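Resuming at a breakpoint, as in paragraphs [0024] to [0026], can be illustrated with a toy splitter. The sketch assumes that thread groups are sent in a fixed linear order and that the thread group progress is simply the number of groups already sent; both are assumptions made for illustration, as are the function name `split_and_send` and the `budget` parameter.

```python
def split_and_send(total_groups, start, budget):
    """Send thread groups start, start + 1, ... until done or out of budget.

    The budget stands in for the arrival of a stop signal. Returns the list of
    groups sent in this run and the overall progress (groups sent so far).
    """
    end = min(total_groups, start + budget)
    return list(range(start, end)), end


TOTAL = 10
# First run of the scheduling command: preempted after four thread groups.
first_run, progress = split_and_send(TOTAL, start=0, budget=4)
# Resubmission: the recorded progress is passed back as the breakpoint.
second_run, finished = split_and_send(TOTAL, start=progress, budget=TOTAL)
```

Because the second run starts exactly at the recorded progress, no thread group is sent twice and none is skipped.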

[0027] In one embodiment, each of the sub-command sequence packets includes a hardware / software synchronization command; the hardware / software synchronization command is located at the end of the sub-command sequence packet; the hardware / software synchronization command is used to instruct the command master parser to write the counting information into a specified area in the memory pointed to by the address information.

[0028] In one embodiment, the method further includes: when the state flag in the first preemption register is the preset flag, and the thread group splitting completion status of the current scheduling command is received from the scheduling command thread group splitter, and the splitting completion status is recorded through the second preemption register, the position of the current scheduling command in the sub-command sequence packet corresponding to the first task is recorded through the third preemption register, and the state flag in the first preemption register is cleared through the command master parser.

[0029] The command master parser reads the address of the second task from the command queue register and executes the second task.

[0030] In one embodiment, after reading the address of the second task from the command queue register via the command master parser and executing the second task, the method further includes:

[0031] The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register;

[0032] The command master parser obtains the preemption boundary type recorded in the second preemption register as the scheduling command boundary, and continues to execute the command sequence after the partial recovery command sequence in the recovery command sequence to restore the state of the command stream parser;

[0033] The command grabber retrieves the sub-command sequence packet of the first task corresponding to the resubmitted current scheduling command from memory.

[0034] The command master parser obtains the next command of the current scheduling command based on the position of the current scheduling command in the sub-command sequence packet recorded in the third preemption register, and processes the next command.

[0035] In one embodiment, the method further includes: when the status flag in the first preemption register is the preset flag and the current scheduling command is received through the command master parser, the command master parser records the position of the current scheduling command in the sub-command sequence packet of the first task in the second preemption register, and the command master parser clears the status flag of the first preemption register;

[0036] The command master parser reads the address of the second task from the command queue register and executes the second task.

[0037] In one embodiment, after reading the address of the second task from the command queue register via the command master parser and executing the second task, the method further includes:

[0038] The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register;

[0039] The command master parser obtains the preemption boundary type recorded in the second preemption register as the scheduling command boundary, and continues to execute the command sequence after the partial recovery command sequence in the recovery command sequence to restore the state of the command stream parser;

[0040] The command grabber retrieves the sub-command sequence packet of the first task corresponding to the resubmitted current scheduling command from memory.

[0041] The command master parser obtains the current scheduling command based on its position in the sub-command sequence packet as recorded in the third preemption register, and then processes the current scheduling command.

[0042] In one embodiment, when the status flag in the first preemption register is the preset flag, and the sub-command sequence packet containing the current scheduled command has been executed, the command master parser records the position of the sub-command sequence packet of the first task containing the current scheduled command in the second preemption register, and the command master parser clears the status flag of the first preemption register; after reading the address of the second task in the command queue and executing the second task through the command master parser, the method further includes:

[0043] The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register;

[0044] If the preemption boundary type recorded in the second preemption register is a direct memory access boundary, the command master parser processes the next sub-command sequence packet of the sub-command sequence packet containing the current scheduling command.
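The recovery paths above differ mainly in where the first task continues. A compact way to picture this is a function from the recorded boundary type to a resume point. The sketch below is an assumption-laden summary, not the claimed logic: `Boundary`, `resume_point`, and the `packet_index` argument are hypothetical, and the case in which the current scheduling command had not yet started (paragraphs [0035] to [0041]) is left out.

```python
from enum import Enum


class Boundary(Enum):
    THREAD_GROUP = "thread group boundary"
    SCHEDULING_COMMAND = "scheduling command boundary"
    DMA = "direct memory access boundary"


def resume_point(boundary, packet_index, command_position, thread_group_progress=0):
    """Return (packet index, command position, first thread group) to resume at."""
    if boundary is Boundary.THREAD_GROUP:
        # Same command, continue from the recorded thread group progress.
        return (packet_index, command_position, thread_group_progress)
    if boundary is Boundary.SCHEDULING_COMMAND:
        # Current command finished splitting: continue with the next command.
        return (packet_index, command_position + 1, 0)
    if boundary is Boundary.DMA:
        # Whole packet finished: continue with the next sub-command sequence packet.
        return (packet_index + 1, 0, 0)
    raise ValueError(f"unknown boundary type: {boundary!r}")
```

For example, a preemption recorded at a thread group boundary in command 5 of packet 2 with 137 groups already sent resumes at `(2, 5, 137)`.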

[0045] In a second aspect, this application also provides a task preemption method applied to the software driver layer of a processor, the method comprising:

[0046] When a second task is detected, a preset flag is written as a status flag to the first preemption register of the command stream parser. The command stream parser, upon finding the preset flag in the first preemption register while the current scheduling command of the first task is being sent to the scheduling command thread group splitter, sends a stop signal to the scheduling command thread group splitter via the command master parser and receives the splitting status of the thread group of the current scheduling command from the scheduling command thread group splitter. When the splitting status is recorded in the second preemption register, the position of the current scheduling command in the sub-command sequence packet is recorded in the third preemption register, and the status flag in the first preemption register is cleared by the command master parser. The preset flag indicates that the priority of the second task is higher than that of the first task; the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task.

[0047] When the status flag of the first preemption register is cleared, the address of the second task is sent to the command queue register of the command stream parser; wherein, the command stream parser is used to read the address of the second task in the command queue register through the command master parser and execute the second task.
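The driver-side sequence of paragraphs [0046] and [0047] amounts to set the flag, wait for it to clear, then submit the second task. The following sketch assumes a polling loop and a stub parser that clears the flag after a few polls; `ParserStub`, `request_preemption`, `max_polls`, and the flag encoding are hypothetical and only serve the illustration.

```python
PRESET_FLAG = 1  # hypothetical flag encoding
CLEARED = 0


class ParserStub:
    """Stand-in for the command stream parser; clears the flag after a delay."""

    def __init__(self, polls_until_clear):
        self.first_preemption_register = CLEARED
        self.command_queue_register = None
        self._remaining = polls_until_clear

    def tick(self):
        # Models the hardware finishing its state recording some time later.
        if self.first_preemption_register == PRESET_FLAG:
            self._remaining -= 1
            if self._remaining <= 0:
                self.first_preemption_register = CLEARED


def request_preemption(parser, second_task_address, max_polls=1000):
    """Driver-side steps: set the flag, wait for it to clear, submit the task."""
    parser.first_preemption_register = PRESET_FLAG
    for _ in range(max_polls):
        parser.tick()  # in a real system the hardware advances on its own
        if parser.first_preemption_register == CLEARED:
            parser.command_queue_register = second_task_address
            return True
    return False  # flag never cleared within the polling budget


parser = ParserStub(polls_until_clear=3)
submitted = request_preemption(parser, second_task_address=0x8000_0000)
```

The address of the second task is written only after the flag is observed as cleared, which is the ordering the text requires.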

[0048] In one embodiment, before reading the address of the second task from the command queue register via the command master parser and executing the second task, the method further includes:

[0049] The storage command sequence and the recovery command sequence are stored in memory; wherein, the storage command sequence is used to save the state of the command stream parser that is executing the first task to the memory when the preset flag is present; the storage command sequence corresponds to the recovery command sequence;

[0050] The memory addresses of the stored command sequence and the recovery command sequence are sent to the command stream parser; wherein, the command stream parser is used to store the stored command sequence and the recovery command sequence read according to the memory address through a command data storage cache.

[0051] In one embodiment, each of the sub-command sequence packets includes a hardware / software synchronization command; the method further includes:

[0052] A hardware / software synchronization command is inserted at the end of the sub-command sequence packet; the hardware / software synchronization command includes address information and count information; the hardware / software synchronization command is used to instruct the command master parser to write the count information into a specified area in the memory pointed to by the address information.
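The hardware / software synchronization command can be pictured as a trailing record that makes the parser write a count to a given address. In the sketch below, `SyncCommand`, `append_sync`, and `execute_packet` are hypothetical names, memory is modeled as a dictionary, and all other commands are treated as opaque.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncCommand:
    address: int  # address information: where the count is written
    count: int    # count information: the value to write


def append_sync(packet, address, count):
    """Driver side: place the synchronization command at the end of the packet."""
    return list(packet) + [SyncCommand(address, count)]


def execute_packet(packet, memory):
    """Parser side: on reaching the sync command, write the count to memory."""
    for command in packet:
        if isinstance(command, SyncCommand):
            memory[command.address] = command.count
        # Other commands are opaque in this sketch and are skipped.
    return memory


packet = append_sync(["dispatch A", "dispatch B"], address=0x100, count=7)
memory = execute_packet(packet, {})
```

Since the command sits at the end of the packet, the count appears in memory only after every preceding command in that packet has been processed, which is presumably how software can tell how far the hardware has progressed.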

[0053] In a third aspect, this application also provides a preemption method applied to a processor's scheduling command thread group splitter, comprising:

[0054] When the status flag in the first preemption register is a preset flag, and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter, the stop signal sent by the command master parser is received; wherein, the preset flag indicates that the priority of the second task is higher than the priority of the first task; the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task;

[0055] The thread group of the current scheduling command is stopped from being split according to the stop signal, and the splitting status of each thread group is fed back to the command master parser.

[0056] In one embodiment, the splitting status includes preemption boundary type and thread group progress; the step of feeding back the splitting status of each thread group to the command master parser includes:

[0057] The progress of each thread group and the preemption boundary type are fed back to the command master parser, wherein the progress of each thread group includes the splitting-completed status and the splitting-incomplete status.

[0058] In one embodiment, the method further includes:

[0059] When the thread group of the current scheduling command is in a splitting-completed state, the scheduling command thread group splitter feeds back the splitting-completed state and the preemption boundary type to the command master parser, which records the preemption boundary type as the scheduling command boundary.

[0060] In one embodiment, the method further includes:

[0061] When the thread group of the current scheduling command is in a splitting-incomplete state, the scheduling command thread group splitter feeds back the splitting-incomplete state, the status of the thread groups that have already been split, and the preemption boundary type to the command master parser, which records the preemption boundary type as the scheduling command thread group boundary.
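The two feedback cases of paragraphs [0059] and [0061] can be written as a single decision. The function name `feedback_on_stop`, the dictionary layout, and the use of a sent-group count as the "status of the thread groups already split" are assumptions made for this sketch.

```python
def feedback_on_stop(total_groups, groups_sent):
    """Splitter-side feedback to the command master parser after a stop signal."""
    if groups_sent >= total_groups:
        # Every thread group of the command was already split and sent.
        return {"complete": True, "boundary": "scheduling command boundary"}
    # Splitting was interrupted part-way through the command.
    return {
        "complete": False,
        "boundary": "scheduling command thread group boundary",
        "groups_sent": groups_sent,
    }
```

Only the incomplete case carries progress information, because a completed command needs no breakpoint when the first task is resumed.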

[0062] In a fourth aspect, this application also provides a command stream parser for task preemption, the command stream parser comprising:

[0063] The first preemption register is used to receive a preset flag indicating that the priority of the second task is higher than that of the first task;

[0064] The command master parser is configured to send a stop signal to the scheduling command thread group splitter when the status flag in the first preemption register is set to a preset flag and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter, and receive the splitting status of the thread group of the current scheduling command from the scheduling command thread group splitter; the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task.

[0065] The second preemption register is used to record the splitting state;

[0066] The third preemption register is used to record the position of the current scheduling command in the sub-command sequence packet;

[0067] The command master parser is also used to clear the status flag of the first preemption register;

[0068] The command master parser is also used to read the address of the second task in the command queue register and execute the second task; wherein the address of the second task is sent to the command queue register of the command stream parser after the software driver layer detects that the status flag has been cleared.

[0069] In a fifth aspect, this application also provides a preemption system, the system comprising a software driver layer, a command stream parser, and a scheduling command thread group splitter; the command stream parser comprises a first preemption register, a second preemption register, a third preemption register, a command master parser, and a command queue register;

[0070] The software driver layer is used to write a preset flag as a status flag to the first preemption register of the command stream parser when a second task is detected; wherein the preset flag indicates that the priority of the second task is higher than the priority of the first task;

[0071] The command master parser is configured to send a stop signal to the scheduling command thread group splitter when the status flag in the first preemption register is a preset flag and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter; wherein, the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task;

[0072] The scheduling command thread group splitter is used to receive a stop signal sent by the command master parser, stop splitting the thread group of the current scheduling command according to the stop signal, and feed back the splitting status of each thread group of the current scheduling command to the command master parser.

[0073] The command master parser is also used to receive the splitting status of the thread group of the current scheduling command fed back by the scheduling command thread group splitter;

[0074] The second preemption register is used to record the splitting state;

[0075] The third preemption register is used to record the position of the current scheduling command in the sub-command sequence packet;

[0076] The command master parser is used to clear the status flag of the first preemption register;

[0077] The software driver layer is also used to send the address of the second task to the command queue register of the command stream parser when it is detected that the status flag of the first preemption register is cleared;

[0078] The command master parser is used to read the address of the second task in the command queue register and execute the second task.

[0079] In a sixth aspect, this application also provides a chip including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the above-described method.

[0080] In a seventh aspect, this application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method described above.

[0081] In an eighth aspect, this application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described method.

[0082] In a ninth aspect, this application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the above-described method.

[0083] In the aforementioned task preemption method, command stream parser, system, chip, computer device, computer-readable storage medium, and computer program product, three sets of functionally separated dedicated registers respectively undertake preemption signal synchronization, command stream jump control, and precise progress recording. They work in coordination with the command stream parser and the scheduling command thread group splitter, enabling the command stream parser to identify, respond to, and record thread group preemption occurring within a scheduling command. When a second task preempts, the thread group processing of the first task can be stopped immediately, without waiting for the current scheduling command of the first task, or for the sub-command sequence packet containing it, to complete before executing the second task. This significantly reduces waiting time and improves processing efficiency.

Description of the Drawings

[0084] To more clearly illustrate the technical solutions in the embodiments of this application or related technologies, the drawings used in the description of the embodiments of this application or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0085] Figure 1 is a diagram illustrating the application environment of the task preemption method in one embodiment;

[0086] Figure 2 is a schematic diagram of the command stream parser in one embodiment;

[0087] Figure 3 is a flowchart illustrating a task preemption method in one embodiment;

[0088] Figure 4 is a flowchart illustrating the task preemption method in another embodiment;

[0089] Figure 5 is a flowchart illustrating the process of filtering data using a task counter in one embodiment;

[0090] Figure 6 is a schematic diagram of the state at time T0 in one embodiment;

[0091] Figure 7 is a schematic diagram of the state at time T1 in one embodiment;

[0092] Figure 8 is a schematic diagram of the state at time T2 in one embodiment;

[0093] Figure 9 is a schematic diagram of the state at time T3 in one embodiment;

[0094] Figure 10 is a schematic diagram of the storage command sequence used during preemption in one embodiment;

[0095] Figure 11 is a flowchart illustrating a preemption process that occurs while the current scheduling command is executing in one embodiment;

[0096] Figure 12 is a schematic diagram of the recovery command sequence used during resubmission in one embodiment;

[0097] Figure 13 is a flowchart illustrating the resubmission process in one embodiment;

[0098] Figure 14 is a schematic diagram illustrating the insertion of hardware / software synchronization commands in one embodiment;

[0099] Figure 15 is a flowchart illustrating a preemption method that occurs when the current scheduling command has completed execution in one embodiment;

[0100] Figure 16 is a flowchart illustrating a preemption method that occurs when the current scheduling command has not yet been executed in one embodiment;

[0101] Figure 17 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

[0102] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0103] Microsoft introduced the concept of GPU task preemption in Direct3D and supports querying, via DXGI, the preemption granularity supported by the GPU hardware. This helps applications understand the current preemption capabilities of the GPU and further optimize task distribution strategies. The preemption granularity levels supported by the GPU, from coarse to fine, are preemption at the boundary of an entire DMA command buffer, at the boundary of a single compute dispatch, at the boundary of a thread group (Warp / Wavefront), at the boundary of a single thread, and at the boundary of a single instruction. A DMA buffer contains data that will be consumed by one or more subsequent dispatch commands. A dispatch command (such as a CUDA kernel launch) specifies that thousands of threads be started, organized into many thread groups. A thread group consists of multiple threads. Preemption occurring at the boundary of a DMA command buffer is classified as DMA buffer level; preemption occurring at the boundary of a dispatch command is classified as dispatch command level; and preemption occurring at the boundary of a thread group is classified as thread group boundary level.
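The five granularity levels can be written down as an ordered enumeration. The member values below mirror the DXGI enumeration `DXGI_COMPUTE_PREEMPTION_GRANULARITY`; the Python class name and the helper `at_least_as_fine` are hypothetical and exist only for this illustration.

```python
from enum import IntEnum


class ComputePreemptionGranularity(IntEnum):
    """Coarse-to-fine levels; a larger value means a finer boundary."""
    DMA_BUFFER_BOUNDARY = 0
    DISPATCH_BOUNDARY = 1
    THREAD_GROUP_BOUNDARY = 2
    THREAD_BOUNDARY = 3
    INSTRUCTION_BOUNDARY = 4


def at_least_as_fine(reported, wanted):
    """True if the reported granularity is as fine as, or finer than, wanted."""
    return reported >= wanted
```

With this ordering, hardware that reports `THREAD_GROUP_BOUNDARY` satisfies a request for dispatch-level preemption but not for instruction-level preemption.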

[0104] To facilitate the scheduling and preemption of multiple tasks across the GPU, Direct3D 12 requires users to fill in the priority of the current task when creating a command queue. The priorities, from low to high, are "normal priority", "high priority" and "global real-time priority".

[0105] The task preemption method provided in the embodiments of this application can be applied to, for example, the application environment shown in Figure 1, which includes the software driver layer, command stream parser, memory resources, graphics pipeline, and general-purpose computing pipeline. Figure 1 illustrates the graphics pipeline and the general-purpose computing pipeline separately, but to limit hardware overhead and adapt to user scenarios, they often share the same hardware resources at certain nodes: for example, the front-end shader in the graphics pipeline and the compute shader in the general-purpose computing pipeline share stream processors; they also share on-chip caches, texture units, etc. The software driver layer schedules the different tasks and selects one to place into the "GPU task" area of the memory resources. Meanwhile, the command stream parser reads the corresponding commands from the "GPU task" and parses them. The GPU can therefore be pictured as running only the graphics pipeline for a period of time and then switching to the general-purpose computing pipeline for another period. Suppose the GPU is currently running a general-purpose computing task and the software driver layer has already issued several complete compute packets (each corresponding to a direct memory access buffer (DMA buffer)). When the user then issues a desktop rendering task, a preemption mechanism must be activated to preserve the user experience, allowing the desktop rendering task to take over all GPU resources as quickly as possible. In most computing scenarios, each compute packet may contain several dispatch commands, and each dispatch command may contain many thread groups. If preemption is honored only at the DMA level, its timeliness in these scenarios will be very poor. Therefore, finer-grained "thread group level" preemption needs to be considered.
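A rough worked example shows why granularity matters. The figures used here (8 dispatch commands per packet, 20,000 thread groups per dispatch command, 64 thread groups in flight) are invented for illustration, as is the function `worst_case_pending_groups`; the model counts how many thread groups of the low-priority task may still have to finish before the switch in the worst case.

```python
def worst_case_pending_groups(dispatches_per_packet, groups_per_dispatch, groups_in_flight):
    """Thread groups that may still need to finish before a switch, per level."""
    return {
        # Must drain every dispatch command in the current DMA buffer.
        "dma_buffer": dispatches_per_packet * groups_per_dispatch,
        # Must drain the current dispatch command only.
        "dispatch": groups_per_dispatch,
        # Must drain only the groups already handed to the compute units.
        "thread_group": groups_in_flight,
    }


pending = worst_case_pending_groups(
    dispatches_per_packet=8, groups_per_dispatch=20_000, groups_in_flight=64
)
```

Under these assumed numbers the worst case drops from 160,000 thread groups at DMA level to 20,000 at dispatch level and 64 at thread group level.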

[0106] Figure 2 shows the specific hardware structure of the command stream parser in Figure 1. The command stream parser includes P1. Register Pool, P2. Command Grabber, P3. Command / Data Storage Cache, P4. Command Pre-parser, P5. Command Master Parser, and P6. Task Counter. Hardware modifications are concentrated in the "Command Stream Parser" (specifically, "P1. Register Pool," "P3. Command / Data Storage Cache," "P4. Command Pre-parser," "P5. Command Master Parser," and "P6. Task Counter"). The bolded sections represent improvements made to the command stream parser to support the task preemption method described in this application. Corresponding adjustments were also made to "P7. Scheduling Command Thread Group Splitter," which interacts with the command stream parser.

[0107] P1. Register Pool: Records register information for all required low-priority tasks, including but not limited to the RB (ring buffer) register. The RB register is used to describe... Figure 1 The specific location and size of the "GPU task" in memory resources.

[0108] P2. Command Grabber: Detects the RB register in P1 to determine if a new task has been issued by the software driver layer. If so, it begins reading the corresponding command data from memory resources based on the RB register in P1.

[0109] P3. Command / Data Storage Cache: Used to cache command streams read back from external storage and some indirect data required by commands.

[0110] P4. Command Pre-parser: For commands that use indirect data, after allocating the corresponding space for this indirect data in P3, P4 sends a corresponding read request to the external memory. The read data is stored in P3. Each "command pre-parser" is used to pre-parse a type of command, including but not limited to register configuration commands, plotting commands, general calculation commands, resource copy commands, etc.

[0111] P5. Command Master Parser: Used to parse commands one by one and output the corresponding command and register information to subsequent pipelines. Indirect data needed during command parsing is read directly from P3. Each "Command Master Parser" is used to parse a type of command, including but not limited to register configuration commands, plotting commands, general calculation commands, resource copy commands, etc.

[0112] P6. Task Counter: Updated only during task switching to discard useless data returned from memory resources.

[0113] P7. Scheduling Command Thread Group Splitter: Used to split thread groups in scheduling commands and send different thread groups to different computing units according to the scheduling algorithm.

[0114] In one exemplary embodiment, as shown in Figure 3, a task preemption method is provided. Taking its application to the command stream parser in Figure 1 or Figure 2 as an example, the method includes the following steps S302 to S306:

[0115] In step S302, when the status flag in the first preemption register is a preset flag and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter, the command master parser sends a stop signal to the scheduling command thread group splitter and receives the splitting status of the current scheduling command thread group from the scheduling command thread group splitter.

[0116] The preset flag indicates that the priority of the second task is higher than that of the first task.

[0117] The current scheduling command is the currently executing scheduling command in the sub-command sequence package of the first task. For example, the first task includes three sub-command sequence packages, namely DMA1, DMA2 and DMA3. Each sub-command sequence package includes one or more sub-scheduling commands, and each sub-scheduling command includes multiple thread groups.

[0118] The first preemption register, Preemption_Reg, is used to synchronize the state between the command stream parser and the software driver layer. The software driver layer notifies the command stream parser that preemption has occurred by writing a preset flag to the first preemption register. Once the command stream parser is ready, it also informs the software driver layer through the first preemption register that it can begin issuing the second preempted task.

[0119] Optionally, suppose the command stream parser is executing a low-priority first task, such as the first sub-command sequence packet of a background AI inference job, and the software driver layer detects that a new frame needs to be rendered immediately (a high-priority real-time rendering task). The software driver layer informs the command stream parser of the preemption through the first preemption register (Preemption_Reg) by changing its status flag to the preset flag, for example 1. The command stream parser executes the first sub-command sequence packet (including the current scheduling command) through the first dedicated command master parser, which is one of the command master parsers, such as command master parser 4 shown in Figure 2.

[0120] After the status flag in the first preemption register has been set to the preset flag and the first dedicated command master parser has sent the current scheduling command of the first task to the scheduling command thread group splitter, but before the scheduling command thread group splitter has reported completion, the current scheduling command has started execution but has not yet finished. This is the "Hardware: Has Preemption Occurred (2)" stage shown in Figure 4. The first dedicated command master parser sends a stop signal to the scheduling command thread group splitter. After receiving the stop signal, the scheduling command thread group splitter immediately stops splitting and feeds back the splitting status of the thread groups of the current scheduling command, which the first dedicated command master parser receives.

[0121] Step S304: If the split status is recorded through the second preemption register, the position of the current scheduling command in the sub-command sequence packet is recorded through the third preemption register, and the status flag of the first preemption register is cleared through the command master parser.

[0122] The second preemption register, Preemption_Reg, is used to record the splitting state when preemption occurs, such as the specific boundary (DMA boundary or scheduling command boundary) and the specific completion state under each boundary. This facilitates the restoration of the command stream parser's state during subsequent resubmissions, allowing the unfinished parts of the first task to be completed.

[0123] The third preemption register, Preemption_Reg, is used to record all commands belonging to the first task that need to be skipped. When preemption occurs, these commands belonging to the first task are skipped, and the second task is started directly.

[0124] Optionally, the first dedicated command parser updates the split status of the second preemption register Preemption_Reg; if the split status is recorded through the second preemption register, the position of the current scheduling command in the subcommand sequence packet is recorded through the third preemption register, and the status flag of the first preemption register is cleared through the first dedicated command parser to inform the software driver layer to prepare for the execution of high-priority tasks.

[0125] Step S306: Read the address of the second task in the command queue register through the command master parser and execute the second task.

[0126] The address of the second task is sent to the command queue register of the command stream parser after the software driver layer detects that the status flag has been cleared.

[0127] Optionally, after the software driver layer detects that the status flag of the first preemption register has been cleared, it updates the second task in the command stream parser: the software driver layer sends the memory address of the second task, such as the address of a direct memory access buffer, to the command queue register of the command stream parser. At this time, the command stream parser receives the memory address of the second task through the command queue register. The first dedicated command master parser then reads and executes the second task based on its memory address.
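Steps S302 to S306 can be sketched as a small software model of the handshake between the software driver layer, the command master parser, and the thread group splitter. This is a minimal sketch under stated assumptions, not the hardware design: the class and field names (`Splitter`, `Parser`, `flag_reg`, `status_reg`, `position_reg`), the use of 0 as the cleared flag value, and the address `0x8000` are all hypothetical.

```python
from dataclasses import dataclass

PRESET_FLAG = 1  # example value given in the text; 0 is assumed to mean "cleared"


@dataclass
class Splitter:
    """Stand-in for P7: issues thread groups until told to stop."""
    total_groups: int
    issued: int = 0
    stopped: bool = False

    def issue(self, n):
        if not self.stopped:
            self.issued = min(self.total_groups, self.issued + n)

    def stop(self):
        self.stopped = True
        return self.issued  # split status: thread groups issued so far


@dataclass
class Parser:
    """Stand-in for the command master parser and its registers."""
    flag_reg: int = 0                  # first preemption register
    status_reg: object = None          # second preemption register
    position_reg: object = None        # third preemption register
    command_queue_reg: object = None   # address of the next task

    def preempt(self, splitter, cmd_position):
        if self.flag_reg != PRESET_FLAG:
            return False
        self.status_reg = splitter.stop()   # S302: stop signal, receive status
        self.position_reg = cmd_position    # S304: record command position
        self.flag_reg = 0                   # S304: clear the status flag
        return True


splitter = Splitter(total_groups=1000)
parser = Parser()
splitter.issue(200)                    # first task is partway through
parser.flag_reg = PRESET_FLAG          # driver requests preemption
parser.preempt(splitter, cmd_position=("DMA1", 0))
if parser.flag_reg == 0:               # driver observes the cleared flag
    parser.command_queue_reg = 0x8000  # hypothetical second-task address (S306)
splitter.issue(100)                    # ignored: splitting has stopped
```

The model keeps the three registers as separate fields to mirror the separation of duties in the text: one for synchronization, one for progress, and one for position.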

[0128] It should be noted that while the command master parser executes the second task, P4, the command pre-parser, pre-parses the second task in parallel. For example, if there are x commands in the second task, where x is a positive integer, the command pre-parser pre-parses each command sequentially; only commands that have been pre-parsed can be further executed by the command master parser, which parses the pre-parsed commands in the same order. The command master parser and the command pre-parser thus work in parallel. For example, at time T, the command pre-parser pre-parses the y-th command while the command master parser parses the z-th command, where y is greater than z, both y and z are less than x, and both y and z are positive integers.
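The ordering constraint between the two parsers can be illustrated with a tick-by-tick model. The sketch below is an assumption-laden simplification: it supposes that each parser handles one command per tick and that a command becomes available to the command master parser on the tick after it was pre-parsed. The function name `run_pipeline` is hypothetical.

```python
def run_pipeline(x):
    """Return a list of (y, z) pairs, one per tick.

    y is the command being pre-parsed in that tick, z the command being
    parsed by the master parser; None means that stage is idle.
    """
    pre_parsed = 0  # commands fully pre-parsed so far
    parsed = 0      # commands fully parsed by the master parser so far
    trace = []
    while parsed < x:
        y = pre_parsed + 1 if pre_parsed < x else None
        # The master parser may only take a command that is already pre-parsed.
        z = parsed + 1 if parsed < pre_parsed else None
        trace.append((y, z))
        if y is not None:
            pre_parsed += 1
        if z is not None:
            parsed += 1
    return trace


trace = run_pipeline(4)
print(trace)
```

Whenever both stages are busy in the same tick, the pre-parser index y is ahead of the master parser index z, which is the relationship the paragraph describes.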

[0129] In the above task preemption method, three sets of dedicated registers with separate functions are used to handle preemption signal synchronization, command stream jump control, and precise progress recording, respectively. These registers work in conjunction with the command stream parser and the scheduling command thread group splitter, enabling the command stream parser to identify, respond to, and record thread group preemption occurring within a scheduling command. When the second task preempts, the thread group processing the first task can be immediately stopped, without waiting for the sub-command sequence packet containing the current scheduling command of the first task or for the current scheduling command to complete before executing the second task. This significantly reduces waiting time and improves processing efficiency.

[0130] In one exemplary embodiment, data is filtered by a task counter. As shown in Figure 5, after the position of the current scheduling command in the sub-command sequence packet is recorded through the third preemption register, the method further includes steps S502 to S510:

[0131] Step S502: Update the current value using the task counter to obtain the updated value.

[0132] Step S504: Clear all data belonging to the first task from the command data storage cache.

[0133] Step S506: Receive the return data returned from memory through the command data storage cache. The return data carries the value recorded by the task counter.

[0134] Step S508: If the value recorded by the task counter carried in the returned data is the same as the updated value, then the returned data is stored in the command data storage cache.

[0135] In step S510, if the value of the task counter recorded in the returned data is different from the updated value, the returned data is not stored in the command data storage cache.

[0136] The "P6. Task Counter" is initialized to 0, and its value is incremented by 1 each time a new task preempts. The counter value is sent out with each read request and is brought back together with the read data for comparison with the local value. If they match, the read data is stored in the "P3. Command / Data Storage Cache"; otherwise, it is discarded. Before the command stream parser begins executing the high-priority task (the second task), all data in the "P3. Command / Data Storage Cache" must be cleared to avoid affecting the execution of the second task.
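The filtering in steps S502 to S510 amounts to tagging each read request with the counter value and comparing the tag when the data returns. The following Python sketch models that behaviour; the class name `CommandDataCache`, the request dictionary layout, and the addresses are illustrative assumptions rather than details from the application.

```python
class CommandDataCache:
    """Toy model of P3 guarded by the P6 task counter."""

    def __init__(self):
        self.task_counter = 0  # P6 is initialized to 0
        self.entries = []      # data accepted into P3

    def issue_read(self, address):
        # The read request carries the counter value current at issue time.
        return {"address": address, "tag": self.task_counter}

    def on_preemption(self):
        self.task_counter += 1  # S502: update the counter
        self.entries.clear()    # S504: drop all first-task data

    def on_return(self, request, data):
        if request["tag"] == self.task_counter:  # S508: tag matches, keep
            self.entries.append((request["address"], data))
            return True
        return False                             # S510: stale, discard


cache = CommandDataCache()
stale = cache.issue_read(0x100)  # issued by the pre-parser for the first task
cache.on_preemption()            # the second task preempts
fresh = cache.issue_read(0x200)  # issued for the second task
stale_kept = cache.on_return(stale, "first-task data")
fresh_kept = cache.on_return(fresh, "second-task data")
```

Tagging requests avoids having to cancel reads that are already travelling through the memory system: late data simply fails the comparison when it arrives.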

[0137] The principle behind setting up the task counter is as follows: Considering that "P4. Command Pre-parser" and "P5. Command Main Parser" execute in parallel within the command stream parser, the progress of "P4. Command Pre-parser" may be ahead. When the second task needs to preempt the hardware resources of the first task, the software driver layer notifies the hardware through the first preemption register, Preemption_Reg. After "P5. Command Main Parser" finds the most recent scheduled command in the first task, it begins preparing to switch to the second task. At this time, "P4. Command Pre-parser" has already executed some commands belonging to the first task and issued a batch of read requests. For these read requests, the command stream parser will not parse the corresponding commands, so when the data returns, the command stream parser needs to discard them directly.

[0138] Figure 6 shows a specific usage example of "P6. Task Counter". At time T0: "P6. Task Counter" is initialized to 0; "P3. Command / Data Storage Cache" is completely empty; the main parser and pre-parser begin preparing to execute the first task.

[0139] As shown in Figure 7, at time T1: the software notifies the command stream parser of a preemption event via the first preemption register Preemption_Reg, and the command stream parser begins preparing for preemption. At this time, "P5. Command Master Parser" is parsing the scheduling commands in the first task's DMA buffer1 and, upon seeing the preemption signal, immediately notifies "P7. Scheduling Command Thread Group Splitter" to feed back its current status. Simultaneously, "P6. Task Counter" increments by one, and from this moment on the "Command Stream Parser" accepts only read data tagged with "P6. Task Counter = 1" and writes it to "P3. Command / Data Storage Buffer". At this time, "P4. Command Pre-parser" is already pre-parsing the first task's DMA buffer2. "P3. Command / Data Storage Buffer" contains the data belonging to the first task's DMA buffer1 that has not yet been used by "P5. Command Master Parser", as well as some data belonging to the first task's DMA buffer2 read back by "P4. Command Pre-parser". Due to circuit delay, some data belonging to the first task's DMA buffer2 is still in transit and has not yet been returned to the command stream parser.

[0140] As shown in Figure 8, at time T2: "Command Master Parser 4" stores the information fed back from "P7. Scheduling Command Thread Group Splitter" into the corresponding register, then starts to execute the pre-prepared "Storage Command Sequence" and clears "P3. Command / Data Storage Buffer". The read data belonging to the first task's DMA buffer2 that returned between T1 and T2 is all discarded after comparison with "P6. Task Counter".

[0141] As shown in Figure 9, at time T3: the command stream parser completes the "store command sequence" and begins executing the second task. The read data belonging to the first task's DMA buffer2 that returned during the interval T2 to T3 is all discarded after comparison with "P6. Task Counter". At this time, "P3. Command / Data Storage Buffer" is still completely empty and is ready to receive commands / data from the second task.

[0142] In this embodiment, by setting the P6 task counter, the execution of incoming tasks is prevented from being affected by the data of low-priority tasks due to delays in various stages of the hardware pipeline and circuitry, thus reducing the risk of hardware crashes.

[0143] In an exemplary embodiment, before clearing the state flag of the first preemption register by the command master parser, the method further includes: executing a stored command sequence pre-stored in the command data storage cache by the command master parser, and saving the state of the command stream parser that is executing the first task into memory.

[0144] The stored command sequence, shown in Figure 10, includes the command stream parser's state storage commands and the state storage commands of the remaining hardware modules. The stored command sequence is pre-stored in the P3 command data storage cache. It is used to record the execution state of the command stream parser and the remaining hardware modules at the moment the first task is preempted.

[0145] When a second task preempts a first task, the preempted first task must be able to continue executing its unfinished commands upon subsequent resubmission, so the command stream parser needs to execute a storage command before switching to the second task. This storage command records the execution state of the first task at the moment of preemption. The content of this storage command is shown in Figure 10. To improve preemption efficiency, the command stream parser pre-fetches this storage command and places it in the "Stored Command Sequence" position in "P3. Command / Data Storage Cache". When preemption occurs, it can be used directly without spending extra time fetching.

[0146] It should be noted that the contents of the first, second, and third preemption registers are saved to memory together when the "store command sequence" is parsed.

[0147] Optionally, as shown in Figure 5, after filtering data through the task counter and before clearing the status flag of the first preemption register through the command master parser, the command stream parser executes the stored command sequence. This includes: executing the stored command sequence pre-stored in the P3 command data storage cache through the first dedicated command master parser, and saving the state of the command stream parser that is executing the first task to memory.

[0148] In this embodiment, the pre-stored sequence of storage commands can improve preemption efficiency.

[0149] In one exemplary embodiment, as shown in Figure 11, the split state includes the preemption boundary type and the thread group progress. Recording the split state through the second preemption register includes step S1102: recording, through the second preemption register, the thread group boundary as the preemption boundary type, together with the thread group progress.

[0150] After reading the address of the second task from the command queue register through the command master parser and executing the second task, the method further includes steps S1104 to S1112:

[0151] Step S1104: The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register.

[0152] The recovery command sequence, shown in Figure 12, corresponds to the aforementioned stored command sequence. The partial recovery command sequence is one part of the command stream parser state recovery commands in Figure 12 and is used to restore part of the state of the command stream parser.

[0153] The split status includes preemption boundary type and thread group progress; it can also be the identifier of the current scheduling command, such as ID. For example, if the identifier of the current scheduling command is scheduling command 1 (the scheduling command in the DMA buffer of the first task), the preemption boundary type is thread group boundary; scheduling command 1 includes 1000 threads, and the thread group progress is that threads 1-199 have completed, and threads 200-1000 have not completed. Another example: if the identifier of the current scheduling command is scheduling command 1 (the scheduling command in the DMA buffer of the first task), the preemption boundary type is scheduling command boundary; scheduling command 1 includes 1000 threads, and the thread group progress is that threads 1-1000 have not completed.

[0154] Preemption boundary types can include thread group boundaries, scheduling command boundaries, and DMA boundaries (granularity from smallest to largest).
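The split status described above can be written down as a small record. The sketch below is one possible encoding, not the register layout of the application: the numeric values of `Boundary`, the field names, and the helper `remaining` are assumptions. The two instances reproduce the two examples given in the text for scheduling command 1.

```python
from dataclasses import dataclass
from enum import IntEnum


class Boundary(IntEnum):
    """Preemption boundary types, ordered from finest to coarsest."""
    THREAD_GROUP = 0
    SCHEDULING_COMMAND = 1
    DMA = 2


@dataclass(frozen=True)
class SplitStatus:
    command_id: int     # identifier of the current scheduling command
    boundary: Boundary  # where the preemption took effect
    total: int          # number of units in the scheduling command
    completed: int      # units 1..completed have finished

    def remaining(self):
        """Units that still have to run when the task is resubmitted."""
        return range(self.completed + 1, self.total + 1)


# First example from the text: units 1-199 done, 200-1000 outstanding.
at_thread_group = SplitStatus(1, Boundary.THREAD_GROUP, total=1000, completed=199)
# Second example: preempted at the command boundary, nothing has completed.
at_command = SplitStatus(1, Boundary.SCHEDULING_COMMAND, total=1000, completed=0)

print(at_thread_group.remaining()[0], len(at_thread_group.remaining()))
print(at_command.remaining()[0], len(at_command.remaining()))
```

Using an ordered enumeration makes the "granularity from smallest to largest" relationship directly comparable, for example `Boundary.THREAD_GROUP < Boundary.DMA`.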

[0155] When the software driver layer resubmits the incomplete first task to the command stream parser, the command stream parser first needs to execute a portion of the recovery command sequence to restore the state from before the first task was interrupted, followed where required by the remaining part of the recovery command sequence in Figure 12 and the remaining hardware module state recovery commands, before it executes the unfinished commands of the first task.

[0156] Step S1106: Obtain the preemption boundary type recorded in the second preemption register as the thread group boundary through the command master parser, and continue to execute the command sequence after the partial recovery command sequence in the recovery command sequence to restore the state of the command stream parser.

[0157] The command sequence following the partial recovery command sequence consists of the remaining part of the command stream parser state recovery commands in Figure 12, together with the remaining hardware module state recovery commands.

[0158] Optionally, both this embodiment and the aforementioned embodiments involve preemption occurring between the time "command master parser 4" sends the scheduling command to "P7. scheduling command thread group splitter" and the time "P7. scheduling command thread group splitter" reports completion, i.e., "thread group-level preemption". Therefore, as shown in Figure 13, when the preemption boundary type recorded in the second preemption register is the thread group boundary, the first dedicated command master parser continues to execute the command sequence after the partial recovery command sequence in the recovery command sequence, so as to restore the complete state the command stream parser had when it was executing the first task.

[0159] Optionally, when executing the "command stream parser state recovery command", if the previous preemption boundary (recorded in the second preemption register Preemption_Reg) is found to be a scheduling command boundary, the command stream parser needs to continue parsing the subsequent recovery commands in the recovery command sequence to restore the state of the entire GPU; however, if the previous preemption boundary is found to be a DMA boundary, in order to improve efficiency, the subsequent recovery commands in the recovery command sequence can be skipped directly.

[0160] Similar to "Store Command Sequence", the command stream parser will retrieve this recovery command in advance and place it in the "Recover Command Sequence" position in "P3. Command / Data Storage Cache". When a resubmission occurs, it can be used directly without spending extra time retrieving it.

[0161] Prioritizing the parsing of "command stream parser state recovery commands" during resubmission can guide whether subsequent commands in the "recovery command sequence" are executed, further improving the efficiency of resubmission.
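The decision described in the preceding paragraphs, where the recorded boundary type determines how much of the recovery command sequence runs on resubmission, can be sketched as a simple lookup. The string labels and the function name `recovery_plan` are hypothetical; the branch behaviour follows the text: thread group and scheduling command boundaries continue with the rest of the sequence, while a DMA boundary skips it.

```python
def recovery_plan(boundary):
    """Return the recovery steps to run for a recorded preemption boundary.

    boundary -- "thread_group", "scheduling_command", or "dma"
    """
    # The first part always runs: it restores the second and third
    # preemption registers so the boundary type can be inspected at all.
    plan = ["restore second and third preemption registers"]
    if boundary in ("thread_group", "scheduling_command"):
        # The interrupted state is needed again, so the rest of the
        # recovery command sequence is executed.
        plan.append("restore remaining command stream parser state")
        plan.append("restore remaining hardware module state")
    elif boundary == "dma":
        # Preempted between DMA buffers: the rest can be skipped.
        pass
    else:
        raise ValueError(f"unknown boundary type: {boundary}")
    return plan


for kind in ("thread_group", "scheduling_command", "dma"):
    print(kind, len(recovery_plan(kind)))
```

Running the register-restoring part first is what lets the parser decide cheaply whether the longer remainder is needed.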

[0162] Step S1108: The command grabber retrieves from memory the resubmitted sub-command sequence packet of the first task in which the current scheduling command is located.

[0163] Continuing with Figure 13, the command stream parser retrieves from memory, through the command grabber, the resubmitted sub-command sequence packet (DMA buffer) of the first task that contains the current scheduling command, but does not yet execute the commands in the sub-command sequence packet.

[0164] Step S1110: The command master parser obtains the current scheduling command based on the position of the current scheduling command in the sub-command sequence packet recorded in the third preemption register.

[0165] Since this embodiment involves thread group-level preemption, the current scheduling command has not yet been completed. The first dedicated command master parser therefore needs to cooperate with P7, the scheduling command thread group splitter, to complete the unfinished thread groups of the current scheduling command in the sub-command sequence packet of the first task. Through the first dedicated command master parser, the command stream parser skips the commands in the sub-command sequence packet that precede the position recorded in the third preemption register until it reaches the current scheduling command.

[0166] Step S1112: The thread group progress recorded in the splitting state of the second preemption register is passed to the scheduling command thread group splitter through the command master parser.

[0167] The scheduling command thread group splitter is used to determine the breakpoint based on the thread group progress and resume the splitting and sending work at the breakpoint.

[0168] Optionally, the command stream parser obtains the thread group progress in the recorded split state from the second preemption register through the first dedicated command master parser, and passes the thread group progress to the scheduling command thread group splitter, which then starts to reallocate computing units based on the incomplete thread groups (i.e., breakpoints).
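Steps S1110 and S1112 can be sketched as follows: commands before the recorded position are skipped, the interrupted command resumes after its last completed thread group, and later commands run from the start. This is a minimal sketch; the packet layout, the 0-based `position`, and the name `resume_work` are assumptions for illustration.

```python
def resume_work(packet, position, completed_groups):
    """Plan the work left in a resubmitted sub-command sequence packet.

    packet           -- list of (command_name, total_thread_groups)
    position         -- index of the interrupted scheduling command
                        (what the third preemption register records)
    completed_groups -- thread groups of that command already finished
                        (the progress kept in the second preemption register)

    Returns a list of (command_name, first_group, last_group), 1-based.
    """
    work = []
    for index, (name, total) in enumerate(packet):
        if index < position:
            continue  # already executed before preemption: skip
        # Only the interrupted command restarts at the breakpoint.
        first = completed_groups + 1 if index == position else 1
        work.append((name, first, total))
    return work


# Hypothetical DMA buffer with three scheduling commands.
packet = [("dispatch0", 500), ("dispatch1", 1000), ("dispatch2", 300)]
plan = resume_work(packet, position=1, completed_groups=199)
print(plan)
```

With the numbers above, `dispatch0` is skipped, `dispatch1` resumes at thread group 200, and `dispatch2` runs in full.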

[0169] In this embodiment, on the one hand, the practice of prefetching the "store command sequence" needed when preemption occurs and the "recovery command sequence" needed when resubmitting occurs, and storing them in the local cache, can further improve the efficiency of preemption / resubmitting; on the other hand, the split status recorded by the second preemption register, which includes the thread group progress and preemption boundary type, can support thread group-level preemption, resulting in higher preemption efficiency in actual application scenarios.

[0170] In an exemplary embodiment, each subcommand sequence packet includes a hardware / software synchronization command; the hardware / software synchronization command is located at the end of the subcommand sequence packet; the hardware / software synchronization command is used to instruct the command master parser to write the counting information into a specified area in the memory pointed to by the address information.

[0171] The command master parser also includes a second dedicated command master parser, such as the "Command Master Parser 5" shown in Figure 2.

[0172] After a preemption occurs, the software driver layer needs to know how far the hardware has progressed through Task 1 so that, after Task 2 finishes, it can resubmit only the unfinished DMA buffers of Task 1 to the hardware. For this purpose, the software driver layer is required to insert a new "software-hardware synchronization command" at the end of each DMA buffer. As shown in Figure 14, hardware / software synchronization command 1 is inserted after DMA buffer1, hardware / software synchronization command 2 is inserted after DMA buffer2, and hardware / software synchronization command 3 is inserted after DMA buffer3.

[0173] Optionally, the hardware / software synchronization command includes address information and counter information. The address information is identical for all synchronization commands of the same task, and each counter value corresponds to one sub-command sequence packet, increasing with the number of sub-command sequence packets. For ease of memory management, the address information carried by the hardware / software synchronization commands of different DMA buffers of the same task can be consistent, while the counter records the number of DMA buffers that have been sent. For example, in Figure 14, "DMA buffer1~3" correspond to the same computation task, so the addresses of "hardware / software synchronization commands 1~3" are the same; "DMA buffer1" is the first DMA buffer of this computation task, so "counter = 1" in the corresponding "hardware / software synchronization command 1"; for "DMA buffer2" and "DMA buffer3", the counters in "hardware / software synchronization command 2" and "hardware / software synchronization command 3" are 2 and 3 respectively.

[0174] "Command Pre-parser 5" is added to "P4. Command Pre-parser" to pre-parse the "hardware-software synchronization command". For example, once command pre-parsers 1, 2, 3, and 4 shown in Figure 2 have completed the pre-parsing of all other commands in sub-command sequence packet 1 in DMA buffer 1, the last command to be pre-parsed is the hardware / software synchronization command. The dedicated pre-parsing unit of the command pre-parser, such as command pre-parser 5, identifies and extracts it. Since the hardware / software synchronization command does not involve return data, command pre-parser 5 can report that pre-parsing is complete immediately upon receiving the command.

[0175] Similarly, a "Command Master Parser 5" is added to "P5. Command Master Parser" to parse the "hardware-software synchronization command". Upon receiving the command, Command Master Parser 5 writes the "counter" information carried by the command into the memory area (resource) pointed to by the "address" information, so that the software driver layer can obtain the hardware execution progress in real time. For example, once command master parsers 1, 2, 3, and 4 (dedicated to executing scheduling commands) shown in Figure 2 have completed parsing all other commands in sub-command sequence packet 1 in DMA buffer 1, the last command to be parsed is the hardware / software synchronization command. The command stream parser writes the information in the hardware / software synchronization command to memory through the dedicated main parsing unit of the command master parser, such as command master parser 5. Each such write indicates that the command master parser has completed parsing one sub-command sequence packet of a task.

[0176] In this embodiment, by introducing a "software-hardware synchronization command", the information in the software-hardware synchronization command is updated in memory so that the software driver layer can keep track of the hardware's execution progress in real time, providing a strong basis for the resubmission of low-priority tasks that have not been executed. Furthermore, the use of this command can make it easier for the software driver layer to skip completed DMA buffers and avoid repeatedly submitting completed command sequences.
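The progress reporting above can be modelled with a dictionary standing in for memory. In this hedged sketch, the function names, the buffer names, and the address `0x4000` are hypothetical; the behaviour follows the text, with each DMA buffer ending in a synchronization command that writes its counter to a shared address.

```python
def run_first_task(dma_buffers, memory, sync_address, preempted_in=None):
    """Execute DMA buffers in order, writing the sync counter after each one.

    preempted_in -- name of the buffer during which preemption occurs, if any;
                    that buffer's trailing sync command is never reached.
    """
    for counter, name in enumerate(dma_buffers, start=1):
        if name == preempted_in:
            return
        # The trailing hardware / software synchronization command writes
        # its counter value to the address shared by the whole task.
        memory[sync_address] = counter


def buffers_to_resubmit(dma_buffers, memory, sync_address):
    """Driver side: skip the buffers whose sync counter has been written."""
    completed = memory.get(sync_address, 0)
    return dma_buffers[completed:]


memory = {}
buffers = ["DMA buffer1", "DMA buffer2", "DMA buffer3"]
run_first_task(buffers, memory, sync_address=0x4000, preempted_in="DMA buffer2")
pending = buffers_to_resubmit(buffers, memory, sync_address=0x4000)
print(memory[0x4000], pending)
```

Because all synchronization commands of a task share one address, the driver needs to read only a single location to learn how many buffers have completed.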

[0177] In an exemplary embodiment, preemption occurs when the current scheduling command completes execution. As shown in Figure 15, the method further includes steps S1502 to S1504:

[0178] In step S1502, if the status flag in the first preemption register is a preset flag, and the thread group splitting completion status of the current scheduling command is fed back by the thread group splitter, and the splitting completion status is recorded through the second preemption register, the position of the current scheduling command in the sub-command sequence packet corresponding to the first task is recorded through the third preemption register, and the status flag in the first preemption register is cleared through the command master parser.

[0179] Optionally, preemption occurs at a time when the status flag in the first preemption register is set to the preset flag, corresponding to the "Hardware: Has Preemption Occurred (3)" stage shown in Figure 4. This indicates that preemption occurs when "Command Master Parser 4" has received the completion status feedback from "P7. Scheduling Command Thread Group Splitter", but the command dispatcher has not yet started distributing new commands. The command stream parser can respond immediately. Preemption at this stage also belongs to scheduling command level preemption, but the scheduling command does not need to be executed again when resubmitted. The first dedicated command master parser receives the splitting completion status of the thread groups of the current scheduling command from the scheduling command thread group splitter. For example, if scheduling command 1 includes 1000 threads, all 1000 threads have been split. The command stream parser records the splitting completion status through the second preemption register.

[0180] When the status flag in the first preemption register is the preset flag, the thread group splitter has fed back the splitting completion status of the thread groups of the current scheduling command, and the splitting completion status has been recorded through the second preemption register, the position of the current scheduling command in the sub-command sequence packet corresponding to the first task is recorded through the third preemption register. The command stream parser can also perform the data filtering through the task counter and the execution of the stored command sequence as described in the aforementioned embodiments. It then clears the status flag in the first preemption register through the command master parser to notify the software driver layer that the command stream parser is ready to execute the second task.

[0181] Step S1504: Read the address of the second task in the command queue register through the command master parser and execute the second task.

[0182] Optionally, after the software driver layer detects that the status flag of the first preemption register has been cleared, it updates the second task in the command stream parser: the software driver layer sends the memory address of the second task, such as the direct memory access buffer address, to the command queue register of the command stream parser. At this time, the command stream parser receives the memory address of the second task through the command queue register. The first dedicated main command parser (command main parser 4) then reads and executes the second task based on its memory address.
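
Steps S1502 and S1504 can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `ParserRegisters`, the function names, the dictionary layout of the split status, and the encoding `PRESET_FLAG = 1` are hypothetical and are not taken from the application, which describes hardware registers rather than Python objects.

```python
from dataclasses import dataclass
from typing import Optional

PRESET_FLAG = 1  # assumed encoding of "a higher-priority task is waiting"


@dataclass
class ParserRegisters:
    """Hypothetical software model of the command stream parser registers."""
    status_flag: int = 0                     # first preemption register
    split_status: Optional[dict] = None      # second preemption register
    command_position: Optional[int] = None   # third preemption register
    command_queue: Optional[int] = None      # command queue register


def preempt_after_split_complete(regs: ParserRegisters,
                                 total_threads: int,
                                 position: int) -> bool:
    """Model step S1502 for the case where splitting has already finished."""
    if regs.status_flag != PRESET_FLAG:
        return False  # no preemption requested; keep running the first task
    # Record the splitting completion status (second preemption register).
    regs.split_status = {
        "split_threads": total_threads,
        "complete": True,
        "boundary": "scheduling_command",
    }
    # Record where the current scheduling command sits in its packet.
    regs.command_position = position
    # Clearing the flag signals the driver that the parser is ready.
    regs.status_flag = 0
    return True


def read_second_task(regs: ParserRegisters) -> Optional[int]:
    """Model step S1504: read the second task's address once the flag is clear."""
    if regs.status_flag == 0:
        return regs.command_queue
    return None
```

In this model the driver would write the second task's buffer address into `command_queue` only after it observes `status_flag == 0`, mirroring the order described in paragraph [0182].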

[0183] After reading the address of the second task from the command queue register through the command master parser and executing the second task, the method further includes steps S1506 to S1512. Wherein:

[0184] Step S1506: The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register.

[0185] Step S1508: Obtain the preemption boundary type recorded in the second preemption register as the scheduling command boundary through the command master parser, and continue to execute the command sequence after the partial recovery command sequence in the recovery command sequence to restore the state of the command stream parser.

[0186] The recovery command sequence is shown in Figure 12 and corresponds to the aforementioned storage command sequence. The partial recovery command sequence is the portion of the command stream parser state recovery commands shown in Figure 12. The command sequence following the partial recovery command sequence comprises the other portion of the command stream parser state recovery commands shown in Figure 12, as well as the state recovery commands of the remaining hardware modules.

[0187] Optionally, the command stream parser, through the main command parser, executes a partial recovery command sequence from the recovery command sequence pre-stored in the command data storage cache, to first restore the states of the second and third preemption registers. In this embodiment, preemption occurs when "main command parser 4" has received the completion status feedback from "P7. Scheduling Command Thread Group Splitter" but the command dispatcher has not yet started distributing new commands. Therefore, as shown in Figure 13, the command stream parser obtains, through the main command parser, the preemption boundary type recorded in the second preemption register, which is the scheduling command boundary. This corresponds to the judgment "whether the hardware task is preempted at the scheduling command boundary" in Figure 13: if so, the command sequence after the partial recovery command sequence in the recovery command sequence continues to be executed, so that the remaining hardware modules can execute the first task.

[0188] Step S1510: Use a command grabber to retrieve the sub-command sequence packet of the first task corresponding to the resubmitted current scheduling command from memory.

[0189] Optionally, continuing with Figure 13, the command stream parser uses the command grabber to retrieve from memory the sub-command sequence packet (DMA buffer) of the first task that contains the resubmitted current scheduling command, but does not execute the commands in the sub-command sequence packet.

[0190] Step S1512: The command master parser obtains the next command of the current scheduling command according to the position of the current scheduling command in the sub-command sequence packet recorded in the third preemption register, and processes the next command.

[0191] Since this embodiment involves preemption at the scheduling command level and the current scheduling command has already been executed, the first dedicated command master parser skips the commands up to and including the position of the current scheduling command in the sub-command sequence packet (as recorded in the third preemption register), retrieves the next command, and processes it.

[0192] In this embodiment, on the one hand, pre-storing the "storage command sequence" and "restore command sequence" can further improve the efficiency of preemption / resubmission; on the other hand, the split status recorded by the second preemption register, which includes the thread group progress and preemption boundary type, can support preemption at the scheduling command level, resulting in higher preemption efficiency in practical application scenarios.

[0193] In an exemplary embodiment, a preemption method is provided for the case where preemption occurs before the current scheduling command is executed. As shown in Figure 16, the method further includes steps S1602 to S1604. Wherein:

[0194] In step S1602, if the status flag in the first preemption register is a preset flag and the current scheduling command is received through the command master parser, the position of the current scheduling command in the sub-command sequence packet of the first task is recorded in the second preemption register through the command master parser, and the status flag of the first preemption register is cleared through the command master parser.

[0195] Optionally, the command stream parser receives the current scheduling command through the main command parser. Preemption occurs before the current scheduling command is executed; that is, before "main command parser 4" sends the scheduling command to "P7. Scheduling Command Thread Group Splitter", as shown at the "Has preemption occurred? (1)" stage in Figure 4. This is equivalent to preemption at the scheduling command level, but the scheduling command needs to be executed again when it is resubmitted.

[0196] Step S1604: Read the address of the second task in the command queue register through the command master parser and execute the second task.

[0197] Optionally, after the software driver layer detects that the status flag of the first preemption register has been cleared, it updates the second task in the command stream parser: the software driver layer sends the memory address of the second task, such as the direct memory access buffer address, to the command queue register of the command stream parser. At this time, the command stream parser receives the memory address of the second task through the command queue register. The first dedicated main command parser (command main parser 4) then reads and executes the second task based on its memory address.

[0198] After reading the address of the second task from the command queue register through the command master parser and executing the second task, the method further includes steps S1606 to S1612. Wherein:

[0199] Step S1606: The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register.

[0200] Step S1608: Obtain the preemption boundary type recorded in the second preemption register as the scheduling command boundary through the command master parser, and continue to execute the command sequence after the partial recovery command sequence in the recovery command sequence to restore the state of the command stream parser.

[0201] For the first task that has not been completed, when the software driver layer resubmits the incomplete first task to the command stream parser, the command stream parser first executes recovery commands to restore the second and third preemption registers to their state at the time the first task was interrupted, and then executes the unfinished commands in the first task. Since this is preemption at the scheduling command level, as shown in Figure 13, the command sequence following the partial recovery command sequence in the recovery command sequence continues to be executed to restore the complete state of the command stream parser.

[0202] Step S1610: Use a command grabber to retrieve the sub-command sequence packet of the first task corresponding to the resubmitted current scheduling command from memory.

[0203] Continuing with Figure 13, the command stream parser uses the command grabber to retrieve from memory the sub-command sequence packet (DMA buffer) of the first task that contains the resubmitted current scheduling command, but does not execute the commands in the sub-command sequence packet.

[0204] Step S1612: The command master parser obtains the current scheduling command based on the position of the current scheduling command in the sub-command sequence packet recorded in the third preemption register, and processes the current scheduling command.

[0205] Since this embodiment involves preemption at the scheduling command level, the current scheduling command has not yet begun execution. The command stream parser, through the command master parser, skips the commands preceding the position of the current scheduling command recorded in the third preemption register within the sub-command sequence packet, obtains the current scheduling command, and processes it. Processing the current scheduling command includes: sending the current scheduling command to the scheduling command thread group splitter, which splits the current scheduling command and sends it to the corresponding computation unit for execution.

[0206] In this embodiment, preemption at the scheduling command level can be supported, resulting in higher preemption efficiency in practical application scenarios.
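
The two scheduling-command-level cases described above differ only in where execution resumes inside the re-fetched sub-command sequence packet. The sketch below illustrates that difference; the function names and the representation of a packet as a Python list of strings are assumptions made for illustration.

```python
from typing import List


def resume_index(recorded_position: int, already_split: bool) -> int:
    """Return the index of the first command to process after resubmission.

    recorded_position models the value held in the third preemption register.
    already_split is True when the thread group splitter had reported
    completion before preemption (steps S1506 to S1512), and False when the
    command had not yet been sent to the splitter (steps S1606 to S1612).
    """
    if already_split:
        return recorded_position + 1  # skip the finished command
    return recorded_position          # run the current command again


def commands_to_process(packet: List[str],
                        recorded_position: int,
                        already_split: bool) -> List[str]:
    """Skip everything before the resume point and return the remainder."""
    return packet[resume_index(recorded_position, already_split):]
```

For a packet `["set_state", "dispatch_1", "dispatch_2"]` with recorded position 1, the first case resumes at `dispatch_2`, while the second case resumes at `dispatch_1` and therefore resends it to the thread group splitter.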

[0207] In an exemplary embodiment, when the status flag in the first preemption register is a preset flag and the sub-command sequence packet containing the current scheduling command has been executed, the command master parser records the position of the sub-command sequence packet of the first task containing the current scheduling command in the second preemption register, and clears the status flag of the first preemption register. After reading the address of the second task in the command queue register and executing the second task through the command master parser, the method further includes: executing a portion of the recovery command sequence in the recovery command sequence pre-stored in the command data storage cache through the command master parser to restore the status of the second preemption register and the third preemption register; when the preemption boundary type recorded in the second preemption register is a direct memory access boundary, the command master parser obtains the next sub-command sequence packet of the sub-command sequence packet according to the position of the sub-command sequence packet recorded in the third preemption register, and processes the next sub-command sequence packet.

[0208] Optionally, as shown in Figure 13, the command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the states of the second preemption register and the third preemption register. If the command master parser finds that the preemption boundary type recorded in the second preemption register is a direct memory access (DMA) boundary, it skips the remainder of the recovery command sequence pre-stored in the command data storage cache to improve efficiency. At the same time, based on the position of the sub-command sequence packet recorded in the third preemption register, the command master parser obtains the sub-command sequence packet that follows it and processes that next sub-command sequence packet.
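
The boundary-dependent recovery described above can be sketched as follows. The string constants for the boundary types, the list representation of the recovery command sequence, and the function names are hypothetical; only the branching rule (a DMA boundary skips the remaining recovery commands, other boundaries execute them) comes from the description.

```python
from typing import List

DMA_BOUNDARY = "dma"                         # assumed label
SCHEDULING_COMMAND_BOUNDARY = "scheduling_command"  # assumed label


def run_recovery(recovery_sequence: List[str],
                 partial_len: int,
                 boundary_type: str) -> List[str]:
    """Return the recovery commands that would be executed.

    The first partial_len commands restore the second and third preemption
    registers and are always executed. The remaining commands are skipped
    only when the recorded boundary type is a DMA boundary.
    """
    executed = list(recovery_sequence[:partial_len])
    if boundary_type != DMA_BOUNDARY:
        executed.extend(recovery_sequence[partial_len:])
    return executed


def next_packet_index(recorded_packet_position: int) -> int:
    """At a DMA boundary the recorded packet is finished; resume at the next."""
    return recorded_packet_position + 1
```

With a four-command recovery sequence whose first two commands restore the two registers, a DMA-boundary preemption executes only those two, whereas a scheduling-command-boundary preemption executes all four.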

[0209] In this embodiment, DMA-level preemption can be supported, which can improve efficiency.

[0210] In one exemplary embodiment, a task preemption method is provided. Taking its application to the software driver layer in Figure 1 as an example, the method includes: when a second task is detected, writing a preset flag to the first preemption register of the command stream parser as a status flag; and when the status flag in the first preemption register is cleared, sending the address of the second task to the command queue register of the command stream parser.

[0211] The command stream parser, when the status flag in the first preemption register is set to a preset flag and the current scheduling command of the first task is sent to the scheduling command thread group splitter, sends a stop signal to the scheduling command thread group splitter through the main command parser and receives the splitting status of the thread group of the current scheduling command from the scheduling command thread group splitter. When the splitting status is recorded through the second preemption register, the parser records the position of the current scheduling command in the sub-command sequence packet through the third preemption register and clears the status flag in the first preemption register through the main command parser. The command stream parser is used to read the address of the second task from the command queue register through the main command parser and execute the second task. The preset flag indicates that the priority of the second task is higher than that of the first task; the current scheduling command is the currently executing scheduling command in the sub-command sequence packet of the first task.

[0212] Optionally, suppose the command stream parser is executing a low-priority first task, such as the first sub-command sequence packet for background AI inference, and the software driver layer detects that a new frame needs to be rendered immediately (a high-priority real-time rendering task, the second task). The software driver layer then notifies the command stream parser of the preemption via the first preemption register (Preemption_Reg) by setting the status flag of the first preemption register to the preset flag, such as 1. The command stream parser executes the first sub-command sequence packet (including the current scheduling command) through the first dedicated command master parser in the command master parser. The command master parser includes the first dedicated command master parser, such as command master parser 4 shown in Figure 2.

[0213] After the status flag in the first preemption register is set to the preset flag, and the first dedicated command master parser has sent the current scheduling command of the first task to the scheduling command thread group splitter, preemption may occur before the "scheduling command thread group splitter" reports completion. This indicates that the current scheduling command has started execution but has not yet been completed, that is, the "Hardware: Has Preemption Occurred (2)" stage shown in Figure 4. A stop signal is sent to the scheduling command thread group splitter via the first dedicated command master parser. Upon receiving the stop signal, the scheduling command thread group splitter immediately stops splitting and feeds back the splitting status of the current scheduling command's thread group to the first dedicated command master parser. The first dedicated command master parser receives this splitting status and updates the splitting status in the second preemption register, Preemption_Reg. While the splitting status is recorded through the second preemption register, the third preemption register records the position of the current scheduling command in the sub-command sequence packet. The first dedicated command master parser then clears the status flag of the first preemption register to inform the software driver layer that it can prepare to execute the high-priority task.

[0214] After the software driver layer detects that the status flag of the first preemption register has been cleared, it updates the second task in the command stream parser: the software driver layer sends the memory address of the second task, such as the address of a direct memory access buffer, to the command queue register of the command stream parser. At this time, the command stream parser receives the memory address of the second task through the command queue register. The first dedicated main command parser then reads the second task based on its memory address and executes it.

[0215] In this embodiment, by transmitting notifications between the command stream parser and the software driver layer through the first preemption register, the delay between the issuance of the preemption request and the start of execution of the high-priority task can be greatly shortened, thereby improving preemption efficiency.
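
From the software driver layer's point of view, the notification exchange amounts to a write, a poll, and a second write. The sketch below models this with a stand-in for the hardware; `FakeParser`, its `service` method, the polling limit, and `PRESET_FLAG = 1` are assumptions for illustration, and a real driver would access memory-mapped registers instead.

```python
PRESET_FLAG = 1  # assumed encoding of the preset flag


class FakeParser:
    """Hypothetical stand-in for the command stream parser hardware."""

    def __init__(self) -> None:
        self.preemption_reg = 0        # first preemption register
        self.command_queue_reg = None  # command queue register

    def service(self) -> None:
        # Stands in for the hardware recording its progress and then
        # clearing the status flag once it is ready for the second task.
        if self.preemption_reg == PRESET_FLAG:
            self.preemption_reg = 0


def request_preemption(parser: FakeParser,
                       second_task_addr: int,
                       max_polls: int = 1000) -> bool:
    """Driver side: set the flag, wait for it to clear, submit the task."""
    parser.preemption_reg = PRESET_FLAG
    for _ in range(max_polls):
        parser.service()  # real hardware would make progress on its own
        if parser.preemption_reg == 0:
            parser.command_queue_reg = second_task_addr
            return True
    return False  # the flag never cleared within the polling budget
```

The address of the second task is written only after the cleared flag is observed, so the parser never sees a new task before it has saved the progress of the first one.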

[0216] In an exemplary embodiment, before reading the address of the second task in the command queue register through the command master parser and executing the second task, the method further includes: storing the storage command sequence and the recovery command sequence in memory; and sending the memory addresses of the storage command sequence and the recovery command sequence to the command stream parser.

[0217] The storage command sequence is used to save the state of the command stream parser executing the first task into memory when a preset flag is present. The storage command sequence corresponds to the recovery command sequence. The command stream parser uses a command data storage cache to store the storage command sequence and the recovery command sequence read from the memory addresses. The storage command sequence is also used to save the state of the GPU corresponding to the command stream parser executing the first task into memory when a preset flag is present; similarly, the GPU state can be recovered when the recovery command sequence is executed.

[0218] Optionally, the software driver layer stores the aforementioned "storage command sequence" and "recovery command sequence" in a memory unit beforehand. When the graphics processing unit (GPU) is powered on, the software driver layer informs the command stream parser of the corresponding memory unit address. The command stream parser reads the "storage command sequence" and "recovery command sequence" according to the memory unit address and stores them in the corresponding space of "P3. Command / Data Storage Cache".

[0219] In this embodiment, by prefetching the "storage command sequence" needed when preemption occurs and the "recovery command sequence" needed when resubmission occurs, and storing them in the local cache, the efficiency of preemption / resubmission can be further improved.

[0220] In an exemplary embodiment, each subcommand sequence packet includes a hardware / software synchronization command; the method further includes: inserting a hardware / software synchronization command at the end of the subcommand sequence packet; the hardware / software synchronization command includes address information and count information; the hardware / software synchronization command is used to instruct the command master parser to write the count information into a specified area in the memory pointed to by the address information.

[0221] The software driver layer inserts a new "hardware-software synchronization command" at the end of each DMA buffer. As shown in Figure 14, a hardware / software synchronization command is inserted after each DMA buffer: hardware / software synchronization command 1 is inserted after DMA buffer1; hardware / software synchronization command 2 is inserted after DMA buffer2; and hardware / software synchronization command 3 is inserted after DMA buffer3.

[0222] The address information is identical for all commands corresponding to the same task, and each piece of counter information corresponds to one sub-command sequence packet, with the counter information increasing with the number of sub-command sequence packets. Each "hardware-software synchronization command" carries "address" information and "counter" information. For convenient memory management, the "address" information of the "hardware-software synchronization commands" carried by different DMA buffers of the same task can be the same, and the "counter" records the number of DMA buffers that have been sent. For example, "DMA buffer1~3" in Figure 14 correspond to the same computing task, so the addresses of "hardware-software synchronization commands 1~3" are the same. "DMA buffer1" is the first DMA buffer of this computing task, so "counter = 1" in the corresponding "hardware-software synchronization command 1"; the "counters" in "hardware-software synchronization command 2" and "hardware-software synchronization command 3", which correspond to "DMA buffer2" and "DMA buffer3", have values of 2 and 3 respectively.

[0223] In this embodiment, by introducing a "hardware-software synchronization command", the software driver layer can keep track of the hardware's execution progress in real time, providing a reliable basis for subsequent task resubmission. Furthermore, this command makes it easier for the software driver layer to skip completed DMA buffers and avoid resubmitting completed command sequences.
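
The way the counter lets the driver skip completed DMA buffers can be sketched as follows. Memory is modeled as a Python dictionary keyed by address, and the function names and the address value used in the usage note are hypothetical; the rule that the counter equals the number of DMA buffers already sent is taken from paragraph [0222].

```python
from typing import Dict, List


def build_sync_commands(num_buffers: int, address: int) -> List[dict]:
    """One synchronization command per DMA buffer: same address, counter 1..n."""
    return [{"address": address, "counter": i}
            for i in range(1, num_buffers + 1)]


def execute_sync(memory: Dict[int, int], command: dict) -> None:
    """Hardware side: write the counter to the area the address points to."""
    memory[command["address"]] = command["counter"]


def buffers_to_resubmit(memory: Dict[int, int],
                        address: int,
                        buffers: List[str]) -> List[str]:
    """Driver side: skip the buffers whose synchronization command has run."""
    completed = memory.get(address, 0)  # 0 means no buffer has finished
    return buffers[completed:]
```

If only the synchronization command of DMA buffer1 has executed when preemption occurs, the memory location holds 1 and the driver resubmits DMA buffer2 and DMA buffer3 only.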

[0224] In one exemplary embodiment, a task preemption method is provided. Taking its application to the scheduling command thread group splitter in Figure 2 as an example, the method includes: when the status flag in the first preemption register is set to the preset flag and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter, receiving a stop signal sent by the command master parser; and stopping the splitting of the thread group of the current scheduling command according to the stop signal and feeding back the splitting status of each thread group to the command master parser.

[0225] The preset flag indicates that the priority of the second task is higher than that of the first task. The current scheduling command is the currently executing scheduling command in the sub-command sequence packet of the first task.

[0226] Optionally, when the status flag in the first preemption register is a preset flag and the first dedicated command master parser has sent the current scheduling command of the first task to the scheduling command thread group splitter, preemption occurs before the scheduling command thread group splitter reports completion. The scheduling command thread group splitter receives a stop signal sent by the command master parser, stops splitting the thread group of the current scheduling command according to the stop signal, and feeds back the splitting status of each thread group to the command master parser. For example, if the current scheduling command is scheduling command 1 with 1000 threads, thread groups 1-199 have been split, while thread groups 200-1000 have not yet been split. The splitting status is fed back to the first dedicated command master parser, which updates the splitting status in the second preemption register.

[0227] In this embodiment, thread-level task preemption can be achieved through the interaction between the command stream parser and the scheduling command thread group splitter.
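
The splitter's reaction to the stop signal can be modeled as a loop that halts at a thread group boundary and reports how far it got. In this sketch the stop signal is simplified to "arrives after `stop_after` groups have been split", the example treats the 1000 units of paragraph [0226] as thread groups, and the field names of the returned status are hypothetical.

```python
from typing import Optional


def split_until_stopped(total_groups: int, stop_after: int) -> dict:
    """Split thread groups in order until done or until the stop signal.

    Returns the splitting status that would be fed back to the command
    master parser and recorded in the second preemption register.
    """
    split_count = 0
    for _group in range(1, total_groups + 1):
        if split_count >= stop_after:
            break  # stop signal observed; stop at a thread group boundary
        split_count += 1  # this thread group has been split and dispatched
    complete = split_count == total_groups
    first_unsplit: Optional[int] = None if complete else split_count + 1
    return {
        "last_split_group": split_count,
        "first_unsplit_group": first_unsplit,
        "complete": complete,
    }
```

Calling `split_until_stopped(1000, 199)` reports that group 199 was the last one split and group 200 is the first one still pending, matching the example in paragraph [0226].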

[0228] In an exemplary embodiment, the split status includes preemption boundary type and thread group progress; feeding back the split status of each thread group to the command master parser includes: feeding back the progress and preemption boundary type of each thread group to the command master parser.

[0229] The progress of each thread group includes the state of thread groups that have been split and the state of thread groups that have not been split.

[0230] Since preemption occurs after the first dedicated command master parser has sent the current scheduling command of the first task to the scheduling command thread group splitter and before the splitter reports completion, the progress of each thread group includes both the split and the not-yet-split states of the thread groups. The preemption boundary type is also the scheduling command boundary. The scheduling command thread group splitter reports both the thread group progress and the preemption boundary type of the scheduling command boundary to the second preemption register.

[0231] In this embodiment, thread-level task preemption can be achieved by feeding back thread group progress and preemption boundary type.

[0232] In an exemplary embodiment, the method further includes: when the scheduling command thread group splitter reports that the thread group of the current scheduling command is in a splitting completed state, feeding back the splitting completed state and the preemption boundary type to the command master parser, wherein the command master parser records that the preemption boundary type is a scheduling command boundary.

[0233] Optionally, preemption occurs when the scheduling command thread group splitter reports that the thread group of the current scheduling command is in a splitting completed state, that is, when "command master parser 4" receives the completed state from "P7. scheduling command thread group splitter", but the command dispatcher has not yet started distributing new commands. In this case, the scheduling command thread group splitter feeds back the splitting completed state and the preemption boundary type to the command master parser; the command master parser records the preemption boundary type as a scheduling command boundary.

[0234] In an exemplary embodiment, the method further includes: when the scheduling command thread group splitter reports that the thread group of the current scheduling command is in an incomplete splitting state, feeding back the incomplete splitting state, the state of the thread groups that have been split, and the preemption boundary type to the command master parser, wherein the command master parser records the preemption boundary type as the thread group boundary of the scheduling command.

[0235] Optionally, preemption occurs when the scheduling command thread group splitter reports that the thread group is in an incomplete splitting state, i.e., when "command master parser 4" receives feedback from "P7. scheduling command thread group splitter" that the thread group is in an incomplete splitting state. In this case, the scheduling command thread group splitter feeds back the incomplete splitting state, the state of the thread groups that have been split, and the preemption boundary type to the command master parser. The command master parser records the preemption boundary type as the scheduling command thread group boundary.

[0236] In this embodiment, thread-level task preemption can be achieved by feeding back thread group progress and preemption boundary type.
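
The rule in paragraphs [0232] to [0235] for choosing the preemption boundary type from the reported splitting state can be summarized in a few lines. The enum and function names are hypothetical, and representing the progress as a pair of counts is an assumption.

```python
from enum import Enum


class Boundary(Enum):
    """Preemption boundary types named in the description."""
    SCHEDULING_COMMAND = "scheduling command boundary"
    THREAD_GROUP = "scheduling command thread group boundary"


def classify_boundary(split_groups: int, total_groups: int) -> Boundary:
    """Pick the boundary type the command master parser would record."""
    if split_groups >= total_groups:
        # Splitting completed state: the whole command is done.
        return Boundary.SCHEDULING_COMMAND
    # Incomplete splitting state: resume inside the command later.
    return Boundary.THREAD_GROUP
```

A command preempted with 199 of 1000 groups split is classified as a thread group boundary, while one with all 1000 groups split is classified as a scheduling command boundary.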

[0237] In one exemplary embodiment, a preemption system includes a software driver layer, a command stream parser, and a scheduling command thread group splitter; the command stream parser includes a first preemption register, a second preemption register, a third preemption register, a command master parser, and a command queue register.

[0238] The software driver layer is used to write a preset flag as a status flag to the first preemption register of the command stream parser when a second task is detected; the preset flag indicates that the priority of the second task is higher than that of the first task.

[0239] The command master parser is used to send a stop signal to the scheduling command thread group splitter when the status flag in the first preemption register is set to a preset flag and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter; wherein, the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task.

[0240] The scheduling command thread group splitter receives a stop signal from the command master parser, stops splitting the thread group of the current scheduling command according to the stop signal, and feeds back the splitting status of each thread group of the current scheduling command to the command master parser.

[0241] The command master parser is also used to receive the splitting status of the thread group for the current scheduling command from the thread group splitter.

[0242] The second preemption register is used to record the splitting state.

[0243] The third preemption register is used to record the position of the current scheduling command in the subcommand sequence packet.

[0244] The command master parser is used to clear the status flags of the first preemption register.

[0245] The software driver layer is also used to send the address of the second task to the command queue register of the command stream parser when the status flag of the first preemption register is detected to be cleared.

[0246] The command master parser is used to read the address of the second task from the command queue register and execute the second task.

[0247] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0248] Based on the same inventive concept, this application also provides a command stream parser for implementing the task preemption method described above. The solution provided by this device is similar to the implementation described in the above method; therefore, the specific limitations in one or more command stream parser embodiments provided below can be found in the limitations of the task preemption method described above, and will not be repeated here.

[0249] In one exemplary embodiment, as shown in Figure 2, a command stream parser is provided, including: a first preemption register, a second preemption register, a third preemption register, and a main command parser, wherein:

[0250] The first preemption register is used to receive a preset flag indicating that the priority of the second task is higher than that of the first task.

[0251] The command master parser is used to send a stop signal to the scheduling command thread group splitter when the status flag in the first preemption register is set to a preset flag and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter, and to receive the splitting status of the thread group of the current scheduling command fed back by the scheduling command thread group splitter; the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task.

[0252] The second preemption register is used to record the splitting state.

[0253] The third preemption register is used to record the position of the current scheduling command in the subcommand sequence packet.

[0254] The command master parser is also used to clear the status flag of the first preemption register.

[0255] The command master parser is also used to read the address of the second task from the command queue register and execute the second task; wherein the address of the second task is sent to the command queue register of the command stream parser after the software driver layer detects that the status flag has been cleared.
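
The handshake described in paragraphs [0250] to [0255] can be illustrated with a minimal software model. This is only a sketch under stated assumptions, not the hardware design of this application: the class and field names (`CommandStreamParser`, `preempt_reg1`, `Splitter`, `driver_submit_if_cleared`), the flag encoding `PREEMPT_FLAG = 1`, and the dictionary used for the splitting status are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PREEMPT_FLAG = 1  # hypothetical encoding of the "preset flag"


@dataclass
class Splitter:
    """Toy stand-in for the scheduling command thread group splitter."""
    total_groups: int
    sent_groups: int = 0
    stopped: bool = False

    def stop(self) -> dict:
        # Stop splitting and report how far the current command has progressed.
        self.stopped = True
        done = self.sent_groups >= self.total_groups
        boundary = "scheduling_command" if done else "thread_group"
        return {"boundary": boundary, "progress": self.sent_groups}


@dataclass
class CommandStreamParser:
    preempt_reg1: int = 0                    # status flag written by the driver
    preempt_reg2: Optional[dict] = None      # splitting status
    preempt_reg3: Optional[int] = None       # position in the sub-command sequence packet
    command_queue_reg: Optional[int] = None  # address of the next task

    def on_dispatch(self, splitter: Splitter, position: int) -> bool:
        """Called when the current scheduling command is sent to the splitter."""
        if self.preempt_reg1 != PREEMPT_FLAG:
            return False                      # no preemption requested
        self.preempt_reg2 = splitter.stop()   # stop signal + splitting status
        self.preempt_reg3 = position          # where the first task was interrupted
        self.preempt_reg1 = 0                 # cleared flag is what the driver waits for
        return True

    def read_second_task(self) -> Optional[int]:
        return self.command_queue_reg


def driver_submit_if_cleared(parser: CommandStreamParser, task_addr: int) -> bool:
    """Driver side: submit the second task only after the flag has been cleared."""
    if parser.preempt_reg1 != 0:
        return False
    parser.command_queue_reg = task_addr
    return True
```

In this model the driver sets `preempt_reg1`, the parser records the interruption point in the second and third registers before clearing the flag, and only then does the driver place the second task's address in the command queue register.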

[0256] In one exemplary embodiment, the command stream parser further includes:

[0257] The task counter is used to update the current value and obtain the updated value.

[0258] The command data storage cache is used to receive return data from memory; the return data carries the value recorded by the task counter; if the value recorded by the task counter carried by the return data is the same as the updated value, the return data is stored.

[0259] In one exemplary embodiment, the command data storage cache is further configured to: if the value recorded by the task counter carried in the returned data is different from the updated value, then the returned data is not stored.

[0260] In one exemplary embodiment, the command data storage cache is further configured to: clear all data belonging to the first task from the command data storage cache.
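
The filtering behaviour of paragraphs [0257] to [0260] can be sketched as follows. The sketch assumes that the task counter is a simple incrementing integer and that each piece of return data carries the counter value as a tag; the names `CommandDataCache`, `begin_new_task`, and `on_return_data` are illustrative and do not come from this application.

```python
class CommandDataCache:
    """Toy model of the command data storage cache with a task counter."""

    def __init__(self) -> None:
        self.task_counter = 0   # current value of the task counter
        self.entries = []       # stored (tag, payload) pairs

    def begin_new_task(self) -> int:
        # Clear all data belonging to the preempted task, then update the counter.
        self.entries.clear()
        self.task_counter += 1
        return self.task_counter

    def on_return_data(self, tag: int, payload: str) -> bool:
        # Store the data only if its tag equals the updated counter value.
        # Late replies to requests issued by the preempted task are discarded.
        if tag != self.task_counter:
            return False
        self.entries.append((tag, payload))
        return True
```

With this arrangement, memory replies that were requested on behalf of the first task but arrive after preemption carry a stale tag and cannot pollute the cache used by the second task.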

[0261] In an exemplary embodiment, the command master parser is also configured to execute a stored command sequence pre-stored in the command data storage cache and save the state of the command stream parser performing the first task into memory.

[0262] In an exemplary embodiment, the splitting state includes a preemption boundary type and a thread group progress; the second preemption register is used to record the thread group progress and to record the preemption boundary type as a thread group boundary.

[0263] The command master parser is also used to execute a partial recovery command sequence from the recovery command sequence pre-stored in the command data storage cache, so as to restore the state of the second preemption register and the third preemption register; and, when the preemption boundary type obtained from the second preemption register is the thread group boundary, to continue executing the command sequence that follows the partial recovery command sequence in the recovery command sequence, so as to restore the state of the command stream parser.

[0264] The command fetcher is used to fetch from memory the resubmitted sub-command sequence packet of the first task in which the current scheduling command is located.

[0265] The command master parser is also used to obtain the current scheduling command and to pass the thread group progress in the splitting state recorded in the second preemption register to the scheduling command thread group splitter, wherein the scheduling command thread group splitter is used to determine the breakpoint based on the thread group progress and to resume the splitting and sending work at the breakpoint.
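
Resuming at a thread group boundary can be pictured with a small example. It assumes, purely for illustration, that the thread group progress is encoded as the index of the first thread group that has not yet been sent; this application does not specify the encoding, and the function `send_thread_groups` and its parameters are hypothetical.

```python
def send_thread_groups(total_groups, start_from=0, stop_at=None):
    """Send thread groups [start_from, total_groups) and return (sent, progress).

    stop_at models the stop signal: splitting halts before that group index.
    progress is the index of the first thread group that was not sent.
    """
    sent = []
    for group in range(start_from, total_groups):
        if stop_at is not None and group == stop_at:
            return sent, group      # preempted at a thread group boundary
        sent.append(group)
    return sent, total_groups       # splitting completed


# First run is preempted after four thread groups have been sent.
first_part, progress = send_thread_groups(6, stop_at=4)
# After recovery, the recorded progress is the breakpoint for resuming.
second_part, final = send_thread_groups(6, start_from=progress)
```

Under this encoding no thread group is sent twice and none is skipped, because the second run starts exactly at the recorded breakpoint.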

[0266] In an exemplary embodiment, each subcommand sequence packet includes a hardware / software synchronization command; the hardware / software synchronization command is located at the end of the subcommand sequence packet; the hardware / software synchronization command is used to instruct the command master parser to write the counting information into a specified area in the memory pointed to by the address information.
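
A minimal sketch of the hardware / software synchronization command follows. Memory is modelled as a Python dictionary, and the names `SyncCommand`, `execute_packet`, and `packet_completed` are hypothetical. The use of the written count by the driver to detect a completed packet is an assumption about how such a command would typically be consumed, not something stated in this paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncCommand:
    address: int   # address information: where the count is written
    count: int     # counting information: value to write


def execute_packet(commands, memory):
    """Process a sub-command sequence packet; the sync command is its last entry."""
    for cmd in commands:
        if isinstance(cmd, SyncCommand):
            # The command master parser writes the count to the specified area.
            memory[cmd.address] = cmd.count
        # Other commands are ignored in this toy model.


def packet_completed(memory, sync):
    """Driver-side check (assumed): the packet is done once the count is visible."""
    return memory.get(sync.address) == sync.count
```

Because the synchronization command sits at the end of the packet, the count becomes visible in memory only after every preceding command in that packet has been processed.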

[0267] In an exemplary embodiment, when the status flag in the first preemption register is a preset flag, and the thread group splitting completion status of the current scheduling command is fed back by the thread group splitter, and the splitting completion status is recorded through the second preemption register, the third preemption register is used to record the position of the current scheduling command in the sub-command sequence packet corresponding to the first task.

[0268] The command master parser is used to clear the status flag of the first preemption register, read the address of the second task from the command queue register, and execute the second task.

[0269] In an exemplary embodiment, the command master parser is configured to execute a partial recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register; obtain the preemption boundary type recorded in the second preemption register as the scheduling command boundary; and continue executing the command sequence following the partial recovery command sequence from the recovery command sequence to restore the state of the command stream parser.

[0270] The command fetcher is used to fetch the subcommand sequence packet of the first task corresponding to the resubmitted current scheduling command from memory.

[0271] The command master parser is also used to obtain the command following the current scheduling command, based on the position of the current scheduling command in the sub-command sequence packet recorded in the third preemption register, and to process that next command.

[0272] In an exemplary embodiment, when the status flag in the first preemption register is a preset flag and the current scheduling command is received through the command master parser, the second preemption register is used to record the position of the current scheduling command in the sub-command sequence packet of the first task.

[0273] The command master parser is used to clear the status flag of the first preemption register, read the address of the second task from the command queue register, and execute the second task.

[0274] In an exemplary embodiment, the command master parser is configured to execute a partial recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register; obtain the preemption boundary type recorded in the second preemption register as the scheduling command boundary; and continue executing the command sequence following the partial recovery command sequence from the recovery command sequence to restore the state of the command stream parser.

[0275] The command fetcher is used to fetch the subcommand sequence packet of the first task corresponding to the resubmitted current scheduling command from memory.

[0276] The command master parser is used to obtain the current scheduling command based on the position of the current scheduling command in the subcommand sequence packet recorded in the third preemption register, and to process the current scheduling command.

[0277] In an exemplary embodiment, the command master parser is configured to execute a portion of the recovery command sequence in the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register; if the command master parser obtains the preemption boundary type recorded in the second preemption register as a direct memory access boundary, the command master parser is also configured to process the next sub-command sequence packet of the sub-command sequence packet in which the current scheduling command is located.
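
The recovery paths above differ only in where the first task resumes, which depends on the preemption boundary type restored into the second preemption register. The following sketch summarizes three of them. It assumes that the recorded position can be expressed as a packet index plus a command index and uses hypothetical string labels for the boundary types; the variant of paragraphs [0272] to [0276], in which the current scheduling command itself is processed again, is omitted.

```python
def resume_point(boundary, packet_index, command_index, progress=0):
    """Return (packet, command, first_thread_group) at which the first task resumes."""
    if boundary == "thread_group":
        # Same command, continue from the recorded thread group progress.
        return packet_index, command_index, progress
    if boundary == "scheduling_command":
        # Current command was fully split: continue with the next command.
        return packet_index, command_index + 1, 0
    if boundary == "dma":
        # Whole packet was executed: continue with the next packet.
        return packet_index + 1, 0, 0
    raise ValueError(f"unknown preemption boundary type: {boundary!r}")
```

For example, `resume_point("thread_group", 2, 5, progress=3)` resumes inside the same scheduling command, whereas the `"dma"` case skips to the start of the following sub-command sequence packet.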

[0278] In one exemplary embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 17. This computer device includes a processor, memory, input/output (I/O) interfaces, and a communication interface. The processor, memory, and I/O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides the environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The database stores task data. The I/O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communication with external terminals via a network connection. When the computer program is executed by the processor, it implements a task preemption method.

[0279] Those skilled in the art will understand that the structure shown in Figure 17 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0280] In one embodiment, a chip is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.

[0281] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.

[0282] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.

[0283] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.

[0284] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile memory and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, artificial intelligence (AI) processors, etc., and are not limited to these.

[0285] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this application.

[0286] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A task preemption method, characterized in that, The method, applied to a command stream parser of a processor, includes: When the status flag in the first preemption register is a preset flag, and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter, the command master parser sends a stop signal to the scheduling command thread group splitter and receives the splitting status of the thread group of the current scheduling command from the scheduling command thread group splitter; wherein, the preset flag indicates that the priority of the second task is higher than the priority of the first task; the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task; When the splitting state is recorded through the second preemption register, the position of the current scheduling command in the sub-command sequence packet is recorded through the third preemption register, and the status flag of the first preemption register is cleared through the command master parser; The command master parser reads the address of the second task from the command queue register and executes the second task; wherein, the address of the second task is sent to the command queue register of the command stream parser after the software driver layer detects that the status flag has been cleared.

2. The method according to claim 1, characterized in that, After recording the position of the current scheduling command in the sub-command sequence packet via the third preemption register, the method further includes: Update the current value using the task counter to obtain the updated value; The command data storage cache receives the return data returned from memory; wherein the return data carries the value recorded by the task counter; If the value recorded by the task counter carried in the returned data is the same as the updated value, then the returned data is stored in the command data storage cache.

3. The method according to claim 2, characterized in that, The method further includes: If the value of the task counter recorded in the returned data is different from the updated value, then the returned data will not be stored through the command data storage cache.

4. The method according to claim 2 or 3, characterized in that, Before receiving the returned data from memory via the command data storage cache, the method further includes: The command data storage cache is cleared of all data belonging to the first task.

5. The method according to claim 1, characterized in that, Before clearing the status flag of the first preemption register through the command master parser, the method further includes: The command master parser executes the stored command sequence pre-stored in the command data storage cache, and saves the state of the command stream parser that is executing the first task into memory.

6. The method according to claim 5, characterized in that, The splitting state includes the preemption boundary type and thread group progress; the splitting state is recorded through a second preemption register, including: The second preemption register records the preemption boundary type as the thread group boundary, and the thread group progress. After reading the address of the second task from the command queue register through the command master parser and executing the second task, the method further includes: The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register; The command master parser obtains the preemption boundary type recorded in the second preemption register as the thread group boundary, and continues to execute the command sequence after the partial recovery command sequence in the recovery command sequence to restore the state of the command stream parser; The command grabber retrieves the resubmitted current scheduling command from memory within the sub-command sequence packet of the first task; The command master parser obtains the current scheduling command based on the position of the current scheduling command in the sub-command sequence packet recorded in the third preemption register; The thread group progress recorded in the second preemption register in the splitting state is passed to the scheduling command thread group splitter by the command master parser. The scheduling command thread group splitter is used to determine the breakpoint according to the thread group progress and resume the splitting and sending work at the breakpoint.

7. The method according to claim 1, characterized in that, Each of the subcommand sequence packets includes a hardware / software synchronization command; the hardware / software synchronization command is located at the end of the subcommand sequence packet; the hardware / software synchronization command is used to instruct the command master parser to write the counting information into a specified area in the memory pointed to by the address information.

8. The method according to claim 1, characterized in that, The method further includes: When the status flag in the first preemption register is the preset flag, and the thread group splitting completion status of the current scheduling command is received from the thread group splitter of the scheduling command, and the splitting completion status is recorded through the second preemption register, the position of the current scheduling command in the sub-command sequence packet corresponding to the first task is recorded through the third preemption register, and the status flag of the first preemption register is cleared through the command master parser. The command master parser reads the address of the second task from the command queue register and executes the second task.

9. The method according to claim 8, characterized in that, After reading the address of the second task from the command queue register through the command master parser and executing the second task, the method further includes: The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register; The command master parser obtains the preemption boundary type recorded in the second preemption register as the scheduling command boundary, and continues to execute the command sequence after the partial recovery command sequence in the recovery command sequence to restore the state of the command stream parser; The command grabber retrieves the sub-command sequence packet of the first task corresponding to the resubmitted current scheduling command from memory. The command master parser obtains the next command of the current scheduling command based on the position of the current scheduling command in the sub-command sequence packet recorded in the third preemption register, and processes the next command.

10. The method according to claim 1, characterized in that, The method further includes: If the status flag in the first preemption register is the preset flag and the current scheduling command is received through the command master parser, the command master parser records the position of the current scheduling command in the sub-command sequence packet of the first task in the second preemption register, and the command master parser clears the status flag of the first preemption register. The command master parser reads the address of the second task from the command queue register and executes the second task.

11. The method according to claim 10, characterized in that, After reading the address of the second task from the command queue register through the command master parser and executing the second task, the method further includes: The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register; The command master parser obtains the preemption boundary type recorded in the second preemption register as the scheduling command boundary, and continues to execute the command sequence after the partial recovery command sequence in the recovery command sequence to restore the state of the command stream parser; The command grabber retrieves the sub-command sequence packet of the first task corresponding to the resubmitted current scheduling command from memory. The command master parser obtains the current scheduling command based on its position in the sub-command sequence packet as recorded in the third preemption register, and then processes the current scheduling command.

12. The method according to claim 1, characterized in that, If the status flag in the first preemption register is the preset flag, and the sub-command sequence packet containing the current scheduling command has been executed, the command master parser records the position of the sub-command sequence packet of the first task containing the current scheduling command in the second preemption register, and the command master parser clears the status flag of the first preemption register. After reading the address of the second task from the command queue register through the command master parser and executing the second task, the method further includes: The command master parser executes a portion of the recovery command sequence from the recovery command sequence pre-stored in the command data storage cache to restore the state of the second preemption register and the third preemption register; If the preemption boundary type recorded in the second preemption register is a direct memory access boundary, the command master parser obtains the next sub-command sequence packet of the sub-command sequence packet according to the position of the sub-command sequence packet recorded in the third preemption register, and processes the next sub-command sequence packet.

13. A task preemption method, characterized in that, The method, applied to a software driver layer of a processor, includes: When a second task is detected, a preset flag is written as a status flag to the first preemption register of the command stream parser. The command stream parser, upon obtaining the preset flag in the first preemption register and sending the current scheduling command of the first task to the scheduling command thread group splitter, sends a stop signal to the scheduling command thread group splitter via the main command parser and receives the splitting status of the thread group for the current scheduling command from the scheduling command thread group splitter. If the splitting status is recorded via the second preemption register, the position of the current scheduling command in the sub-command sequence packet is recorded via the third preemption register, and the status flag in the first preemption register is cleared via the main command parser. The preset flag indicates that the priority of the second task is higher than that of the first task; the current scheduling command is the currently executing scheduling command in the sub-command sequence packet of the first task. When the status flag of the first preemption register is cleared, the address of the second task is sent to the command queue register of the command stream parser; wherein, the command stream parser is used to read the address of the second task in the command queue register through the command master parser and execute the second task.

14. The method according to claim 13, characterized in that, Before reading the address of the second task from the command queue register via the command master parser and executing the second task, the method further includes: The storage command sequence and the recovery command sequence are stored in memory; wherein, the storage command sequence is used to save the state of the command stream parser that is executing the first task to the memory when the preset flag is present; the storage command sequence corresponds to the recovery command sequence; The memory addresses of the stored command sequence and the recovery command sequence are sent to the command stream parser; wherein, the command stream parser is used to store the stored command sequence and the recovery command sequence read according to the memory address through a command data storage cache.

15. The method according to claim 13, characterized in that, Each of the sub-command sequence packets includes a hardware / software synchronization command; the method further includes: A hardware / software synchronization command is inserted at the end of the subcommand sequence packet; the hardware / software synchronization command includes address information and count information; the hardware / software synchronization command is used to instruct the main command parser to write the count information into a specified area in the memory pointed to by the address information.

16. A preemption method, characterized in that, The method, applied to a scheduling command thread group splitter of a processor, includes: When the status flag in the first preemption register is a preset flag, and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter, the stop signal sent by the command master parser is received; wherein, the preset flag indicates that the priority of the second task is higher than the priority of the first task; the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task; The splitting of the thread group of the current scheduling command is stopped according to the stop signal, and the splitting status of each thread group is fed back to the command master parser.

17. The method according to claim 16, characterized in that, The splitting status includes the preemption boundary type and thread group progress; the step of feeding back the splitting status of each thread group to the command master parser includes: The progress of each thread group and the preemption boundary type are fed back to the command master parser, wherein the progress of each thread group includes the thread group splitting completion status and the thread group not splitting completion status.

18. The method according to claim 17, characterized in that, The method further includes: When the thread group splitter of the scheduling command reports that the thread group of the current scheduling command is in a splitting completed state, it feeds back the splitting completed state and the preemption boundary type to the command master parser. The command master parser is used to record that the preemption boundary type is a scheduling command boundary.

19. The method according to claim 17, characterized in that, The method further includes: When the thread group splitter of the scheduling command reports that the thread group of the current scheduling command is in an incomplete splitting state, it feeds back the incomplete splitting state, the state of the thread group that has been split, and the preemption boundary type to the command master parser. The command master parser records the preemption boundary type as the scheduling command thread group boundary.

20. A command stream parser for task preemption, characterized in that, The command stream parser includes: The first preemption register is used to receive a preset flag indicating that the priority of the second task is higher than that of the first task; The command master parser is configured to send a stop signal to the scheduling command thread group splitter when the status flag in the first preemption register is set to a preset flag and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter, and receive the splitting status of the thread group of the current scheduling command from the scheduling command thread group splitter; the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task. The second preemption register is used to record the splitting state; The third preemption register is used to record the position of the current scheduling command in the sub-command sequence packet; The command parser is also used to clear the status flag of the first preemption register; The command master parser is also used to read the address of the second task in the command queue register and execute the second task; wherein the address of the second task is sent to the command queue register of the command stream parser after the software driver layer detects that the status flag has been cleared.

21. A preemption system, characterized in that, The system includes a software driver layer, a command stream parser, and a scheduled command thread group splitter; the command stream parser includes a first preemption register, a second preemption register, a third preemption register, a command master parser, and a command queue register; The software driver layer is used to write a preset flag as a status flag to the first preemption register of the command stream parser when a second task is detected; wherein the preset flag indicates that the priority of the second task is higher than the priority of the first task; The command master parser is configured to send a stop signal to the scheduling command thread group splitter when the status flag in the first preemption register is a preset flag and the command master parser sends the current scheduling command of the first task to the scheduling command thread group splitter; wherein, the current scheduling command is the scheduling command currently being executed in the sub-command sequence packet of the first task; The scheduling command thread group splitter is used to receive a stop signal sent by the command master parser, stop splitting the thread group of the current scheduling command according to the stop signal, and feed back the splitting status of each thread group of the current scheduling command to the command master parser. The command master parser is also used to receive the splitting status of the thread group of the current scheduling command fed back by the scheduling command thread group splitter; The second preemption register is used to record the splitting state; The third preemption register is used to record the position of the current scheduling command in the sub-command sequence packet; The command parser is used to clear the status flag of the first preemption register; The software driver layer is also used to send the address of the second task to the command queue register of the command stream parser when it is detected that the status flag of the first preemption register is cleared; The command master parser is used to read the address of the second task in the command queue register and execute the second task.

22. A chip comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 19.

23. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 19.

24. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 19.

25. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 19.