Method and system for processing tasks based on multithreading
By employing a multi-threaded processing approach and leveraging the collaborative work of thread control registers and the task manager, task gaps in the SoC chip are reduced, processor performance is improved, hardware resources are saved, and the problem of untimely interrupt signal response caused by excessive CPU load is resolved.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: ALLWINNER TECH CO LTD
- Filing Date: 2024-08-12
- Publication Date: 2026-06-12
AI Technical Summary
Existing technologies in SoC chips suffer from untimely interrupt signal response due to excessive CPU load, resulting in task gaps and failing to meet the real-time data processing requirements of the operating system. This also increases hardware resource overhead and the degree of coupling between the system and hardware.
A multi-threaded processing method is adopted, with the number of threads between the thread control register and the task manager being greater than 1. Each thread has a unique corresponding command queue and task queue. The task manager arbitrates and determines the configuration parameters. Multiple threads send tasks to the task queue in parallel, accurately determining the control register configuration parameters and reducing task gaps.
It reduces task gaps, improves processor performance, reduces hardware resource overhead, enhances the performance utilization of execution units, reduces the coupling between the operating system and the hardware processor, and saves costs.
[Figure CN119248435B_ABST: abstract drawing]
Description
Technical Field
[0001] This invention relates to the field of data processing technology, and in particular to a method and system for processing tasks based on multithreading.
Background Technology
[0002] In common consumer electronics products such as tablets, projectors, and dashcams, various data processing requirements are typically involved, such as layer overlay, scaling, and rotation. For the main control chip, these types of data processing requirements are generally handled by a dedicated data processor within the SoC (System-on-a-Chip). The CPU schedules this data processor to meet the operating system's real-time data processing needs. When the SoC chip is executing data processing tasks, it may be unable to respond to interrupt signals from the data processor in a timely manner due to excessive CPU load. This can lead to gaps between tasks, causing a performance degradation in the data processor and ultimately failing to meet the operating system's real-time data processing requirements.
[0003] To reduce the gap between tasks, current mainstream technical solutions in the industry include using dual control registers (ping-pong registers) so that software can pre-configure the next task, or using a synchronous multithreaded processor such as the one whose structure is shown in Figure 1. As shown in Figure 1, a first pipeline control unit and a second pipeline control unit (that is, two sets of pipeline control units) are instantiated to schedule a first dedicated execution unit and a second dedicated execution unit (that is, two sets of dedicated execution units) together with an independent shared execution unit, in order to reduce task gaps.
[0004] However, in practice, it has been found that both of these technical solutions increase hardware resource consumption and the coupling between the system and hardware, and the increased cost does not lead to an improvement in the upper limit of hardware performance. Therefore, there is an urgent need to propose a technical solution that can save hardware resources (such as area and cost) while reducing task gaps.
Summary of the Invention
[0005] The technical problem to be solved by the present invention is to provide a method and system for processing tasks based on multithreading, which can reduce task gaps and save hardware resources.
[0006] To address the aforementioned technical problems, a first aspect of this invention discloses a method for processing tasks based on multithreading. The method is applied to a data processing system, which includes a thread control register and a task manager. The number of threads between the thread control register and the task manager is greater than one, all threads are independent of each other, and each thread has a unique corresponding command queue and a unique corresponding task queue. The method includes:
[0007] The task manager retrieves the target task from the command queue corresponding to each thread from the thread control register, and stores each target task in the task queue matched by the thread corresponding to the target task.
[0008] The task manager arbitrates all target tasks in all task queues to obtain an arbitration result, and determines the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space.
[0009] The configuration parameters for each target task serve as the basis for performing data processing on that target task.
[0010] As an optional implementation, in the first aspect of the present invention, the task manager determines the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in a pre-determined storage space, including:
[0011] The task manager determines the header information corresponding to each target task based on the arbitration result;
[0012] The task manager analyzes the header information corresponding to each target task and the configuration queue data stored in the pre-determined storage space to obtain the configuration parameters that match the header information, and determines the configuration parameters that match the header information as the configuration parameters corresponding to the target task that matches the header information.
[0013] As an optional implementation, in the first aspect of the present invention, the data processing system further includes a command queue parser;
[0014] The task manager determines the header information corresponding to each target task based on the arbitration result, including:
[0015] Based on the arbitration result, the task manager selects one of the target tasks from all target tasks in all the task queues that have not performed data processing operations as the current target task;
[0016] The task manager generates a task start signal based on the current target task and sends the task start signal to the command queue parser. The task start signal carries the current target task.
[0017] The command queue parser determines the header information corresponding to the current target task based on the task start signal.
[0018] As an optional implementation, in the first aspect of the present invention, each target task includes at least one command, and the header information corresponding to each target task includes the header address of the first command of the target task and, for each command, the header length of the next command.
[0019] The command queue parser determines the header information corresponding to the current target task based on the task start signal, including:
[0020] For any current target task: for the first command in the current target task, the command queue parser parses the task start signal to obtain the header address of that command and the header length of the next command; for each command that is not the first in the current target task, the header address of the command is determined based on the header address of the previous command and the next-command header length recorded for the previous command.
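The address derivation in [0018] to [0020] can be illustrated with a short sketch. The source does not give the exact arithmetic, so this is only one possible reading, not the patented implementation. It assumes that the headers of one task are laid out contiguously (the next header starts where the current one ends) and that a next-command header length of 0 marks the last command. The names `Header`, `walk_headers`, and the dictionary standing in for the storage space are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Header:
    """Hypothetical header: only the next command's header length is modeled."""
    next_len: int  # assumption: 0 means there is no next command


def walk_headers(memory: Dict[int, Header], first_addr: int,
                 first_len: int) -> List[Tuple[int, int]]:
    """Return (header address, header length) for every command of one task."""
    visited = []
    addr, length = first_addr, first_len  # carried by the task start signal
    while True:
        visited.append((addr, length))
        header = memory[addr]
        if header.next_len == 0:  # assumed end-of-chain marker
            break
        # Assumed contiguous layout: the next header begins right after this one.
        addr, length = addr + length, header.next_len
    return visited


# Three chained command headers in a toy storage space.
memory = {
    0x1000: Header(next_len=32),
    0x1010: Header(next_len=64),
    0x1030: Header(next_len=0),
}
chain = walk_headers(memory, first_addr=0x1000, first_len=16)
print([(hex(a), n) for a, n in chain])
```

Only the first header address comes from the task start signal; every later address is derived from the previous header, which is why the CPU does not need to be involved between commands.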
[0021] As an optional implementation, in the first aspect of the present invention, the data processing system further includes a data processor;
[0022] The method further includes:
[0023] The command queue parser performs configuration operations on the control register of the data processor according to the configuration parameters corresponding to each target task;
[0024] After configuring the control register of the data processor, the command queue parser controls the execution unit of the data processor to perform data processing operations that match the target task according to the configuration parameters corresponding to each target task.
[0025] As an optional implementation, in the first aspect of the present invention, the command queue parser controls the execution unit of the data processor to perform data processing operations matching the target task according to the configuration parameters corresponding to each target task, including:
[0026] The command queue parser sends the start signal of the current command of the current target task to the execution unit of the data processor;
[0027] The execution unit receives the start signal of the current command, switches the current state to the working state according to the start signal of the current command, and processes the data corresponding to the current command according to the current configuration parameters configured in the control register; after the data corresponding to the current command is processed, it sends a processing completion signal of the current command to the command queue parser.
[0028] The command queue parser receives the processing completion signal of the current command and switches its current state to the working state based on that signal. After switching to the working state, it determines, based on the header length of the next command, whether the current command is the last command in the current target task. If it is not, the parser takes the next command as the current command and repeats the operation of sending the start signal of the current command to the execution unit of the data processor, until the data corresponding to the last command in the current target task has been processed.
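The start/completion handshake in [0026] to [0028] can be sketched as a simple loop. This is a software illustration under stated assumptions, not a description of the actual hardware: the completion signal is modeled as a return value, a next-command header length of 0 is assumed to mean "last command", and the names `ExecutionUnit`, `run_task`, and the `op` field are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    IDLE = 0
    WORKING = 1


class ExecutionUnit:
    """Toy execution unit driven by the control register contents."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.control_register: Dict[str, str] = {}
        self.processed: List[Dict[str, str]] = []

    def start(self) -> bool:
        """Receive a start signal, process one command, return the completion signal."""
        self.state = State.WORKING
        self.processed.append(dict(self.control_register))
        self.state = State.IDLE
        return True


def run_task(commands: List[Tuple[Dict[str, str], int]], unit: ExecutionUnit) -> int:
    """Run every command of one task; each entry is (config parameters, next header length)."""
    completed = 0
    index = 0
    while True:
        config, next_len = commands[index]
        unit.control_register = config  # configure the control register first
        if not unit.start():            # send start signal, wait for completion
            raise RuntimeError("execution unit did not report completion")
        completed += 1
        if next_len == 0:               # assumed marker for the last command
            return completed
        index += 1                      # the next command becomes the current one


unit = ExecutionUnit()
task = [({"op": "scale"}, 32), ({"op": "rotate"}, 32), ({"op": "overlay"}, 0)]
count = run_task(task, unit)
print(count, [c["op"] for c in unit.processed])
```

The point of the loop is that the parser, not the CPU, reacts to each completion signal, so no interrupt round trip sits between consecutive commands.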
[0029] As an optional implementation, in the first aspect of the present invention, the method further includes:
[0030] After the data corresponding to the last command in the current target task has been processed, the command queue parser sends a task end signal for the current target task to the task manager.
[0031] The task manager receives the task end signal of the current target task, and according to the arbitration result, selects the target task that is currently ranked first from all target tasks that have not performed data processing operations in all task queues, and makes it the next target task;
[0032] The task manager updates the next target task to the current target task and re-executes the operation of generating a task start signal based on the current target task until the data processing of the last target task in the command queue corresponding to the task manager is completed.
[0033] As an optional implementation, in the first aspect of the present invention, the data processing system further includes an interrupt manager;
[0034] The method further includes:
[0035] The interrupt manager determines whether there is a thread among all the threads that has completed all the target tasks. When the result is yes, the interrupt flag of that thread is pulled high.
[0036] Each of the aforementioned threads has a unique corresponding register command queue start address and a unique corresponding interrupt flag bit.
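The per-thread interrupt flag in [0035] and [0036] can be modeled as follows. This is a minimal sketch: the source only states that a thread's flag is pulled high once the thread has completed all of its target tasks, so the counting scheme and the names `InterruptManager` and `task_finished` are assumptions made for illustration.

```python
from typing import Dict


class InterruptManager:
    """Tracks outstanding tasks per thread and raises one flag per thread."""

    def __init__(self, pending_per_thread: Dict[int, int]) -> None:
        self.pending = dict(pending_per_thread)
        self.flags = {tid: False for tid in pending_per_thread}

    def task_finished(self, thread_id: int) -> bool:
        """Record one finished task; return the thread's interrupt flag."""
        if self.pending[thread_id] == 0:
            raise ValueError(f"thread {thread_id} has no outstanding tasks")
        self.pending[thread_id] -= 1
        if self.pending[thread_id] == 0:
            self.flags[thread_id] = True  # all tasks of this thread are done
        return self.flags[thread_id]


manager = InterruptManager({0: 2, 1: 1})
results = [manager.task_finished(0), manager.task_finished(1), manager.task_finished(0)]
print(results, manager.flags)
```

Because each thread has its own flag, software is notified once per thread rather than once per task, which is consistent with the stated goal of reducing CPU scheduling overhead.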
[0037] A second aspect of this invention discloses a system for processing tasks based on multithreading. The system includes a thread control register and a task manager. The number of threads between the thread control register and the task manager is greater than one. All threads are independent of each other. Each thread has a unique command queue and a unique task queue.
[0038] The task manager is used to obtain the target task in the command queue corresponding to each thread from the thread control register, and store each target task in the task queue matched by the thread corresponding to the target task;
[0039] The task manager is also used to arbitrate all target tasks in all the task queues, obtain an arbitration result, and determine the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space.
[0040] The configuration parameters for each target task serve as the basis for performing data processing on that target task.
[0041] As an optional implementation, in the second aspect of the present invention, the specific method by which the task manager determines the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space includes:
[0042] Based on the arbitration result, determine the header information corresponding to each target task;
[0043] The header information corresponding to each target task and the configuration queue data stored in the pre-determined storage space are analyzed to obtain the configuration parameters that match the header information, and the configuration parameters that match the header information are determined to be the configuration parameters corresponding to the target task that matches the header information.
[0044] As an optional implementation, in a second aspect of the invention, the data processing system further includes a command queue parser;
[0045] The specific method by which the task manager determines the header information corresponding to each target task based on the arbitration result includes:
[0046] Based on the arbitration result, a target task is selected from all target tasks that have not performed data processing operations in all the task queues as the current target task. A task start signal is generated based on the current target task, and the task start signal is sent to the command queue parser. The task start signal carries the current target task to trigger the command queue parser to determine the header information corresponding to the current target task based on the task start signal.
[0047] As an optional implementation, in a second aspect of the present invention, each target task includes at least one command, and the header information corresponding to each target task includes the header address of the first-ordered command among all the commands of the target task and the header length of the next command corresponding to each command.
[0048] The specific method by which the command queue parser determines the header information corresponding to the current target task based on the task start signal includes:
[0049] For any of the current target tasks, for the first-ordered command in the current target task, the task start signal is parsed to obtain the header address corresponding to the command and the header length of the next command corresponding to each command; for commands that are not first-ordered in the current target task, the header address corresponding to the command is determined based on the header address corresponding to the previous command and the header length of the next command corresponding to the previous command.
[0050] As an optional implementation, in a second aspect of the invention, the data processing system further includes a data processor;
[0051] The command queue parser is also used to perform configuration operations on the control register of the data processor according to the configuration parameters corresponding to each target task;
[0052] The command queue parser is further configured to, after configuring the control register of the data processor, control the execution unit of the data processor to perform data processing operations that match the target task according to the configuration parameters corresponding to each target task.
[0053] As an optional implementation, in the second aspect of the present invention, the specific method by which the command queue parser controls the execution unit of the data processor to perform data processing operations matching the target task according to the configuration parameters corresponding to each target task includes:
[0054] A start signal for the current command of the current target task is sent to the execution unit of the data processor to trigger the execution unit to receive the start signal of the current command, and switch the current state to the working state according to the start signal of the current command, and process the data corresponding to the current command according to the current configuration parameters configured in the control register; after the data corresponding to the current command is processed, a processing completion signal for the current command is sent to the command queue parser.
[0055] The command queue parser receives the processing completion signal of the current command and switches its current state to the working state based on that signal. After switching to the working state, it determines, based on the header length of the next command, whether the current command is the last command in the current target task. If it is not, the next command is taken as the current command, and the operation of sending the start signal of the current command to the execution unit of the data processor is repeated until the data corresponding to the last command in the current target task has been processed.
[0056] As an optional implementation, in a second aspect of the present invention, the command queue parser is further configured to send a task end signal of the current target task to the task manager after the data corresponding to the last command in the current target task has been processed.
[0057] The task manager is also configured to receive the task end signal of the current target task, and, based on the arbitration result, select the target task currently ranked first from all target tasks that have not performed data processing operations in all task queues, and designate it as the next target task.
[0058] The task manager is also used to update the next target task to the current target task, and re-execute the operation of generating a task start signal based on the current target task, until the data processing of the last target task in the command queue corresponding to the task manager is completed.
[0059] As an optional implementation, in a second aspect of the invention, the data processing system further includes an interrupt manager;
[0060] The interrupt manager is used to determine whether there is a thread among all the threads that has completed all the target tasks. When the result is yes, the interrupt flag of that thread is pulled high.
[0061] Each of the aforementioned threads has a unique corresponding register command queue start address and a unique corresponding interrupt flag bit.
[0062] A third aspect of the present invention discloses another system for processing tasks based on multithreading, the system comprising:
[0063] Memory containing executable program code;
[0064] A processor coupled to the memory;
[0065] The processor calls the executable program code stored in the memory to execute some or all of the steps in any of the multi-threaded task processing methods disclosed in the first aspect of the present invention.
[0066] The fourth aspect of the present invention discloses a computer storage medium storing computer instructions, which, when invoked, are used to execute some or all of the steps of any of the multi-threaded task processing methods disclosed in the first aspect of the present invention.
[0067] Compared with the prior art, the embodiments of the present invention have the following beneficial effects:
[0068] This invention discloses a method and system for processing tasks based on multithreading. The method is applied to a data processing system that includes a thread control register and a task manager. The number of threads between the thread control register and the task manager is greater than one, and the threads do not interfere with each other. Each thread has a unique command queue and a unique task queue. The method includes: the task manager obtains, from the thread control register, the target tasks in the command queue corresponding to each thread and stores each target task in the task queue of the thread to which it belongs; the task manager arbitrates all target tasks in all task queues to obtain an arbitration result, and determines the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in a pre-determined storage space; the configuration parameters of each target task serve as the basis for performing the data processing corresponding to that target task. As can be seen, in this embodiment multiple threads obtain their tasks from the thread control register, that is, multiple threads issue tasks in parallel to their respective task queues. The tasks in the task queues are then arbitrated, and the configuration parameters of the corresponding control register are accurately determined from the arbitration result and the configuration queue data in the memory storage space, so that the control register can be updated.
In other words, all tasks are sent serially to the execution unit for time-shared processing. This reduces the task gaps caused by untimely interrupt response from the processor, reduces the performance lost while the processor waits, improves the resource utilization of the computing logic, and thus improves processor performance. In addition, by pre-storing the register configuration parameters of multiple tasks in the internal storage space, the execution unit of the data processor can reach the upper limit of hardware performance without adding hardware resources (such as hardware area), which improves the performance utilization of the execution unit while saving cost. It also reduces the waiting time of the execution unit within a single thread, improving the processing performance of the execution unit. Furthermore, it effectively reduces the coupling between the operating system and the hardware processor and reduces the software scheduling overhead of the CPU.
Description of the Drawings
[0069] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0070] Figure 1 is a schematic diagram of the structure of a synchronous multithreaded processor disclosed in the prior art;
[0071] Figure 2 is a schematic diagram of a system architecture for processing tasks based on multithreading, as disclosed in an embodiment of the present invention;
[0072] Figure 3 is a flowchart illustrating a method for processing tasks based on multithreading, as disclosed in an embodiment of the present invention;
[0073] Figure 4 is a schematic diagram of a storage space for configuring queue data, as disclosed in an embodiment of the present invention;
[0074] Figure 5 is a schematic diagram of a thread instantiation structure disclosed in an embodiment of the present invention;
[0075] Figure 6 is a timing diagram of a multi-threaded parallel distribution of subtasks to execution units, as disclosed in an embodiment of the present invention;
[0076] Figure 7 is a schematic diagram of a chain-structured configuration queue data disclosed in an embodiment of the present invention;
[0077] Figure 8 is a schematic diagram of the architecture of a task manager and thread control register disclosed in an embodiment of the present invention;
[0078] Figure 9 is a flowchart illustrating the control state machine of a task manager disclosed in an embodiment of the present invention;
[0079] Figure 10 is a timing diagram of a command queue parser disclosed in an embodiment of the present invention;
[0080] Figure 11 is a schematic diagram of an architecture of a multi-threaded, multi-tasking structure disclosed in an embodiment of the present invention;
[0081] Figure 12 is a schematic diagram of another multi-threaded, multi-tasking architecture disclosed in an embodiment of the present invention;
[0082] Figure 13 is a schematic diagram of the structure of a system for processing tasks based on multithreading, as disclosed in an embodiment of the present invention;
[0083] Figure 14 is a schematic diagram of another system for processing tasks based on multithreading, as disclosed in an embodiment of the present invention;
[0084] Figure 15 is a schematic diagram of another system for processing tasks based on multithreading, as disclosed in an embodiment of the present invention.
Detailed Implementation
[0085] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0086] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this invention are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or devices.
[0087] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0088] This invention discloses a method and system for processing tasks based on multithreading. Multiple threads obtain their tasks from the thread control register, that is, multiple threads distribute tasks in parallel to their respective task queues. The tasks in the task queues are arbitrated, and the configuration parameters of the corresponding control register are accurately determined from the arbitration result and the configuration queue data in the memory storage space, so that the control register can be updated. In other words, all tasks are sent serially to the execution unit for time-shared processing. This reduces the task gaps caused by untimely interrupt response from the processor, reduces the performance lost while the processor waits, improves the resource utilization of the computing logic, and thus improves processor performance. By pre-storing the register configuration parameters of multiple tasks in internal storage, the execution unit of the data processor can reach the upper limit of hardware performance without adding hardware resources (such as hardware area), which improves the performance utilization of the execution unit while saving cost. It also reduces the waiting time of the execution unit within a single thread, improves the processing performance of the execution unit, effectively reduces the coupling between the operating system and the hardware processor, and reduces the software scheduling overhead of the CPU.
[0089] To better understand the method and system for processing tasks based on multithreading described in this invention, the system architecture to which the method applies is described first. Specifically, the system architecture may be as shown in Figure 2, which is a schematic diagram of a system architecture for processing tasks based on multithreading, as disclosed in an embodiment of the present invention. As shown in Figure 2, the system architecture may include a thread control register, a task manager, a command queue parser, and a data processor. The data processor includes a control register and an execution unit. A storage space stores the configuration queue data, which is used to configure the data processor, specifically its control register. The storage space also stores the data to be processed (such as images) and the processed data. The command queue parser and the execution unit interact with the storage space via a memory bus. The thread control register obtains tasks from the CPU via the register configuration bus and distributes the corresponding tasks in parallel to the task queues of the task manager through four threads. The task manager arbitrates all tasks in all task queues, obtains an arbitration result, and selects one of the tasks that have not yet undergone data processing as the current task. Based on the current task, it generates a task start signal and sends it to the command queue parser. The task start signal carries the information of the current task. For any current task, for the first command in the current task, the command queue parser parses the task start signal to obtain the header address of that command and the header length of the next command.
For each command that is not the first in the current task, the header address of the command is determined based on the header address of the previous command and the next-command header length recorded for the previous command. The corresponding configuration parameters are then retrieved from the storage space based on this header address, and configuration operations are performed on the control register of the data processor. After the control register has been configured, the execution unit of the data processor performs the data processing operation that matches the command of the current task. After the data of the current task has been processed, the next task is determined based on the arbitration result and becomes the current task. This process is repeated until the data of all tasks has been processed. If a thread has finished processing the data of all its tasks, its interrupt flag is raised and an interrupt signal is sent to the CPU.
[0090] It should be noted that the system architecture shown in Figure 2 is only intended to illustrate a system to which the multi-threaded task processing method is applicable. All units involved are only schematic representations, and their specific structures, sizes, shapes, locations, and installation methods can be adapted to the actual system. The system architecture shown in Figure 2 does not limit these aspects.
[0091] The above describes the systems to which the multi-threaded task processing method is applicable. The following provides a detailed explanation of the multi-threaded task processing method and system.
[0092] Example 1
[0093] Please refer to Figure 3, which is a flowchart illustrating a method for processing tasks based on multithreading, as disclosed in an embodiment of the present invention. This method can be applied to a data processing system that includes a thread control register and a task manager. The number of threads between the thread control register and the task manager is greater than one, and all threads are independent of each other. Each thread has a unique corresponding command queue (Cmd-queue) and a unique corresponding task queue. As shown in Figure 3, this method for processing tasks based on multithreading can include the following operations:
[0094] 101. The task manager obtains, from the thread control register, the target tasks in the command queue corresponding to each thread, and stores each target task in the task queue matched to the thread corresponding to that target task.
[0095] In this embodiment of the invention, the thread control register obtains the target task from the processor and stores it in the corresponding command queue. Then, the thread corresponding to the command queue sends the target task to the task queue of the task manager. Each task queue can store at least one target task.
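Step 101 can be pictured with a small software model. The following is a minimal sketch only: the class names ThreadControlRegister and TaskManager, their methods, and the task labels are hypothetical stand-ins for hardware blocks, and the count of four threads is taken from the embodiment (the method only requires more than one).

```python
from collections import deque

NUM_THREADS = 4  # the embodiment uses four threads; the method only needs more than one


class ThreadControlRegister:
    """Hypothetical software model: one command queue per thread."""

    def __init__(self, num_threads=NUM_THREADS):
        self.command_queues = [deque() for _ in range(num_threads)]

    def submit(self, thread_id, task):
        # The processor places a target task into the thread's command queue.
        self.command_queues[thread_id].append(task)


class TaskManager:
    """Hypothetical software model: one task queue per thread."""

    def __init__(self, num_threads=NUM_THREADS):
        self.task_queues = [deque() for _ in range(num_threads)]

    def fetch(self, tcr):
        # Step 101: move each pending task into the task queue of its own thread.
        for thread_id, cmd_queue in enumerate(tcr.command_queues):
            while cmd_queue:
                self.task_queues[thread_id].append(cmd_queue.popleft())


tcr = ThreadControlRegister()
tcr.submit(0, "task-a")
tcr.submit(0, "task-b")
tcr.submit(2, "task-c")

tm = TaskManager()
tm.fetch(tcr)
print([list(q) for q in tm.task_queues])
```

Because every thread owns its own pair of queues, a task never migrates to another thread's queue, which is what keeps the threads independent of each other.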
[0096] 102. The task manager arbitrates all target tasks in all task queues, obtains the arbitration result, and determines the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the predetermined storage space; the configuration parameters of each target task serve as the basis for performing the data processing corresponding to that target task.
[0097] In this embodiment of the invention, optionally, arbitrating all target tasks in all task queues can be understood in two ways. It can be understood as arbitrating among the target tasks themselves: for example, when target tasks from different task queues leave their respective task queues at the same time, arbitration is performed among all target tasks that leave at the same time. It can also be understood as arbitrating among the task queues: for example, after arbitration, all target tasks in task queue 1 are processed first, and after that processing is completed, all target tasks in task queue 2 are processed.
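The two readings of arbitration can be contrasted in a short sketch. The text does not specify the arbitration policy, so the fixed rotation used for the first reading is an assumption, and both function names are hypothetical.

```python
from collections import deque


def arbitrate_among_tasks(task_queues):
    """Reading 1: arbitrate among the tasks at the heads of the queues.

    Assumption: a fixed rotation over the threads; the source does not
    name the policy used to pick among simultaneous tasks.
    """
    queues = [deque(q) for q in task_queues]
    order = []
    while any(queues):          # an empty deque is falsy
        for q in queues:
            if q:
                order.append(q.popleft())
    return order


def arbitrate_among_queues(task_queues):
    """Reading 2: drain task queue 1 completely, then task queue 2, and so on."""
    return [task for q in task_queues for task in q]


queues = [["A1", "A2"], ["B1"], ["C1", "C2", "C3"]]
by_task = arbitrate_among_tasks(queues)
by_queue = arbitrate_among_queues(queues)
print(by_task)
print(by_queue)
```

Under the first reading the tasks of different threads are interleaved; under the second, each thread's tasks stay together.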
[0098] In this embodiment of the invention, optionally, the configuration queue data consists of the data of each of multiple headers and the data of the body block uniquely corresponding to each header. For any header, the header data includes the starting address of the body block uniquely corresponding to that header, the length of the body block, the base address offset of the control register to be updated corresponding to that body block, the header length of the next command, and the update flag bit of the control register. The data of each body block includes the address offset of the control register to be updated, and the data within every body block is contiguous. Each header is used for addressing and parsing the body block uniquely corresponding to that header, and the data of each body block is used to perform an update operation on the address offset of the control register. The storage space occupied by all body blocks includes contiguous storage space and/or non-contiguous storage space. Figure 4 is a schematic diagram of a storage space for configuration queue data, as disclosed in an embodiment of the present invention. As shown in Figure 4, each header consists of the starting address of its uniquely corresponding body block (Body_Block_Start_Address), the length of the body block (Body_Block_Length, in bytes), the base address offset of the corresponding control register that needs updating (Register_Offset), the header length of the next command (Next_Cmd_Head_Length), and the update flag (Dirty) of the control register. Each header occupies 128 bits, divided into four 32-bit segments. The starting address is used for addressing the header in memory. The header length of the next command is held in the high 24 bits of the fourth 32-bit segment: a value of 0 indicates that there is no next command, meaning this command is the last one in the queue; a value greater than 0 indicates that there is a next command. The update flag is bit 0 of the fourth 32-bit segment: it is set to 1 if the control register corresponding to Register_Offset needs to be updated to the value within the body block (i.e., the register value), and to 0 if no update is needed. The address offsets stored within the body block are each 32 bits. The memory address and length of the body block are obtained from the corresponding header. Starting from the base address offset in the header, the control registers at the corresponding offset addresses in the data processor are updated consecutively in increasing order. The updated values are the data within the body block, and the number of updates is Body_Block_Length divided by 32 bits. By storing the configuration data of the control register in blocks and configuring corresponding addresses for addressing and parsing, the control register can skip register address segments that do not need updating, achieving non-contiguous updates. This ensures accurate control register updates while improving update efficiency, further reducing task gaps. The storage space can be integrated into the data processing system or be independent of it.
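The header layout can be illustrated by decoding one 128-bit header and applying its body block to a register file. This is a sketch under stated assumptions: the order of the first three 32-bit segments (Body_Block_Start_Address, Body_Block_Length, Register_Offset), little-endian byte order, a 4-byte stride between consecutive registers, and the dictionary used as a register file are all illustrative choices. Only the position of Next_Cmd_Head_Length (high 24 bits of the fourth segment) and Dirty (bit 0 of the fourth segment) comes from the description.

```python
import struct
from dataclasses import dataclass

HEADER_SIZE = 16  # 128 bits = four 32-bit segments


@dataclass
class Header:
    body_start: int            # Body_Block_Start_Address
    body_length: int           # Body_Block_Length, in bytes
    register_offset: int       # Register_Offset (base offset of the registers to update)
    next_cmd_head_length: int  # NCHL, high 24 bits of the fourth segment
    dirty: bool                # update flag, bit 0 of the fourth segment


def parse_header(raw):
    # Assumed segment order and little-endian encoding (not stated in the source).
    w0, w1, w2, w3 = struct.unpack("<4I", raw[:HEADER_SIZE])
    return Header(w0, w1, w2, next_cmd_head_length=w3 >> 8, dirty=bool(w3 & 1))


def apply_body(header, memory, registers):
    """Write the body block's 32-bit values to consecutive register offsets."""
    if not header.dirty:
        return  # Dirty == 0: this register segment is skipped
    count = header.body_length // 4  # one 32-bit value per update
    values = struct.unpack_from(f"<{count}I", memory, header.body_start)
    for i, value in enumerate(values):
        registers[header.register_offset + 4 * i] = value


# Illustrative data: a body block of two values stored at address 32.
memory = bytearray(64)
struct.pack_into("<2I", memory, 32, 0x11, 0x22)
raw_header = struct.pack("<4I", 32, 8, 0x100, (2 << 8) | 1)

hdr = parse_header(raw_header)
regs = {}
apply_body(hdr, memory, regs)
print(hdr, regs)
```

Because each header names its own base offset, two headers can update two distant register segments without touching the registers in between, which is the non-contiguous update described above.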
[0099] In this embodiment of the invention, the top-level thread control register further instantiates, for each thread, the header address (Head_address) and header length (Head_length) of the corresponding register command queue, which are configured by the processor. Figure 5 is a schematic diagram of a thread instantiation structure disclosed in an embodiment of the present invention. As shown in Figure 5, there are four threads: Cmdline0, Cmdline1, Cmdline2, and Cmdline3. Each thread instantiates a corresponding register command queue with a header address (Head_address) and header length (Head_length). Each thread points to a different address segment in the memory space. Each target task carries the address segment of its corresponding thread, and the hardware retrieves the corresponding configuration parameters, i.e., the Register Value, through this address segment. Figure 6 is a timing diagram of multiple threads distributing subtasks in parallel to the execution unit, as disclosed in an embodiment of the present invention. As shown in Figure 6, there are a total of four threads: Cmdline0, Cmdline1, Cmdline2, and Cmdline3. Thread Cmdline0 issues 3 subtasks: Subtask 1 (containing commands Cmd0, Cmd1, and Cmd2), Subtask 2 (containing commands Cmd3 and Cmd4), and Subtask 3 (containing commands Cmd5 and Cmd6). Thread Cmdline1 issues 4 subtasks: Subtask 4 (containing commands Cmd7 and Cmd8), Subtask 5 (containing command Cmd9), Subtask 6 (containing command Cmd10), and Subtask 7 (containing commands Cmd11 and Cmd12). Thread Cmdline2 issues 3 subtasks: Subtask 8 (containing command Cmd13)... The thread Cmdline3 issues three subtasks: Subtask 11 (commands Cmd14 and Cmd15), Subtask 10 (commands Cmd16, Cmd17, and Cmd18), and Subtask 11 (commands Cmd21, Cmd22, and Cmd23), and Subtask 13 (commands Cmd24 and Cmd25). Each subtask issued by a thread first enters its corresponding task queue for queuing. After arbitration, an arbitration queue sequence is formed: Subtask 1, Subtask 4, Subtask 8, Subtask 11, Subtask 2, Subtask 5, Subtask 9, Subtask 12... The tasks are then sent to the execution unit for processing, and the execution unit continuously processes tasks from different threads. From the perspective of a single processor, the timing of each thread's task issuance is fragmented, and task gaps still exist. However, after the task manager's arbitrator interleaves the tasks from different threads, these task gaps disappear from the execution unit's perspective. It is evident that using a multi-threaded architecture to balance the load across the CPU effectively reduces task gaps for the execution unit, minimizes performance losses caused by idle waiting, and maximizes the hardware performance of the execution unit.
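The effect of interleaving on task gaps can be illustrated with a toy timing model. Everything numeric here is an assumption chosen for illustration (one time unit per task, a new task every four units per thread, threads staggered by one unit), and the function simulate is hypothetical; the source gives no timing figures.

```python
def simulate(arrivals, exec_time):
    """Run tasks in order of readiness on one execution unit.

    arrivals: list of (ready_time, name) pairs.
    Returns (finish_time, idle_time) as seen by the execution unit.
    """
    clock = 0
    idle = 0
    for ready, _name in sorted(arrivals):
        if ready > clock:
            idle += ready - clock  # the unit waits: this is a task gap
            clock = ready
        clock += exec_time
    return clock, idle


# One thread: a task becomes ready every 4 time units, each takes 1 unit.
single = [(4 * k, f"t0-{k}") for k in range(3)]

# Four threads with the same per-thread timing, staggered by 1 unit.
multi = [(4 * k + t, f"t{t}-{k}") for t in range(4) for k in range(3)]

single_result = simulate(single, exec_time=1)
multi_result = simulate(multi, exec_time=1)
print(single_result, multi_result)
```

In this toy model each thread on its own still leaves three idle units between its tasks, but once the four threads are interleaved the execution unit never waits.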
[0100] It can be seen that, by implementing the multi-threaded task processing method described in Figure 3, corresponding tasks are obtained from the thread control register through multiple threads; that is, multiple threads send tasks in parallel to the task queue corresponding to each thread. The tasks in the task queues are arbitrated, and then, based on the arbitration result and the configuration queue data in the storage space, the configuration parameters of the corresponding control register are accurately determined in order to update the control register. In other words, all tasks are sent serially to the execution unit for time-shared processing. This reduces the task gaps caused by the processor's untimely interrupt response, thereby reducing the performance loss caused by the processor waiting, improving the resource utilization of the computing logic, and thus improving processor performance. Furthermore, by pre-storing the register configuration parameters of multiple tasks in the internal storage space, the execution unit of the data processor can reach the upper limit of hardware performance without additional hardware resources (such as hardware area), which improves the performance utilization of the execution unit while saving costs. Reducing the waiting time of the execution unit within a single thread also improves the processing performance of the execution unit, effectively reduces the coupling between the operating system and the hardware processor, and reduces the software scheduling overhead of the CPU.
[0101] In this embodiment of the invention, optionally, the task manager determines the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space, including:
[0102] Based on the arbitration result, the task manager determines the header information corresponding to each target task;
[0103] The task manager analyzes the header information corresponding to each target task and the configuration queue data stored in the predetermined storage space to obtain the configuration parameters that match the header information, and takes those configuration parameters as the configuration parameters of the target task to which the header information corresponds.
[0104] In this embodiment of the invention, optionally, the data processing system further includes a command queue parser; and the task manager determines the header information corresponding to each target task based on the arbitration result, including:
[0105] Based on the arbitration result, the task manager selects one target task, from all target tasks in all task queues that have not yet undergone data processing, as the current target task.
[0106] The task manager generates a task start signal based on the current target task and sends the task start signal to the command queue parser. This task start signal carries the information of the current target task.
[0107] The command queue parser determines the header information corresponding to the current target task based on the task start signal.
[0108] As can be seen, by analyzing the arbitration result and the configuration queue data stored in the storage space, the embodiments of the present invention can accurately determine the configuration parameters corresponding to a task. By generating the task start signal of the current task from the arbitration result and then parsing that signal, the header information corresponding to the current task can be determined efficiently and accurately. After the data corresponding to all commands of the current task has been processed, the self-starting task manager quickly determines the next task from the existing task queues for processing, without waiting for the interrupt response of the processor CPU, which reduces task gaps and improves the execution performance of the execution unit. Selecting the next task according to the arbitration result also improves the accuracy and efficiency of determining the next task, which helps to further reduce the gaps between tasks and further improve the execution performance of the execution unit.
[0109] In this embodiment of the invention, optionally, each target task includes at least one command, and the header information corresponding to each target task includes the header address of the first command in the order of all commands of the target task and the header length of the next command corresponding to each command.
[0110] The command queue parser determines the header information corresponding to the current target task based on the task start signal, including:
[0111] For any current target task, for the first command in the current target task, the command queue parser parses the task start signal to obtain the header address corresponding to the command and the header length of the next command corresponding to each command; for commands that are not the first command in the current target task, the header address corresponding to the command is determined based on the header address corresponding to the previous command and the header length of the next command corresponding to the previous command.
[0112] In this embodiment of the invention, optionally, the header of each command includes the header length of the next command, Next_Cmd_Head_Length (NCHL), which indicates the number of Head_blocks in the next command. An NCHL of 0 indicates that the current command is the last one; an NCHL greater than 0 indicates that it is not. Because of the NCHL information, the Head_block of the next command can be addressed at the end of the previous command, so the Head_blocks can be organized into a chained structure that realizes the function of a command queue. Figure 7 is a schematic diagram of chain-structured configuration queue data disclosed in an embodiment of the present invention. As shown in Figure 7, the Head_address and Head_length of command Cmd0 come from the top-level thread control register and are initialized and configured by the processor CPU. Since head data is stored contiguously in memory, the Head_address of each command can be calculated, while its Head_length is recorded in the NCHL information of the previous command. Therefore, the Head_address of each command is calculated as follows:
[0113] cmd1_Head_address=cmd0_Head_address+cmd0_Head_length;
[0114] cmd2_Head_address=cmd1_Head_address+cmd1_Head_length(cmd0_NCHL);
[0115] cmd3_Head_address=cmd2_Head_address+cmd2_Head_length(cmd1_NCHL);
[0116] ...
[0117] And so on.
[0118] Because the memory address of the body_block is recorded in the Head_block information, multiple body_blocks can use a single contiguous block of memory or multiple fragmented blocks of memory in the storage space, as shown in Figure 7. Storing the body blocks in a contiguous or fragmented manner improves the storage flexibility of the body block data and also improves the utilization of fragmented memory in the storage space.
[0119] As can be seen, this optional embodiment determines the header information of the next command by combining the header information of the previous command, that is, by using the header information relationship between two adjacent commands, the header information of each command can be accurately and efficiently determined.
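The chained address calculation can be written as a short loop. This is a minimal sketch: the function name head_addresses is hypothetical, the example values are invented, and Head_length is treated as being expressed in the same units as Head_address, exactly as the formulas above add the two directly.

```python
def head_addresses(first_head_address, first_head_length, nchl_values):
    """Return [(Head_address, Head_length)] for each command of one chain.

    first_head_address / first_head_length: values for Cmd0, which come from
    the thread control register.
    nchl_values[i]: the Next_Cmd_Head_Length stored in command i; the chain
    ends at the first command whose NCHL is 0.
    """
    chain = []
    address, length = first_head_address, first_head_length
    for nchl in nchl_values:
        chain.append((address, length))
        if nchl == 0:
            break              # NCHL == 0: this was the last command
        address += length      # heads are stored contiguously
        length = nchl          # the next Head_length is this command's NCHL
    return chain


# Illustrative values: Cmd0 at 0x1000 with length 32, followed by two commands.
chain = head_addresses(0x1000, 32, [16, 48, 0])
for index, (addr, length) in enumerate(chain):
    print(f"Cmd{index}: Head_address=0x{addr:04X} Head_length={length}")
```

Only the first command's address and length need to be configured by the CPU; every later header is found from the one before it.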
[0120] In an optional embodiment, the data processing system further includes a data processor; and the method may further include the following steps:
[0121] The command queue parser performs configuration operations on the control registers of the data processor according to the configuration parameters corresponding to each target task;
[0122] After configuring the control registers of the data processor, the command queue parser controls the execution unit of the data processor to perform data processing operations that match the target task, based on the configuration parameters corresponding to each target task.
[0123] In this embodiment of the invention, the data processing operation corresponds to the type of data processor. When the data processor is an image processor, the data processing operation includes image processing operations, and may further include data processing operations; when the data processor is not an image processor, the data processing operation includes non-image processing operations.
[0124] In this embodiment of the invention, the command queue parser performs a configuration operation on the control register of the data processor according to the configuration parameters corresponding to each command of each target task, and after the configuration is completed, it controls the execution unit of the data processor to perform a data processing operation that matches the command according to the configuration parameters corresponding to the command.
[0125] In this optional embodiment, optionally, the command queue parser controls the execution unit of the data processor to perform data processing operations matching the target task based on the configuration parameters corresponding to each target task, including:
[0126] The command queue parser sends the start signal of the current command for the current target task to the execution unit of the data processor;
[0127] The execution unit receives the start signal of the current command and switches the current state to the working state according to the start signal of the current command. It also processes the data corresponding to the current command according to the current configuration parameters configured in the control register. After the data corresponding to the current command is processed, it sends a signal indicating that the processing of the current command is complete to the command queue parser.
[0128] The command queue parser receives the completion signal of the current command and switches its current state to the working state based on the completion signal. After switching to the working state, it determines, based on the header length of the next command, whether the current command is the last command in the current target task. If it is not, the parser takes the next command as the current command and repeats the above operation of sending the start signal of the current command to the execution unit of the data processor, until the data corresponding to the last command in the current target task has been processed.
[0129] In this embodiment of the invention, Figure 8 is a schematic diagram of the architecture of a task manager and thread control register disclosed in an embodiment of the present invention, and Figure 9 is a flowchart of the control state machine of a task manager according to an embodiment of the present invention. As shown in Figures 8 and 9, the thread control register contains command queues, specifically four command queues. Each command queue corresponds to one thread and interfaces with the task queue in the task manager through that thread. Each task queue can store multiple tasks. The task manager's arbitrator arbitrates all tasks in all task queues to determine the current task. While the task manager's control state machine is in the IDLE state, it continuously monitors the arbitrator's current task output; when a task is output, it generates a task start signal, Task_start, and at the same time starts the command queue parser. The command queue parser parses the task start signal to obtain the task's header information and looks up the offset address of the corresponding body block to configure the control register parameters. After configuration, it generates a start signal, Cmd_start, to start the execution unit. When the execution unit finishes processing the data corresponding to the current command, it sends a processing completion signal, Cmd_end, to the command queue parser. On receiving this signal, the parser checks whether the NCHL (Next_Cmd_Head_Length) information at the end of the current command chain is equal to 0. If the NCHL information is not equal to 0, the header address (Head_address) and header length (Head_length) of the next command are calculated from the value of NCHL, and the command queue parser is started again, repeating the above process to execute the next command. If the NCHL information is equal to 0, the last command in the command queue has been processed and a task end signal, Task_end, is generated. At this point, the control state machine jumps back to IDLE and continues to monitor the task output of the task queues.
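The signal sequence of one task can be traced with a small state machine. This is a sketch, not the hardware design: only the IDLE state and the signal names Task_start, Cmd_start, Cmd_end, and Task_end come from the description, while the state names CONFIGURE and EXECUTE and the function run_task are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # named in the description
    CONFIGURE = auto()  # hypothetical: the parser configures the control register
    EXECUTE = auto()    # hypothetical: the execution unit processes one command


def run_task(nchl_values):
    """Return the signal trace for one task.

    nchl_values[i] is the NCHL of command i; the last entry must be 0.
    """
    state = State.IDLE
    signals = []
    index = 0
    while True:
        if state is State.IDLE:
            signals.append("Task_start")
            state = State.CONFIGURE
        elif state is State.CONFIGURE:
            signals.append(f"Cmd_start[{index}]")
            state = State.EXECUTE
        else:  # State.EXECUTE
            signals.append(f"Cmd_end[{index}]")
            if nchl_values[index] == 0:
                signals.append("Task_end")  # the state machine returns to IDLE
                return signals
            index += 1                      # NCHL != 0: move on to the next command
            state = State.CONFIGURE


trace = run_task([1, 1, 0])
print(trace)
```

No step in the loop involves the CPU, which mirrors the self-starting behaviour between consecutive commands.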
[0130] As can be seen, this optional embodiment updates the control register after obtaining the configuration parameters corresponding to the task, thereby controlling the execution unit to perform the corresponding data processing operations for the task. Without adding control registers, it reduces the task gaps caused by untimely processor interrupt response, thereby reducing the performance loss caused by processor idling, improving the resource utilization of the arithmetic logic, and thus improving processor performance. Furthermore, because the hardware self-starts the data processing between two adjacent commands, no scheduling by the processor CPU is required, which avoids command gaps caused by untimely CPU interrupt response and effectively improves the utilization of the execution unit.
[0131] In another alternative embodiment, the method may further include the following steps:
[0132] Once the data corresponding to the last command in the current target task has been processed, the command queue parser sends a task completion signal to the task manager for the current target task.
[0133] The task manager receives the task completion signal of the current target task and, based on the arbitration result, selects the target task that is currently ranked first from all target tasks that have not performed data processing operations in all task queues, and makes it the next target task.
[0134] The task manager takes the next target task as the current target task and re-executes the above operation of generating a task start signal based on the current target task, until the data processing of the last target task in the command queue corresponding to the task manager is completed.
[0135] In this optional embodiment, Figure 10 is a working timing diagram of a command queue parser disclosed in an embodiment of the present invention. As shown in Figure 10, at time T0, the task manager dispatches a task to the command queue parser, issuing the task start signal Task_start together with the Head address and Head length of the command queue. The command queue parser then enters the working state and retrieves the Head data and Body data from the storage space to configure the control register. At time T1, after the configuration of the control register corresponding to command Cmd0 is completed, the command queue parser sends a start signal Cmd_start to the execution unit. The execution unit then switches from the idle state to the working state and processes the data of command Cmd0 according to the current configuration of the control register, repeatedly reading input data from the storage space and writing output data back to the storage space. At time T2, after the data processing corresponding to command Cmd0 is completed, the execution unit outputs a processing completion signal Cmd_end, thereby starting the command queue parser. The command queue parser retrieves the header data and body data of command Cmd1 based on the NCHL information at the end of the command Cmd0 chain and configures the control register. At time T3, after the configuration of command Cmd1 is completed, a start signal Cmd_start is output. At time T4, the execution unit begins data processing for command Cmd1. After Cmd1 is processed, the execution unit outputs a processing completion signal Cmd_end. The command queue parser restarts, retrieves the header and body data of command Cmd2 based on the NCHL information at the end of the Cmd1 chain, and configures the control register. At time T5, the command queue parser sends a start signal Cmd_start, and the execution unit begins data processing for command Cmd2. At time T6, after command Cmd2 has been executed, the execution unit outputs a processing completion signal Cmd_end. At this time, the NCHL information at the end of the Cmd2 chain in the command queue parser is 0, indicating that command Cmd2 is the last command in the command queue. The command queue parser then outputs a task end signal Task_end to notify the task manager that the task it dispatched has been completed.
[0136] It should be noted that the time the command queue parser spends configuring the control register is very short and negligible compared with the time the execution unit spends processing a command. In Figure 10, the proportion of time taken by the command queue parser is exaggerated so that those skilled in the art can more clearly understand each stage of task processing. In actual hardware, Cmd0, Cmd1, and Cmd2 can be considered to execute back to back.
[0137] As can be seen, in this optional embodiment, after all data corresponding to the commands of the current task has been processed, the self-starting task manager quickly determines the next task from the existing task queue for processing, without waiting for the CPU interrupt response, reducing task gaps and improving the execution performance of the execution unit; and by selecting the next task based on the arbitration result of the task, the accuracy and efficiency of determining the next task can be improved, which is conducive to further reducing the gap between tasks and improving the execution performance of the execution unit.
[0138] In yet another optional embodiment, the data processing system further includes an interrupt manager; and the method may further include the following steps:
[0139] The interrupt manager determines whether there is a thread among all threads that has completed all target tasks. When the result is yes, it pulls the interrupt flag of that thread high.
[0140] Each thread has a unique address of the start of the register command queue and a unique interrupt flag.
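The interrupt manager's per-thread bookkeeping can be sketched as follows. The class InterruptManager, its methods, and the pending counter are hypothetical; the source states only that each thread has its own interrupt flag, which is pulled high once the thread has completed all of its target tasks.

```python
class InterruptManager:
    """Hypothetical model: one pending-task counter and one flag per thread."""

    def __init__(self, num_threads=4):
        self.pending = [0] * num_threads
        self.flags = [False] * num_threads

    def task_issued(self, thread_id):
        # A new target task entered the thread's task queue.
        self.pending[thread_id] += 1
        self.flags[thread_id] = False

    def task_done(self, thread_id):
        # The command queue parser reported Task_end for a task of this thread.
        self.pending[thread_id] -= 1
        if self.pending[thread_id] == 0:
            self.flags[thread_id] = True  # pull this thread's interrupt flag high


im = InterruptManager()
im.task_issued(0)
im.task_issued(0)
im.task_issued(1)

im.task_done(0)
after_first = list(im.flags)   # thread 0 still has one task outstanding
im.task_done(1)
im.task_done(0)
after_all = list(im.flags)
print(after_first, after_all)
```

Keeping one flag per thread means that each system attached to a thread is interrupted only for its own work.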
[0141] In this optional embodiment, multiple threads can interface with different systems and heterogeneous CPUs. Figure 11 is a schematic diagram of a multi-threaded, multi-tasking architecture disclosed in an embodiment of the present invention. As shown in Figure 11, there are four threads: Thread 0 (Android), Thread 1 (Linux), Thread 2 (ARM), and Thread 3 (RISC-V). Tasks issued by each thread are arbitrated by the task manager before being sent to the execution unit for processing. After task processing, the interrupt manager determines whether to raise the corresponding thread's interrupt flag (Thread 0 interrupt, Thread 1 interrupt, Thread 2 interrupt, or Thread 3 interrupt). It can be seen that the four threads correspond to four register command queue start addresses and four interrupt flags. From a system perspective, the subtasks of the four threads appear to be processed "in parallel" by four independent execution units, although they are actually processed serially by a single execution unit. This simplifies the interaction logic between the system and the heterogeneous processor CPUs, and can be applied to multi-screen displays and multi-system SoCs. Multithreading is a technology that allows a processor to execute multiple control flows. Its principle is to use one processor as four, turning one "physical" processor with multithreading capabilities into four "logical" processors. From the system's perspective, the logical processors are no different from physical processors. The system therefore distributes worker threads to these four logical processors for execution, allowing multiple threads of one or more applications to be executed simultaneously on the same physical processor. The four logical processors share all the execution resources of the single physical processor, so multithreading technology can be regarded as processor virtualization.
[0142] In this embodiment of the invention, optionally, the multi-threaded, multi-tasking architecture shown in Figure 11 can effectively reduce task gaps, allowing the invoked execution unit to approach its performance limit. However, from the perspective of the entire system, the overall performance of the data processor is limited by the performance of a single execution unit. Therefore, two independent execution units can be instantiated internally to improve the overall performance of the data processor. Figure 12 is a schematic diagram of another multi-threaded, multi-tasking architecture disclosed in an embodiment of the present invention. As shown in Figure 12, instantiating two execution units, 0 and 1, yields a 4-thread, 2-core architecture. At the system level, there are still four logical execution units processing the tasks of four threads in parallel, and the CPU's software scheduling logic does not change.
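The benefit of a second execution unit can be estimated with a simple dispatch model. This is a sketch under assumptions: the source does not say how arbitrated tasks are assigned to the two units, so the rule used here (the next task goes to whichever unit becomes free first), the function makespan, and the task durations are all illustrative.

```python
import heapq


def makespan(task_durations, num_units):
    """Total time to finish all tasks, dispatched in arbitration order.

    Assumed policy: each task goes to the execution unit that frees up first.
    """
    free_at = [0] * num_units  # time at which each unit becomes free
    heapq.heapify(free_at)
    for duration in task_durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


durations = [3, 1, 2, 2, 1, 3]  # illustrative task lengths in arbitration order
one_unit = makespan(durations, num_units=1)
two_units = makespan(durations, num_units=2)
print(one_unit, two_units)
```

The threads and their queues are unchanged between the two runs; only the number of units draining the arbitrated sequence differs, which matches the statement that the CPU's software scheduling logic does not change.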
[0143] As can be seen, in this optional embodiment, if the task corresponding to a thread has been completed, the interrupt flag of that thread can be pulled high so that other tasks can be sent out in parallel through that thread, thereby improving the utilization of the thread, further reducing task gaps, and further improving the execution performance of the execution unit and the system.
[0144] Example 2
[0145] Please refer to Figure 13, which is a schematic diagram of a system for processing tasks based on multithreading, as disclosed in an embodiment of the present invention. The data processing system includes a thread control register and a task manager. The number of threads between the thread control register and the task manager is greater than one. All threads are independent of each other, and each thread has a unique command queue and a unique task queue. As shown in Figure 13, this system for processing tasks based on multithreading may include:
[0146] The task manager is used to retrieve the target tasks from the command queue corresponding to each thread from the thread control register, and store each target task in the task queue matched by the thread corresponding to the target task.
[0147] The task manager is also used to arbitrate all target tasks in all task queues, obtain the arbitration result, and determine the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space; wherein, the configuration parameters of each target task are used as the basis for performing data processing corresponding to that target task.
[0148] It can be seen that, by implementing the multi-threaded task processing system described in Figure 13, corresponding tasks are obtained from the thread control register through multiple threads; that is, multiple threads send tasks in parallel to the task queue corresponding to each thread. The tasks in the task queues are arbitrated, and then, based on the arbitration result and the configuration queue data in the storage space, the configuration parameters of the corresponding control register are accurately determined and the control register is updated. In other words, all tasks are sent serially to the execution unit for time-shared processing. This reduces the task gaps caused by the processor's untimely interrupt response, thereby reducing the performance loss caused by the processor waiting, improving the resource utilization of the computing logic, and thus improving processor performance. Furthermore, by pre-storing the register configuration parameters of multiple tasks in the internal storage space, the execution unit of the data processor can reach the upper limit of hardware performance without additional hardware resources (such as hardware area), which improves the performance utilization of the execution unit while saving costs. Reducing the waiting time of the execution unit within a single thread also improves the processing performance of the execution unit, effectively reduces the coupling between the operating system and the hardware processor, and reduces the software scheduling overhead of the CPU.
[0149] In an optional embodiment, the task manager determines the specific configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space in the following ways:
[0150] Based on the arbitration result, Task Manager determines the header information corresponding to each target task;
[0151] The Task Manager analyzes the header information corresponding to each target task and the configuration queue data stored in the pre-determined storage space to obtain the configuration parameters that match the header information, and determines the configuration parameters that match the header information as the configuration parameters corresponding to the target task that matches the header information.
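The two-step lookup above (arbitration result to header information, then header information to matching configuration parameters) might look like the following sketch. The dictionary layout, the addresses, and the register names such as `SCALE_W` are hypothetical and serve only to show the matching step.

```python
# Hypothetical layout: the configuration queue data held in the storage space
# is modelled as a dict keyed by header address; each entry holds the
# register/value pairs for one task.
CONFIG_QUEUE_DATA = {
    0x1000: {"SCALE_W": 1280, "SCALE_H": 720},
    0x1040: {"ROTATE_DEG": 90},
}


def resolve_config(arbitration_result, headers):
    """For each task in arbitration order, find its header, then its config."""
    resolved = []
    for task in arbitration_result:
        header_addr = headers[task]               # step 1: task -> header information
        params = CONFIG_QUEUE_DATA[header_addr]   # step 2: header -> matching parameters
        resolved.append((task, params))
    return resolved


# Hypothetical header information for two target tasks.
headers = {"scale": 0x1000, "rotate": 0x1040}
configs = resolve_config(["rotate", "scale"], headers)
print(configs)
```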
[0152] It is evident that the multi-threaded task processing system described in Figure 13 can also save pixel storage resources and improve the search accuracy and reliability of macroblocks used for final macroblock search by first performing macroblock matching between the current block of the downsampled current frame and the macroblocks of the downsampled pixel region, and then comparing the macroblocks obtained by the two matching methods.
[0153] In another optional embodiment, Figure 14 is a schematic diagram of another system for processing tasks based on multithreading, as disclosed in an embodiment of the present invention. As shown in Figure 14, the data processing system described above also includes a command queue parser;
[0154] The Task Manager determines the specific header information for each target task based on the arbitration result, including:
[0155] Based on the arbitration result, the task manager selects one of the target tasks from all target tasks that have not performed data processing operations in all task queues as the current target task, and generates a task start signal based on the current target task. This task start signal carries the current target task and is sent to the command queue parser to trigger the command queue parser to determine the header information corresponding to the current target task based on the task start signal.
[0156] It is evident that the multi-threaded task processing system described in Figure 14 accurately determines the configuration parameters of a task by analyzing the arbitration result and the configuration queue data stored in the storage space. It also generates a task start signal for the current task from the arbitration result, and the command queue parser then parses that signal to determine the header information of the current task efficiently and accurately. After all the data corresponding to the commands of the current task has been processed, the system automatically starts the task manager to quickly determine the next task from the existing task queues, without waiting for a CPU interrupt response, which reduces task gaps and improves the execution performance of the execution unit. Furthermore, selecting the next task based on the arbitration result improves the accuracy and efficiency of that selection, further reducing gaps between tasks.
[0157] In another optional embodiment, each target task includes at least one command, and the header information corresponding to each target task includes the header address of the first command in the order of all commands of the target task and the header length of the next command corresponding to each command.
[0158] The command queue parser determines the header information corresponding to the current target task based on the task initiation signal in the following ways:
[0159] For any current target task, for the first command in the current target task, the command queue parser parses the task start signal to obtain the header address corresponding to the command and the header length of the next command corresponding to each command; for commands that are not the first command in the current target task, the header address corresponding to the command is determined based on the header address corresponding to the previous command and the header length of the next command corresponding to the previous command.
[0160] It is evident that the multi-threaded task processing system described in Figure 14 determines the header information of the next command by combining it with the header information of the previous command. That is, it determines the header information of each command accurately and efficiently by using the header information relationship between two adjacent commands.
[0161] In yet another optional embodiment, as shown in Figure 14, the data processing system also includes a data processor;
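The address chaining between adjacent commands can be sketched as follows. Two points are assumptions not stated in the text: that the stored "header length of the next command" acts as the offset from the current header to the next one, and that a length of 0 marks the last command of a task. The function name `header_addresses` is likewise hypothetical.

```python
def header_addresses(first_addr, next_lengths):
    """Walk the command chain of one target task.

    first_addr:   header address of the first command, as parsed from the
                  task start signal.
    next_lengths: for each command, its "header length of the next command";
                  0 is assumed here to mean "no next command".
    """
    addresses = [first_addr]
    for length in next_lengths:
        if length == 0:  # assumed end-of-task marker
            break
        # Assumption: the next header sits `length` bytes after the current one.
        addresses.append(addresses[-1] + length)
    return addresses


addrs = header_addresses(0x1000, [0x40, 0x20, 0])
print([hex(a) for a in addrs])  # ['0x1000', '0x1040', '0x1060']
```

Only the first address has to travel in the task start signal; every later address is derived locally by the parser, which is what lets the chain advance without CPU involvement.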
[0162] The command queue parser is also used to perform configuration operations on the control registers of the data processor according to the configuration parameters corresponding to each target task;
[0163] The command queue parser is also used to control the execution unit of the data processor to perform data processing operations that match the target task, based on the configuration parameters corresponding to each target task, after the control register of the data processor has been configured.
[0164] In this optional embodiment, the command queue parser controls the execution unit of the data processor to perform data processing operations matching the target task according to the configuration parameters corresponding to each target task, including the following specific methods:
[0165] The command queue parser sends a start signal for the current command of the current target task to the execution unit of the data processor. The execution unit receives the start signal of the current command, switches its current state to the working state according to that start signal, and processes the data corresponding to the current command according to the configuration parameters currently configured in the control register; when the data corresponding to the current command has been processed, the execution unit sends a processing completion signal for the current command to the command queue parser.
[0166] Upon receiving the completion signal of the current command, the command queue parser switches its current state to the working state. After switching to the working state, it determines whether the current command is the last command in the current target task based on the header length of the next command. If the result is negative, the next command becomes the current command, and the command queue parser repeats the above operation of sending the start signal of the current command to the execution unit of the data processor, until the data corresponding to the last command in the current target task has been processed.
[0167] It is evident that the multi-threaded task processing system described in Figure 14 updates the control register after obtaining the configuration parameters corresponding to the task, thereby controlling the execution unit to perform the corresponding data processing operations for the task. Without increasing the number of control registers, it reduces the task gaps caused by untimely processor interrupt responses, thereby reducing the performance loss caused by processor idling, improving the resource utilization of the arithmetic logic, and thus improving processor performance. Furthermore, because the hardware starts itself between two adjacent commands, no CPU scheduling is required, which avoids command gaps caused by untimely CPU interrupt responses and effectively improves the utilization of the execution unit.
[0168] In yet another optional embodiment, as shown in Figure 14, the command queue parser is also used to send a task end signal for the current target task to the task manager after the data corresponding to the last command in the current target task has been processed.
[0169] The task manager is also used to receive the task end signal of the current target task and, based on the arbitration result, to select the target task that is currently ranked first among all target tasks in all task queues that have not yet undergone data processing, and to make it the next target task;
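The start/completion handshake between the command queue parser and the execution unit can be modelled in software as a simple loop. This is a sketch only: the class `ExecutionUnit`, the function `run_task`, the signal strings, and the use of `next_len == 0` as the last-command marker are all assumptions for illustration.

```python
class ExecutionUnit:
    """Stand-in for the data processor's execution unit."""

    def __init__(self):
        self.log = []

    def start(self, command, control_register):
        # Enter the working state, process the command's data with the
        # currently configured parameters, then report completion.
        self.log.append((command["name"], dict(control_register)))
        return "done"  # processing completion signal


def run_task(commands, unit):
    """Issue the commands of one task back to back, with no CPU in the loop."""
    control_register = {}
    index = 0
    while True:
        command = commands[index]
        control_register.update(command["config"])      # configure first
        signal = unit.start(command, control_register)  # then send the start signal
        if signal != "done":
            raise RuntimeError("unexpected signal from execution unit")
        if command["next_len"] == 0:  # assumed marker for the last command
            return "task_end"
        index += 1  # the next command becomes the current command


# Hypothetical task with two commands.
commands = [
    {"name": "c0", "config": {"W": 640}, "next_len": 0x40},
    {"name": "c1", "config": {"H": 480}, "next_len": 0},
]
unit = ExecutionUnit()
result = run_task(commands, unit)
print(result, [name for name, _ in unit.log])
```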
[0170] The Task Manager is also used to update the next target task to the current target task and re-execute the above-mentioned operation of generating a task start signal based on the current target task, until the data processing of the last target task in the command queue corresponding to the Task Manager is completed.
[0171] It is evident that the multi-threaded task processing system described in Figure 14 automatically starts the task manager to quickly determine the next task from the existing task queues after all the data corresponding to the commands of the current task has been processed. This eliminates the need to wait for the CPU's interrupt response, reduces task gaps, and improves the execution performance of the execution unit. Furthermore, selecting the next task based on the arbitration result improves the accuracy and efficiency of that selection, which further helps to reduce the gaps between tasks.
[0172] In yet another optional embodiment, as shown in Figure 14, the data processing system also includes an interrupt manager;
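The self-starting behaviour of the task manager can be sketched as a loop driven by the task end signal rather than by a CPU interrupt. The names `task_manager_loop` and `fake_parser`, the signal string, and the dictionary form of the start signal are hypothetical.

```python
def task_manager_loop(ranked_tasks, parser):
    """ranked_tasks: pending tasks ordered by the arbitration result."""
    pending = list(ranked_tasks)
    completed = []
    while pending:
        current = pending.pop(0)           # first-ranked unprocessed task
        start_signal = {"task": current}   # the start signal carries the task
        end_signal = parser(start_signal)  # parser runs all commands of the task
        if end_signal != "task_end":
            raise RuntimeError("unexpected signal from parser")
        completed.append(current)          # next task starts without a CPU interrupt
    return completed


def fake_parser(start_signal):
    # Hypothetical stand-in for the command queue parser: it would process
    # every command of the task before returning the task end signal.
    return "task_end"


finished = task_manager_loop(["scale", "overlay", "rotate"], fake_parser)
print(finished)
```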
[0173] The interrupt manager is used to determine whether there is a thread among all threads that has completed all target tasks. When the result is yes, the interrupt flag of that thread is pulled high.
[0174] Each thread has a unique address of the start of the register command queue and a unique interrupt flag.
[0175] It is evident that the multi-threaded task processing system described in Figure 14 can raise the interrupt flag of a thread once that thread has completed its tasks, allowing other tasks to be dispatched in parallel through that thread. This improves thread utilization, reduces task gaps, and further enhances the execution performance of the execution unit and the system.
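The per-thread interrupt flag logic reduces to a small check, sketched below. Representing the flags and remaining-task counts as dictionaries keyed by thread id, and the function name `update_interrupt_flags`, are assumptions for illustration.

```python
def update_interrupt_flags(remaining_tasks, flags):
    """Pull a thread's interrupt flag high once none of its target tasks remain.

    remaining_tasks: thread id -> number of target tasks not yet completed.
    flags:           thread id -> interrupt flag bit (0 or 1), one per thread.
    """
    for thread_id, remaining in remaining_tasks.items():
        if remaining == 0:
            flags[thread_id] = 1  # pull the flag high
    return flags


# Thread 0 has finished all of its tasks; thread 1 still has two pending.
flags = update_interrupt_flags({0: 0, 1: 2}, {0: 0, 1: 0})
print(flags)
```

Because each thread has its own flag, software can refill the finished thread's command queue while the other threads keep feeding the execution unit.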
[0176] Example 3
[0177] Please see Figure 15, which is a schematic diagram of another system for processing tasks based on multithreading, as disclosed in an embodiment of the present invention. As shown in Figure 15, the device may include:
[0178] Memory 301 storing executable program code;
[0179] Processor 302 coupled to memory 301;
[0180] Furthermore, it may also include an input interface 303 and an output interface 304 coupled to the processor 302;
[0181] The processor 302 calls the executable program code stored in the memory 301 to execute the steps in the multi-threaded task processing method described in Embodiment 1.
[0182] Example 4
[0183] This invention discloses a computer-readable storage medium that stores a computer program for electronic data interchange, wherein the computer program causes a computer to execute the steps of the multi-threaded task processing method described in Embodiment 1.
[0184] Example 5
[0185] This invention discloses a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform the steps in the multi-threaded task processing method described in Embodiment 1.
[0186] The device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0187] Through the detailed description of the above embodiments, those skilled in the art can clearly understand that each implementation can be realized by means of software plus the necessary general-purpose hardware platform, and of course can also be realized in hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
[0188] Finally, it should be noted that the method and system for processing tasks based on multithreading disclosed in the embodiments of the present invention are merely preferred embodiments of the present invention and are only used to illustrate the technical solutions of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for processing tasks based on multithreading, characterized in that, The method is applied to a data processing system, which includes a thread control register, a task manager, and a command queue parser. The number of threads between the thread control register and the task manager is greater than one, all threads are independent of each other, and each thread has a unique corresponding command queue and a unique corresponding task queue. The method includes: The task manager retrieves the target task from the command queue corresponding to each thread from the thread control register, and stores each target task in the task queue matched by the thread corresponding to the target task. The task manager arbitrates all target tasks in all task queues to obtain an arbitration result, and determines the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space. Each target task contains at least one command, and the header information corresponding to each target task contains the header address of the first command in the order of all commands of the target task and the header length of the next command corresponding to each command. The configuration parameters for each target task are used as the basis for performing data processing on the target task. The task manager determines the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space, including: Based on the arbitration result, the task manager selects a target task from all target tasks that have not performed data processing operations in all the task queues and uses it as the current target task. The task manager generates a task start signal based on the current target task and sends the task start signal to the command queue parser. 
The task start signal carries the current target task. For the first command in the current target task, the command queue parser parses the task start signal to obtain the header address corresponding to the command and the header length of the next command corresponding to each command; for commands that are not first in the current target task, the header address corresponding to the command is determined based on the header address corresponding to the previous command and the header length of the next command corresponding to the previous command.
2. The method for processing tasks based on multithreading according to claim 1, characterized in that, The task manager determines the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space, and also includes: The task manager analyzes the header information corresponding to each target task and the configuration queue data stored in the pre-determined storage space to obtain the configuration parameters that match the header information, and determines the configuration parameters that match the header information as the configuration parameters corresponding to the target task that matches the header information.
3. The method for processing tasks based on multithreading according to claim 1 or 2, characterized in that, The data processing system also includes a data processor; The method further includes: The command queue parser performs configuration operations on the control register of the data processor according to the configuration parameters corresponding to each target task; After configuring the control register of the data processor, the command queue parser controls the execution unit of the data processor to perform data processing operations that match the target task according to the configuration parameters corresponding to each target task.
4. The method for processing tasks based on multithreading according to claim 3, characterized in that, The command queue parser, based on the configuration parameters corresponding to each target task, controls the execution unit of the data processor to perform data processing operations matching the target task, including: The command queue parser sends the start signal of the current command of the current target task to the execution unit of the data processor; The execution unit receives the start signal of the current command, switches the current state to the working state according to the start signal of the current command, and processes the data corresponding to the current command according to the current configuration parameters configured in the control register; after the data corresponding to the current command is processed, it sends a processing completion signal of the current command to the command queue parser. The command queue parser receives the completion signal of the current command and switches its current state to the working state based on the completion signal. After switching to the working state, it determines whether the current command is the last command in the current target task based on the header length of the next command. If the result is not positive, it updates the next command to the current command and restarts the operation of sending the start signal of the current command to the execution unit of the data processor until the data corresponding to the last command in the current target task is processed.
5. The method for processing tasks based on multithreading according to claim 4, characterized in that, The method further includes: After the data corresponding to the last command in the current target task has been processed, the command queue parser sends a task completion signal for the current target task to the task manager. The task manager receives the task end signal of the current target task, and according to the arbitration result, selects the target task that is currently ranked first from all target tasks that have not performed data processing operations in all task queues, and makes it the next target task; The task manager updates the next target task to the current target task and re-executes the operation of generating a task start signal based on the current target task until the data processing of the last target task in the command queue corresponding to the task manager is completed.
6. The method for processing tasks based on multithreading according to any one of claims 1, 2, 4 and 5, characterized in that, The data processing system also includes an interrupt manager; The method further includes: The interrupt manager determines whether there is a thread among all the threads that has completed all the target tasks. When the result is yes, the interrupt flag of that thread is pulled high. Each of the aforementioned threads has a unique corresponding register command queue start address and a unique corresponding interrupt flag bit.
7. A system for processing tasks based on multithreading, characterized in that, The system includes a thread control register and a task manager. The number of threads between the thread control register and the task manager is greater than one. All threads do not interfere with each other. Each thread has a unique command queue and a unique task queue, wherein: The task manager is used to obtain the target task in the command queue corresponding to each thread from the thread control register, and store each target task in the task queue matched by the thread corresponding to the target task; The task manager is also used to arbitrate all the target tasks in all the task queues, obtain an arbitration result, and determine the configuration parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space; each target task contains at least one command, and the header information corresponding to each target task contains the header address of the first command in the order of all the commands of the target task and the header length of the next command corresponding to each command; The configuration parameters for each target task are used as the basis for performing data processing on the target task. The task manager determines the specific method for configuring the parameters corresponding to each target task based on the arbitration result and the configuration queue data stored in the pre-determined storage space, including: Based on the arbitration result, the task manager selects a target task from all target tasks that have not performed data processing operations in all the task queues and uses it as the current target task. The task manager generates a task start signal based on the current target task and sends the task start signal to the command queue parser. The task start signal carries the current target task. 
For the first command in the current target task, the command queue parser parses the task start signal to obtain the header address corresponding to the command and the header length of the next command corresponding to each command; for commands that are not first in the current target task, the header address corresponding to the command is determined based on the header address corresponding to the previous command and the header length of the next command corresponding to the previous command.
8. A system for processing tasks based on multithreading, characterized in that, The system includes: Memory containing executable program code; A processor coupled to the memory; The processor calls the executable program code stored in the memory to execute the multi-threaded task processing method as described in any one of claims 1-6.