Control device, multi-operator control method, hardware scheduler, accelerator and system

CN122309188A · Pending · Publication Date: 2026-06-30 · CALTERAH SEMICON TECH (SHANGHAI) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: CALTERAH SEMICON TECH (SHANGHAI) CO LTD
Filing Date: 2024-12-31
Publication Date: 2026-06-30

Figure CN122309188A_ABST

Abstract

A control device, a multi-operator control method, a hardware scheduler, an accelerator, and a computing system are disclosed. In this embodiment, multiple first cache allocation instructions are decoded by an instruction decoding circuit and then executed by an instruction execution circuit. Caches are allocated to multiple operators from a shared cache pool, enabling data to be transferred between operators through the caches. This achieves soft connections and fast data interaction between operators. While ensuring processing efficiency, the connection relationship between operators can be changed according to application needs, improving the application flexibility of hardware accelerators using multiple operators.

Claims

1. A control device, characterized in that the control device is used to control a plurality of specified operators configured to process data to be processed according to a set connection order, the control device comprising: an instruction decoding circuit configured to decode software code, the software code including a plurality of first cache allocation instructions, each first cache allocation instruction including: a first operand indicating the location and size of the cache allocated to an operator; and a second operand indicating the storage space into which the first operand is to be written; and an instruction execution circuit, coupled to the instruction decoding circuit and configured to, in response to the plurality of decoded first cache allocation instructions, write the first operand of each first cache allocation instruction into the storage space indicated by the second operand of that instruction, so as to allocate caches for the plurality of operators to use when accessing data; wherein the caches allocated to the plurality of operators all come from a shared cache pool.

2. The control device according to claim 1, characterized in that the first operand in the first cache allocation instruction indicates the location and size of the cache initially allocated to the operator, the cache initially allocated to the operator being the cache used when the operator performs its first processing; the cache includes an input cache and/or an output cache; the plurality of operators are started sequentially according to the set connection order; and for adjacent operators among the plurality of operators, the output cache initially allocated to the preceding operator is the same as the input cache initially allocated to the following operator.
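
As an illustration of the chained initial allocation described in claim 2, the following Python sketch carves consecutive regions out of a pool so that each operator's output cache is the next operator's input cache. The names `Region`, `InitialAllocation`, and `chain_initial_allocations`, the row-based layout, and the choice to give every operator both an input and an output cache are assumptions made for the example; the claims only require an input cache and/or an output cache per operator.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical cache descriptor: a band of rows in the shared cache pool."""
    row: int      # first row of the region
    n_rows: int   # number of rows in the region


@dataclass(frozen=True)
class InitialAllocation:
    input_cache: Region
    output_cache: Region


def chain_initial_allocations(n_ops: int, rows_per_cache: int) -> List[InitialAllocation]:
    """Allocate n_ops + 1 equally sized regions and chain them across operators.

    Region i is the input of operator i and region i + 1 is its output, so the
    output cache of each operator is the input cache of the one that follows.
    """
    if n_ops < 2:
        raise ValueError("at least two operators are needed to form a chain")
    regions = [Region(row=i * rows_per_cache, n_rows=rows_per_cache)
               for i in range(n_ops + 1)]
    return [InitialAllocation(regions[i], regions[i + 1]) for i in range(n_ops)]
```

With three operators and four rows per cache, operator 1 writes rows 4 to 7 and operator 2 reads the same rows, which is the "soft connection" the abstract refers to.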

3. The control device according to claim 1 or 2, characterized in that the software code contains short instructions of 16 bits, 32 bits, or 64 bits.

4. The control device according to claim 2, characterized in that the software code decoded by the instruction decoding circuit further includes multiple groups of scheduling instructions; and the instruction execution circuit is further configured to schedule one or more operators in response to each decoded group of scheduling instructions, such that each scheduled operator starts and completes one processing of a data block within one time step; wherein each time step begins when execution of a group of scheduling instructions begins and ends when execution of that group ends.

5. The control device according to claim 4, characterized in that the software code decoded by the instruction decoding circuit further includes a loop start instruction and a loop end instruction, the loop start instruction including an operand representing the number of loop iterations; the first cache allocation instructions are located before the loop start instruction, and the multiple groups of scheduling instructions are located between the loop start instruction and the loop end instruction; and the instruction execution circuit, in response to the decoded software code, performs multiple loop processes to control the plurality of operators to complete the processing of the data to be processed in a pipelined manner according to the set connection order; wherein each loop process includes multiple time steps, and each operator completes one processing of one data block of the data to be processed in one time step.

6. The control device according to claim 5, characterized in that there are M groups of scheduling instructions between the loop start instruction and the loop end instruction, and each loop process includes M time steps, where M = m1 + m2 - 1, m1 is the number of the plurality of operators, m2 is the number of data blocks processed in each loop process, m1 ≥ 2, and m2 ≥ 2; for the m-th time step, m = 1, 2, …, M: when m < m1, the activated operators are the 1st to the m-th operators; when m1 ≤ m ≤ m2, the activated operators are all of the plurality of operators; and when m2 < m ≤ M, the activated operators are the (m - m2 + 1)-th to the m1-th operators; wherein the operators to be started, as indicated by the start instruction in the m-th group of scheduling instructions, are the activated operators at the m-th time step.
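
The pipeline fill, steady-state, and drain phases of claim 6 can be sketched in a few lines of Python. The function names `active_operators` and `schedule` are hypothetical, and the closed form `max(1, m - m2 + 1) .. min(m, m1)` is a restatement rather than wording from the claims; it reproduces the three listed cases when m1 ≤ m2.

```python
from typing import List


def active_operators(m: int, m1: int, m2: int) -> List[int]:
    """Return the 1-based indices of the operators activated at time step m.

    m1 is the number of operators, m2 the number of data blocks per loop
    process, and the loop process has M = m1 + m2 - 1 time steps.
    """
    if m1 < 2 or m2 < 2:
        raise ValueError("the claims require m1 >= 2 and m2 >= 2")
    total_steps = m1 + m2 - 1
    if not 1 <= m <= total_steps:
        raise ValueError(f"m must be in 1..{total_steps}")
    first = max(1, m - m2 + 1)  # operators before this one have finished all blocks
    last = min(m, m1)           # operators after this one have not received a block yet
    return list(range(first, last + 1))


def schedule(m1: int, m2: int) -> List[List[int]]:
    """Return the activated operators for every time step of one loop process."""
    return [active_operators(m, m1, m2) for m in range(1, m1 + m2)]
```

For m1 = 3 operators and m2 = 4 data blocks, `schedule(3, 4)` gives `[[1], [1, 2], [1, 2, 3], [1, 2, 3], [2, 3], [3]]`: two fill steps, two steps with every operator active, and two drain steps, with each operator started exactly four times.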

7. The control device according to claim 5, wherein the first cache allocation instruction includes a write register instruction and N' write data instructions associated with the write register instruction, the write register instruction being associated with a total of N write data instructions, where N ≥ N' ≥ 2; the write register instruction includes an opcode, the second operand representing a register address Add, and an operand representing the number N of immediate values; each write data instruction includes an opcode and an immediate value; the first operand includes the N' immediate values in the N' write data instructions, which include information on the row position, number of rows, column position, and number of columns of the cache allocated to the operator; and the writing, by the instruction execution circuit, of the first operand in each first cache allocation instruction into the storage space indicated by the second operand in that instruction includes: for each of the N' write data instructions, writing the immediate value therein into the register at address Add + n - 1 for the operator to read, where n is the sequence number of the instruction among the N write data instructions, indicating that it is the n-th instruction under the write register instruction, and 1 ≤ n ≤ N.
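
A minimal Python sketch of how a write register instruction followed by its write data instructions could populate consecutive registers, as claim 7 describes. The classes `WriteReg` and `WriteData`, the function `execute`, the dictionary used as a register file, and the register address `0x10` are all assumptions for illustration; the instruction class stands in for the opcode, and the actual encoding is not given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class WriteReg:
    add: int    # second operand: base register address Add
    count: int  # N: number of write data instructions that follow


@dataclass(frozen=True)
class WriteData:
    imm: int    # immediate value carried by the instruction


Instruction = Union[WriteReg, WriteData]


def execute(program: List[Instruction], registers: Dict[int, int]) -> Dict[int, int]:
    """Write the n-th immediate under each write register instruction to Add + n - 1."""
    i = 0
    while i < len(program):
        head = program[i]
        if not isinstance(head, WriteReg):
            raise ValueError("write data instruction without a write register instruction")
        data = program[i + 1:i + 1 + head.count]
        if len(data) != head.count or not all(isinstance(d, WriteData) for d in data):
            raise ValueError("expected exactly N write data instructions")
        for n, instr in enumerate(data, start=1):  # n is the 1-based sequence number
            registers[head.add + n - 1] = instr.imm
        i += 1 + head.count
    return registers
```

In this sketch, N = N' = 4 and the four immediates are taken to be row position, number of rows, column position, and number of columns, in that order; the ordering is an assumption.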

8. The control device according to claim 7, wherein the N' immediate values in the N' write data instructions further include the number K of caches used by the operator in the multiple loop processes, to indicate that the operator updates the row position or column position of its allocated cache step by step with a period of K time steps, so that the operator uses different caches in the K time steps of the same period and uses the same cache in the k-th time step of different periods, where k = 0, 1, …, K - 1 and K ≥ 2.

9. The control device according to claim 5, wherein the software code decoded by the instruction decoding circuit further includes a second cache allocation instruction located between two adjacent groups of scheduling instructions, the second cache allocation instruction including: a first operand indicating the position of the cache reallocated to an operator for use in the next time step; and a second operand indicating the storage space into which the first operand is to be written; and the instruction execution circuit is further configured to, in response to each decoded second cache allocation instruction, write the first operand therein into the storage space indicated by the second operand therein, so as to allocate, to the operator that uses the storage space, a cache for use in the next time step; wherein the caches allocated for the plurality of operators to use in the same time step are located in different and independently accessible regions of the shared cache pool, and for adjacent operators among the plurality of operators, the output cache allocated for the preceding operator to use in the current time step is the input cache allocated for the following operator to use in the next time step.
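
The two constraints at the end of claim 9 (distinct regions within a time step, and hand-off of the preceding operator's output to the following operator's input one step later) can be checked with a small model. In this sketch a cache is identified by a hypothetical pair `(link, slot)`, where link j sits between operator j and operator j + 1 and each link has K slots; the rule that the block index modulo K selects the slot is an assumption, chosen because it yields the K-step period described in claims 8 and 10.

```python
from typing import Dict, Tuple

Region = Tuple[int, int]  # (link index, slot index): hypothetical cache identifier


def caches_at_step(m: int, m1: int, m2: int, K: int) -> Dict[int, Tuple[Region, Region]]:
    """Map each operator active at time step m to its (input cache, output cache).

    Operators are numbered from 1. Operator j handles data block b = m - j
    (0-based) at step m, reads slot b % K of link j - 1, and writes slot
    b % K of link j.
    """
    if K < 2:
        raise ValueError("K >= 2 is needed so adjacent operators never share a slot")
    allocation = {}
    for j in range(max(1, m - m2 + 1), min(m, m1) + 1):
        slot = (m - j) % K
        allocation[j] = ((j - 1, slot), (j, slot))
    return allocation
```

With K = 2 this reduces to ping-pong buffering: while operator j fills one slot of link j, operator j + 1 drains the other, and the roles swap at the next time step.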

10. The control device according to claim 9, characterized in that, among the multiple groups of scheduling instructions, the second cache allocation instruction between two adjacent groups of scheduling instructions is used to reallocate the cache position for each operator that performed processing in the previous time step and still needs to perform processing in the next time step; and for each of the plurality of operators, starting from the time step in which the operator is first started and with a period of K time steps, the caches allocated to the operator by the instruction execution circuit in the K time steps of the same period are different, while the cache allocated to the operator in the k-th time step of different periods is the same, where k = 0, 1, …, K - 1 and K ≥ 2.

11. The control device according to claim 9, characterized in that the second cache allocation instruction includes a write register instruction and M write data instructions associated with the write register instruction, where M ≥ 1; the write register instruction includes an opcode, the second operand representing a register address Add', and an operand representing the number M of immediate values; each write data instruction includes an opcode and an immediate value; the first operand includes the M immediate values in the M write data instructions, which include information on the row position or column position of the cache reallocated to the operator; and the writing, by the instruction execution circuit in response to each second cache allocation instruction, of the first operand therein into the storage space indicated by the second operand includes: for each of the M write data instructions, writing the immediate value therein into the register at address Add' + m - 1 for the operator to read, where m is the sequence number of the instruction among the M write data instructions, indicating that it is the m-th instruction under the write register instruction, and 1 ≤ m ≤ M.

12. The control device according to claim 5, characterized in that the data to be processed is a frame of data obtained by a radar system from processing a received radar signal; and the plurality of operators are operators in the 1D-FFT processing stage, and one data block in the data to be processed is the data obtained by processing one chirp signal received on one channel; or the plurality of operators are operators in the 2D-FFT processing stage, and one data block in the data to be processed is the data at the same range gate in the 1D-FFT data.
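
To make the two block definitions in claim 12 concrete, the sketch below treats one frame from a single channel as a list of chirps, each a list of complex samples. In the 1D-FFT stage one data block is one chirp; in the 2D-FFT stage one data block is the column of 1D-FFT outputs that share a range gate. The function names are hypothetical, and a naive DFT stands in for the hardware FFT operator; the other operators named in the claims (for example CQMD, DC, SVA) are not modeled.

```python
import cmath
from typing import List

Vector = List[complex]


def dft(x: Vector) -> Vector:
    """Naive O(n^2) discrete Fourier transform, used here in place of an FFT operator."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def range_fft(frame: List[Vector]) -> List[Vector]:
    """1D-FFT stage: each data block is one chirp, indexed as frame[chirp][sample]."""
    return [dft(chirp) for chirp in frame]


def doppler_fft(range_data: List[Vector]) -> List[Vector]:
    """2D-FFT stage: each data block is one range gate taken across all chirps.

    Returns a map indexed as result[range_gate][doppler_bin].
    """
    n_gates = len(range_data[0])
    return [dft([chirp[g] for chirp in range_data]) for g in range(n_gates)]
```

A frame of 4 chirps with 8 samples each therefore yields 4 data blocks of length 8 in the first stage and 8 data blocks of length 4 in the second.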

13. A multi-operator control method applied to a hardware accelerator, the hardware accelerator including a hardware scheduler, a plurality of specified operators, and a shared cache pool, the control method comprising: allocating, from the shared cache pool, caches for the plurality of operators to use when accessing data, so as to form data channels between the plurality of operators, the allocated caches including input caches and/or output caches; and starting, through step-by-step scheduling, the plurality of operators in sequence according to a set connection order to complete the processing of data to be processed; wherein, for adjacent operators among the plurality of operators, the output cache used by the preceding operator in the current time step is the input cache used by the following operator in the next time step.

14. The control method according to claim 13, wherein the allocating, from the shared cache pool, of caches for the plurality of operators to use when accessing data includes: allocating, from the shared cache pool, caches for the plurality of operators to use in their first processing; wherein, for adjacent operators among the plurality of operators, the output cache allocated to the preceding operator for its first processing is the input cache allocated to the following operator for its first processing.

15. The control method according to claim 13 or 14, wherein the starting, through step-by-step scheduling, of the plurality of operators in sequence according to the set connection order to complete the processing of the data to be processed includes: performing multiple loop processes to control the plurality of operators to complete the processing of multiple data blocks in the data to be processed in a pipelined manner according to the set connection order, each loop process including multiple time steps, and each operator completing one processing of one data block of the data to be processed in one time step; wherein the caches allocated for the plurality of operators to use in the same time step are located in different and independently accessible regions of the shared cache pool, and for adjacent operators among the plurality of operators, the output cache allocated for the preceding operator to use in the current time step is the input cache allocated for the following operator to use in the next time step.

16. The control method according to claim 15, wherein each loop process includes M time steps, M = m1 + m2 - 1, where m1 is the number of the plurality of operators, m2 is the number of data blocks processed in each loop process, m1 ≥ 2, and m2 ≥ 2; and in each loop process, for the m-th time step, m = 1, 2, …, M: when m < m1, the activated operators are the 1st to the m-th operators; when m1 ≤ m ≤ m2, the activated operators are all of the plurality of operators; and when m2 < m ≤ M, the activated operators are the (m - m2 + 1)-th to the m1-th operators.

17. The control method according to claim 15, wherein the method further includes: specifying, for each of the plurality of operators, the number K of caches used by that operator in the multiple loop processes, to indicate that the operator updates the position of its allocated cache step by step with a period of K time steps, so that the operator uses different caches in the K time steps of the same period and uses the same cache in the k-th time step of different periods, where k = 0, 1, …, K - 1 and K ≥ 2.

18. The control method according to claim 15, characterized in that the allocating, from the shared cache pool, of caches for the plurality of operators to use when accessing data includes: between two adjacent time steps, reallocating caches for the operators that process data in both the previous time step and the next time step; wherein, for each of the plurality of operators, K groups of caches are allocated and reallocated to the operator during the multiple loop processes, where K ≥ 2, and each group includes an input cache and/or an output cache; and starting from the first processing of the operator and with a period of K time steps, the cache used by the operator in the k-th time step of different periods is the same, and the caches used in different time steps of the same period are different, where k = 0, 1, …, K - 1.

19. The control method according to claim 15, characterized in that the data to be processed is a frame of data obtained by a radar system from processing a received radar signal; and the plurality of operators are operators in the 1D-FFT processing stage, and one data block in the data to be processed is the data obtained by processing one chirp signal received on one channel; or the plurality of operators are operators in the 2D-FFT processing stage, and one data block in the data to be processed is the data at the same range gate in the 1D-FFT data.

20. A hardware scheduler, comprising a control device and an internal memory, characterized in that the control device is the control device according to any one of claims 1 to 12, and the internal memory is configured to store the software code to be decoded and executed by the control device.

21. A hardware accelerator, characterized by comprising a hardware scheduler, a plurality of specified operators, a shared cache pool providing cache space for the operators, and a register group, wherein: the hardware scheduler is the hardware scheduler according to claim 20, information on the cache allocated to each operator being saved in the register group and the processed data being stored in external memory; the register group is configured to store configuration information for the plurality of operators, the configuration information including information on the caches allocated to the operators; and each of the plurality of operators is configured to, upon startup, obtain information on the cache allocated to it from the register group; if an input cache is allocated, read the data block to be processed from the input cache and process it; and if an output cache is allocated, write the processed data into the output cache.

22. The hardware accelerator according to claim 21, characterized in that the operator includes: a read data unit configured to acquire information on the input cache allocated to the operator and, based on that information, sequentially read out the data blocks in the input cache and input them into a processing unit; the processing unit, configured to process the data block to obtain a processed data block; a write data unit configured to acquire information on the output cache allocated to the operator and, based on that information, write the processed data block into the output cache; a scheduling interface configured to receive a start signal and use it as the trigger signal for the operator to begin processing; and a synchronization interface configured to send a signal indicating that processing is complete after the write data unit has written the processed data block into the output cache.

23. The hardware accelerator according to claim 22, characterized in that the read data unit obtains the information on the input cache allocated to the operator from the register group, and the write data unit obtains the information on the output cache allocated to the operator from the register group; and the operator further includes a cache update unit configured to: read from the register group the location and size of the cache initially allocated to the operator and the number K of caches allocated to the operator, where K ≥ 2; cyclically count the number of times the scheduling interface receives the start signal, starting a new cycle once the accumulated count in the current cycle reaches K; and, when the accumulated count in a cycle is k, calculate the location of the cache to be used by the operator in the next time step based on the location and size of the initially allocated cache and the value of k, and update the cache location in the register group to the calculated location; where k = 0, 1, …, K - 1, and the calculated cache location differs for different values of k.
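
The cache update unit of claim 23 behaves like a modulo-K counter that rewrites the operator's cache location after every start signal. The Python sketch below models it under one assumption that the claims do not state: the K caches are stacked contiguously, so the location for count k is the initial row plus k times the number of rows. The claim only requires that different k give different locations. The class name, the register address argument, and the dictionary register file are likewise hypothetical.

```python
from typing import Dict


class CacheUpdateUnit:
    """Counts start signals cyclically and updates the cache row in the register group."""

    def __init__(self, registers: Dict[int, int], row_addr: int,
                 init_row: int, n_rows: int, K: int) -> None:
        if K < 2:
            raise ValueError("claim 23 requires K >= 2")
        self.registers = registers
        self.row_addr = row_addr   # hypothetical register holding the current row position
        self.init_row = init_row   # location of the initially allocated cache
        self.n_rows = n_rows       # size of the initially allocated cache
        self.K = K
        self.count = 0             # accumulated start signals in the current cycle
        registers[row_addr] = init_row

    def on_start(self) -> int:
        """Handle one start signal; return the row used for this processing."""
        used = self.registers[self.row_addr]
        self.count = (self.count + 1) % self.K  # a new cycle begins after K starts
        # Assumed layout: cache k begins k * n_rows rows after the initial cache.
        self.registers[self.row_addr] = self.init_row + self.count * self.n_rows
        return used
```

With this scheme the scheduler does not need to issue a second cache allocation instruction for the operator between time steps, since the operator advances its own cache position.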

24. The hardware accelerator according to claim 21, characterized in that the plurality of operators include, in sequence: a CQMD operator, an FFT operator, and an SVA operator; or a DC operator, an FFT operator, and an SVA operator; or a CMB operator, an STAS operator, a HIST operator, a CFAR operator, and an STAS operator; or a DC operator, an FFT operator, an SVA operator, and a CMB operator.

25. A computing system comprising a processor, a memory, and a hardware accelerator, characterized in that the processor is configured to load the data to be processed stored in the memory into the hardware accelerator; the memory is configured to store the data to be processed and to store the data obtained by the hardware accelerator from processing the data to be processed; and the hardware accelerator is the hardware accelerator according to any one of claims 21 to 24.

26. The computing system according to claim 25, characterized in that the computing system is a system-on-a-chip, the system-on-a-chip being a millimeter-wave chip or a sensor chip in a radar system.