Register access method, processor, and electronic device
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: ARM TECH CHINA CO LTD
- Filing Date: 2022-07-28
- Publication Date: 2026-06-30
Abstract figure: CN115113932B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of chip technology, and in particular to a register access method, a processor, and an electronic device.
Background Technology
[0002] Registers are storage units in a processor used to store data and instructions. When executing instructions, the processor can store the data corresponding to the instructions in registers. In some scenarios, it is often necessary to perform the same access operation on multiple registers, such as setting all the data in multiple registers to zero (hereinafter referred to as clearing) or setting all the data in multiple registers to 1 (hereinafter referred to as setting).
[0003] To clear or set the data in multiple registers in a batch, the Load Multiple (LDM) instruction is typically used. However, when the LDM instruction is used for this purpose, only one register can be cleared or set in each clock cycle or pipeline stage. The number of clock cycles or pipeline stages required therefore grows with the number of registers being cleared or set, which makes the operation inefficient.
Summary of the Invention
[0004] In view of this, embodiments of this application provide a register access method, a processor, and an electronic device. By providing each register with a port for clearing its data or setting its data to 1, the data in multiple registers can be cleared or set to 1 in the same clock cycle or pipeline stage. This improves the efficiency of the processor when clearing or setting multiple registers and increases the speed at which the processor executes instructions.
[0005] In a first aspect, embodiments of this application provide a register access method applied to an electronic device. The method includes: a processor of the electronic device receives an instruction to perform a first operation on a plurality of registers in the processor, where the first operation is an operation that sets the data stored in the plurality of registers to 0 or an operation that sets the data stored in the plurality of registers to 1; in response to the instruction, the processor sends, in parallel, an operation signal corresponding to the first operation to the preset port of each register that is used for responding to the first operation; and each register detects the operation signal at its preset port and performs the first operation on its stored data in the same clock cycle.
[0006] In this embodiment, the processor can send operation signals to preset ports of registers in parallel within the same clock cycle, and the registers receiving the operation signals can perform a first operation on their respective stored data within the same clock cycle. Thus, the number of clock cycles or pipeline stages required for the processor to clear or set data in multiple registers does not increase with the number of registers being cleared or set, which improves the efficiency of the processor when clearing or setting multiple registers and increases the speed of processor instruction execution.
[0007] For example, the preset port can be the set-to-1 port or the clear port described below.
[0008] For example, the operation signal can be a set signal or a clear signal as described below.
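The parallel clear/set behavior of [0005] and [0006] can be illustrated with a small software model. The following Python sketch is a hypothetical simulation, not the circuit described in this application: the 32-bit register width, the `Register` class, the active-high ports, and the rule that a clear takes priority when both ports are asserted are all assumptions made for illustration. It shows that driving the preset ports of any number of registers costs a single clock edge.

```python
from dataclasses import dataclass
from typing import Iterable, List

WIDTH = 32                      # assumed register width
ALL_ONES = (1 << WIDTH) - 1     # value of a register whose bits are all 1


@dataclass
class Register:
    """Hypothetical register with an active-high clear port and set-to-1 port."""
    value: int = 0
    clear_port: bool = False
    set_port: bool = False

    def clock_edge(self) -> None:
        # Sample the preset ports on the clock edge (clear wins: an assumption).
        if self.clear_port:
            self.value = 0
        elif self.set_port:
            self.value = ALL_ONES
        # In this model the ports are driven for one cycle only.
        self.clear_port = False
        self.set_port = False


def apply_first_operation(regs: List[Register], targets: Iterable[int], op: str) -> int:
    """Drive the preset port of every target register, then advance one clock.

    Returns the number of clock cycles consumed, which is 1 however many
    registers are targeted.
    """
    for i in targets:           # in hardware these are parallel wires
        if op == "clear":
            regs[i].clear_port = True
        elif op == "set":
            regs[i].set_port = True
        else:
            raise ValueError(f"unknown operation: {op}")
    for reg in regs:            # one clock edge shared by every register
        reg.clock_edge()
    return 1
```

For example, calling `apply_first_operation(regs, range(5), "clear")` on eight registers zeroes R0 to R4, leaves R5 to R7 unchanged, and returns 1; targeting more registers does not change the returned cycle count.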
[0009] In one possible implementation of the first aspect above, the step in which each register detects the operation signal at its preset port and performs the first operation on its stored data in the same clock cycle includes: if the first operation sets the data stored in the registers to 0, each register sets all of its stored data to 0 in the same clock cycle; if the first operation sets the data stored in the registers to 1, each register sets all of its stored data to 1 in the same clock cycle.
[0010] In one possible implementation of the first aspect above, when the first operation is to set the data stored in multiple registers to 0, the preset port is a first preset port; when the first operation is to set the data stored in multiple registers to 1, the preset port is a second preset port.
[0011] In the embodiments of this application, each register is provided with a first preset port (e.g., the clear port described below) for responding to the clear operation and a second preset port (e.g., the set-to-1 port described below) for responding to the set operation.
[0012] In one possible implementation of the first aspect above, the step in which each register detects an operation signal at a preset port and performs the first operation on its stored data in the same clock cycle includes: if an operation signal at the first preset port is detected, each register sets all of its stored data to 0 in the same clock cycle; if an operation signal at the second preset port is detected, each register sets all of its stored data to 1 in the same clock cycle.
[0013] In one possible implementation of the first aspect described above, the processor pipeline includes a first pipeline stage and a second pipeline stage arranged in sequence, and the step in which the processor sends, in parallel, the operation signal corresponding to the first operation to the preset port of each register includes: in the first pipeline stage of a preset instruction, the processor sets the operation flag of each of the registers in a preset operation table to valid and stores the preset operation table in the pipeline register between the first pipeline stage and the second pipeline stage; in the second pipeline stage of the preset instruction, the processor reads the preset operation table from the pipeline register and sends the operation signal corresponding to the first operation to the preset port (the port used for responding to the first operation) of each register whose operation flag in the table is valid.
[0014] In this embodiment, the processor responds to the instruction corresponding to the first operation using preset instructions, such as the CLRM and SRM instructions hereinafter. The first pipeline stage (e.g., the EX pipeline stage hereinafter) and the second pipeline stage (e.g., the WB pipeline stage hereinafter) of this instruction are used to complete the sending of the operation signal. Thus, the clock cycles occupied by the EX pipeline stage of the preset instruction do not increase with the number of registers executing the first operation, which helps improve the efficiency of the processor when performing clear or set operations on multiple registers, thereby increasing the speed of the processor's instruction execution.
[0015] For example, the operation flag can be the clear flag or the set-to-1 flag described below.
[0016] For example, the preset operation table can be the clear operation table or the set-to-1 operation table described below.
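The two-stage flow of [0013] and [0014] can be sketched as two functions that communicate only through a pipeline-register value. This is an illustrative Python model under stated assumptions: the `PipelineRegister` type, the `"clear"`/`"set"` operation encoding, and the function names `first_stage` and `second_stage` are hypothetical; only the port names CLP-i and SEP-i follow the description below. The first stage marks the operation flags in the preset operation table; the second stage reads the table back and lists every preset port to be driven in the same cycle.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PipelineRegister:
    """Hypothetical latch between the first (EX) and second (WB) stage."""
    op: str                   # "clear" or "set" (illustrative encoding)
    table: Tuple[int, ...]    # preset operation table: one flag per register


def first_stage(op: str, targets: Sequence[int], num_regs: int) -> PipelineRegister:
    """Mark the operation flag of every target register as valid (1)."""
    flags = [0] * num_regs
    for i in targets:
        flags[i] = 1
    return PipelineRegister(op, tuple(flags))


def second_stage(pr: PipelineRegister) -> List[str]:
    """Read the table back and list the preset ports to drive in one cycle."""
    prefix = "CLP" if pr.op == "clear" else "SEP"
    return [f"{prefix}-{i}" for i, flag in enumerate(pr.table) if flag]
```

In hardware the flags would be written as a single table update rather than in a loop; the loop here only builds the value that the pipeline register would hold.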
[0017] In one possible implementation of the first aspect described above, each register performs a first operation on its stored data in the second pipeline stage of a preset instruction.
[0018] In a second aspect, embodiments of this application provide a processor, which includes a computing unit and a plurality of registers, where each register includes a preset port for responding to a first operation. The computing unit is configured to, upon receiving an instruction to perform the first operation on a plurality of first registers among the plurality of registers, send operation signals in parallel to the preset ports of the first registers, where the first operation is an operation that sets the data stored in a register to 0 or an operation that sets the data stored in a register to 1. Each first register is configured to perform the first operation on its stored data within the same clock cycle upon detecting the operation signal at its preset port.
[0019] In this embodiment, the processor can send operation signals to preset ports of registers in parallel within the same clock cycle, and the registers receiving the operation signals can perform a first operation on their respective stored data within the same clock cycle. Thus, the number of clock cycles or pipeline stages required for the processor to clear or set data in multiple registers does not increase with the number of registers being cleared or set, which improves the efficiency of the processor when clearing or setting multiple registers and increases the speed of processor instruction execution.
[0020] For example, the preset port can be the set-to-1 port or the clear port described below.
[0021] For example, the operation signal can be a set signal or a clear signal as described below.
[0022] In one possible implementation of the second aspect above, each first register performs a first operation on its stored data within the same clock cycle when it detects the operation signal of its respective preset port: if the first operation is to set the data stored in the register to 0, each first register sets all of its stored data to 0 within the same clock cycle; if the first operation is to set the data stored in the register to 1, each first register sets all of its stored data to 1 within the same clock cycle.
[0023] In one possible implementation of the second aspect above, the preset port includes a first preset port and a second preset port, where the first preset port responds to the operation signal of a first operation that sets the data stored in the register to 0, and the second preset port responds to the operation signal of a first operation that sets the data stored in the register to 1.
[0024] In the embodiments of this application, each register is provided with a first preset port (e.g., the clear port described below) for responding to the clear operation and a second preset port (e.g., the set-to-1 port described below) for responding to the set operation.
[0025] In one possible implementation of the second aspect above, each first register performs a first operation on its stored data within the same clock cycle when an operation signal of its respective preset port is detected in the following manner: if an operation signal of the first preset port is detected, each first register sets all of its stored data to 0 within the same clock cycle; if an operation signal of the second preset port is detected, each first register sets all of its stored data to 1 within the same clock cycle.
[0026] In one possible implementation of the second aspect above, the processor pipeline includes a first pipeline stage and a second pipeline stage, and the computing unit, upon receiving an instruction to perform the first operation on a plurality of first registers among the plurality of registers, sends operation signals in parallel to the preset ports of the first registers as follows: in the first pipeline stage of a preset instruction, the computing unit sets the operation flag of each first register in a preset operation table to valid and stores the preset operation table in the pipeline register between the first pipeline stage and the second pipeline stage; in the second pipeline stage of the preset instruction, the computing unit reads the preset operation table from the pipeline register and sends the operation signal corresponding to the first operation to the preset port of each register whose operation flag in the table is valid.
[0027] In this embodiment, the processor responds to the instruction corresponding to the first operation using preset instructions, such as the CLRM and SRM instructions hereinafter. The first pipeline stage (e.g., the EX pipeline stage hereinafter) and the second pipeline stage (e.g., the WB pipeline stage hereinafter) of this instruction are used to complete the sending of the operation signal. Thus, the clock cycles occupied by the EX pipeline stage of the preset instruction do not increase with the number of registers executing the first operation, which helps improve the efficiency of the processor when performing clear or set operations on multiple registers, thereby increasing the speed of the processor's instruction execution.
[0028] For example, the operation flag can be the clear flag or the set-to-1 flag described below.
[0029] For example, the preset operation table can be the clear operation table or the set-to-1 operation table described below.
[0030] In one possible implementation of the second aspect described above, each of the first registers performs a first operation on the data stored therein during the second pipeline stage of a preset instruction.
[0031] In a third aspect, embodiments of this application provide an electronic device that includes the processor provided in the second aspect or any possible implementation thereof.
[0032] In a fourth aspect, embodiments of this application provide a computer-readable medium storing instructions that, when executed by an electronic device, cause the electronic device to perform the register access method provided in the first aspect or any possible implementation thereof.
[0033] In a fifth aspect, embodiments of this application provide a computer program product that, when run on an electronic device, causes the electronic device to perform the register access method provided in the first aspect or any possible implementation thereof.
Attached Figure Description
[0034] Figure 1 is a schematic diagram of processor 1 executing instructions in a pipeline, according to some embodiments of this application;
[0035] Figure 2 is a schematic diagram of a clear operation table, according to some embodiments of the present invention;
[0036] Figure 3 is another schematic diagram of processor 1 executing instructions in a pipeline, according to some embodiments of this application;
[0037] Figure 4 is a schematic diagram of the EX pipeline stage and the WB pipeline stage when the CLRM instruction is used to clear registers R0 to R4, according to some embodiments of this application;
[0038] Figure 5 is a schematic diagram of the EX pipeline stage and the WB pipeline stage when the SRM instruction is used to set registers R0 to R4 to 1, according to some embodiments of this application;
[0039] Figure 6 is a schematic structural diagram of processor 1, according to some embodiments of this application;
[0040] Figure 7 is a flowchart of a register access method, according to some embodiments of this application;
[0041] Figure 8 is a schematic structural diagram of an electronic device 100, according to some embodiments of this application.
Detailed Implementation
[0042] The illustrative embodiments of the present invention include, but are not limited to, register access methods, processors, and electronic devices.
[0043] To facilitate understanding of the technical solutions of this application, the terms involved in the embodiments of this application will be explained first.
[0044] (1) Pipeline
[0045] Pipelining is a mechanism by which a processor executes instructions. In this mechanism, each instruction is divided into multiple pipeline stages, and each pipeline stage is executed by a dedicated hardware or software module, so the execution of an instruction is completed by a sequence of ordered pipeline stages. In addition, at most one instruction can occupy a given pipeline stage at any time.
[0046] Therefore, a processor with n pipeline stages can have at most n instructions in flight at any time, each in a different pipeline stage. When the processor executes multiple instructions in sequence, as soon as one instruction completes its current pipeline stage and moves to the next one, the following instruction can enter the stage it has vacated.
[0047] The technical solution of the present invention will now be described in conjunction with the accompanying drawings.
[0048] It is understandable that different processors may divide instructions into different pipeline stages. For example, in some embodiments, the processor may divide instructions into three stages: fetch (FE), which retrieves the instruction to be executed from memory; decode (DE), which determines the registers used by the instruction; and execute (EX), which executes the instruction and writes the execution result to the register. Alternatively, the processor may further subdivide the EX pipeline stage and divide instructions into five pipeline stages: fetch (FE); decode (DE); execute (EX), which calculates the result of the instruction; memory access (ME), which accesses memory when the instruction needs to access memory; and write back (WB), which writes the execution result to the register. In other embodiments, the number of pipeline stages may be different, such as 6, 7, 9, or 11. The register access method of this application is applicable to processors with any number of pipeline stages. The following description uses a five-stage pipeline as an example.
[0049] It is understandable that for processors that divide instructions into the above 5 pipeline stages (FE, DE, EX, ME, WB), not all instructions need to execute the above 5 pipeline stages. For example, for instructions that do not need to access memory, the ME pipeline stage does not need to be executed.
[0050] Understandably, a pipeline stage typically occupies one clock cycle of the processor, but for some instructions, a pipeline stage can occupy multiple clock cycles.
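The pipeline behavior described in [0045], [0046], and [0048] to [0050] can be explored with a simplified in-order timing model. The Python sketch below is an illustration under explicit assumptions, not the timing of any particular processor: every instruction is assumed to pass through all five stages (FE, DE, EX, ME, WB), a stage may last several cycles, and an instruction leaves a stage only when it has finished its work there and the next stage is free, so at most one instruction occupies any stage in any cycle. The functions `schedule` and `completion_cycle` are hypothetical names.

```python
from typing import List, Sequence

STAGES = ("FE", "DE", "EX", "ME", "WB")


def schedule(durations: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return start[i][s], the 1-based clock cycle in which instruction i enters stage s.

    durations[i][s] is the number of cycles instruction i spends working in
    stage s. An instruction leaves a stage only when its work there is done AND
    the previous instruction has vacated the next stage.
    """
    n_stages = len(STAGES)
    start: List[List[int]] = []
    for i, dur in enumerate(durations):
        row: List[int] = []
        for s in range(n_stages):
            # Earliest cycle this instruction could enter stage s on its own.
            ready = 1 if s == 0 else row[s - 1] + dur[s - 1]
            if i > 0:
                prev, prev_dur = start[i - 1], durations[i - 1]
                # Cycle in which the previous instruction vacates stage s.
                if s + 1 < n_stages:
                    vacated = prev[s + 1]
                else:
                    vacated = prev[s] + prev_dur[s]
                ready = max(ready, vacated)
            row.append(ready)
        start.append(row)
    return start


def completion_cycle(durations: Sequence[Sequence[int]]) -> int:
    """Cycle in which the last instruction finishes its WB stage."""
    start = schedule(durations)
    return start[-1][-1] + durations[-1][-1] - 1
```

With every stage lasting one cycle (`[[1] * 5 for _ in range(5)]`), the five instructions finish in cycle 9, and in cycle 5 each of them occupies a different stage. Giving the third instruction a 5-cycle EX stage makes the fourth instruction enter EX only in cycle 10, after the third instruction's EX stage has occupied cycles 5 to 9, which mirrors the stall later described for Figure 1. Exact totals depend on the assumptions of this model.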
[0051] Figure 1 is a schematic diagram of processor 1 executing instructions in a pipeline, according to some embodiments of this application.
[0052] As shown in Figure 1, processor 1 divides instructions into 5 pipeline stages (FE, DE, EX, ME, WB). The instructions to be executed by processor 1 are instructions 1 to 5: instruction 1 is an addition (ADD) instruction, instruction 2 is a subtraction (SUB) instruction, instruction 3 is an LDM instruction (used to clear or set the data in registers R0 to R4), instruction 4 is a compare (CMP) instruction, and instruction 5 is an exclusive-OR (XOR) instruction. As can be seen from Figure 1, in clock cycle T5 processor 1 has all five instructions in flight, each in a different pipeline stage, and in every clock cycle at most one instruction occupies any given pipeline stage.
[0053] Referring to Figure 1, when processor 1 uses the LDM instruction to clear or set the data in registers R0 to R4, the EX pipeline stage takes 5 clock cycles, from T5 to T9, with the data in one register cleared or set in each clock cycle. During this time, instructions 4 and 5 can enter the EX pipeline stage only after the EX pipeline stage of the LDM instruction has completed. Clearing or setting data in multiple registers in this way therefore consumes a relatively large number of clock cycles, and the efficiency of processor 1 is low.
[0054] It is understandable that clearing or setting the data in one register per clock cycle is only an example. The number of clock cycles processor 1 needs to clear or set the data in each register is determined by the performance of processor 1.
[0055] To improve the efficiency of processor 1 in clearing or setting data in multiple registers and to reduce the number of clock cycles these operations occupy, this application provides a register access method. A clear port and/or a set port is provided in each register of processor 1. When the data in multiple registers needs to be cleared or set, processor 1 can, within a preset number of clock cycles, simultaneously send a clear signal to the clear port of each register to be cleared, or a set signal to the set port of each register to be set. The registers receiving the clear or set signals clear or set their stored data within the same clock cycle. As a result, the EX pipeline stage of an instruction that clears or sets multiple registers occupies only the preset number of clock cycles, and this number does not grow as the number of registers to be cleared or set increases. This improves the efficiency of processor 1 in clearing or setting multiple registers and increases the speed at which processor 1 executes instructions.
[0056] It is understood that the clear port described above can be provided in each register of processor 1. When the register or its register controller detects that the signal received at the clear port is 1 or a high level, the data in the register is set to 0. In some other embodiments, the register can instead set its data to 0 when it detects that the signal received at the clear port is 0 or a low level; this is not limited here.
[0057] Similarly, the set-to-1 port described above can be provided in each register of processor 1. When the register or its controller detects that the signal received at the set-to-1 port is 1 or a high level, it sets the data in the register to 1. In some other embodiments, the register can instead set its data to 1 when it detects that the signal received at the set-to-1 port is 0 or a low level; this is not limited here.
[0058] It is understood that in some other embodiments, the clear port and the set port may be the same port. In that case, the register or its register controller either sets the data in the register to 0 when the signal received at the port is 1 and to 1 when the signal is 0, or sets the data to 1 when the signal is 1 and to 0 when the signal is 0.
[0059] It is understood that the clear port and set port described above can be physical interfaces, access interfaces implemented in software, or ports implemented through a combination of physical and software interfaces; no limitation is made here.
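The port semantics of [0056] to [0058] amount to a small decoding rule. The following Python sketch is hypothetical: the function names and the `"clear"`/`"set"` return values are illustrative, giving a clear priority over a set when both separate ports are asserted is an assumption, and modeling an undriven combined port as `None` is also an assumption, because the description does not say how a register with a single shared port distinguishes "no operation" from a clear or set.

```python
from typing import Optional


def separate_ports_action(clear_sig: int, set_sig: int, active_high: bool = True) -> Optional[str]:
    """Decode a register with separate clear and set-to-1 ports.

    With active_high=True a port is asserted by 1 (high level); with
    active_high=False it is asserted by 0 (low level).
    """
    asserted = 1 if active_high else 0
    if clear_sig == asserted:   # clear checked first: an assumed priority
        return "clear"
    if set_sig == asserted:
        return "set"
    return None                 # neither port asserted: data unchanged


def combined_port_action(sig: Optional[int], one_means_clear: bool = True) -> Optional[str]:
    """Decode a single port shared by the clear and set-to-1 functions.

    sig=None models a port that is not driven in this cycle (an assumption).
    """
    if sig is None:
        return None
    if one_means_clear:
        return "clear" if sig == 1 else "set"
    return "set" if sig == 1 else "clear"
```

The two `combined_port_action` modes correspond to the two alternatives listed in [0058]; the `active_high` flag covers the high-level and low-level variants of [0056] and [0057].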
[0060] It is understood that in some embodiments, the preset number is determined by the performance of the processor. Generally, the preset number is 1 or 2, but for some processors with lower performance it can be larger; this is not limited here. Because the number of clock cycles processor 1 in Figure 1 needs to clear or set the data in a single register is comparable to the preset number in this embodiment, even if the preset number is greater than 1, clearing or setting multiple registers in this way still saves several clock cycles compared with using the LDM instruction, and improves the efficiency with which processor 1 executes instructions.
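The saving described in [0060] can be stated as a short calculation. Under the simplifying assumptions that the LDM instruction spends c clock cycles per register in its EX stage and that the port-based approach spends a fixed preset number p of cycles, clearing or setting n registers costs:

```latex
\begin{aligned}
C_{\mathrm{LDM}}(n)  &= n\,c, \\
C_{\mathrm{port}}(n) &= p, \\
\Delta(n) = C_{\mathrm{LDM}}(n) - C_{\mathrm{port}}(n) &= n\,c - p .
\end{aligned}
```

When c and p are comparable, as stated above, setting c = p gives a saving of (n - 1)p. For the five registers R0 to R4 with c = p = 1, this is 5 - 1 = 4 EX cycles, matching the difference between the 5-cycle LDM EX stage in Figure 1 (T5 to T9) and the 1-cycle CLRM EX stage in Figure 3.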
[0061] For ease of description, the following describes the technical solution of the embodiments of this application by taking the clearing of data in multiple registers as an example, where a register sets its data to 0 when it detects that the signal received at the clear port is 1 or a high level.
[0062] It is understood that in some embodiments, processor 1 can clear the data in multiple registers by means of a Clear Register Multiple (CLRM) instruction. In the EX pipeline stage, the CLRM instruction obtains the identifiers of the registers to be cleared. For example, referring to Figure 2, the CLRM instruction maintains a clear operation table that contains the identifiers of the registers in processor 1 and the clear flag corresponding to each register. During the EX pipeline stage, it sets the clear flag of each register to be cleared to 1. During the WB pipeline stage, the CLRM instruction, based on the clear operation table, sends a clear signal to the clear port of each register whose clear flag is 1, for example by sending a 1, that is, by driving that register's clear-port input high. On receiving the clear signal, the register sets its data to 0.
[0063] It is understood that in other embodiments, the specific contents of the EX pipeline stage and the WB pipeline stage of the CLRM instruction can also be merged or split, which is not limited here.
[0064] For example, referring to Figure 3, among the five instructions to be executed shown in Figure 1, the LDM instruction can be replaced with the CLRM instruction provided in the embodiments of this application. During clock cycle T5, processor 1 executes the EX pipeline stage of the CLRM instruction and sets the clear flags of the registers to be cleared to 1; for example, referring to Figure 2, processor 1 can set the clear flags corresponding to registers R0, R1, R2, R3, and R4 in the clear operation table to 1. Then, during clock cycle T7, when processor 1 executes the WB pipeline stage of the CLRM instruction, it sends 1 to the clear ports of registers R0 to R4 to clear the data in those registers.
[0065] Thus, the EX pipeline stage that clears multiple registers occupies only one clock cycle, improving the efficiency of processor 1 in clearing data in multiple registers. Further, referring to Figure 3, processor 1 completes the execution of the five instructions in 9 clock cycles, 3 clock cycles fewer than the 12 clock cycles in the embodiment shown in Figure 1, which increases the speed at which processor 1 executes instructions.
[0066] It is understood that in some embodiments, processor 1 may execute the WB pipeline stage of the CLRM instruction only after detecting that no other instructions are in the EX pipeline stage and WB pipeline stage, so as to avoid clearing the data in the registers used by other instructions in the EX pipeline stage or WB pipeline stage and affecting the normal execution of other instructions.
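The ordering condition in [0066] can be written as a simple gate. The Python sketch below is hypothetical: the occupancy mapping from stage names to instruction labels and the function name `clrm_wb_may_proceed` are illustrative assumptions. It expresses only the conservative rule stated above, namely that no instruction other than the CLRM instruction itself may be in the EX or WB stage.

```python
from typing import Mapping, Optional


def clrm_wb_may_proceed(occupancy: Mapping[str, Optional[str]], clrm_id: str) -> bool:
    """Return True if the CLRM instruction `clrm_id` may execute its WB stage.

    `occupancy` maps a stage name ("FE", "DE", "EX", "ME", "WB") to the label
    of the instruction currently in that stage, or None if the stage is empty.
    Stages missing from the mapping are treated as empty.
    """
    for stage in ("EX", "WB"):
        holder = occupancy.get(stage)
        if holder is not None and holder != clrm_id:
            # Another instruction may still use a register that is about to be cleared.
            return False
    return True
```

For example, with a CMP instruction in EX and the CLRM instruction in ME, the gate returns False, so the clear signals are held back until the EX stage no longer contains another instruction.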
[0067] Specifically, Figure 4 is a schematic diagram of the EX pipeline stage and the WB pipeline stage in which registers R0 to R4 are cleared using the CLRM instruction described above, according to some embodiments of this application. Referring to Figure 4, in the EX pipeline stage of the CLRM instruction, processor 1 can set the clear flags corresponding to registers R0 to R4 in the clear operation table to 1 and store the clear operation table in pipeline register PR-1. In the WB pipeline stage of the CLRM instruction, processor 1 sends a clear signal, for example a 1, to the clear port of each corresponding register according to the clear operation table recorded in pipeline register PR-1, and each register that receives the clear signal sets its data to 0. For example, registers R0 to R4 have clear ports CLP-0, CLP-1, CLP-2, CLP-3, and CLP-4, respectively. Processor 1 can send clear signals to clear ports CLP-0 to CLP-4, for example by driving the inputs of these clear ports high. When registers R0 to R4 detect that their clear ports are high, each sets its stored data to 0.
[0068] It is understood that in some embodiments, the clear signal can also be a 0, that is, the input of each clear port is driven low; this is not limited here.
[0069] It is understandable that a pipeline register stores the results an instruction produces in one pipeline stage, so that when the processor executes the next pipeline stage of that instruction, it can read the data directly from the pipeline register to complete that stage.
[0070] It is understood that in some embodiments, pipeline register PR-1 can represent the clear operation table as an n-bit binary number, with each bit serving as the clear flag of one register. During the WB pipeline stage of the CLRM instruction, processor 1 can then send a clear signal to each register whose bit in this binary number is 1. For example, in the example shown in Figure 4, the binary number corresponding to the clear operation table is 00000……011111, that is, bits 0 to 4 are 1 and bits 5 to n-1 are 0, indicating that the data in registers R0 to R4 needs to be cleared.
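The n-bit encoding of the operation table in the pipeline register can be sketched with ordinary integer bit operations. In this illustrative Python snippet the function names are hypothetical, and bit r is assumed to hold the flag for register Rr, as in the Figure 4 example.

```python
from typing import Iterable, List


def encode_operation_table(registers: Iterable[int]) -> int:
    """Pack register numbers into a flag word: bit r set means register Rr is selected."""
    mask = 0
    for r in registers:
        mask |= 1 << r
    return mask


def decode_operation_table(mask: int) -> List[int]:
    """List the registers whose flag bit is 1, i.e. the ports to drive in WB."""
    return [r for r in range(mask.bit_length()) if (mask >> r) & 1]
```

With a hypothetical width of n = 16, `format(encode_operation_table(range(5)), "016b")` gives `0000000000011111`, the clear operation table of Figure 4, and `decode_operation_table(0b111000)` gives `[3, 4, 5]`, the registers of the Figure 5 example.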
[0071] It is understood that the register access method provided in this application is applicable to any register, including but not limited to general-purpose registers (GPRs) such as data registers, address registers, index registers, instruction registers, and flag registers, as well as control registers, segment registers, vector registers (VS), and current program status registers (CPSR).
[0072] It is understood that in some embodiments, different clear instructions can be provided for different types of registers. For example, a clear instruction VSCCLRM can be provided for vector registers to set at least a portion of the data in a vector register to 0.
[0073] Similarly, processor 1 can set the data in multiple registers to 1 by means of a Set Register Multiple (SRM) instruction. In the EX pipeline stage, the SRM instruction maintains a set-to-1 operation table that records register identifiers and the corresponding set-to-1 flags, and sets the set-to-1 flag of each register to be set to 1. In the WB pipeline stage, it sends 1 to the set port of each register whose set-to-1 flag is 1. When a register detects that the signal at its set port is 1, it sets its data to 1.
[0074] For example, Figure 5 is a schematic diagram of the execution of the SRM instruction in the EX pipeline stage and the WB pipeline stage, according to some embodiments of this application.
[0075] Referring to Figure 5, assume the SRM instruction sets the data in registers R3 to R5 to 1. During the EX pipeline stage of the SRM instruction, processor 1 can set the set-to-1 flags corresponding to registers R3 to R5 in the set-to-1 operation table to 1 and store the set-to-1 operation table in pipeline register PR-2. During the WB pipeline stage of the SRM instruction, processor 1 can send a set-to-1 signal (e.g., a 1) to the set-to-1 port of each corresponding register according to the set-to-1 operation table recorded in pipeline register PR-2, and each register that receives the signal sets its data to 1. Specifically, for example, registers R3 to R5 have set-to-1 ports SEP-3, SEP-4, and SEP-5, respectively. During the WB pipeline stage of the SRM instruction, processor 1 sends set-to-1 signals to these ports (e.g., drives the input of each set-to-1 port high). When registers R3 to R5 detect that their set-to-1 ports are high, each sets its stored data to 1.
[0076] Similar to the clear operation table described above, in some embodiments, pipeline register PR-2 can represent the set-to-1 operation table as an n-bit binary number, with each bit serving as the set-to-1 flag of one register. During the WB pipeline stage of the SRM instruction, processor 1 can then send a set-to-1 signal to each register whose bit in this binary number is 1. For example, in the example shown in Figure 5, the binary number corresponding to the set-to-1 operation table is 00000……0111000, that is, bits 3 to 5 are 1, and bits 0 to 2 and bits 6 to n-1 are 0, indicating that the data in registers R3 to R5 needs to be set to 1.
[0077] It is understood that in some embodiments, if the clear port and set port of each register are the same port, the above-mentioned CLRM instruction and SRM instruction will send the clear signal or set signal to the same port during the WB pipeline stage. Then, when each register detects the signal of the same port, it will set or clear the data in the register according to the different signals.
[0078] It is understood that the above explanation of the technical solution of clearing or setting data in multiple registers using CLRM and SRM instructions is only an example. In other embodiments, other instructions with the same or similar functions may also be used, which are not limited here.
[0079] Furthermore, this application provides a processor 1, which can perform clearing or setting operations on multiple registers in the processor according to the register access methods provided in the above embodiments.
[0080] Specifically, Figure 6 is a schematic structural diagram of processor 1 according to some embodiments of this application.
[0081] As shown in Figure 6, processor 1 includes a computing unit 11, a control unit 12, and a storage unit 13, where the storage unit 13 includes n registers and a cache 131.
[0082] The computing unit 11 may include arithmetic logic units (ALUs) and other execution circuitry, such as adders, subtractors, comparators, and multipliers, for executing the instructions of processor 1, for example the execution (EX) pipeline stage of each instruction. In some embodiments, the computing unit 11 may be used to execute the EX pipeline stage of the CLRM and SRM instructions described above.
[0083] The control unit 12 may include an instruction controller, a timing controller, a bus controller, and an interrupt controller. The instruction controller performs pipeline stages such as instruction fetch and decode. The timing controller controls the timing with which processor 1 executes instructions and accesses data. The bus controller controls the processor's access to the bus and to devices connected to the bus. The interrupt controller generates or responds to interrupts.
[0084] Storage unit 13 may include n registers and cache 131.
[0085] Each of the n registers includes a clear port and/or a set port. For example, register R0 may include a clear port CLP-0 and a set port SEP-0, register R1 may include a clear port CLP-1 and a set port SEP-1, and register Rn-1 may include a clear port CLP-n-1 and a set port SEP-n-1. These n registers can be used to temporarily store instructions, the results of instructions, or input data. Each register sets its data to 0 when it detects a clear signal on its clear port, and sets its data to 1 when it detects a set signal on its set port.
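A behavioral sketch of such registers in Python follows. It assumes 32-bit registers, treats each port as a one-cycle pulse, and treats asserting both ports of the same register at once as illegal; none of these details is fixed by the source, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

WIDTH = 32  # assumed register width in bits
ALL_ONES = (1 << WIDTH) - 1


@dataclass
class Register:
    """Register Ri with clear port CLP-i and set port SEP-i (behavioral model)."""
    value: int = 0
    clp: bool = False  # clear port input; True models a high level
    sep: bool = False  # set port input; True models a high level

    def clock_edge(self) -> None:
        """Sample the ports and update the stored data for one clock cycle."""
        if self.clp and self.sep:
            raise ValueError("clear and set asserted together (assumed illegal)")
        if self.clp:
            self.value = 0
        elif self.sep:
            self.value = ALL_ONES
        # Ports are modeled as one-cycle pulses, so release them after the edge.
        self.clp = self.sep = False


def tick(registers: list[Register]) -> None:
    """One clock cycle. The loop is sequential in software, but no register
    reads another's state, so it models registers updating on the same edge."""
    for reg in registers:
        reg.clock_edge()


regs = [Register(value=0x55) for _ in range(8)]
for i in range(5):  # drive CLP-0 to CLP-4 high, as for a CLRM on R0 to R4
    regs[i].clp = True
tick(regs)
print([r.value for r in regs])  # [0, 0, 0, 0, 0, 85, 85, 85]
```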
[0086] Cache 131 is used to temporarily store the input data or calculation results required by the computing unit 11.
[0087] It is understood that in some embodiments, the computing unit 11 and the control unit 12 may also include registers, and each of these registers may also include a clear port and/or a set port.
[0088] It is understood that in some embodiments, the storage unit 13 can be used to execute the WB pipeline stage of the above-mentioned CLRM instruction or SRM instruction, and send a clear signal or a set signal to the register to be cleared or set.
[0089] It is understood that the structure of processor 1 shown in Figure 6 is only an example. In other embodiments, processor 1 may include more or fewer modules, which is not limited here.
[0090] With the above structure, processor 1 does not need an additional clock cycle for each additional register when it clears or sets multiple registers with the CLRM or SRM instruction. This improves the efficiency with which processor 1 clears or sets data in multiple registers, and thus its efficiency in executing instructions.
[0091] The technical solution of the embodiments of this application is introduced below based on the structure of processor 1 shown in Figure 6.
[0092] Specifically, Figure 7 is a schematic flowchart of a register access method according to some embodiments of this application. As shown in Figure 7, the process includes the following steps:
[0093] S701: The control unit 12 obtains a first operation on multiple registers and sends the instruction corresponding to the first operation to the computing unit 11.
[0094] It is understood that the first operation is to clear or set the data in multiple registers. The first operation may be a single instruction or part of an instruction, which is not limited here.
[0095] When the control unit 12 receives a first operation on the data in multiple registers, it sends the instruction corresponding to the first operation to the computing unit 11. This instruction differs depending on the first operation and includes the identifiers of the registers on which the first operation is performed.
[0096] For example, when the first operation is to clear the aforementioned registers R0 to R4, the instruction can be the aforementioned CLRM instruction, and the instruction includes register identifiers R0 to R4; as another example, when the first operation is to set the aforementioned registers R3 to R5 to 1, the instruction can be the aforementioned SRM instruction, and the instruction includes register identifiers R3 to R5.
[0097] S702: The computing unit 11 obtains the corresponding operation table based on the instruction corresponding to the first operation.
[0098] Upon receiving the instruction corresponding to the first operation, the computing unit 11 obtains the corresponding operation table in the EX pipeline stage of the instruction and stores this intermediate data in a pipeline register.
[0099] It is understood that the operation table differs depending on the first operation: when the first operation is a clear operation, the operation table is a clear operation table; when the first operation is a set operation, the operation table is a set operation table.
[0100] For example, when the received instruction is a CLRM instruction, the computing unit 11 can maintain the clear operation table shown in Figure 4 in the EX pipeline stage of the CLRM instruction, set the clear flags corresponding to registers R0 to R4 in that table to 1, and store the clear operation table in pipeline register PR-1. As another example, when the received instruction is an SRM instruction, the computing unit 11 can maintain the set-1 operation table shown in Figure 5 in the EX pipeline stage of the SRM instruction, set the set-1 flags corresponding to registers R3 to R5 in that table to 1, and store the set-1 operation table in pipeline register PR-2.
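Step S702 can be modeled as follows. This is a sketch under stated assumptions: the decoded instruction is represented by a hypothetical `Instruction` record holding the mnemonic and register indices, and the pipeline register (PR-1 or PR-2) by a hypothetical `PipelineRegister` record holding the operation kind and an n-bit table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """Hypothetical decoded instruction: mnemonic plus target register indices."""
    opcode: str  # "CLRM" or "SRM"
    regs: tuple[int, ...]


@dataclass
class PipelineRegister:
    """Hypothetical EX/WB pipeline register latching the operation table."""
    op: str = ""  # "clear" or "set"
    table: int = 0  # bit i set means register Ri is to be operated on


def ex_stage(instr: Instruction, n_regs: int, pr: PipelineRegister) -> None:
    """EX stage: set the flag of every target register, then latch the table."""
    ops = {"CLRM": "clear", "SRM": "set"}
    if instr.opcode not in ops:
        raise ValueError(f"unsupported opcode {instr.opcode}")
    table = 0
    for r in instr.regs:
        if not 0 <= r < n_regs:
            raise ValueError(f"no register R{r}")
        table |= 1 << r
    pr.op = ops[instr.opcode]
    pr.table = table


pr1, pr2 = PipelineRegister(), PipelineRegister()
ex_stage(Instruction("CLRM", (0, 1, 2, 3, 4)), 16, pr1)  # Figure 4 example
ex_stage(Instruction("SRM", (3, 4, 5)), 16, pr2)  # Figure 5 example
print(pr1, pr2)
```

Latching the table rather than acting immediately mirrors the split in the text: the EX stage only records which registers are affected, and the actual port signaling is left to the WB stage.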
[0101] S703: Based on the operation table, the storage unit 13 sends, in the same clock cycle, a signal for performing the first operation to the preset ports of the multiple registers.
[0102] In the WB pipeline stage of the instruction corresponding to the first operation, the storage unit 13 retrieves the corresponding operation table from the pipeline register and sends a signal to execute the first operation to the preset ports of multiple registers in the same clock cycle.
[0103] It is understood that the preset port can be the aforementioned clear port and/or set port.
[0104] For example, when the received instruction is a CLRM instruction, the storage unit 13 can obtain the clear operation table from the pipeline register PR-1 during the WB pipeline stage of the CLRM instruction, and send a clear signal to the clear port of the register whose clear flag is 1 in the clear operation table; as another example, when the received instruction is an SRM instruction, the storage unit 13 can obtain the set operation table from the pipeline register PR-2 during the WB pipeline stage of the SRM instruction, and send a set signal to the set port of the register whose set flag is 1 in the set operation table.
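Steps S703 and S704 can be sketched the same way. The hypothetical `wb_stage` function below assumes 32-bit registers and takes the operation kind and table latched in the pipeline register. It first derives all port levels from the table and then computes every register's next value from its own port alone, which is what allows all flagged registers to change in one clock cycle.

```python
WIDTH = 32  # assumed register width in bits
ALL_ONES = (1 << WIDTH) - 1


def wb_stage(op: str, table: int, values: list[int]) -> list[int]:
    """Return the register values after the single WB clock edge."""
    if op not in ("clear", "set"):
        raise ValueError(f"unknown operation {op!r}")
    # Port levels for all registers, derived from the table together.
    port_high = [bool((table >> i) & 1) for i in range(len(values))]
    fill = 0 if op == "clear" else ALL_ONES
    # Each register's next value depends only on its own port and old value.
    return [fill if high else v for high, v in zip(port_high, values)]


after = wb_stage("set", 0b111000, [7] * 8)  # SRM on R3 to R5
print([hex(v) for v in after])
```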
[0105] S704: In response to the signal, each register performs the first operation on the data it stores.
[0106] In response to the signal for performing the first operation, each register performs the first operation on its stored data according to preset logic within the same clock cycle. For example, each register can set its data to 0 when it detects that the signal on its clear port is 1 (high), and set its data to 1 when it detects that the signal on its set port is 1 (high).
[0107] It is understood that the execution subject of the above steps S701 to S704 is only an example. In other embodiments, the execution subject of the above steps may also be other modules of the processor 1, which is not limited here.
[0108] It is understood that the execution process of steps S701 to S704 above is only an example. In other embodiments, some steps may be combined or split, which is not limited here.
[0109] The method provided in this application embodiment ensures that the number of clock cycles occupied by the processor 1 in clearing or setting data in multiple registers does not increase with the increase in the number of registers being cleared or set, thereby improving the efficiency of the processor 1 in clearing or setting multiple registers and thus improving the efficiency of the processor 1 in executing instructions.
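The scaling claim can be illustrated with a toy cycle count. The model is deliberately simplified: it assumes, as the background section describes for LDM, that the sequential approach writes one register per cycle, while the port-based approach writes all flagged registers in one WB cycle, and it counts only the register-update cycles, not the rest of the pipeline. Both function names are hypothetical.

```python
def sequential_clear(values: list[int], targets: list[int]) -> tuple[list[int], int]:
    """LDM-style model: one register is cleared per cycle."""
    out = list(values)
    cycles = 0
    for r in targets:
        out[r] = 0
        cycles += 1
    return out, cycles


def port_clear(values: list[int], targets: list[int]) -> tuple[list[int], int]:
    """Port-based model: every flagged register clears on one clock edge."""
    table = 0
    for r in targets:
        table |= 1 << r
    out = [0 if (table >> i) & 1 else v for i, v in enumerate(values)]
    return out, (1 if targets else 0)


regs = list(range(1, 17))  # 16 registers holding nonzero data
for k in (1, 4, 16):
    targets = list(range(k))
    seq_vals, seq_cycles = sequential_clear(regs, targets)
    par_vals, par_cycles = port_clear(regs, targets)
    assert seq_vals == par_vals  # both models reach the same final state
    print(f"{k:2d} registers: sequential {seq_cycles} cycles, ports {par_cycles} cycle")
```

Both models produce identical register contents; only the cycle count differs, growing with k in the sequential case and staying constant in the port-based case.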
[0110] Further, Figure 8 is a schematic structural diagram of an electronic device 100 according to some embodiments of this application. As shown in Figure 8, the electronic device 100 includes one or more processors 101, system memory 102, non-volatile memory (NVM) 103, a communication interface 104, input/output (I/O) devices 105, and system control logic 106 for coupling the processor 101, system memory 102, NVM 103, communication interface 104, and I/O devices 105, where:
[0111] Processor 101 may include one or more processing units, such as a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a micro-programmed control unit (MCU), an artificial intelligence (AI) processor, a field-programmable gate array (FPGA), a neural network processing unit (NPU), etc.
[0112] In some embodiments, at least one processing unit in processor 101 may have the structure of the aforementioned processor 1, so as to clear or set a plurality of registers in the processing unit according to the register access method provided in the embodiments of this application.
[0113] System memory 102 is volatile memory, such as random-access memory (RAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM). System memory is used for temporary storage of data and/or instructions. For example, in some embodiments, system memory 102 can be used to store instructions corresponding to the register access methods provided in this application, such as the aforementioned CLRM and SRM instructions.
[0114] The non-volatile memory 103 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the non-volatile memory 103 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as a hard disk drive (HDD), a compact disc (CD), a digital versatile disc (DVD), a solid-state drive (SSD), etc. In some embodiments, the non-volatile memory 103 may also be a removable storage medium, such as a Secure Digital (SD) memory card.
[0115] Specifically, system memory 102 and non-volatile memory 103 may each include a temporary copy and a permanent copy of instructions 107. Instructions 107, when executed by at least one of the processors 101, cause electronic device 100 to implement the register access methods provided in the embodiments of this application.
[0116] The communication interface 104 may include a transceiver for providing a wired or wireless communication interface for the electronic device 100, thereby enabling communication with any other suitable device via one or more networks. In some embodiments, the communication interface 104 may be integrated into other components of the electronic device 100, for example, the communication interface 104 may be integrated into the processor 101. In some embodiments, the electronic device 100 may communicate with other devices through the communication interface 104.
[0117] The input/output (I/O) devices 105 may include input devices such as a keyboard or mouse and output devices such as a monitor. Users can interact with electronic device 100 through the I/O devices 105.
[0118] System control logic 106 may include any suitable interface controller to provide any suitable interface to other modules of electronic device 100. For example, in some embodiments, system control logic 106 may include one or more memory controllers to provide an interface to system memory 102 and non-volatile memory 103.
[0119] In some embodiments, at least one of the processors 101 may be packaged together with the logic of one or more controllers for system control logic 106 to form a system in package (SiP). In other embodiments, at least one of the processors 101 may also be integrated on the same chip with the logic of one or more controllers for system control logic 106 to form a system-on-chip (SoC).
[0120] It is understood that the structure of the electronic device 100 shown in Figure 8 is merely an example. In other embodiments, the electronic device 100 may include more or fewer components than illustrated, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
[0121] It is understood that electronic device 100 can be any electronic device, including but not limited to mobile phones, wearable devices (such as smartwatches), tablets, desktops, laptops, handheld computers, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, cellular phones, personal digital assistants (PDAs), and augmented reality (AR)/virtual reality (VR) devices, which is not limited in the embodiments of this application.
[0122] The embodiments of the mechanisms disclosed in this application can be implemented in hardware, software, firmware, or a combination of these approaches. Embodiments of this application can be implemented as computer programs or program code executing on a programmable system that includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
[0123] Program code can be applied to input instructions to execute the functions described in this application and generate output information. The output information can be applied to one or more output devices in a known manner. For the purposes of this application, the processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
[0124] The program code can be implemented using a high-level procedural language or an object-oriented programming language to communicate with the processing system. Assembly language or machine language can also be used when needed. In fact, the mechanisms described in this application are not limited to any particular programming language. In either case, the language can be a compiled language or an interpreted language.
[0125] In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried on or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or through other computer-readable media. Therefore, machine-readable media may include any mechanism for storing or transmitting information in a machine-readable (e.g., computer-readable) form, including but not limited to floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used to transmit information over the Internet by means of electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). In short, machine-readable media include any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a machine-readable (e.g., computer-readable) form.
[0126] In the accompanying drawings, some structural or methodological features may be shown in a specific arrangement and/or order. However, it should be understood that such a specific arrangement and/or order may not be necessary. Rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative drawings. Furthermore, the inclusion of structural or methodological features in a particular figure does not imply that such features are required in all embodiments; in some embodiments, these features may be omitted or combined with other features.
[0127] It should be noted that the units/modules mentioned in the device embodiments of the present invention are all logical units/modules. Physically, a logical unit/module can be a physical unit/module, part of a physical unit/module, or a combination of multiple physical units/modules. The physical implementation of these logical units/modules is not what matters most; the combination of functions they implement is the key to solving the technical problem addressed by the present invention. Furthermore, to highlight the innovative aspects of the present invention, the device embodiments described above do not introduce units/modules that are not closely related to solving that technical problem. This does not mean that those device embodiments contain no other units/modules.
[0128] It should be noted that in the examples and description of this patent, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one" does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0129] Although the invention has been illustrated and described with reference to certain preferred embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made therein without departing from the scope of the invention.
Claims
1. A register access method, applied to an electronic device, characterized in that it comprises: the processor of the electronic device receives an instruction to perform a first operation on a plurality of registers in the processor, wherein the first operation is to set the data stored in the plurality of registers to 0 or to set the data stored in the plurality of registers to 1; in response to the instruction, the processor sends operation signals corresponding to the first operation in parallel to the preset ports of each of the registers that are used for responding to the first operation; and each of the registers detects the operation signal on its respective preset port and performs the first operation on its respective stored data within the same clock cycle.
2. The method according to claim 1, characterized in that each of the registers detecting the operation signal on its preset port and performing the first operation on its respective stored data within the same clock cycle comprises: if the first operation is to set the data stored in the registers to 0, each register sets all the data stored in it to 0 in the same clock cycle; if the first operation is to set the data stored in the registers to 1, each register sets all the data stored in it to 1 in the same clock cycle.
3. The method according to claim 1, characterized in that, when the first operation is to set the data stored in the plurality of registers to 0, the preset port is a first preset port; and when the first operation is to set the data stored in the plurality of registers to 1, the preset port is a second preset port.
4. The method according to claim 3, characterized in that each of the registers detecting the operation signal on its preset port and performing the first operation on its respective stored data within the same clock cycle comprises: if the operation signal on the first preset port is detected, each register sets all the data stored in it to 0 in the same clock cycle; if the operation signal on the second preset port is detected, each register sets all the data stored in it to 1 in the same clock cycle.
5. The method according to claim 1, characterized in that the processor pipeline includes a first pipeline stage and a second pipeline stage arranged in sequence, and the processor sending the operation signals corresponding to the first operation in parallel to the preset ports of each of the registers comprises: the processor, in the first pipeline stage of a preset instruction, sets the operation flag of each of the registers in a preset operation table to valid and stores the preset operation table in the pipeline register between the first pipeline stage and the second pipeline stage; and the processor, in the second pipeline stage of the preset instruction, obtains the preset operation table from the pipeline register and sends the operation signal corresponding to the first operation to the preset port, used for responding to the first operation, of each register whose operation flag in the preset operation table is valid.
6. The method according to claim 5, characterized in that each of the registers performs the first operation on its stored data in the second pipeline stage of the preset instruction.
7. A processor, characterized in that it comprises: a computing unit and a plurality of registers, wherein each of the registers includes a preset port for responding to a first operation; the computing unit is configured to, upon receiving an instruction to perform the first operation on a plurality of first registers among the plurality of registers, send operation signals in parallel to the preset ports of each of the first registers, wherein the first operation is an operation of setting the data stored in a register to 0 or an operation of setting the data stored in a register to 1; and each of the first registers is configured to perform the first operation on its stored data within the same clock cycle upon detecting the operation signal on its respective preset port.
8. The processor according to claim 7, characterized in that each of the first registers performs the first operation on its stored data within the same clock cycle upon detecting the operation signal on its respective preset port in the following manner: if the first operation is to set the data stored in the register to 0, each of the first registers sets all the data stored in it to 0 in the same clock cycle; if the first operation is to set the data stored in the register to 1, each of the first registers sets all the data stored in it to 1 in the same clock cycle.
9. The processor according to claim 7, characterized in that the preset port includes a first preset port and a second preset port, the first preset port being used to respond to the operation signal corresponding to the first operation of setting the data stored in the register to 0, and the second preset port being used to respond to the operation signal corresponding to the first operation of setting the data stored in the register to 1.
10. The processor according to claim 9, characterized in that each of the first registers performs the first operation on its stored data within the same clock cycle upon detecting the operation signal on its respective preset port in the following manner: if the operation signal on the first preset port is detected, each of the first registers sets all the data stored in it to 0 in the same clock cycle; if the operation signal on the second preset port is detected, each of the first registers sets all the data stored in it to 1 in the same clock cycle.
11. The processor according to claim 7, characterized in that the processor pipeline includes a first pipeline stage and a second pipeline stage, and the computing unit, upon receiving an instruction to perform the first operation on a plurality of first registers among the plurality of registers, sends operation signals in parallel to the preset ports of each of the plurality of registers as follows: the computing unit, in the first pipeline stage of a preset instruction, sets the operation flag of each of the first registers in a preset operation table to valid and stores the preset operation table in the pipeline register between the first pipeline stage and the second pipeline stage; and the computing unit, in the second pipeline stage of the preset instruction, obtains the preset operation table from the pipeline register and sends the operation signal corresponding to the first operation to the preset port, used for responding to the first operation, of each register whose operation flag in the preset operation table is valid.
12. The processor according to claim 11, characterized in that each of the first registers performs the first operation on its stored data in the second pipeline stage of the preset instruction.
13. An electronic device, characterized in that, The electronic device includes the processor according to any one of claims 7 to 12.