Instruction decoder, instruction decoding method, electronic device, and storage medium
By employing instruction fusion technology and relying on scoreboard module management, the problems of low decoding efficiency and difficulty in conflict detection of RISC-V instruction sets have been solved, achieving an efficient and secure instruction decoding process.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: SHANDONG YUNHAI GUOCHUANG CLOUD COMPUTING EQUIP IND INNOVATION CENT CO LTD
- Filing Date: 2026-04-17
- Publication Date: 2026-06-19
AI Technical Summary
In existing technologies, I-type and R-type instructions in the RISC-V instruction set are decoded inefficiently, and read-after-write (RAW) and write-after-write (WAW) conflicts are difficult to detect, which leads to stalls in the decoding process and low security.
By employing instruction fusion technology, a dependency scoreboard module, and a dispatch module, hardware-level instruction reorganization and dependency management automatically detect fusible instructions and determine the execution order. The decoding process is optimized using logical registers and a decoding bypass, and security is enhanced.
The solution improves instruction decoding efficiency, reduces redundant operations, effectively detects and avoids read-after-write and write-after-write conflicts, and enhances the processor's instruction throughput and security.
Figure CN122240182A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of integrated circuit design technology, specifically to instruction decoders, instruction decoding methods, electronic devices, and storage media.
Background Technology
[0002] Instruction decoding is the core step in the CPU pipeline's instruction fetching and execution. The binary machine code generated by program compilation cannot directly drive the hardware; a decoder must translate it into hardware-executable micro-operations and control signals. Decoding identifies instruction format, operands, and other information, performs dependency checks and validity verification, supports parallel instruction execution and memory safety protection, and is crucial for mapping instructions to hardware operations and ensuring the normal operation of the processor.
[0003] The register-type (R-type) and immediate-type (I-type) instruction encoding formats are the two most fundamental formats in the RISC-V instruction set. Classified by operand source and instruction structure, they are the core categories for processor decoding. Currently, R-type and I-type instructions are decoded by independent decoder designs, resulting in low encoding and decoding efficiency. Furthermore, the current instruction decoding process struggles to detect read-after-write (RAW) and write-after-write (WAW) conflicts.
Summary of the Invention
[0004] This invention provides an instruction decoder, instruction decoding method, electronic device, and storage medium to solve the problems of low encoding and decoding efficiency for R-type and I-type instructions, and difficulty in detecting read-after-write and write-after-write conflicts.
[0005] In a first aspect, this application provides an instruction decoder, which includes a decoding module, a dependency scoreboard module, and a dispatch module. The decoding module is used to identify the instructions to be fused from the received instructions, fuse the instructions to be fused into an instruction to be decoded, and parse the instruction to be decoded to obtain control signals and operations to be executed. The dependency scoreboard module is used to determine the related instructions of the instruction to be decoded based on the data dependencies between instructions, and to determine the execution order of the instruction to be decoded and the related instructions. The dispatch module is used to dispatch the control signals and operations to be executed to the instruction execution module according to the execution order.
[0006] In a second aspect, this application provides an instruction decoding method, the method comprising: identifying the instructions to be fused from the received instructions, fusing them into an instruction to be decoded, and parsing the instruction to be decoded to obtain control signals and operations to be executed; determining, based on the data dependencies between instructions, the related instructions of the instruction to be decoded, and determining the execution order of the instruction to be decoded and the related instructions; and dispatching the control signals and operations to be executed to the instruction execution module according to the execution order.
[0007] In a third aspect, this application provides an instruction decoding apparatus, which includes: an instruction fusion module, used to identify the instructions to be fused from the received instructions, fuse them into an instruction to be decoded, and parse the instruction to be decoded to obtain control signals and operations to be executed; a sequence determination module, used to determine the related instructions of the instruction to be decoded based on the data dependencies between instructions, and to determine the execution order of the instruction to be decoded and the related instructions; and an instruction dispatch module, used to dispatch the control signals and operations to be executed to the instruction execution module according to the execution order.
[0008] In a fourth aspect, this application provides an electronic device, including a memory and a processor that are communicatively connected to each other. The memory stores computer instructions, and the processor executes the computer instructions to perform the instruction decoding method of the second aspect or any corresponding embodiment described above.
[0009] In a fifth aspect, this application provides a computer-readable storage medium storing computer instructions that cause a computer to execute the instruction decoding method of the second aspect or any corresponding embodiment described above.
[0010] In a sixth aspect, this application provides a computer program product, including computer instructions, which are used to cause a computer to execute the instruction decoding method of the second aspect or any corresponding embodiment described above.
[0011] This application uses a decoding module to identify the instructions to be fused from the received instructions, fuse them into an instruction to be decoded, and parse the instruction to be decoded to obtain control signals and operations to be executed. A dependency scoreboard module uses the data dependencies between instructions to determine the related instructions of the instruction to be decoded and the execution order of the instructions. A dispatch module dispatches the control signals and operations to be executed to the instruction execution module according to the execution order. This solves the problems of low encoding and decoding efficiency for R-type and I-type instructions, and difficulty in detecting read-after-write and write-after-write conflicts. Introducing instruction fusion technology into the decoding module supports automatic pairing detection and dynamic fusion of different types of instructions, followed by decoding of the fused instruction. Hardware-level instruction reorganization reduces redundant operations and improves decoding efficiency. The dependency scoreboard module tracks data dependencies between instructions and determines their execution order, which eliminates pipeline stalls and avoids instruction execution conflicts.
Attached Figure Description
[0012] To more clearly illustrate the technical solutions in the specific embodiments or related technologies of this application, the drawings used in the description of the specific embodiments or related technologies will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0013] Figure 1 is a schematic diagram of the structure of an instruction decoder according to an embodiment of this application;
Figure 2 is a schematic diagram of the overall design architecture of the decoder according to an embodiment of this application;
Figure 3 is a flowchart of the decoder decoding instructions according to an embodiment of this application;
Figure 4 is a schematic diagram of a dependency scoreboard architecture according to an embodiment of this application;
Figure 5 is a schematic diagram of a logical register according to an embodiment of this application;
Figure 6 is a flowchart illustrating the instruction decoding method according to an embodiment of this application;
Figure 7 is a structural block diagram of an instruction decoding apparatus according to an embodiment of this application;
Figure 8 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of this application.
Detailed Implementation
[0014] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the protection scope of this application.
[0015] It should be noted that, in the description of this application, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. The terms "first," "second," etc., in this application are used to distinguish similar objects and are not used to describe a specific order or sequence.
[0016] To enable those skilled in the art to better understand the present application, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0017] The Verilog digital design methodology is an electronic system-level design technique based on a hardware description language (HDL). This technique employs a hybrid modeling approach at the behavioral, register-transfer, and gate levels, supporting synchronous sequential logic design (synchronous reset / clock enable), combinational logic optimization, and structured hierarchical design. Through synthesis, it maps code to standard cells of the target technology library, and formal verification and static timing analysis ensure timing closure. Its core features include parallel execution semantics, event-driven simulation, and deterministic hardware implementation of the synthesizable subset.
RISC-V is an open-source instruction set architecture (ISA) based on the principles of Reduced Instruction Set Computing (RISC). It adopts a modular design, with a scalable base instruction set and various optional extensions, such as floating-point operations (F extension), vector computation (V extension), and atomic operations (A extension). Its technical features include suitability for efficient five-stage pipeline execution, a load-store architecture, fixed 32-bit base instruction encoding, and a streamlined register set (32 general-purpose registers). It also supports multiple privilege levels, including user mode (U), supervisor mode (S), and machine mode (M). RISC-V supports applications ranging from embedded microcontroller units (MCUs) to high-performance multi-core SoCs, and allows domain-specific architecture optimization through custom instruction extensions, offering advantages over traditional closed-source instruction set architectures in energy efficiency and design freedom. Its ecosystem provides complete toolchain support (GCC / LLVM), emulators (Spike / QEMU), and a hardware verification framework (RISCV-DV), making it a key technology route for heterogeneous computing and the independent controllability of domestically produced chips.
The central processing unit (CPU) design methodology is based on instruction set architecture specifications, employing multi-stage pipelines and superscalar techniques to achieve instruction-level parallelism, and optimizing throughput through branch prediction and out-of-order execution. Modern CPU designs emphasize energy efficiency, combining dynamic voltage and frequency scaling with clock gating to reduce power consumption, while employing multi-core and SIMD extensions (such as AVX / Neon) to enhance parallel computing capabilities. The CPU instruction decoder is a key front-end module responsible for parsing binary instructions into micro-operations.
[0018] The current instruction decoding process has the following problems: RISC-V I-type and R-type instructions are decoded by independent designs, resulting in low encoding and decoding efficiency; RAW and WAW conflicts cannot be detected; intermediate data cannot be stored, causing the decoding process to stall; execution paths are mostly vertical, resulting in redundant computation paths; an attack on an instruction can easily lead to instruction leakage, resulting in low security; and decoding is mostly serial, resulting in low decoding efficiency.
[0019] Based on the above, this application provides an instruction decoder. The instruction decoder is implemented using a dependency scoreboard, a bypass design, interrupt logic, logical registers, hardware-level memory protection technology, and the Verilog hardware description language. The instruction decoder employs instruction fusion technology that combines I-type and R-type instructions and supports dynamic instruction fusion. During the decoding phase, the decoder automatically detects fusible instruction pairs. The instruction decoder uses a dependency scoreboard: after an instruction is decoded, its operand and result addresses are recorded in the dependency scoreboard. Before the execution unit reads operands or writes results, the dependency scoreboard checks for conflicts. If a data dependency exists, execution of the instruction is delayed until the instruction it depends on completes. The instruction decoder uses an Architectural Register File (ARF) to store operands and intermediate results during the decoding process, which speeds up data storage and access and provides state maintenance and recovery. The instruction decoder employs a decoding bypass mechanism that supplies data early, reducing critical-path latency and improving instruction throughput, with optional dynamic path selection. In addition, a memory protection mechanism and multiple decoding slots are provided in the instruction decoder. The Physical Memory Protection (PMP) mechanism is adopted to set memory region access control, isolate privilege levels, restrict non-privileged code from accessing hardware registers, and lock secure regions. The multiple decoding slots decode in parallel to improve the processor's instruction throughput.
[0020] According to an embodiment of this application, an instruction decoder is provided. It should be noted that the instruction decoder can be implemented as a set of computer-executable instructions running on a computer system, for example a computer or a server, or it can run in an integrated circuit corresponding to the instruction decoder.
[0021] This embodiment provides an instruction decoder. Figure 1 is a structural diagram of the instruction decoder according to an embodiment of this application. As shown in Figure 1, the instruction decoder includes a decoding module, a dependency scoreboard module, and a dispatch module. The decoding module is used to identify the instructions to be fused from the received instructions, fuse them into an instruction to be decoded, and parse the instruction to be decoded to obtain control signals and operations to be executed. The dependency scoreboard module is used to determine the related instructions of the instruction to be decoded based on the data dependencies between instructions, and to determine the execution order of the instruction to be decoded and the related instructions. The dispatch module is used to dispatch the control signals and operations to be executed to the instruction execution module according to the execution order.
[0022] Specifically, the decoding module, for example the hardware decoding module in Figure 2, directly parses instruction encodings and generates control signals using fixed hardware logic circuits. During instruction decoding, the decoding module first identifies the opcode, register dependencies, and semantic relationships of the received instructions. Based on this information, it identifies fusible instructions among the received instructions as the instructions to be fused. These instructions are then fused into a single operation to obtain the instruction to be decoded. Additionally, as shown in Figure 2, the decoder is the instruction decoder, and it obtains instructions through the instruction fetch module. The process includes obtaining the cached instructions sequentially through the instruction cache and instruction buffer, and pre-decoding them to obtain the instructions that are input to the decoder.
[0023] The decoding module uses fixed hardware logic circuits to parse the instruction to be decoded, obtaining control signals and operations to be executed. Control signals indicate what the execution module needs to do, such as enabling a register, instructing the ALU to perform addition or subtraction, selecting the data path, and deciding whether to write data back to the ARF. Execution modules include functional units such as the ALU, FPU, and LSU. Operations to be executed include data write operations and fused address generation and storage operations.
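As an illustration of the field parsing described above, the following is a minimal Python sketch that splits a 32-bit RISC-V word into its R-type or I-type fields. The bit positions and opcode values follow the standard RISC-V base encoding; the `Decoded` record, the `decode` function, and the `reg_write` control signal are hypothetical names introduced here for illustration and are not taken from the application, whose decoder is described as fixed hardware logic rather than software.

```python
from dataclasses import dataclass
from typing import Optional

# Standard RISC-V major opcodes (bits 6:0).
OPCODE_OP = 0b0110011      # R-type register-register ALU operations
OPCODE_OP_IMM = 0b0010011  # I-type register-immediate ALU operations
OPCODE_LOAD = 0b0000011    # I-type loads (LB, LH, LW, ...)


@dataclass
class Decoded:
    fmt: str                # "R" or "I"
    rd: int
    rs1: int
    rs2: Optional[int]      # only R-type has a second source register
    imm: Optional[int]      # only I-type carries a 12-bit immediate
    funct3: int
    funct7: Optional[int]
    reg_write: bool         # hypothetical control signal: write result to the ARF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask


def decode(word: int) -> Decoded:
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    if opcode == OPCODE_OP:
        rs2 = (word >> 20) & 0x1F
        funct7 = (word >> 25) & 0x7F
        return Decoded("R", rd, rs1, rs2, None, funct3, funct7, rd != 0)
    if opcode in (OPCODE_OP_IMM, OPCODE_LOAD):
        imm = sign_extend((word >> 20) & 0xFFF, 12)
        return Decoded("I", rd, rs1, None, imm, funct3, None, rd != 0)
    raise ValueError(f"opcode {opcode:#09b} is outside this sketch")
```

For example, `decode(0x002081B3)` parses `add x3, x1, x2` as an R-type instruction, and `decode(0xFFF00093)` parses `addi x1, x0, -1` as an I-type instruction with a sign-extended immediate.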
[0024] The dependency scoreboard module includes the dependency scoreboard in Figure 2, which is primarily used to dynamically track data dependencies between instructions so that instructions are executed in the correct order once their dependency conditions are met. The dependency scoreboard tracks data dependencies between instructions through a register / memory dependency table. Examples of data dependencies include RAW (Read After Write, true dependency), WAR (Write After Read, anti-dependency), and WAW (Write After Write, output dependency). For operations on the same register or memory location, RAW requires that a read instruction reads the data only after the preceding write instruction has written it; WAR requires that a later write instruction does not overwrite the data before an earlier read instruction has read it; and WAW requires that data written by a later write instruction is not overwritten by data written by an earlier write instruction. Additionally, the dependency scoreboard module manages the usage status of functional units (ALU, FPU, LSU, etc.) to avoid structural hazards.
[0025] The dependency scoreboard module determines the data dependencies between instructions using a register / memory dependency table. Based on these dependencies, it identifies the related instructions of the instruction to be decoded. For example, if the data dependency is WAR and the instruction to be decoded is a write instruction, then the related instructions are read instructions. The module then determines the execution order of the instruction to be decoded and the related instructions. For instance, it executes the read instruction first, followed by the write instruction, to prevent the write instruction from overwriting the data before the read instruction has read it.
[0026] In addition, after decoding an instruction, the instruction decoder sends the decoded control signals and the operation to be executed to the dependency scoreboard module. The dependency scoreboard module marks the registers and memory addresses to be accessed. After the execution unit completes the register / memory access operation, the dependency locks in the dependency scoreboard module are cleared, the registers / memory addresses are released, and the execution unit is notified that the blocked instructions are ready to be executed.
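The three dependency types can be expressed as set intersections over the resources each instruction reads and writes. The sketch below is a software illustration only, assuming each instruction is summarized by a set of read resources and a set of written resources; the `Instr` record and the `hazards` function are hypothetical names, not part of the application.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Instr:
    name: str
    reads: FrozenSet[str]   # registers or memory locations read
    writes: FrozenSet[str]  # registers or memory locations written


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Return the data dependencies that `later` has on `earlier`."""
    found = set()
    if earlier.writes & later.reads:
        found.add("RAW")  # later reads what earlier writes (true dependency)
    if earlier.reads & later.writes:
        found.add("WAR")  # later overwrites what earlier still reads (anti-dependency)
    if earlier.writes & later.writes:
        found.add("WAW")  # both write the same location (output dependency)
    return found


lw = Instr("lw x5, 0(x1)", frozenset({"x1"}), frozenset({"x5"}))
addi = Instr("addi x6, x5, 4", frozenset({"x5"}), frozenset({"x6"}))
mv = Instr("addi x1, x0, 8", frozenset({"x0"}), frozenset({"x1"}))
```

Here `hazards(lw, addi)` reports a RAW dependency on `x5`, so `addi` must wait for `lw`, while `hazards(lw, mv)` reports a WAR dependency on `x1`, so the read in `lw` must happen before `mv` writes.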
[0027] The dispatch module dispatches control signals and operations to be executed to the instruction execution module according to the execution order. For example, the dispatch module includes the rename (register) / dispatch (instruction) module in Figure 2. This module performs register renaming to eliminate false dependencies (WAR / WAW) between instructions, retaining only true data dependencies (RAW). Renaming dynamically maps logical registers to physical registers. During instruction decoding, the renaming phase maps the destination logical register to a newly allocated physical register, allowing subsequent instructions to execute in parallel without waiting for previous instructions to be written back. The renaming table maintains the mapping between logical and physical registers and updates the architectural register state upon instruction commit.
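A minimal sketch of the renaming step, assuming a simple free list and an identity initial mapping, may help make this concrete. The `RenameTable` class and its sizes are hypothetical; the sketch omits commit handling and the recycling of physical registers, which the text mentions but does not detail.

```python
from typing import Dict, List, Tuple


class RenameTable:
    """Maps logical registers to physical registers (illustrative only)."""

    def __init__(self, num_logical: int = 32, num_physical: int = 64) -> None:
        # Assumption: logical register xN starts out mapped to physical register N.
        self.mapping: Dict[str, int] = {f"x{i}": i for i in range(num_logical)}
        self.free: List[int] = list(range(num_logical, num_physical))

    def rename(self, dest: str, srcs: List[str]) -> Tuple[int, List[int]]:
        # Sources are looked up BEFORE the destination is remapped, so an
        # instruction such as "add x5, x5, x1" reads the old value of x5.
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; the pipeline would stall")
        phys_dest = self.free.pop(0)
        self.mapping[dest] = phys_dest
        return phys_dest, phys_srcs


rt = RenameTable()
first = rt.rename("x5", ["x1"])   # first write to x5
second = rt.rename("x5", ["x2"])  # second write to x5 gets a different physical register
reader = rt.rename("x6", ["x5"])  # reads the most recent x5 mapping
```

Because the two writes to `x5` land in different physical registers, the WAW dependency between them disappears, while the read of `x5` by the third instruction still points at the second write, preserving the RAW dependency.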
[0028] Based on the above, the workflow by which the hardware decoding module decodes instructions is shown in Figure 3 and includes the following steps:
- Instruction fetching: reading instructions from the instruction cache.
- Instruction caching: caching instructions in the instruction cache, awaiting the next step.
- Hardwired decoding: directly parsing instruction encodings and generating control signals through fixed hardware logic circuits.
- Microcode / fusion decoding: controlling the execution flow of complex instructions through pre-stored micro-instruction sequences, decomposing a macro instruction into multiple low-level hardware operations; and decomposing complex fused instructions into simplified low-level hardware operations through pre-stored fusion instruction formats.
- Branch prediction: predicting branches for instructions, with the prediction strategy described above.
- Bypass processing: issuing and executing, prioritizing the issuance of dependent instructions; immediately triggering exceptions for illegal instructions; and clearing the pipeline when a misprediction occurs.
- Dependency checking: scoreboard lookup.
- Instruction dispatch: dispatching instructions to downstream pipeline units, with the dispatch operation described above.
[0029] In addition, before the instruction decoder executes the above process, it must go through a configuration phase, in which protected memory regions and their permissions are defined. At run time, the hardware then checks each CPU memory access against the PMP rules: if a rule matches and its permissions allow the access, the access continues; if a rule matches but the permissions are insufficient, an exception is triggered; if no rule matches, the access is allowed by default.
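The three-way outcome of the run-time check can be sketched as follows. This is a simplified software model of the rule as stated in the paragraph above, not of the full RISC-V PMP mechanism: the `PmpRule` record, the `check_access` function, and the `AccessFault` exception are hypothetical names, rules are assumed to be plain base/size ranges, and the allow-by-default behavior on no match follows this text (in the RISC-V privileged specification, the no-match outcome depends on the privilege mode).

```python
from dataclasses import dataclass
from typing import Iterable


class AccessFault(Exception):
    """Raised when a matching rule does not grant the requested permission."""


@dataclass(frozen=True)
class PmpRule:
    base: int   # first protected address
    size: int   # length of the region in bytes
    perms: str  # any combination of "r", "w", "x"


def check_access(rules: Iterable[PmpRule], addr: int, kind: str) -> bool:
    """Apply the first rule that matches addr; kind is "r", "w", or "x"."""
    for rule in rules:
        if rule.base <= addr < rule.base + rule.size:
            if kind in rule.perms:
                return True  # match and permission granted: access continues
            raise AccessFault(f"{kind!r} access to {addr:#x} denied")
    return True  # no rule matched: allowed by default, as stated in the text


rules = [PmpRule(base=0x1000, size=0x100, perms="r")]
```

With this rule set, a read at `0x1010` is allowed, a write at `0x1010` raises `AccessFault`, and any access at `0x2000` falls through to the default.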
[0030] The instruction decoder provided in this embodiment uses a decoding module to identify the instructions to be fused from the received instructions, fuse them into a single instruction to be decoded, and parse that instruction to obtain control signals and operations to be executed. A dependency scoreboard module uses the data dependencies between instructions to determine the related instructions of the instruction to be decoded and to determine the execution order of the instructions. A dispatch module dispatches the control signals and operations to be executed to the instruction execution module according to the execution order. Instruction fusion technology is introduced into the decoding module, supporting automatic pairing detection and dynamic fusion of different types of instructions, followed by decoding of the fused instruction. Hardware-level instruction reorganization reduces redundant operations and improves decoding efficiency. The dependency scoreboard module tracks the data dependencies between instructions and determines their execution order, which eliminates pipeline stalls and avoids instruction execution conflicts. This solves the problems of low encoding and decoding efficiency for R-type and I-type instructions, and the difficulty in detecting read-after-write and write-after-write conflicts.
[0031] As an optional embodiment, the decoding module includes an instruction information acquisition unit, an instruction fusion unit, and an instruction decomposition unit. The instruction information acquisition unit is used to acquire the opcode, register dependencies, and semantic associations of the instructions, and to determine the instructions to be fused based on them. The instruction fusion unit is used to fuse the instructions to be fused into an instruction to be decoded according to a preset fusion instruction format. The instruction decomposition unit is used to decompose macro instructions into multiple hardware operations.
[0032] Specifically, the instruction information acquisition unit acquires the opcode, register dependency, and semantic association of the instruction. For example, the instruction information acquisition unit acquires the semantic association between instructions from the hybrid decoding design table, such as instruction 1 being LW and instruction 2 being ADDI, indicating that the two are related.
[0033] The instructions to be fused are determined based on the opcode, register dependencies, and semantic relationships. For example, instruction 1 is determined to be LW and instruction 2 to be ADDI. Since they are related, they need to be fused and are therefore selected as the instructions to be fused. Other instruction pairs to be fused include LH and ADD, LB and SUB, and ADD and JALR; the full set is listed in Table 1 and will not be elaborated further here.
[0034] Table 1 Hybrid Decoding Design Table
[0035] In this context, LW represents the Load Word instruction, LH represents the Load Halfword instruction, LB represents the Load Byte instruction, ADDI represents the Add Immediate instruction, ADD represents the Add instruction, SUB represents the Subtract instruction, and JALR represents the Jump and Link Register instruction.
[0036] The instruction fusion unit is, for example, the fusion instruction decoding unit in Figure 2. Preset fusion instruction formats include: memory access instruction + arithmetic instruction, arithmetic instruction + control instruction, arithmetic instruction + memory access instruction, etc.
[0037] The instruction fusion unit fuses the instructions to be fused into a single instruction to be decoded, that is, a single operation, according to the preset fusion instruction format. For example, according to the preset fusion instruction format "memory access instruction + arithmetic instruction", the LW instruction and the ADDI instruction are fused to obtain the instruction to be decoded, and the execution order is determined so as to avoid write-back-then-read latency. Using the preset fusion instruction format, a complex fused instruction is broken down into simplified low-level hardware operations.
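The pairing detection and fusion described above can be sketched in software as a single pass over the instruction stream. The pair list below contains only the four pairs named in the text (LW + ADDI, LH + ADD, LB + SUB, ADD + JALR); the contents of Table 1 beyond these are not reproduced here. The `Instr` and `Fused` records, the `fuse` function, and the specific register-dependency test (the second instruction reads the first instruction's destination) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Opcode pairs named in the text; the complete Table 1 is not available here.
FUSIBLE_PAIRS = {("LW", "ADDI"), ("LH", "ADD"), ("LB", "SUB"), ("ADD", "JALR")}


@dataclass(frozen=True)
class Instr:
    op: str
    rd: str                 # destination register
    srcs: Tuple[str, ...]   # source registers


@dataclass(frozen=True)
class Fused:
    first: Instr
    second: Instr


def fuse(stream: List[Instr]) -> List[Union[Instr, Fused]]:
    """Merge adjacent fusible pairs into one operation; pass others through."""
    out: List[Union[Instr, Fused]] = []
    i = 0
    while i < len(stream):
        cur = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        # Assumed criterion: the opcode pair is listed AND the second
        # instruction consumes the result of the first.
        if nxt is not None and (cur.op, nxt.op) in FUSIBLE_PAIRS and cur.rd in nxt.srcs:
            out.append(Fused(cur, nxt))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


stream = [
    Instr("LW", "x5", ("x1",)),
    Instr("ADDI", "x6", ("x5",)),
    Instr("SUB", "x7", ("x6", "x2")),
]
decoded = fuse(stream)
```

In this example the `LW` and `ADDI` instructions are merged into one `Fused` entry because `ADDI` reads the `x5` value that `LW` produces, and the trailing `SUB` is passed through unchanged.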
[0038] The instruction decomposition unit decomposes macro instructions in an instruction into multiple hardware operations. For example, by using a pre-stored microinstruction sequence, it controls the execution flow of complex instructions and decomposes a macro instruction into multiple low-level hardware operations.
[0039] In this embodiment, instruction fusion technology is introduced at the front end of the decoding pipeline, supporting the automatic pairing detection and combined execution of I-type and R-type instructions. Redundant operations are reduced through hardware-level instruction reorganization. Instructions to be fused can be accurately identified based on instruction relationships, and multiple instructions can be merged into a single operation according to a preset format. At the same time, macro instructions are decomposed into low-level hardware operations. This effectively avoids write-back-then-read latency, reduces temporary storage overhead, simplifies the hardware execution process, improves instruction decoding and execution efficiency, and optimizes pipeline performance.
[0040] As an optional embodiment, the dependency scoreboard module includes a dependency identification unit and a state management unit. The dependency identification unit is used to perform dependency tracking based on register dependency tables or memory dependency tables to determine the data dependencies between instructions. The state management unit is used to mark the registers and memory addresses corresponding to a decoded instruction when the decoded instruction is received from the instruction decoder. The state management unit is also used, after the execution module completes its register access operations, to clear dependency locks in the scoreboard, release registers and memory addresses, and notify the execution module to execute blocked instructions.
[0041] Specifically, the dependency identification unit dynamically tracks data dependencies between instructions, using the register dependency table or the memory dependency table, so that instructions are executed in the correct order once their dependency conditions are met. Examples of data dependencies include RAW, WAR, and WAW. An example of a register dependency table is shown in Table 2.
[0042] Table 2 Register Dependency Table
[0043] The instruction decoder sends the decoded instruction to the state management unit. Upon receiving the decoded instruction from the instruction decoder, the state management unit marks the registers and memory addresses corresponding to the decoded instruction.
[0044] After the execution module completes its register access operations, the state management unit clears dependency locks within the scoreboard, releases registers and memory addresses, and notifies the execution module to execute blocked instructions. Execution modules include, for example, the ALU, FPU, and LSU. Additionally, the state management unit manages the usage state of the execution module to prevent structural hazards.
[0045] A dependency scoreboard module is designed using dependency scoreboard technology. After an instruction is decoded, its operand and result addresses are recorded in the dependency scoreboard. Before the execution unit reads the operand or writes the result, the dependency scoreboard checks for conflicts. If a data dependency exists, the execution of the instruction is delayed until the dependent instruction completes.
[0046] The workflow of the scoreboard module is shown in Figure 4. The instruction decoder outputs the decoded instruction, and the state management unit marks the registers and memory addresses that the instruction needs to access, establishing dependency locks. The dependency identification unit dynamically tracks data dependencies such as RAW, WAR, and WAW between instructions based on the register dependency table and memory dependency table. When data dependencies or resource conflicts exist, the scoreboard blocks subsequent instructions. After the execution unit completes the register / memory operation, the dependency locks are cleared and the resources are released. The state management unit notifies the execution module to continue executing the blocked instruction, while managing the execution unit state to avoid structural hazards and ensure orderly pipeline execution.
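The lock, block, and release cycle of this workflow can be modeled in a few lines of software. The sketch below is an illustration under stated assumptions, not the application's hardware design: the `Scoreboard` class and its method names are hypothetical, resources are plain strings, blocked instructions are released strictly in program order, and functional-unit (structural hazard) tracking is omitted.

```python
from typing import Dict, List, Set, Tuple


class Scoreboard:
    """Tracks dependency locks on registers / memory addresses (illustrative)."""

    def __init__(self) -> None:
        self.write_locks: Dict[str, int] = {}      # resource -> id of pending writer
        self.read_locks: Dict[str, Set[int]] = {}  # resource -> ids of pending readers
        self.blocked: List[Tuple[int, Set[str], Set[str]]] = []

    def can_issue(self, reads: Set[str], writes: Set[str]) -> bool:
        if any(r in self.write_locks for r in reads):
            return False  # RAW: a pending write must finish first
        for w in writes:
            if w in self.write_locks:
                return False  # WAW: an earlier write is still pending
            if self.read_locks.get(w):
                return False  # WAR: an earlier read is still pending
        return True

    def _lock(self, iid: int, reads: Set[str], writes: Set[str]) -> None:
        for r in reads:
            self.read_locks.setdefault(r, set()).add(iid)
        for w in writes:
            self.write_locks[w] = iid

    def issue(self, iid: int, reads: Set[str], writes: Set[str]) -> bool:
        # Assumption: in-order issue, so nothing overtakes a blocked instruction.
        if self.blocked or not self.can_issue(reads, writes):
            self.blocked.append((iid, reads, writes))
            return False
        self._lock(iid, reads, writes)
        return True

    def complete(self, iid: int) -> List[int]:
        """Clear iid's locks and return the ids of instructions it unblocks."""
        for res in [r for r, owner in self.write_locks.items() if owner == iid]:
            del self.write_locks[res]
        for res in list(self.read_locks):
            self.read_locks[res].discard(iid)
            if not self.read_locks[res]:
                del self.read_locks[res]
        woken: List[int] = []
        while self.blocked and self.can_issue(*self.blocked[0][1:]):
            bid, reads, writes = self.blocked.pop(0)
            self._lock(bid, reads, writes)
            woken.append(bid)
        return woken
```

As a usage example, issuing instruction 1 with reads `{"x1"}` and writes `{"x5"}` succeeds, a following instruction 2 that reads `x5` is blocked, and calling `complete(1)` clears the lock on `x5` and returns instruction 2 as ready to execute.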
[0047] In this embodiment, the dependency scoreboard can dynamically track various data dependencies, mark registers and memory addresses and manage dependency locks; after instruction execution, it promptly clears locks, releases resources and allows blocked instructions to proceed, effectively avoiding data hazards and structural hazards, ensuring orderly instruction execution, and improving pipeline stability and operating efficiency.
[0048] As an optional embodiment, the instruction decoder further includes: a logic register and a decoding bypass; Logical registers are used to store instruction operands, calculation results, and program status. Decoding bypass is used to obtain the data to be forwarded that has been calculated but not written back, and to forward the data to subsequent instructions. Decoding bypass is also used to pass the execution results of the execution module in response to control signals to the relevant instructions.
[0049] Specifically, the instruction decoder also includes the logic register ARF. Additionally, as shown in Figure 2, the instruction decoder also includes a decoding bypass.
[0050] The logic register ARF is primarily used to manage register sets, storing instruction operands and the results of instruction computation. The logic register also stores and maintains program state, including critical status registers. The ARF is a register array, for example a set of static RAM (SRAM) cells. The logic register includes read / write ports and employs a multi-port design to support parallel access.
[0051] The workflow of the logic register ARF is shown in Figure 5. During instruction execution, the ARF stores instruction operands and intermediate calculation results; receives dependency status and control signals from the scoreboard to complete register read and write operations; cooperates with the decoding bypass to forward calculation results that have not yet been written back directly to subsequent instructions, reducing data waiting delays; and updates the register state after execution, providing stable data support for instruction dispatch and pipelined execution.
[0052] The decoding bypass retrieves the result of an instruction that has been computed but not yet written back and uses it as the data to be forwarded. To reduce pipeline stalls caused by data hazards, the decoding bypass passes this data directly to subsequent instructions, avoiding the performance penalty of waiting for the register write.
[0053] To eliminate data hazard stalls, the decoding bypass also allows the execution module to pass the execution results it produces in response to control signals directly to the relevant instructions, without waiting for write-back to the register file. The relevant instructions are those that depend on the control signals.
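The forwarding behavior can be illustrated with a small operand-read model. This is a sketch under stated assumptions: the bypass is modeled as a dictionary of results not yet written back, and the names `read_operand`, `write_back`, `register_file`, and `bypass` are hypothetical.

```python
def read_operand(reg, register_file, bypass):
    """Return the newest value of `reg`.

    `bypass` holds results that have been computed but not yet written
    back (hypothetical dict keyed by register name); it takes priority
    over the register file.
    """
    if reg in bypass:
        return bypass[reg]
    return register_file[reg]


def write_back(register_file, bypass):
    """Commit all forwarded results to the register file, then clear the bypass."""
    register_file.update(bypass)
    bypass.clear()


register_file = {"x1": 0, "x2": 5, "x3": 7}
bypass = {}

# add x1, x2, x3: the result is computed but not yet written back.
bypass["x1"] = register_file["x2"] + register_file["x3"]

# sub x4, x1, x2: reads x1 through the bypass instead of stalling.
x4 = (read_operand("x1", register_file, bypass)
      - read_operand("x2", register_file, bypass))
stale_x1 = register_file["x1"]  # the register file still holds the old value

write_back(register_file, bypass)
```

Without the bypass, the second instruction would either read the stale value of `x1` or have to wait for the write-back.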
[0054] In this embodiment, the ARF logic register design stores operands and intermediate results during the decoding process, speeding up data storage and access and providing state maintenance and recovery. The decoding bypass (DecodeBypass) mechanism reduces critical path latency, supplies data earlier, improves instruction throughput, and allows for optional dynamic path selection.
[0055] As an optional embodiment, the decoding module further includes: an operation queue and a preset number of decoding slots. The operation queue is used to buffer the sub-instructions obtained by decoding the instruction to be decoded and to schedule the sub-instructions. Each decoding slot contains independent opcode decoding logic circuitry, operand read ports, and control signal generation circuitry.
[0056] Specifically, the operation queue is, for example, the micro-operation queue in Figure 2, and the decoding slots are, for example, the Nx decoding slots in Figure 2.
[0057] The operation queue is used to buffer the sub-instructions obtained by decoding the instruction to be decoded, such as decoded micro-instructions, and to enable dynamic scheduling of those sub-instructions. The operation queue is implemented using multi-port SRAM or a register file, supports out-of-order writing and sequential reading according to dependencies, and uses age counting or priority flags to ensure that long-latency operations do not block subsequent instructions.
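One way to picture age-based selection is the sketch below. It assumes each queued micro-operation carries an age counter and a ready flag, and that the scheduler picks the oldest ready entry; the `MicroOp` and `OperationQueue` names and this exact selection policy are illustrative assumptions, not details given in the application.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    name: str
    age: int      # smaller value = older entry (hypothetical age counter)
    ready: bool   # True once all source operands are available


class OperationQueue:
    def __init__(self):
        self.entries = []

    def push(self, uop):
        # Entries may be written in any order (out-of-order write).
        self.entries.append(uop)

    def pop_oldest_ready(self):
        """Remove and return the oldest ready micro-op, or None if none is ready."""
        ready = [u for u in self.entries if u.ready]
        if not ready:
            return None
        chosen = min(ready, key=lambda u: u.age)
        self.entries.remove(chosen)
        return chosen


q = OperationQueue()
q.push(MicroOp("add", age=2, ready=True))
q.push(MicroOp("load", age=0, ready=False))  # long-latency op, still waiting
q.push(MicroOp("sub", age=1, ready=True))

first = q.pop_oldest_ready()   # "sub": oldest entry that is ready
second = q.pop_oldest_ready()  # "add"
third = q.pop_oldest_ready()   # None: only the waiting load remains
```

The waiting `load` stays in the queue without holding back the younger `sub` and `add`.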
[0058] The Nx decoding slot contains independent opcode decoding logic circuitry, operand read ports, and control signal generation circuitry. The Nx decoding slot supports static allocation (fixed processing of simple / complex instructions) or dynamic scheduling (flexible allocation based on instruction type).
[0059] In this embodiment, the operation queue buffers and schedules sub-instructions and supports out-of-order writing and sequential reading, so that long-latency operations do not block subsequent instructions. The multiple decoding slots use independent decoding and control circuits, allowing instructions to be decoded in parallel, which increases processor instruction throughput and optimizes pipeline execution efficiency.
[0060] As an optional embodiment, the instruction decoder further includes: a data protection module; The data protection module is used to determine the data protection area in memory, and to determine the access permissions, access priority, and data bits to be locked for the data protection area. The data protection module is also used to check the device's access permissions and access priority when the device accesses the data protection area. If the check passes, the device is allowed to access the data protection area and the data bits to be locked are locked.
[0061] Specifically, this embodiment creates a data protection module based on the PMP protection mechanism. The data protection module controls the hardware-level security mechanism for physical memory access permissions. Its core function is to restrict access to specific physical memory regions by master devices such as CPU cores and DMA through programmable rules.
[0062] The data protection module can partition physical memory into protected areas, which serve as the data protection areas in memory. It can control access permissions, such as setting read (R), write (W), and execute (X) permissions, and supports both user mode (U) and machine mode (M). The module can also set access priorities for priority arbitration, so that the higher-priority rule takes precedence when multiple PMP rules conflict. Furthermore, it can set a lock bit (L) to prevent tampering at runtime (a reset is required to modify it).
[0063] The workflow of the data protection module is as follows. During the configuration phase, protected areas and permissions are defined. During the runtime check phase, when a device accesses the data protection area, the device's access permissions and access priority are checked, with the hardware comparing the access against the PMP rules one by one. If a matching rule is found and the check passes, the device is allowed to access the data protection area, and the data bits to be locked are locked. If the check fails, the permissions are insufficient and an exception is triggered. If no matching rule is found, access is allowed by default. Devices include, for example, central processing units (CPUs) and graphics processing units (GPUs).
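The runtime check described above can be sketched as a priority-ordered rule lookup. This is a simplified model under stated assumptions: rules are plain address ranges rather than the RISC-V `pmpcfg` / `pmpaddr` encoding, list order stands in for priority, and the names `Rule`, `check_access`, `update_rule`, and `AccessFault` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    start: int
    end: int             # exclusive upper bound of the protected area
    perms: str           # subset of "rwx"
    locked: bool = False


class AccessFault(Exception):
    pass


def check_access(rules, addr, kind):
    """Check one access against the rules in priority order (index 0 first).

    Follows the workflow in the text: the first matching rule decides,
    and if no rule matches, access is allowed by default. (In the RISC-V
    privileged specification the no-match default depends on the
    privilege mode; that distinction is not modeled here.)
    """
    for rule in rules:
        if rule.start <= addr < rule.end:
            if kind in rule.perms:
                return True
            raise AccessFault(f"{kind} access to {addr:#x} denied")
    return True


def update_rule(rule, perms):
    """Change a rule's permissions unless its lock bit is set."""
    if rule.locked:
        raise PermissionError("rule is locked until reset")
    rule.perms = perms


rules = [
    Rule(0x1000, 0x2000, "r", locked=True),  # higher priority, read-only
    Rule(0x0000, 0x8000, "rw"),              # lower priority, overlaps
]
```

With these rules, a write to `0x1800` raises `AccessFault` because the higher-priority read-only rule matches first, while a write to `0x3000` is permitted by the second rule.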
[0064] In this embodiment, a memory protection mechanism for instruction decoding is designed, which adopts the PMP protection mechanism, sets memory region access control, sets privilege level isolation, restricts non-privileged code access to hardware registers, and locks the safe region; a security verification mechanism with decoding-execution linkage is adopted, which pre-checks the legality of memory access during the instruction decoding stage, moving security protection to the very front of the pipeline.
[0065] As an optional embodiment, the dispatch module includes: a renaming unit and a register allocation unit; The renaming unit is used to rename registers to obtain new register names and update the register state when dispatching control signals to the instruction execution module. The new register names are used to eliminate the preset dependencies between instructions. The renaming unit is also used to determine the mapping relationship between logical registers and physical registers; The register allocation unit is used to allocate a new physical register to the target logical register when parsing the instruction to be decoded. The new physical register is used to process subsequent instructions if the preceding instruction has not been written back.
[0066] Specifically, the renaming unit renames registers to obtain new register names and updates the register state when dispatching control signals to the instruction execution module. The new register names eliminate the preset dependencies between instructions, that is, they eliminate false dependencies (WAR / WAW) and retain only true data dependencies (RAW). Updating the register state includes, for example, updating the architectural register state when an instruction is committed. Additionally, the renaming unit dynamically maps logical registers to physical registers.
[0067] The renaming unit contains a renaming table, which maintains the mapping relationship between logical registers and physical registers and updates the architectural register state when an instruction is committed.
[0068] The register allocation unit enters the renaming stage when the instruction to be decoded is parsed. During the renaming stage, a new physical register is allocated to the target logical register, so that subsequent instructions can be processed and executed in parallel even if the preceding instruction has not yet been written back.
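The effect of the renaming table can be shown with a short sketch. It assumes a simple free list of physical registers and an identity initial mapping; the `RenameTable` class and the `x` / `p` register naming are hypothetical, and freeing physical registers at commit is left out for brevity.

```python
class RenameTable:
    def __init__(self, num_logical, num_physical):
        # Initially logical register xi maps to physical register pi.
        self.mapping = {f"x{i}": f"p{i}" for i in range(num_logical)}
        self.free = [f"p{i}" for i in range(num_logical, num_physical)]

    def rename(self, dest, srcs):
        """Rename one instruction; returns (physical dest, physical sources)."""
        # Sources use the current mapping, which preserves true (RAW) dependencies.
        phys_srcs = [self.mapping[s] for s in srcs]
        # The destination gets a fresh physical register, which removes
        # WAR and WAW conflicts with earlier instructions.
        if not self.free:
            raise RuntimeError("no free physical register; rename must stall")
        phys_dest = self.free.pop(0)
        self.mapping[dest] = phys_dest
        return phys_dest, phys_srcs


rt = RenameTable(num_logical=4, num_physical=8)
i1 = rt.rename("x1", ["x2", "x3"])  # x1 <- x2 op x3
i2 = rt.rename("x2", ["x1"])        # writes x2, which i1 reads (WAR); reads x1 (RAW)
i3 = rt.rename("x1", ["x3"])        # writes x1 again (WAW with i1)
```

After renaming, `i2` writes `p5` rather than the `p2` that `i1` reads, and `i3` writes `p6` rather than `p4`, so only the RAW link from `i1` to `i2` (through `p4`) remains.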
[0069] In this embodiment, register renaming eliminates the WAR and WAW false dependencies and retains only the true RAW dependencies; a mapping between logical and physical registers is established and new physical registers are allocated for instructions, so that subsequent instructions can execute in parallel without waiting for the previous write-back, effectively reducing pipeline stalls and improving instruction execution parallelism and efficiency.
[0070] As an optional embodiment, as shown in Figure 2, the instruction decoder may also include a branch prediction module and an operation selection module.
[0071] Specifically, the branch prediction module employs the following prediction strategies: 1. Default prediction of no jump: This is the fundamental strategy in static branch prediction, primarily targeting forward branches (i.e., conditional branches where the target address is greater than the current instruction address). Its core assumption is that programs typically execute sequentially, and jump paths in conditional branches (such as error handling and special conditions) are triggered infrequently. 2. Predicting jump for backward branches (loops): Backward branches (where the target address is less than the current address) usually correspond to loop structures. The static prediction of a default "jump" is based on the strong temporal locality of loops: the loop body often executes tens to millions of times before exiting.
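The two static rules reduce to a single address comparison, as the following sketch shows. The function names and the 4-byte instruction size are assumptions made for illustration; a branch whose target equals its own address is treated as not taken here, a case the text does not cover.

```python
def predict_taken(pc, target):
    """Static prediction: backward branches taken, forward branches not taken."""
    return target < pc


def next_fetch_pc(pc, target, instr_bytes=4):
    """Address to fetch next under the static prediction.

    instr_bytes=4 assumes uncompressed 32-bit instructions.
    """
    return target if predict_taken(pc, target) else pc + instr_bytes


loop_branch = next_fetch_pc(0x100, 0x80)     # backward branch: follow the target
forward_branch = next_fetch_pc(0x100, 0x140) # forward branch: fall through
```

If the actual outcome differs from the prediction, the instructions fetched along the predicted path have to be discarded.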
[0072] Branch prediction: branch prediction is performed on instructions using the strategy described above. Bypass processing: instructions are issued and executed, with instructions that have no dependencies issued first; illegal instructions immediately trigger an exception; and the pipeline is flushed when a misprediction occurs.
[0073] In this embodiment, a static branch prediction strategy is adopted, in which forward branches are predicted not taken by default and backward loop branches are predicted taken. Combined with bypass processing, the scheme issues instructions without dependencies first, triggers exceptions immediately for illegal instructions, and flushes the pipeline on misprediction, effectively reducing branch latency and improving pipeline execution efficiency and system stability.
[0074] According to an embodiment of this application, an instruction decoding method embodiment is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in the above-mentioned instruction decoder, or in a computer system capable of executing a set of computer-executable instructions, for example, a computer or a server. Although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in a different order than that shown here.
[0075] This embodiment provides an instruction decoding method. Figure 6 is a flowchart of an instruction decoding method according to an embodiment of this application. As shown in Figure 6, the process includes the following steps: Step S601: Determine the instruction to be fused from the received instructions, fuse the instruction to be fused into the instruction to be decoded, and parse the instruction to be decoded to obtain the control signal and the operation to be executed.
[0076] Step S602: Based on the data dependencies between instructions, determine the related instructions of the instruction to be decoded, and determine the execution order of the instruction to be decoded and the related instructions.
[0077] Step S603: Dispatch control signals and operations to be executed to the instruction execution module according to the execution order.
[0078] Specifically, the instruction decoder obtains instructions through the instruction fetch module. The process includes: passing sequentially through the instruction cache and the instruction buffer to obtain the cached instructions, and pre-decoding them to obtain the instructions that are input to the decoder.
[0079] The decoding module uses fixed hardware logic circuits to parse the instruction to be decoded, obtaining control signals and operations to be executed. Control signals indicate the operations the execution module needs to perform, such as: opening a register, instructing the ALU to perform addition / subtraction, specifying the data path, and whether to write data back to the ARF. Execution modules include functional units such as the ALU, FPU, and LSU. Operations to be executed include data writing operations and fused address generation and storage operations.
[0080] The dependency scoreboard module determines the data dependencies between instructions using a register / memory dependency table. Based on these dependencies, it identifies the related instructions of the instruction to be decoded. For example, if the data dependency is WAR and the instruction to be decoded is a write instruction, then the related instructions are read instructions. The module then determines the execution order of the instruction to be decoded and the related instructions. For instance, it executes the read instruction first, followed by the write instruction, to prevent the write instruction from overwriting the data before the read instruction has read it.
[0081] The dispatch module dispatches control signals and operations to be executed to the instruction execution module according to the execution order. For example, the dispatch module is the rename (register) / dispatch (instruction) module in Figure 2. This module renames registers to eliminate false dependencies (WAR / WAW) between instructions, retaining only true data dependencies (RAW). This involves dynamically mapping logical registers to physical registers. During instruction decoding, the renaming stage allocates a new physical register to the target logical register, allowing subsequent instructions to execute in parallel without waiting for previous instructions to be written back. The renaming table maintains the mapping between logical and physical registers and updates the architectural register state upon instruction commit.
[0082] The instruction decoding method provided in this embodiment introduces instruction fusion technology into the decoding module, supporting automatic pairing detection and dynamic fusion of different types of instructions, and decoding the fused instructions. Hardware-level instruction reorganization reduces redundant operations and improves decoding efficiency. A dependency scoreboard module tracks data dependencies between instructions and determines the execution order of instructions to eliminate pipeline stalls and avoid instruction execution conflicts. This solves the problems of low decoding efficiency for R-type and I-type instructions and the difficulty in detecting write-after-read and write-after-write conflicts.
[0083] This embodiment also provides an instruction decoding device for implementing the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and / or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, hardware implementations, or a combination of software and hardware, are also possible and contemplated.
[0084] This embodiment provides an instruction decoding device. As shown in Figure 7, it includes: an instruction fusion module 701, used to determine the instruction to be fused from the received instructions, fuse the instruction to be fused into an instruction to be decoded, and parse the instruction to be decoded to obtain control signals and operations to be executed; a sequence determination module 702, used to determine the related instructions of the instruction to be decoded based on the data dependency relationship between instructions, and to determine the execution order of the instruction to be decoded and the related instructions; and an instruction dispatch module 703, used to dispatch control signals and operations to be executed to the instruction execution module according to the execution order.
[0085] Further functional descriptions of the above modules and units are the same as those in the corresponding embodiments described above, and will not be repeated here.
[0086] In this embodiment, the instruction decoding device is presented in the form of a functional unit. Here, a unit refers to an ASIC (Application Specific Integrated Circuit) circuit, a processor and memory that execute one or more software or fixed programs, and / or other devices that can provide the above functions.
[0087] Figure 8 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
[0088] Referring now to Figure 8, it shows a schematic structural diagram of an electronic device suitable for implementing embodiments of the present invention. The electronic device may include a processor (e.g., a central processing unit, graphics processor, etc.) 801, which can perform various appropriate actions and processes based on a program stored in read-only memory (ROM) 802 or a program loaded from memory 808 into random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device. The processor 801, ROM 802, and RAM 803 are interconnected via a bus 804. An input / output (I / O) interface 805 is also connected to the bus 804.
[0089] Typically, the following devices can be connected to I / O interface 805: input devices 806 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 807 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; memory devices 808 including, for example, magnetic tapes, hard disks, etc.; and communication devices 809. Communication device 809 allows the electronic device to communicate with other devices over wireless or wired connections to exchange data. Although Figure 8 shows an electronic device with various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
[0090] In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device 809, or installed from a memory 808, or installed from a ROM 802. When the computer program is executed by the processor 801, it performs the functions defined in the instruction decoding method of the embodiments of the present invention.
[0091] The electronic device shown in Figure 8 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present invention.
[0092] This invention also provides a computer-readable storage medium. The methods described above according to embodiments of the invention can be implemented in hardware or firmware, or implemented as computer code that can be recorded on a storage medium, or implemented as computer code downloaded via a network and originally stored on a remote storage medium or a non-transitory machine-readable storage medium and then stored on a local storage medium. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, or solid-state drive, etc.; further, the storage medium can also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code. When the software or computer code is accessed and executed by the computer, processor, or hardware, the instruction decoding method shown in the above embodiments is implemented.
[0093] A portion of this invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to the invention through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.
[0094] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.
Claims
1. An instruction decoder, characterized in that the instruction decoder includes: a decoding module, a dependency scoreboard module, and a dispatch module; the decoding module is used to determine the instruction to be fused from the received instructions, fuse the instruction to be fused into an instruction to be decoded, and parse the instruction to be decoded to obtain control signals and operations to be executed; the dependency scoreboard module is used to determine the related instructions of the instruction to be decoded based on the data dependency relationship between instructions, and to determine the execution order of the instruction to be decoded and the related instructions; the dispatch module is used to dispatch the control signal and the operation to be executed to the instruction execution module according to the execution order.
2. The instruction decoder according to claim 1, characterized in that, The decoding module includes: an instruction information acquisition unit, an instruction fusion unit, and an instruction decomposition unit; The instruction information acquisition unit is used to acquire the opcode, register dependency, and semantic association of the instruction, and to determine the instruction to be fused based on the opcode, register dependency, and semantic association. The instruction fusion unit is used to fuse the instruction to be fused into the instruction to be decoded according to a preset fusion instruction format; The instruction decomposition unit is used to decompose macro instructions in an instruction into multiple hardware operations.
3. The instruction decoder according to claim 1, characterized in that the dependency scoreboard module includes: a dependency identification unit and a state management unit; the dependency identification unit is used to perform dependency tracking based on a register dependency table or a memory dependency table to determine the data dependency relationship between instructions; the state management unit is used to mark the register and memory address corresponding to the decoded instruction when it receives the decoded instruction from the instruction decoder; the state management unit is also used to clear the dependency lock in the scoreboard, release the register and the memory address, and notify the execution module to execute the blocked instruction after the execution module completes the register access operation.
4. The instruction decoder according to claim 1, characterized in that, The instruction decoder also includes: a logic register and a decoding bypass; The logic register is used to store instruction operands, calculation results, and program status; The decoding bypass is used to obtain the data to be forwarded that has been calculated but not written back, and to forward the data to be forwarded to subsequent instructions; The decoding bypass is also used to transmit the execution result of the execution module in response to the control signal to the relevant instructions.
5. The instruction decoder according to claim 1, characterized in that the decoding module further includes: an operation queue and a preset number of decoding slots; the operation queue is used to buffer the sub-instructions obtained by decoding the instruction to be decoded and to schedule the sub-instructions; the decoding slot includes an independent opcode decoding logic circuit, an operand read port, and a control signal generation circuit.
6. The instruction decoder according to claim 1, characterized in that, The instruction decoder also includes: a data protection module; The data protection module is used to determine a data protection area in memory, and to determine the access permissions, access priority, and data bits to be locked for the data protection area. The data protection module is further configured to check the access permissions and access priority of the device when the device accesses the data protection area, and if the check passes, allow the device to access the data protection area and lock the data bit to be locked.
7. The instruction decoder according to claim 1, characterized in that, The dispatch module includes: a renaming unit and a register allocation unit; The renaming unit is used to rename the register to obtain a new register name, and to update the register state when the control signal is dispatched to the instruction execution module. The new register name is used to eliminate the preset dependency relationship between instructions. The renaming unit is also used to determine the mapping relationship between logical registers and physical registers; The register allocation unit is used to allocate a new physical register to the target logical register when parsing the instruction to be decoded, wherein the new physical register is used to process subsequent instructions if the preceding instructions have not been written back.
8. A method for decoding instructions, characterized in that the method includes: determining the instruction to be fused from the received instructions, fusing the instruction to be fused into an instruction to be decoded, and parsing the instruction to be decoded to obtain control signals and operations to be executed; determining, based on the data dependencies between instructions, the related instructions of the instruction to be decoded, and determining the execution order of the instruction to be decoded and the related instructions; and dispatching, according to the execution order, the control signal and the operation to be executed to the instruction execution module.
9. An electronic device, characterized in that it includes: a memory and a processor, which are communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the instruction decoding method of claim 8.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer instructions that are used to cause the computer to execute the instruction decoding method of claim 8.