integrated circuit
By detecting the instruction sequence in the fetch stage of the processor pipeline and using the immediate jump handler circuit to determine the target address, the power waste and misprediction problems of the indirect jump target predictor circuit are avoided, reducing power consumption and improving performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SIFIVE INC
- Filing Date
- 2021-03-29
- Publication Date
- 2026-06-19
AI Technical Summary
The indirect jump target predictor circuit in the existing processor pipeline suffers from power waste and target address misprediction, resulting in performance penalties and predictor state pollution.
The instruction sequence is detected in the fetch stage of the processor pipeline, its target address is determined using the immediate jump handler circuit, and the indirect jump target predictor circuit is disabled when necessary so that it does not generate target address predictions.
It reduces the power consumption of the processor core, reduces predictor state pollution in the indirect jump target predictor circuit, and improves the performance of the processor core.
Figure CN122240179A_ABST
Abstract
Description
[0001] This application is a divisional application of the parent application, Chinese Patent Application No. 202180023802.2, filed on March 29, 2021, by SiFive, Inc., entitled "Fetch-stage processing of indirect jumps in a processor pipeline".
Technical Field
[0002] This disclosure relates to fetch-stage processing of indirect jumps in a processor pipeline.
Background
[0003] To improve performance, pipelined processors may include an indirect jump target predictor that generates a prediction of the target address of an indirect jump instruction. The actual target address may depend on data that will not become available until the indirect jump instruction reaches a later stage of the processor pipeline. The target address prediction is used to fetch upcoming instructions while the indirect jump instruction passes through the pipeline and retires. An incorrect target address prediction can cause problems, including performance penalties and pollution of the indirect jump target predictor's state.
Brief Description of the Drawings
[0004] When read in conjunction with the accompanying drawings, this disclosure is best understood from the following detailed description. It should be emphasized that, by convention, the various features in the drawings are not to scale. Rather, for clarity, the dimensions of the various features have been arbitrarily enlarged or reduced.
[0005] Figure 1 is a block diagram of an example of an integrated circuit for executing instructions using fetch-stage processing of indirect jumps in a processor pipeline.
[0006] Figure 2 is a block diagram of an example of a processor pipeline for executing instructions using fetch-stage processing of indirect jumps.
[0007] Figure 3 is a memory map of an example instruction sequence that includes a first instruction with a result that depends on an immediate field and a program counter value, followed by a second instruction that is an indirect jump instruction.
[0008] Figure 4 is a flowchart of an example of a process for fetch-stage processing of indirect jumps.
[0009] Figure 5 is a flowchart of an example of a process for determining the target address of an indirect jump instruction, which depends on a program counter value and one or more immediate values of the instruction sequence.
[0010] Figure 6 is a flowchart of an example of a process for selectively disabling indirect jump target predictor circuitry in the absence of indirect jumps.
Detailed Description
[0011] Overview
[0012] This document describes systems and methods for fetch-stage handling of indirect jumps in a processor pipeline. In some processor architectures, a sequence of instructions including an indirect jump instruction can be used to specify a target address in a large virtual address space. An earlier instruction in the sequence can add an immediate value to the program counter value. The result can then be added to a second immediate value included in the indirect jump instruction. With the immediates shifted appropriately, this allows a large jump relative to the program counter value. Such a sequence forms an immediate jump. Its target address can be determined from the immediate values and the program counter value, and this information is available in the fetch stage of the processor pipeline. For example, in the RISC-V instruction set, a sequence consisting of an AUIPC instruction followed by a JALR instruction forms an immediate jump. However, the indirect jump target predictor circuitry may still generate a target address prediction for the indirect jump instruction in the sequence. This wastes power in the indirect jump target predictor circuitry. It can also cause spurious mispredictions of the target address, resulting in performance penalties and/or pollution of the predictor state of the indirect jump target predictor circuitry.
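To make the arithmetic concrete, here is a minimal Python sketch of how an AUIPC + JALR pair reaches a PC-relative target, assuming RV64 semantics. The helper names (`split_offset`, `immediate_jump_target`) are hypothetical. The point is only that the program counter and the two immediates are sufficient, which is why the target can be computed at fetch time.

```python
# Minimal sketch, assuming RV64: how an AUIPC + JALR pair forms a
# PC-relative immediate jump. All names are illustrative.
XLEN = 64
MASK = (1 << XLEN) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split_offset(offset: int) -> tuple[int, int]:
    """Split a PC-relative offset into a 20-bit AUIPC immediate and a signed
    12-bit JALR immediate. The +0x800 rounds the upper part so that the
    remaining low part fits in [-2048, 2047]."""
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF
    lo12 = offset - (sign_extend(hi20, 20) << 12)
    if not -2048 <= lo12 < 2048:
        raise ValueError("offset outside the +/-2 GiB reach of AUIPC + JALR")
    return hi20, lo12


def immediate_jump_target(pc: int, hi20: int, lo12: int) -> int:
    """Target of `auipc rd, hi20` at address `pc` followed by `jalr lo12(rd)`.
    Only the PC and the two immediates are needed."""
    auipc_result = (pc + (sign_extend(hi20, 20) << 12)) & MASK
    return (auipc_result + lo12) & MASK & ~1  # JALR clears bit 0 of the sum


pc = 0x8000_0000
hi20, lo12 = split_offset(0x8123_4676 - pc)
target = immediate_jump_target(pc, hi20, lo12)  # target == 0x8123_4676
```

`split_offset` mirrors the usual upper/lower split an assembler performs. The `+0x800` term compensates for the JALR immediate being sign-extended.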
[0013] Some implementations address or mitigate these problems by adding circuitry to the processor core. This circuitry detects instruction sequences that form immediate jumps and determines the target address of the indirect jump in the fetch stage of the processor pipeline. For example, the determined target address can be inserted into a fetch target queue and used in place of a target address prediction from the indirect jump target predictor circuitry of the processor core. In some implementations, the indirect jump target predictor circuitry can be disabled while the instruction sequence is being fetched, so that it does not waste power generating a target address prediction for the indirect jump instruction. For example, the instruction sequence can be detected in an earlier stage of a pipeline with multiple fetch stages, such as when a cache line of instructions is loaded into the L1 instruction cache. This early detection allows an immediate jump hint to be generated early enough to drive the enable input and disable the indirect jump target predictor circuitry when the instruction sequence is read from the L1 instruction cache.
[0014] Another technique for reducing power consumption in the indirect jump target predictor circuitry is to detect the presence or absence of indirect jump instructions in a cache line as it is loaded into the L1 instruction cache. This produces an indirect jump hint that a later fetch stage of the processor pipeline can use to enable or disable the indirect jump target predictor circuitry. That is, if no indirect jump instruction is detected in the cache line, the indirect jump target predictor circuitry is disabled when instructions are read from that cache line. This technique can be combined with the immediate jump handling described above and elsewhere in this document. For example, the enable input of the indirect jump target predictor circuitry can be set to an inactive level if the indirect jump hint indicates the absence of an indirect jump instruction, or if the immediate jump hint indicates the presence of an instruction sequence that forms an immediate jump.
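The combined enable condition can be sketched as a small Boolean function. The `LineHints` type and its field names are hypothetical, and the logic follows the paragraph as written. Read literally, this combination also disables the predictor for a line that holds an immediate-jump sequence plus some unrelated indirect jump. The text does not say how finer-grained hints would handle that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineHints:
    """Hypothetical per-cache-line hint bits captured at fill time."""
    has_indirect_jump: bool   # indirect jump hint: line contains an indirect jump
    has_immediate_jump: bool  # immediate jump hint: line holds an immediate-jump sequence


def predictor_enable(hints: LineHints) -> bool:
    """Enable input of the indirect jump target predictor for one fetch:
    inactive if the line has no indirect jump OR holds an immediate jump."""
    return hints.has_indirect_jump and not hints.has_immediate_jump


# Count how many fetches of a hypothetical line stream would wake the predictor.
stream = [LineHints(False, False), LineHints(True, False), LineHints(True, True)]
enabled = sum(predictor_enable(h) for h in stream)
```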
[0015] In some implementations, the techniques for fetch-stage handling of indirect jumps in a processor pipeline can provide one or more advantages over conventional processors. For example, the structures and techniques described herein can reduce power consumption in the processor core, reduce pollution of the predictor state in the indirect jump target predictor circuit, and/or improve the performance of the processor core.
[0016] As used herein, the term "circuit" refers to an electronic component (e.g., a transistor, resistor, capacitor, and / or inductor) configured to perform one or more functions. For example, a circuit may include one or more transistors interconnected to form logic gates that collectively perform a logic function.
[0017] Details
[0018] Figure 1 is a block diagram of an example of an integrated circuit 110 for executing instructions using fetch-stage processing of indirect jumps in a processor pipeline. Integrated circuit 110 includes a processor core 120. Processor core 120 includes a processor pipeline 130 that includes indirect jump target predictor circuitry 132 configured to generate a prediction of the target address for a fetched indirect jump instruction. Processor core 120 includes one or more register files 140, which include a program counter 142. Processor core 120 includes an L1 instruction cache 150 and an L1 data cache 152. Integrated circuit 110 includes an external memory system 160, which may include memory for storing instructions and data and/or provide access to memory 162 external to the integrated circuit for storing instructions and/or data. Processor core 120 includes an immediate jump handler circuit 170. It can be configured to detect instruction sequences that include an indirect jump instruction whose target address can be determined from information available in the fetch stage of pipeline 130, and to determine that target address in the fetch stage instead of using a target address prediction. Processor core 120 includes an indirect jump detector circuit 180. It can be configured to check cache lines for indirect jump instructions as the cache lines are loaded into L1 instruction cache 150, and to disable the indirect jump target predictor circuit 132 to save power when cache lines without indirect jumps are read from L1 instruction cache 150. Integrated circuit 110 can provide advantages over conventional processor architectures. For example, it can avoid target address mispredictions and the resulting pollution of the indirect jump predictor and performance degradation, and/or save power. For example, integrated circuit 110 can implement process 400 of Figure 4. For example, integrated circuit 110 can implement process 600 of Figure 6.
[0019] Integrated circuit 110 includes a processor core 120, which includes a processor pipeline 130 configured to execute instructions. Pipeline 130 includes one or more fetch stages configured to retrieve instructions from the memory system of integrated circuit 110. For example, pipeline 130 may fetch instructions via L1 instruction cache 150. For example, pipeline 130 may include the processor pipeline 200 of Figure 2. Pipeline 130 may include additional stages, such as a decode stage, rename stage, dispatch stage, issue stage, execute stage, memory access stage, and write-back stage. For example, processor core 120 may include a pipeline 130 configured to execute instructions of the RISC-V instruction set.
[0020] Integrated circuit 110 includes an indirect jump target predictor circuit 132 in a fetch stage of pipeline 130, configured to generate a prediction of the target address of a fetched indirect jump instruction. For example, the indirect jump target predictor circuit 132 may be the indirect jump target predictor circuit 220 of Figure 2. For example, the indirect jump target predictor circuit 132 can output the prediction to a fetch target queue.
[0021] The indirect jump target predictor circuit 132 is a structure used to predict the target of an indirect jump instruction (e.g., a RISC-V JALR instruction). For example, the indirect jump target predictor circuit 132 can be an ITTAGE-type predictor, which is designed to resemble a branch direction predictor (BDP). However, instead of predicting the branch direction, the indirect jump target predictor circuit 132 provides the target address. For example, the indirect jump target predictor circuit 132 can be SRAM-based and, for greater area efficiency, can be designed to use a single-port memory. In some implementations, there is no structural hazard between prediction and updating on the indirect jump target predictor circuit 132.
[0022] Integrated circuit 110 includes one or more register files 140, which include a program counter 142 for processor core 120. For example, program counter 142 may be stored in a register. For example, program counter 142 may be stored using a program counter map, which is used to track the program counters of instructions in a reorder buffer window.
[0023] Integrated circuit 110 includes an L1 instruction cache 150 for processor core 120. The L1 instruction cache 150 may be a set-associative cache for instruction memory. Reading the tag array and data array serially causes long latency, and reading them in parallel costs high power. To avoid both, a way predictor can be used. The way predictor can be accessed in an early fetch stage (e.g., the F1 stage 204 of the processor pipeline 200 of Figure 2), and the predicted hit way can be encoded into the read index of the data array. The tag array can be accessed in a later fetch stage (e.g., the F2 stage 206 of the processor pipeline 200 of Figure 2) and is used only to verify the way predictor.
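A way-predicted lookup can be modeled roughly as follows. This is a hypothetical sketch: the cache geometry, the one-prediction-per-set policy, and all names are assumptions rather than details from the text. Only the predicted way's data row is read, and the tag comparison only confirms or corrects the prediction.

```python
# Hypothetical geometry for illustration.
LINE_BYTES = 64
NUM_SETS = 64
NUM_WAYS = 4
WAY_BITS = (NUM_WAYS - 1).bit_length()  # 2 bits name one of 4 ways


def set_index(addr: int) -> int:
    return (addr // LINE_BYTES) % NUM_SETS


def tag_of(addr: int) -> int:
    return addr // (LINE_BYTES * NUM_SETS)


class WayPredictedCache:
    def __init__(self) -> None:
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.data = {}                    # data-array row -> line contents
        self.way_pred = [0] * NUM_SETS    # one predicted way per set (simplification)

    def fill(self, addr: int, way: int, line: str) -> None:
        s = set_index(addr)
        self.tags[s][way] = tag_of(addr)
        self.data[(s << WAY_BITS) | way] = line
        self.way_pred[s] = way

    def fetch(self, addr: int):
        """Return (line, hit). Only the predicted way's data row is read."""
        s = set_index(addr)
        way = self.way_pred[s]                         # early stage (e.g., F1)
        line = self.data.get((s << WAY_BITS) | way)    # way encoded in read index
        tag = tag_of(addr)                             # later stage (e.g., F2): verify
        actual = next((w for w in range(NUM_WAYS) if self.tags[s][w] == tag), None)
        if actual == way:
            return line, True
        if actual is not None:
            self.way_pred[s] = actual  # retrain; the fetch would be replayed
        return None, False
```

In this model, a wrong prediction costs a replay but never returns the wrong line, because the tag check rejects it.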
[0024] Integrated circuit 110 includes an L1 data cache 152 for processor core 120. For example, the L1 data cache 152 may be a set-associative VIPT (virtually indexed, physically tagged) cache. That is, it is indexed purely by the virtual address bits VA[set] and fully tagged with all of the translated physical address bits PA[msb:12]. For low power consumption, the tag and data arrays can be looked up serially, so that at most a single data SRAM way is accessed. For example, the line size of the L1 data cache 152 may be 64 bytes, and the tick size may be 16 bytes.
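The VIPT split can be illustrated with a short sketch. The 64-byte line size comes from the text. The 4 KiB page size, the 64-set geometry, and the toy page table are assumptions, chosen so that the set index bits VA[11:6] fall inside the page offset.

```python
from typing import Callable

LINE_BYTES = 64    # from the text
PAGE_BYTES = 4096  # assumption: 4 KiB pages, so PA[msb:12] is the frame number
NUM_SETS = 64      # hypothetical: 64 sets * 64 B = one page per way
# With these sizes the set index bits VA[11:6] lie inside the page offset, so the
# set can be chosen before translation and equals the physical set index.
assert LINE_BYTES * NUM_SETS <= PAGE_BYTES


def vipt_fields(va: int, translate: Callable[[int], int]) -> tuple[int, int, int]:
    """Return (set index, physical tag, byte offset) for virtual address `va`."""
    offset = va % LINE_BYTES
    set_idx = (va // LINE_BYTES) % NUM_SETS  # virtual index bits
    pa = translate(va)                       # e.g., a TLB lookup in parallel
    tag = pa // PAGE_BYTES                   # PA[msb:12]
    return set_idx, tag, offset


# Two virtual pages aliasing the same hypothetical physical frame 0x80.
page_table = {0x10: 0x80, 0x20: 0x80}


def translate(va: int) -> int:
    return page_table[va // PAGE_BYTES] * PAGE_BYTES + va % PAGE_BYTES
```

In this configuration, both aliases select the same set and compare against the same physical tag.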
[0025] Integrated circuit 110 includes an external memory system 160, which may include memory for storing instructions and data and/or provide access to memory 162 outside the integrated circuit for storing instructions and/or data. For example, external memory system 160 may include an L2 cache, which may be configured to implement a cache coherence protocol/policy to maintain cache coherence across multiple L1 caches. Although not shown in Figure 1, in some embodiments, integrated circuit 110 may include multiple processor cores. For example, external memory system 160 may include multiple layers.
[0026] Integrated circuit 110 includes an immediate jump handler circuit 170. The immediate jump handler circuit 170 can be configured to detect a sequence of instructions fetched by processor core 120. The sequence includes a first instruction whose result depends on an immediate field of the first instruction and a program counter value, followed by a second instruction that is an indirect jump instruction. In some embodiments, processor core 120 is configured to execute instructions of the RISC-V instruction set, the first instruction is an AUIPC instruction, and the second instruction is a JALR instruction. In response to detecting the instruction sequence, the immediate jump handler circuit 170 can be configured to prevent the indirect jump target predictor circuitry from generating a target address prediction for the second instruction. Also in response to detecting the sequence, it can be configured to determine the target address of the second instruction before the first instruction is issued to an execution stage of the pipeline. The immediate jump handler circuit 170 can be configured to write the target address to a fetch target queue that is configured to receive predictions from the indirect jump target predictor circuitry 132. For example, the target address of the second instruction can be determined before the first instruction reaches the decode stage of processor pipeline 130. For example, the immediate jump handler circuit 170 may include the immediate jump scan circuit 230 and the immediate jump determination circuit 232 of Figure 2.
[0027] For example, the immediate jump handler circuit 170 can detect the instruction sequence before it enters a fetch stage that includes the indirect jump target predictor circuit 132. In some embodiments, the processor pipeline 130 includes multiple fetch stages. In that case, the immediate jump handler circuit 170 detects the instruction sequence as it passes through a fetch stage that is earlier in processor pipeline 130 than the fetch stage that includes the indirect jump target predictor circuit 132. The immediate jump handler circuit 170 can be configured to disable the indirect jump target predictor circuit 132 in response to detecting the instruction sequence. For example, the immediate jump handler circuit 170 can be configured to update a status bit in an instruction cache tag, such that the indirect jump target predictor circuit 132 is disabled when the second instruction enters the fetch stage of the pipeline that includes the indirect jump target predictor circuit 132. For example, the immediate jump handler circuit 170 can be configured to update a status bit in an instruction cache way predictor, such that the indirect jump target predictor circuit 132 is disabled when the second instruction enters the fetch stage of the pipeline that includes the indirect jump target predictor circuit 132.
[0028] For example, the immediate jump handler circuit 170 can be configured to detect instruction sequences by scanning values stored in cache lines of the L1 instruction cache 150. In some embodiments, the immediate jump handler circuit 170 is configured to detect instruction sequences by scanning values appearing on the memory bus as instructions are fed into the L1 instruction cache 150 via the memory bus.
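A fill-time scan for AUIPC/JALR pairs might look like the sketch below. It assumes 32-bit RISC-V encodings only (no compressed instructions), ignores pairs that straddle two cache lines, and uses hypothetical function names. It shows one plausible matching rule: the JALR base register (rs1) must be the AUIPC destination register (rd).

```python
OPCODE_AUIPC = 0x17
OPCODE_JALR = 0x67


def opcode(w: int) -> int:
    return w & 0x7F


def rd(w: int) -> int:
    return (w >> 7) & 0x1F


def rs1(w: int) -> int:
    return (w >> 15) & 0x1F


def funct3(w: int) -> int:
    return (w >> 12) & 0x7


def find_immediate_jumps(line_words: list[int]) -> list[int]:
    """Indices of JALR words that directly consume the preceding AUIPC result."""
    hits = []
    for i in range(1, len(line_words)):
        first, second = line_words[i - 1], line_words[i]
        if (opcode(first) == OPCODE_AUIPC
                and opcode(second) == OPCODE_JALR and funct3(second) == 0
                and rd(first) != 0                # auipc x0 leaves nothing to jump through
                and rs1(second) == rd(first)):    # JALR base is the AUIPC result
            hits.append(i)
    return hits


def immediate_jump_status_bit(line_words: list[int]) -> bool:
    """Status bit that would be recorded (e.g., in the way predictor) for this line."""
    return bool(find_immediate_jumps(line_words))


NOP = 0x00000013         # addi x0, x0, 0
AUIPC_RA_0 = 0x00000097  # auipc ra, 0
JALR_RA_RA = 0x000080E7  # jalr ra, 0(ra)
JALR_RA_T0 = 0x000280E7  # jalr ra, 0(t0): base is not the AUIPC result
```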
[0029] In some implementations, the immediate jump handler circuit 170 is configured to perform the following steps. First, it detects an instruction sequence fetched by the processor core 120 that includes an AUIPC instruction followed by a JALR instruction. In response to detecting the sequence, it disables the indirect jump target predictor circuit 132 so that the predictor does not generate a target address prediction for the JALR instruction. Also in response to detecting the sequence, it determines the target address of the JALR instruction before the AUIPC instruction is issued to the execution stage of the pipeline 130. Finally, it writes the target address to the fetch target queue in the entry corresponding to the JALR instruction.
[0030] Integrated circuit 110 includes an indirect jump detector circuit 180 that is configured to perform three steps. First, it checks a cache line for indirect jump instructions by scanning values appearing on the memory bus as the cache line is fed into the instruction cache via the memory bus. Second, based on this check, it updates a hint bit associated with the cache line to indicate that no indirect jump instruction is present in the cache line. Third, based on the hint bit, it disables the indirect jump target predictor circuit 132, so that the predictor does not generate a target address prediction when instructions from the cache line enter the pipeline stage that includes it. For example, the indirect jump instruction may be a JALR instruction of the RISC-V instruction set. For example, the hint bit may be stored in an instruction cache way predictor (e.g., in L1 instruction cache 150). For example, the hint bit may be stored in an instruction cache tag (e.g., in L1 instruction cache 150). The indirect jump detector circuit 180 can save power by disabling the indirect jump target predictor circuit 132 when no indirect jump instructions are being fetched. For example, the indirect jump detector circuit 180 can be configured to implement process 600 of Figure 6.
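The detector's behavior can be sketched as a cache that records a "no indirect jump" hint per line at fill time and consults it at fetch time. The class and method names are hypothetical, only 32-bit encodings are considered, and every JALR counts as an indirect jump.

```python
OPCODE_JALR = 0x67
NOP = 0x00000013         # addi x0, x0, 0
JALR_RA_T0 = 0x000280E7  # jalr ra, 0(t0)


class HintedICache:
    def __init__(self) -> None:
        # line address -> (words, "no indirect jump" hint bit)
        self.lines: dict[int, tuple[list[int], bool]] = {}

    def fill(self, line_addr: int, words: list[int]) -> None:
        # Words are inspected as they arrive from the (modeled) memory bus.
        no_indirect_jump = not any((w & 0x7F) == OPCODE_JALR for w in words)
        self.lines[line_addr] = (words, no_indirect_jump)

    def predictor_enabled(self, line_addr: int) -> bool:
        """Enable for the indirect jump target predictor when this line is fetched."""
        _, no_indirect_jump = self.lines[line_addr]
        return not no_indirect_jump


cache = HintedICache()
cache.fill(0x000, [NOP] * 16)                 # no indirect jumps
cache.fill(0x040, [NOP] * 15 + [JALR_RA_T0])  # one indirect jump
fetches = [0x000, 0x040, 0x000, 0x040]
lookups = sum(cache.predictor_enabled(a) for a in fetches)
```

In this toy stream the predictor runs for only half of the fetches, which is the power saving the paragraph describes.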
[0031] Figure 2 is a block diagram of an example of a processor pipeline 200 for executing instructions using fetch-stage processing of indirect jumps. The processor pipeline 200 includes multiple fetch stages: F0 stage 202, F1 stage 204, F2 stage 206, and F3 stage 208. The processor pipeline 200 includes a decode stage 210 following fetch stages 202 through 208. Although not shown in Figure 2, the processor pipeline 200 may include additional stages, such as a rename stage, dispatch stage, issue stage, execute stage, memory access stage, and write-back stage.
[0032] Processor pipeline 200 includes an indirect jump target predictor circuit 220 in stage F3 208 of pipeline 200, which is configured to generate a prediction of the target address for a fetched indirect jump instruction. Processor pipeline 200 includes a fetch target queue 222 for storing target address predictions from the indirect jump target predictor circuit 220 for use by subsequent stages of pipeline 200. The indirect jump target predictor circuit 220 is a structure for predicting the target of an indirect jump instruction (e.g., a RISC-V JALR instruction). The encoding of the source register and destination register fields of the indirect jump instruction can provide hints about whether the indirect jump instruction is used as a function call or return. In some implementations, the indirect jump target predictor circuit 220 does not predict the target of a function return, but instead uses the return address stack (RAS). For example, the indirect jump target predictor circuit 220 can be an ITTAGE-style predictor, which is designed to resemble a branch direction predictor (BDP). However, instead of predicting the branch direction, the indirect jump target predictor circuit 220 provides the target address. For example, the indirect jump target predictor circuit 220 can be SRAM-based and, for greater area efficiency, can be designed to use a single-port memory. In some implementations, there is no structural hazard between prediction and update on the indirect jump target predictor circuit 220.
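The register-field hints mentioned above follow the RISC-V link-register convention (x1 and x5), which the unprivileged ISA specification tabulates for JALR. The sketch below encodes that table. The rule that any hint involving a pop takes its target from the return address stack rather than from the indirect predictor is an assumption for illustration.

```python
LINK_REGS = {1, 5}  # x1 (ra) and x5 (t0) per the RISC-V convention


def jalr_ras_hint(rd: int, rs1: int) -> str:
    """RAS action implied by a JALR's rd/rs1 fields."""
    rd_link, rs1_link = rd in LINK_REGS, rs1 in LINK_REGS
    if not rd_link and not rs1_link:
        return "none"
    if not rd_link:
        return "pop"    # function return
    if not rs1_link:
        return "push"   # function call
    return "push" if rd == rs1 else "pop_then_push"  # coroutine-style swap


def uses_indirect_predictor(rd: int, rs1: int) -> bool:
    """Assumption: returns take their target from the RAS; other JALRs use the predictor."""
    return jalr_ras_hint(rd, rs1) not in ("pop", "pop_then_push")
```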
[0033] As an area optimization, note that the indirect jump target predictor circuit 220 may only need to reference a small range of memory within a given time window. The indirect jump target predictor circuit 220 can therefore use a level of indirection to compress the storage of the target virtual address bits. In some implementations, each entry in the indirect jump target predictor circuit 220 therefore retains only a certain number of low-order bits and a reference to a table containing the high-order bits. This table is referred to as the high array.
[0034] For example, the indirect jump target predictor circuit 220 can maintain a table of entries. Each entry includes an index into the high array, which stores the high-order target bits; the low-order bits of the target program counter (PC); and a tag, which may be a hashed tag. Each entry in the indirect jump target predictor circuit 220 can also have a counter (e.g., 1 bit or 2 bits) that indicates the usefulness of the entry and influences the replacement policy. These counter bits are stored in a flip-flop array.
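Here is a minimal sketch of the entry layout and the high-array indirection, assuming 16 low-order bits per entry and a 4-slot high array. Round-robin replacement stands in for the pseudo-LRU policy described later, and all names are hypothetical.

```python
from dataclasses import dataclass

LOW_BITS = 16     # hypothetical: low-order target bits kept in each entry
HIGH_ENTRIES = 4  # hypothetical high-array size


@dataclass
class IttageEntry:
    tag: int         # (possibly hashed) tag
    hi_idx: int      # index into the shared high array
    low: int         # low-order bits of the target PC
    useful: int = 0  # 1- or 2-bit usefulness counter (flip-flop array)


class HighArray:
    """Shared table of high-order target bits, searched like a CAM."""

    def __init__(self, size: int = HIGH_ENTRIES) -> None:
        self.slots: list = [None] * size
        self.victim = 0  # round-robin pointer, a stand-in for pseudo-LRU

    def find_or_allocate(self, high: int) -> int:
        if high in self.slots:           # CAM hit: share the existing slot
            return self.slots.index(high)
        idx = self.victim                # miss: overwrite a victim slot; entries still
        self.slots[idx] = high           # pointing at it now decode a different target
        self.victim = (idx + 1) % len(self.slots)
        return idx


def make_entry(tag: int, target: int, high: HighArray) -> IttageEntry:
    hi_idx = high.find_or_allocate(target >> LOW_BITS)
    return IttageEntry(tag, hi_idx, target & ((1 << LOW_BITS) - 1))


def predicted_target(entry: IttageEntry, high: HighArray) -> int:
    return (high.slots[entry.hi_idx] << LOW_BITS) | entry.low
```

Targets that share their high bits share one high-array slot, which is where the area saving comes from.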
[0035] To avoid storing the resolved target of each indirect jump instruction (e.g., JALR) in the branch resolution queue, the indirect jump target predictor circuit 220 can be updated directly after the branch unit resolves the jump, rather than at retirement. When the indirect jump instruction is issued to the branch unit, its branch resolution queue index is sent back to the branch resolution queue. The prediction information of the indirect jump target predictor circuit 220 (e.g., counter bits and provider table index) is then read from the branch resolution queue. When the indirect jump instruction reaches the write-back stage, an update request can be sent to the indirect jump target predictor circuit 220. For example, the update pipeline can be as follows: at the issue stage, the branch unit sends the branch resolution queue index back to the branch resolution queue; at the register read stage, the prediction information of the indirect jump target predictor circuit 220 is read from the branch resolution queue; at the execute stage, the update request for the indirect jump target predictor circuit 220 is constructed and registered (flopped) into the write-back stage; and at the write-back stage, the update request, along with a misprediction indication, is sent to the indirect jump target predictor circuit 220. The indirect jump target predictor circuit 220 can use the target bits to recompute the table index and tag and to perform a CAM search of the high array.
[0036] If the indirect jump target predictor circuit 220 receives an update for a correctly predicted jump, it can set the counter bits of the provider entry. If the target was mispredicted, the indirect jump target predictor circuit 220 can update the provider entry if its counter bits are zero, or decrement the counter bits if they are not zero. The indirect jump target predictor circuit 220 can also attempt to allocate an entry in a table higher than the provider table. For example, the counter bits can be scanned starting from the next-higher table. If a table has a counter of zero, the indirect jump target predictor circuit 220 can allocate an entry in that table. If all counter bits are set, an allocation failure can be signaled. A saturating counter can be incremented on allocation failures and decremented on successful allocations. Saturation of this counter indicates that new entries are failing to be installed in the indirect jump target predictor circuit 220 because of long-lived entries. If saturation occurs, the counter bit array of all entries in the indirect jump target predictor circuit 220 can be flash-cleared so that new useful entries can be installed. In some implementations, each entry in the indirect jump target predictor circuit 220 stores only a portion of the target address. When an entry is allocated in the indirect jump target predictor circuit 220, the high array can be searched as a CAM using the high bits of the resolved target. If a matching entry is found, the index of that entry can be written to the hiIdx field of the entry in the indirect jump target predictor circuit 220. If no matching entry is found, an entry in the high array is allocated according to a pseudo-LRU replacement policy, and its index is written to the hiIdx field.
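The counter-update and allocation rules above can be sketched as follows. This is a simplified model with several assumptions. Index and tag hashing are omitted, so each table holds only the entry this jump maps to. The usefulness counter is 2 bits wide. The failure-counter threshold, and resetting that counter after a flash clear, are hypothetical.

```python
from dataclasses import dataclass

USEFUL_MAX = 3  # hypothetical 2-bit usefulness counter
FAIL_MAX = 15   # hypothetical saturation value of the allocation-failure counter


@dataclass
class Entry:
    target: int = 0
    useful: int = 0


class IttageUpdater:
    def __init__(self, num_tables: int) -> None:
        self.tables = [Entry() for _ in range(num_tables)]
        self.fail_ctr = 0

    def update(self, provider: int, resolved: int, mispredicted: bool) -> None:
        entry = self.tables[provider]
        if not mispredicted:
            entry.useful = USEFUL_MAX   # set the counter bits
            return
        if entry.useful == 0:
            entry.target = resolved     # overwrite a provider that is not useful
        else:
            entry.useful -= 1
        self._allocate_above(provider, resolved)

    def _allocate_above(self, provider: int, target: int) -> bool:
        for t in range(provider + 1, len(self.tables)):
            if self.tables[t].useful == 0:  # first higher table with a zero counter
                self.tables[t].target = target
                self.fail_ctr = max(0, self.fail_ctr - 1)
                return True
        self.fail_ctr += 1                  # every candidate was still useful
        if self.fail_ctr >= FAIL_MAX:
            for e in self.tables:           # flash-clear the usefulness bits
                e.useful = 0
            self.fail_ctr = 0
        return False
```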
[0037] Processor pipeline 200 includes immediate jump handler circuitry, which includes an immediate jump scan circuit 230 and an immediate jump determination circuit 232. The immediate jump scan circuit 230 can be configured to detect a sequence of instructions that forms an indirect jump whose target address can be determined from information available in the fetch stage. The instruction sequence includes a first instruction whose result depends on an immediate field of the first instruction and a program counter value, followed by a second instruction that is an indirect jump instruction. For example, in a RISC-V processor core, the instruction sequence may include an AUIPC instruction followed by a JALR instruction. The immediate jump scan circuit 230 is configured to detect the instruction sequence by scanning values that appear on the memory bus from memory bus interface 240 as instructions are fed into L1 instruction cache 250. After detecting the instruction sequence, the immediate jump scan circuit 230 can update a status bit in instruction cache way predictor 252 to indicate that the associated cache line includes the instruction sequence. The updated status bit causes the indirect jump target predictor circuit 220 to be disabled when the second instruction enters the F3 stage 208 of pipeline 200, which includes the indirect jump target predictor circuit 220.
[0038] When the cache line is later read from the L1 instruction cache 250 in the F2 stage 206, the value of the status bit can be passed through a pipeline register as an immediate jump hint. This makes the hint available in time at the enable input of the indirect jump target predictor circuit 220 in the F3 stage 208. This saves power because the indirect jump target predictor circuit 220 does not run to generate a target address prediction for the indirect jump instruction of the instruction sequence. Thus, the immediate jump scan circuit 230 detects the instruction sequence before it enters the F3 stage 208, which includes the indirect jump target predictor circuit 220. In response to detecting the sequence, it disables the indirect jump target predictor circuit 220 by storing an immediate jump hint in a status bit of the instruction cache way predictor 252. When the corresponding cache line is read from the L1 instruction cache 250, the hint is passed to the enable input of the indirect jump target predictor circuit 220.
[0039] After a cache line is read from the L1 instruction cache 250, the cache line can be rotated in the F3 stage 208 to access the relevant instructions. These instructions can be fed into the instruction queue 260, which holds instructions for decoding, and also into the immediate jump determination circuit 232. The immediate jump determination circuit 232 is configured to detect the instruction sequence and to determine the target address of its indirect jump instruction based on the immediate values of the instruction sequence and the program counter value. The processor pipeline 200 includes a multiplexer 270 that selects the target address determined by the immediate jump determination circuit 232 and writes it into the fetch target queue 222. This target address replaces the prediction that the indirect jump target predictor circuit 220 would otherwise provide for the indirect jump instruction in the sequence.
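The role of multiplexer 270 can be sketched as a selection in front of the fetch target queue. The names are hypothetical, and `predict` is a stand-in for the indirect jump target predictor. The sketch only shows that a detected immediate jump bypasses the predictor entirely.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FetchedIndirectJump:
    pc: int
    immediate_target: Optional[int]  # set when an immediate-jump sequence was detected


def write_ftq(ftq: deque, jump: FetchedIndirectJump,
              predict: Callable[[int], int]) -> None:
    """Mux in front of the fetch target queue."""
    if jump.immediate_target is not None:
        ftq.append(jump.immediate_target)  # computed target; predictor stays idle
    else:
        ftq.append(predict(jump.pc))       # ordinary indirect jump: use the prediction


calls = []


def predict(pc: int) -> int:
    calls.append(pc)    # record which PCs actually woke the predictor
    return 0xDEAD_BEE0  # placeholder prediction


ftq = deque()
write_ftq(ftq, FetchedIndirectJump(0x8000_0004, 0x8123_4676), predict)
write_ftq(ftq, FetchedIndirectJump(0x8000_0100, None), predict)
```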
[0040] Figure 3 is a memory map of an example instruction sequence 300. The sequence includes a first instruction 310 with a result that depends on an immediate field and a program counter value, followed by a second instruction 320 that is an indirect jump instruction. The first instruction 310 includes an opcode 312, a destination register field 314 identifying the architectural register used to store the result of the first instruction 310, and an immediate 316. The immediate 316 is combined (e.g., added) with the program counter value to determine the result of the first instruction. The second instruction 320 includes an opcode 322, a source register field 324 identifying the architectural register to be read, and an immediate 326. The immediate 326 is combined (e.g., added) with the value stored in the source register identified by field 324 to determine the target address of the second instruction. For example, in a RISC-V processor core, the first instruction could be an AUIPC instruction, and the second instruction could be a JALR instruction.
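The fields in Figure 3 map onto standard RISC-V encodings. AUIPC is a U-type instruction (opcode, rd, imm[31:12]), and JALR is an I-type instruction (opcode, rd, funct3, rs1, imm[11:0]). The sketch below decodes both words and forms the target, assuming RV64 and 32-bit encodings. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def decode_auipc(word: int) -> tuple[int, int]:
    """Return (rd, offset), where offset is sign-extended imm[31:12] << 12."""
    if (word & 0x7F) != 0x17:
        raise ValueError("not AUIPC")
    return (word >> 7) & 0x1F, sign_extend(word & 0xFFFFF000, 32)


def decode_jalr(word: int) -> tuple[int, int]:
    """Return (rs1, imm), where imm is the sign-extended 12-bit I-immediate."""
    if (word & 0x7F) != 0x67 or ((word >> 12) & 0x7) != 0:
        raise ValueError("not JALR")
    return (word >> 15) & 0x1F, sign_extend(word >> 20, 12)


def target_from_words(pc: int, auipc_word: int, jalr_word: int):
    """Target of the pair, or None if the JALR base is not the AUIPC result."""
    rd, upper = decode_auipc(auipc_word)      # fields 314 and 316
    base_reg, lower = decode_jalr(jalr_word)  # fields 324 and 326
    if rd == 0 or base_reg != rd:
        return None
    return (pc + upper + lower) & MASK64 & ~1  # JALR clears bit 0
```

For example, `auipc ra, 0x12345` (0x12345097) at 0x8000_0000 followed by `jalr ra, 0x678(ra)` (0x678080E7) yields 0x9234_5678.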
[0041] In some embodiments, the first instruction 310 is adjacent to the second instruction 320 in memory, so the second instruction 320 immediately follows the first instruction 310. In some embodiments, one or more intermediary instructions may be stored in memory locations between the first instruction 310 and the second instruction 320. In that case, the second instruction 320 follows the first instruction 310, but not immediately. The instruction sequence 300 can still function as an immediate jump, with its target address determined during the fetch stage of the processor pipeline (e.g., processor pipeline 130). This holds as long as none of the intermediary instructions writes the register identified by destination register field 314 before that register is read as the source register identified by field 324.
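One possible way to check the condition in the last sentence is sketched below. The rules are deliberately conservative assumptions that the text does not state. Any intervening instruction that might write the AUIPC destination register disqualifies the sequence, and so does any intervening control-transfer instruction.

```python
OPCODE_AUIPC, OPCODE_JALR = 0x17, 0x67
NO_RD_WRITE = {0x23, 0x63}             # STORE, BRANCH: bits [11:7] are not rd
CONTROL_TRANSFER = {0x63, 0x6F, 0x67}  # BRANCH, JAL, JALR


def fields(w: int) -> tuple[int, int, int]:
    return w & 0x7F, (w >> 7) & 0x1F, (w >> 15) & 0x1F  # opcode, rd, rs1


def is_immediate_jump(seq: list[int]) -> bool:
    """seq: AUIPC, any intermediary instructions, then JALR (32-bit words)."""
    if len(seq) < 2:
        return False
    first, *middle, last = seq
    op_f, link, _ = fields(first)
    op_l, _, base = fields(last)
    if (op_f != OPCODE_AUIPC or op_l != OPCODE_JALR or ((last >> 12) & 0x7) != 0
            or link == 0 or base != link):
        return False
    for w in middle:
        op, dest, _ = fields(w)
        if op in CONTROL_TRANSFER:
            return False
        if op not in NO_RD_WRITE and dest == link:
            return False  # might overwrite the AUIPC result before JALR reads it
    return True


AUIPC_T0 = 0x00000297    # auipc t0, 0
ADDI_A0 = 0x00150513     # addi a0, a0, 1
ADDI_T0 = 0x00428293     # addi t0, t0, 4 (clobbers t0)
SW_A0 = 0x00A12023       # sw a0, 0(sp)
JALR_RA_T0 = 0x000280E7  # jalr ra, 0(t0)
```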
[0042] Figure 4 is a flowchart of an example of a process 400 for fetch-stage processing of indirect jumps. Process 400 includes four steps. The first is detecting 410 a sequence of instructions fetched by a processor core, where the sequence includes a first instruction whose result depends on an immediate field of the first instruction and a program counter value, followed by a second instruction that is an indirect jump instruction. The second is preventing 420, in response to detecting the instruction sequence, an indirect jump target predictor circuit from generating a target address prediction for the second instruction. The third is determining, in response to detecting the instruction sequence, a target address for the second instruction before the first instruction is issued. The fourth is writing 440 the target address to a fetch target queue. Process 400 can provide advantages over conventional techniques. For example, it can avoid target address mispredictions and the resulting pollution of the indirect jump predictor and performance degradation, and/or save power. For example, process 400 can be implemented using integrated circuit 110 of Figure 1. For example, process 400 can be implemented using processor pipeline 200 of Figure 2.
[0043] The process 400 includes detecting 410 a sequence of instructions fetched by a processor core (e.g., processor core 120). The instruction sequence includes a first instruction, whose result depends on the immediate field and the program counter value of the first instruction, followed by a second instruction that is an indirect jump instruction. For example, the processor core may be configured to execute instructions of the RISC-V instruction set, where the first instruction is an AUIPC instruction and the second instruction is a JALR instruction. In some embodiments, detecting 410 the instruction sequence includes scanning values that appear on a memory bus as instructions are loaded into an instruction cache (e.g., L1 instruction cache 250) via the memory bus. In some embodiments, detecting 410 the instruction sequence includes scanning values stored in cache lines of the instruction cache. For example, the instruction sequence may be detected 410 before it enters a fetch stage (e.g., F3 stage 208 of processor pipeline 200) that includes the indirect jump target predictor circuitry (e.g., indirect jump target predictor circuitry 220). In some implementations, the pipeline includes multiple fetch stages, and the instruction sequence is detected in an earlier fetch stage (e.g., F0 stage 202 of processor pipeline 200) that precedes the fetch stage that includes the indirect jump target predictor circuitry (e.g., F3 stage 208 of processor pipeline 200).
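As an illustration of detecting 410 on raw 32-bit instruction words (for example, words seen on the memory bus or read from a cache line), the following sketch matches an AUIPC immediately followed by a JALR whose source register rs1 is the AUIPC destination register rd. The field positions and opcodes follow the RISC-V base encoding; the function names are hypothetical, and compressed (16-bit) instructions and intermediary instructions are not handled.

```python
from typing import List, Sequence

AUIPC_OPCODE = 0b0010111  # RISC-V AUIPC major opcode (U-type)
JALR_OPCODE = 0b1100111   # RISC-V JALR major opcode (I-type, funct3 = 000)


def opcode(insn: int) -> int:
    return insn & 0x7F


def rd(insn: int) -> int:
    return (insn >> 7) & 0x1F


def funct3(insn: int) -> int:
    return (insn >> 12) & 0x7


def rs1(insn: int) -> int:
    return (insn >> 15) & 0x1F


def find_auipc_jalr_pairs(words: Sequence[int]) -> List[int]:
    """Indices i where words[i] is an AUIPC and words[i + 1] is a JALR reading its rd."""
    hits = []
    for i in range(len(words) - 1):
        first, second = words[i], words[i + 1]
        if (opcode(first) == AUIPC_OPCODE and rd(first) != 0
                and opcode(second) == JALR_OPCODE and funct3(second) == 0
                and rs1(second) == rd(first)):
            hits.append(i)
    return hits


print(find_auipc_jalr_pairs([0x00001297, 0x010280E7]))  # [0]
```

Here `0x00001297` encodes `auipc t0, 0x1` and `0x010280E7` encodes `jalr ra, 16(t0)`, so the pair starting at index 0 is reported.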
[0044] The process 400 includes, in response to detecting 410 the instruction sequence, preventing 420 the indirect jump target predictor circuitry (e.g., indirect jump target predictor circuitry 132) from generating a target address prediction for the second instruction. For example, preventing 420 the prediction may include disabling the indirect jump target predictor circuitry in response to detecting 410 the instruction sequence. In some embodiments, preventing 420 the prediction includes updating a status bit in an instruction cache tag so that the indirect jump target predictor circuitry is disabled when the second instruction enters the pipeline stage that includes the indirect jump target predictor circuitry (e.g., F3 stage 208 of processor pipeline 200). In some embodiments, preventing 420 the prediction includes updating a status bit in an instruction cache path predictor (e.g., instruction cache path predictor 252) so that the indirect jump target predictor circuitry is disabled when the second instruction enters the pipeline stage that includes the indirect jump target predictor circuitry.
[0045] The process 400 includes, in response to detecting 410 the instruction sequence, determining 430 a target address for the second instruction before the first instruction is issued to an execution stage of the pipeline of the processor core. For example, the target address for the second instruction can be determined 430 before the first instruction reaches a decode stage of the pipeline (e.g., processor pipeline 130).
[0046] The process 400 includes writing 440 the target address to a fetch target queue (e.g., fetch target queue 222) that is configured to receive predictions from the indirect jump target predictor circuit. For example, a multiplexer (e.g., multiplexer 270) may be used to select the target address determined 430 for the second instruction instead of a target address prediction from the indirect jump target predictor circuit.
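The selection described in [0046] can be sketched behaviorally as follows. The names `FetchTargetQueue` and `handle_indirect_jump`, and the use of the return value to report whether the predictor was accessed, are assumptions for illustration; a real design would implement the choice as a multiplexer in hardware rather than as software control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FetchTargetQueue:
    """Hypothetical fetch target queue holding one target per queued indirect jump."""
    entries: List[int] = field(default_factory=list)

    def write(self, target: int) -> None:
        self.entries.append(target)


def handle_indirect_jump(
    jump_pc: int,
    immediate_jump_detected: bool,
    computed_target: Optional[int],
    ijtp_predict: Callable[[int], int],
    ftq: FetchTargetQueue,
) -> bool:
    """Enqueue a target for one indirect jump; return True if the predictor was accessed."""
    if immediate_jump_detected and computed_target is not None:
        # Steps 420 and 440: the predictor is not consulted, and the
        # multiplexer selects the target computed at step 430.
        ftq.write(computed_target)
        return False
    # Any other indirect jump falls back to the predictor's guess.
    ftq.write(ijtp_predict(jump_pc))
    return True


ftq = FetchTargetQueue()
print(handle_indirect_jump(0x80000004, True, 0x80001010, lambda pc: 0, ftq))  # False
print([hex(t) for t in ftq.entries])  # ['0x80001010']
```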
[0047] Although not shown in Figure 4, the process 400 may be used in combination with the process 600 of Figure 6 to further reduce power consumption in the indirect jump target predictor circuitry. For example, the process 400 may further include: when a cache line is loaded into the instruction cache via the memory bus, checking 610 the cache line for indirect jump instructions by scanning values appearing on the memory bus; based on the check, updating 630 a hint bit associated with the cache line to indicate that no indirect jump instruction is present in the cache line; and, based on the hint bit, disabling 660 the indirect jump target predictor circuitry so that it does not generate a target address prediction when instructions of the cache line enter the pipeline stage that includes the indirect jump target predictor circuitry.
[0048] Figure 5 is a flowchart of an example of a process 500 for determining the target address of an indirect jump instruction that depends on a program counter value and one or more immediate values of an instruction sequence. The process 500 includes: left shifting 510 the immediate value (e.g., immediate value 316) of a first instruction (e.g., first instruction 310); adding 520 the shifted immediate value of the first instruction to the immediate value (e.g., immediate value 326) of a second instruction (e.g., second instruction 320); and adding 530 the sum of the immediate values to the program counter value to obtain the target address. For example, the first instruction may be a RISC-V AUIPC instruction, and the second instruction may be a RISC-V JALR instruction. For example, the immediate value of the first instruction may be left shifted 510 by a number of bits equal to the size of the immediate value of the second instruction. In some implementations, the bit widths of the immediate values of the first and second instructions sum to the width of the architecture registers of the processor core implementing the process 500. The process 500 may be implemented by logic circuitry in a fetch stage that has access to the first and second instructions, for example when they are stored in a buffer. The steps of the process 500 can be performed in various orders or in parallel. For example, the shifted immediate value of the first instruction can be added to the program counter value before the immediate value of the second instruction is added to the result. In some implementations (not shown in Figure 5), the immediate value of the second instruction, rather than that of the first instruction, is left shifted before the addition. For example, the process 500 can be implemented using the integrated circuit 110 of Figure 1. For example, the process 500 can be implemented using the processor pipeline 200 of Figure 2.
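A minimal sketch of the process 500 for an AUIPC/JALR pair, assuming an RV64 core (64-bit registers, so sums wrap modulo 2^64) and the standard RISC-V immediate encodings: the 20-bit AUIPC immediate occupies bits 31:12 and the 12-bit JALR immediate occupies bits 31:20, both sign-extended. Clearing bit 0 of the result follows the RISC-V definition of JALR and is not stated in the text above. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumes an RV64 core: addresses wrap modulo 2**64


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def immediate_jump_target(auipc_pc: int, auipc_insn: int, jalr_insn: int) -> int:
    """Target of `auipc rX, hi; jalr rd, lo(rX)`, given the PC of the AUIPC."""
    hi20 = sign_extend((auipc_insn >> 12) & 0xFFFFF, 20)  # AUIPC imm, bits 31:12
    lo12 = sign_extend((jalr_insn >> 20) & 0xFFF, 12)     # JALR imm, bits 31:20
    offset = (hi20 << 12) + lo12             # step 510 (shift) and step 520 (add)
    target = (auipc_pc + offset) & MASK64    # step 530: add the program counter
    return target & ~1                       # JALR clears bit 0 (RISC-V ISA)


print(hex(immediate_jump_target(0x80000000, 0x00001297, 0x010280E7)))  # 0x80001010
```

With the AUIPC at `0x80000000`, `auipc t0, 0x1` (`0x00001297`) followed by `jalr ra, 16(t0)` (`0x010280E7`) yields `0x80000000 + 0x1000 + 16 = 0x80001010`. Because the two additions are associative, they could be reordered or computed in parallel, as the paragraph above allows, without changing the result.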
[0049] Figure 6 is a flowchart of an example of a process 600 for selectively disabling an indirect jump target predictor circuit in the absence of indirect jumps. The process 600 includes: when a cache line is loaded into the instruction cache via the memory bus, checking 610 the cache line for an indirect jump instruction; if the check detects an indirect jump instruction in the cache line, updating 620 a hint bit associated with the cache line to indicate that an indirect jump instruction is present in the cache line; if the check does not detect an indirect jump instruction in the cache line, updating 630 the hint bit to indicate that no indirect jump instruction is present in the cache line; at a later time, reading 640 the cache line from the cache into a fetch stage of the processor pipeline; if the hint bit indicates that an indirect jump instruction is present in the cache line, enabling 650 the indirect jump target predictor circuit, based on the hint bit, so that it can generate a target address prediction when instructions from the cache line enter the pipeline stage that includes the indirect jump target predictor circuit; and if the hint bit indicates that no indirect jump instruction is present in the cache line, disabling 660 the indirect jump target predictor circuit, based on the hint bit, so that it does not generate a target address prediction when instructions from the cache line enter that pipeline stage. For example, the process 600 can be implemented using the integrated circuit 110 of Figure 1.
[0050] The process 600 includes checking 610 a cache line for indirect jump instructions by scanning the values appearing on the memory bus when the cache line is loaded into the instruction cache (e.g., L1 instruction cache 150) via the memory bus. For example, the indirect jump instruction may be the JALR instruction of the RISC-V instruction set. In some cases, the indirect jump instruction lies entirely within a single cache line, and the complete indirect jump instruction is detected 610 when the cache line is loaded into the cache. For example, a JALR instruction can be identified by detecting its opcode within the lower 16 bits of the instruction. In other cases, the indirect jump instruction may span a cache line boundary: for example, the lower portion of the instruction may be in a first cache line and the upper portion in a second cache line. The order in which the two cache lines arrive at the cache may not be guaranteed, which further complicates checking 610 for an indirect jump instruction in the cache line. Special logic can be used to check whether an indirect jump instruction (e.g., a JALR) ends in a cache line that is being loaded into the cache.
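One way to approximate the check 610 is to test every 16-bit parcel of an incoming line for a JALR-like lower half: opcode bits 6:0 equal to `1100111` and funct3 bits 14:12 equal to `000`, both of which sit in the low 16 bits. Because instruction boundaries are unknown when the C extension is supported, the sketch tests every parcel, which can only err toward false positives. It assumes little-endian line data, considers only the 32-bit JALR (not compressed jumps), and uses hypothetical function names.

```python
from typing import Tuple

JALR_OPCODE = 0b1100111  # bits 6:0 of a 32-bit RISC-V JALR


def looks_like_jalr_low_half(parcel: int) -> bool:
    """Opcode (bits 6:0) and funct3 (bits 14:12) both sit in the low 16 bits."""
    return (parcel & 0x7F) == JALR_OPCODE and ((parcel >> 12) & 0x7) == 0


def scan_line_for_jalr(line: bytes) -> Tuple[bool, bool]:
    """Return (jalr_may_end_in_line, jalr_may_cross_into_next_line).

    Every 16-bit parcel is tested because, with the C extension, the scanner
    cannot tell where instruction boundaries fall; this errs toward false positives.
    """
    parcels = [int.from_bytes(line[i:i + 2], "little") for i in range(0, len(line) - 1, 2)]
    # A JALR-like lower half followed by another parcel ends inside this line.
    ends_here = any(looks_like_jalr_low_half(p) for p in parcels[:-1])
    # A JALR-like lower half in the final parcel would finish in the next line.
    crosses = bool(parcels) and looks_like_jalr_low_half(parcels[-1])
    return ends_here, crosses


line = bytearray(64)                                  # assumed 64-byte cache line
line[8:12] = (0x010280E7).to_bytes(4, "little")       # jalr ra, 16(t0) at offset 8
print(scan_line_for_jalr(bytes(line)))                # (True, False)
```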
[0051] For example, when the C extension of the RISC-V instruction set is supported, a 32-bit JALR instruction may span cache lines. As a power optimization, the path predictor can store a hint bit indicating that a JALR instruction may end in the corresponding cache line. During fetch, the indirect jump target predictor circuitry (e.g., indirect jump target predictor circuitry 132) can be accessed only if the hint bit is set, indicating that a JALR instruction may be present in the cache line being fetched. To generate this hint bit, the miss queue of the cache (e.g., L1 instruction cache 150) can include additional logic that scans incoming fill data and detects when a JALR instruction may end in the cache line. For example, the miss queue entry fields parentValid, parentFilled, parent (e.g., a pointer to the miss queue entry for the parent cache line), and jalrCross can be used for this purpose. A common scenario is that the fetch unit generates a cache miss followed by several sequential prefetches. When a miss queue entry is allocated, the miss queue is checked to see whether the previously allocated entry is still valid. If so, the parentValid field is set to 1 and the parent field is set to the index of the previously allocated entry, which is called the “parent” entry. If the parent entry is filled first, the parentFilled field is set to 1, and the jalrCross field is set to 1 if the last 16 bits of the parent’s fill data look like the lower 16 bits of a 32-bit JALR. When the entry’s fill data is returned, each beat of the fill data is also scanned for potential JALR instructions. This is tricky when the C extension is supported, because it may be impossible to know whether the first 16 bits of the cache block are the second half of a 32-bit instruction; therefore, both cases can be assumed.
When an entry is filled, the hint bit is set if any of the following is true: (1) when the miss request was made, the pipeline already held the first 16 bits of an RVI (32-bit) instruction that looks like a JALR; (2) the parent entry is valid and was filled first, and the jalrCross bit is set; or (3) the scan of the entry’s fill data finds a possible complete JALR instruction.
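The miss-queue bookkeeping in [0051] can be modeled as follows. The field names mirror parentValid, parentFilled, parent, and jalrCross; everything else (the list-based queue that never frees entries, the boolean inputs standing in for the scan results, and the function names) is a hypothetical simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MissQueueEntry:
    valid: bool = True
    filled: bool = False
    parent_valid: bool = False     # parentValid
    parent_filled: bool = False    # parentFilled
    parent: Optional[int] = None   # parent: index of the previously allocated entry
    jalr_cross: bool = False       # jalrCross


def allocate(queue: List[MissQueueEntry]) -> int:
    """Allocate an entry and link it to the previously allocated entry if still valid."""
    entry = MissQueueEntry()
    if queue and queue[-1].valid:
        entry.parent_valid = True
        entry.parent = len(queue) - 1
    queue.append(entry)
    return len(queue) - 1


def fill(queue: List[MissQueueEntry], index: int, tail_looks_like_jalr_low: bool,
         pipeline_holds_jalr_low: bool, fill_may_hold_jalr: bool) -> bool:
    """Record a fill for queue[index] and return the hint bit to store for its line."""
    entry = queue[index]
    entry.filled = True
    # Children still waiting on their own data learn that their parent filled first.
    for child in queue:
        if child.parent_valid and child.parent == index and not child.filled:
            child.parent_filled = True
            child.jalr_cross = tail_looks_like_jalr_low
    return (pipeline_holds_jalr_low                                                 # condition (1)
            or (entry.parent_valid and entry.parent_filled and entry.jalr_cross)  # condition (2)
            or fill_may_hold_jalr)                                                # condition (3)


q: List[MissQueueEntry] = []
demand = allocate(q)                              # demand miss
prefetch = allocate(q)                            # sequential prefetch; parent is the demand miss
print(fill(q, demand, True, False, False))        # False: no JALR ends in the demand line
print(fill(q, prefetch, False, False, False))     # True: condition (2), JALR crosses in

q2: List[MissQueueEntry] = []
a, b = allocate(q2), allocate(q2)
print(fill(q2, b, False, False, False))           # False: child filled before its parent
print(fill(q2, a, True, False, False))            # False: the crossing JALR is missed
```

The out-of-order case at the end shows how a hint bit can be missed when fills return in an unexpected order, which is the kind of error discussed in [0055].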
[0052] If an indirect jump instruction ending in the cache line has been detected (at step 615), the process 600 includes updating 620, based on the check 610, the hint bit associated with the cache line to indicate that an indirect jump instruction is present in the cache line. If no indirect jump instruction ending in the cache line has been detected (at step 615), the process 600 includes updating 630, based on the check 610, the hint bit associated with the cache line to indicate that no indirect jump instruction is present in the cache line. In some embodiments, the hint bit is stored in an instruction cache path predictor (e.g., instruction cache path predictor 252). In some embodiments, the hint bit is stored in an instruction cache tag (e.g., in L1 instruction cache 250).
[0053] The process 600 includes reading 640 cache lines from the cache into a fetch stage of a processor pipeline (e.g., processor pipeline 130). For example, a cache line can be read 640 from the cache and rotated as needed before its instructions are placed into an instruction queue (e.g., instruction queue 260) for decoding.
[0054] If (at step 645) the hint bit indicates that an indirect jump instruction ending in the cache line is present, the process 600 includes enabling 650, based on the hint bit, the indirect jump target predictor circuitry (e.g., indirect jump target predictor circuitry 132) so that it can generate a target address prediction when instructions from the cache line enter the pipeline stage that includes the indirect jump target predictor circuitry. If (at step 645) the hint bit indicates that no indirect jump instruction ending in the cache line is present, the process 600 includes disabling 660, based on the hint bit, the indirect jump target predictor circuitry so that it does not generate a target address prediction when instructions from the cache line enter that pipeline stage.
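The hint-bit gating of steps 620 to 660 can be sketched as a table of per-line hint bits consulted before the predictor is accessed. The class name, the dictionary standing in for the path predictor or cache tag, the lookup counter used as a power proxy, and the choice to treat lines with no recorded hint as possibly containing a jump are all assumptions.

```python
from typing import Callable, Dict, Optional


class HintGatedIjtp:
    """Hypothetical model of an IJTP whose accesses are gated by per-line hint bits."""

    def __init__(self, predict: Callable[[int], int]) -> None:
        self.predict = predict
        self.hint: Dict[int, bool] = {}  # line address -> "an indirect jump may end here"
        self.lookups = 0                 # predictor accesses, used as a power proxy

    def on_fill(self, line_addr: int, jump_found: bool) -> None:
        # Steps 620/630: record the result of the fill-time check 610.
        self.hint[line_addr] = jump_found

    def on_fetch(self, line_addr: int, fetch_pc: int) -> Optional[int]:
        # Step 645: unknown lines are conservatively treated as possibly holding a jump.
        if not self.hint.get(line_addr, True):
            return None                  # step 660: predictor stays disabled
        self.lookups += 1                # step 650: predictor enabled and accessed
        return self.predict(fetch_pc)


ijtp = HintGatedIjtp(predict=lambda pc: pc + 0x100)
ijtp.on_fill(0x1000, jump_found=False)
ijtp.on_fill(0x1040, jump_found=True)
print(ijtp.on_fetch(0x1000, 0x1004), ijtp.lookups)       # None 0
print(hex(ijtp.on_fetch(0x1040, 0x1048)), ijtp.lookups)  # 0x1148 1
```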
[0055] Mispredictions, or errors in the IJTP (indirect jump target predictor) hint bit, may occur and require correction. For example, when the C extension is supported in a RISC-V processor, it may be impossible to determine accurately during a fill whether a cache block begins with the second half of a JALR instruction. When multiple misses are outstanding, fills may return out of order. A misprediction occurs if the fetch pipeline detects a JALR instruction during the branch and jump search in a downstream fetch stage of the processor pipeline (e.g., F3 stage 208 of processor pipeline 200) while the hint bit read from the path predictor indicates no JALR instruction. In this case, the indirect jump target predictor circuitry was disabled and not accessed, so there is no valid prediction for that fetch group. In some implementations, this misprediction is handled by treating it as a path predictor misprediction: the IJTP hint bit is corrected (e.g., in the path predictor) and the instructions are re-fetched. Handling a missed indirect jump instruction as a path predictor misprediction may incur a performance penalty (e.g., 4 cycles), but this case is expected to be rare.
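The recovery in [0055] can be sketched as a check performed when the downstream fetch stage finds the branches and jumps in a fetch group. The names are hypothetical, the 4-cycle default only echoes the example penalty above, and the sketch assumes that a hint bit that is set when no JALR is present needs no correction, since it only costs an unnecessary predictor access.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HintCheck:
    refetch: bool         # True: treat as a path predictor misprediction and re-fetch
    penalty_cycles: int   # cycles lost to the re-fetch


def check_ijtp_hint(hints: Dict[int, bool], line_addr: int, jalr_found: bool,
                    penalty_cycles: int = 4) -> HintCheck:
    """Compare the hint bit used at fetch with what the downstream fetch stage found."""
    hint = hints.get(line_addr, True)
    if jalr_found and not hint:
        # The IJTP was never accessed, so no valid prediction exists for this group.
        hints[line_addr] = True  # correct the IJTP hint bit
        return HintCheck(refetch=True, penalty_cycles=penalty_cycles)
    return HintCheck(refetch=False, penalty_cycles=0)


hints = {0x1000: False}  # the fill-time scan missed a crossing JALR
print(check_ijtp_hint(hints, 0x1000, jalr_found=True))  # HintCheck(refetch=True, penalty_cycles=4)
print(check_ijtp_hint(hints, 0x1000, jalr_found=True))  # HintCheck(refetch=False, penalty_cycles=0)
```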
[0056] In a first aspect, the subject matter described in this specification can be embodied in an integrated circuit for executing instructions, the integrated circuit comprising: a processor core including a pipeline configured to execute instructions; an indirect jump target predictor circuit in a fetch stage of the pipeline, the indirect jump target predictor circuit being configured to generate a prediction of a target address for a fetched indirect jump instruction; and an immediate jump handler circuit configured to: detect an instruction sequence fetched by the processor core, wherein the instruction sequence includes a first instruction followed by a second instruction that is an indirect jump instruction, the first instruction having a result that depends on an immediate field and a program counter value of the first instruction; in response to detecting the instruction sequence, prevent the indirect jump target predictor circuit from generating a prediction of a target address for the second instruction; and in response to detecting the instruction sequence, determine the target address of the second instruction before the first instruction is issued to an execution stage of the pipeline.
[0057] In a second aspect, the subject matter described in this specification can be embodied in a method comprising: detecting an instruction sequence fetched by a processor core, wherein the instruction sequence includes a first instruction, having a result that depends on an immediate field and a program counter value of the first instruction, followed by a second instruction that is an indirect jump instruction; in response to detecting the instruction sequence, preventing an indirect jump target predictor circuit from generating a target address prediction for the second instruction; and in response to detecting the instruction sequence, determining the target address of the second instruction before the first instruction is issued to an execution stage of the pipeline.
[0058] In a third aspect, the subject matter described in this specification can be embodied in an integrated circuit for executing instructions, the integrated circuit including: a processor core including a pipeline configured to execute instructions of the RISC-V instruction set; an indirect jump target predictor circuit in a fetch stage of the pipeline, the indirect jump target predictor circuit being configured to generate a prediction of the target address of a fetched indirect jump instruction and output the prediction to a fetch target queue; and an immediate jump handler circuit configured to: detect an instruction sequence fetched by the processor core, wherein the instruction sequence includes an AUIPC instruction followed by a JALR instruction; in response to detecting the instruction sequence, disable the indirect jump target predictor circuit to prevent it from generating a target address prediction for the JALR instruction; in response to detecting the instruction sequence, determine the target address of the JALR instruction before the AUIPC instruction is issued to the execution stage of the pipeline; and write the target address into an entry of the fetch target queue corresponding to the JALR instruction.
[0059] In a fourth aspect, the subject matter described in this specification can be embodied in a method comprising: when a cache line is loaded into an instruction cache via a memory bus, checking the cache line for an indirect jump instruction by scanning values appearing on the memory bus; based on the check, updating a hint bit associated with the cache line to indicate that no indirect jump instruction is present in the cache line; and, based on the hint bit, disabling an indirect jump target predictor circuit to prevent it from generating a target address prediction when instructions of the cache line enter a stage of a processor pipeline that includes the indirect jump target predictor circuit.
[0060] In a fifth aspect, the subject matter described in this specification can be embodied in an integrated circuit for executing instructions, the integrated circuit including: a processor core including a pipeline configured to execute instructions; an indirect jump target predictor circuit in a fetch stage of the pipeline, the indirect jump target predictor circuit being configured to generate a prediction of the target address of a fetched indirect jump instruction; and an indirect jump detector circuit configured to: when a cache line is loaded into an instruction cache via a memory bus, check the cache line for an indirect jump instruction by scanning values appearing on the memory bus; based on the check, update a hint bit associated with the cache line to indicate that no indirect jump instruction is present in the cache line; and, based on the hint bit, disable the indirect jump target predictor circuit to prevent it from generating a target address prediction when instructions of the cache line enter a stage of the pipeline that includes the indirect jump target predictor circuit.
[0061] While this disclosure has been described in conjunction with certain embodiments, it should be understood that this disclosure is not limited to the disclosed embodiments, but rather, this disclosure is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which should be given the broadest interpretation to include all such modifications and equivalent structures.
Claims
1. An integrated circuit, comprising: a processor core including a pipeline configured to execute instructions; an indirect jump target predictor circuit configured to generate a prediction of a target address of a fetched indirect jump instruction; and an immediate jump handler circuit configured to: detect an instruction sequence fetched by the processor core, wherein the instruction sequence includes a first instruction followed by a second instruction that is an indirect jump instruction, the first instruction having a result that depends on an immediate field and a program counter value of the first instruction; in response to detecting the instruction sequence, prevent the indirect jump target predictor circuit from generating a target address prediction for the second instruction; and determine, based on the immediate field and the program counter value of the first instruction, the target address of the second instruction before the first instruction is issued to an execution stage of the pipeline.
2. The integrated circuit of claim 1, wherein the immediate jump handler circuit detects the instruction sequence before it enters a fetch stage that includes the indirect jump target predictor circuit, and the immediate jump handler circuit is configured to disable the indirect jump target predictor circuit in response to detecting the instruction sequence.
3. The integrated circuit of claim 1, wherein the pipeline includes multiple fetch stages, the immediate jump handler circuit detects the instruction sequence as it passes through a fetch stage of the pipeline earlier than the fetch stage that includes the indirect jump target predictor circuit, and the immediate jump handler circuit is configured to disable the indirect jump target predictor circuit in response to detecting the instruction sequence.
4. The integrated circuit of claim 1, wherein the immediate jump handler circuit is configured to update a status bit in an instruction cache tag so that the indirect jump target predictor circuit is disabled when the second instruction enters the fetch stage of the pipeline that includes the indirect jump target predictor circuit.
5. The integrated circuit of claim 1, wherein the immediate jump handler circuit is configured to update a status bit in an instruction cache path predictor so that the indirect jump target predictor circuit is disabled when the second instruction enters the fetch stage of the pipeline that includes the indirect jump target predictor circuit.
6. The integrated circuit of claim 1, wherein the immediate jump handler circuit is configured to detect the instruction sequence by scanning values that appear on a memory bus as instructions are loaded into an instruction cache via the memory bus.
7. The integrated circuit of claim 1, wherein the immediate jump handler circuit is configured to detect the instruction sequence by scanning values stored in cache lines of an instruction cache.
8. The integrated circuit of claim 1, wherein the immediate jump handler circuit is configured to write the target address into a fetch target queue that is configured to receive predictions from the indirect jump target predictor circuit.
9. The integrated circuit of claim 1, wherein the target address of the second instruction is determined before the first instruction reaches a decode stage of the pipeline.
10. The integrated circuit of claim 1, wherein the processor core is configured to execute instructions of the RISC-V instruction set, the first instruction is an AUIPC instruction, and the second instruction is a JALR instruction.
11. A method, comprising: detecting an instruction sequence fetched by a processor core, wherein the instruction sequence includes a first instruction followed by a second instruction that is an indirect jump instruction, the first instruction having a result that depends on an immediate field and a program counter value of the first instruction; in response to detecting the instruction sequence, preventing an indirect jump target predictor circuit from generating a target address prediction for the second instruction; and determining, based on the immediate field and the program counter value of the first instruction, the target address of the second instruction before the first instruction is issued to an execution stage of a pipeline of the processor core.
12. The method of claim 11, wherein the instruction sequence is detected before it enters a fetch stage that includes the indirect jump target predictor circuit, and wherein preventing the indirect jump target predictor circuit from generating a target address prediction for the second instruction includes disabling the indirect jump target predictor circuit in response to detecting the instruction sequence.
13. The method of claim 11, wherein the pipeline includes multiple fetch stages, wherein the instruction sequence is detected as it passes through a fetch stage of the pipeline earlier than the fetch stage that includes the indirect jump target predictor circuit, and wherein preventing the indirect jump target predictor circuit from generating a target address prediction for the second instruction includes disabling the indirect jump target predictor circuit in response to detecting the instruction sequence.
14. The method of claim 11, wherein preventing the indirect jump target predictor circuit from generating a target address prediction for the second instruction includes updating a status bit in an instruction cache tag so that the indirect jump target predictor circuit is disabled when the second instruction enters a stage of the pipeline that includes the indirect jump target predictor circuit.
15. The method of claim 11, wherein preventing the indirect jump target predictor circuit from generating a target address prediction for the second instruction includes updating a status bit in an instruction cache path predictor so that the indirect jump target predictor circuit is disabled when the second instruction enters a stage of the pipeline that includes the indirect jump target predictor circuit.
16. The method of claim 11, wherein detecting the instruction sequence fetched by the processor core includes scanning values that appear on a memory bus as instructions are loaded into an instruction cache via the memory bus.
17. The method of claim 11, comprising writing the target address into a fetch target queue that is configured to receive predictions from the indirect jump target predictor circuit.
18. The method of claim 11, comprising: when a cache line is loaded into an instruction cache via a memory bus, checking the cache line for an indirect jump instruction by scanning values that appear on the memory bus; based on the check, updating a hint bit associated with the cache line to indicate that no indirect jump instruction is present in the cache line; and, based on the hint bit, disabling the indirect jump target predictor circuit to prevent it from generating a target address prediction when an instruction from the cache line enters a stage of the pipeline that includes the indirect jump target predictor circuit.
19. An integrated circuit, comprising: a processor core including a pipeline configured to execute instructions of the RISC-V instruction set; an indirect jump target predictor circuit in the pipeline, configured to generate a prediction of a target address of a fetched indirect jump instruction and output the prediction to a fetch target queue; and an immediate jump handler circuit configured to: detect an instruction sequence fetched by the processor core, wherein the instruction sequence includes an AUIPC instruction followed by a JALR instruction; in response to detecting the instruction sequence, disable the indirect jump target predictor circuit to prevent it from generating a target address prediction for the JALR instruction; determine, based on the immediate field and the program counter value of the AUIPC instruction, the target address of the JALR instruction before the AUIPC instruction is issued to an execution stage of the pipeline; and write the target address into an entry of the fetch target queue corresponding to the JALR instruction.
20. The integrated circuit of claim 19, wherein the immediate jump handler circuit is configured to update a status bit in an instruction cache path predictor so that the indirect jump target predictor circuit is disabled when the JALR instruction enters the fetch stage of the pipeline that includes the indirect jump target predictor circuit.