Systems and methods for selectively bypassing address generation hardware in a processor instruction pipeline
By introducing an AGEN bypass determination unit (ABDU), load / store instructions whose effective address is already known before the AGEN stage are selectively routed around that stage. This avoids the time and power wasted in the prior art and improves the efficiency and performance of the processor.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ADVANCED MICRO DEVICES INC
- Filing Date
- 2019-08-20
- Publication Date
- 2026-06-12
Figure CN112534406B_ABST
Abstract
Description
Background of the Invention
[0002] In current processor implementations, instructions are executed by an instruction pipeline, which is a set of functional units (i.e., digital logic circuits). This set of functional units includes: a branch prediction unit and a fetch unit, collectively referred to as the front end of the pipeline; a decoding unit, which includes a dispatch stage; an execution / scheduler unit (EXSC); and a load / store unit, which interfaces with a Level 1 (L1) cache, which in turn interfaces with a Level 2 (L2) cache. The instruction pipeline processes various types of instructions, including load / store instructions. Each load / store instruction is either a load instruction for retrieving data from a memory address or a store instruction for writing data to a memory address. The memory address from which a load / store instruction loads or stores data is called the effective address of the load / store instruction, and it is specified in the load / store instruction using the addressing mode.
[0003] The EXSC includes digital logic circuitry known as the address generation (AGEN) stage (also called AGEN hardware, an AGEN unit (AGU), an address calculation unit (ACU), etc.), which computes the effective address for each load / store instruction processed by the instruction pipeline. Each AGEN calculation incurs a cost, at least in terms of time and power. Each load / store instruction then passes from the EXSC to the load / store unit, which executes the instruction using the effective address computed by the AGEN stage. Because the AGEN stage computes the effective address for every load / store instruction, current processors waste time and power on load / store instructions for which all of the inputs used to compute the effective address are already known at a point in the instruction pipeline prior to the AGEN stage.
Summary of the Invention
[0004] This document discloses systems and methods for selectively bypassing the AGEN hardware in a processor instruction pipeline. Among other advantages, instead of wasting time and power using its AGEN stage to compute effective addresses for every load / store instruction in a given set of instructions (i.e., in a given instance of executable code, such as a program, application, applet, etc.), the processor uses its AGEN stage to compute effective addresses only for an identified subset of those load / store instructions. This reduces both processing time and power consumption, among other technical advantages.
[0005] In some implementations, the system and method identify situations where, at a point in the instruction pipeline prior to the AGEN stage, all inputs for the AGEN computation of the corresponding load / store instruction are known. Such load / store instructions are routed to bypass the AGEN stage and do not trigger an AGEN computation. Load / store instructions for which at least one AGEN computation input is not known at that point in the instruction pipeline are routed via the AGEN stage, so that the AGEN stage still performs the AGEN computation for those load / store instructions.
[0006] As used herein, the term "AGEN computation input" refers to an input that the AGEN stage would use to compute the effective address of a given load / store instruction if that instruction were actually routed through the AGEN stage. According to this system and method, not all load / store instructions are routed through the AGEN stage. In some cases, an AGEN computation input is known at a point in the pipeline before the AGEN stage because the input is a constant value that, by definition, does not change. Two examples of this type of load / store instruction are (i) program counter (PC) dependent (also known as instruction pointer (IP) dependent) load / store instructions, and (ii) shift-only (also known as displacement-only or immediate-displacement) load / store instructions.
[0007] In other cases, a given AGEN computation input is known at a point before the AGEN stage in the instruction pipeline because the input has a known value (stored in, for example, a register) that may nevertheless change (due to, for example, the execution of one or more other instructions). An example of this second type of load / store instruction is a stack pointer (SP) dependent load / store instruction. For this second type, the system and method monitor the dependency and allow these load / store instructions to bypass the AGEN stage only if no event has occurred that invalidates the dependency (e.g., the register depended upon being overwritten by a subsequent instruction). When such an event does occur, the system and method "revert" from allowing those load / store instructions to bypass the AGEN stage and instead route them via the AGEN stage. This incurs costs in terms of time and power, but is done to ensure correct execution.
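The dependency monitoring described above can be illustrated with a minimal software sketch. It is not the disclosed circuitry: the class `StackPointerTracker` and its method names are hypothetical, and the model assumes the ABDU simply keeps a copy of rSP that is either trusted or not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StackPointerTracker:
    """Hypothetical ABDU-side copy of rSP; None means the copy is not trusted."""
    known_value: Optional[int] = None

    def relay(self, value: int) -> None:
        # A fresh copy of rSP arrives (for example, from the register file).
        self.known_value = value

    def invalidate(self) -> None:
        # Another instruction overwrote rSP, so the held copy must not be used.
        self.known_value = None

    def may_bypass(self) -> bool:
        # Bypass is allowed only while the dependency is still valid.
        return self.known_value is not None


tracker = StackPointerTracker()
tracker.relay(0x7FF0)
first = tracker.may_bypass()    # copy is valid, so bypass is allowed
tracker.invalidate()
second = tracker.may_bypass()   # copy invalidated, so route via the AGEN stage
tracker.relay(0x7FE0)
third = tracker.may_bypass()    # fresh copy received, so bypass is allowed again
```

The sketch only captures the allow / revert behavior; how and when invalidating events are detected is left open here.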
[0008] In one implementation, for load / store instructions routed to bypass the AGEN stage, the processor performs an addition operation (for example, at a load / store unit) on the AGEN computation inputs of the load / store instruction (i.e., the operands related to its effective address) to determine the effective addresses of those load / store instructions. This still incurs costs in terms of time and power, but these costs are lower than those incurred when the AGEN stage processes the same load / store instructions. In some implementations, the addition operation is prepared at a point in the instruction pipeline before the AGEN stage by converting one or more register references (e.g., a reference to the SP register (rSP)) into the integer values currently stored in the referenced registers, so that a subsequent stage (e.g., a load / store unit) does not need to access the register to retrieve the same value.
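As a rough illustration of the two steps just described (replacing register references with their current values, then adding the operands), consider the following sketch. The function names, the register name string "rSP", and the numeric values are assumptions made for the example; address wrap-around and operand widths are not modeled.

```python
from typing import Dict, List, Union

Operand = Union[int, str]  # an integer constant or a register name (assumed encoding)


def resolve_operands(operands: List[Operand], registers: Dict[str, int]) -> List[int]:
    """Replace each register reference with the value currently held for it."""
    return [registers[op] if isinstance(op, str) else op for op in operands]


def effective_address(resolved: List[int]) -> int:
    """Plain addition, standing in for the add performed at the load / store unit."""
    return sum(resolved)


known_registers = {"rSP": 0x1000}   # illustrative value only
operands = ["rSP", 0, 0x20]         # base, index, offset
resolved = resolve_operands(operands, known_registers)
address = effective_address(resolved)
```

Because the reference to rSP is resolved before the add, the stage that performs the addition works on integers only and never reads the register file.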
[0009] This system and method address several technical problems in prior processor implementations, including the problem that every load / store instruction is routed via the AGEN stage to compute its effective address, which is time-consuming and power-intensive. The system and method offer a technical solution to this problem by determining, in digital logic at a processor pipeline stage prior to the AGEN stage, whether the effective address of each load / store instruction is already known. If not, the load / store instruction is routed via the AGEN stage. If so, the load / store instruction is routed to bypass the AGEN stage.
[0010] One embodiment takes the form of a method executed by one or more processors. The method includes receiving a load / store instruction at an AGEN bypass determination unit (ABDU) of the processor. If the effective address of the load / store instruction is not known at the ABDU, the load / store instruction is routed via the AGEN stage of the processor. However, if the effective address of the load / store instruction is known at the ABDU, the load / store instruction is routed to bypass the AGEN stage. Another embodiment takes the form of an integrated circuit having instructions that, when executed, cause the integrated circuit, or a system in which the integrated circuit is embedded or otherwise mounted, or to which it is communicatively connected, to perform the method. Another embodiment takes the form of a system having a processor and a non-transitory data storage device containing instructions that, when executed by the processor, cause the system to perform the method.
[0011] Another implementation takes the form of a processor that includes an AGEN stage and an ABDU. The ABDU receives load / store instructions. If the effective address of a load / store instruction is unknown at the ABDU, the load / store instruction is routed via the AGEN stage. However, if the effective address of the load / store instruction is known at the ABDU, the load / store instruction is routed to bypass the AGEN stage.
[0012] Another embodiment takes the form of a non-transitory computer-readable medium containing instructions executable by an integrated circuit manufacturing system to manufacture a processor having at least the elements listed in the previous paragraph. In at least one such embodiment, the instructions include a register-transfer level (RTL) representation of the processor. In at least one other such embodiment, the instructions include hardware description language (HDL) instructions representing the processor.
[0013] In one implementation, the effective address of the load / store instruction is known at the ABDU when the load / store instruction is a PC-dependent load / store instruction and / or a shift-only load / store instruction. In other examples, the effective address of the load / store instruction is known at the ABDU when (i) the load / store instruction is an SP-dependent load / store instruction and (ii) the ABDU has the current value of rSP.
[0014] In one implementation, the AGEN stage uses multiple effective address inputs of the load / store instruction to calculate the effective address of the load / store instruction. The effective address of the load / store instruction is unknown at the ABDU when at least one of the effective address inputs of the load / store instruction is unknown at the ABDU. The effective address of the load / store instruction is known at the ABDU when each of the effective address inputs of the load / store instruction is known at the ABDU.
[0015] In one embodiment, the processor includes: a load / store unit; a first circuit path that is communicatively coupled to the ABDU and the load / store unit and that includes the AGEN stage; and a second circuit path that is communicatively coupled to the ABDU and the load / store unit and that bypasses the AGEN stage. Routing load / store instructions via the AGEN stage includes routing them via the first circuit path. Routing load / store instructions to bypass the AGEN stage includes routing them via the second circuit path. In another embodiment, routing a load / store instruction via the second circuit path includes asserting a bypass qualification flag corresponding to the load / store instruction. The load / store unit processes load / store instructions whose corresponding bypass qualification flag is asserted, and discards load / store instructions whose corresponding bypass qualification flag is cleared.
[0016] In one implementation, the processor executes the method each clock cycle with respect to a first integer number of load / store instructions. The method also includes asserting a corresponding bypass qualification flag for each load / store instruction routed via the second circuit path. The load / store unit processes load / store instructions whose corresponding bypass qualification flags are asserted and discards load / store instructions whose corresponding bypass qualification flags are cleared. One such implementation includes asserting corresponding bypass qualification flags for up to a second integer number of load / store instructions per clock cycle, wherein the second integer number is less than the first integer number. In one such implementation, the load / store unit has exactly the second integer number of load / store pipelines.
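The per-cycle limit on asserted flags can be pictured with a small sketch. The function names are hypothetical, and the policy of favoring the earliest eligible instructions in the list is an assumption, since the text does not say which instructions receive the flags when more are eligible than the limit allows.

```python
from typing import List


def assert_bypass_flags(eligible: List[bool], max_flags: int) -> List[bool]:
    """Assert a flag for at most `max_flags` bypass-eligible instructions.

    `eligible[i]` says whether instruction i was routed via the bypass path.
    Favoring the earliest eligible instructions is an assumption of this sketch.
    """
    flags = []
    asserted = 0
    for is_eligible in eligible:
        flag = is_eligible and asserted < max_flags
        flags.append(flag)
        if flag:
            asserted += 1
    return flags


def load_store_unit_accepts(flags: List[bool]) -> List[int]:
    """Indices processed by the load / store unit; the others are discarded."""
    return [i for i, flag in enumerate(flags) if flag]


# Six instructions examined in one cycle (first integer number),
# two load / store pipelines (second integer number).
flags = assert_bypass_flags([True, False, True, True, False, True], max_flags=2)
accepted = load_store_unit_accepts(flags)
```

In this example four instructions are bypass-eligible, but only two flags are asserted, matching the two load / store pipelines.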
[0017] In one implementation, the load / store unit calculates the effective address of a load / store instruction received by the load / store unit via a second circuit path. In one implementation, the load / store instruction includes a reference to a register, and the method includes replacing the reference in the load / store instruction with a value currently stored in the register.
[0018] This document describes further variations and permutations of the embodiments listed above. Furthermore, it is expressly noted that such variations and permutations can be implemented with respect to any embodiment, including any method embodiment, any system embodiment, and any embodiment of a computer-readable medium containing integrated circuit manufacturing instructions, regardless of the type of embodiment in connection with which they are primarily described herein. Moreover, this flexibility and cross-applicability of embodiments persists despite the use of any slightly different language (e.g., process, method, step, function, set of functions, etc.) to describe and / or characterize such embodiments.
Brief Description of the Drawings
[0019] A more detailed understanding can be obtained from the following description, presented by way of example in conjunction with the accompanying drawings, in which the same reference numerals are used for the same elements.
[0020] Figure 1 is a simplified diagram of an example processor-based device including an example processor, according to one implementation.
[0021] Figure 2 is a partial view of a first example instruction pipeline of the processor of Figure 1, according to one implementation.
[0022] Figure 3 is a partial view of a second example instruction pipeline of the processor of Figure 1, according to one implementation.
[0023] Figure 4 is a partial view of a third example instruction pipeline of the processor of Figure 1, according to one implementation.
[0024] Figure 5 is a partial view of a fourth example instruction pipeline of the processor of Figure 1, according to one implementation, wherein the example AGEN bypass determination unit (ABDU) resides in the dispatch stage of the decoding unit of the fourth example instruction pipeline.
[0025] Figure 6 is a simplified diagram of the ABDU of Figure 5 in a first example circuit configuration, according to one implementation.
[0026] Figure 7 is a simplified diagram of the ABDU of Figure 5 in a second example circuit configuration, according to one implementation.
[0027] Figure 8 is a flowchart of an example implementation of the path selection logic in an ABDU, according to one implementation.
[0028] Figure 9 is a flowchart of an example method for selectively bypassing address generation hardware, according to one implementation.
[0029] Figure 10 is a flowchart of an example implementation of the load / store instruction routing that forms part of the method of Figure 9, according to one implementation.
[0030] Detailed Description
[0031] To facilitate understanding of the principles of this disclosure, reference is made to embodiments illustrated in the accompanying drawings, which are described below. The embodiments disclosed herein are not intended to be exhaustive or to limit this disclosure to the precise forms disclosed in the following detailed description. Rather, the embodiments were chosen and described so that others skilled in the art can utilize the teachings of these embodiments. Therefore, it is not intended to limit the scope of this disclosure.
[0032] Throughout this disclosure and the claims, ordinal modifiers such as first, second, third, and fourth are used to refer to various parts, data (such as various identifiers), and / or other elements. Such use is not intended to indicate or specify a particular or required order of elements. Rather, the ordinal modifiers are used to assist the reader in identifying the referenced element and distinguishing it from other elements, and should not be interpreted narrowly as adhering to any particular order.
[0033] Figure 1 shows an example processor-based device 100, which includes a processor 102, a data storage device 104, a communication interface 106, and an optional user interface 108, all of which are communicatively interconnected via a bus structure 110. The processor-based device 100 may include different components, because Figure 1 is given by way of example. As an example, the processor-based device 100 may be a computer, personal computer, desktop computer, workstation, laptop computer, tablet computer, cellular phone, smartphone, wearable device, personal digital assistant (PDA), set-top box, game console, game controller, server, printer, or any other processor-based device.
[0034] Processor 102 may be a microprocessor, central processing unit (CPU), graphics processing unit (GPU), one or more processor cores, or any other type of processor that implements an instruction pipeline and is equipped and configured to embody and / or perform one or more embodiments of the system and method. Data storage device 104 may be any type of non-transitory data storage device, such as random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic disk, optical disk, etc.
[0035] In one embodiment, communication interface 106 includes a wired communication interface for communicating with one or more other processor-based devices and / or other communication entities according to a wired communication protocol (such as Ethernet). In one embodiment, instead of or in addition to the wired communication interface, communication interface 106 includes a wireless communication interface, which includes corresponding hardware, firmware, etc., for wirelessly communicating with one or more devices and / or other entities using one or more wireless communication protocols (such as WiFi, Bluetooth, LTE, WiMAX, CDMA, etc.).
[0036] User interface 108 is not present in all cases of processor-based device 100. For example, if processor-based device 100 is a web server, a user interface may not exist. When user interface 108 is present, it includes one or more input devices and / or one or more output devices. The one or more input devices may include a touchscreen, keyboard, mouse, microphone, etc., while the one or more output devices may include a display (e.g., a touchscreen), one or more speakers, one or more indicator light-emitting diodes (LEDs), etc.
[0037] Figure 2 is a partial view of an example instruction pipeline of processor 102, depicting processor 102 as including ABDU 200 and AGEN stage 204. In some embodiments, ABDU 200 is implemented as deterministic digital logic circuitry, but in other embodiments, ABDU 200 is implemented as a state machine or a combination of state machines. In one embodiment, AGEN stage 204 is implemented as deterministic digital logic circuitry and, as known in the art, may include one or more arithmetic logic units (ALUs), one or more registers of its own, etc., and computes effective addresses using one or more types of arithmetic (e.g., linear arithmetic, modular arithmetic, etc.).
[0038] As shown in Figure 2, ABDU 200 receives load / store instruction 206 via communication path 208 and routes load / store instruction 206 via either a first circuit path, referred to herein as AGEN path 201, or a second circuit path, referred to herein as AGEN bypass path 202. Both AGEN path 201 and AGEN bypass path 202 are circuit paths that include hardware such as wiring, contacts, pins, circuit elements such as flip-flops, and / or other hardware for conveying electrical signals.
[0039] AGEN path 201 includes AGEN stage 204. AGEN bypass path 202 does not include AGEN stage 204. When the effective address of load / store instruction 206 is unknown at ABDU 200, ABDU 200 routes load / store instruction 206 via AGEN path 201. Conversely, when the effective address of load / store instruction 206 is known at ABDU 200, ABDU 200 routes load / store instruction 206 via AGEN bypass path 202. In one embodiment, the effective address of load / store instruction 206 is unknown at ABDU 200 when at least one of the inputs used to calculate the effective address is unknown, and is known at ABDU 200 when each of the inputs used to calculate the effective address is known. ABDU 200 does not need to actually calculate the effective address of load / store instruction 206.
[0040] Figure 3 depicts another example of the instruction pipeline of processor 102, which also includes a load / store unit 300 and an L1 data cache 302. Both AGEN path 201 and AGEN bypass path 202 extend between ABDU 200 and the load / store unit 300. The load / store unit 300 receives load / store instructions via AGEN path 201 and AGEN bypass path 202 and processes those load / store instructions with respect to the L1 data cache 302 via data paths 304 and 306. The L1 data cache 302 interfaces with an L2 data cache (not shown). In one embodiment, when processing a load / store instruction received via AGEN bypass path 202, the load / store unit 300 determines the effective address of the load / store instruction by performing an addition operation on two or more operands of the load / store instruction.
[0041] Figure 4 depicts an embodiment of the instruction pipeline of processor 102 that includes: a decoding unit 402, within which ABDU 200 resides; an EXSC 404, within which AGEN stage 204 resides; and a load / store and data cache unit (LSDC) 406, which performs substantially the functions of the load / store unit 300 of Figure 3 but incorporates the L1 data cache 302 as an element. EXSC 404 includes a physical register file (PRF) 410 and a digital logic device referred to herein as a register value upstream relay (RVUR) 412, which transmits one or more register values 414 read from the PRF 410 to ABDU 200 via data link 416. LSDC 406 includes selector circuitry 418 that interfaces with the AGEN path 201, the AGEN bypass path 202, and the L1 data cache 302.
[0042] Figure 5 depicts an implementation in which dispatch stage 500 resides within decoding unit 402 and includes ABDU 200. Data links 501 and 502 form the initial portions of AGEN path 201 and AGEN bypass path 202, respectively.
[0043] Figure 6 depicts an example in which ABDU 200 includes path selection logic circuitry 600 and path switching circuitry 602. The inputs to path selection logic circuitry 600 are load / store instruction 206 and register value 414, and its outputs are load / store instruction 206 (via data link 604) and switching control signal 606. Path selection logic circuitry 600 implements path selection logic 601. The inputs to path switching circuitry 602 are load / store instruction 206 (via data link 604) and switching control signal 606, and the output of path switching circuitry 602 is load / store instruction 206 on either data link 501 or data link 502.
[0044] The path switching circuitry 602 includes a switching point 607, a switchable data link 608, a contact 610 at the initial end of the data link 501, and a contact 612 at the initial end of the data link 502. Figure 6 depicts ABDU 200 routing the load / store instruction 206 via the AGEN path 201: the switchable data link 608 extends from the switching point 607 to the contact 610, causing the path switching circuitry 602 (and therefore ABDU 200) to output the load / store instruction 206 on the data link 501.
[0045] Figure 7 depicts an example in which ABDU 200 routes load / store instruction 206 via AGEN bypass path 202. Switchable data link 608 extends from switching point 607 to contact 612, causing path switching circuitry 602 (and therefore ABDU 200) to output load / store instruction 206 on data link 502.
[0046] Figure 8 depicts an example implementation of path selection logic 601, as implemented by path selection logic circuitry 600. At step 802, path selection logic circuitry 600 receives load / store instruction 206 from the fetch stage (not shown) or another stage of the instruction pipeline of processor 102.
[0047] At step 804, the path selection logic circuitry 600 determines whether all effective address inputs of the load / store instruction 206 are known. If it is determined at step 804 that not all effective address inputs of the load / store instruction 206 are known, then at step 806, the path selection logic circuitry 600 sets the switching control signal 606 to AGEN, which can be implemented as a binary logic 0. However, if it is determined at step 804 that all effective address inputs of the load / store instruction 206 are known, then at step 808, the path selection logic circuitry 600 sets the switching control signal 606 to AGEN-Bypass, which can be implemented as a binary logic 1. At step 810, the path selection logic circuitry 600 outputs both the load / store instruction 206 and the switching control signal 606 (set to AGEN or AGEN-Bypass).
[0048] In one implementation, where the load / store instruction 206 includes one or more references to one or more registers, path selection logic 601 requires that ABDU 200 have the current value of each such register as a necessary condition for determining at step 804 that all effective address inputs of the load / store instruction 206 are known. In one example, ABDU 200 obtains such a value as register value 414 from RVUR 412.
[0049] When the path selection logic circuitry 600 sets the switching control signal 606 to AGEN, the path switching circuitry 602 responsively places the switchable data link 608 in the position shown in Figure 6, and ABDU 200 routes load / store instruction 206 via the AGEN path 201. When the path selection logic circuitry 600 instead sets the switching control signal 606 to AGEN-Bypass, the path switching circuitry 602 responsively places the switchable data link 608 in the position shown in Figure 7, and ABDU 200 routes the load / store instruction 206 via the AGEN bypass path 202.
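The selection and switching behavior of Figures 6 to 8 can be mimicked in software as follows. This is a behavioral sketch only: the function names and the instruction strings are hypothetical, while the 0 / 1 encoding of the control signal and the data link numbers follow the description above.

```python
from typing import List, Tuple

AGEN = 0         # binary logic 0: route via the AGEN path
AGEN_BYPASS = 1  # binary logic 1: route via the AGEN bypass path


def select_path(inputs_known: List[bool]) -> int:
    """Steps 804 to 808: bypass only when every effective address input is known."""
    return AGEN_BYPASS if all(inputs_known) else AGEN


def switch_path(instruction: str, control: int) -> Tuple[str, str]:
    """Path switching: place the instruction on data link 501 or data link 502."""
    link = "data link 502" if control == AGEN_BYPASS else "data link 501"
    return link, instruction


# Hypothetical instruction text, used only as a payload for the example.
routed_bypass = switch_path("load r1, [rSP + 8]", select_path([True, True, True]))
routed_agen = switch_path("load r1, [r2 + r3 + 8]", select_path([False, False, True]))
```

A single unknown input is enough to send the instruction down data link 501, which is the conservative choice the flowchart describes.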
[0050] Figure 9 is a flowchart of an example method 900 for selectively bypassing the AGEN hardware. Unless otherwise stated, method 900 is described below with reference to the instruction pipeline depicted in Figure 4. For example, references to the load / store unit are to the load / store unit of Figure 4 (i.e., LSDC 406) rather than to the load / store unit 300 of Figure 3. In some embodiments, ABDU 200 performs method 900 once per clock cycle with respect to a single load / store instruction, while in other embodiments, ABDU 200 performs method 900 once per clock cycle with respect to multiple load / store instructions.
[0051] At step 902, ABDU 200 receives load / store instruction 206 from the fetch stage (not shown) or another stage of the instruction pipeline of processor 102. In one embodiment, load / store instruction 206 includes all the information required by ABDU 200 to determine whether to route load / store instruction 206 via AGEN path 201 or via AGEN bypass path 202. Method 900 also includes steps 906 and 908. In any given instance of ABDU 200 performing method 900, ABDU 200 performs step 906 or step 908 depending on whether the effective address of load / store instruction 206 is known at ABDU 200, as represented by decision block 904 in Figure 9.
[0052] If the effective address of load / store instruction 206 is unknown at ABDU 200, then at step 906, ABDU 200 routes load / store instruction 206 via AGEN stage 204. In one embodiment, the effective address of load / store instruction 206 is unknown at ABDU 200 when at least one of the inputs used to calculate the effective address of load / store instruction 206 is unknown at ABDU 200. In one embodiment, ABDU 200 performs step 906 by routing load / store instruction 206 via AGEN path 201, which in one embodiment traverses EXSC 404 and includes AGEN stage 204 residing therein.
[0053] However, if the effective address of load / store instruction 206 is known at ABDU 200, then at step 908, ABDU 200 routes load / store instruction 206 to bypass AGEN stage 204. In one embodiment, the effective address of load / store instruction 206 is known at ABDU 200 when each of the inputs used to calculate the effective address of load / store instruction 206 is known at ABDU 200. In one embodiment, ABDU 200 performs step 908 by routing load / store instruction 206 via AGEN bypass path 202. In some embodiments, AGEN bypass path 202 traverses EXSC 404 (but not AGEN stage 204). In other embodiments, AGEN bypass path 202 does not traverse EXSC 404.
[0054] In various implementations, there are many different ways and situations in which ABDU 200 selectively executes step 906 or step 908 with respect to a given load / store instruction, as indicated by decision block 904. To illustrate some of those options, assume that processor 102 uses a “base + index + offset” addressing scheme, according to which load / store instruction 206 has the following structure (simplified for the purposes of this disclosure; other fields may exist and other addressing schemes may be used):
[0055]
[0056] This is an instruction whose opcode is “load”: it loads into a register named “Register 1” the value stored at an address in memory, the address being the sum of: (i) the value in the “Base Address” field or stored in the register identified in the “Base Address” field, (ii) the value in the “Index” field or stored in the register identified in the “Index” field, and (iii) the value in the “Offset” field.
[0057] In one implementation, ABDU 200 selectively performs step 906 or step 908 with respect to load / store instruction 206 by determining whether ABDU 200 has the current value of each of the base address, index, and offset fields of load / store instruction 206. In the typical case where the offset field contains a constant (as opposed to a reference or pointer to a value stored elsewhere), ABDU 200 may consider the offset to be known. ABDU 200 may consider the base address and index to be known if they are constants (i.e., 0 or another integer), or if they contain a reference to a register (such as the PC, rSP, or any other register) whose current value ABDU 200 has. One way ABDU 200 may have the current value of a referenced register is if RVUR 412 recently relayed a copy of the data stored in that register to ABDU 200.
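The per-field check described in this paragraph might be modeled as below. The encoding is an assumption made for illustration: a field is either an integer constant or a register name string, and `relayed` is a hypothetical dictionary of register values that the ABDU currently holds (for instance, copies relayed by RVUR 412).

```python
from typing import Dict, Union

Field = Union[int, str]  # integer constant, or the name of a referenced register


def field_is_known(field: Field, relayed: Dict[str, int]) -> bool:
    """A constant is always known; a register is known only if its value is held."""
    if isinstance(field, int):
        return True
    return field in relayed


def all_fields_known(base: Field, index: Field, offset: Field,
                     relayed: Dict[str, int]) -> bool:
    """True when the base address, index, and offset fields are all known."""
    return all(field_is_known(f, relayed) for f in (base, index, offset))


relayed_values = {"rSP": 0x2000}  # illustrative copy of rSP held by the ABDU
sp_case = all_fields_known("rSP", 0, 16, relayed_values)    # rSP value is held
gpr_case = all_fields_known("r5", 0, 16, relayed_values)    # r5 value is not held
const_case = all_fields_known(0, 0, 64, relayed_values)     # constants only
```

Only the second case would require the AGEN stage, because the value of the hypothetical register r5 has not been relayed.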
[0058] In one implementation, when load / store instruction 206 is a PC-dependent load / store instruction, the effective address of load / store instruction 206 is known at ABDU 200. PC (also known as Instruction Pointer (IP)) is a register that stores the address of the current instruction being executed by processor 102 (or, in some cases, to be executed next). Modifying the example instruction structure above to a PC-dependent instruction produces the instruction shown here:
[0059]
[0060] The effective address of the instruction is the sum of the value in the PC register and the value in the offset field of the instruction (in some cases, there is a non-zero constant in the index field, which is also included in the sum).
[0061] In one implementation, when load / store instruction 206 is a shift-only load / store instruction (such as the one shown here), the effective address of load / store instruction 206 is known at ABDU 200:
[0062]
[0063] The effective address of this instruction is the value in the offset field. In some cases, one or both of the base address and index fields contain non-zero constants. In such cases, the effective address is still the sum of the base address, index, and offset fields, but it is no longer equal to the value in the offset field alone.
[0064] In one implementation, when (i) load / store instruction 206 is an SP-dependent load / store instruction and (ii) ABDU 200 has the current value of rSP, the effective address of load / store instruction 206 is known at ABDU 200, where rSP is a register that holds the memory address of the current top of the stack (also known as the call stack, execution stack, program stack, control stack, runtime stack, machine stack, etc.). An example SP-dependent load / store instruction is shown here:
[0065]
[0066] The effective address of this instruction is the sum of the value in rSP and the value in the offset field (and any non-zero values present in the index field).
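As a worked illustration of the three cases above (PC-relative, offset-only, and SP-relative), the following sketch sums base, index, and offset once every input is known. The operand encoding, the helper names, and the register values are made up for the example and are not taken from this disclosure.

```python
from typing import Dict, Optional, Union

Operand = Union[int, str]  # constant or register name (illustrative encoding)


def resolve(op: Operand, regs: Dict[str, int]) -> Optional[int]:
    """Return the operand's value, or None if the register value is not held."""
    return op if isinstance(op, int) else regs.get(op)


def effective_address(base: Operand, index: Operand, offset: int,
                      regs: Dict[str, int]) -> Optional[int]:
    """Sum of base, index, and offset; None when an input is unknown."""
    b, i = resolve(base, regs), resolve(index, regs)
    if b is None or i is None:
        return None  # an input is unknown, so the AGEN stage must compute it
    return b + i + offset


regs = {"PC": 0x1000, "rSP": 0x7000}  # made-up register values

pc_relative = effective_address("PC", 0, 0x20, regs)   # 0x1000 + 0 + 0x20
offset_only = effective_address(0, 0, 0x40, regs)      # 0 + 0 + 0x40
sp_relative = effective_address("rSP", 8, 0x10, regs)  # 0x7000 + 8 + 0x10
```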
[0067] Figure 10 is a flowchart illustrating an example implementation of the load / store instruction routing represented by decision box 904 of Figure 9. At step 1002, ABDU 200 parses load / store instruction 206. In this example, load / store instruction 206 has the following form:
[0068]
[0069]
[0070] where "l / s" represents either a "load" or a "store" opcode.
[0071] At step 1004, ABDU 200 determines whether the base address field of load / store instruction 206 contains a reference to the PC, i.e., whether load / store instruction 206 is a PC-relative load / store instruction. In one embodiment, step 1004 includes a second necessary condition, namely that ABDU 200 has the current value of the PC. If it is determined at step 1004 that the base address field of load / store instruction 206 does contain a reference to the PC, then at step 908, ABDU 200 routes the load / store instruction to bypass AGEN stage 204. However, if it is determined at step 1004 that the base address field of load / store instruction 206 does not contain a reference to the PC, then control proceeds to step 1006, where ABDU 200 determines whether both the base address field and the index field of load / store instruction 206 are equal to zero, i.e., whether load / store instruction 206 is an offset-only load / store instruction.
[0072] If it is determined at step 1006 that both the base address field and the index field of load / store instruction 206 are equal to zero, then at step 908, ABDU 200 routes the load / store instruction to bypass AGEN stage 204. However, if it is determined at step 1006 that the base address field and the index field of load / store instruction 206 are not both equal to zero, i.e., at least one of those two fields is not equal to zero, then control proceeds to step 1008, where ABDU 200 determines whether the base address field of load / store instruction 206 contains a reference to rSP, i.e., whether load / store instruction 206 is an SP-relative load / store instruction.
[0073] If it is determined at step 1008 that the base address field of load / store instruction 206 does contain a reference to rSP, then at step 908, ABDU 200 routes load / store instruction 206 to bypass AGEN stage 204. However, if it is determined at step 1008 that the base address field of load / store instruction 206 does not contain a reference to rSP, then at step 906, ABDU 200 routes load / store instruction 206 via AGEN stage 204. In one embodiment, step 1008 includes a second necessary condition, namely that ABDU 200 has the current value of rSP. In some embodiments, steps 1004, 1006, and 1008 are performed simultaneously on load / store instruction 206 as a logical OR of the three cases.
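The decision flow of steps 1004, 1006, and 1008 can be sketched as below. This is a simplified model under stated assumptions: operands are encoded as integer constants or register-name strings, `known_regs` stands in for the register values the ABDU currently holds, and the optional "second necessary condition" of steps 1004 and 1008 is applied. The `Route` enum and `route` function are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Dict, Union

Operand = Union[int, str]  # constant or register name (illustrative encoding)


class Route(Enum):
    AGEN_PATH = 201    # step 906: route via the AGEN stage
    BYPASS_PATH = 202  # step 908: bypass the AGEN stage


def route(base: Operand, index: Operand, known_regs: Dict[str, int]) -> Route:
    # Step 1004: PC-relative, and the ABDU holds the current PC value.
    if base == "PC" and "PC" in known_regs:
        return Route.BYPASS_PATH
    # Step 1006: offset-only (base and index fields both equal to zero).
    if base == 0 and index == 0:
        return Route.BYPASS_PATH
    # Step 1008: SP-relative, and the ABDU holds the current rSP value.
    if base == "rSP" and "rSP" in known_regs:
        return Route.BYPASS_PATH
    # None of the three cases applies: the AGEN stage computes the address.
    return Route.AGEN_PATH
```

Because the three checks are independent, hardware could evaluate them in parallel and OR the results; the sequential form here simply mirrors the order of the flowchart.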
[0074] In some implementations, processor 102 implements control flow for the AGEN bypass path 202. In such implementations, the AGEN bypass path 202 includes not only the path for load / store instructions routed by ABDU 200, but also signaling paths that carry control information associated with, and communicated in parallel with, those load / store instructions. In some implementations, this control information takes the form of binary flags (referred to as "bypass eligibility flags") that are transmitted along the AGEN bypass path 202 in parallel with each load / store instruction routed via that path. A bypass eligibility flag is asserted (i.e., set, equal to 1) to indicate that the corresponding load / store instruction is eligible to bypass AGEN stage 204, and is cleared (i.e., reset, equal to 0) to indicate that the corresponding load / store instruction is not eligible to bypass AGEN stage 204.
[0075] In embodiments implementing this control flow, one or more components of the instruction pipeline (i) process load / store instructions on the AGEN bypass path 202 whose bypass eligibility flag is asserted, and (ii) ignore load / store instructions on the AGEN bypass path 202 whose bypass eligibility flag is cleared. Such components include LSDC 406 and, in some embodiments, EXSC 404 and / or one or more other components.
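A downstream component's handling of the flag can be sketched as a simple filter. The `BypassEntry` record and the `lsdc_consume` function are hypothetical names; the sketch only illustrates the process-if-asserted, ignore-if-cleared behavior described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BypassEntry:
    tag: str        # illustrative identifier for a load / store instruction
    eligible: bool  # bypass eligibility flag carried in parallel with it


def lsdc_consume(entries: List[BypassEntry]) -> List[str]:
    """Process entries whose flag is asserted; ignore those whose flag is cleared."""
    return [entry.tag for entry in entries if entry.eligible]
```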
[0076] In another example, this control flow is not employed. In this case, (i) each load / store instruction 206 evaluated by ABDU 200 for AGEN bypass eligibility is routed via only one of the two paths (i.e., AGEN path 201 or AGEN bypass path 202) rather than both, and (ii) only relatively simple types of load / store instructions (e.g., offset-only) are eligible to bypass AGEN stage 204. Control flow can still be implemented in this type of implementation, but because those relatively simple types of load / store instructions never become ineligible to bypass AGEN stage 204, it is unnecessary.
[0077] In some implementations, load / store instructions with register-relative (e.g., rSP-relative) addressing are eligible for AGEN bypass. In at least some of these implementations, control flow is implemented such that the bypass eligibility flag for each load / store instruction routed via AGEN bypass path 202 is initially asserted. If processor 102 later determines that an instruction is no longer eligible for AGEN bypass (e.g., because the instruction depends on an rSP value that has become invalid), processor 102 clears the corresponding bypass eligibility flag and rolls back its processing so that the instruction can subsequently be routed via AGEN path 201.
[0078] In some implementations, every load / store instruction that ABDU 200 evaluates for AGEN bypass eligibility is transmitted via the AGEN bypass path 202. For a load / store instruction that ABDU 200 determines to be eligible for AGEN bypass, the corresponding bypass eligibility flag is asserted (and, in the language of this disclosure, the instruction is considered to have been routed via the AGEN bypass path 202). All other load / store instructions are still transmitted along the AGEN bypass path 202, but with their corresponding bypass eligibility flags cleared, and are accordingly ignored.
[0079] In one implementation, in the case of a load / store instruction that contains one or more register references and has already been routed via the AGEN bypass path 202, processor 102 clears the corresponding bypass eligibility flag if processor 102 later determines, for example, that one of those register references has become invalid. One example of this is when processor 102 determines that a write operation is pending for a register referenced by the load / store instruction. Another example is when processor 102 determines that an instruction following the load / store instruction has changed the value contained in a register referenced by the load / store instruction.
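The invalidation described above can be sketched as clearing the flag of every in-flight bypass instruction that references a register whose value is about to change. The `InFlight` record and `on_register_write` function are assumptions introduced for this example; how a real pipeline detects the pending write is outside the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class InFlight:
    tag: str                  # illustrative instruction identifier
    reg_refs: FrozenSet[str]  # registers the instruction's address depends on
    eligible: bool = True     # bypass eligibility flag, initially asserted


def on_register_write(in_flight: List[InFlight], reg: str) -> List[str]:
    """Clear the flag of each still-eligible instruction that references `reg`.

    Returns the tags of the instructions that were invalidated.
    """
    invalidated = []
    for ins in in_flight:
        if ins.eligible and reg in ins.reg_refs:
            ins.eligible = False
            invalidated.append(ins.tag)
    return invalidated
```

An offset-only instruction has an empty `reg_refs` set in this model, so no register write can invalidate it.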
[0080] In one implementation, ABDU 200 replaces any register references in load / store instruction 206 with a copy of the data (e.g., an integer) currently stored in the referenced register. This can be performed by ABDU 200 using information from register value 414. In implementations operating in this manner, this step eliminates the need for any downstream entity to spend time and energy retrieving data already held by ABDU 200.
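A minimal sketch of this substitution follows, assuming instruction fields are held in a dictionary and that register references are encoded as register-name strings. Both the encoding and the `substitute_registers` name are hypothetical.

```python
from typing import Dict, Union

Operand = Union[int, str]  # constant or register name (illustrative encoding)


def substitute_registers(fields: Dict[str, Operand],
                         values: Dict[str, int]) -> Dict[str, Operand]:
    """Replace each register reference with the value held for that register.

    References to registers whose value is not held are left unchanged.
    """
    return {
        name: values.get(op, op) if isinstance(op, str) else op
        for name, op in fields.items()
    }
```

For example, `{"base": "rSP", "index": 0, "offset": 16}` with `values = {"rSP": 0x7000}` yields a record whose base is the integer `0x7000`, so no downstream entity has to read rSP again.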
[0081] In some implementations, ABDU 200 evaluates, in a given clock cycle, whether the effective address of each of a plurality of load / store instructions is known at ABDU 200, and routes each evaluated load / store instruction accordingly via AGEN path 201 or AGEN bypass path 202. In some cases, this results in ABDU 200 routing multiple load / store instructions via AGEN bypass path 202 in a given clock cycle. Any number of load / store instructions can be evaluated and routed in parallel in this way. In one implementation, ABDU 200 processes up to six load / store instructions in parallel per clock cycle.
[0082] In some implementations, ABDU 200 limits the number of load / store instructions it routes via the AGEN bypass path 202 in a given clock cycle. In some such cases, the upper limit in a given clock cycle is equal to the number of load / store pipelines that LSDC 406 has. Thus, in one example, even though ABDU 200 is able to route up to six load / store instructions per clock cycle via the AGEN bypass path 202, ABDU 200 never actually routes more than three load / store instructions per clock cycle via the AGEN bypass path 202 because, in this example, LSDC 406 has only three load / store pipelines.
[0083] In different implementations, ABDU 200 enforces the limit in different ways. In some implementations, ABDU 200 routes no more than the limit number of load / store instructions per clock cycle via the AGEN bypass path 202, for example by asserting no more than that number of bypass eligibility flags per clock cycle. In other implementations, ABDU 200 implements a second control flag for each load / store instruction. This second control flag is referred to herein as the bypass selection flag, and a load / store instruction on the AGEN bypass path 202 is processed by, for example, LSDC 406 only if both the corresponding bypass eligibility flag and the corresponding bypass selection flag are asserted. The two-flag option may provide more flexibility, but at the cost of additional resources.
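The two enforcement options can be contrasted with a short sketch. It assumes, purely for illustration, that instructions are considered in program order and that the first eligible ones win; the disclosure does not specify a selection order, and the function names are hypothetical.

```python
from typing import List, Tuple


def single_flag(eligible: List[bool], limit: int) -> List[bool]:
    """Option 1: assert at most `limit` bypass eligibility flags per cycle."""
    out, used = [], 0
    for is_eligible in eligible:
        keep = is_eligible and used < limit
        out.append(keep)
        used += int(keep)
    return out


def two_flags(eligible: List[bool], limit: int) -> List[Tuple[bool, bool]]:
    """Option 2: leave eligibility intact and cap a separate selection flag.

    Returns (eligibility flag, selection flag) pairs.
    """
    selected = single_flag(eligible, limit)
    return list(zip(eligible, selected))


def processed_on_bypass(flags: Tuple[bool, bool]) -> bool:
    """An instruction is processed on the bypass path only if both flags are set."""
    eligible, selected = flags
    return eligible and selected
```

With six eligible instructions and a limit of three, `single_flag` asserts only the first three flags. Under `two_flags`, an instruction that was eligible but not selected remains distinguishable from one that was never eligible, which is where the extra flexibility (and the extra flag bit) comes from.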
[0084] In some cases, at least two load / store instructions routed via the AGEN bypass path 202 in a given clock cycle are still eligible for AGEN bypass when they traverse EXSC 404, because no invalidation event has occurred with respect to them. In some such implementations, EXSC 404 selects one or more of those still-eligible instructions to be executed on the AGEN bypass path 202 and discards the others. EXSC 404 may make this selection randomly, or it may apply a strategy, such as favoring load / store instructions that do not depend on any register over those that do (to reduce the probability of incurring the cost of invalidating load / store instructions that were originally routed on the AGEN bypass path 202). In at least some of these implementations, EXSC 404 tracks its selections and informs one or more other components of them. In an implementation that applies a full rollback strategy as soon as a load / store instruction on the AGEN bypass path 202 loses its AGEN bypass eligibility, EXSC 404 notifies upstream entities, such as ABDU 200, the fetch unit, etc., so that the relevant load / store instruction is instead routed via the AGEN path 201 and the pipeline is flushed if necessary.
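One possible selection strategy of the kind mentioned above is sketched here: rank the still-eligible instructions by how many registers their address depends on and keep the lowest-ranked ones. The `Candidate` record, the `reg_deps` count, and the `slots` parameter are assumptions for the example; the disclosure does not prescribe a particular ranking.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    tag: str       # illustrative instruction identifier
    reg_deps: int  # number of registers the effective address depends on


def select_for_bypass(still_eligible: List[Candidate], slots: int) -> List[str]:
    """Favor instructions with fewer register dependencies.

    sorted() is stable, so ties keep their original (e.g., program) order.
    """
    ranked = sorted(still_eligible, key=lambda c: c.reg_deps)
    return [c.tag for c in ranked[:slots]]
```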
[0085] In some implementations, a copy of each load / store instruction evaluated by ABDU 200 is transmitted along both the AGEN path 201 and the AGEN bypass path 202, and the corresponding control flags are available to entities on both paths. In such implementations, in the language of this disclosure, a given load / store instruction is considered to have been routed by ABDU 200 via AGEN path 201 if ABDU 200 initially clears the corresponding bypass eligibility flag, and is considered to have been routed via AGEN bypass path 202 if ABDU 200 initially asserts the corresponding bypass eligibility flag. In such implementations, efficiency can be gained relative to the full rollback option because processor 102 is typically able to clear the corresponding bypass eligibility flag in time to direct AGEN stage 204 to calculate the effective address of the load / store instruction. Alternatively, separate control paths may be implemented for AGEN path 201 and AGEN bypass path 202.
[0086] In one implementation, decoding unit 402 and EXSC 404 cooperate in managing a limited number of scheduler tokens for decoding unit 402. In an example implementation, when EXSC 404 decides to revoke the AGEN bypass eligibility of a given load / store instruction, EXSC 404 responds by allocating a scheduler entry in AGEN path 201. To prepare for this, in some implementations, decoding unit 402 proactively assumes that this will occur and assigns a scheduler token (e.g., an ID) to each load / store instruction, regardless of whether ABDU 200 initially asserts or initially clears the corresponding bypass eligibility flag. Thus, when EXSC 404 revokes the AGEN bypass eligibility of a given instruction, the instruction is ready for processing by AGEN stage 204. When EXSC 404 instead allows a given load / store instruction to keep its AGEN bypass eligibility, EXSC 404 returns the corresponding previously allocated scheduler token to decoding unit 402.
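The token exchange can be sketched with a small free-list. The `TokenPool` class and `exsc_resolve` function are hypothetical; they model only the behavior described above, where a token is assigned to every load / store instruction up front and is handed back if the instruction keeps its bypass eligibility.

```python
from typing import List, Optional


class TokenPool:
    """Limited pool of scheduler tokens held by the decoding unit (illustrative)."""

    def __init__(self, count: int) -> None:
        self.free: List[int] = list(range(count))

    def allocate(self) -> Optional[int]:
        """Hand out a token, or None if the pool is exhausted."""
        return self.free.pop() if self.free else None

    def release(self, token: int) -> None:
        """Return a token to the pool for reuse."""
        self.free.append(token)


def exsc_resolve(pool: TokenPool, token: int, keeps_bypass: bool) -> Optional[int]:
    """Resolve one instruction's pre-assigned token.

    If the instruction keeps its bypass eligibility, the token goes back to
    the pool and no scheduler entry is used (returns None). Otherwise the
    token identifies the scheduler entry used on the AGEN path.
    """
    if keeps_bypass:
        pool.release(token)
        return None
    return token
```

The eager assignment costs a token per instruction in the short term, but it means a revoked instruction never has to wait for a scheduler entry before entering the AGEN stage.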
[0087] In one implementation, token exchange also occurs between EXSC 404 and LSDC 406. In those cases, the tokens are related to the current capacity of various load / store pipelines in LSDC 406. When LSDC 406 picks up an instruction to be processed from those load / store pipelines, LSDC 406 accordingly notifies EXSC 404 by returning the corresponding load / store pipeline token that EXSC 404 has already assigned to that instruction for reuse.
[0088] Various embodiments take the form of a non-transitory computer-readable medium containing instructions executable by an integrated circuit manufacturing system to manufacture any of the described embodiments of processor 102. The instructions contained on the computer-readable medium may take the form of or include: an RTL representation; HDL (also known as hardware description code) instructions in a language such as Analog HDL (AHDL), Verilog HDL, SystemVerilog HDL, Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL), etc.; code in a higher-level or modeling language such as C, C++, SystemC, Simulink, MATLAB, etc.; physical layout code such as Graphical Database System II (GDSII) code; and / or one or more other types of instructions.
Claims
1. A method, the method being executed by one or more processors, the method comprising: receiving a load / store instruction at an address generation (AGEN) bypass determination unit (ABDU) of the processor, the ABDU being located before an AGEN stage of the processor in the pipeline; parsing the load / store instruction and determining whether the ABDU has a current value for each of a plurality of effective address inputs used to calculate an effective address of the load / store instruction; if the effective address of the load / store instruction is not known at the ABDU, routing the load / store instruction to the AGEN stage, wherein the effective address of the load / store instruction is not known at the ABDU when at least one of the plurality of effective address inputs of the load / store instruction is not known at the ABDU; and if the effective address of the load / store instruction is known at the ABDU, routing the load / store instruction to a load / store unit so as to bypass the AGEN stage, wherein the effective address of the load / store instruction is known at the ABDU when the current value of each of the plurality of effective address inputs used to calculate the effective address of the load / store instruction is known at the ABDU.
2. The method of claim 1, wherein when the load / store instruction is a program counter (PC)-relative load / store instruction or an offset-only load / store instruction, the effective address of the load / store instruction is known at the ABDU.
3. The method of claim 1, wherein the effective address of the load / store instruction is known at the ABDU when: the load / store instruction is a stack pointer (SP)-relative load / store instruction; and the ABDU has the current value of the SP register (rSP).
4. The method of claim 1, wherein: the AGEN stage is configured to calculate the effective address of the load / store instruction using the plurality of effective address inputs of the load / store instruction.
5. The method of claim 1, wherein: the processor further includes: the load / store unit; a first circuit path that communicatively couples the ABDU and the load / store unit and includes the AGEN stage; and a second circuit path that communicatively couples the ABDU and the load / store unit and bypasses the AGEN stage; and wherein: routing the load / store instruction to the AGEN stage includes routing the load / store instruction via the first circuit path; and routing the load / store instruction so as to bypass the AGEN stage includes routing the load / store instruction via the second circuit path.
6. The method of claim 5, wherein: routing the load / store instruction via the second circuit path includes asserting a bypass eligibility flag corresponding to the load / store instruction; and the load / store unit is configured to: process a load / store instruction whose corresponding bypass eligibility flag is asserted; and discard a load / store instruction whose corresponding bypass eligibility flag has been cleared.
7. A processor, the processor comprising: an address generation (AGEN) stage; and an AGEN bypass determination unit (ABDU) located before the AGEN stage in the pipeline, the ABDU being configured to: receive a load / store instruction; parse the load / store instruction and determine whether the ABDU has a current value for each of a plurality of effective address inputs used to calculate an effective address of the load / store instruction; if the effective address of the load / store instruction is not known at the ABDU, route the load / store instruction to the AGEN stage, wherein the effective address of the load / store instruction is not known at the ABDU when at least one of the plurality of effective address inputs of the load / store instruction is not known at the ABDU; and if the effective address of the load / store instruction is known at the ABDU, route the load / store instruction to a load / store unit so as to bypass the AGEN stage, wherein the effective address of the load / store instruction is known at the ABDU when the current value of each of the plurality of effective address inputs used to calculate the effective address of the load / store instruction is known at the ABDU.
8. The processor of claim 7, wherein when the load / store instruction is a program counter (PC)-relative load / store instruction or an offset-only load / store instruction, the effective address of the load / store instruction is known at the ABDU.
9. The processor of claim 7, wherein the effective address of the load / store instruction is known at the ABDU when: the load / store instruction is a stack pointer (SP)-relative load / store instruction; and the ABDU has the current value of the SP register (rSP).
10. The processor of claim 7, wherein: the AGEN stage is configured to calculate the effective address of the load / store instruction using the plurality of effective address inputs of the load / store instruction.
11. The processor of claim 7, further comprising: the load / store unit; a first circuit path that communicatively couples the ABDU and the load / store unit and includes the AGEN stage; and a second circuit path that communicatively couples the ABDU and the load / store unit and bypasses the AGEN stage; wherein the ABDU is configured to: route the load / store instruction to the AGEN stage via the first circuit path; and route the load / store instruction via the second circuit path so as to bypass the AGEN stage.
12. The processor of claim 11, wherein: the ABDU is configured to assert a bypass eligibility flag corresponding to the load / store instruction when the load / store instruction is routed via the second circuit path; and the load / store unit is configured to: process a load / store instruction whose corresponding bypass eligibility flag is asserted; and discard a load / store instruction whose corresponding bypass eligibility flag has been cleared.
13. The processor of claim 11, further configured to: in each clock cycle, route each of a first integer number of load / store instructions via the first circuit path or the second circuit path; and assert a bypass eligibility flag for each load / store instruction routed via the second circuit path; wherein the load / store unit is configured to: process a load / store instruction whose corresponding bypass eligibility flag is asserted; and discard a load / store instruction whose corresponding bypass eligibility flag has been cleared.