The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce the growth of their critical-path delays. It then describes an entire processor microarchitecture, called the Ultrascalar processor, whose critical-path delay grows more slowly than that of existing superscalars.

Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instruction is requesting an ALU. Similarly, to read an argument register, an instruction must communicate with the most recent preceding instruction that wrote that register. A cspp circuit can implement such functions by computing, for each instruction within a wrap-around instruction sequence, the cumulative result of applying some associative operator to all the preceding instructions. A cspp circuit has a critical-path gate delay logarithmic in the length of the instruction sequence. Depending on its associative operator and its layout, a cspp circuit can also have a critical-path wire delay sublinear in the length of the instruction sequence.
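The following is a minimal software model of the cspp computation described above. It is a sketch under stated assumptions and is not the patented circuit. It assumes a single segment-start flag that marks the oldest instruction in the ring. It also assumes that the oldest instruction receives the operator's identity. The name `cspp`, the `(flag, value)` encoding, and the example data are all hypothetical. The doubling (Hillis-Steele-style) scan is a swapped-in choice. Each pass of its loop stands for one layer of combine nodes, so the layer count grows as ceil(log2 n), which mirrors the logarithmic gate-delay claim. The model says nothing about wire delay or layout.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def cspp(values: Sequence[T], segment_start: Sequence[bool],
         op: Callable[[T, T], T], identity: T) -> Tuple[List[T], int]:
    """Exclusive cyclic segmented parallel prefix (hypothetical software model).

    out[i] is `op` applied, oldest first, to every value from the nearest
    segment start at or before i (walking backwards around the ring) up to
    i - 1. A position that is itself a segment start receives `identity`.
    Returns the outputs and the number of doubling rounds (combine layers).
    """
    n = len(values)
    if n == 0:
        return [], 0
    if not any(segment_start):
        raise ValueError("at least one position must start a segment")

    # Each node holds (flag, value); flag records whether a segment start
    # lies inside the span of the ring that the node currently summarizes.
    nodes = list(zip(segment_start, values))

    def combine(left, right):
        # Segmented operator: a segment start on the right discards
        # everything accumulated to its left. Associative whenever op is.
        lf, lv = left
        rf, rv = right
        return (lf or rf, rv if rf else op(lv, rv))

    rounds = 0
    span = 1
    while span < n:  # ceil(log2 n) rounds; each is one layer of parallel combines
        nodes = [combine(nodes[(i - span) % n], nodes[i]) for i in range(n)]
        span *= 2
        rounds += 1

    # Shift by one position around the ring: inclusive scan -> exclusive scan.
    out = [identity if segment_start[i] else nodes[(i - 1) % n][1]
           for i in range(n)]
    return out, rounds


if __name__ == "__main__":
    # ALU arbitration (hypothetical data): instruction 5 is the oldest.
    # OR tells each instruction whether any older instruction is requesting.
    requests = [False, True, False, True, True, False, False, True]
    oldest = [i == 5 for i in range(8)]
    older_requests, rounds = cspp(requests, oldest, lambda a, b: a or b, False)
    granted = [r and not o for r, o in zip(requests, older_requests)]
    print(granted, rounds)

    # Register read (hypothetical data): each entry is the tag of the
    # instruction that writes r1, or None; "keep the newer writer" is
    # associative. None in the output is assumed to mean "read the
    # committed register file".
    writer_of_r1 = [None, 1, None, None, 4, 5, None, None]
    oldest = [i == 3 for i in range(8)]
    source, _ = cspp(writer_of_r1, oldest,
                     lambda a, b: b if b is not None else a, None)
    print(source)
```

In the ALU example, only instruction 7 is granted the (single) ALU. It is the first requester after the oldest instruction, counting around the ring. The same function serves both uses. Only the associative operator and its identity change, which is the point the paragraph makes about building many components from one circuit.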