Methods for generating code for an architecture encoding an extended register specification

A technology of extended registers and methods, applied in the computer field, that addresses the problem of increasing latencies in microprocessors.

Inactive Publication Date: 2008-09-04
INT BUSINESS MASCH CORP
30 Cites, 33 Cited by

AI Technical Summary

Problems solved by technology

In modern microprocessors, increases in latencies have been an increasingly severe problem.
Thus, in terms of processor cycles to access a location in memory, latency has increased significantly.
That is, as CMOS scaling is applied ever more aggressively, wire speeds do not scale at the same rate as logic speeds, leading to a variety of latency increases, e.g., increasing the time required to complete operations because their results take longer to write back.
However, until recently, the number of registers specified in architectures had not increased since the introduction of RISC computing (when the size of register files was increased from the then-customary 8 or 16 registers to 32 registers).
However, to exploit these larger register files, complex (and area intensive) renaming logic and out-of-order issue capabilities are required.
Even then, the inability to express the best schedule for the program using a compiler or a skillfully tuned Basic Linear Algebra Subprogram (BLAS) or other such library limits the overall performance potential.
While register renaming does allow an increase in the number of registers, register renaming is a complex task that requires additional steps in the instruction processing of microprocessors.
While this resolves the issue of instruction encoding space, it leads to inefficient encoding due to a reduction of code density because an instruction word disadvantageously occupies more than a single machine word, thereby reducing the number of instructions which can be stored in a given memory unit.
Legacy architectures, on the other hand, are not without deficiency.
For example, since many bit combinations have been assigned a meaning in legacy architectures, and certain bit fields have been set aside to signify specific architectural information (such as extended opcodes, register fields, and so forth) legacy architectures offer significant obstacles to encoding new information.
Specifically, when allocating new instructions, the specification for these new instructions cannot arbitrarily allocate new fields without complicating the decoding of both the pre-existing and these new instructions.
Additionally, the number of bits in instruction sets with fixed instruction word width limits the number of different instructions that can be encoded.
This encoding limitation is causing increasing problems as instruction sets are extended.
However, it is difficult or impossible to specify additional registers in the standard 32-bit RISC instruction encoding.
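To make the encoding-space constraint concrete, the following sketch counts the bits that register specifiers consume in a fixed 32-bit instruction word. It assumes, for illustration only, one plain binary register field per operand and ignores flag and reserved fields; the helper names are hypothetical.

```python
def register_field_bits(num_registers: int) -> int:
    """Bits needed for a binary field that can name any of num_registers registers."""
    return (num_registers - 1).bit_length()


def bits_left(word_bits: int, num_registers: int, operands: int) -> int:
    """Bits remaining for opcodes and other fields after the register fields.

    Assumption (illustrative only): one contiguous register field per operand.
    """
    return word_bits - operands * register_field_bits(num_registers)


for regs in (32, 128):
    for ops in (3, 4):
        print(f"{regs} registers, {ops} operands: "
              f"{register_field_bits(regs)}-bit fields, "
              f"{bits_left(32, regs, ops)} bits left")
```

Under these assumptions, going from 32 to 128 registers grows each field from 5 to 7 bits and leaves only 4 bits for everything else in a four-operand instruction (such as a vector multiply-add), which illustrates why simply widening the existing register fields is not a workable option.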
Among the problems associated with variable length CISC encoding is the additional complexity it requires in the instruction decode, resulting in additional decode pipeline stages in the machine or a reduced frequency.
Moreover, another problem with variable length CISC encoding is that it allows instructions to span natural memory boundaries (e.g., cache line and page boundaries), complicating instruction fetch and virtual address translation.
Another problem with variable length CISC encoding is that such a CISC approach cannot be compatibly retrofitted to a RISC architecture.
Further, no mechanisms are defined to address the issue of page-spanning instructions, and so forth.
However, if all instructions are 64-bits, approximately twice as much memory space as is currently used would be required to hold instructions (which would disadvantageously affect elements like an instruction cache).
In addition, this is incompatible with existing RISC code with 32-bit instructions.
This provides a simplification of some aspects, e.g., an implementation can avoid the issues associated with bundles crossing natural memory boundaries, but does not address the other significant drawbacks.
However, it “wastes” bits specifying the interaction between instructions.
The three instruction packing also forces additional complexity in the implementation to deal with three instructions at once.
Finally, this three instruction packing format has no requirement to be compatible with existing 32-bit instruction sets, and there is no obvious mechanism to achieve compatibility with (legacy) 32-bit RISC encodings.
This embodiment is undesirable for fixed instruction width RISC processors, as extension bytes are generally incapable of being accommodated in the instruction stream of a fixed width instruction set architecture.
Those skilled in the art will understand that decoding a prefix, determining the mode and the bank field, and then fetching the instruction modified by the prefix incurs significant complexity, delay, and hardware inefficiency.
In particular, the decoding of the prefix and bank selector has to be performed early, leading to additional complexity.
In addition, prefixes are generally not readily employed in an architecture supporting only a fixed instruction word width.
Using the segment selector as a bank selector for all operands of a given instruction is undesirable because it requires access to a control register to identify a bank, and restricts all instructions to have operands coming from just a single bank, leading to inefficient register allocation.
Thus, if a common value has to be combined with other operands residing in multiple banks, copies of the common value have to be maintained, computed and updated in all banks, such that they can be combined with the other operands residing in the other banks, leading to inefficient register usage due to data duplication, and inefficient performance profile due to the duplication of work to compute the common value in all banks.
Specifically, the disadvantages relate to the fact that register names can only be properly resolved after the address generation phase, as a multitude of memory address forms can refer to a memory mapped register.
This will increase the latency of access to these registers to nearly that of a first-level cache access.
This limitation is particularly severe for RISC processors, which can only reference memory operands in load and store operations, imposing the additional cost of performing copies from the memory-mapped in-core registers to computationally useable operand registers.
In another disadvantageous aspect of the '646 patent, when addresses are generated before address generation from a subset of “preferred forms”, address aliasing can occur and lead to incorrect program execution.
In yet another disadvantageous aspect of the '646 patent, when an address to such in-core register is added to a linked list, and accessed by a remote processor, this will lead to data coherence inconsistencies.
Alternatively, costly methods for accessing such registers from SMP remote nodes have to be implemented and provided.
While this extends the number of registers implemented in the processor, such an approach is not suitable for the extension of the register set useable by a single process or program.
This is limited in that only the architected set of prior art registers can be accessed at any one time, so no more registers than the prior art provides are available at once.
In another disadvantageous aspect of the '625 patent, additional instructions are required in the instruction stream to update the control word.
In one non-synchronizing aspect of an implementation, multiple rename versions of the control register have to be maintained, disadvantageously leading to design complexity, and high area and power usage.
In general, microcode has different requirements, and those skilled in the art recognize that methods from microcode are not applicable to architected instruction sets, due to issues related to internal representation, compatibility requirements, instruction decoding, and the detection of data and structural hazards (which are not supported in the restricted microcode programming model), as well as the need to maintain compatibility across generations of a design.
However, while this test allows the representation of constraints in an irregular architecture, it is only an approximation of colorability.
In addition, while this test allows for the representation of colorability for a wide range of architectures, the test is expensive to implement, resulting in slow compilation times.
Disadvantageously, this test cannot determine colorability in an extended register specification as set forth herein.
Disadvantageously, these tests are only an approximation and are excessively general, and hence expensive to implement.
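The kind of approximation such tests make can be illustrated with the classic degree-based simplify/select check in the style of Chaitin, extended naively so that each virtual register may use only a subset of the physical registers, as in an irregular architecture where some instructions reach only part of the register file. This is a hedged sketch of a well-known conservative technique, not the specific test criticized above and not the method of the present invention; the function name and data layout are assumptions, and the interference graph is assumed to be undirected with symmetric adjacency sets.

```python
from typing import Dict, List, Optional, Set


def try_color(graph: Dict[str, Set[str]],
              allowed: Dict[str, Set[int]]) -> Optional[Dict[str, int]]:
    """Conservative colorability check with a per-node set of allowed registers.

    A node is removed (simplified) when it has fewer remaining neighbors than
    allowed registers: whatever registers those neighbors receive, at least one
    allowed register is left for it. Returns an assignment, or None when the
    test is inconclusive (which does not prove the graph uncolorable).
    """
    remaining = {v: set(ns) for v, ns in graph.items()}
    stack: List[str] = []
    progress = True
    while remaining and progress:
        progress = False
        for v in sorted(remaining):
            if len(remaining[v]) < len(allowed[v]):
                stack.append(v)
                for n in remaining.pop(v):
                    remaining[n].discard(v)
                progress = True
                break
    if remaining:
        return None  # inconclusive: the degree test is only an approximation
    assignment: Dict[str, int] = {}
    while stack:  # select: color in reverse order of removal
        v = stack.pop()
        used = {assignment[n] for n in graph[v] if n in assignment}
        assignment[v] = min(allowed[v] - used)
    return assignment


# Toy irregular register file: "narrow" operands may use only registers {0, 1},
# "wide" operands may use {0, 1, 2, 3}.
narrow, wide = {0, 1}, {0, 1, 2, 3}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(try_color(triangle, {"a": narrow, "b": narrow, "c": wide}))
# -> {'b': 0, 'a': 1, 'c': 2}
print(try_color(triangle, {"a": narrow, "b": narrow, "c": narrow}))
# -> None
```

Because this check only compares a node's degree with the size of its register set, it ignores which registers the neighbors can actually take; a `None` result therefore means "not proven colorable" rather than "uncolorable".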

Embodiment Construction

[0109]The present invention is directed to methods for generating code for an architecture encoding an extended register specification.

[0110]It is to be appreciated that while the methods set forth herein are based on an exemplary extended register specification for the VMX2 instruction set and its VMX128 subset, those skilled in this and related arts will readily understand how to apply the principles taught herein to other extended register specifications, such as those targeting, but not limited to, scalar data processing.

[0111]It should be understood that the elements shown in the FIGURES may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in software on one or more appropriately programmed general-purpose digital computers having a processor and memory and input/output interfaces.

[0112]Embodiments of the present invention can take the form of an entirely hardware embodiment, an entirely software embodimen...

Abstract

There are provided methods and computer program products for generating code for an architecture encoding an extended register specification. A method for generating code for a fixed-width instruction set includes identifying a non-contiguous register specifier. The method further includes generating a fixed-width instruction word that includes the non-contiguous register specifier.
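A minimal sketch of what a non-contiguous register specifier can look like in a fixed-width 32-bit instruction word follows. The field positions are hypothetical and chosen only for illustration (they are not taken from the VMX128 encoding or from the claims): each 7-bit specifier, enough to name 128 registers, keeps its low 5 bits in a conventional register field and places its 2 high bits in a separate, non-adjacent field. The names `encode` and `decode_register` are likewise assumptions.

```python
# Hypothetical 32-bit layout (bit 0 = least significant bit):
#   bits 26-31  primary opcode (6 bits)
#   bits 21-25  low 5 bits of target register VD
#   bits 16-20  low 5 bits of source register VA
#   bits 11-15  low 5 bits of source register VB
#   bits  9-10  high 2 bits of VD   (the non-contiguous parts
#   bits  7-8   high 2 bits of VA    of each 7-bit register
#   bits  5-6   high 2 bits of VB    specifier)
#   bits  0-4   extended opcode (5 bits)
from typing import Dict

LOW_SHIFT = {"vd": 21, "va": 16, "vb": 11}
HIGH_SHIFT = {"vd": 9, "va": 7, "vb": 5}


def encode(opcode: int, xop: int, regs: Dict[str, int]) -> int:
    """Pack opcode, extended opcode and 7-bit register numbers into one word."""
    if not (0 <= opcode < 64 and 0 <= xop < 32):
        raise ValueError("opcode field out of range")
    word = (opcode << 26) | xop
    for name, reg in regs.items():
        if not 0 <= reg < 128:
            raise ValueError(f"register {name}={reg} needs more than 7 bits")
        word |= (reg & 0x1F) << LOW_SHIFT[name]  # low 5 bits, conventional field
        word |= (reg >> 5) << HIGH_SHIFT[name]   # high 2 bits, separate field
    return word


def decode_register(word: int, name: str) -> int:
    """Reassemble a 7-bit register number from its two separated fields."""
    low = (word >> LOW_SHIFT[name]) & 0x1F
    high = (word >> HIGH_SHIFT[name]) & 0x3
    return (high << 5) | low


word = encode(4, 3, {"vd": 100, "va": 5, "vb": 127})
print(f"{word:#010x}", [decode_register(word, r) for r in ("vd", "va", "vb")])
```

In this hypothetical layout, registers 0 to 31 leave the high-bit fields at zero, so their encoding matches a conventional 5-bit-field encoding; a property of this kind is one way such a scheme could remain compatible with existing 32-bit RISC code.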

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001]This is a non-provisional application claiming the benefit of U.S. provisional application Ser. No. 60/707,572, entitled “Methods for Generating Code for an Architecture Encoding an Extended Register Specification”, filed on Aug. 12, 2005, which is incorporated by reference herein. Moreover, this application is related to a non-provisional application, Attorney Docket No. YOR920050390US2, entitled “Implementing Instruction Set Architectures with Non-Contiguous Register File Specifiers”, filed concurrently herewith, and incorporated by reference herein.

BACKGROUND

[0002]1. Technical Field

[0003]The present invention relates generally to computers and, more particularly, to methods for generating code for an architecture encoding an extended register specification.

[0004]2. Description of the Related Art

[0005]In modern microprocessors, increases in latencies have been an increasingly severe problem. These increases are occurring both for operat...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F9/30
CPC: G06F8/447
Inventor: GSCHWIND, MICHAEL KARL; MONTOYE, ROBERT KEVIN; OLSSON, BRETT; WELLMAN, JOHN-DAVID
Owner: INT BUSINESS MASCH CORP