Backward compatibility through changes in instruction execution latency

By employing BC mode to limit resources, disable functions, and adjust latency and algorithms, the new CPU matches legacy performance, resolving backward compatibility issues and reducing errors in legacy applications.

JP7874140B2 · Active · Publication Date: 2026-06-15 · SONY INTERACTIVE ENTERTAINMENT LLC

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
SONY INTERACTIVE ENTERTAINMENT LLC
Filing Date
2024-09-26
Publication Date
2026-06-15

AI Technical Summary

Technical Problem

Modern computer systems face backward compatibility issues when new devices with different performance characteristics from legacy devices run legacy applications, leading to errors due to differences in CPU performance and execution speed.

Method used

Implementing backward compatibility mode (BC mode) by limiting CPU resources, disabling certain functions, modifying instruction execution latency, and altering algorithmic details to match legacy CPU behavior, thereby reducing errors in legacy applications.

Benefits of technology

The new CPU in BC mode closely approximates the performance of legacy CPUs, minimizing errors and ensuring smooth operation of legacy applications.

✦ Generated by Eureka AI based on patent content.


Abstract

To provide a method and a system for achieving backward compatibility of a new CPU with a legacy CPU.

SOLUTION: A new device executing an application on a new CPU determines whether the application is for a legacy device having a legacy CPU. When the application is for the legacy device, the new CPU executes the application by disabling a selected feature of the new CPU that is not present on the legacy CPU, or altering a latency of instruction execution of the new CPU so as to match or approximate a latency of the legacy CPU, or altering algorithmic details of operation of one or more units of the new CPU so as to match or approximate algorithmic details of operation of corresponding units of the legacy CPU.

SELECTED DRAWING: Figure 2

Description

【Technical Field】

【0001】 This application claims priority to U.S. Provisional Patent Application No. 14/810,334, filed Jul. 27, 2015, by the same applicant, the entire content of which is incorporated herein by reference.

【0002】 Aspects of the present disclosure relate to the execution of computer applications on a computer system. In particular, aspects of the present disclosure relate to a system or method that provides backward compatibility for applications/titles designed for older versions of a computer system.

【Background Art】

【0003】 Modern computer systems often use multiple different processors for various computing tasks. For example, a modern computer may have, in addition to multiple central processing units (CPUs), a graphics processing unit (GPU) dedicated to certain computational tasks in a graphics pipeline, or a unit dedicated to digital signal processing for audio, all of which may be part of an accelerated processing unit (APU) that may also include other units. These processors are connected to various types of memory using buses that may be internal to the APU or externally located on the computer's motherboard.

【0004】 A set of applications is typically created for a computer system such as a gaming console or smartphone (a "legacy device"), and when a modified or more advanced version of the computer system is released (a "new device"), it is desirable for the applications of the legacy device to run flawlessly on the new device without recompilation or any modification that takes into account the properties of the new device. This aspect of the new device, as embodied in its hardware architecture, firmware, and operating system, is often referred to as "backward compatibility."

【Summary of the Invention】

[Problems that the invention aims to solve]

【0005】 Backward compatibility is often achieved through binary compatibility, which allows new devices to run programs written for legacy devices.
However, when the real-time behavior of a device category is critical to its operation, as is the case with game consoles or smartphones, a significant difference in the operating speed of the new device may prevent it from being backward compatible with the legacy device. Problems that hinder backward compatibility arise when the new device is less powerful than the legacy device, and they also arise when the new device is more powerful or has different performance characteristics compared to the legacy device.

【0006】 It is within this context that aspects of the present disclosure arise.

[Means for solving the problem]

【0007】 The teachings of this disclosure can be readily understood by considering the following detailed description in conjunction with the attached drawings.

[Brief explanation of the drawing]

【0008】
[Figure 1] A block diagram showing an embodiment of a central processing unit (CPU) core that may be configured to operate in backward compatibility mode according to an aspect of the present disclosure.
[Figure 2] A flowchart showing an embodiment of a process flow for operating the CPU in backward compatibility mode according to an aspect of the present disclosure.
[Figure 3] A block diagram of a device having a CPU configured to operate in backward compatibility mode according to an aspect of the present disclosure.

[Modes for carrying out the invention]

【0009】 The following detailed description includes numerous specific details for illustrative purposes, but any person skilled in the art will understand that numerous variations and modifications of the following details are within the scope of the present invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
[Introduction]

【0010】 Even if the CPU of the new device is binary compatible with the legacy device (i.e., it can run programs written for the legacy device), differences in performance characteristics between the CPU of the new device and the CPU of the legacy device may cause errors in legacy applications, with the result that the new device is not backward compatible.

【0011】 If the CPU of a new device is less powerful than the CPU of a legacy device, it may be unable to meet real-time deadlines imposed by display timing or audio stream output, resulting in numerous errors in legacy applications. If the CPU of a new device is significantly more powerful than the CPU of a legacy device, numerous errors may also arise in legacy applications because they have never been tested under such high-speed operation. For example, in a producer-consumer model, if a data consumer (e.g., a CPU) operates faster than expected, the data consumer may attempt to access data before the data producer (e.g., another component of the computer) has made the data available. Alternatively, if a data producer (e.g., a CPU) operates faster than expected, the data producer may overwrite data that the data consumer (e.g., another component of the computer) is still using.

【0012】 Furthermore, since the code execution speed of a CPU depends on the characteristics of the specific code being executed, the degree of performance improvement of a new device's CPU compared to a legacy device may depend on the specific code being executed. This can lead to problems in the aforementioned producer-consumer model when both the producer and the consumer are CPUs but execute legacy application code at relative speeds never encountered on legacy hardware.

[Examples]

【0013】 Aspects of this disclosure describe computer systems and methods that can enable greater backward compatibility with respect to legacy computer systems.
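The producer-consumer hazard described in 【0011】 can be illustrated with a small timing model. The following Python sketch is illustrative only and is not taken from the disclosure: it assumes a legacy application that relies on timing rather than explicit synchronization, and the function name `read_is_safe` and all numeric values are hypothetical.

```python
def read_is_safe(producer_done_ps: int, consumer_cycles: int, cycle_time_ps: int) -> bool:
    """Return True if the consumer's read happens at or after the producer finishes.

    Assumption: the consumer performs a fixed number of cycles of other work
    and then reads the shared data without checking that it is ready.
    """
    consumer_read_ps = consumer_cycles * cycle_time_ps
    return consumer_read_ps >= producer_done_ps


# Hypothetical values: the producer finishes 800 ns after the frame starts,
# and the consumer reads after 1000 cycles of unrelated work.
PRODUCER_DONE_PS = 800_000
CONSUMER_CYCLES = 1_000

# Legacy CPU: 1000 ps per cycle, so the read happens at 1000 ns (after the write).
legacy_ok = read_is_safe(PRODUCER_DONE_PS, CONSUMER_CYCLES, cycle_time_ps=1_000)

# New CPU at twice the speed: 500 ps per cycle, so the read happens at 500 ns (too early).
new_ok = read_is_safe(PRODUCER_DONE_PS, CONSUMER_CYCLES, cycle_time_ps=500)

print(legacy_ok, new_ok)
```

In this model the same code is correct on the legacy CPU and incorrect on the faster CPU, which is the kind of error that running closer to legacy timing is intended to avoid.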
【0014】 In embodiments of this disclosure, when running in backward compatibility mode ("BC mode"), certain CPU resources are limited and various aspects of CPU operation are modified.

【0015】 As a result of these resource limitations, the performance of the CPU in BC mode approaches that of the legacy CPU, resulting in fewer errors in legacy applications caused by unexpected CPU performance characteristics.

【0016】 In addition, certain CPU features not present in the legacy CPU are disabled in BC mode, the instruction execution latency of the CPU is modified in BC mode to equal or approach the latency of the legacy CPU, and the algorithmic details of the operation of various CPU units may be modified in BC mode to match or approximate the algorithmic details of the operation of those units in the legacy CPU. As a result, the performance of the new CPU in BC mode comes very close to that of the legacy CPU, and consequently there are fewer errors in legacy applications due to unexpected performance characteristics of the new CPU.

【0017】 The following describes the general architecture of a CPU, together with various aspects of this disclosure relating to the limiting of specific resources, the disabling of features, the modification of latency, and the modification of algorithmic details of operation in BC mode.

【0018】 Figure 1 depicts the general architecture of a CPU core 100. The CPU core 100 typically includes a branch prediction unit 102 that attempts to predict whether a branch will be taken and, if the branch is taken, attempts to predict the destination address of the branch. High-accuracy branch prediction is highly desirable because the more accurate these predictions are, the more efficient the speculative execution of code becomes.
The branch prediction unit 102 may include highly specialized subunits such as a return address stack 104 that tracks return addresses from subroutines, an indirect target array 106 that tracks the destinations of indirect branches, and a branch target buffer 108 and its associated prediction logic that track the past history of branches in order to predict their resulting addresses more accurately.

【0019】 According to certain aspects of this disclosure, in BC mode, the size of the new CPU's indirect target array 106, the size of the return address stack 104, or the size of the branch target buffer 108 can be reduced to match, or more closely approximate, the respective sizes in the legacy CPU. To clarify, this reduction takes the form of reducing the usable portion of the resource, for example by not allowing the use of a portion of the return address stack, thereby reducing the number of calls and associated returns that can be tracked. The full resource becomes available again when BC mode is not active.

【0020】 In certain aspects of this disclosure, in BC mode, the algorithmic details of the operation of the branch target buffer 108 of the new CPU, and its associated prediction logic, may be modified to match those of the legacy CPU. By way of example and not limitation, if the legacy CPU is limited in its ability to track the behavior of branch instructions that are close to each other, the new CPU in BC mode may match this behavior of the legacy CPU. Alternatively, if the legacy CPU uses a substantially different form of branch prediction logic (e.g., a saturating counter instead of an adaptive predictor), the new CPU may include the logic of the legacy CPU and enable it in BC mode.
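The reduction of the usable portion of the return address stack described in 【0019】 can be sketched as follows. This Python model is a minimal illustration under stated assumptions, not the disclosed hardware: the class name, the entry counts, and the policy of discarding the oldest entry on overflow are all hypothetical.

```python
from typing import List, Optional


class ReturnAddressStack:
    """Toy return address stack whose usable depth shrinks in BC mode."""

    def __init__(self, physical_entries: int, legacy_entries: int) -> None:
        self.physical_entries = physical_entries
        self.legacy_entries = legacy_entries
        self.bc_mode = False
        self._entries: List[int] = []

    def usable_entries(self) -> int:
        # In BC mode only a portion of the physical stack may be used.
        if self.bc_mode:
            return min(self.legacy_entries, self.physical_entries)
        return self.physical_entries

    def push(self, return_address: int) -> None:
        # Assumption: on overflow the oldest entry is discarded.
        if len(self._entries) == self.usable_entries():
            self._entries.pop(0)
        self._entries.append(return_address)

    def predict_return(self) -> Optional[int]:
        # None means no prediction is available for this return.
        return self._entries.pop() if self._entries else None


def correct_return_predictions(ras: ReturnAddressStack, call_depth: int) -> int:
    """Make `call_depth` nested calls, then count correctly predicted returns."""
    addresses = [0x1000 + 4 * i for i in range(call_depth)]
    for address in addresses:
        ras.push(address)
    return sum(ras.predict_return() == expected for expected in reversed(addresses))


native = ReturnAddressStack(physical_entries=16, legacy_entries=4)
native_correct = correct_return_predictions(native, call_depth=8)

bc = ReturnAddressStack(physical_entries=16, legacy_entries=4)
bc.bc_mode = True
bc_correct = correct_return_predictions(bc, call_depth=8)

print(native_correct, bc_correct)
```

With eight nested calls, the full-size stack predicts every return, while the BC-mode stack predicts only the four most recent ones, which is how a legacy-sized stack would behave in this model.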
【0021】 According to certain aspects of the present disclosure, if the branch target buffer 108 of the new CPU and its associated prediction logic include a dedicated loop predictor but the legacy CPU has no dedicated loop predictor, the dedicated loop predictor of the new CPU may be disabled in BC mode.

【0022】 The CPU core 100 typically includes an instruction fetch and decode unit 110 that includes an instruction fetch unit 112, an instruction byte buffer 114, and an instruction decode unit 116. The CPU core 100 also typically includes a number of instruction-related caches and instruction translation lookaside buffers (ITLBs) 120. These may include an ITLB cache hierarchy 124 that caches virtual-to-physical address translation information such as page table entries and page directory entries. This information is used to translate the virtual address of an instruction to a physical address, thereby enabling the instruction fetch unit 112 to load instructions from the cache hierarchy. By way of example and not limitation, program instructions may be cached according to a cache hierarchy that includes a level 1 instruction cache (L1I-cache) 122 residing in the core, as well as other cache levels 176 external to the CPU core 100; these caches are searched first for program instructions using the physical address of the instruction. If the instruction is not found, it is loaded from the system memory 101. Depending on the architecture, there may also be a micro-op cache 126 that holds decoded instructions, as described below.

【0023】 In certain aspects of the present disclosure, the size or associativity of the L1I-cache 122, the micro-op cache 126, or the various levels of the ITLB cache hierarchy 124 can be changed in BC mode to match or approximate their respective sizes and associativities in the legacy CPU.
By way of example and not limitation, changing the size of the ITLB cache hierarchy 124, e.g., reducing it, can involve (1) reducing the number of levels, or (2) changing the size of one or more levels (e.g., cache size, block size, number of blocks in a set). Changing the associativity of a cache can involve, for example, operating a fully associative cache as a 4-way or 2-way cache. Aspects of the present disclosure include embodiments in which the size or associativity of an instruction-related cache or ITLB is reduced, but the present disclosure is not limited to such embodiments. For example, a legacy CPU may have a larger cache with lower associativity (e.g., 2-way instead of 4-way) than the new CPU. In such a case, the new CPU can operate in BC mode with the corresponding cache size increased and the associativity reduced to match or approximate the behavior of the cache on the legacy CPU.

【0024】 Once program instructions are fetched, they are typically placed in the instruction byte buffer 114 to await processing by the instruction fetch and decode unit 110. Decoding can be a very complex process, and because it is difficult to decode multiple instructions per cycle, there can be restrictions on instruction alignment or instruction types that limit the number of instructions that can be decoded in a cycle. Depending on the architecture, the decoded instructions can be placed in the micro-op cache 126 (if one exists on the new CPU), so that the decode stage can be bypassed on subsequent uses of the program instructions.

【0025】 In certain aspects of this disclosure, the algorithmic details of the operation of the instruction fetch and decode unit 110 of the new CPU in BC mode may be modified to match those of the legacy CPU. For example, if the legacy CPU restricts decoding to instructions whose opcodes lie within a specific area of the instruction byte buffer 114, the new CPU may similarly restrict decoding.
【0026】 In certain embodiments of this disclosure, if the micro-op cache 126 is present on the new CPU but not on the legacy CPU, the micro-op cache 126 of the new CPU may be disabled in BC mode.

【0027】 Decoded instructions are typically passed to other units for dispatch and scheduling 130. These units may use a retirement queue 132 to track the status of instructions through the remainder of the CPU pipeline. Register renaming may also be performed, because only a limited number of general-purpose and SIMD registers are available on many CPU architectures. As logical registers (also known as architectural registers) are encountered in the stream of instructions being executed, physical registers 140 are allocated to represent them during register renaming. The physical registers 140 include single instruction, multiple data (SIMD) register banks 142 and general-purpose (GP) register banks 144, which can be much larger than the set of logical registers available on a particular CPU architecture, potentially resulting in significant performance improvements. After register renaming 134, instructions are typically placed in a scheduling queue 136, from which a number of instructions may be selected each cycle (based on dependencies) for execution by the execution units 150.

【0028】 In certain embodiments of this disclosure, the size of the CPU retirement queue 132, the scheduling queue 136, or the SIMD register bank 142 or GP register bank 144 may be reduced in BC mode to match, or more closely approximate, the respective sizes in a legacy CPU. To clarify, this reduction takes the form of reducing the usable portion of the resource, for example by limiting the number of physical registers available to applications in BC mode. The full register banks become available to applications when BC mode is not active.
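The limit on usable physical registers described in 【0028】 can be sketched with a toy allocator. This Python model is an assumption-laden illustration rather than the disclosed design: the class name, the register counts, and the lowest-free-index allocation policy are hypothetical.

```python
from typing import Optional, Set


class PhysicalRegisterFile:
    """Toy allocator for physical registers with a BC-mode cap on usable registers."""

    def __init__(self, total: int, legacy_total: int) -> None:
        self.total = total
        self.legacy_total = legacy_total
        self.bc_mode = False
        self.in_use: Set[int] = set()

    def usable(self) -> int:
        # In BC mode, only as many registers as the legacy CPU had may be used.
        return min(self.total, self.legacy_total) if self.bc_mode else self.total

    def allocate(self) -> Optional[int]:
        """Return a free physical register index, or None if renaming must stall."""
        for reg in range(self.usable()):
            if reg not in self.in_use:
                self.in_use.add(reg)
                return reg
        return None

    def release(self, reg: int) -> None:
        self.in_use.discard(reg)


def renames_before_stall(prf: PhysicalRegisterFile) -> int:
    """Count how many destination registers can be renamed before a stall."""
    count = 0
    while prf.allocate() is not None:
        count += 1
    return count


native_prf = PhysicalRegisterFile(total=16, legacy_total=8)
bc_prf = PhysicalRegisterFile(total=16, legacy_total=8)
bc_prf.bc_mode = True

native_renames = renames_before_stall(native_prf)
bc_renames = renames_before_stall(bc_prf)
print(native_renames, bc_renames)
```

In BC mode the allocator stalls after the legacy number of renames, which bounds the number of in-flight instructions in the same way a smaller legacy register bank would in this model.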
【0029】 The execution units 150 typically include SIMD pipes 152 that perform a number of parallel operations on multiple data fields contained in SIMD registers of 128 bits or wider held in the SIMD register bank 142, arithmetic logic units (ALUs) 154 that perform a number of logical, arithmetic, and miscellaneous operations on general-purpose registers (GPRs) held in the GP register bank 144, and address generation units (AGUs) 156 that calculate the address to which memory should be stored or from which it should be loaded. Multiple instances of each type of execution unit may exist, and instances may have different capabilities; for example, a particular SIMD pipe 152 may be capable of performing floating-point multiplication but not floating-point addition.

【0030】 In certain embodiments of this disclosure, the number of available ALUs, AGUs, or SIMD pipes in BC mode may be reduced to match, or more closely approximate, the number of such units present on a legacy CPU.

【0031】 In certain embodiments of this disclosure, the instruction execution latency of the new CPU in BC mode may be modified to equal or approach the latency of the legacy CPU. For example, the latency of a division operation on the new CPU in BC mode may be extended to match or more closely approximate the latency of a division operation on the legacy CPU (for example, by calculating the result more slowly or by delaying the transfer of the result to subsequent stages of the pipeline).

【0032】 Store and load operations are typically buffered in a store queue 162 and a load queue 164, allowing numerous memory operations to be executed in parallel. To support memory operations, the CPU core 100 typically includes a number of data-related caches and data translation lookaside buffers (DTLBs) 170. The DTLB cache hierarchy 172 caches virtual-to-physical address translation information, such as page table entries and page directory entries.
This information is used to translate virtual addresses to physical addresses for memory operations, enabling data to be stored to or loaded from system memory. Data is typically cached in a level 1 data cache (L1D cache) 174 located within the core, as well as in other cache levels 176 located outside the core 100.

【0033】 In certain embodiments of this disclosure, the size and associativity of the L1D cache 174, or of the various levels of the DTLB cache hierarchy 172, may be reduced in BC mode to match, or more closely approximate, the respective sizes and associativities in a legacy CPU. In certain embodiments of this disclosure, the size of the CPU's store queue 162 or load queue 164 (e.g., the number of outstanding store or load operations allowed) may be reduced in BC mode to match, or more closely approximate, the respective sizes in a legacy CPU.

【0034】 Figure 2 is a flowchart illustrating an embodiment of a possible process flow of the method according to an aspect of the present disclosure. The method begins at 201, for example, by loading an application into a system having a new CPU. As shown at 210, a determination is made as to whether the application was designed for the new CPU or for a previous version of the system, by checking a software ID, a software checksum, metadata associated with the software, the media type, or by another mechanism. Such a determination may be made in software running on the system or in the system's hardware. When it is determined that the loaded application targets the new CPU, the system may operate normally, as shown at 220. For example, the CPU may operate normally without limiting available resources, disabling features, changing instruction execution latency, or changing algorithmic details to match or approximate the behavior of the legacy CPU.
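The determination at 210 and the choice between normal operation (220) and BC mode can be sketched in software. The following Python sketch is a minimal illustration, not the disclosed implementation: the metadata keys, the software ID value, and the `CpuConfig` fields are hypothetical, and enabling all four BC-mode measures at once is only one of the combinations the disclosure allows.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CpuConfig:
    limit_resources: bool        # corresponds to 242 in Figure 2
    disable_new_features: bool   # corresponds to 244
    alter_latency: bool          # corresponds to 246
    alter_algorithms: bool       # corresponds to 248


NORMAL = CpuConfig(False, False, False, False)
BC_MODE = CpuConfig(True, True, True, True)

# Hypothetical set of software IDs known to target the legacy device.
LEGACY_SOFTWARE_IDS = {"LEGACY-0001"}


def is_legacy_application(metadata: Dict[str, str]) -> bool:
    """Step 210: decide from a software ID or other metadata (keys are assumptions)."""
    return (
        metadata.get("software_id") in LEGACY_SOFTWARE_IDS
        or metadata.get("target_platform") == "legacy"
    )


def select_cpu_config(metadata: Dict[str, str]) -> CpuConfig:
    """Return the BC-mode configuration for legacy titles, normal operation otherwise."""
    return BC_MODE if is_legacy_application(metadata) else NORMAL


legacy_config = select_cpu_config({"software_id": "LEGACY-0001"})
new_config = select_cpu_config({"software_id": "NEW-0001", "target_platform": "new"})
print(legacy_config, new_config)
```

Keeping the decision separate from the configuration reflects the statement in 【0034】 that the determination may be made either in software or in hardware.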
【0035】 When it is determined that a loaded application targets a legacy CPU, the CPU operates in BC mode by restricting selected available resources (242), disabling selected features that do not exist on the legacy CPU (244), modifying instruction execution latency (246), or modifying algorithmic details (248), or a combination of two or more of these, in order to match or approximate the behavior of the legacy CPU. Examples of these possibilities are described above.

【0036】 By way of example and not limitation, to operate the CPU with selected resources limited (242), BC mode may be implemented by a suitable configuration of the new CPU hardware, a suitable configuration of the operating system that runs the CPU, or a combination of both. For example, as mentioned above, in BC mode, the size of the CPU's indirect target array 106, the size of the return address stack 104, or the size of the branch target buffer 108 may be reduced to match or more closely approximate the respective sizes in the legacy CPU. By way of example and not limitation, the relevant hardware may be configured so that the operating system or CPU firmware in BC mode reduces the size of the indirect target array 106 to match or more closely approximate its size in the legacy CPU. The pseudocode below illustrates an example of this implementation.

void function BC_mode_indirect_target_array_size
    if BC_Mode is true {
        set indirect_target_array_size to reduced_indirect_target_array_size
    }

【0037】 The size of the return address stack 104, the size of the branch target buffer 108, or other available resources can be reduced in a similar manner.

【0038】 Similarly, to operate the CPU with selected features disabled (244), certain hardware resources that exist on the new CPU but not on the legacy CPU (e.g., the micro-op cache 126) may be configured so that they can be disabled by the operating system or CPU firmware in BC mode.
Alternatively, hardware resources that exist on the new CPU but not on the legacy CPU may be configured so that they are ignored by the application in BC mode.

【0039】 By way of example and not limitation, to operate the CPU with the instruction execution latency of the new CPU modified to match or approximate the latency of the legacy CPU (246), the hardware of the execution units 150 may be configured to add the equivalent of "no-op" instructions in BC mode to obtain the desired latency.

【0040】 By way of example and not limitation, the new CPU may be operated with the algorithmic details of the operation of one or more of its units modified (248). For example, the algorithmic details of the operation of the branch prediction unit 102 may be modified in BC mode. If, as described above, the legacy CPU is limited in its ability to track the behavior of branch instructions that are close to each other, the branch prediction unit 102 may be configured to match this behavior of the legacy CPU in BC mode. Alternatively, if the legacy CPU uses a substantially different type of branch prediction logic (e.g., a saturating counter instead of an adaptive predictor), the branch prediction unit 102 of the new CPU may include the logic of the legacy CPU and enable it in BC mode. In other embodiments, the instruction fetch and decode unit 110, the dispatch and scheduling unit 130, or the execution units 150 of the new CPU may similarly be configured with legacy logic that is enabled in BC mode.

【0041】 Referring to Figure 3, an illustrative example of a system 300 configured to operate according to embodiments of this disclosure is depicted. According to embodiments of this disclosure, the system 300 may be an embedded system, a mobile phone, a personal computer, a tablet computer, a portable game device, a workstation, or a game console.
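The latency adjustment (246) described in 【0031】 and 【0039】 amounts to padding each affected instruction with enough stall cycles to reach the legacy latency. The Python sketch below illustrates that arithmetic only; the opcode names and cycle counts in `LATENCY_TABLE` are hypothetical, and real hardware would apply the padding in the execution pipeline rather than through a lookup table.

```python
# Hypothetical per-instruction latencies in cycles: (new CPU native, legacy CPU).
LATENCY_TABLE = {
    "div": (20, 35),
    "mul": (3, 3),
    "sqrt": (18, 14),
}


def stall_cycles(opcode: str, bc_mode: bool) -> int:
    """Number of no-op-equivalent cycles added after `opcode` in BC mode."""
    native, legacy = LATENCY_TABLE[opcode]
    if not bc_mode:
        return 0
    # Padding can only lengthen latency; an instruction that is already
    # slower than on the legacy CPU gets no padding in this sketch.
    return max(0, legacy - native)


def effective_latency(opcode: str, bc_mode: bool) -> int:
    native, _ = LATENCY_TABLE[opcode]
    return native + stall_cycles(opcode, bc_mode)


for opcode in LATENCY_TABLE:
    print(opcode, effective_latency(opcode, bc_mode=False), effective_latency(opcode, bc_mode=True))
```

Where the new CPU is natively slower for some instruction (the `sqrt` entry here), padding cannot reproduce the legacy latency, which is consistent with the disclosure's wording of matching or approximating legacy behavior.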
【0042】 The system 300 generally comprises a central processing unit (CPU) 320, which may include CPU cores and other features of the type depicted in Figure 1 and described above. By way of example and not limitation, the CPU 320 may be part of an accelerated processing unit (APU) 310 that includes the CPU 320 and a graphics processing unit (GPU) 330 on a single chip. In an alternative embodiment, the CPU 320 and GPU 330 may be implemented as separate hardware components on separate chips.

【0043】 The system 300 may also include memory 340. The memory 340 may optionally include a main memory unit accessible by the CPU 320 and GPU 330. The CPU 320 and GPU 330 may each include one or more processor cores, for example, one core, two cores, four cores, eight cores, or more than eight cores. The CPU 320 and GPU 330 are configured to access one or more memory units using a data bus 390, and in some embodiments it may be useful for the system 300 to have two or more different buses.

【0044】 The memory 340 may include one or more memory units in the form of integrated circuits that provide addressable memory, such as RAM and DRAM. The memory contains executable instructions configured to perform the method shown in Figure 2 in order to determine whether to operate the CPU 320 in BC mode when running an application originally created to run on a legacy CPU. In addition, the memory 340 may include dedicated graphics memory for temporarily storing graphics resources, graphics buffers, or other graphics data for the graphics rendering pipeline.

【0045】 The CPU 320 may be configured to execute CPU code that may include an operating system (OS) 321 or an application 322 (e.g., a video game). The OS 321 may be configured to perform certain functions that cause the CPU 320 to operate in BC mode, as described above.
The CPU code may include a graphics application programming interface (API) 324 that issues draw commands or draw calls to programs executed by the GPU 330, based on the state of the application 322. The CPU code may also perform physics simulations and other functions. Portions of the code for one or more of the OS 321, application 322, or API 324 may be stored in memory 340, in a cache internal or external to the CPU, or in a mass storage device accessible by the CPU 320.

【0046】 The system 300 may also include well-known support functions 350 that can communicate with other components of the system, for example, via a bus 390. Such support functions may include, but are not limited to, input/output (I/O) elements 352, one or more clocks 356, which may include separate clocks for the CPU and GPU, respectively, and one or more levels of cache 358 that may exist outside the CPU 320. The system 300 may optionally include mass storage devices 360, such as disk drives, CD-ROM drives, flash memory, tape drives, or Blu-ray drives, for storing programs and/or data. In one embodiment, the mass storage device 360 may receive computer-readable media 362 containing legacy applications originally designed to run on systems with legacy CPUs. Alternatively, the legacy application 362 (or a portion thereof) may be stored in memory 340, or partially stored in cache 358.

【0047】 The system 300 may also include a display unit 380 that presents rendered graphics 382 prepared by the GPU 330 to the user. The system 300 may also include a user interface unit 370 that facilitates interaction between the system 300 and the user. The display unit 380 may be in the form of a flat panel display, a cathode ray tube (CRT) screen, a touchscreen, a head-mounted display (HMD), or other device capable of displaying text, numbers, graphic symbols, or images. The display 380 may display rendered graphics 382 processed by various techniques described herein.
The user interface 370 may include one or more peripheral devices such as a keyboard, mouse, joystick, light pen, game controller, touchscreen, and/or other devices that can be used in conjunction with a graphical user interface (GUI). In certain embodiments, for example, if the application 322 includes a video game or other graphics-intensive application, the state of the application 322 and the underlying content of the graphics may be determined at least in part by user input through the user interface 370.

【0048】 The system 300 may also include a network interface 372 that enables the device to communicate with other devices over a network. The network may be, for example, a local area network (LAN), a wide area network such as the Internet, a personal area network such as a Bluetooth® network, or another type of network. Various of the illustrated and described components may be implemented in hardware, software, or firmware, or any combination of two or more of these.

【0049】 According to aspects of this disclosure, the CPU 320 may include hardware components, such as the components of the CPU core 100 of Figure 1, that can operate in BC mode to match or approximate the behavior of a legacy CPU by limiting selected available resources (242), disabling selected features that do not exist on a legacy CPU (244), changing instruction execution latency (246), or changing algorithmic details (248), or a combination of two or more of these, as described above with respect to Figure 2.

【0050】 Aspects of this disclosure overcome backward compatibility issues that arise when programs written for legacy systems run on more powerful new systems. By running the new CPU in BC mode, with selected available resources limited, with selected features not present on the legacy CPU disabled, with instruction execution latency altered, or with algorithmic details altered, or a combination of two or more of these, the new CPU can be made to match or approximate the behavior of the legacy CPU.
【0051】 The above is a complete description of preferred embodiments of the present invention, but various alternatives, modifications, and equivalents are also possible. Accordingly, the scope of the present invention should be determined not by reference to the foregoing description, but rather by reference to the appended claims together with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the following claims, the indefinite article "A" or "An" refers to a quantity of one or more of the item following the article, unless otherwise specified. In a list of alternative elements, as used herein, the term "or" is used in the inclusive sense unless otherwise specified; for example, "X or Y" covers X only, Y only, or both X and Y. Two or more elements listed as alternatives may be combined together. Unless a means-plus-function limitation is explicitly recited in a given claim using the phrase "means for," the appended claims should not be construed as including a means-plus-function limitation.

Claims

[Claim 1] A method comprising: determining, by a new device executing an application on a new CPU, whether the application is for a legacy device having a legacy CPU; and, when the new device determines that the application is for the legacy device, executing the application on the new CPU with the cache associativity on the new CPU changed to match or approximate the cache associativity on the legacy CPU, and with the algorithmic details of the operation of one or more units of the new CPU modified to match or approximate the algorithmic details of the operation of the corresponding units of the legacy CPU, including modifying the algorithmic details of the operation of the instruction decode unit of the new CPU to match or approximate the algorithmic details of the operation of the instruction decode unit of the legacy CPU.

[Claim 2] The method according to claim 1, wherein executing the application on the new CPU with the cache associativity on the new CPU changed to match or approximate the cache associativity on the legacy CPU further includes changing the associativity of the L1 cache on the new CPU to match or approximate the associativity of the L1 cache on the legacy CPU.

[Claim 3] The method according to claim 1, wherein executing the application on the new CPU with the cache associativity on the new CPU changed to match or approximate the cache associativity on the legacy CPU further includes changing the associativity of the micro-op cache on the new CPU to match or approximate the associativity of the micro-op cache on the legacy CPU.
[Claim 4] The method according to claim 1, wherein changing the degree of associativity of the cache on the new CPU to match or approximate the degree of associativity of the cache on the legacy CPU includes changing the degree of associativity of an ITLB cache hierarchy on the new CPU to match or approximate the degree of associativity of an ITLB cache hierarchy on the legacy CPU.
[Claim 5] The method according to claim 4, wherein changing the degree of associativity of the ITLB cache hierarchy on the new CPU to match or approximate the degree of associativity of the ITLB cache hierarchy on the legacy CPU includes reducing the configuration of the ITLB cache hierarchy on the new CPU.
[Claim 6] The method according to claim 5, wherein reducing the configuration of the ITLB cache hierarchy on the new CPU includes reducing the number of levels in the ITLB cache hierarchy on the new CPU.
[Claim 7] The method according to claim 5, wherein reducing the configuration of the ITLB cache hierarchy on the new CPU includes changing cache parameters at one or more levels of the ITLB cache hierarchy on the new CPU.
[Claim 8] The method according to claim 7, wherein changing the cache parameters at one or more levels of the ITLB cache hierarchy on the new CPU includes changing the cache size, block size, and number of blocks in a set of the ITLB cache hierarchy on the new CPU.
[Claim 9] The method according to claim 1, wherein changing the degree of associativity of the cache on the new CPU to match or approximate the degree of associativity of the cache on the legacy CPU includes operating a fully associative cache on the new CPU as a four-way cache.
[Claim 10] The method according to claim 1, wherein changing the degree of associativity of the cache on the new CPU to match or approximate the degree of associativity of the cache on the legacy CPU includes operating a fully associative cache on the new CPU as a two-way cache.
[Claim 11] The method according to claim 1, wherein the cache on the legacy CPU is larger and less associative than the cache on the new CPU, and wherein changing the degree of associativity of the cache on the new CPU to match or approximate the degree of associativity of the cache on the legacy CPU includes reducing the cache size of the cache on the new CPU and reducing the degree of associativity of the cache on the new CPU to match or approximate the behavior of the cache on the legacy CPU.
[Claim 12] A system comprising a new central processing unit (CPU) configured to execute instructions of an application, the new CPU having a logic unit, wherein the new CPU is configured to determine whether the application is for a legacy device having a legacy CPU and, upon determining that the application is for the legacy device, to run the application on the new CPU with the degree of associativity of a cache on the new CPU changed to match or approximate the degree of associativity of a cache on the legacy CPU, and wherein the new CPU is configured to run the application with algorithmic details of the operation of one or more units of the new CPU altered to match or approximate algorithmic details of the operation of the corresponding units of the legacy CPU, the altering including altering algorithmic details of the operation of an instruction decode unit of the new CPU to match or approximate algorithmic details of the operation of an instruction decode unit of the legacy CPU.
[Claim 13] The system according to claim 12, wherein the new CPU is configured to change the degree of associativity of the cache on the new CPU to match or approximate the degree of associativity of the cache on the legacy CPU by changing the degree of associativity of an L1 cache on the new CPU to match or approximate the degree of associativity of an L1 cache on the legacy CPU, and to run the application on the new CPU.
[Claim 14] The system according to claim 12, wherein the new CPU is configured to change the degree of associativity of the cache on the new CPU to match or approximate the degree of associativity of the cache on the legacy CPU by changing the degree of associativity of a micro-operation cache on the new CPU to match or approximate the degree of associativity of a micro-operation cache on the legacy CPU, and to run the application on the new CPU.
[Claim 15] The system according to claim 12, wherein the new CPU is configured to change the degree of associativity of the cache on the new CPU to match or approximate the degree of associativity of the cache on the legacy CPU by changing the degree of associativity of an ITLB cache hierarchy on the new CPU to match or approximate the degree of associativity of an ITLB cache hierarchy on the legacy CPU, and to run the application on the new CPU.
[Claim 16] The system according to claim 15, wherein the new CPU is configured to change the degree of associativity of the ITLB cache hierarchy on the new CPU to match or approximate the degree of associativity of the ITLB cache hierarchy on the legacy CPU by reducing the configuration of the ITLB cache hierarchy on the new CPU, and to run the application on the new CPU.
[Claim 17] The system according to claim 16, wherein reducing the configuration of the ITLB cache hierarchy on the new CPU includes reducing the number of levels in the ITLB cache hierarchy on the new CPU.
[Claim 18] The system according to claim 16, wherein reducing the configuration of the ITLB cache hierarchy on the new CPU includes changing cache parameters at one or more levels of the ITLB cache hierarchy on the new CPU.
[Claim 19] The system according to claim 18, wherein changing the cache parameters at one or more levels of the ITLB cache hierarchy on the new CPU includes changing the cache size, block size, and number of blocks in a set of the ITLB cache hierarchy on the new CPU.
[Claim 20] The system according to claim 12, wherein the new CPU is configured to change the degree of associativity of the cache on the new CPU to match or approximate the degree of associativity of the cache on the legacy CPU by operating a fully associative cache on the new CPU as a four-way cache, and to run the application on the new CPU.
[Claim 21] The system according to claim 12, wherein the new CPU is configured to change the degree of associativity of the cache on the new CPU to match or approximate the degree of associativity of the cache on the legacy CPU by operating a fully associative cache on the new CPU as a two-way cache, and to run the application on the new CPU.
[Claim 22] The system according to claim 12, wherein the cache on the legacy CPU is larger and less associative than the cache on the new CPU, and wherein the new CPU is configured to run the application on the new CPU by reducing the cache size of the cache on the new CPU and reducing the degree of associativity of the cache on the new CPU to match or approximate the behavior of the cache on the legacy CPU.
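
Illustrative note (not part of the claims): the effect of operating a fully associative cache as a two-way cache, as recited in claims 10 and 21, can be sketched with a toy software model. The sketch below is only an illustration under stated assumptions: the class `SetAssociativeCache`, the helpers `make_cache` and `count_misses`, the 8-line capacity, the 64-byte block size, and the LRU replacement policy are all hypothetical choices made for the example; the claims do not specify any of them, and a real CPU would make this change in hardware or microcode rather than in software.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy LRU cache model: `num_lines` lines grouped into sets of `ways` lines."""

    def __init__(self, num_lines: int, ways: int, block_size: int = 64):
        if num_lines % ways != 0:
            raise ValueError("num_lines must be a multiple of ways")
        self.ways = ways
        self.block_size = block_size
        self.num_sets = num_lines // ways
        # One LRU-ordered dict of resident block tags per set.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, fill the block and return False."""
        block = address // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        return False


def make_cache(bc_mode: bool) -> SetAssociativeCache:
    # Assumed sizes: 8 lines, fully associative natively (8 ways, 1 set),
    # operated as a two-way cache (4 sets) in backward compatibility mode.
    return SetAssociativeCache(num_lines=8, ways=2 if bc_mode else 8)


def count_misses(cache: SetAssociativeCache, addresses) -> int:
    return sum(0 if cache.access(a) else 1 for a in addresses)


# Blocks 0, 4, and 8 all map to set 0 when the cache has 4 sets,
# so they conflict in the two-way configuration but not natively.
trace = [0 * 64, 4 * 64, 8 * 64] * 3
native_misses = count_misses(make_cache(bc_mode=False), trace)
bc_misses = count_misses(make_cache(bc_mode=True), trace)
print(native_misses, bc_misses)
```

In this model the same access trace produces only the three compulsory misses in the fully associative configuration, while the two-way configuration misses on every access because three blocks compete for two ways. This is the kind of behavioral difference that restricting associativity is meant to reproduce when the legacy cache is less associative.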