Selectively controllable memory tag inspection

Using a selectively controllable memory tag checking mechanism (ChkTag), a compiler inserts tag checking instructions and prefix bits into a program only where they are needed. This addresses the excessive overhead of existing memory security methods, enabling more flexible and efficient memory tag checking while supporting the detection of memory safety errors and the protection of data regions.

CN122197101A · Pending · Publication Date: 2026-06-12 · INTEL CORP

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
INTEL CORP
Filing Date
2025-12-10
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing memory security methods incur excessive memory and performance overhead, and cannot flexibly and selectively control tag checks, leading to unnecessary checks and performance degradation.

Method used

A selectively controllable memory tag checking mechanism (ChkTag) is adopted, in which the compiler instruments the program with embedded instructions and prefix bits. Explicit or implicit tag checking instructions are inserted only before potentially unsafe memory accesses, and checks for multiple memory accesses are merged into a small number of tag checking instructions, reducing unnecessary checks.

Benefits of technology

It enables more flexible and efficient memory tag checking, lets optimizing compilers omit unnecessary checks, lowers implementation complexity and silicon area waste, and supports the detection of memory safety programming errors and the protection of data areas.

✦ Generated by Eureka AI based on patent content.

Abstract

Selectively controllable memory tag checking is disclosed. In an embodiment, an apparatus includes: instruction decoder circuitry for decoding a first instruction that references a memory location via a tagged pointer; and execution circuitry coupled to the instruction decoder circuitry for performing one or more memory tag checking operations in response to the first instruction. The one or more memory tag checking operations include referencing an entry location to find a first tag value, and comparing the first tag value with a second tag value provided by the tagged pointer. The entry location is to be in a first sub-region of a memory region to be reserved for a tag table. The first sub-region is to be in a first set of sub-regions of the memory region. The first set is to include only sub-regions committed to tag storage. The memory region to be reserved for the tag table is also to include a second set of sub-regions. The second set is to include only sub-regions not committed to tag storage.

Description

Background Technology

[0001] Computers and other information processing systems may store confidential, private, and secret information in their memory. Software may contain vulnerabilities that can be exploited to steal such information. Data corruption is also a risk. Hardware may also contain vulnerabilities that can be exploited and / or adversaries may physically modify the system to steal information. Therefore, memory security and protection are critical considerations in computer system architecture and design.

[0002] In an information processing system, a processor can execute software programs based on a finite set of instructions defined by the processor's instruction set architecture (ISA). Instructions within the ISA can be called macro instructions, as opposed to the micro instructions or micro-operations (uops) that the processor generates by decoding macro instructions.

[0003] Existing (or non-extended) ISAs can be extended with new instructions for next-generation processors to support new features, etc., thereby creating extended ISAs that are backward compatible with existing ISAs (e.g., including instructions from existing ISAs plus new instructions). To accommodate this possibility, an existing ISA may have been defined to include one or more opcodes that are not executed by processors designed to support existing ISAs rather than extended ISAs. Within an existing ISA, these opcodes and/or their corresponding instructions may be referred to as no-op instructions or no-ops (NOPs) because no operation is executed in response to decoding these opcodes by such a processor. However, one or more NOPs can be redefined within an extended ISA as new instructions that will be executed by processors designed to support extended ISAs.

Attached Figure Description

[0004] Examples according to this disclosure will be described with reference to the accompanying drawings, in which:

[0005] Figure 1A illustrates an example of a processor for efficient tag checking for dynamic repeated memory access according to an embodiment.

[0006] Figure 1B shows details of an example of memory tag checking instructions according to an embodiment.

[0007] Figure 1C shows an example of a helper function for a memory tag checking instruction according to an embodiment.

[0008] Figure 2A illustrates a block diagram according to an embodiment, including an enhanced compiler for instrumenting source code with instructions to inspect memory accesses.

[0009] Figure 2B illustrates an example of a pointer format according to an embodiment.

[0010] Figure 2C illustrates an example of looking up a tag in a tag table according to an embodiment.

[0011] Figure 2D illustrates an example tag table layout for an example based on a 4KB linear data page, according to an embodiment.

[0012] Figure 3A and Figure 3B illustrate examples of a tag table base address register according to an embodiment.

[0013] Figure 3C and Figure 3D illustrate examples of configurable positioning of a tag table according to an embodiment.

[0014] Figure 3E illustrates an example of a register including an overall feature enable bit according to an embodiment.

[0015] Figure 3F illustrates example formulas defining the effect of the feature enable bits according to an embodiment.

[0016] Figure 3G and Figure 3H illustrate examples of an error code register according to an embodiment.

[0017] Figure 3I and Figure 3J illustrate examples of instrumentation for instruction configuration according to an embodiment.

[0018] Figure 3K illustrates example encodings for the new instructions according to an embodiment.

[0019] Figure 3L illustrates a method for distinguishing between loads and stores using prefix bits, according to an embodiment.

[0020] Figure 3M and Figure 3N illustrate examples of feature enable logic according to an embodiment.

[0021] Figure 4 illustrates an example computing system according to an embodiment.

[0022] Figure 5 illustrates a block diagram of an example processor and/or System on a Chip (SoC) that may have one or more cores and an integrated memory controller, according to an embodiment.

[0023] Figure 6A is a block diagram illustrating both an example in-order pipeline and an example register renaming, out-of-order issue/execution pipeline according to an embodiment.

[0024] Figure 6B is a block diagram illustrating both an example in-order architecture core and an example register renaming, out-of-order issue/execution architecture core to be included in a processor according to an embodiment.

[0025] Figure 7 illustrates an example of one or more execution unit circuits according to an embodiment.

[0026] Figure 8 illustrates the use of a software instruction converter to convert binary instructions in a source instruction set architecture into binary instructions in a target instruction set architecture, according to an embodiment.

Detailed Implementation

[0027] This disclosure relates to methods, apparatus, systems, and non-transitory computer-readable storage media for selectively controllable memory tag checking. According to some examples, an apparatus includes: instruction decoder circuitry for decoding a first instruction that references a memory location via a tagged pointer; and execution circuitry coupled to the instruction decoder circuitry for performing one or more memory tag checking operations in response to the first instruction. The one or more memory tag checking operations include referencing an entry location to find a first tag value, and comparing the first tag value with a second tag value provided by the tagged pointer. The entry location is in a first sub-region of a memory region reserved for a tag table. The first sub-region is in a first set of sub-regions of the memory region. The first set includes only sub-regions committed to tag storage. The memory region reserved for the tag table also includes a second set of sub-regions. The second set includes only sub-regions not committed to tag storage.

[0028] As mentioned in the background section, memory security and protection are important considerations in computer system architecture and design. Some methods for providing memory security (e.g., ARM Memory Tagging Extension (MTE)), any of which may be referred to as memory tagging, memory tag checking, tag checking, etc., involve: associating a first tag (or other metadata) with a memory location (e.g., to indicate ownership), for example by storing the first tag along with the data in the memory location, or by storing the first tag in a table or other data structure indexed by the address of the memory location; comparing the first tag with a second tag (or other metadata) carried in a pointer to the memory location when an access to that memory location is attempted through the pointer; and allowing the access only if the second tag matches the first tag.

[0029] Existing methods can incur excessive memory and/or performance overhead, for example, by requiring upfront reservation of physical memory and/or by not allowing checks to be selected for specific accesses. Therefore, the use of embodiments may be desirable because they provide a more flexible and/or efficient opt-in, pay-as-you-go model for memory tagging than existing methods. In embodiments, instrumentation can be embedded within the binary program using a combination of instructions, prefixes, and/or prefix bits to selectively control tagging. This opt-in model allows optimizing compilers and memory-safe language compilers to avoid unnecessary and undesirable checks (e.g., statically known accesses to untagged regions (stack variables and global variables), accesses statically proven safe by the compiler, redundant checks, etc.). Optimizing compilers can also consolidate checks for multiple memory accesses into a smaller number of tagging instruction(s). Other benefits may include allowing the use of a streamlined instruction set to reduce implementation complexity and avoiding dedicated additions outside the core, which avoids wasting silicon area on untagged uses.

[0030] Implementations may include a selectively controllable memory tag checking mechanism or architecture, which, for convenience, may be referred to as ChkTag (pronounced "CheckTag"), or simply as "feature" or "the feature," but the use of the term ChkTag in this specification is merely illustrative and does not limit the implementations to mechanisms, architectures, etc., referred to as ChkTag. Implementations including ChkTag may provide a mechanism for detecting memory safety programming errors (such as buffer overflows and use after free) by utilizing instructions, prefixes, and / or prefix bits inserted by the compiler before memory accesses (e.g., potentially unsafe memory accesses).

[0031] Embodiments may be used to provide the following:
• Discovery of out-of-bounds and use-after-free (UAF) vulnerabilities in deployed software.
• A reasonable workload and a single binary file per software application.
• Support for protecting any data area.
• Limited false positives (software that does not have a "vulnerability" should not fail).

[0032] Figure 1A illustrates a simplified view of a processor 100 for memory tagging according to an embodiment. Processor 100 may represent all or part of a hardware processor, processor core, execution core, core, etc. (any of which may be referred to as a processor, core, etc.) and/or hardware component, including one or more processors, cores, etc. integrated on a single substrate or packaged within a single package, each of which may include any combination of multiple execution threads and/or multiple execution cores. Each processor represented as processor 100 or in processor 100 may be any type of processor, including general-purpose microprocessors (such as processors from the Intel® Core® processor family or other processor families from Intel® Corporation or another company), dedicated processors or microcontrollers, or any other device or component in an information processing system in which embodiments may be implemented. Processor 100 may be architecturally configured and designed to operate according to any ISA, with or without microcode control. For convenience and/or example purposes, some features (e.g., instructions, registers, ISA extensions, etc.) may be referred to by names associated with a particular processor architecture (e.g., x86, Intel® 64, IA32, linear address masking (LAM)), but embodiments are not limited to those features, names, architectures, etc.

[0033] The processor 100 can be implemented in logic gates and/or any other type of circuitry, all or part of which can be included in discrete components and/or integrated into the circuitry of a processing device or any other apparatus in a computer or other information processing system. For example, processor 100 in Figure 1A may correspond to and/or be implemented in or included in any of the following: processor 470, 480, or 415 in Figure 4; processor 500 or cores 502A to 502N in Figure 5; and/or core 690 in Figure 6B, each as described below.

[0034] As shown in the figure, processor 100 includes an instruction unit 110 and an execution unit 120. Processor 100 may include any number of these elements (e.g., multiple execution units) and/or any other elements not shown in Figure 1A.

[0035] Instruction unit 110 may correspond to and/or be implemented/included in front-end unit 630 in Figure 6B, as described below, and/or may include any combination of circuits, logic gates, programmable logic array(s), lookup table(s), structures, hardware, etc. (such as an instruction decoder, e.g., decoding circuit 640 in Figure 6B) to fetch, receive, decode, interpret, schedule, and/or handle instructions to be executed by the processor 100, such as memory tagging instructions 112 (e.g., CHKLDTAG, CHKSTTAG, another explicit ChkTag instruction, or a data access instruction with a ChkTag prefix (e.g., MOV, MOVD, MOVQ, MOVSD, MOVSS, MOVSX, MOVSXD, MOVZX, VMOVD, VMOVQ, VMOVSD, VMOVSS, etc.), as described below). In Figure 1A, instructions that can be decoded or otherwise processed by instruction unit 110 are represented as blocks with dashed boundaries, because the instruction itself is not hardware, but instruction unit 110 may include hardware or logic capable of decoding or otherwise processing the instruction.

[0036] While some embodiments may be described using specific instructions and / or instruction formats, any instruction format may be used in the embodiments; for example, an instruction may include an opcode and one or more operands, wherein the opcode may be decoded into one or more microinstructions or microoperations for execution by the execution unit 120. Operands or other parameters may be associated with the instruction implicitly, directly, indirectly, or in any other way.

[0037] Execution unit 120 may represent an execution unit implemented in any combination of circuits, hardware, arithmetic-logic units, load-store units, etc., coupled to instruction unit 110 for performing operations in response to decoded instructions (e.g., microinstructions, uops, control signals, etc.) generated by instruction unit 110. Execution unit 120 may correspond to and/or be implemented/included in any combination of the execution engine unit 650, execution cluster(s) 660, execution unit circuit(s) 662, and/or memory access circuits 664 in Figure 6B and/or Figure 7, as described below.

[0038] Implementations may include associating a tag with a memory granule and checking whether the matching tag value is present in a pointer used to access that memory. If the tag in the pointer does not match the tag associated with the memory location, an exception is generated.

[0039] In embodiments, explicit tag checking instructions (e.g., ChkTag instructions including CHKLDTAG and CHKSTTAG) and / or instructions with the ChkTag prefix can be inserted by the compiler before potentially unsafe memory accesses to detect memory safety programming errors such as buffer overflows and use after free. Alternatively, tags can be implicitly checked for some or all memory accesses.

[0040] Figure 1B shows details of examples of the CHKLDTAG and CHKSTTAG instructions according to embodiments. Figure 1C shows examples of helper functions for these and other ChkTag instructions according to an embodiment.

[0041] Figure 2A illustrates a block diagram 200 according to an embodiment, including: an enhanced compiler 220 for instrumenting source code 210 with instructions to inspect memory accesses (e.g., explicit ChkTag instructions or instructions with a ChkTag prefix); and a memory allocator 240 for allocating one or more portions of memory (e.g., data memory 250) (e.g., in response to malloc instruction 242) to a program, application, or other software. The memory allocator (e.g., allocator 240) may be implemented within system software (however, the embodiments are not limited to software implementations of the memory allocator). In the resulting instrumented code 230, each memory access (e.g., memory access 234) is preceded by a ChkTag operation (e.g., ChkTag operation 232, which may be in response to a ChkTag instruction inserted before the memory access instruction or in response to the execution of an instruction with a ChkTag prefix), in which a tag in a pointer associated with the memory access operation is compared with a tag stored in a flat tag table 252 in linear memory associated with the corresponding memory location.

[0042] In one embodiment, the ChkTag instruction can specify an access range within which a tag in a pointer is compared to the tags associated with the corresponding memory locations. The access range can be specified by encoding both a memory operand for the base address and the data access size into the instruction. In other embodiments, the access range can be specified by the memory operand in the ChkTag instruction alone (e.g., the base address register specifies the first byte of the access range, and the effective address specifies the last byte of the access range). Some embodiments may incorporate segment support for calculating the access range.

[0043] The following terms are used in the description of the embodiments. Definitions are given by way of example, but the embodiments are not limited to these definitions (e.g., pointers may be of other sizes than 64 bits, LA_MSB or other bit positions may be different, etc.). Similarly, any other references to bit positions or bit lengths in values, registers, tags, etc., in this specification or the corresponding drawings are given by way of example and do not limit the embodiments to the scope referenced.
• ChkTag prefix: A prefix or bit setting that can be applied to a subset of instruction types that access memory to indicate that ChkTag tag checking may be required if ChkTag is enabled.
• Data-LA: The linear address used for paging memory access. The resulting address does not include the pointer tag bits.
• LA_MSB: Index of the most significant linear address bit, as determined by the paging mode rather than the LAM mode: 56 for LA57 and 47 for LA48.
• Pointer: A 64-bit value generated from an address, containing a tag and an address.
• Tag checking: A comparison of the pointer tag with one or more corresponding tags loaded from memory. Triggered by the CHKLDTAG or CHKSTTAG instruction or the ChkTag prefix (if enabled). Associated linear address preprocessing checks, address space wrapping checks, and reserved bit checks can also be performed.
• Tag-LA: The linear address used by the CPU during tag checking to access entries in the tag table.

[0044] According to embodiments, examples of features (e.g., ChkTag) that can be included in the system architecture include:
• CHKLDTAG and CHKSTTAG instructions that accept any memory operand and specify the size of the data access. Compilers and assemblers can use these to check arbitrary data accesses.
• For certain MOV-type instructions, prefix bits or bytes that generate ChkTag operations with reduced code size overhead compared to the CHKLDTAG and CHKSTTAG instructions.
• Separate tag tables in linear memory for each half of the linear address space. Software configures the tag table locations using new model-specific registers (MSRs). The size of the linear address reservation for each tag table is 1/32 of the size of the half of the linear address space covered by that table. The size of the linear address space is determined by the paging mode. Pages within the linear range of a tag table can initially be uncommitted. The set of committed tag table pages can be expanded on demand as tags are initialized for additional data pages, providing a pay-as-you-go model. An alternative is to narrow the checked range of the linear address space, which would result in a corresponding reduction in the linear reservation for the tag table. Defining more than two checked address ranges is also possible.
• Tagging with a 16-byte granularity and a 4-bit tag size. Other granularities and tag sizes are possible.
• Tags that can be read and written with all existing types of load/store instructions. This allows for optimized tag table access. For example, the allocator can restrict the use of locked tag update operations to where they are actually necessary. The allocator can also perform bulk tag updates using single instruction multiple data (SIMD) instructions. Large memory operations (e.g., in string and memory library routines) can perform SIMD loads and checks directly on the tag memory using existing instruction types.
• Precise mismatch detection, even for stores.
• Controls in the new MSRs that allow software to dynamically select a checking mode (e.g., off (e.g., for minimal overhead), load and store (e.g., for maximum security coverage), and store only (e.g., for intermediate overhead)) separately for each half of the linear address space. The overhead can scale with the range of tags. For example, processes with various configurations can exist on a shared kernel (which itself can be tagged or untagged, and can be instrumented or uninstrumented, with dynamically configurable modes for load and store, store only, or disabled checks (if tagged)): 1) a tagged process with load and store checks, 2) a tagged process with store-only checks, 3) an instrumented process with tags disabled (e.g., minimal overhead, only from additional instructions treated as NOPs and ignored prefixes), and 4) an uninstrumented process (zero overhead from ChkTag).

In addition to tags, other types of metadata may also be encoded into pointers and/or stored in metadata tables, such as single- or double-ended bounds, version, permission bits, partition identifiers (IDs), privilege levels, accessed bits and/or dirty bits, identifiers for code authorized to access data (such as hash values), keys used by processor circuitry to encrypt/decrypt data and/or other metadata, key IDs, integrity values (IVs) or counter values, aggregate cipher message authentication codes (MACs) for data allocation, integrity-check values (ICVs) or error-correcting codes (ECCs), element sizes (e.g., to allow errors to be generated when attempting to access an allocation at an offset that is not an even multiple of the element size), and data object sizes (e.g., to allow exceptions to be generated when accessing an invalid location outside the data object, even if the space reserved for the allocation is larger than the size required for the data object).
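As a worked example of the 1/32 reservation ratio described above, the following Python sketch computes the linear-address reservation for one tag table. It assumes, for illustration only, that each half of the linear address space spans 2**LA_MSB bytes (LA_MSB = 47 for LA48, 56 for LA57); the function and constant names are hypothetical and do not come from the specification.

```python
# Sketch only: names are hypothetical, values follow the example parameters
# in the text (16-byte granules, 4-bit tags).
GRANULE_BYTES = 16   # one tag per 16-byte granule
TAG_BITS = 4         # 4-bit tags, so two tags fit in one tag-table byte


def tag_table_reservation(la_msb: int) -> int:
    """Bytes of linear address space reserved for the tag table of one half.

    ASSUMPTION: one half of the linear address space spans 2**la_msb bytes,
    where la_msb is 47 for LA48 and 56 for LA57.
    """
    half_bytes = 1 << la_msb
    data_bytes_per_tag_byte = GRANULE_BYTES * (8 // TAG_BITS)  # 16 * 2 = 32
    return half_bytes // data_bytes_per_tag_byte


if __name__ == "__main__":
    for name, la_msb in (("LA48", 47), ("LA57", 56)):
        tib = tag_table_reservation(la_msb) // 2**40
        print(f"{name}: {tib} TiB reserved per half")
```

Under these assumptions the reservation is 2**42 bytes (4 TiB) per half for LA48 and 2**51 bytes (2 PiB) per half for LA57. This is a reservation of linear addresses only; in the pay-as-you-go model described above, tag table pages are committed on demand.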

[0045] An example of a pointer format is shown in Figure 2B.

[0046] Figure 2C illustrates an example of looking up a tag in a tag table. In this embodiment, each tag covers a 16-byte naturally aligned memory granule. The tag for a given access attempt can be located by first dividing by 32 the distance between the data's linear address and the first address of the half of the linear address space that contains it. The reason for dividing by 32 instead of the 16-byte granule size is that a single tag table byte contains two tags. Next, the scaled offset is added to the base address of the tag table to generate the final linear address of the tag byte. The tag table base address can be specified as described below.
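The lookup described in this paragraph can be sketched in Python as follows. This is a minimal model, not the hardware behavior: memory is modeled as a dictionary from linear address to byte value, all names are hypothetical, and the choice of which nibble of a tag byte holds which granule's tag is an assumption, since the text does not specify it.

```python
GRANULE_BYTES = 16            # each tag covers one 16-byte granule
DATA_BYTES_PER_TAG_BYTE = 32  # two 4-bit tags per tag-table byte


def tag_la(data_la: int, half_start: int, tag_table_base: int) -> int:
    """Linear address of the tag byte that covers data_la."""
    return tag_table_base + (data_la - half_start) // DATA_BYTES_PER_TAG_BYTE


def load_tag(memory: dict, data_la: int, half_start: int,
             tag_table_base: int) -> int:
    """Return the 4-bit tag of the granule containing data_la.

    ASSUMPTION: the even-numbered granule of each pair uses the low nibble
    of the tag byte and the odd-numbered granule uses the high nibble.
    Unwritten tag bytes read as 0 in this toy memory model.
    """
    tag_byte = memory.get(tag_la(data_la, half_start, tag_table_base), 0)
    granule_index = (data_la - half_start) // GRANULE_BYTES
    shift = 4 * (granule_index & 1)
    return (tag_byte >> shift) & 0xF


if __name__ == "__main__":
    # Hypothetical base address and tag byte, chosen only for illustration.
    memory = {0x1000011A: 0xA5}
    print(hex(tag_la(0x2345, 0, 0x10000000)))
    print(load_tag(memory, 0x2345, 0, 0x10000000))
```

Here two adjacent granules (data addresses 0x2340 to 0x234F and 0x2350 to 0x235F) share one tag byte, which is why the offset is scaled by 32 rather than 16.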

[0047] Figure 2D illustrates an example tag table layout for an example based on 4KB linear data pages (e.g., tag table coverage of data pages). To check an access, the tags for each granule to be accessed are loaded from the tag table and compared to the tag in the pointer. If any of the loaded tags does not match the tag in the pointer, an exception is generated. In an embodiment, for tagging violation conditions, features (e.g., ChkTag) may introduce new architectural exception types (e.g., TaggingViolation, #TV).
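The per-granule comparison described in this paragraph can be sketched as below. This is an illustrative model under stated assumptions: the tag table is replaced by a dictionary keyed by granule index, untagged granules read as tag 0, tag table pages are assumed to be 4KB like the data pages, and `TaggingViolation` is a Python stand-in for the #TV exception. None of these names come from the specification.

```python
GRANULE_BYTES = 16
PAGE_BYTES = 4096

# Coverage arithmetic for 4KB data pages with 16-byte granules, 4-bit tags.
TAGS_PER_DATA_PAGE = PAGE_BYTES // GRANULE_BYTES                 # 256 tags
TAG_BYTES_PER_DATA_PAGE = TAGS_PER_DATA_PAGE // 2                # 128 bytes
# ASSUMPTION: tag table pages are also 4KB.
DATA_PAGES_PER_TAG_PAGE = PAGE_BYTES // TAG_BYTES_PER_DATA_PAGE  # 32 pages


class TaggingViolation(Exception):
    """Stand-in for the #TV exception; carries the mismatching granule base."""


def check_access(granule_tags: dict, pointer_tag: int,
                 data_la: int, size: int) -> None:
    """Compare pointer_tag with the tag of every granule the access touches.

    granule_tags maps granule index -> 4-bit tag (hypothetical model of the
    tag table). Raises TaggingViolation on the first mismatch found.
    """
    first = data_la // GRANULE_BYTES
    last = (data_la + size - 1) // GRANULE_BYTES
    for granule in range(first, last + 1):
        if granule_tags.get(granule, 0) != pointer_tag:
            raise TaggingViolation(hex(granule * GRANULE_BYTES))


if __name__ == "__main__":
    tags = {0: 5, 1: 5, 2: 5, 3: 5}   # a 64-byte allocation tagged 5
    check_access(tags, 5, 8, 16)      # inside the allocation: no exception
    try:
        check_access(tags, 5, 56, 16)  # runs past the end of the allocation
    except TaggingViolation as exc:
        print("tag mismatch at granule base", exc)
```

With these parameters, one 4KB tag table page covers 32 data pages (128 KiB of data), which is consistent with the 1/32 ratio between tag storage and data.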

[0048] In embodiments, support for ChkTag is enumerated using CPUID (processor identifier) in the extended features (e.g., CPUID.(0x7.0x1).ECX[6] (EAX=07H, ECX=01H→ECX[6]=1b)), and ChkTag is supported only in 64-bit mode (e.g., IA32_EFER.LMA and CS.L==1).

[0049] In one embodiment, tag loads can adhere to the standard memory ordering model for loads without requiring fences. In another embodiment, instructions with the ChkTag prefix perform the tag load first, followed by the data-LA (linear address) access. A tag load may be repeated because of a fault occurring later in the instruction that causes software to re-execute the instruction from the beginning. Furthermore, a tag load may be repeated even without a fault. However, in embodiments where tag loading and checking are ordered before the data-LA access, the ChkTag prefix can avoid introducing any new instances of repeated data-LA accesses. Software can avoid performing tag loads from uncacheable (UC) memory, where side effects may occur due to memory-mapped input/output (MMIO). Other embodiments may order tag loading and checking in other ways relative to the data-LA access.

[0050] Implementation examples may include new MSRs, wherein two MSRs may be defined, for example, as follows:

[0051] IA32_CHKTAG_LO (Figure 3A): • Includes the ChkTag enable bit for (low) addresses where bit 63 is 0. • Can be context-switched between processes.

[0052] IA32_CHKTAG_HI (Figure 3B): • Includes the ChkTag enable bit and the supervisor (CPL) control for (high) addresses where bit 63 is 1. • Is expected to remain constant across multiple processes.

[0053] The MSRs can be thread-scoped and readable and writable (R/W), and are initialized to 0 (e.g., in response to a reset). Attempting to set reserved bits may result in a general protection fault. The configurable location of the tag table controlled by the MSRs is shown in Figure 3C for LA48 and in Figure 3D for LA57. Note that these addresses are listed as data-LAs for which LAM masking has already been applied.

[0054] Implementations may include global feature (e.g., ChkTag) enable bits, such as CR4.CHKTAG (e.g., CR4 bit 33, as shown in Figure 3E).

[0055] In an embodiment, features (e.g., ChkTag) may only be supported in 64-bit mode (IA32_EFER.LMA and CS.L == 1). Outside of 64-bit mode, and on legacy processors lacking ChkTag support, the CHKLDTAG and CHKSTTAG instructions are executed as NOPs, and the ChkTag prefix and prefix bits are ignored.

[0056] As an example, the formulas shown in Figure 3F define the effect of the CHKTAG and LAM enable bits in IA32_CHKTAG_LO, IA32_CHKTAG_HI, CR3, and CR4. The value of TagChkEn determines whether the tag checking operation being evaluated will be enabled. The parameter "is_chk_store_op" is true for CHKSTTAG instructions and for instructions with the ChkTag prefix that store to the data-LA, even if they also load from the data-LA.
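The following Python sketch shows one way the checking modes named earlier (off, load and store, store only) could combine with a global enable bit and the "is_chk_store_op" parameter. It does not reproduce the formulas of Figure 3F, which are not included in this text; it ignores LAM, the 64-bit mode requirement, and the per-half MSR selection, and every name other than `is_chk_store_op` is hypothetical.

```python
from enum import Enum


class CheckMode(Enum):
    """Checking modes named in the text; the enum itself is hypothetical."""
    OFF = 0
    STORE_ONLY = 1
    LOAD_AND_STORE = 2


def tag_check_enabled(global_enable: bool, mode: CheckMode,
                      is_chk_store_op: bool) -> bool:
    """Simplified stand-in for TagChkEn (NOT the Figure 3F formula).

    global_enable models a feature-wide enable bit such as CR4.CHKTAG;
    mode models the per-half checking mode selected by software.
    """
    if not global_enable or mode is CheckMode.OFF:
        return False
    if mode is CheckMode.STORE_ONLY:
        # Only operations that store to the data-LA are checked.
        return is_chk_store_op
    return True  # LOAD_AND_STORE: every instrumented access is checked


if __name__ == "__main__":
    for mode in CheckMode:
        for is_store in (False, True):
            print(mode.name, "store" if is_store else "load",
                  tag_check_enabled(True, mode, is_store))
```

In this simplified model, an instrumented process running with the mode set to off pays only for the extra instructions and prefixes, which matches the "minimal overhead" configuration described earlier.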

[0057] In one embodiment, a feature (e.g., ChkTag) does not support tag checking for memory operands with potentially non-zero segment base addresses (i.e., those with an effective segment of FS or GS). However, MOV instructions with the ChkTag prefix that reference these segments will still perform the data-LA access, just without tag checking. Other embodiments may support tag checking for memory operands with potentially non-zero segment base addresses.

[0058] In embodiments, it may be desirable for privileged software to keep tag checks enabled while accessing user addresses (e.g., unlike linear address space separation (LASS) and supervisor-mode access prevention (SMAP), where supervisor software opts out of those access control checks when intentionally accessing user memory). Other enable bit definitions (e.g., separate enable bits for each current privilege level (CPL) and address space half or other range definitions) and combinations of enable bits are possible.

[0059] As shown in the example of Figure 3G, a feature (e.g., ChkTag) can extend the page-fault error code (PFEC) with TAGRD (bit 8), which is set to 1 when a page fault occurs during a tag-LA access. When TAGRD is set, CR2 will be set to the tag-LA.

[0060] In an embodiment, for tagging violation conditions, a feature (e.g., ChkTag) can introduce a new architectural exception type, TaggingViolation, where:
• Abbreviation = #TV
• Vector = 22
• Description = Tag Violation
• Exception class = Fault
• Class = Benign
• Error code = Yes
• Source = ChkTag tag-check instructions: CHKLDTAG, CHKSTTAG, and MOV-type instructions with the ChkTag prefix.

[0061] In conjunction with tagging violation exceptions, implementations may include the following:
• The faulting data-LA is pushed onto the stack as event data if and only if Flexible Return and Event Delivery (FRED) is enabled. The faulting data-LA is also stored in the virtual machine control structure (VMCS) as the exit qualification, regardless of the FRED state in the guest. For tag mismatches, bits 63:4 of the faulting data-LA identify the 16-byte-aligned base address (excluding tag bits) of the data-LA granule that caused the mismatch. In the case of multiple mismatches, which mismatch is reported is model-specific. Bits 3:0 of the faulting data-LA are reserved, and software cannot assume that these bits will always be zero.
• In addition, a tagging violation error code (TVEC, e.g., as shown in Figure 3H) is either pushed onto the stack or stored in the VMCS as the VM-exit interruption error code. Bit 11 of the VM-exit interruption information will also be set to indicate that the error code is valid. In some implementations, the tags in TVEC may always be different.
• Other combinations of one or more of the data items described in this section may be reported along with tagging violation exceptions. Other types of data, such as the index of the faulting data granule relative to the first granule containing the data-LA, may be reported in addition to or in lieu of the above data.
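Because bits 3:0 of the data-LA reported with a tagging violation are reserved and not guaranteed to be zero, a handler would need to mask them before using the value as a granule address. A minimal Python sketch, with a hypothetical function name and an arbitrary example value:

```python
MASK64 = (1 << 64) - 1


def mismatch_granule_base(reported_data_la: int) -> int:
    """Return the 16-byte-aligned granule base from the reported data-LA.

    Bits 63:4 identify the granule; bits 3:0 are reserved and are cleared
    here rather than assumed to be zero.
    """
    return reported_data_la & ~0xF & MASK64


if __name__ == "__main__":
    # Arbitrary example value with non-zero reserved bits.
    print(hex(mismatch_granule_base(0x00007FFF1234567B)))
```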

[0062] In embodiments, a feature (e.g., ChkTag) can introduce three types of instruction set architecture (ISA) extensions, as shown in Figure 3I.

[0063] In an embodiment, multi-byte access edge cases can be handled as follows (e.g., with one of two behaviors based on the data-LA and the size of the access being checked):
• For accesses that straddle the boundary of the non-canonical region, with some bytes inside it and some bytes outside it, the result will be #GP(0) / #SS(0), just as for a normal access.
• For accesses that wrap around the 64-bit address space (fff… to 000…), the result will be #GP(0) / #SS(0), which is new for ChkTag operations. In some embodiments, this can apply even when ChkTag is disabled.
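The two edge cases above can be expressed as simple range tests. This is an illustrative sketch only: it assumes an untagged data-LA, a linear-address width of 48 bits by default (57 would apply with 5-level paging), and hypothetical function names.

```python
MASK64 = (1 << 64) - 1


def is_canonical(la: int, la_bits: int = 48) -> bool:
    """True if bits 63:(la_bits-1) are all zeros or all ones."""
    upper = la >> (la_bits - 1)
    return upper == 0 or upper == (1 << (64 - la_bits + 1)) - 1


def wraps_address_space(data_la: int, size: int) -> bool:
    """True if the access runs past the top of the 64-bit space."""
    return data_la + size - 1 > MASK64


def straddles_canonical_boundary(data_la: int, size: int, la_bits: int = 48) -> bool:
    """True if the first and last bytes differ in canonicality."""
    last = (data_la + size - 1) & MASK64
    return is_canonical(data_la, la_bits) != is_canonical(last, la_bits)


def edge_case_faults(data_la: int, size: int, la_bits: int = 48) -> bool:
    """True if either edge case described above would yield #GP(0) / #SS(0)."""
    return wraps_address_space(data_la, size) or straddles_canonical_boundary(
        data_la, size, la_bits
    )
```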

[0064] With respect to the tag load address range, embodiments may include the following:
• The number of tag bytes to load for a tag check operation depends on both the size of the access being checked and the alignment of the data-LA.
• Tag loads are aligned so as to avoid generating page faults and extended page table (EPT) violations for pages other than those containing the actual tag bytes required for the current check.
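To see why alignment matters, the following sketch counts the tag bytes touched by one check. It assumes the example sizes given elsewhere in this description (a 4-bit tag per 16-byte granule, hence two tags per tag byte); the function name and the returned offset convention (relative to the tag table base) are hypothetical.

```python
GRANULE_BYTES = 16   # example granule size from this description
TAGS_PER_BYTE = 2    # two 4-bit tags fit in one tag byte


def tag_byte_span(data_la: int, size: int) -> tuple:
    """Return (first tag-byte offset, number of tag bytes) for an access.

    Offsets are relative to the tag table base, which is not modeled here.
    """
    first_granule = data_la // GRANULE_BYTES
    last_granule = (data_la + size - 1) // GRANULE_BYTES
    first_tag_byte = first_granule // TAGS_PER_BYTE
    last_tag_byte = last_granule // TAGS_PER_BYTE
    return first_tag_byte, last_tag_byte - first_tag_byte + 1
```

Under these assumptions, an aligned 64-byte access at address 0 needs two tag bytes, while the same 64-byte access starting at address 31 spans granules 1 through 5 and needs three.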

[0065] Implementations may include architectural properties that avoid leaving breadcrumbs in transient execution that could allow tag mismatches to be distinguished from tag matches (e.g., in cache line state, including that for page table entries (PTEs); in translation lookaside buffer (TLB) state; and in load/store (LD/ST) operations, including those for accessed/dirty (A/D) bit updates).

[0066] Figure 3J shows an example of instrumented instruction configurations. Where an instruction encoding lists REX_X, it can interchangeably refer to REX2.X3.

[0067] Implementations can coexist with other technologies, for example, with the following interactions:
• Intel® Accelerator Interfacing Architecture (AiA): Unaffected.
• Intel® AMX: If tag checking is required, a CHKLDTAG/CHKSTTAG instruction is added before the TILELOAD/TILESTORE instruction.
• Intel® APX CFCMOV: Conditional checks are performed using CMOV with CHKLDTAG/CHKSTTAG instructions.
• Intel® Control-Flow Enforcement Technology (Intel® CET): Unaffected.
• Debug registers: Tag loads trigger breakpoints.
• Scatter, gather, and masked MOV instructions: There is no hardware support for checking scatter, gather, and masked MOV instructions. The compiler must calculate the address ranges to be checked and perform these checks using the CHKLDTAG and CHKSTTAG instructions.
• Linear Address Masking (LAM): ChkTag can use or rely on LAM or other features to mask a subset of address bits. LAM masking is not applied to the (implicit) tag-LA.
• Linear Address Space Separation (LASS): When LASS is enabled, a LASS check is performed on the data-LA during tag checking to prevent tag accesses from leaving inappropriate transient breadcrumbs. However, not all software that requires ChkTag is LASS-compatible (for example, some firmware), so LASS is not a prerequisite.
• Intel® Machine Check Architecture (MCA) / Poison: For tag loads, the same as for normal loads.
• Microcode patch loading: Unaffected.
• Intel® Processor Event-Based Sampling (PEBS) and PerfMon: PEBS writes using LAs are not tag checked.
• Persistent memory (PMEM): PMEM can be checked and/or can contain a tag table, and behaves like volatile memory with respect to ChkTag.
• Processor Trace (RTIT): Naturally supports tracing of #TV, #TV VM exits with the faulting data-LA, and the tag load bit for EPT violations. VM exits are traced on parts that support event tracing. Code addresses are unaffected. Processor trace buffer writes are not checked. Status reporting naturally follows the existing PT event tracing architecture:
  o Traced via existing packet types: #TV, VM exits based on #TV (including the faulting data-LA as the VM exit qualification), and the exit qualification bit used to distinguish tag loads for EPT violations.
  o Not traced: the faulting data-LA for a #TV that does not cause a VM exit, the pointer and memory tag values in the TVEC for a #TV (whether or not it causes a VM exit), and the TAGRD bit in the PFEC of a #PF (whether or not it causes a VM exit), because the PFEC is not traced.
• Protection keys: Enforced on tag loads.
• Intel® Software Guard Extensions (Intel® SGX): When executing in an enclave for which ChkTag is not enabled, ChkTag instructions execute as NOPs and the ChkTag prefix is ignored. Software tag checking remains possible.
• SMM, STM: CR4.CHKTAG is cleared upon SMI entry and STM configuration, and restored upon exit. SMM/STM can opt in to enabling it. Parallel VM exit/entry remains unchanged.
• Intel® Trust Domain Extensions (Intel® TDX): Within a TD, ChkTag works as expected (just as it does in a VM). Attested via an attribute (ATTRIBUTE).
• Intel® TSX: Tag loads are tracked in the TSX read set like normal loads. Transactions abort on #TV.
• Intel® TXT: CR4[63:32] is saved, cleared, and restored across the ACM (existing behavior; this includes CR4.CHKTAG).
• Intel® VT-x: New VMCS fields include the host and guest IA32_CHKTAG_HI state, along with the associated VMX controls and control enumeration, for loading during VM entry and exit. For example, two 64-bit VMCS fields can be used to store the IA32_CHKTAG_HI state, one in the guest-state area and one in the host-state area. Additionally, there can be a "Load IA32_CHKTAG_HI" VM-entry control and a "Load IA32_CHKTAG_HI" VM-exit control. VMX transitions can manage the MSR as follows: if the "Load IA32_CHKTAG_HI" VM-entry control is 1, VM entry loads the IA32_CHKTAG_HI MSR from the corresponding field in the guest-state area. If the "Load IA32_CHKTAG_HI" VM-entry control is 1, VM exit can unconditionally save the value of the IA32_CHKTAG_HI MSR to the corresponding field in the guest-state area, or a separate control can be defined to manage this behavior. If the "Load IA32_CHKTAG_HI" VM-exit control is 1, VM exit can load the IA32_CHKTAG_HI MSR from the corresponding field in the host-state area. Additional VMCS fields and controls can be defined to save and restore the guest and/or host IA32_CHKTAG_LO MSR state during VM entry and/or exit. #TV-based exits are natively supported, with reporting of the faulting data-LA and the TVEC. A new EPT violation exit qualification bit can be defined to distinguish tag loads (including page walks). #VE reports the same information as an EPT violation VM exit.
• Intel® VT-d: In the absence of Shared Virtual Memory (SVM), VT-d translation is GPA→HPA, and GPAs are unaffected by LAM or ChkTag.
• Intel® VT-Redirect Protection (Intel® VT-rp) / Hypervisor-managed Linear-Address Translation (HLAT): The HLAT walk is used for all applicable LAs (even tagged LAs).
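The VMX handling of IA32_CHKTAG_HI described above can be pictured as a small state model. This is a sketch of one of the options the text mentions (saving on VM exit whenever the VM-entry load control is 1); the class and field names are hypothetical and do not correspond to real VMCS field encodings.

```python
from dataclasses import dataclass


@dataclass
class Vmcs:
    # Hypothetical field names for the two 64-bit state fields and two controls.
    guest_chktag_hi: int = 0
    host_chktag_hi: int = 0
    entry_load_chktag_hi: bool = False
    exit_load_chktag_hi: bool = False


@dataclass
class Cpu:
    ia32_chktag_hi: int = 0


def vm_entry(cpu: Cpu, vmcs: Vmcs) -> None:
    """Load the guest MSR value if the VM-entry control is 1."""
    if vmcs.entry_load_chktag_hi:
        cpu.ia32_chktag_hi = vmcs.guest_chktag_hi


def vm_exit(cpu: Cpu, vmcs: Vmcs) -> None:
    """Save the guest value (one option in the text), then load the host value."""
    if vmcs.entry_load_chktag_hi:
        vmcs.guest_chktag_hi = cpu.ia32_chktag_hi
    if vmcs.exit_load_chktag_hi:
        cpu.ia32_chktag_hi = vmcs.host_chktag_hi
```

A separate save control, which the text also allows, would replace the first condition in `vm_exit` with its own flag.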

[0068] As described in the background section, a processor, processor core, execution core, etc. (any of which may be referred to as a core) can execute instructions defined by an ISA. An ISA may include one or more NOPs, which can be redefined as one or more new instructions for extending the ISA. However, the number of NOPs may be limited. Therefore, embodiments provide a technique for adding multiple new instructions using only one NOP opcode.

[0069] As an example, an embodiment includes adding two new instructions (e.g., CHKLDTAG and CHKSTTAG) to the x86 ISA using one NOP opcode (e.g., 0F 1C). Further in the example, the opcode can also be extended to indicate the size of one or more data accesses associated with the new instructions. Figure 3K illustrates this example.

[0070] Figure 3K shows example encodings for fourteen new instructions, the operation of which will be described below. In the “Encoding” column, the encoding is given according to the Intel® 64 instruction format, which includes an opcode field and may include a REX prefix field and an opcode extension field. All of these instructions use the same two-byte hexadecimal NOP opcode (0F 1C), thus preserving the other NOP opcodes for future instructions.

[0071] As shown in the “Encoding” column, a REX prefix (hexadecimal 40 to 4F) indicates that the two-byte opcode is to be decoded as a CHKTAG instruction (described below), where the W bit of the REX prefix indicates whether the CHKTAG instruction is a CHKLDTAG (e.g., W=0 or REX.W0) or a CHKSTTAG (e.g., W=1 or REX.W1) instruction. Therefore, embodiments provide a way to distinguish instructions that involve or relate to stores (e.g., CHKTAG instructions for tag checking of memory accesses) from instructions that involve or relate to loads, thereby supporting operating modes that concern only stores or only loads (e.g., a CHKTAG architecture supporting an operating mode that checks memory tags for stores but not for loads), without allocating entirely separate opcodes.

[0072] In this embodiment, the more compact REX.W0 encoding is used for loads because there may be more load instructions than store instructions. Stores that already use the REX X and/or B bits incur no code size increase from the REX.W1 encoding.

[0073] Furthermore, the seven CHKLDTAG instructions can be distinguished from one another by an opcode extension (e.g., 1, 2, 3, 4, 5, 6, or 7 in the reg field of the ModR/M byte), as can the seven CHKSTTAG instructions, to indicate the size of one or more data accesses (e.g., 1, 2, 4, 8, 16, 32, or 64 bytes, respectively). Accordingly, the mnemonics shown in the "Instructions" column are CHKLDTAG1, CHKLDTAG2, CHKLDTAG4, CHKLDTAG8, CHKLDTAG16, CHKLDTAG32, and CHKLDTAG64, and CHKSTTAG1, CHKSTTAG2, CHKSTTAG4, CHKSTTAG8, CHKSTTAG16, CHKSTTAG32, and CHKSTTAG64, where "m" indicates that the instruction format includes a memory operand indicating the memory location of the one or more data accesses.

[0074] Encoding the data access size into the opcode allows these encodings to omit additional prefixes (e.g., the hexadecimal 66 prefix indicating operand size), thus providing a smaller code size. Therefore, column 104 also shows that the encodings use no additional prefix (NP).
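The encoding scheme described above (a REX prefix, the 0F 1C opcode, REX.W selecting load versus store, and the ModR/M reg field selecting the size) can be illustrated with a toy decoder. This is a sketch of the described example only, not the contents of Figure 3K: the function name is hypothetical, the prefix is assumed to sit immediately before the opcode, and reg = 0 is assumed to fall outside the CHKTAG encodings.

```python
from typing import Optional, Tuple


def decode_chktag(code: bytes) -> Optional[Tuple[str, int]]:
    """Decode [REX, 0F, 1C, ModR/M, ...] into (mnemonic, access size).

    Returns None when the bytes do not match the example CHKTAG encodings.
    """
    if len(code) < 4:
        return None
    rex, modrm = code[0], code[3]
    if not 0x40 <= rex <= 0x4F:          # REX prefix range from the text
        return None
    if code[1:3] != b"\x0f\x1c":         # shared two-byte NOP opcode
        return None
    reg = (modrm >> 3) & 0b111           # opcode extension in ModR/M.reg
    if reg == 0:                         # assumed not to be a CHKTAG form
        return None
    kind = "CHKSTTAG" if rex & 0x08 else "CHKLDTAG"   # REX.W is bit 3
    size = 1 << (reg - 1)                # 1..7 maps to 1, 2, 4, ..., 64 bytes
    return f"{kind}{size}", size
```

Mapping the extension values 1 through 7 to sizes with a shift reflects the power-of-two progression listed in the text.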

[0075] Various other embodiments are possible, including, but not limited to, using bits (e.g., W bits) in another prefix (e.g., the Intel® Advanced Processor Extensions (APX) REX2 prefix) to distinguish between load and store.

[0076] Figure 3L illustrates a method 300 for distinguishing between loads and stores using prefix bits according to an embodiment.

[0077] In 302, instruction decoder circuitry receives an instruction of an extended instruction set, the instruction having an opcode that corresponds to a NOP in the non-extended instruction set. In 304, the value of one or more instruction prefix bits determines whether an operation (e.g., a memory tag check operation, which may include a memory tag load operation) corresponding to the instruction (e.g., CHKLDTAG or CHKSTTAG) is to be performed in conjunction with a load operation or a store operation (e.g., a data load or data store performed, within the address range specified by the CHKLDTAG or CHKSTTAG instruction, in response to a load or store instruction following the CHKLDTAG or CHKSTTAG instruction). In embodiments, the data access size may be determined based on the opcode extension of the instruction.

[0078] In 310, the operation (e.g., a memory tag check) corresponding to the instruction (e.g., CHKLDTAG) is performed in conjunction with a load operation (e.g., a load performed, within the address range specified by the CHKLDTAG instruction, in response to a load instruction following the CHKLDTAG instruction). For example, a memory tag check can be performed for an address (or an address range including the address) to be used in the load operation. In 312, the load operation is performed.

[0079] In 314, a store operation (e.g., in response to a store instruction) is performed without being combined with the operation (e.g., a memory tag check) corresponding to the instruction (e.g., CHKLDTAG). For example, because the preceding tag check instruction was for loads (CHKLDTAG) rather than for stores, the store operation can be performed in response to the store instruction without a memory tag check on the address used in the store operation.

[0080] In 320, the operation (e.g., a memory tag check) corresponding to the instruction (e.g., CHKSTTAG) is performed in conjunction with a store operation (e.g., a store performed, within the address range specified by the CHKSTTAG instruction, in response to a store instruction following the CHKSTTAG instruction). For example, a memory tag check can be performed for an address (or an address range including the address) to be used in the store operation. In 322, the store operation is performed.

[0081] In 324, a load operation (e.g., in response to a load instruction) is performed without being combined with the operation (e.g., a memory tag check) corresponding to the instruction (e.g., CHKSTTAG). For example, because the preceding tag check instruction was for stores (CHKSTTAG) rather than for loads, the load operation can be performed in response to the load instruction without a memory tag check on the address used in the load operation.

[0082] Implementations may include various types of CHKTAG instructions, such as CHKLDTAG instructions for providing tag checks for load operations and CHKSTTAG instructions for providing tag checks for store operations, enabling support for different tag checking modes (e.g., checking load and store, checking store but not load, etc.). In implementations, read-modify-write operations may be treated as store operations (e.g., performing one or more checks prior to read-modify-write data access in response to one or more CHKSTTAG instructions). Additional variants of the CHKTAG instructions may be defined so that the compiler associates each variant with a different instruction class (e.g., read-modify-write instructions, floating-point instructions, etc.), where the activation of each variant is controlled based on combinations of enable bits.
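As a rough illustration of how the two instruction types support different checking modes, the sketch below maps an access kind to the tag check instruction a compiler might emit and then decides whether that instruction performs a check in a given mode. The mode names and function names are hypothetical; the treatment of read-modify-write accesses as stores follows the paragraph above.

```python
# Hypothetical mode names for illustration.
CHECK_LOADS_AND_STORES = "loads-and-stores"
CHECK_STORES_ONLY = "stores-only"
CHECK_NOTHING = "off"


def chktag_for_access(access_kind: str) -> str:
    """Pick the tag check instruction to place before a memory access."""
    if access_kind == "load":
        return "CHKLDTAG"
    if access_kind in ("store", "read-modify-write"):
        # Read-modify-write accesses may be treated as stores.
        return "CHKSTTAG"
    raise ValueError(f"unknown access kind: {access_kind}")


def performs_check(instruction: str, mode: str) -> bool:
    """Decide whether the instruction checks tags in the given mode."""
    if mode == CHECK_LOADS_AND_STORES:
        return instruction in ("CHKLDTAG", "CHKSTTAG")
    if mode == CHECK_STORES_ONLY:
        return instruction == "CHKSTTAG"
    return False
```

With this split, the same instrumented binary can run in any of the modes, because the instrumentation itself does not change when the mode does.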

[0083] In embodiments, instruction encoding selection can be based on factors such as the frequency of the corresponding instructions and/or operations. For example, tag checks for loads can be assigned the more compact REX.W0 encoding because there can be more load instructions than store instructions.

[0084] Various embodiments may include various implementations for enabling operations (e.g., tag checks) to be performed in response to decoded (or partially decoded) instructions. For example, implementations may include circuitry, such as the enable circuitry shown in Figure 3M, to determine the enabled state of a CHKTAG instruction. This enable circuitry allows the front end (e.g., front-end unit 630 in Figure 6B, described below) to discard instructions that can be determined to be unnecessary independently of the value of the corresponding memory address, without consuming additional pipeline resources.

[0085] Figure 3M shows the following signals involved in controlling the enable circuitry (which may be defined within the x86 ISA, the Linear Address Masking (LAM) architecture, and/or the ChkTag architecture, and/or may be programmed into model-specific registers (MSRs) or control registers (e.g., CR3, CR4)):
• CR3.LAM_U48 (the user LAM48 enable bit in CR3, involved in masking linear address bits 62:48 of user pointers)
• CR3.LAM_U57 (the user LAM57 enable bit in CR3, involved in masking linear address bits 62:57 of user pointers)
• IA32_CHKTAG_LO.EN (the ChkTag enable bit in the IA32_CHKTAG_LO MSR, involved in controlling tag checking of loads and stores that reference low addresses)
• CR4.LAM_SUP (the supervisor LAM enable bit in CR4, involved in masking supervisor pointers)
• IA32_CHKTAG_HI.EN (the ChkTag enable bit in the IA32_CHKTAG_HI MSR, involved in controlling tag checking of loads and stores that reference high addresses)
• CPL (the current privilege level)
• IA32_CHKTAG_LO.LOAD_CHECK_EN (the load ChkTag enable bit in the IA32_CHKTAG_LO MSR, involved in controlling tag checking of loads that reference low addresses)
• IA32_CHKTAG_HI.LOAD_CHECK_EN (the load ChkTag enable bit in the IA32_CHKTAG_HI MSR, involved in controlling tag checking of loads that reference high addresses)
• CR4.CHKTAG (the overall ChkTag enable bit in CR4)
• IA32_EFER.LMA (the bit in the extended feature enable MSR, EFER, that indicates whether IA-32e mode is active)
• CS.L (the code segment descriptor bit that determines the sub-mode of operation in IA-32e mode)
• Segment (the separate address space that can be associated with the address used for a data access, e.g., CS (code segment), DS (data segment), SS (stack segment), ES (data segment), FS (data segment), GS (data segment))
• Pointer[63] (bit 63 of the pointer used for the data access operation)
• Is store? (whether the data access is considered a store)

[0086] For example, consider the following configuration values:
• CR3.LAM_U48 = 0
• CR3.LAM_U57 = 0
• IA32_CHKTAG_LO.EN = 0
• CR4.LAM_SUP = 1
• IA32_CHKTAG_HI.EN = 1
• CPL = 3
• IA32_CHKTAG_LO.LOAD_CHECK_EN = 0
• IA32_CHKTAG_HI.LOAD_CHECK_EN = 1
• CR4.CHKTAG = 1
• IA32_EFER.LMA = 1
• CS.L = 1
• Segment = DS
• The access being checked is a store

[0087] Even though many parts of the enable circuitry evaluate to a high value, the final output of the circuitry will indicate that no tag check is needed, even without knowing the value of the pointer. Therefore, embodiments can allow the processor's front end (e.g., front-end unit 630 in Figure 6B, described below) to avoid consuming any additional pipeline resources. Figure 3N shows an example of the decision logic used to enable checks in the front end.
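The kind of pointer-independent decision described above can be sketched in software. This is not the logic of Figure 3N, which is not reproduced in this text; it is a guess at one plausible combination of the listed signals. In particular, it assumes that low-address checks require a user LAM bit plus IA32_CHKTAG_LO.EN, that high-address checks require CR4.LAM_SUP plus IA32_CHKTAG_HI.EN and a supervisor privilege level (CPL below 3), and it ignores the segment input. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnableState:
    cr3_lam_u48: bool
    cr3_lam_u57: bool
    lo_en: bool
    cr4_lam_sup: bool
    hi_en: bool
    cpl: int
    lo_load_check_en: bool
    hi_load_check_en: bool
    cr4_chktag: bool
    efer_lma: bool
    cs_l: bool


def check_possible(s: EnableState, is_store: bool, pointer63: int) -> bool:
    """Assumed enable logic for one value of pointer bit 63."""
    if not (s.cr4_chktag and s.efer_lma and s.cs_l):
        return False                      # ChkTag off or not in 64-bit mode
    if pointer63 == 0:
        half_en = (s.cr3_lam_u48 or s.cr3_lam_u57) and s.lo_en
        load_en = s.lo_load_check_en
    else:
        # Assumption: user code (CPL 3) cannot access supervisor addresses.
        half_en = s.cr4_lam_sup and s.hi_en and s.cpl < 3
        load_en = s.hi_load_check_en
    return bool(half_en and (is_store or load_en))


def front_end_can_discard(s: EnableState, is_store: bool) -> bool:
    """True if no check is needed for either value of pointer bit 63."""
    return not any(check_possible(s, is_store, bit) for bit in (0, 1))
```

Under these assumptions, the example configuration in paragraph [0086] yields a discard decision: the low-address path is off because user LAM and IA32_CHKTAG_LO.EN are 0, and the high-address path is unreachable at CPL 3.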

[0088] In an embodiment, it may be desirable to distribute the enable bits across the different types of registers described above.

[0089] For example, usages may include toggling the IA32_CHKTAG_LO/HI.EN bits and/or the IA32_CHKTAG_LO/HI.LOAD_CHECK_EN bits. Switching between a first mode that checks both loads and stores and a second mode that checks only stores, in order to modulate overhead, can benefit from fast updates to *.LOAD_CHECK_EN. Likewise, switching between the second mode that checks only stores and a third mode that performs no checks can modulate overhead through fast updates to *.EN.

[0090] As another example, placing the enable bits described above in MSRs can reduce the overhead of updating them. A potential alternative, placing the enable bits in the CR3 or CR4 registers, would be slower because updates to CR3 and CR4 are serializing operations and can take longer.

[0091] As another example, conditioning IA32_CHKTAG_LO.EN on LAM_U48/U57 also avoids the need to update IA32_CHKTAG_LO when switching between tagged and untagged processes, assuming that the LAM-enabled processes share a matching tag table base address, ChkTag EN, and LOAD_CHECK_EN. If this is not the case, additional register updates may be required.

[0092] As another example, an enable-bit architecture similar to the one shown above can also help accelerate virtual machine monitor (VMM) emulation of ChkTag:
• A VMM emulating a guest memory access or tag check has already inspected the guest CR3 during the guest page walk.
• When LAM is disabled, determining whether ChkTag is enabled for a low address (i.e., one whose pointer[63] == 0) adds no cost (the guest IA32_CHKTAG_LO is additionally read only when LAM_U48/U57 is enabled).
• When checks for a high address (i.e., one whose pointer[63] == 1) are disabled because the guest CR4.LAM_SUP is disabled, the VMM can finish (the guest IA32_CHKTAG_HI is additionally read only when LAM_SUP is enabled).
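The VMM fast path described above amounts to deciding which guest MSR, if any, must be read before emulating an access. A minimal sketch, with a hypothetical function name and with the guest control bits passed in as plain booleans:

```python
from typing import Optional


def guest_msr_to_read(cr3_lam_u48: bool, cr3_lam_u57: bool,
                      cr4_lam_sup: bool, pointer: int) -> Optional[str]:
    """Return the guest ChkTag MSR the VMM must read, or None if none.

    Bit 63 of the pointer selects the low (0) or high (1) address half.
    """
    if (pointer >> 63) & 1 == 0:
        # Low address: IA32_CHKTAG_LO matters only when user LAM is enabled.
        return "IA32_CHKTAG_LO" if (cr3_lam_u48 or cr3_lam_u57) else None
    # High address: IA32_CHKTAG_HI matters only when supervisor LAM is enabled.
    return "IA32_CHKTAG_HI" if cr4_lam_sup else None
```

When the function returns None, the VMM can conclude from state it already holds that no tag check applies.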

[0093] Implementations may include other enable-bit architectures that provide benefits similar to those described above.

[0094] Example apparatus, method, etc.

[0095] According to some examples, an apparatus (e.g., a hardware processor, processor core, execution core, etc.) includes: instruction decoder circuitry for decoding a first instruction that references a memory location via a tagged pointer; and execution circuitry coupled to the instruction decoder circuitry for performing one or more memory tag checking operations in response to the first instruction. The one or more memory tag checking operations include referencing an entry location to find a first tag value, and comparing the first tag value with a second tag value provided by the tagged pointer. The entry location will be in a first sub-region of a memory region to be reserved for a tag table. The first sub-region will be in a first set of sub-regions of the memory region. The first set is used to include only sub-regions committed to the tag store. The memory region to be reserved for the tag table is also used to include a second set of sub-regions. The second set is used to include only sub-regions not committed to the tag store.

[0096] Any such example may include any one or any combination of the following aspects. The first set is to be expanded as needed during memory tag initialization. The one or more memory tag checking operations also include raising an exception in response to a mismatch between the first tag value and the second tag value. The memory location is referenced using a linear address in a linear address space. The linear address is used to locate the first tag value. The first sub-region is a page in linear memory. The page is 4KB in size. Locating the first tag value involves calculating a scaled address by dividing the distance of the linear address from the lowest address in the linear address space by a first number, which is based on the size of the memory location and the size of the first tag value. The memory location is 16 bytes in size. The first tag value is 4 bits in size. The first number is 32. The first sub-region includes tag storage space for covering 32 data pages. The linear address space has a first size, and the memory region to be reserved for the tag table has a second size, where the second size is the first size divided by 128KB. The apparatus also includes a register for storing the base address of the tag table. Locating the first tag value also includes adding the scaled address to the base address. The linear address is in a first linear address space of multiple linear address spaces, and the memory region to be reserved for the tag table is in the first linear address space of the multiple linear address spaces.
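The address arithmetic in these aspects can be checked with a small software model. It is a sketch under stated assumptions, not the hardware mechanism: the lowest linear address is taken to be 0, the even granule is assumed to occupy the low nibble of its tag byte, uncommitted tag pages are modeled as reading tag 0 (a real system might instead take a page fault), and all class and method names are hypothetical.

```python
PAGE_BYTES = 4096
GRANULE_BYTES = 16
TAG_BITS = 4
SCALE = GRANULE_BYTES * 8 // TAG_BITS   # the "first number": 32


class TagViolation(Exception):
    """Stands in for the exception raised on a tag mismatch."""


class SparseTagTable:
    def __init__(self, base: int):
        self.base = base        # tag table base address (held in a register)
        self.committed = {}     # tag page number -> 4KB page, grown on demand

    def entry_location(self, data_la: int):
        """Scaled address plus base; nibble order is an assumption."""
        tag_la = self.base + data_la // SCALE
        nibble = (data_la // GRANULE_BYTES) % 2
        return tag_la, nibble

    def set_tag(self, data_la: int, tag: int) -> None:
        tag_la, nibble = self.entry_location(data_la)
        # Commit the containing tag page only when it is first needed.
        page = self.committed.setdefault(tag_la // PAGE_BYTES, bytearray(PAGE_BYTES))
        off, shift = tag_la % PAGE_BYTES, nibble * TAG_BITS
        page[off] = (page[off] & ~(0xF << shift) & 0xFF) | ((tag & 0xF) << shift)

    def check(self, data_la: int, pointer_tag: int) -> None:
        tag_la, nibble = self.entry_location(data_la)
        page = self.committed.get(tag_la // PAGE_BYTES)
        stored = 0 if page is None else (page[tag_la % PAGE_BYTES] >> (nibble * TAG_BITS)) & 0xF
        if stored != pointer_tag:
            raise TagViolation(hex(data_la & ~(GRANULE_BYTES - 1)))
```

With these sizes, one 4KB tag page covers 4096 × 32 = 128KB of data, which is the 32 data pages stated above.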

[0097] According to some examples, a method includes: decoding a first instruction for referencing a memory location via a tagged pointer; and performing one or more memory tag checking operations in response to the first instruction, wherein: the one or more memory tag checking operations include referencing an entry location to find a first tag value, and comparing the first tag value with a second tag value provided by the tagged pointer; and the entry location is in a first sub-region of a memory region to be reserved for a tag table, the first sub-region being in a first set of sub-regions of the memory region, the first set being used to include only sub-regions committed to the tag store, the memory region to be reserved for the tag table also being used to include a second set of sub-regions, and the second set being used to include only sub-regions not committed to the tag store.

[0098] Any such example may include any one or any combination of the following aspects. The method also includes expanding the first set as needed during memory tag initialization. The one or more memory tag checking operations also include raising an exception in response to a mismatch between the first tag value and the second tag value. The memory location is referenced by a linear address in a linear address space. The linear address is used to find the first tag value. The first sub-region is a page in linear memory. The page size is 4KB. Finding the first tag value includes calculating a scaled address by dividing the distance of the linear address from the lowest address in the linear address space by a first number, the first number being based on the size of the memory location and the size of the first tag value. The size of the memory location is 16 bytes. The size of the first tag value is 4 bits. The first number is 32. The first sub-region includes tag storage space for covering 32 data pages. The linear address space has a first size, and the memory region to be reserved for the tag table has a second size, where the second size is the first size divided by 128KB. The method also includes storing the base address of the tag table in a register. Finding the first tag value also includes adding the scaled address to the base address. The linear address is in a first linear address space of multiple linear address spaces, and the memory region to be reserved for the tag table is in the first linear address space of the multiple linear address spaces.

[0099] According to some examples, a non-transitory machine-readable medium stores instructions including a first instruction that, when decoded by a machine, causes the machine to perform a method comprising: referencing an entry location to find a first tag value; and comparing the first tag value with a second tag value provided by a tagged pointer; wherein: the first instruction is to reference a memory location via the tagged pointer; and the entry location is in a first sub-region of a memory region to be reserved for a tag table, the first sub-region being in a first set of sub-regions of the memory region, the first set being used to include only sub-regions committed to tag storage, the memory region to be reserved for the tag table also being used to include a second set of sub-regions, and the second set being used to include only sub-regions not committed to tag storage.

[0100] Any such example may include any one or any combination of the following aspects. The method also includes expanding the first set as needed during memory tag initialization. The one or more memory tag checking operations also include raising an exception in response to a mismatch between the first tag value and the second tag value. The memory location is referenced by a linear address in a linear address space. The linear address is used to find the first tag value. The first sub-region is a page in linear memory. The page size is 4KB. Finding the first tag value includes calculating a scaled address by dividing the distance of the linear address from the lowest address in the linear address space by a first number, the first number being based on the size of the memory location and the size of the first tag value. The size of the memory location is 16 bytes. The size of the first tag value is 4 bits. The first number is 32. The first sub-region includes tag storage space for covering 32 data pages. The linear address space has a first size, and the memory region to be reserved for the tag table has a second size, where the second size is the first size divided by 128KB. The method also includes storing the base address of the tag table in a register. Finding the first tag value also includes adding the scaled address to the base address. The linear address is in a first linear address space of multiple linear address spaces, and the memory region to be reserved for the tag table is in the first linear address space of the multiple linear address spaces.

[0101] According to some examples, an apparatus may include means for performing any of the functions disclosed herein; an apparatus may include a data storage device that stores code, when executed by a hardware processor or controller, causing the hardware processor or controller to perform any method or any part of a method disclosed herein; apparatuses, methods, systems, etc., may be as described in the detailed description; a non-transitory machine-readable medium may store instructions that, when decoded and / or executed by a machine, cause the machine to perform any method or any part of a method disclosed herein. Embodiments may include any details, features, etc., or combinations of details, features, etc., described in this specification.

[0102] Example computer architecture

[0103] The following section details an example computer architecture. Other system designs and configurations known in the art for laptops, desktop computers, handheld personal computers (PCs), personal digital assistants, engineering workstations, servers, discrete servers, network devices, network hubs, switches, routers, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, microcontrollers, cellular phones, portable media players, handheld devices, and various other electronic devices are also suitable. In general, various systems or electronic devices capable of incorporating processors and / or other execution logic as disclosed herein are generally suitable.

[0104] Figure 4 illustrates an example computing system. Multiprocessor system 400 is an interfaced system and includes multiple processors or cores, including a first processor 470 and a second processor 480 coupled via an interface 450 such as a point-to-point (PP) interconnect, fabric, and/or bus. In some examples, the first processor 470 and the second processor 480 are homogeneous. In some examples, the first processor 470 and the second processor 480 are heterogeneous. Although the example system 400 is shown as having two processors, the system can have three or more processors, or it can be a single-processor system. In some examples, the computing system is a system-on-a-chip (SoC).

[0105] Processors 470 and 480 are shown as including integrated memory controller (IMC) circuitry 472 and 482, respectively. Processor 470 also includes interface circuitry 476 and 478; similarly, the second processor 480 includes interface circuitry 486 and 488. Processors 470 and 480 can exchange information via interface 450 using interface circuitry 478 and 488. IMCs 472 and 482 couple processors 470 and 480 to corresponding memories, namely memories 432 and 434, which may be portions of the main memory locally attached to the respective processor.

[0106] Processors 470 and 480 can each exchange information with network interface (NW I/F) 490 via separate interfaces 452 and 454 using interface circuits 476, 494, 486, and 498, respectively. Network interface 490 (e.g., one or more of an interconnect, bus, and/or fabric, and in some examples, a chipset) can optionally exchange information with coprocessor 438 via interface circuit 492. In some examples, coprocessor 438 is a special-purpose processor, such as a high-throughput processor, a network or communication processor, a compression engine, a graphics processor, a general-purpose graphics processing unit (GPGPU), a neural-network processing unit (NPU), an embedded processor, etc.

[0107] A shared cache (not shown) may be included in either of the processors 470, 480, or external to these processors but connected to them via an interface (such as a PP interconnect) such that if the processors are placed in a low-power mode, the local cache information of either or both processors may be stored in the shared cache.

[0108] Network interface 490 may be coupled to first interface 416 via interface circuitry 496. In some examples, first interface 416 may be an interface such as a Peripheral Component Interconnect (PCI) interconnect, a PCI Express interconnect, or another I / O interconnect. In some examples, first interface 416 is coupled to power control unit (PCU) 417, which may include circuitry, software, and / or firmware for performing power management operations in relation to processors 470, 480, and / or coprocessor 438. PCU 417 provides control information to a voltage regulator (not shown) to cause the voltage regulator to generate an appropriate regulated voltage. PCU 417 also provides control information to control the generated operating voltage. In various examples, PCU 417 may include various power management logic units (circuits) for performing hardware-based power management. Such power management can be entirely controlled by the processor (e.g., by various processor hardware, and it can be triggered by workload and / or power, thermal constraints, or other processor constraints), and / or power management can be performed in response to external sources (such as platform or power management sources or system software).

[0109] The PCU 417 is illustrated as logic separate from processors 470 and / or 480. In other cases, the PCU 417 may execute on one or more cores of processors 470 or 480 (not shown). In some cases, the PCU 417 may be implemented as a (dedicated or general-purpose) microcontroller or other control logic configured to execute its own dedicated power management code (sometimes referred to as P-code). In still other examples, the power management operations to be performed by the PCU 417 may be implemented externally to the processor, such as through a separate power management integrated circuit (PMIC) or another component external to the processor. In still other examples, the power management operations to be performed by the PCU 417 may be implemented within the BIOS or other system software.

[0110] Various I/O devices 414 may be coupled to a first interface 416, along with a bus bridge 418 that couples the first interface 416 to a second interface 420. In some examples, one or more additional processors 415 (such as coprocessors, high-throughput many-integrated-core (MIC) processors, GPGPUs, accelerators (such as graphics accelerators or digital signal processing (DSP) units), field-programmable gate arrays (FPGAs), or any other processors) are coupled to the first interface 416. In some examples, the second interface 420 may be a low-pin-count (LPC) interface. Various devices may be coupled to the second interface 420, including, for example, a keyboard and/or mouse 422, communication devices 427, and storage circuitry 428. Storage circuitry 428 may be one or more non-transitory machine-readable storage media as described below, such as a disk drive or other mass storage device that may include instructions/code and data 430. Furthermore, audio I/O 424 can be coupled to the second interface 420. Note that architectures other than the point-to-point architecture described above are also possible. For example, a system such as the multiprocessor system 400 can implement a multi-drop interface or other such architecture instead of a point-to-point architecture.

[0111] Example core architecture, processor, and computer architecture.

[0112] Processor cores can be implemented in different ways, for different purposes, and in different processors. For example, implementations of such cores can include: 1) a general-purpose in-order core intended for general-purpose computing; 2) a high-performance general-purpose out-of-order core intended for general-purpose computing; and 3) a dedicated core intended primarily for graphics and/or scientific (throughput) computing. Implementations of different processors can include: 1) a CPU including one or more general-purpose in-order cores and/or one or more general-purpose out-of-order cores intended for general-purpose computing; and 2) a coprocessor including one or more dedicated cores intended primarily for graphics and/or scientific (throughput) computing. Such different processors lead to different computer system architectures, which can include: 1) the coprocessor on a separate chip from the CPU; 2) the coprocessor on a separate die in the same package as the CPU; 3) the coprocessor on the same die as the CPU (in which case such a coprocessor is sometimes referred to as dedicated logic, such as integrated graphics and/or scientific (throughput) logic, or as dedicated cores); and 4) a system on a chip (SoC) that can include, on the same die, the described CPU (sometimes referred to as the application core(s) or application processor(s)), the coprocessor described above, and additional functionality. Example core architectures are described next, followed by descriptions of example processors and computer architectures.

[0113] Figure 5 illustrates a block diagram of an example processor and/or SoC 500 that may have one or more cores and an integrated memory controller. The solid-lined boxes illustrate a processor 500 with a single core 502(A), system agent unit circuitry 510, and a set of one or more interface controller unit circuitry 516, while the optional addition of the dashed-lined boxes illustrates an alternative processor 500 with multiple cores 502(A)-502(N), a set of one or more integrated memory controller unit circuitry 514 in the system agent unit circuitry 510, dedicated logic 508, and a set of one or more interface controller unit circuitry 516. Note that the processor 500 may be one of the processors 470 or 480, or the coprocessor 438 or 415, of Figure 4.

[0114] Therefore, different implementations of processor 500 may include: 1) a CPU, where dedicated logic 508 is integrated graphics and/or scientific (throughput) logic (which may include one or more cores, not shown), and cores 502(A)-502(N) are one or more general-purpose cores (e.g., general-purpose in-order cores, general-purpose out-of-order cores, or a combination of both); 2) a coprocessor, where cores 502(A)-502(N) are a large number of dedicated cores intended primarily for graphics and/or scientific (throughput) computing; and 3) a coprocessor, where cores 502(A)-502(N) are a large number of general-purpose in-order cores. Thus, processor 500 can be a general-purpose processor, a coprocessor, or a dedicated processor, such as, for example, a network or communication processor, a compression engine, a graphics processor, a general-purpose graphics processing unit (GPGPU), a high-throughput many-integrated-core (MIC) coprocessor (including 30 or more cores), an embedded processor, etc. The processor can be implemented on one or more chips. The processor 500 may be part of one or more substrates and/or may be implemented on one or more substrates using any of a variety of process technologies, such as, for example, complementary metal-oxide-semiconductor (CMOS), bipolar CMOS (BiCMOS), P-type metal-oxide-semiconductor (PMOS), or N-type metal-oxide-semiconductor (NMOS).

[0115] The memory hierarchy includes one or more levels of cache unit circuitry 504(A)-504(N) within cores 502(A)-502(N), a set of one or more shared cache unit circuitry 506, and external memory (not shown) coupled to the set of integrated memory controller unit circuitry 514. The set of one or more shared cache unit circuitry 506 may include one or more mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, such as a last-level cache (LLC), and/or combinations thereof. While in some examples interface network circuitry 512 (e.g., a ring interconnect) interfaces the dedicated logic 508 (e.g., integrated graphics logic), the set of shared cache unit circuitry 506, and the system agent unit circuitry 510, alternative examples use any number of well-known techniques for interfacing such units. In some examples, coherency is maintained between the shared cache unit circuitry 506 and one or more of cores 502(A)-502(N). In some examples, interface controller unit circuitry 516 couples the cores 502 to one or more other devices 518, such as one or more I/O devices, storage devices, or communication devices (e.g., wireless networking, wired networking, etc.).

[0116] In some examples, one or more of cores 502(A)-502(N) are capable of multithreading. System agent unit circuitry 510 includes those components that coordinate and operate cores 502(A)-502(N). System agent unit circuitry 510 may include, for example, power control unit (PCU) circuitry and / or display unit circuitry (not shown). The PCU may be, or may include, the logic and components required to regulate the power state of cores 502(A)-502(N) and / or dedicated logic 508 (e.g., integrated graphics logic). Display unit circuitry is used to drive one or more externally connected displays.

[0117] Cores 502(A)-502(N) can be homogeneous in terms of instruction set architecture (ISA). Alternatively, cores 502(A)-502(N) can be heterogeneous in terms of ISA; that is, a subset of cores 502(A)-502(N) may be able to execute an ISA, while other cores may be able to execute only a subset of that ISA or another ISA.

[0118] Example core architectures: in-order and out-of-order core block diagram.

[0119] Figure 6A is a block diagram illustrating both an example in-order pipeline and an example register renaming, out-of-order issue/execution pipeline according to examples. Figure 6B is a block diagram illustrating both an example in-order architecture core and an example register renaming, out-of-order issue/execution architecture core to be included in a processor according to examples. The solid-lined boxes in Figures 6A-6B illustrate the in-order pipeline and in-order core, while the optional addition of the dashed-lined boxes illustrates the register renaming, out-of-order issue/execution pipeline and core. Given that the in-order aspect is a subset of the out-of-order aspect, the out-of-order aspect will be described.

[0120] In Figure 6A, a processor pipeline 600 includes a fetch stage 602, an optional length-decode stage 604, a decode stage 606, an optional allocation (alloc) stage 608, an optional rename stage 610, a scheduling (also known as dispatch or issue) stage 612, an optional register read/memory read stage 614, an execute stage 616, a write-back/memory write stage 618, an optional exception handling stage 622, and an optional commit stage 624. One or more operations can be performed in each of these processor pipeline stages. For example, during the fetch stage 602, one or more instructions are fetched from instruction memory, and during the decode stage 606, the fetched instructions can be decoded, addresses (e.g., load store unit (LSU) addresses) can be generated using forwarded register ports, and branch forwarding (e.g., using an immediate offset or a link register (LR)) can be performed. In one example, the decode stage 606 and the register read/memory read stage 614 can be combined into a single pipeline stage. In one example, during the execute stage 616, the decoded instructions can be executed, LSU address/data pipelining to an Advanced Microcontroller Bus (AMB) interface can be performed, multiply and add operations can be performed, arithmetic operations with branch results can be performed, and so on.

[0121] As an example, the example register renaming, out-of-order issue/execution architecture core of Figure 6B can implement pipeline 600 as follows: 1) instruction fetch circuitry 638 performs the fetch stage 602 and the length-decode stage 604; 2) decode circuitry 640 performs the decode stage 606; 3) rename/allocator unit circuitry 652 performs the allocation stage 608 and the rename stage 610; 4) the scheduler circuitry 656 performs the scheduling stage 612; 5) the physical register file circuitry 658 and memory unit circuitry 670 perform the register read/memory read stage 614, and the execution cluster(s) 660 perform the execute stage 616; 6) memory unit circuitry 670 and the physical register file circuitry 658 perform the write-back/memory write stage 618; 7) various circuitry may be involved in the exception handling stage 622; and 8) retirement unit circuitry 654 and the physical register file circuitry 658 perform the commit stage 624.

[0122] Figure 6B shows a processor core 690 that includes front-end unit circuitry 630 coupled to execution engine unit circuitry 650, both of which are coupled to memory unit circuitry 670. The core 690 can be a reduced instruction set computing (RISC) core, a complex instruction set computing (CISC) core, a very long instruction word (VLIW) core, or a hybrid or alternative core type. Alternatively, the core 690 can be a dedicated core, such as, for example, a network or communication core, a compression engine, a coprocessor core, a general-purpose graphics processing unit (GPGPU) core, a graphics core, and so on.

[0123] Front-end unit circuitry 630 may include branch prediction circuitry 632 coupled to instruction cache circuitry 634, which is coupled to an instruction translation lookaside buffer (TLB) 636, which is coupled to instruction fetch circuitry 638, which is coupled to decode circuitry 640. In one example, instruction cache circuitry 634 is included in memory unit circuitry 670 rather than in front-end unit circuitry 630. Decode circuitry 640 (or a decoder) can decode instructions and generate as output one or more micro-operations, microcode entry points, microinstructions, other instructions, or other control signals, which are decoded from, otherwise reflect, or are derived from the original instructions. Decode circuitry 640 may further include address generation unit (AGU, not shown) circuitry. In one example, the AGU generates an LSU address using forwarded register ports and can further perform branch forwarding (e.g., immediate offset branch forwarding, LR register branch forwarding, etc.). Decode circuitry 640 can be implemented using various mechanisms. Examples of suitable mechanisms include, but are not limited to, lookup tables, hardware implementations, programmable logic arrays (PLAs), microcode read-only memories (ROMs), etc. In one example, core 690 includes a microcode ROM (not shown) or other medium that stores microcode for certain macroinstructions (e.g., in decode circuitry 640 or otherwise within front-end unit circuitry 630). In one example, decode circuitry 640 includes a micro-operation (micro-op) or operation cache (not shown) to hold/cache decoded operations, micro-tags, or micro-operations generated during the decode stage or other stages of the processor pipeline 600. Decode circuitry 640 may be coupled to rename/allocator unit circuitry 652 in the execution engine unit circuitry 650.

[0124] The execution engine unit circuitry 650 includes rename/allocator unit circuitry 652 coupled to retirement unit circuitry 654 and a set of one or more scheduler circuitry 656. The scheduler circuitry 656 represents any number of different schedulers, including reservation stations, a central instruction window, etc. In some examples, the scheduler circuitry 656 may include arithmetic logic unit (ALU) scheduler/scheduling circuitry, ALU queues, address generation unit (AGU) scheduler/scheduling circuitry, AGU queues, etc. The scheduler circuitry 656 is coupled to the physical register file circuitry 658. Each physical register file circuit in the physical register file circuitry 658 represents one or more physical register files, different ones of which store one or more different data types, such as scalar integer, scalar floating-point, packed integer, packed floating-point, vector integer, vector floating-point, status (e.g., an instruction pointer that is the address of the next instruction to be executed), etc. In one example, the physical register file circuitry 658 includes vector register unit circuitry, write mask register unit circuitry, and scalar register unit circuitry. These register units can provide architectural vector registers, vector mask registers, general-purpose registers, etc. The physical register file circuitry 658 is coupled to the retirement unit circuitry 654 (also known as a retirement queue) to illustrate the various ways in which register renaming and out-of-order execution can be implemented (e.g., using one or more reorder buffers (ROBs) and one or more retirement register files; using one or more future files, one or more history buffers, and one or more retirement register files; using register maps and a pool of registers; etc.). The retirement unit circuitry 654 and the physical register file circuitry 658 are coupled to the execution cluster(s) 660.
The execution cluster(s) 660 include a set of one or more execution unit circuitry 662 and a set of one or more memory access circuitry 664. The execution unit circuitry 662 can perform various arithmetic, logic, floating-point, or other types of operations (e.g., shifts, addition, subtraction, multiplication) on various types of data (e.g., scalar integer, scalar floating-point, packed integer, packed floating-point, vector integer, vector floating-point). While some examples may include a number of execution units or execution unit circuitry dedicated to specific functions or sets of functions, other examples may include only one execution unit circuit or multiple execution units/execution unit circuitry that all perform all functions. The scheduler circuitry 656, physical register file circuitry 658, and execution cluster(s) 660 are shown as being possibly plural because some examples create separate pipelines for certain types of data/operations (e.g., a scalar integer pipeline, a scalar floating-point/packed integer/packed floating-point/vector integer/vector floating-point pipeline, and/or a memory access pipeline, each having its own scheduler circuitry, physical register file circuitry, and/or execution cluster; in the case of a separate memory access pipeline, some examples are implemented in which only the execution cluster of that pipeline has the memory access unit circuitry 664). It should also be understood that where separate pipelines are used, one or more of these pipelines can be out-of-order issue/execution and the rest in-order.

[0125] In some examples, the execution engine unit circuitry 650 can perform load store unit (LSU) address/data pipelining to an Advanced Microcontroller Bus (AMB) interface (not shown), as well as address-phase and write-back operations and data-phase load, store, and branch operations.

[0126] A set of memory access circuitry 664 is coupled to memory unit circuitry 670, which includes data TLB circuitry 672 coupled to data cache circuitry 674, which is coupled to level 2 (L2) cache circuitry 676. In one example, memory access circuitry 664 may include load unit circuitry, store address unit circuitry, and store data unit circuitry, each of which is coupled to data TLB circuitry 672 in memory unit circuitry 670. Instruction cache circuitry 634 is further coupled to level 2 (L2) cache circuitry 676 in memory unit circuitry 670. In one example, instruction cache 634 and data cache 674 are combined into a single instruction and data cache (not shown) in L2 cache circuitry 676, level 3 (L3) cache circuitry (not shown), and/or main memory. L2 cache circuitry 676 is coupled to one or more other levels of cache and ultimately to main memory.

[0127] Core 690 may support one or more instruction sets (e.g., the x86 instruction set architecture (optionally with some extensions that have been added with newer versions); the MIPS instruction set architecture; the ARM instruction set architecture (optionally with additional extensions such as NEON)), including the instruction(s) described herein. In one example, core 690 includes logic to support a packed data instruction set architecture extension (e.g., AVX1, AVX2), thereby allowing the operations used by many multimedia applications to be performed using packed data.

[0128] Example (one or more) execution unit circuits.

[0129] Figure 7 illustrates examples of execution unit circuitry, such as the execution unit circuitry 662 of Figure 6B. As illustrated, the execution unit circuitry 662 may include one or more ALU circuits 701, optional vector/single instruction multiple data (SIMD) circuits 703, load/store circuits 705, branch/jump circuits 707, and/or floating-point unit (FPU) circuits 709. The ALU circuits 701 perform integer arithmetic and/or Boolean operations. The vector/SIMD circuits 703 perform vector/SIMD operations on packed data (such as data in SIMD/vector registers). The load/store circuits 705 execute load and store instructions to load data from memory into registers or store data from registers to memory. The load/store circuits 705 may also generate addresses. The branch/jump circuits 707 cause a branch or jump to a memory address depending on the instruction. The FPU circuits 709 perform floating-point arithmetic. The width of the execution unit circuitry 662 varies depending on the example and can range from, for example, 16 bits to 1024 bits. In some examples, two or more smaller execution units are logically combined to form a larger execution unit (e.g., two 128-bit execution units are logically combined to form a 256-bit execution unit).
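The logical combination of two narrower execution units into a wider one can be illustrated in software. The following is only a minimal sketch: the 32-bit lane width, the packed addition operation, and all function names are assumptions made for illustration and are not taken from this disclosure.

```python
LANE_BITS = 32                      # assumed lane width for this illustration
LANE_MASK = (1 << LANE_BITS) - 1
MASK128 = (1 << 128) - 1


def pack(lanes):
    """Pack a list of lane values into one integer, lane 0 in the low bits."""
    value = 0
    for index, lane in enumerate(lanes):
        value |= (lane & LANE_MASK) << (index * LANE_BITS)
    return value


def unpack(value, lane_count):
    """Split an integer into lane_count lanes of LANE_BITS bits each."""
    return [(value >> (i * LANE_BITS)) & LANE_MASK for i in range(lane_count)]


def add_packed_128(a, b):
    """Hypothetical 128-bit unit: four independent, wrapping 32-bit additions."""
    sums = [x + y for x, y in zip(unpack(a, 4), unpack(b, 4))]
    return pack(sums)               # pack() masks each lane, so carries do not cross lanes


def add_packed_256(a, b):
    """A 256-bit operation formed by running each 128-bit half on its own unit."""
    low = add_packed_128(a & MASK128, b & MASK128)
    high = add_packed_128((a >> 128) & MASK128, (b >> 128) & MASK128)
    return (high << 128) | low
```

Because packed lanes never carry into one another, the two halves can be processed independently, which is what makes this kind of combination straightforward for packed operations.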

[0130] Program code can be applied to input information to perform the functions described herein and generate output information. The output information can be applied to one or more output devices in a known manner. For the purposes of this application, the processing system includes any system having a processor, such as, for example, a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a microprocessor, or any combination thereof.

[0131] The program code can be implemented using a high-level procedural or object-oriented programming language to communicate with the processing system. If necessary, the program code can also be implemented using assembly or machine language. In fact, the mechanisms described in this document are not limited to any particular programming language. In any case, the language can be a compiled or interpreted language.

[0132] Examples of the mechanisms disclosed herein can be implemented in hardware, software, firmware, or a combination of such implementations. Examples can be implemented as computer programs or program code that execute on a programmable system, including at least one processor, a storage system (including volatile and non-volatile memories and / or storage elements), at least one input device, and at least one output device.

[0133] One or more aspects of at least one example can be implemented by representative instructions stored on a machine-readable medium, which represent various logic within a processor and which, when read by a machine, cause the machine to fabricate logic for performing the techniques described herein. Such representations, referred to as “intellectual property (IP) cores,” can be stored on a tangible machine-readable medium and supplied to various customers or manufacturing facilities to be loaded into the fabrication machines that make the logic or processor.

[0134] Such machine-readable storage media may include, but are not limited to, non-transitory, tangible arrangements of articles manufactured or formed by a machine or device, including storage media such as: hard disks; any other type of disk (including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks); semiconductor devices (such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, and electrically erasable programmable read-only memories (EEPROMs)); phase change memory (PCM); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.

[0135] Therefore, examples also include non-transitory tangible machine-readable media containing instructions or design data, such as a Hardware Description Language (HDL), which defines the features of the architectures, circuits, devices, processors, and / or systems described herein. Such examples may also be referred to as program products.

[0136] Emulation (including binary translation, code morphing, etc.).

[0137] In some cases, an instruction converter can be used to convert an instruction from a source instruction set architecture to a target instruction set architecture. For example, the instruction converter can translate (e.g., using static binary translation, or dynamic binary translation including dynamic compilation), morph, emulate, or otherwise convert an instruction into one or more other instructions to be processed by the core. The instruction converter can be implemented in software, hardware, firmware, or a combination thereof. The instruction converter can be on the processor, off the processor, or partly on and partly off the processor.

[0138] Figure 8 is a block diagram illustrating the use of a software instruction converter to convert binary instructions in a source ISA into binary instructions in a target ISA, according to an example. In the illustrated example, the instruction converter is a software instruction converter, although alternatively the instruction converter can be implemented in software, firmware, hardware, or various combinations thereof. Figure 8 shows that a program in a high-level language 802 can be compiled using a first ISA compiler 804 to generate first ISA binary code 806 that can be natively executed by a processor 816 having at least one first ISA core. The processor 816 having at least one first ISA core represents any processor that can perform substantially the same functions as an Intel® processor having at least one first ISA core by compatibly executing or otherwise processing (1) a substantial portion of the first ISA or (2) object code versions of applications or other software targeted to run on an Intel processor having at least one first ISA core, in order to achieve substantially the same result as a processor having at least one first ISA core. The first ISA compiler 804 represents a compiler operable to generate first ISA binary code 806 (e.g., object code) that can, with or without additional linkage processing, be executed on the processor 816 having at least one first ISA core. Similarly, Figure 8 shows that the program in the high-level language 802 can be compiled using an alternative ISA compiler 808 to generate alternative ISA binary code 810 that can be natively executed by a processor 814 without a first ISA core. An instruction converter 812 is used to convert the first ISA binary code 806 into code that can be natively executed by the processor 814 without a first ISA core.
This converted code is not necessarily the same as the alternative ISA binary code 810; however, the converted code will accomplish the same general operation and be made up of instructions from the alternative ISA. Thus, the instruction converter 812 represents software, firmware, hardware, or a combination thereof that, through emulation, simulation, or any other process, allows a processor or other electronic device that does not have a first ISA processor or core to execute the first ISA binary code 806.
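The idea of an instruction converter can be sketched as a table-driven static translation in which each source instruction expands to one or more target instructions. This is a toy illustration only: both instruction sets, every mnemonic, and the rule table below are hypothetical and do not correspond to any real ISA or to the converter 812 itself.

```python
# Hypothetical translation rules: each source mnemonic maps to a function
# that returns the list of target instructions performing the same operation.
RULES = {
    "MOV": lambda dst, src: [("COPY", dst, src)],
    "INC": lambda reg: [("ADDI", reg, reg, 1)],
    # One source instruction may need several target instructions.
    "PUSH": lambda reg: [("ADDI", "sp", "sp", -8), ("STORE", reg, "sp")],
}


def translate(source_program):
    """Statically translate a list of source instructions into target instructions."""
    target_program = []
    for opcode, *operands in source_program:
        # An unknown opcode raises KeyError; a real converter would need a fallback.
        target_program.extend(RULES[opcode](*operands))
    return target_program


translated = translate([("INC", "r1"), ("PUSH", "r1")])
```

As the paragraph above notes, the output of such a converter need not match what a native compiler for the target ISA would emit; it only has to accomplish the same general operation using target instructions.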

[0139] References to "an example," "example," "an embodiment," "embodiment," etc., indicate that the described example or embodiment may include a particular feature, structure, or characteristic, but every example or embodiment may not necessarily include that particular feature, structure, or characteristic. Furthermore, such phrases do not necessarily refer to the same example or embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an example or embodiment, it is submitted that it is within the knowledge of those skilled in the art to effect such a feature, structure, or characteristic in connection with other examples or embodiments, whether or not explicitly described.

[0140] Furthermore, in the examples described above, unless specifically noted otherwise, disjunctive language such as the phrases “at least one of A, B, or C” or “A, B, and/or C” is intended to be understood to mean A, B, or C, or any combination thereof (i.e., A and B, A and C, B and C, and A, B, and C). As used in this specification and the claims, and unless otherwise specified, the use of ordinal adjectives such as “first,” “second,” “third,” etc., to describe an element merely indicates that a particular instance of the element, or different instances of like elements, is being referred to, and is not intended to imply that the elements so described must be in a particular sequence, whether temporally, spatially, in ranking, or in any other manner. Additionally, as used in the description of embodiments, a “/” character between items may mean that what is described may include the first item and/or the second item (and/or any other additional items), or may be implemented using, with, and/or according to the first item and/or the second item (and/or any other additional items).

[0141] Furthermore, the terms “bit,” “flag,” “field,” “entry,” “indicator,” etc., can be used to describe any type or content of a storage location in a register, table, database, or other data structure, whether implemented in hardware or software, and these terms are not intended to limit the embodiments to any particular type of storage location or to any particular number of bits or other elements within a storage location. For example, the term “bit” can be used to refer to a bit position within a register and/or to the data stored or to be stored in that bit position. The term “clear” can be used to indicate storing, or otherwise causing to be stored, a logical value of 0 in a storage location; and the term “set” can be used to indicate storing, or otherwise causing to be stored, a logical value of 1, all 1s, or some other specified value in a storage location. However, these terms are not intended to limit the embodiments to any particular logical convention, as any logical convention can be used within the embodiments.
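As a small illustration of the "set" and "clear" terminology, the following sketch uses one common convention (set stores 1, clear stores 0) on a single bit position of an integer standing in for a register. The function names are hypothetical, and, as noted above, the embodiments are not limited to this convention.

```python
def set_bit(register_value, position):
    """Store a logical 1 in the given bit position."""
    return register_value | (1 << position)


def clear_bit(register_value, position):
    """Store a logical 0 in the given bit position."""
    return register_value & ~(1 << position)


def read_bit(register_value, position):
    """Return the value (0 or 1) currently held in the given bit position."""
    return (register_value >> position) & 1
```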

[0142] Therefore, the specification and drawings should be considered illustrative rather than restrictive. However, it will be apparent that various modifications and changes can be made to this disclosure without departing from the broader spirit and scope of the disclosure as set forth in the claims.

Claims

1. An apparatus for selectively controllable memory tag inspection, comprising: instruction decoder circuitry to decode a first instruction, the first instruction to reference a memory location via a tagged pointer; and execution circuitry, coupled to the instruction decoder circuitry, to perform one or more memory tag checking operations in response to the first instruction, wherein: the one or more memory tag checking operations include referencing an entry location to find a first tag value, and comparing the first tag value with a second tag value provided by the tagged pointer; and the entry location is to be in a first sub-region of a memory region to be reserved for a tag table, the first sub-region is to be in a first set of sub-regions of the memory region, the first set to include only sub-regions submitted to the tag storage, and the memory region to be reserved for the tag table is also to include a second set of sub-regions, the second set to include only sub-regions not submitted to the tag storage.

2. The apparatus according to claim 1, wherein, The first set is used to scale up as needed when initializing memory tags.

3. The apparatus according to claim 1, wherein, The one or more memory tagging check operations also include causing an anomaly in response to a mismatch between the first tag value and the second tag value.

4. The apparatus according to claim 1, wherein, The memory location is used to reference a linear address in the linear address space.

5. The apparatus according to claim 4, wherein, The linear address is used to locate the first tag value.

6. The apparatus according to claim 5, wherein the first sub-region is a page in linear memory.

7. The apparatus according to claim 6, wherein the page has a size of 4KB.

8. The apparatus according to claim 7, wherein finding the first tag value includes calculating a scaled address by dividing the distance of the linear address from the lowest address in the linear address space by a first number, the first number being based on the size of the memory location and the size of the first tag value.

9. The apparatus according to claim 8, wherein the memory location is 16 bytes in size.

10. The apparatus according to claim 9, wherein the first tag value is four bits in size.

11. The apparatus according to claim 10, wherein the first number is 32.

12. The apparatus according to claim 11, wherein the first sub-region includes tag storage covering 32 data pages.
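The figures recited in claims 7 and 9 to 12 follow from one another, which the short calculation below makes explicit. The constant names are illustrative only; the values are the ones stated in the claims.

```python
GRANULE_BYTES = 16   # size of one tagged memory location (claim 9)
TAG_BITS = 4         # size of one tag value (claim 10)
PAGE_BYTES = 4096    # size of one page (claim 7)

# Two 4-bit tags fit in one byte of tag storage.
TAGS_PER_BYTE = 8 // TAG_BITS

# One byte of tag storage therefore describes 2 * 16 = 32 bytes of data.
FIRST_NUMBER = GRANULE_BYTES * TAGS_PER_BYTE

# One 4KB page of tag storage describes 4096 * 32 bytes of data (128 KiB).
DATA_BYTES_PER_TAG_PAGE = PAGE_BYTES * FIRST_NUMBER

# Expressed in 4KB data pages, that is 32 pages.
DATA_PAGES_PER_TAG_PAGE = DATA_BYTES_PER_TAG_PAGE // PAGE_BYTES
```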

13. The apparatus according to claim 12, wherein the linear address space has a first size and the memory region to be reserved for the tag table has a second size, the second size being the first size divided by 128K.

14. The apparatus of claim 13, further comprising a register for storing the base address of the tag table.

15. The apparatus according to claim 14, wherein finding the first tag value further includes adding the scaled address to the base address.
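The lookup described in claims 8 and 15 amounts to a subtract, a divide, and an add. The sketch below models it in software; the function names, the default lowest address of 0, and the rule for choosing between the two tags packed in one byte are assumptions, since the claims specify only the division by the first number and the addition of the base address.

```python
GRANULE_BYTES = 16   # size of one tagged memory location (claim 9)
FIRST_NUMBER = 32    # data bytes described by one byte of tag storage (claim 11)


def tag_entry_address(linear_addr: int, tag_table_base: int, lowest_addr: int = 0) -> int:
    """Scale the linear address down by the first number, then add the table base."""
    scaled = (linear_addr - lowest_addr) // FIRST_NUMBER
    return tag_table_base + scaled


def tag_nibble_index(linear_addr: int, lowest_addr: int = 0) -> int:
    """Assumption: even-numbered granules use the low nibble, odd ones the high nibble."""
    return ((linear_addr - lowest_addr) // GRANULE_BYTES) % 2
```

For example, with a hypothetical table base of `0x70000000`, the data address `0x1000` maps to the tag byte at `0x70000080`, because `0x1000 // 32` is `0x80`.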

16. The apparatus according to claim 15, wherein the linear address is in a first linear address space of a plurality of linear address spaces, and the memory region to be reserved for the tag table is also in the first linear address space.

17. A method for selectively controllable memory tag checking, comprising:
decoding a first instruction, the first instruction to reference a memory location via a tagged pointer; and
performing, in response to the first instruction, one or more memory tag checking operations, wherein:
the one or more memory tag checking operations include referencing an entry location to find a first tag value, and comparing the first tag value with a second tag value provided by the tagged pointer;
the entry location is in a first sub-region of a memory region to be reserved for a tag table;
the first sub-region is in a first set of sub-regions of the memory region, the first set including only sub-regions committed to tag storage; and
the memory region to be reserved for the tag table also includes a second set of sub-regions, the second set including only sub-regions not committed to tag storage.

18. The method of claim 17, further comprising scaling up the first set as needed during memory tag initialization.

19. A non-transitory machine-readable medium storing instructions, the instructions including a first instruction which, when decoded by a machine, causes the machine to perform a method for selectively controllable memory tag checking, the method comprising:
referencing an entry location to find a first tag value; and
comparing the first tag value with a second tag value provided by a tagged pointer, wherein:
the first instruction is to reference a memory location via the tagged pointer;
the entry location is in a first sub-region of a memory region to be reserved for a tag table;
the first sub-region is in a first set of sub-regions of the memory region, the first set including only sub-regions committed to tag storage; and
the memory region to be reserved for the tag table also includes a second set of sub-regions, the second set including only sub-regions not committed to tag storage.

20. The non-transitory machine-readable medium according to claim 19, wherein the method further includes scaling up the first set as needed when memory tags are initialized.