Snoop filter using a non-aggregated vector table
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- MICROSOFT TECHNOLOGY LICENSING LLC
- Filing Date
- 2024-06-12
- Publication Date
- 2026-06-19
Abstract
Description
Background Art
[0001] Background
[0001] Processor-based devices may include a plurality of processing elements (PEs) (e.g., processor cores as a non-limiting example) that each provide one or more local caches for storing frequently accessed data. Since the plurality of PEs of a processor-based device may share memory resources such as system memory, multiple copies of shared data read from a given memory address may exist simultaneously within the system memory and within the local caches of the PEs. Thus, to ensure that all PEs have a consistent view of the shared data, the processor-based device provides support for a cache coherence protocol to enable local changes to the shared data within a PE to be propagated to other PEs.
Summary of the Invention
[0002] Summary
[0002] The described technology involves: receiving, from an agent, a request to access a coherence granularity and to assign the agent's ID (AID) to one of a plurality of SFT entries in a snoop filter (SFT); performing a tag lookup on the tag of the coherence granularity in the SFT to find a matching SFT entry, where the matching SFT entry tracks the tag of the coherence granularity; determining the number n of agents tracked by the matching SFT entry; and, in response to determining that the number n of agents tracked by the matching SFT entry exceeds a threshold, storing a DVT index in the tracking information field of the matching SFT entry, where the DVT index selects a DVT entry in a non-aggregated vector table (DVT), and the selected DVT entry is configured to hold a tracking vector for tracking the agents that cache the coherence granularity of the matching SFT entry.
[0003]
[0003] This summary is provided in a simplified form to introduce selected concepts from those further described below in the detailed description. This summary is not intended to identify any major or essential features of the subject matter described in the claims, nor is it intended to be used to limit the scope of the subject matter described in the claims.
[0004]
[0004] Other implementation forms are also described and mentioned herein.
[Brief explanation of the drawing]
[0005] Brief explanation of the drawing
[Figure 1] [0005] Shows one implementation of a system that provides cache coherence using a snoop filter.
[Figure 2] [0006] Shows an example of the structure of a snoop filter entry implementing the technology disclosed herein.
[Figure 3] [0007] Shows an example of the tracking modes for SFT entries in the cache coherence system disclosed herein.
[Figure 4] [0008] Shows examples of values for various fields in the SFT entries of the cache coherence system disclosed herein.
[Figure 5] [0009] Shows an example of operations when an agent desires access to a coherence granularity and the SFT must be checked to determine whether snooping is required.
[Figure 6] [0010] Shows an example of operations when an SFT update is required.
[Figure 7] [0011] Shows an example of operations when an agent newly caches a copy of a coherence granularity that is not currently tracked by the SFT.
[Figure 8] [0012] Shows an example of operations when an agent newly caches a copy of a coherence granularity that is currently tracked by the SFT.
[Figure 9] [0013] Shows an example of operations when an agent communicates that a coherence granularity has been removed from its cache.
[Figure 10] [0014] Shows an example of a system that may be useful in implementing the cache coherence system disclosed herein.
[Modes for carrying out the invention]
[0006] Detailed explanation
[0015] The implementations disclosed herein describe a multiprocessor system employing hardware-enforced cache coherency in which, when an agent such as a CPU or GPU wishes to access a memory location, the hardware automatically determines whether another agent currently holds a copy of that memory location. If the access is a read and the memory location is cached by another agent, the copy in system memory may be stale, in which case the access must be satisfied by retrieving the data from the other agent's cache. If the access is a write, any modified cached copy must first be written back to system memory, or be overwritten by or merged with the data being written. The memory block over which hardware-enforced cache coherency is maintained is called the coherence granularity (cogran), and the system may match the size of its coherence granularity to the cache line size.
[0007]
[0016] In some implementations, the system may maintain a list of which agents currently cache the coherence granularity. In other implementations, there may be no central coherence directory to maintain; instead, while a requested memory access is being handled, all agents are asked whether they hold a copy of the coherence granularity in their respective caches. This query is commonly referred to as snooping. Over-snooping occurs when an agent is snooped, and its cache is searched for a coherence granularity, although that agent does not currently hold a copy of that coherence granularity. Over-snooping is functionally unproductive and disturbs the agent unnecessarily. The systems disclosed herein describe advantageous implementations that use snoop filters (SFTs) to mitigate over-snooping. Such implementations reduce the over-snooping penalty in terms of additional memory access latency, functionally unproductive consumption of interconnect bandwidth, and energy wasted on unnecessary cache lookups in over-snooped agents. A snoop filter can be thought of as a higher-level, inclusive set-associative cache that holds no data itself but aims to track the entire set of coherence granularities held by the lower-level caches for which it must maintain cache coherence.
[0008]
[0017] A non-precise snoop filter tracks only whether a coherence granularity is cached by some agent at a given time, not which agents cache it. While the tracking overhead of a non-precise SFT is smaller than that of other types, the lack of precision means that over-snooping is likely to occur whenever snoops need to be sent. The lack of precision also means that the SFT generally loses the ability to detect when a coherence granularity has been removed from all coherent caches.
[0009]
[0018] A precise snoop filter may employ vectors to track exactly which agents are caching a copy of a coherence granularity. The tracking overhead of a precise SFT can require a relatively large amount of memory, as it includes one bit per agent for each coherence granularity being tracked. In this implementation, when an agent acquires a copy of a coherence granularity to place in its cache, the vector bit for that agent in the SFT entry tracking that coherence granularity is set. When that agent later removes the coherence granularity from its cache, the corresponding vector bit in the SFT entry is cleared. This has several advantages over a non-precise SFT: (a) only the exact agents that need to be snooped are snooped, and (b) the snoop range can shrink further over time as individual agents remove the coherence granularity from their caches and the SFT is updated accordingly, although this applies only to removals that agents communicate to the SFT.
[0010]
[0019] In a hybrid implementation, the SFT can precisely track up to (n) agents (typically 2-3) for each coherence granularity by recording an Agent ID (AID) for each of them in the tracking information of the SFT entry. The AID is a unique identifier for each agent tracked by the SFT. For example, the AID can be an encoding of the bit position that the agent would occupy in an SFT tracking vector. Alternatively, the AID can be the agent's interconnect address, i.e., the ID used by the interconnect to send messages to that agent. If more than (n) agents cache copies of the coherence granularity, the SFT switches from AID tracking to non-precise tracking. When the hybrid implementation is in AID tracking mode, there is no over-snooping because the SFT entry knows exactly which agents to snoop. On the other hand, when the hybrid implementation is in non-precise tracking mode, the SFT entry indicates only that all agents should be snooped whenever the coherence granularity is currently tracked by the SFT. If the system has many coherent agents (e.g., 128), this method employs less hardware than the precise vector SFT, because fewer state bits are needed to record (n) AIDs (if n is sufficiently small) than to hold a full vector.
[0011]
[0020] In systems with many coherent agents (e.g., 128), over-snooping due to non-precise tracking is very costly in terms of consumed fabric bandwidth and wasted energy. Furthermore, the large SFT needed for precise tracking is very costly in terms of area, which also increases the distance that snoop (and other) messages must travel. Workloads with many shared data structures or shared instruction pages can quickly exhaust the precise AID tracking capability of hybrid methods, leading to more frequent use of non-precise tracking modes. Some amount of over-snooping may be tolerable; however, because the various non-precise tracking modes generally cannot revert to precise tracking when coherence granularities are removed from caches, snoop filter management itself generates over-snooping overhead. Specifically, if the snoop filter cannot know that a coherence granularity is no longer cached by any agent, it may more frequently have to send "filter flush" snoops to free up space within the SFT itself so that newly tracked coherence granularities can be placed in the SFT.
[0012]
[0021] Figure 1 discloses one implementation of a cache coherence system 100 using a snoop filter, which improves upon one or more of the implementations described above. Specifically, the cache coherence system 100 may be implemented in a multicore architecture including several central processing unit (CPU) cores 102 and 104, a graphics processing unit (GPU) 106, one or more input / output (I / O) agents 108, a point of serialization (PoS) 110, and memory 114. This example shows two CPU cores and one GPU, but it is understood that any number of CPU cores and GPUs can be used without departing from the scope of this disclosure. Examples of I / O agents 108 include, but are not limited to, industry standard architecture (ISA) devices, peripheral component interconnect (PCI) devices, PCI-X devices, PCI Express devices, universal serial bus (USB) devices, advanced technology attachment (ATA) devices, small computer system interface (SCSI) devices, and InfiniBand devices.
[0013]
[0022] The CPU cores 102 and 104, the GPU 106, and the I / O agent 108 may be referred to as agents 102-108, each identified by an agent ID (AID). These agents 102-108 may have multiple levels of internal caches, such as L1, L2, and L3 caches. When agents 102-108 cache coherent, shared memory blocks (coherence granularities) in their internal caches, the snoop filter (SFT) 150 can track which coherence granularities are cached and where. Any of agents 102-108 may issue coherent or non-coherent requests, and the point of serialization (PoS) 110 serializes memory access requests and uses the snoop filter 150 to provide memory coherence.
[0014]
[0023] For example, PoS 110 receives a coherent request 120 from CPU core 102. In response to the coherent request 120, PoS 110 issues a snoop command 122 to CPU core 104, GPU 106, and I / O agent 108. CPU core 104, GPU 106, and I / O agent 108 may provide the requested coherent information to PoS 110. When sending snoop 122, PoS 110 refers to SFT 150.
[0015]
[0024] An example implementation of SFT 150 is shown by SFT 150a. SFT 150a includes a data structure that tracks the addresses of the coherence granularities currently cached by agents 102-108, together with the agents that hold them. SFT 150a can be an n-way set-associative organization, as shown by the n arrays 154. SFT 150a may include an array of entries 152, the contents of which are described further below. Each of the entries 152 may include a tag field, such as tag field 164, which stores the tag portion of the physical address (PA) that identifies the coherence granularity. For example, if the size of the coherence granularity is 64 bytes and the SFT is a 16-way set-associative SFT with 1024 sets, bits 15:6 of the PA may be used to select the SFT set, and bits 47:16 of the PA may be stored as the tag in the tag field 164 of SFT entry 152a. When SFT 150a needs to perform a lookup to determine whether the PA of a coherence granularity exists in SFT 150a, it selects one of the 1024 sets using PA[15:6]. For the selected set, SFT 150a may then compare PA[47:16] with the tag values stored in the tag fields 164 of the 16 SFT entries 152 within the selected set (156). If the tag field 164 of any of the 16 SFT entries in the selected set matches, that way (e.g., way 5) is tracking the coherence granularity being looked up.
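The set selection and tag comparison described above can be sketched as follows. This is a minimal illustration under the example parameters from the text (64-byte coherence granularity, 1024 sets, 16 ways, 48-bit PA); the names `SftEntry`, `split_pa`, `sft_lookup`, and `make_sft` are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass

OFFSET_BITS = 6    # 64-byte coherence granularity -> PA[5:0] is the byte offset
INDEX_BITS = 10    # 1024 sets -> PA[15:6] selects the set
PA_BITS = 48       # the example uses PA[47:0]
NUM_SETS = 1 << INDEX_BITS
NUM_WAYS = 16


@dataclass
class SftEntry:
    valid: bool = False
    tag: int = 0


def split_pa(pa):
    """Return (set_index, tag), i.e. (PA[15:6], PA[47:16])."""
    set_index = (pa >> OFFSET_BITS) & (NUM_SETS - 1)
    tag_bits = PA_BITS - OFFSET_BITS - INDEX_BITS
    tag = (pa >> (OFFSET_BITS + INDEX_BITS)) & ((1 << tag_bits) - 1)
    return set_index, tag


def make_sft():
    """Build an empty SFT: NUM_SETS sets of NUM_WAYS invalid entries."""
    return [[SftEntry() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]


def sft_lookup(sft, pa):
    """Return the way that tracks the granularity containing pa, or None."""
    set_index, tag = split_pa(pa)
    for way, entry in enumerate(sft[set_index]):
        if entry.valid and entry.tag == tag:
            return way
    return None
```

Any address within the same 64-byte block maps to the same set and tag, so it hits the same way; an address in the next block selects a different set.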
[0016]
[0025] In an implementation form of the cache coherence system disclosed in this specification, the SFT entry 152a can be configured to support a fixed number of implementation-defined precise tracking vectors using an associated non-aggregated vector table (DVT) 190. Various fields of the SFT entry 152a may include a status field 162 that indicates whether the status of the SFT entry is valid or invalid. As discussed above, the tag field 164 is used to store the tag portion of the physical address (PA) that identifies the coherence granularity. Further, the SFT entry 152a also includes a general field 166, a tracking mode field 168, a tracking information field 170, and a protection field 172.
[0017]
[0026] According to an implementation form of the cache coherence system disclosed in this specification, the tracking mode field 168 can take three different values, namely AID, imprecise, and DVT. When the tracking mode is AID 180, the tracking information field 170 can store two or more AIDs. The AID tracking mode is a precise tracking mode in which each AID stored in the tracking information field identifies an owner / sharer of the coherence granularity. A separate AID valid bit can be used for each AID to indicate which AID fields are actively tracking an owner / sharer. The width of the AID is variable. When the tracking mode is imprecise 182, the tracking information field 170 provides a mechanism for imprecisely tracking more agents than the number of AIDs that the tracking information field 170 can hold. In this mode, a limited number of bits in the tracking information field 170 potentially have to track all sharers (e.g., each bit represents a defined set of two or more agents), so over-snooping tends to occur.
[0018]
[0027] When the tracking mode is DVT 184, the tracking information field 170 includes an index into the DVT. This is a precise tracking mode. In this mode, it may be preferable, though not required, to also hold a single AID and its associated AID valid bit, i.e., the tracking information field 170 is of the size "AID + DVT". Holding the AID reduces the latency of requests whose initial operation does not require the tracking information held in the DVT. According to an implementation of the cache coherence system disclosed herein, the SFT entry 152a can switch between three tracking modes, namely the AID mode 180, the non-precise mode 182, and the DVT mode 184, depending on the real-time situation and the configuration settings of the SFT entry 152a.
[0019]
[0028] The implementation disclosed herein adds the DVT mode 184 to associate the SFT entry with the DVT entry. Specifically, in the DVT mode 184, the tracking information field 170 includes the DVT index. This enables the customization of the implementation-defined size of the tracking information field 170 in the DVT mode to perform a trade-off. Therefore, the DVT mode 184 can be an AID + DVT option or a DVT-only option. The SFT 150a may include a DVT control block to manage access to the DVT 190 and the availability of the DVT 190.
[0020]
[0029] Figure 2 shows a detailed structure of a snoop filter entry 200 implementing the technology disclosed herein. Specifically, as disclosed herein, the snoop filter entry 200 includes a status field 262 indicating whether the SFT entry 200 is a valid or invalid SFT entry. A tag field 264 is used to store the tag portion of a physical address (PA) that identifies the coherence granularity. A general field 266 may include other general information. A protection field 272 may include an error correction code (ECC) used to ensure the integrity of the SFT entry 200. A tracking mode field 268 may be a two-bit field indicating whether the SFT entry 200 is in AID mode, non-precise mode, or DVT mode.
[0021]
[0030] In one implementation, the tracking information field 270 may be 26 bits long. The information stored in the tracking information field changes with the tracking mode. In AID mode, the tracking information field 270 in this implementation holds two 13-bit subfields and thus stores up to two AIDs, each subfield containing 12 bits (AID(0) or AID(1)) to identify the AID and 1 bit (VLD(0) or VLD(1)) to indicate whether the AID is currently valid. The DVT tracking mode can be implemented using two options. With option 1, 210a, the SFT entry functions in DVT mode using the AID+DVT option. In this mode, the tracking information field 270 stores the AID of one agent and contains 12 bits to identify the AID (AID(0)), 1 bit (VLD(0)) to indicate whether the AID is currently valid, and 13 bits for the DVT index 212a. With option 2, 210b, the SFT entry functions in DVT mode using the DVT-only option. In this mode, the tracking information field 270 stores only the DVT index 212b.
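As an illustration of the 26-bit layouts described above, the following sketch packs and unpacks the tracking information field for AID mode and for DVT mode with the AID+DVT option. The bit ordering (valid bit above the AID, first subfield in the low bits) and all function names are assumptions made for this example; the disclosure specifies only the field widths.

```python
AID_BITS = 12
SUBFIELD_BITS = AID_BITS + 1   # 12-bit AID plus 1 valid bit = 13 bits
DVT_INDEX_BITS = 13
AID_MASK = (1 << AID_BITS) - 1
SUBFIELD_MASK = (1 << SUBFIELD_BITS) - 1


def pack_slot(aid, valid):
    """One 13-bit subfield: valid bit in bit 12 (assumed), AID in bits 11:0."""
    assert 0 <= aid <= AID_MASK
    return (int(valid) << AID_BITS) | aid


def unpack_slot(slot):
    """Inverse of pack_slot: return (aid, valid)."""
    return slot & AID_MASK, bool((slot >> AID_BITS) & 1)


def pack_aid_mode(slot0, slot1):
    """AID mode: two (aid, valid) pairs packed into 26 bits."""
    return pack_slot(*slot0) | (pack_slot(*slot1) << SUBFIELD_BITS)


def unpack_aid_mode(field):
    return (unpack_slot(field & SUBFIELD_MASK),
            unpack_slot(field >> SUBFIELD_BITS))


def pack_aid_dvt(slot0, dvt_index):
    """DVT mode, option 1: one (aid, valid) pair plus a 13-bit DVT index."""
    assert 0 <= dvt_index < (1 << DVT_INDEX_BITS)
    return pack_slot(*slot0) | (dvt_index << SUBFIELD_BITS)


def unpack_aid_dvt(field):
    return unpack_slot(field & SUBFIELD_MASK), field >> SUBFIELD_BITS
```

With the DVT-only option, the field would simply be the 13-bit DVT index on its own.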
[0022]
[0031] DVT 220 may include a DVT controller 222 to control access to and management of DVT 220. Assuming there are n bits in the DVT index 212, DVT 220 may contain up to 2^n DVT entries. Specifically, the DVT index 212 may be used to indicate which DVT entry is reserved for use by the SFT entry 200 to hold its tracking vector. Each of the DVT entries 0 to 2^n - 1 may include a valid field 230 indicating whether the DVT entry is in use. Each DVT entry may also include an m-bit precision vector 232 that can be used to track m agents, where m is the maximum number of agents that can share the coherence granularity associated with this DVT entry.
[0023]
[0032] In one implementation, each DVT entry may also include a protection field 234 to ensure the integrity of the DVT entry. The protection field may be generated using an error detection or correction scheme, such as parity or ECC, for protection against soft errors. However, such a protection field may be optional. The value stored in the protection field of a DVT entry may be compared with a value calculated from the entry. If the calculated value does not match the stored value, the tracking information held by the DVT entry cannot be trusted, and the associated SFT entry must switch its tracking mode to non-precise 182.
[0024]
[0033] DVT 220 can be implemented as a direct-mapped structure, indexed by the DVT index in the tracking information field, using memory structures such as flops, register files, or SRAM with shared read / write ports. A DVT access occurs later than the lookup of the associated SFT entry, because the DVT index is obtained from that SFT entry.
[0025]
[0034] The DVT controller 222 is configured to manage the availability of DVT entries. The DVT controller 222 may track the available entries as a free list. For a DVT 220 with many entries, the DVT controller 222 may also keep one or more available entries identified at all times to speed up DVT entry allocation. In such an implementation, new requests claim available DVT entries, and a subsequent request for which no DVT entry can be claimed should transition the tracking mode of the associated SFT entry to non-precise. Furthermore, the DVT controller 222 is also configured to detect when a DVT entry is no longer needed. This may occur, for example, when agent activity eliminates all sharing or when the SFT entry associated with a DVT entry is evicted as a victim. In this case, the DVT controller 222 returns the released DVT entry to the free list of available DVT entries.
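The free-list management described above might be modeled as follows. This is a software sketch only; the class name `DvtController`, its methods, and the first-in-first-out reuse order are assumptions for illustration and are not specified by the disclosure.

```python
from collections import deque


class DvtController:
    """Toy model of DVT entry availability tracking via a free list."""

    def __init__(self, num_entries):
        self.valid = [False] * num_entries     # valid field per DVT entry
        self.vector = [0] * num_entries        # precise tracking vector per entry
        self.free = deque(range(num_entries))  # indices of available entries

    def available(self):
        """Number of DVT entries that can still be claimed."""
        return len(self.free)

    def allocate(self):
        """Claim an entry and return its DVT index, or None if none is free.

        A None result means the caller should move the associated SFT
        entry to non-precise tracking.
        """
        if not self.free:
            return None
        index = self.free.popleft()
        self.valid[index] = True
        self.vector[index] = 0
        return index

    def release(self, index):
        """Return an entry to the free list once it is no longer needed."""
        if not self.valid[index]:
            raise ValueError("DVT entry is not in use")
        self.valid[index] = False
        self.vector[index] = 0
        self.free.append(index)
```

A caller would invoke `release` when sharing ends or when the owning SFT entry is evicted, matching the two release conditions named in the text.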
[0026]
[0035] By separating the precise vector tracking of SFT entries into a separate structure, the space overhead and power consumption associated with the SFT entries 200 are reduced without sacrificing the ability to precisely track many agents based on the implementation-defined number of coherence granularities and the size of the DVT. Assuming a system with 128 agents, a snoop filter with 32K SFT entries, and a DVT 190 with 8K entries, the tracking information overhead for each SFT entry is reduced from 128 bits to 26 bits in the case of the tracking information field 270 in Option 1, 210a (AID+DVT mode), or from 128 bits to 13 bits in the case of the tracking information field 270 in Option 2, 210b (DVT-only mode). Although there is additional space associated with the DVT 190, these implementation forms still result in a large overall space reduction, given the number of SFT entries compared to the number of DVT entries. Any SFT lookup (required for any new memory access) accesses a narrower structure, thus reducing overall power consumption. Furthermore, DVT190 is accessed only when it is known to be necessary, as identified by the tracking mode field 268.
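The space saving in this example can be checked with a short calculation. The sketch below counts only tracking-information bits and DVT vector bits; it deliberately ignores tags, state, valid bits, and protection fields, which is a simplifying assumption of this example. The function name is hypothetical.

```python
K = 1024


def tracking_bits(sft_entries, bits_per_sft_entry, dvt_entries=0, vector_bits=0):
    """Total tracking storage: per-entry SFT tracking bits plus DVT vectors."""
    return sft_entries * bits_per_sft_entry + dvt_entries * vector_bits


# 128 agents, 32K SFT entries, 8K DVT entries, as in the example above.
full_vector = tracking_bits(32 * K, 128)        # 128-bit vector in every SFT entry
option_1 = tracking_bits(32 * K, 26, 8 * K, 128)  # AID+DVT option
option_2 = tracking_bits(32 * K, 13, 8 * K, 128)  # DVT-only option

for name, bits in [("full vector", full_vector),
                   ("AID+DVT", option_1),
                   ("DVT-only", option_2)]:
    print(f"{name}: {bits} bits = {bits // 8 // K} KiB")
```

Under these assumptions the tracking storage drops from 512 KiB to 232 KiB (AID+DVT) or 180 KiB (DVT-only), even after the 128 KiB of DVT vectors is included.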
[0027]
[0036] The techniques disclosed herein reduce the amount of tracking information consumed by the SFT, and therefore reduce the required space and power consumption. For example, in an SFT with 32K entries, each having a 128-bit tracking vector, if only 1 / 4 of the SFT entries in use actually track a shared coherence granularity (sharing means that two or more agents are simultaneously caching the coherence granularity), then 3 / 4 of the SFT entries in use have a tracking vector with only 1 bit set (a single agent, since the coherence granularity is not simultaneously shared). Such entries can be tracked using an AID and therefore do not require a large tracking vector for their coherence granularity.
[0028]
[0037] Specifically, in this implementation example, the technique disclosed herein moves 8K (1 / 4 of 32K) tracking vectors into a separate structure called a DVT, and each DVT entry tracks all agents that cache the coherence granularity tracked by the SFT entry using that DVT entry. If an SFT entry needs one of the larger tracking vectors because its coherence granularity is shared, it claims a DVT entry to hold the tracking vector for that coherence granularity and then records which DVT entry was assigned (= the DVT index) so that it knows where to find that tracking vector during future SFT lookups.
[0029]
[0038] Figure 3 shows the tracking mode 300 of an SFT entry in the cache coherence system disclosed herein. As shown herein, each SFT entry independently switches between three tracking modes, namely AID 302, DVT 304, and non-precise 306, depending on the real-time situation and its configuration settings. At the time of allocation, the SFT entry must start in AID mode and later switch to either DVT 304 mode (if a DVT entry is available) or non-precise 306 (if a DVT entry is not available) if the ability to add new AIDs to its tracking is exhausted. Further details regarding the transitions between AID mode 302, DVT mode 304, and non-precise mode 306 will be described later with reference to Figures 5 to 9.
[0030]
[0039] Figure 4 shows an example configuration of the tracking information field 400 of an SFT entry in the cache coherence system disclosed herein, when the SFT entry is in either AID mode or DVT mode with the AID+DVT option. Here, when the SFT entry is in AID tracking mode, the SFT entry can precisely track two AIDs in two subfields. The first subfield 402 of the tracking information field 400 may store an AID and its valid bit, and the size of the AID is variable. In AID mode, the second subfield 404 may store a second AID and its valid bit, and the size of the second AID is similarly variable. In AID tracking mode, the SFT entry can thus track up to two agents, and no DVT entry is required.
[0031]
[0040] In DVT tracking mode, the second subfield 404 may store the DVT index. Here, the number of bits in the DVT index depends on the desired number of DVT entries that need to be indexed in the DVT 220. For example, a 13-bit DVT index may be used to select from 8K (2^13) DVT entries. However, the DVT index does not necessarily have to be 13 bits wide to fill the second subfield 404. For example, in one implementation, if there are only 1K (2^10) DVT entries, the size of the DVT index, and therefore the size of the second subfield 404, may be only 10 bits.
[0032]
[0041] In DVT tracking mode, the SFT entry can continue to track one AID in addition to the DVT index. This is advantageous because, for most memory accesses, the SFT entry then contains enough information to initiate the next action without reading the DVT. For example, if a new sharer is performing a load, a snoop can be sent immediately to the AID in the first subfield 402, and the DVT needs to be accessed only afterwards. In these common cases, no additional latency is incurred before performing that next action.
[0033]
[0042] If the SFT entry uses DVT mode with the DVT-only option, the tracking information field may require only one field 410, whose size may be the larger of (a) the size of the AID storage plus its valid bit, and (b) the number of bits required to index the desired number of DVT entries in the DVT. The DVT-only option minimizes the size of the tracking information field 400, but it requires DVT 304 or non-precise 306 tracking for any coherence granularity that is shared at all.
[0034]
[0043] Figure 5 shows operation 500, performed when an agent desires access to a coherence granularity and the SFT must be checked to determine whether a snoop is required. In operation 500, if the SFT lookup is a hit and the tracking mode of the matching SFT entry is DVT, the DVT entry may be accessed depending on the type of memory access. Operation 504 reads from the SFT the set of SFT entries that may hold the address of the coherence granularity being looked up. Operation 506 selects a first SFT entry from the set of SFT entries, and for the selected SFT entry, operation 508 determines whether the state of the selected SFT entry is idle (i.e., invalid). If the state of the selected SFT entry is idle, operation 510 determines whether all entries read in operation 504 have been checked. If all entries have been checked, operation 512 determines that the SFT lookup did not find a matching SFT entry. If not all entries have been checked, operation 514 selects the next entry in the set.
[0035]
[0044] If operation 508 determines that the state of the selected SFT entry is not idle, operation 516 determines whether the tag of the selected SFT entry matches the address being looked up. If the tag does not match, control passes to operation 510. If the tag does match, operation 518 determines whether the tracking mode of the matching SFT entry is DVT. If the tracking mode is not DVT, then at operation 526 the SFT entry itself contains all the available information about which agents may be caching the coherence granularity of the matching SFT entry.
[0036]
[0045] If the tracking mode of the matching SFT entry is DVT, operation 520 obtains from the tracking information field of the matching SFT entry the DVT index, which selects the DVT entry holding the precise tracking vector of the agents that cache the coherence granularity of that SFT entry. Operation 522 then determines whether the DVT information is required for the coherence granularity access that initiated the SFT lookup. If it is required, operation 524 reads the DVT entry from the DVT using the DVT index associated with the SFT entry.
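The lookup flow of Figure 5 can be summarized in a compact sketch. The comments map each step to the operation numbers in the text; the data model (`Mode`, `SftEntry`, a list standing in for the DVT) and the `needs_dvt` flag representing the decision in operation 522 are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    AID = 0
    NON_PRECISE = 1
    DVT = 2


@dataclass
class SftEntry:
    valid: bool
    tag: int
    mode: Mode = Mode.AID
    dvt_index: Optional[int] = None


def lookup(sft_set, tag, dvt, needs_dvt):
    """Return (matching entry or None, tracking vector or None)."""
    for entry in sft_set:                # operations 506, 510, 514: walk the set
        if not entry.valid:              # operation 508: idle entries cannot match
            continue
        if entry.tag != tag:             # operation 516: tag compare
            continue
        if entry.mode is not Mode.DVT:   # operation 518 -> 526: entry is sufficient
            return entry, None
        index = entry.dvt_index          # operation 520: DVT index from the entry
        if not needs_dvt:                # operation 522: access may not need the DVT
            return entry, None
        return entry, dvt[index]         # operation 524: read the DVT entry
    return None, None                    # operation 512: lookup miss
```

Note that the DVT is read only on the one path where the entry is in DVT mode and the access actually needs the full vector.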
[0037]
[0046] Figure 6 shows operation 600 for determining which operation flow should be selected for an SFT update. Specifically, operation 600 determines, in the case of SFT access, whether to allocate an entry (further shown below in Figure 7), whether to add the agent to the tracking of existing SFT and / or DVT entries (further shown below in Figure 8), or whether to remove the agent from the tracking of existing SFT and / or DVT entries (further shown below in Figure 9). As shown, operation 600 is performed when it is known that one of the following three conditions is true for the agent and therefore an SFT update is necessary: (a) the agent is accessing a coherence granularity that is not currently tracked by the SFT but should be tracked by the SFT, (b) the agent is newly caching a coherence granularity that is currently tracked by the SFT, or (c) it is known that the agent has deleted a copy of a coherence granularity that is currently tracked by the SFT.
[0038]
[0047] Operation 604 determines whether the agent needs to be added to or removed from the SFT. If the agent needs to be removed, operation 606 removes the agent from tracking existing entries using the "entryUpdateSubtract" flow (further shown in Figure 9 below). If the agent needs to be added, operation 608 determines whether the coherence granularity is currently being tracked by the SFT. If it is currently being tracked, operation 610 adds the agent to tracking existing entries using the "entryUpdateAdd" flow (further shown in Figure 8 below). If it is not currently being tracked, operation 612 allocates the entry using the "entryAllocation" flow (further shown in Figure 7 below).
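The flow selection of Figure 6 reduces to two decisions, sketched below. The flow names are the ones quoted in the text; the function name and its boolean parameters are hypothetical.

```python
def select_update_flow(remove_agent, currently_tracked):
    """Pick the SFT update flow for an agent (operations 604 to 612)."""
    if remove_agent:              # operation 604 -> 606
        return "entryUpdateSubtract"
    if currently_tracked:         # operation 608 -> 610
        return "entryUpdateAdd"
    return "entryAllocation"      # operation 608 -> 612
```

Removal is checked first, so whether the coherence granularity is tracked matters only when an agent is being added.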
[0039]
[0048] Figure 7 illustrates operation 700, performed when an agent newly caches a copy of a coherence granularity that is not currently tracked by the SFT. Specifically, when the SFT needs to start tracking a new coherence granularity, operation 704 determines whether there is an available SFT entry in the SFT that can accept the new coherence granularity. If so, operation 706 selects one of those available SFT entries. However, if none of the SFT entries is available, in operation 712 the SFT selects a victim (sacrificial) SFT entry to evict and makes room by sending a "filter flush" snoop to all agents that, according to that entry, may hold a copy of the victim coherence granularity. Once the victim coherence granularity has been removed from the SFT, future SFT lookups for that coherence granularity will miss in the SFT, and the SFT takes a miss to mean that no snoop needs to be sent before accessing that coherence granularity. Therefore, any agent currently holding the victim coherence granularity must flush it from its cache when the SFT removes it.
[0040]
[0049] Next, operation 714 determines whether the tracking mode of the victim SFT entry is DVT. If it is not DVT, operation 720 sends a filter flush snoop to all agents that, according to the SFT entry, may hold a copy of the coherence granularity. If the tracking mode is DVT, the filter flush relies on reading the DVT, because the precise tracking vector in the DVT entry determines which agents should be snooped. Thus, operation 716 reads the tracking information field of the SFT entry to obtain the DVT index, operation 718 reads from the DVT the DVT entry that holds the precise tracking vector of the agents caching the coherence granularity, and operation 720 then sends a filter flush snoop to the agents indicated by that vector.
[0041]
[0050] After either operation 706 or operation 720, only one agent is caching the new coherence granularity, so operation 708 sets the state of the SFT entry to enabled and its tracking mode to AID for the new coherence granularity added to the SFT. Subsequently, operation 710 records the address of the coherence granularity, the AID of the agent, and other SFT entry metadata.
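The flow of Figure 7 can be sketched in software as follows. This is a minimal illustration, not the disclosed hardware: the names `SftEntry`, `agents_to_flush`, `track_new_granularity`, and `NUM_AGENTS`, the choice of victim, and the modeling of the DVT as a dictionary from DVT index to a bit vector are all assumptions made for the example. In particular, treating a non-precise entry as "flush every agent" is an assumed conservative policy, and what happens to the DVT entry of a removed SFT entry is not covered by the passage above and is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

NUM_AGENTS = 8  # hypothetical number of agents in the system


@dataclass
class SftEntry:
    valid: bool = False                 # "enabled" state of the entry
    tag: Optional[int] = None           # address tag of the tracked granularity
    mode: str = "AID"                   # "AID", "DVT", or "IMPRECISE"
    aids: List[int] = field(default_factory=list)
    dvt_index: Optional[int] = None


def agents_to_flush(entry: SftEntry, dvt: Dict[int, int]) -> Set[int]:
    """Return the agents that receive a filter flush snoop (operations 714-720)."""
    if entry.mode == "DVT":
        vector = dvt[entry.dvt_index]   # operations 716 and 718
        return {a for a in range(NUM_AGENTS) if (vector >> a) & 1}
    if entry.mode == "AID":
        return set(entry.aids)
    # Assumption: a non-precise entry is flushed from every agent.
    return set(range(NUM_AGENTS))


def track_new_granularity(sft: List[SftEntry], dvt: Dict[int, int],
                          tag: int, aid: int,
                          victim_index: int = 0) -> Tuple[int, Set[int]]:
    """Start tracking a granularity that currently misses in the SFT."""
    flushed: Set[int] = set()
    free = [i for i, e in enumerate(sft) if not e.valid]   # operation 704
    if free:
        index = free[0]                                    # operation 706
    else:
        index = victim_index                               # operation 712
        flushed = agents_to_flush(sft[index], dvt)
    # Operations 708 and 710: one sharer, so the entry starts in AID mode.
    sft[index] = SftEntry(valid=True, tag=tag, mode="AID", aids=[aid])
    return index, flushed
```

The function returns the set of flushed agents so that a caller can see which snoops the removal would have generated; a real replacement policy would choose the victim rather than take a fixed index.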
[0042]
[0051] Figure 8 shows operation 800 when an agent newly caches a copy of the coherence granularity currently tracked by the SFT. Specifically, operation 800 is used when the SFT needs to add an agent to an existing SFT entry that is already tracking the coherence granularity that the agent intends to cache. Operation 804 determines whether the tracking mode of the SFT entry is AID. If it is AID, operation 806 determines whether the SFT entry is able to record an additional AID in its tracking information field. If the SFT entry is able to record an additional AID in its tracking information field, operation 808 adds the AID of the new agent to the SFT entry.
[0043]
[0052] If the tracking mode of the SFT entry is not AID, operation 810 determines whether the tracking mode of the SFT entry is DVT. If the tracking mode of the SFT entry is DVT, operation 811 retrieves the DVT index from the tracking information field of the SFT entry, and operation 812 reads the DVT entry pointed to by the DVT index. Subsequently, operation 813 sets the tracking vector bit position in the DVT entry that corresponds to the AID of the new agent, and operation 814 writes the updated precise tracking vector to the DVT entry. If the tracking mode of the SFT entry is not DVT, operation 826 adds the new agent to the non-precise tracking.
[0044]
[0053] If operation 806 determines that the SFT entry is not able to record an additional AID in its tracking information field, operation 816 determines whether a DVT entry is available. If a DVT entry is available, operation 818 sets the tracking mode of the SFT entry to DVT, and operation 820 requests an available DVT entry. Subsequently, operation 822 replaces one or more AIDs in the tracking information field of the SFT entry with the DVT index, for example by replacing the highest-numbered AID slot in the tracking information field with the DVT index. Operation 824 sets the tracking vector bit position in the DVT entry for each currently tracked AID. If operation 816 determines that a DVT entry is not available, operation 830 sets the tracking mode of the SFT entry to non-precise, and operation 832 updates the non-precise tracking for each currently tracked AID.
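The decision sequence of Figure 8 can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: `MAX_AIDS`, `SftEntry`, `Dvt`, and `add_agent` are hypothetical names, the DVT free list is modeled as a Python list, the new agent's bit is set together with the bits of the already tracked AIDs, and the non-precise representation is not modeled beyond the mode change.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_AIDS = 2  # hypothetical number of AID slots per SFT entry


@dataclass
class SftEntry:
    mode: str = "AID"                   # "AID", "DVT", or "IMPRECISE"
    aids: List[int] = field(default_factory=list)
    dvt_index: Optional[int] = None


class Dvt:
    """Table of precise tracking vectors plus a free list of entries."""

    def __init__(self, num_entries: int) -> None:
        self.vectors = [0] * num_entries
        self.free = list(range(num_entries))

    def allocate(self) -> Optional[int]:
        return self.free.pop(0) if self.free else None


def add_agent(entry: SftEntry, dvt: Dvt, aid: int) -> None:
    if entry.mode == "AID":                              # operation 804
        if len(entry.aids) < MAX_AIDS:                   # operation 806
            entry.aids.append(aid)                       # operation 808
            return
        index = dvt.allocate()                           # operations 816, 820
        if index is None:
            entry.mode = "IMPRECISE"                     # operation 830
            entry.aids = []                              # operation 832 not modeled
            return
        entry.mode = "DVT"                               # operation 818
        vector = 0
        for tracked in entry.aids + [aid]:               # operation 824
            vector |= 1 << tracked
        dvt.vectors[index] = vector
        # Operation 822: the highest-numbered AID slot now holds the index.
        entry.aids = entry.aids[:MAX_AIDS - 1]
        entry.dvt_index = index
    elif entry.mode == "DVT":                            # operation 810
        vector = dvt.vectors[entry.dvt_index]            # operations 811, 812
        vector |= 1 << aid                               # operation 813
        dvt.vectors[entry.dvt_index] = vector            # operation 814
    else:
        pass  # operation 826: non-precise tracking, not modeled here
```

With two AID slots, the third sharer is the one that triggers the transition from AID mode to DVT mode, or to non-precise mode when the free list is empty.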
[0045]
[0054] Figure 9 shows operation 900 when an agent communicates that it is removing a coherence granularity from its own cache. Specifically, operation 900 is used when the SFT needs to remove an agent from an existing SFT entry because the agent has relinquished its copy of the coherence granularity. Operation 904 determines whether the tracking mode of the SFT entry is AID. If the tracking mode of the SFT entry is AID, operation 906 determines whether the AID to be removed is the only AID remaining in the SFT entry. If it is the only remaining AID, operation 908 changes the state of the SFT entry to idle. Otherwise, operation 910 removes that AID from the SFT entry.
[0046]
[0055] If operation 904 determines that the tracking mode of the SFT entry is not AID, operation 912 determines whether the tracking mode of the SFT entry is DVT. If the tracking mode of the SFT entry is not DVT, operation 920 determines that the tracking mode of the SFT entry is non-precise. If the tracking mode of the SFT entry is DVT, operation 914 reads the DVT using the DVT index of the SFT entry and obtains the precise tracking vector from the DVT entry associated with the SFT entry. Operation 916 then clears the tracking vector bit position corresponding to the AID of the agent that is removing the coherence granularity.
[0047]
[0056] Next, operation 918 determines whether any bits of the precise tracking vector are still set. If no bits are set, operation 926 returns the DVT entry to the free list of available DVT entries, and operation 928 changes the state of the SFT entry to idle. If any bits are still set, operation 922 determines whether the removed agent is also recorded in an AID slot of the SFT entry and, if so, removes its AID from that AID slot. If an agent is removed from the AID slot, operation 924 refills the AID slot with the AID of another agent that is still being tracked, and operation 930 then writes the updated precise tracking vector to the DVT entry. If no agent is removed from the AID slot, operation 930 writes the updated precise tracking vector to the DVT entry directly.
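The removal flow of Figure 9 can be sketched in the same style. The names `SftEntry`, `Dvt`, `remove_agent`, and `NUM_AGENTS` are hypothetical, the "idle" state is modeled by clearing a `valid` flag, and the agent chosen to refill the AID slot (the lowest-numbered remaining sharer) is an assumption, since the text above does not say which tracked agent is chosen.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUM_AGENTS = 8  # hypothetical number of agents in the system


@dataclass
class SftEntry:
    valid: bool = True
    mode: str = "AID"                   # "AID", "DVT", or "IMPRECISE"
    aids: List[int] = field(default_factory=list)
    dvt_index: Optional[int] = None


class Dvt:
    """Table of precise tracking vectors plus a free list of entries."""

    def __init__(self, num_entries: int) -> None:
        self.vectors = [0] * num_entries
        self.free = list(range(num_entries))


def remove_agent(entry: SftEntry, dvt: Dvt, aid: int) -> None:
    if entry.mode == "AID":                              # operation 904
        if entry.aids == [aid]:                          # operation 906
            entry.valid = False                          # operation 908: idle
            entry.aids = []
        elif aid in entry.aids:
            entry.aids.remove(aid)                       # operation 910
    elif entry.mode == "DVT":                            # operation 912
        index = entry.dvt_index
        vector = dvt.vectors[index]                      # operation 914
        vector &= ~(1 << aid)                            # operation 916
        if vector == 0:                                  # operation 918
            dvt.vectors[index] = 0
            dvt.free.append(index)                       # operation 926
            entry.valid = False                          # operation 928: idle
            entry.aids = []
            entry.dvt_index = None
            return
        if aid in entry.aids:                            # operation 922
            entry.aids.remove(aid)
            # Operation 924: refill the slot with another tracked agent.
            others = [a for a in range(NUM_AGENTS)
                      if (vector >> a) & 1 and a not in entry.aids]
            if others:
                entry.aids.append(others[0])
        dvt.vectors[index] = vector                      # operation 930
    else:
        pass  # operation 920: non-precise mode, not modeled here
```

Returning the DVT entry to the free list as soon as the vector is empty is what lets a small table serve many SFT entries over time.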
[0048]
[0057] The cache coherence system disclosed herein significantly reduces the tracking overhead in the SFT while still providing some (implementation-defined) number of coherence granularities that can be tracked using precise tracking vectors. Specifically, the system exploits two properties of how coherence granularities are used: the amount of tracking information needed for each coherence granularity varies with the number of agents simultaneously sharing it, and most coherence granularities are not widely shared.
[0049]
[0058] Specifically, the cache coherence system disclosed herein uses a directly mapped, non-aggregated vector table (DVT) and controls for managing the DVT. A DVT entry is accessed sequentially, at a later point in time than its associated SFT entry. While this may add latency to certain types of memory access, the additional latency is generally acceptable when three or more agents share the coherence granularity and snooping (which already adds considerable latency to memory transactions) is required. Compared to other cache coherence systems, the system disclosed herein has the advantage of not affecting SFT associativity when some or all of the coherence granularities in an SFT set transition to a precise tracking vector. In addition, the cache coherence system disclosed herein better tolerates an AID mode sized for two AIDs, which results in a smaller tracking information field for each SFT entry.
[0050]
[0059] Furthermore, compared to other implementations, the cache coherence system disclosed herein reduces the total number of bits provisioned for precise tracking vectors across the SFT entries and uses those bits more efficiently, thereby reducing both area overhead and static power. Moreover, dynamic power is also reduced because the DVT is accessed only when necessary, and every SFT lookup (the most common access) accesses fewer bits. This is because the precise tracking vector is accessed only when necessary, and only the precise tracking vector of the target way is accessed (as opposed to those of all ways associated with the SFT lookup).
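The bit savings described above can be made concrete with a rough calculation. The following sketch compares an SFT in which every entry carries a full tracking vector against an SFT with a small AID field per entry plus a separate DVT. The function names, the sizing formula, and all numbers are hypothetical and are not taken from the disclosure; mode bits, tags, and protection bits are ignored, and the tracking information field is assumed to be wide enough for either the AID slots or a DVT index.

```python
def full_vector_bits(sft_entries: int, agents: int) -> int:
    """Tracking bits when every SFT entry holds one bit per agent."""
    return sft_entries * agents


def dvt_scheme_bits(sft_entries: int, agents: int,
                    aid_slots: int, dvt_entries: int) -> int:
    """Tracking bits with AID slots in the SFT and vectors in a DVT."""
    aid_bits = (agents - 1).bit_length()         # bits to name one agent
    index_bits = (dvt_entries - 1).bit_length()  # bits to name one DVT entry
    # Assumption: the field must fit the AID slots or a DVT index.
    field_bits = max(aid_slots * aid_bits, index_bits)
    return sft_entries * field_bits + dvt_entries * agents


# Hypothetical configuration: 16384 SFT entries, 64 agents,
# two AID slots per entry, and 1024 DVT entries.
baseline = full_vector_bits(16384, 64)
proposed = dvt_scheme_bits(16384, 64, aid_slots=2, dvt_entries=1024)
print(baseline, proposed)
```

For this made-up configuration the sketch gives 1,048,576 tracking bits for the full-vector layout and 262,144 bits for the AID-plus-DVT layout. The ratio depends entirely on the assumed sizes.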
[0051]
[0060] Similarly, compared to cache coherence systems in which every SFT entry provides only limited precise tracking plus non-precise tracking, the cache coherence system disclosed herein reduces excessive snooping, thereby reducing dynamic power consumption and generally improving performance. In other words, it reduces messages to and from agents that do not actually need to be snooped, reduces cache accesses within those agents, and minimizes agent interruptions.
[0052]
[0061] Figure 10 shows an example system 1100 that may be useful in implementing the cache coherence system 1010 disclosed herein. The example hardware and operating environment of Figure 10 for implementing the described technology includes a computing device such as a general-purpose computing device in the form of a computer 20, a mobile phone, a personal data assistant (PDA), a tablet, a smartwatch, a gaming remote, or another type of computing device. In the implementation of Figure 10, for example, the computer 20 includes a processing unit 21, a system memory 22, and a system bus 23 that operably connects various system components, including the system memory 22, to the processing unit 21. There may be only one processing unit 21 or there may be two or more processing units 21, so that the processor of the computer 20 includes a single central processing unit (CPU) or multiple processing units, commonly referred to as a parallel processing environment. The computer 20 may be a conventional computer, a distributed computer, or any other type of computer, and the implementation is not limited thereto.
[0053]
[0062] The system bus 23 can be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a switchable fabric, a point-to-point connection, and a local bus using a variety of bus architectures. The system memory 22, sometimes simply called memory, includes read-only memory (ROM) 24 and random access memory (RAM) 25. The basic input / output system (BIOS) 26, which includes basic routines that facilitate the transfer of information between elements within the computer 20, such as during startup, is stored in the ROM 24. The computer 20 further includes a hard disk drive 27 for reading and writing to a hard disk (not shown), a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD-ROM, DVD, or other optical medium.
[0054]
[0063] Computer 20 may be used to implement the cache coherence system disclosed herein. In one implementation, a frequency unwrapping module, which includes instructions for unwrapping frequencies based on a sampled reflected modulated signal, may be stored in the memory of computer 20, such as read-only memory (ROM) 24 and random access memory (RAM) 25.
[0055]
[0064] Furthermore, instructions stored in the memory of computer 20 may be used to generate transformation matrices using one or more operations disclosed in Figures 5 to 9. Similarly, instructions stored in the memory of computer 20 may be used to implement one or more operations in Figures 5 to 9. The memory of computer 20 may also include one or more instructions for implementing the cache coherence system disclosed herein.
[0056]
[0065] The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by hard disk drive interface 32, magnetic disk drive interface 33, and optical disk drive interface 34, respectively. The drives and their associated tangible computer-readable media provide non-volatile storage for computer-readable instructions, data structures, program modules, and other data of the computer 20. Those skilled in the art will understand that any type of tangible computer-readable media may be used in the example operating environment.
[0057]
[0066] Several program modules, including an operating system 35, one or more application programs 36, other program modules 37, and program data 38, may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24, or RAM 25. The user may generate reminders on the personal computer 20 through input devices such as a keyboard 40 and a pointing device 42. Other input devices (not shown) may include a microphone (e.g., for voice input), a camera (e.g., for a natural user interface (NUI)), a joystick, a gamepad, a satellite dish, a scanner, etc. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 coupled to the system bus 23, but may also be connected by other interfaces such as a parallel port, game port, or Universal Serial Bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 via an interface such as a video adapter 48. In addition to the monitor, the computer typically includes other peripheral output devices (not shown) such as speakers and a printer.
[0058]
[0067] Computer 20 may operate in a network environment using logical connections to one or more remote computers, such as remote computers 49. These logical connections are achieved by communication devices coupled to computer 20 or a part thereof, and the implementation is not limited to a specific type of communication device. Remote computers 49 could be another computer, server, router, network PC, client, peer device, or other common network node, typically including many or all of the elements described above with respect to computer 20. The logical connections shown in Figure 10 include local area networks (LANs) 51 and wide area networks (WANs) 52. Such networking environments are common in all types of networks, including office networks, enterprise-scale computer networks, intranets, and the internet.
[0059]
[0068] When used in a LAN networking environment, the computer 20 is connected to the local area network 51 through a network interface or adapter 53, which is a type of communication device. When used in a WAN networking environment, the computer 20 typically includes a modem 54, a network adapter, or any other type of communication device for establishing communication over the wide area network 52. The modem 54 may be internal or external and is connected to the system bus 23 via the serial port interface 46. In a network environment, the program engines shown with respect to the personal computer 20, or parts thereof, may be stored in a remote memory storage device. The network connections shown are examples, and it is understood that other means of, and communication devices for, establishing communication links between computers may be used.
[0060]
[0069] In one exemplary implementation, software or firmware instructions for the cache coherence system 1010 are stored in system memory 22 and / or storage device 29 or 31 and can be processed by processing unit 21. Cache coherence system operations and data can be stored in system memory 22 and / or storage device 29 or 31 as persistent data stores.
[0061]
[0070] In contrast to tangible computer-readable storage media, intangible computer-readable communication signals can embody computer-readable instructions, data structures, program modules, or other data present in modulated data signals such as carrier waves or other signal transport mechanisms. The term “modulated data signal” means a signal having one or more features that are set or modified to encode information into a signal. Intangible communication signals include, but are not limited to, wired media such as wired networks or direct wired connections, and wireless media such as acoustic, RF, infrared, and other wireless media.
[0062]
[0071] Some embodiments of a cache coherence system may include a product. The product may include a tangible storage medium for storing logic. Examples of storage mediums include one or more types of computer-readable storage mediums capable of storing electronic data, including volatile or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writable or rewritable memory, etc. Examples of logic include various software elements such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (APIs), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. In one embodiment, for example, the product may store executable computer program instructions that, when executed by a computer, cause the computer to perform methods and / or operations according to the embodiments described. An executable computer program instruction may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, or dynamic code. An executable computer program instruction may be implemented according to a predefined computer language, style, or syntax to instruct the computer to perform a specific function. Instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled, and / or interpreted programming language.
[0063]
[0072] The cache coherence systems disclosed herein may include a variety of tangible computer-readable storage media and intangible computer-readable communication signals. Tangible computer-readable storage may be embodied in any available media that can be accessed by the cache coherence systems disclosed herein, and may include both volatile and non-volatile storage media and both removable and non-removable storage media. Tangible computer-readable storage media exclude intangible and transient communication signals and include volatile and non-volatile, removable and non-removable storage media implemented in any method or technique for storing information such as computer-readable instructions, data structures, program modules, or other data. Tangible computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory, or other memory technologies, CD-ROM, digital versatile disk (DVD), or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other tangible media that can be used to store desired information and can be accessed by the cache coherence systems disclosed herein. In contrast to tangible computer-readable storage media, intangible computer-readable communication signals can embody computer-readable instructions, data structures, program modules, or other data present in modulated data signals such as carrier waves or other signal transport mechanisms. The term “modulated data signal” means a signal having one or more features that are set or modified to encode information into a signal. Intangible communication signals include, but are not limited to, signals traveling through wired media such as wired networks or direct wired connections, and signals traveling through wireless media such as acoustic, RF, infrared, and other wireless media.
[0064]
[0073] The described technology provides a method comprising receiving, from an agent, a request to access a coherence granularity and to assign an agent ID to one of a plurality of SFT entries in a snoop filter (SFT), performing a tag lookup function on a tag of the coherence granularity in the SFT to find a matching SFT entry, wherein the matching SFT entry tracks the tag of the coherence granularity, determining the number n of agents tracked by the matching SFT entry, and, in response to determining that the number n of agents tracked by the matching entry exceeds a threshold, storing a DVT index in the tracking information field of the matching SFT entry, wherein the DVT index is configured to select a DVT entry in a non-aggregated vector table (DVT), and the selected DVT entry is configured to hold a tracking vector for tracking agents that cache the coherence granularity of the matching SFT entry.
[0065]
[0074] One implementation includes one or more physically manufactured computer-readable storage media that encode computer-executable instructions for executing a computer process on a computer system, wherein the computer process comprises receiving, from an agent, a request to access a coherence granularity and to assign an agent ID to one of a plurality of SFT entries in a snoop filter (SFT), performing a tag lookup function on a tag of the coherence granularity in the SFT to find a matching SFT entry, wherein the matching SFT entry tracks the tag of the coherence granularity, determining the number n of agents tracked by the matching SFT entry, and, in response to determining that the number n of agents tracked by the matching entry exceeds a threshold, storing a DVT index in the tracking information field of the matching SFT entry, wherein the DVT index is configured to select a DVT entry in a non-aggregated vector table (DVT), and the selected DVT entry is configured to hold a tracking vector for tracking agents that cache the coherence granularity of the matching SFT entry.
[0066]
[0075] Another implementation provides a system comprising memory, one or more processor units, and a cache coherence system stored in the memory and executable by the one or more processor units, the cache coherence system encoding in the memory computer-executable instructions for executing a computer process on the one or more processor units, wherein the computer process comprises receiving, from an agent, a request to access a coherence granularity and to assign an agent ID to one of a plurality of SFT entries in a snoop filter (SFT), performing a tag lookup function on a tag of the coherence granularity in the SFT to find a matching SFT entry, wherein the matching SFT entry tracks the tag of the coherence granularity, determining the number n of agents tracked by the matching SFT entry, and, in response to determining that the number n of agents tracked by the matching entry exceeds a threshold, storing a DVT index in the tracking information field of the matching SFT entry, wherein the DVT index is configured to select a DVT entry in a non-aggregated vector table (DVT), and the selected DVT entry is configured to hold a tracking vector for tracking agents that cache the coherence granularity of the matching SFT entry.
[0067]
[0076] The implementations disclosed herein are implemented as logical steps in one or more computer systems. Logical operations may be implemented (1) as a sequence of processor-implemented steps executed in one or more computer systems, and (2) as interconnected machine or circuit modules within one or more computer systems. The implementation is a matter of choice, depending on the performance requirements of the computer system being used. Therefore, the logical operations constituting the implementations described herein may be referred to as operations, steps, objects, or modules. Furthermore, it should be understood that logical operations may be performed in any order unless otherwise explicitly required or unless a specific order is inherently required by the claim language. The foregoing specification, examples, and data, together with the accompanying appendices, provide a detailed description of the structure and use of exemplary implementations.
Claims
1. A method comprising: receiving, from an agent, a request to access a coherence granularity and to assign an agent ID to one of a plurality of SFT entries (152) in a snoop filter (SFT (150a) (150)); performing a tag lookup function on a tag of the coherence granularity in the SFT (150a) (150) to find a matching SFT entry (200) (152a), wherein the matching SFT entry (200) (152a) tracks the tag of the coherence granularity; determining a number n of agents (102) being tracked by the matching SFT entry (200) (152a); and in response to determining that the number n of agents (102) being tracked by the matching entry exceeds a threshold, storing a DVT index (212b) (212a) in a tracking information field (400) (270) (170) of the matching SFT entry (200) (152a), wherein the DVT index (212b) (212a) is configured to select a DVT (220) (190) entry in a non-aggregated vector table (DVT (304) (220) (190)), and the selected DVT (220) (190) entry is configured to hold a tracking vector for tracking the agents that cache the coherence granularity of the matching SFT entry (200) (152a).
2. The method according to claim 1, further comprising: in response to determining that the number n of agents being tracked by the matching SFT entry is below the threshold, storing the agent ID (AID) of the agent accessing the coherence granularity in the tracking information field of the matching SFT entry.
3. The method according to claim 1, further comprising enabling a valid bit in the tracking vector of the DVT entry that corresponds to the agent accessing the coherence granularity.
4. The method according to claim 1, further comprising changing the value of the tracking mode field of a matching SFT entry to DVT in response to storing the DVT index in the tracking information field of the matching SFT entry.
5. The method according to claim 1, further comprising: calculating an error correction code (ECC) for the DVT entry; comparing the calculated ECC with the value of the protection field of the DVT entry; and in response to determining that the calculated ECC does not match the value of the protection field of the DVT entry, changing the value of the tracking mode field of the matching SFT entry to non-precise.
6. The method according to claim 1, further comprising: in response to determining that the number n of agents being tracked by the matching entry exceeds the threshold, determining whether a DVT entry is available; in response to determining that a DVT entry is unavailable, setting the tracking mode of the SFT entry to non-precise; and in response to determining that a DVT entry is available, setting the tracking mode of the SFT entry to DVT.
7. The method according to claim 1, further comprising: in response to determining that the number n of agents being tracked by the matching entry exceeds the threshold, determining whether a DVT entry is available; and in response to determining that a DVT entry is available, setting the tracking mode of the SFT entry to DVT and replacing one or more AIDs in the tracking information field with the DVT index.
8. The method according to claim 1, further comprising: determining that an agent needs to be removed from the SFT entry; in response to determining that the agent needs to be removed from the SFT entry, determining whether the tracking mode of the SFT entry is DVT; and in response to determining that the tracking mode of the SFT entry is DVT, reading the precise tracking vector from the DVT entry associated with the SFT entry and clearing the tracking vector bit position of the read DVT entry that corresponds to the agent removed from the SFT entry.
9. The method according to claim 8, further comprising determining whether any precision tracking vector bits are still set in the DVT entry associated with the SFT entry, and if the precision tracking vector bits are no longer set in the DVT entry associated with the SFT entry, returning the DVT entry to the free list of available DVT entries.
10. One or more physically manufactured computer-readable storage media encoding computer-executable instructions for executing a computer process on a computer system (1100), the computer process comprising: receiving, from an agent, a request to access a coherence granularity and to assign an agent ID to one of a plurality of SFT entries (152) in a snoop filter (SFT (150a) (150)); performing a tag lookup function on a tag of the coherence granularity in the SFT (150a) (150) to find a matching SFT entry (200) (152a), wherein the matching SFT entry (200) (152a) tracks the tag of the coherence granularity; determining a number n of agents (102) being tracked by the matching SFT entry (200) (152a); and in response to determining that the number n of agents (102) being tracked by the matching entry exceeds a threshold, storing a DVT index (212b) (212a) in a tracking information field (400) (270) (170) of the matching SFT entry (200) (152a), wherein the DVT index (212b) (212a) is configured to select a DVT (220) (190) entry in a non-aggregated vector table (DVT (304) (220) (190)), and the selected DVT (220) (190) entry is configured to hold a tracking vector for tracking the agents that cache the coherence granularity of the matching SFT entry (200) (152a).
11. The one or more physically manufactured computer-readable storage media according to claim 10, wherein the computer process further comprises: in response to determining that the number n of agents being tracked by the matching SFT entry is below the threshold, storing the agent ID (AID) of the agent accessing the coherence granularity in the tracking information field of the matching SFT entry.
12. The one or more physically manufactured computer-readable storage media according to claim 10, wherein the computer process further comprises enabling a valid bit in the tracking vector of the DVT entry that corresponds to the agent accessing the coherence granularity.
13. The one or more physically manufactured computer-readable storage media according to claim 10, wherein the computer process further comprises changing the value of the tracking mode field of the matching SFT entry to DVT in response to storing the DVT index in the tracking information field of the matching SFT entry.
14. The one or more physically manufactured computer-readable storage media according to claim 10, wherein the computer process further comprises: calculating an error correction code (ECC) for the DVT entry; comparing the calculated ECC with the value of the protection field of the DVT entry; and in response to determining that the calculated ECC does not match the value of the protection field of the DVT entry, changing the value of the tracking mode field of the matching SFT entry to non-precise.
15. The one or more physically manufactured computer-readable storage media according to claim 10, wherein the computer process further comprises: in response to determining that the number n of agents being tracked by the matching entry exceeds the threshold, determining whether a DVT entry is available; in response to determining that a DVT entry is unavailable, setting the tracking mode of the SFT entry to non-precise; and in response to determining that a DVT entry is available, setting the tracking mode of the SFT entry to DVT.
16. The one or more physically manufactured computer-readable storage media according to claim 10, wherein the computer process further comprises: in response to determining that the number n of agents being tracked by the matching entry exceeds the threshold, determining whether a DVT entry is available; and in response to determining that a DVT entry is available, setting the tracking mode of the SFT entry to DVT and replacing one or more AIDs in the tracking information field with the DVT index.
17. A system comprising: memory; one or more processor units; and a cache coherence system (100) stored in the memory (22) and executable by the one or more processor units (21), the cache coherence system (100) encoding in the memory (22) computer-executable instructions for executing a computer process on the one or more processor units, the computer process comprising: receiving, from an agent, a request to access a coherence granularity and to assign an agent ID to one of a plurality of SFT entries (152) in a snoop filter (SFT (150a) (150)); performing a tag lookup function on a tag of the coherence granularity in the SFT (150a) (150) to find a matching SFT entry (200) (152a), wherein the matching SFT entry (200) (152a) tracks the tag of the coherence granularity; determining a number n of agents (102) being tracked by the matching SFT entry (200) (152a); and in response to determining that the number n of agents (102) being tracked by the matching entry exceeds a threshold, storing a DVT index (212b) (212a) in a tracking information field (400) (270) (170) of the matching SFT entry (200) (152a), wherein the DVT index (212b) (212a) is configured to select a DVT (220) (190) entry in a non-aggregated vector table (DVT (304) (220) (190)), and the selected DVT (220) (190) entry is configured to hold a tracking vector for tracking the agents that cache the coherence granularity of the matching SFT entry (200) (152a).
18. The system according to claim 17, wherein the computer process further comprises: in response to determining that the number n of agents being tracked by the matching SFT entry is below the threshold, storing the agent ID (AID) of the agent accessing the coherence granularity in the tracking information field of the matching SFT entry.
19. The system according to claim 17, wherein the computer process further comprises enabling a valid bit in the tracking vector of the DVT entry that corresponds to the agent accessing the coherence granularity.
20. The system according to claim 17, wherein the computer process further comprises changing the value of the tracking mode field of the matching SFT entry to DVT in response to storing the DVT index in the tracking information field of the matching SFT entry.