Method for adaptively and jointly using cache coherence directory entries, and computer program product
By adaptively managing the set of directory entries, the method addresses the underutilization of directory entries and the excessive snooping operations of existing technology, yielding a more efficient cache coherence protocol with lower hardware overhead and faster lookups.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: BEIJING KAPULA TECH CO LTD
- Filing Date: 2025-11-06
- Publication Date: 2026-07-02
AI Technical Summary
Existing technologies struggle to use directory entry capacity effectively and to reduce snooping operations across all processor cores, which creates performance bottlenecks and excessive hardware overhead for cache coherence protocols in many-core CPUs.
An adaptive federation method for cache coherence directory entries is adopted, which makes use of directory capacity and improves lookup efficiency by adaptively managing the set of directory entries. This includes dynamically allocating and releasing entries in response to processor core requests, and using homogeneous and heterogeneous sets of directory entries to optimize sharing records.
The method makes full use of the directory capacity, reduces snooping operations across all processor cores, improves directory entry lookup efficiency, and optimizes the performance and hardware overhead of the cache coherence protocol.
Description
An adaptive federation method and computer program product for cache coherence directory entries
Technical Field
[0001] This disclosure relates to the field of computers, and in particular to an adaptive federation method and computer program product for cache coherence directory entries.
Background Technology
[0002] Each processor core on a modern many-core CPU has a cache to improve data access speed and ensure ease of program writing. When running a parallel program, multiple processes or threads running on multiple processor cores may read and write data in the same shared memory region. To ensure the correctness of parallel program execution, cache coherence is required between different processor cores of the same CPU and between different CPUs within the same compute node. This ensures that data in the same cache block (cache line) remains consistent across the private caches of multiple processor cores within the same compute node.
[0003] Currently, there are two main types of cache coherence protocols: a snooping-based coherence protocol (snooping protocol) and a directory-structure-based coherence protocol (directory protocol); specifically:
[0004] The snooping protocol relies on a bus or a bus-like network connection (including mesh networks, etc.). Over this connection, a request issued by the private cache of one processor core is broadcast to the private caches of all other processor cores in the system. The bus can also order the access requests from all processor cores, which satisfies the memory access ordering requirements of the cache coherence model and the memory consistency model and resolves conflicting requests for the same data block.
[0005] The directory protocol uses a directory structure to manage the shared access to each cache line. In this protocol, memory access requests from the processor core's private cache are first sent to the directory structure that owns the corresponding cache line. This directory structure records the current sharing status of the cache line, and the controller determines the processor core's private cache or memory to respond to the request based on this current sharing status.
[0006] Both snooping and directory protocols have their advantages and disadvantages. Snooping protocols have lower hardware implementation overhead and lower power consumption, but the competitive access and ordered response of all processor cores to the bus can easily become a bottleneck for parallel performance. Directory protocols enable parallel consistency maintenance of cache blocks at different addresses, thus achieving efficient cache coherence; however, on the one hand, the need to record the shared access states of a large number of cache blocks incurs hardware and power overhead; on the other hand, the process of finding the corresponding record for a cache block in the directory also introduces significant latency.
[0007] Currently, a single CPU can contain hundreds of cores (AMD has released a commercial CPU with 192 cores), and a supercomputer node typically has at least two CPUs, so cache coherence must be supported across nearly 400 or more processor cores. Neither snooping protocols nor directory protocols alone can adequately support cache coherence for so many processor cores. Therefore, many-core CPUs mostly use a hybrid of the directory and snooping approaches, typified by the snoop filter. The snoop filter allows a single directory entry to manage the sharing of multiple contiguous cache blocks, and also allows the sharing information of a cache block to be recorded imprecisely: for example, recording only the processor core numbers of the owners of one or a few copies of a cache block, together with the number of valid copies. Without precise sharing information, a cache coherence operation may require snooping all processor cores. For example, when a directory entry can record the core numbers of at most two copy owners, if a cache block is being read-shared by three or more cores and one core is about to modify it, then almost all processor cores on the CPU must be snooped.
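The two-owner example above can be illustrated with a minimal Python sketch. The class name `ImpreciseEntry`, the constant `MAX_RECORDED_OWNERS`, and the method names are hypothetical and chosen only for illustration; the sketch also assumes that a core calling `add_sharer` does not already hold a copy.

```python
from dataclasses import dataclass, field
from typing import List

MAX_RECORDED_OWNERS = 2  # assumption: an entry can name at most two copy owners


@dataclass
class ImpreciseEntry:
    """Hypothetical snoop-filter entry: a few owner numbers plus a copy count."""
    owners: List[int] = field(default_factory=list)
    valid_copies: int = 0

    def add_sharer(self, core: int) -> None:
        # Assumes `core` does not already hold a copy of the block.
        self.valid_copies += 1
        if len(self.owners) < MAX_RECORDED_OWNERS:
            self.owners.append(core)

    def snoop_targets(self, writer: int, num_cores: int) -> List[int]:
        # Every copy is named in the entry: snoop only the named owners.
        if self.valid_copies <= len(self.owners):
            return [c for c in self.owners if c != writer]
        # Some copies are unnamed: every other core has to be snooped.
        return [c for c in range(num_cores) if c != writer]
```

With two sharers the writer snoops one core; once a third sharer arrives on a 192-core CPU, the same write has to snoop 191 cores.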
[0008] Whether a directory protocol or a hybrid approach is used, the number of directory entries typically needs to match the total number of cache blocks in the private caches of all processor cores. For example, when each directory entry records the sharing of one cache block, the number of directory entries should not be less than the total number of cache blocks in the private caches of all processor cores; otherwise, cache blocks in a private cache may be evicted because directory entries are insufficient. Having each directory entry record the sharing of multiple contiguous cache blocks aims to reduce the number of directory entries and the hardware overhead. However, when the memory access behavior of a parallel program is highly random, directory entries may fall far short of demand. Therefore, it is rare for a CPU to have a surplus of directory entries.
[0009] In current technology, most data accesses by threads in parallel programs are to private variables. Some published methods avoid redundant cache coherence operations by allowing accesses to private variables to bypass the cache coherence protocol. As a result, directory entries that were previously in short supply become available, and the more redundant cache coherence operations are eliminated, the more directory entries are freed.
[0010] Consequently, a contradiction may arise in the future: the directory has plenty of free entries, yet no single directory entry can precisely record the sharing status of a cache block. Shared cache blocks then frequently require snooping almost all processor cores, while the capacity of the directory goes underused.
Summary of the Invention
[0011] To address the shortcomings described in the background section and reduce snooping operations across all processor cores, this invention proposes an adaptive federation method for cache coherence directory entries, an adaptive federation system for cache coherence directory entries, and a computer program product.
[0012] At least one embodiment of this application provides an adaptive federation method for cache coherence directory entries, the method comprising:
[0013] In response to a request from the current processor to verify the read ownership of the current cache block, a set of first target directory entries corresponding to the address of the current cache block is obtained from a preset directory;
[0014] If there is a target entry in the first target directory entry set that meets the preset free space conditions, the number information of the current processor core is stored in the target entry;
[0015] If no entry exists in the first target directory entry set, or if no target entry exists in the first target directory entry set with free space for storing the current processor core number information, a new entry is requested from the preset directory. If the new entry is successfully requested, the current processor core number information is stored in the new entry, and the new entry is stored in the first target directory entry set.
[0016] At least one embodiment of this application also provides an adaptive federated system for cache-coherent directory entries, characterized in that it includes:
[0017] The first target directory entry set determination module is used to obtain the first target directory entry set corresponding to the address of the current cache block from a preset directory in response to a request from the current processor to verify the read ownership of the current cache block.
[0018] The first storage module is used to store the current processor core number information in the target entry when there is a target entry in the first target directory entry set that meets the preset free space conditions.
[0019] The second storage module is used to apply for a new entry from the preset directory when there is no entry in the first target directory entry set, or when there is no target entry in the first target directory entry set with free space for storing the current processor core number information. If the new entry is successfully obtained, the current processor core number information is stored in the new entry, and the new entry is stored in the first target directory entry set.
[0020] At least one embodiment of this application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the method described above.
[0021] At least one embodiment of this application also provides a computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the steps of the method described above.
[0022] At least one embodiment of this application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the method described above.
[0023] The adaptive federation method, system, and computer program product for cache coherence directory entries provided in this application have the following advantages over existing technologies: they make full use of the capacity of the directory and reduce snooping operations across all processor cores; moreover, since the entries in the target directory entry set are kept in order according to a preset storage strategy, lookups of entries in the target directory entry set are efficient.
[0024] In some optional embodiments, the method further includes:
[0025] In response to the current processor's request to verify the write ownership of the current cache block, a set of second target directory entries corresponding to the address of the current cache block is obtained from the preset directory;
[0026] The corresponding processor cores are determined based on the processor core number information stored in each entry of the second target directory entry set, and a copy-invalidation instruction is issued so that each of those processor cores invalidates its copy of the current cache block;
[0027] Issue an entry update instruction so that the second target directory entry set contains only one initial directory entry, and store the current processor core number information in the initial directory entry.
[0028] In some optional embodiments, the method further includes:
[0029] In response to the current processor core's request to swap out the current cache block, the target directory entry that corresponds to the address of the current cache block and contains the current processor core number information is obtained from the preset directory.
[0030] If the target directory entry exists, the current processor core number information is deleted from the target directory entry.
[0031] In some optional embodiments, after deleting the current processor core number information from the target directory entry, the method further includes:
[0032] From the target directory entry set that contains the target directory entry, entries that meet a preset condition are deleted; the preset condition is that no processor core number information is stored in the entry.
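The swap-out handling described above can be sketched in Python as follows. This is an illustrative model only: the directory is assumed to be a plain `dict` mapping a cache block address to a list of entries, each entry is modeled as a list of core numbers, and the function name `handle_swap_out` is hypothetical.

```python
from typing import Dict, List

EntrySet = List[List[int]]  # assumption: an entry is modeled as a list of core numbers


def handle_swap_out(directory: Dict[int, EntrySet], addr: int, core: int) -> bool:
    """Remove `core` from the entry set of `addr`; return True if it was recorded."""
    entries = directory.get(addr)
    if entries is None:
        return False
    found = False
    for entry in entries:
        if core in entry:
            entry.remove(core)  # delete the core number from the target entry
            found = True
            break
    # Delete entries that meet the preset condition: no core number is stored.
    directory[addr] = [entry for entry in entries if entry]
    return found
```

Freeing empty entries right away is what returns capacity to the directory, so that other cache blocks can later grow their own entry sets.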
[0033] In some optional embodiments, the method further includes:
[0034] If the application for the new entry fails, an entry update instruction is issued so that the first target directory entry set contains only one initial directory entry, and the number information of the current processor core is saved in the initial directory entry in a preset basic content format. After that, the first target directory entry set becomes a non-precise record sharing state; the directory entry set in the non-precise record sharing state contains only a unique directory entry.
[0035] In some optional embodiments, the target directory entry set includes:
[0036] Homogeneous directory entry sets and heterogeneous directory entry sets; where:
[0037] For entries located in the same set of homogeneous directory entries, the content format of each entry is the same;
[0038] For entries located in the same heterogeneous directory entry set, the content format of the entry may include one or more formats.
[0039] In some optional embodiments, the content format of an entry includes bitmap enumeration or number enumeration; wherein:
[0040] The bitmap enumeration method records, in bitmap form, the ownership of cache block copies by several processor cores with consecutive processor core numbers;
[0041] The number enumeration method records the ownership of cache block copies by several processor cores using processor core number values.
[0042] In some optional embodiments, each entry in the first target directory entry set and the second target directory entry set is arranged in an orderly manner according to a preset storage strategy.
[0043] In some optional embodiments, the preset storage strategy includes:
[0044] For any two adjacent entries in the target directory entry set, a first entry and a second entry, the value of the processor core number information stored in the first entry is less than that stored in the second entry (ascending order), or greater than it (descending order).
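As a hedged illustration of why an ordered entry set speeds up lookups, the Python sketch below assumes ascending order, with every core number in one entry smaller than every core number in the next. The function name `find_entry` and the list-of-lists model are assumptions, not part of the disclosure.

```python
from bisect import bisect_right
from typing import List, Optional


def find_entry(entries: List[List[int]], core: int) -> Optional[int]:
    """Return the index of the entry that records `core`, or None.

    Assumes each entry is a non-empty ascending list and that entries are
    ordered so that all numbers in entry i are below those in entry i + 1.
    """
    # Building `firsts` is linear here; hardware could compare these in parallel.
    firsts = [entry[0] for entry in entries]
    i = bisect_right(firsts, core) - 1  # last entry whose first number <= core
    if i >= 0 and core in entries[i]:
        return i
    return None
```

Because of the ordering, only one candidate entry has to be inspected for a given core number, instead of every entry in the set.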
[0045] In some alternative embodiments, the system further includes:
[0046] The second target directory entry set determination module is used to obtain the second target directory entry set corresponding to the address of the current cache block from the preset directory in response to the current processor's request to verify the write ownership of the current cache block;
[0047] The invalidation module is used to determine the corresponding processor cores based on the processor core number information stored in each entry of the second target directory entry set, and to issue a copy-invalidation instruction so that each of those processor cores invalidates its copy of the current cache block;
[0048] The update module is used to issue an entry update instruction so that the second target directory entry set contains only one initial directory entry, and the number information of the current processor core is stored in the initial directory entry.
Attached Figure Description
[0049] One or more embodiments are illustrated by way of example with reference numerals in the accompanying drawings. These illustrations do not constitute a limitation on the embodiments. Elements with the same reference numerals in the drawings are denoted as similar elements. Unless otherwise stated, the figures in the drawings are not to be limited by scale.
[0050] Figure 1 is a flowchart of an adaptive federation method for cache coherence directory entries provided in an embodiment of this disclosure;
[0051] Figure 2 is a flowchart of another adaptive federation method for cache coherence directory entries provided in an embodiment of this disclosure;
[0052] Figure 3 is a flowchart of another adaptive federation method for cache coherence directory entries provided in an embodiment of this disclosure.
Detailed Implementation
[0053] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those skilled in the art will understand that many technical details are presented in the embodiments to help the reader better understand the present invention. However, the technical solutions claimed in the present invention can be implemented even without these technical details and with various changes and modifications based on the following embodiments.
[0054] To address the shortcomings of existing technologies, the present invention aims to provide an adaptive federation method, system, and computer program product for cache coherence directory entries. Compared to existing technologies, this method makes full use of the directory capacity and reduces snooping operations across all processor cores. Furthermore, since the entries in the target directory entry set are kept in order according to a preset storage strategy, lookups of entries in the target directory entry set are efficient.
[0055] Example 1:
[0056] The embodiments of the present invention relate to an adaptive federation method for cache coherence directory entries.
[0057] The following describes the implementation details of the adaptive federation method for cache coherence directory entries in this embodiment, focusing on the reading of cache blocks. The following content is provided for ease of understanding and is not essential for implementing this solution.
[0058] The adaptive federation method for cache-coherent directory entries in this embodiment can be applied to electronic devices with communication, computing, and data storage capabilities. As shown in Figure 1, the adaptive federation method for cache-coherent directory entries provided in this embodiment includes the following steps:
[0059] Step 110: In response to the current processor's request to verify the read ownership of the current cache block, obtain the first target directory entry set corresponding to the address of the current cache block from the preset directory.
[0060] Specifically, in response to the current processor core's request for read ownership of the current cache block, the preset directory looks up the first target directory entry set corresponding to the address of the current cache block. The target directory entry set stores entries, and the entries record the core numbers of all cores that hold a copy of the current cache block.
[0061] Step 120: If there is a target entry in the first target directory entry set that meets the preset free space conditions, save the number information of the current processor core in the target entry.
[0062] The preset free space conditions include: the free space available in the entry is not less than the space required to store the current processor core number information.
[0063] Specifically, when an entry in the first target directory entry set has free space to record the number information of the current processor core, the number information of the current processor core is recorded in that entry.
[0064] Step 130: If there is no entry in the first target directory entry set, or if there is no target entry in the first target directory entry set with free space for storing the current processor core number information, apply for a new entry from the preset directory. If the new entry is successfully obtained, store the current processor core number information in the new entry and store the new entry in the first target directory entry set.
[0065] Specifically, when the first target directory entry set has no entries, or when all entries in it have no free space to record the current processor core number information, a new entry is requested from the directory. After successfully requesting a new entry, the new entry is added to the first target directory entry set, and the current processor core number information is recorded in the new entry.
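Steps 110 to 130 can be sketched in Python as below. This is a minimal model under stated assumptions, not the disclosed hardware: the class `Directory`, its `capacity` and `entry_capacity` parameters, the string return values, and the check for an already recorded core are all hypothetical additions for illustration.

```python
from typing import Dict, List, Optional


class Directory:
    """Toy directory: each block address maps to a growable set of entries."""

    def __init__(self, capacity: int, entry_capacity: int) -> None:
        self.free = capacity                  # assumed pool of unallocated entries
        self.entry_capacity = entry_capacity  # assumed core numbers per entry
        self.sets: Dict[int, List[List[int]]] = {}

    def _allocate(self) -> Optional[List[int]]:
        if self.free == 0:
            return None
        self.free -= 1
        return []

    def handle_read(self, addr: int, core: int) -> str:
        # Step 110: obtain the entry set for this cache block address.
        entries = self.sets.setdefault(addr, [])
        if any(core in entry for entry in entries):
            return "already-recorded"  # added guard, not part of the source steps
        # Step 120: reuse an entry that still has free space.
        for entry in entries:
            if len(entry) < self.entry_capacity:
                entry.append(core)
                return "stored"
        # Step 130: no entry, or all entries full, so request a new entry.
        new_entry = self._allocate()
        if new_entry is None:
            return "allocation-failed"
        new_entry.append(core)
        entries.append(new_entry)
        return "new-entry"
```

In this model, a block that is read by many cores grows its entry set one entry at a time, and only a failed allocation forces the directory to give up precise tracking.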
[0066] Example 2:
[0067] Based on the above embodiments, the following describes the implementation details of the adaptive federation method for cache coherence directory entries in this embodiment from the perspective of cache block writing. The following content is only for ease of understanding and is not essential for implementing this solution.
[0068] The adaptive federation method for cache-coherent directory entries in this embodiment can be applied to electronic devices with communication, computing, and data storage capabilities. As shown in Figure 2, the adaptive federation method for cache-coherent directory entries provided in this embodiment includes the following steps:
[0069] Step 210: In response to the current processor's request to verify the write ownership of the target cache block, obtain the set of second target directory entries corresponding to the address of the target cache block from the preset directory.
[0070] Specifically, in response to the current processor's request for write ownership of the current cache block, the preset directory queries the set of second target directory entries corresponding to the address of the current cache block.
[0071] Step 220: Determine the corresponding processor cores based on the processor core number information stored in each entry of the second target directory entry set, and issue a copy-invalidation instruction so that each of those processor cores invalidates its copy of the target cache block.
[0072] Specifically, based on the second target directory entry set, an invalidation of the valid copy of the current cache block is sent to each relevant processor core.
[0073] Step 230: Issue an entry update instruction so that the second target directory entry set retains only one initial directory entry, with the redundant directory entries removed, and save the current processor core number information in the initial directory entry.
[0074] Specifically, after invalidating the valid copy of the current cache block, the second target directory entry set retains only one initial directory entry, and the current processor core number information is recorded in the initial directory entry.
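Steps 210 to 230 can be sketched in Python as follows. The function name `handle_write`, the `send_invalidate` callback, and the choice to skip the writer itself when sending invalidations are assumptions made for illustration; the entry set is again modeled as a list of lists of core numbers.

```python
from typing import Callable, List


def handle_write(entry_set: List[List[int]], writer: int,
                 send_invalidate: Callable[[int], None]) -> int:
    """Invalidate all recorded sharers, then collapse the set to one entry.

    Returns the number of entries released back to the directory.
    """
    # Step 220: every core named in any entry of the set must be invalidated.
    # Assumption: the writer keeps its own copy, so it is not invalidated.
    targets = sorted({c for entry in entry_set for c in entry if c != writer})
    for core in targets:
        send_invalidate(core)
    # Step 230: keep a single initial entry that records only the writer.
    released = max(len(entry_set) - 1, 0)
    entry_set[:] = [[writer]]
    return released
```

Because the set records sharers precisely, only the listed cores receive an invalidation, rather than every core on the CPU.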
[0075] Example 3:
[0076] Based on the above embodiments, the present invention relates to an adaptive federation method for cache coherence directory entries.
[0077] The following describes the implementation details of the adaptive federation method for cache coherence directory entries in this embodiment. The following content is only for ease of understanding and is not essential for implementing this solution.
[0078] The adaptive federation method for cache-coherent directory entries in this embodiment can be applied to electronic devices with communication, computing, and data storage capabilities. As shown in Figure 3, the adaptive federation method for cache-coherent directory entries provided in this embodiment includes the following steps:
[0079] Step 310: In response to the current processor's request to verify the read ownership of the target cache block, obtain the first target directory entry set corresponding to the address of the target cache block from the preset directory;
[0080] Step 320: If there is a target entry in the first target directory entry set that meets the preset free space conditions, save the number information of the current processor core in the target entry;
[0081] Step 330: If there is no entry in the first target directory entry set, or if there is no target entry in the first target directory entry set with free space for storing the current processor core number information, apply for a new entry from the preset directory. If the new entry is successfully obtained, store the current processor core number information in the new entry and store the new entry in the first target directory entry set.
[0082] Step 340: In response to the current processor's request to verify the write ownership of the target cache block, obtain the second target directory entry set corresponding to the address of the target cache block from the preset directory;
[0083] Step 350: Determine the corresponding processor cores based on the processor core number information stored in each entry of the second target directory entry set, and issue a copy-invalidation instruction so that each of those processor cores invalidates its copy of the target cache block;
[0084] Step 360: Issue an entry update instruction so that the second target directory entry set retains only one initial directory entry, with the redundant directory entries removed, and save the current processor core number information in the initial directory entry.
[0085] As an example, the method disclosed in this embodiment may specifically include the following processing flow:
[0086] In response to the current processor core's request for read ownership of the current cache block, the preset directory looks up the first target directory entry set corresponding to the address of the current cache block; wherein, the entries record the core numbers of all cores that hold a copy of the current cache block;
[0087] When an entry in the first target directory entry set has free space to record the number information of the current processor core, the number information of the current processor core is recorded in that entry;
[0088] When the first target directory entry set has no entries, or when all entries in it have no free space to record the current processor core number information, a new entry is requested from the directory. After successfully requesting a new entry, the new entry is added to the first target directory entry set, and the current processor core number information is recorded in the new entry.
[0089] In response to the current processor core's request for write ownership of the current cache block, the preset directory queries the second target directory entry set corresponding to the address of the current cache block, initiates the operation of invalidating the valid copies of the current cache block to each relevant processor core according to the second target directory entry set, and then keeps only one initial directory entry in the second target directory entry set, and records the number information of the current processor core in the initial directory entry.
[0090] Example 4:
[0091] Based on the above embodiments, this embodiment further explains and illustrates the adaptive federation method for cache coherence directory entries provided in the above embodiments.
[0092] In related technologies, regardless of how many valid copies of the same cache block exist in the private caches of various processor cores, the address of that cache block can only correspond to at most one entry in the directory (in a two-level directory protocol, an entry for the same cache block in the first-level directory and an entry in the second-level directory of each core group are actually the same entry). One of the most important technical features of this invention is that the address of the same cache block corresponds to a set of entries in the directory. The number of entries in this set changes as the parallel program runs, and all entries in the set collectively record the sharing of the same cache block among all processor cores; that is, each entry can record partial sharing.
[0093] In step 310: In response to the current processor's request to verify the read ownership of the target cache block, a first target directory entry set corresponding to the address of the target cache block is obtained from a preset directory.
[0094] In some embodiments, the target directory entry set includes:
[0095] Homogeneous directory entry sets and heterogeneous directory entry sets; where:
[0096] For entries located in the same set of homogeneous directory entries, the content format of each entry is the same;
[0097] For entries located in the same heterogeneous directory entry set, the content format of the entry may include one or more formats.
[0098] In some optional embodiments, the content format of an entry includes bitmap enumeration or number enumeration, and entries can be converted between the two; wherein:
[0099] The bitmap enumeration method records, in bitmap form, the ownership of cache block copies by several processor cores with consecutive processor core numbers;
[0100] The number enumeration method records the ownership of cache block copies by several processor cores using processor core number values.
[0101] Optionally, all entries in the same directory entry set can have the same content format; such a set is called a homogeneous directory entry set. Alternatively, the entries in the same directory entry set can use several different content formats; such a set is called a heterogeneous directory entry set, meaning that each directory entry can choose one of several formats to store the processor core number information for a shared cache block. When the number of processor cores in a compute node reaches the hundreds, a single directory entry cannot precisely record the sharing of any cache block across all processor cores, so a simplified format is often used.
[0102] For example, a directory entry has 64 bits, of which 34 bits are the address tag of the cache block, 10 bits record the number of currently valid copies, and two further 10-bit fields record the processor core numbers of the owners of two valid copies. In this application, this simplified format is referred to as the preset basic content format.
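The 64-bit basic content format can be made concrete with a small Python bit-packing sketch. The field widths (34 + 10 + 10 + 10 bits) come from the text; the field order, with the address tag in the most significant bits, and the function names `pack_basic` and `unpack_basic` are assumptions.

```python
from typing import Tuple

TAG_BITS, COUNT_BITS, CORE_BITS = 34, 10, 10  # widths from the text, 64 bits total
CORE_MASK = (1 << CORE_BITS) - 1
COUNT_MASK = (1 << COUNT_BITS) - 1


def pack_basic(tag: int, copies: int, owner0: int, owner1: int) -> int:
    """Pack one basic-format entry; the field order is an assumed layout."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in 34 bits")
    if not 0 <= copies <= COUNT_MASK:
        raise ValueError("copy count does not fit in 10 bits")
    if not (0 <= owner0 <= CORE_MASK and 0 <= owner1 <= CORE_MASK):
        raise ValueError("core number does not fit in 10 bits")
    return (tag << 30) | (copies << 20) | (owner0 << 10) | owner1


def unpack_basic(word: int) -> Tuple[int, int, int, int]:
    """Return (tag, copies, owner0, owner1) from a packed 64-bit entry."""
    return (word >> 30,
            (word >> 20) & COUNT_MASK,
            (word >> 10) & CORE_MASK,
            word & CORE_MASK)
```

A 10-bit core number covers up to 1024 cores, which is enough for the roughly 400-core nodes mentioned in the background section.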
[0103] The following section provides further explanation of homogeneous and heterogeneous directory entry sets:
[0104] 1) A set of homogeneous directory entries based on a basic content format. Each entry in the set uses the basic content format, so all entries have the same address tag, and each entry can record the processor core number of two valid copy owners; therefore, when there are N entries in the set, the maximum number of valid copy owners that can be accurately recorded is 2*N.
[0105] 2) Heterogeneous directory entry sets based on linked list structures. All entries in the set are organized into a linked list, or into a linked or indexed structure similar to those used for file storage. Taking a linked list as an example, only the entry at the head of the list records the address tag, and each entry reserves several bits to identify the next entry. When the linked list is doubly linked (which helps to sort and deduplicate the entries in the set), each entry also reserves several bits to identify the previous entry. Generally speaking, directories also use a set-associative mapping strategy, similar to that of a cache, to speed up lookups by cache block address, and the number of directory entries in the same set is usually not large (e.g., no more than 64). Since all entries in a directory entry set come from the same set, at most 7 bits are needed to identify the previous or next entry, while the remaining 50 bits of a non-head entry can record the processor core numbers of 5 valid copy owners (called the number enumeration method). Furthermore, a non-head entry can also use a bitmap to record which of several consecutively numbered processor cores own a copy of the cache block (called the bitmap enumeration method). In this case, 10 of the remaining 50 bits record the core number (or core group number) of the first processor core covered by the bitmap, and the other 40 bits record whether each of the corresponding 40 processor cores owns a copy of the cache block. In a heterogeneous directory entry set, number enumeration entries and bitmap enumeration entries can coexist. Number enumeration suits cases where the core numbers of the processor cores sharing the same cache block are sparse, while bitmap enumeration suits cases where those core numbers are dense. As the sharing of the same cache block among processor cores changes during parallel program execution, entries can adaptively switch between the number enumeration and bitmap enumeration methods so that the sharing status of the cache block is recorded accurately using as few entries as possible.
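As an illustration only, the two content formats of a 50-bit non-head entry payload described above can be sketched in Python as follows. The function names, the use of 10 bits per core number in the number enumeration format, and the reserved value 1023 as an empty-slot marker are assumptions made for this sketch and are not specified in this disclosure.

```python
CORE_BITS = 10                       # bits per core number (assumed: 5 x 10 = 50)
PAYLOAD_BITS = 50                    # payload bits of a non-head entry, per the text
SLOTS = PAYLOAD_BITS // CORE_BITS    # 5 core numbers per number enumeration entry
WINDOW = PAYLOAD_BITS - CORE_BITS    # 40 cores covered by one bitmap entry
EMPTY = (1 << CORE_BITS) - 1         # hypothetical marker for an unused slot


def pack_numbers(cores):
    """Number enumeration: store up to 5 core numbers, 10 bits each."""
    cores = sorted(set(cores))
    if len(cores) > SLOTS or any(not 0 <= c < EMPTY for c in cores):
        raise ValueError("cores do not fit a number enumeration entry")
    payload = 0
    for c in cores + [EMPTY] * (SLOTS - len(cores)):
        payload = (payload << CORE_BITS) | c
    return payload


def unpack_numbers(payload):
    mask = (1 << CORE_BITS) - 1
    slots = [(payload >> (CORE_BITS * i)) & mask for i in reversed(range(SLOTS))]
    return [c for c in slots if c != EMPTY]


def pack_bitmap(cores):
    """Bitmap enumeration: 10-bit starting core number plus a 40-bit bitmap."""
    cores = set(cores)
    if not cores:
        raise ValueError("a bitmap entry needs at least one core")
    base = min(cores)
    if base >= (1 << CORE_BITS) or max(cores) - base >= WINDOW:
        raise ValueError("cores do not fit a bitmap enumeration entry")
    bitmap = 0
    for c in cores:
        bitmap |= 1 << (c - base)
    return (base << WINDOW) | bitmap


def unpack_bitmap(payload):
    base = payload >> WINDOW
    bitmap = payload & ((1 << WINDOW) - 1)
    return [base + i for i in range(WINDOW) if (bitmap >> i) & 1]


def choose_format(cores):
    """Pick a format for one entry; None means more than one entry is needed."""
    cores = set(cores)
    if len(cores) <= SLOTS:
        return "number"              # few (possibly sparse) sharers
    if max(cores) - min(cores) < WINDOW:
        return "bitmap"              # many sharers with dense core numbers
    return None
```

Under these assumptions, six sharers numbered 8 to 13 fit in one bitmap enumeration entry, whereas six sharers spread across several hundred core numbers require at least two entries.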
[0106] In step 320: if there is a target entry in the first target directory entry set that meets the preset free space conditions, the number information of the current processor core is saved in the target entry.
[0107] Optionally, whether in a homogeneous or a heterogeneous directory entry set, a single directory entry can typically record the sharing of the same cache block by multiple processor cores simultaneously. Before the current processor core's number information is recorded in the target directory entry set, it is necessary to check whether any entry in the set has free space to record it (e.g., using number enumeration or bitmap enumeration), so as to improve the utilization efficiency of directory entries.
[0108] In step 330: if there is no entry in the first target directory entry set, or if there is no target entry in the first target directory entry set with free space for storing the current processor core number information, a new entry is requested from the preset directory. If the new entry is successfully requested, the current processor core number information is stored in the new entry, and the new entry is stored in the first target directory entry set.
[0109] In some embodiments, the method further includes:
[0110] If the request for the new entry fails, an entry update instruction is issued so that the first target directory entry set retains only one initial directory entry and its redundant directory entries are removed. The number information of the current processor core is then stored in the initial directory entry in a preset basic content format. After this, the first target directory entry set enters a non-precise sharing-record state, in which the sharing status is no longer recorded exactly. A directory entry set in this state contains only a single directory entry.
[0111] Optionally, after a new entry is acquired, it is added to the target directory entry set and the current processor core number is recorded in it. The new entry request usually succeeds when the directory has free entries (when the directory uses a set-associative mapping strategy, this means free entries within the corresponding set). When the directory has no free entries, the request must still succeed if the target directory entry set has no entries (i.e., the cache block data is being read into a cache for the first time). If the target directory entry set already has entries, the request may or may not succeed; the outcome mainly depends on the relevant priority strategy (similar to a cache replacement algorithm). When the request fails, the target directory entry set can no longer accurately record the sharing of the cache block across all processor cores. In this case, the basic content format can be used for recording, and often only one entry needs to be retained in the target directory entry set (the remaining entries are released). When a request for a new entry causes an active entry to be preempted, the directory entry set to which that active entry belongs can likewise no longer accurately record the sharing of its cache block across all processor cores.
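The read path of steps 310 to 330, together with the fallback used when a new entry cannot be obtained, can be sketched as follows. This is a minimal model under stated assumptions, not the disclosed implementation: the `Directory` and `EntrySet` classes are hypothetical, every entry uses the basic content format (two core numbers), the priority strategy is reduced to "succeed only if a free entry exists", preemption of entries belonging to other sets is not modelled, and the handling of later reads while a set is in the non-precise state is left unchanged because the text does not specify it.

```python
from dataclasses import dataclass, field

ENTRY_CAPACITY = 2  # the basic content format records two owners, per the text


@dataclass
class EntrySet:
    entries: list = field(default_factory=list)  # each entry: list of core numbers
    precise: bool = True                         # False after the fallback below


class Directory:
    """Hypothetical directory with a fixed pool of free entries."""

    def __init__(self, free_entries):
        self.free = free_entries
        self.sets = {}  # cache block address -> EntrySet

    def on_read(self, addr, core):
        s = self.sets.setdefault(addr, EntrySet())      # step 310: find the set
        if not s.precise or any(core in e for e in s.entries):
            return s                                    # nothing new to record
        for e in s.entries:                             # step 320: reuse free space
            if len(e) < ENTRY_CAPACITY:
                e.append(core)
                return s
        if self.free > 0:                               # step 330: request a new entry
            self.free -= 1
            s.entries.append([core])
            return s
        if not s.entries:
            # The text says this request must succeed; the preemption of
            # another set that would make it succeed is not modelled here.
            raise RuntimeError("preemption of another set is not modelled")
        # Request failed: keep one entry, release the rest, record imprecisely.
        self.free += len(s.entries) - 1
        s.entries = [[core]]
        s.precise = False
        return s
```

With two free entries, four readers of the same block fill two entries exactly; a fifth reader triggers the fallback, which returns one entry to the free pool and leaves the set in the non-precise state.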
[0112] In step 340: In response to the current processor's request to verify the write ownership of the target cache block, a second target directory entry set corresponding to the address of the target cache block is obtained from the preset directory.
[0113] In step 350: the corresponding processor core is determined according to the processor core number information stored in each entry of the second target directory entry set, and an invalid copy instruction is issued so that each processor core performs the operation of invalidating the copy of the target cache block.
[0114] In step 360: an entry update instruction is issued to ensure that the second target directory entry set contains only one initial directory entry and removes redundant directory entries, and the number information of the current processor core is stored in the initial directory entry.
[0115] As an example, the specific process of the method disclosed in this embodiment includes:
[0116] In response to the current processor core's request for read ownership of the current cache block, the preset directory queries the first target directory entry set corresponding to the address of the current cache block; wherein the entries are used to record the core numbers of all cores holding a valid copy of the current cache block;
[0117] When there is a table entry in the first target directory table entry set that has free space to record the number information of the current processor core, the number information of the current processor core is recorded in that table entry;
[0118] When the first target directory entry set has no entries, or when all entries in it have no free space to record the current processor core number information, a new entry is requested from the directory. After successfully requesting a new entry, the new entry is added to the first target directory entry set, and the current processor core number information is recorded in the new entry.
[0119] In response to the current processor core's request for write ownership of the current cache block, the preset directory queries the second target directory entry set corresponding to the address of the current cache block, initiates the operation of invalidating the valid copies of the current cache block to each relevant processor core according to the second target directory entry set, and then keeps only one initial directory entry in the second target directory entry set, and records the number information of the current processor core in the initial directory entry.
[0120] When the second target directory entry set accurately records all relevant processor cores sharing the current cache block, invalidation operations are initiated on each of these processor cores to invalidate their valid copies of the current cache block (as in the classic implementation of the cache coherence protocol, invalidation also allows the current processor core to obtain the latest copy of the cache block). If the second target directory entry set does not accurately record all relevant processor cores sharing the current cache block, invalidation operations on the valid copies of the current cache block need to be initiated on almost all processor cores. If the second target directory entry set is empty, no invalidation operations are needed, and a new directory entry needs to be allocated. The write request ensures that, among all processor cores, only the current processor core has a valid copy of the current cache block; therefore, the second target directory entry set ultimately needs to retain only one entry.
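The three cases of the write path described above (precise record, non-precise record, empty set) can be sketched as a single function. The function name `on_write`, the representation of entries as lists of core numbers, and the exclusion of the writing core from the invalidation targets are assumptions of this sketch; in the non-precise case the sketch simply targets every other core.

```python
def on_write(entries, precise, writer, all_cores):
    """Return (cores to invalidate, new entry list, precise flag) for a write.

    entries   : list of entries, each a list of core numbers (hypothetical model)
    precise   : whether the set records the sharers exactly
    writer    : core number of the core requesting write ownership
    all_cores : iterable of every core number in the system
    """
    if not entries:
        targets = set()                                   # no copies to invalidate
    elif precise:
        targets = {c for e in entries for c in e} - {writer}
    else:
        targets = set(all_cores) - {writer}               # broadcast to other cores
    # Only the writer holds a valid copy afterwards, so one entry suffices and
    # the record is exact again (an inference from the text, not a quoted rule).
    return targets, [[writer]], True
```

For example, with sharers 1, 2 and 3 recorded exactly and core 2 writing, only cores 1 and 3 receive invalidations, and the set collapses to a single entry holding core 2.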
[0121] In some embodiments, the method further includes:
[0122] In response to the current processor core's request to swap out the current cache block, the target directory entry that corresponds to the address of the current cache block and stores the current processor core's number information is looked up in the preset directory.
[0123] If the target directory entry exists, delete the current processor core number information from the target directory entry.
[0124] In some embodiments, after deleting the current processor core number information from the target directory entry, the method further includes:
[0125] Delete entries that meet preset conditions from the target directory entry set that contains the target directory entry.
[0126] In some embodiments, the preset conditions include:
[0127] No processor core number information is stored in the entry.
[0128] Optionally, regardless of whether the target directory entry set accurately records the sharing status of the current cache block, the relevant content in the target directory entry set needs to be updated before swapping out the current cache block. Specifically, when a target directory entry exists (corresponding to the case of accurate recording), the current processor core number information needs to be removed from the content of the target directory entry. Furthermore, the target directory entry set can be optimized, for example, by ensuring that at most one entry in the set has free space, removing entries without substantial content, or converting a certain entry between bitmap enumeration and number enumeration methods.
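The swap-out handling and one of the optional optimizations mentioned above can be sketched as follows. Both function names are hypothetical, entries are modelled as plain lists of core numbers in the basic content format, and conversion between bitmap enumeration and number enumeration is not shown.

```python
def on_evict(entries, core):
    """Remove `core` from the entry that records it and drop empty entries."""
    result, removed = [], False
    for e in entries:
        if not removed and core in e:
            e = [c for c in e if c != core]   # delete the core's number information
            removed = True
        if e:                                 # preset condition: drop empty entries
            result.append(e)
    return result


def compact(entries, capacity=2):
    """Repack so that at most one entry in the set has free space."""
    cores = [c for e in entries for c in e]
    return [cores[i:i + capacity] for i in range(0, len(cores), capacity)]
```

For instance, evicting core 1 from `[[1, 2], [3, 4]]` leaves `[[2], [3, 4]]`, which `compact` repacks into `[[2, 3], [4]]`; if a later eviction empties the last entry, that entry can be released back to the directory.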
[0129] In some embodiments, each entry in the first target directory entry set and the second target directory entry set is arranged in an orderly manner according to a preset storage strategy.
[0130] In some embodiments, the preset storage strategy includes:
[0131] For any two adjacent entries in the target directory entry set, referred to as a first entry and a second entry, the value of the processor core number information stored in the first entry is less than that stored in the second entry (ascending order), or greater than it (descending order).
[0132] Optionally, if the set contains multiple entries, it is necessary to find the entry related to the current processor core's ID from all entries in the set. If there is no order relationship between the entries, it is usually necessary to traverse all entries. To speed up the search process, the entries can always maintain a certain order, such as arranging the processor core IDs in ascending or descending order, and maintaining the order by adjusting the contents of the entries after the contents of the set change.
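To illustrate why keeping the entries ordered speeds up the search, the following sketch locates the entry that may hold a given core number by binary search instead of traversing every entry. It assumes ascending order both within and across entries, models entries as lists of core numbers, and uses hypothetical function names; re-sorting by flattening and repacking is only one possible way to maintain the order after the set changes.

```python
from bisect import insort


def find_entry(entries, core):
    """Return the index of the entry recording `core`, or None.

    `entries` is a list of non-empty, ascending core-number lists whose
    first elements are also ascending (the assumed storage strategy).
    """
    lo, hi = 0, len(entries)
    while lo < hi:                       # find the last entry whose first core <= core
        mid = (lo + hi) // 2
        if entries[mid][0] <= core:
            lo = mid + 1
        else:
            hi = mid
    i = lo - 1
    return i if i >= 0 and core in entries[i] else None


def insert_core(entries, core, capacity=2):
    """Insert `core` and repack so that the ascending order is preserved."""
    cores = [c for e in entries for c in e]
    if core not in cores:
        insort(cores, core)
    return [cores[i:i + capacity] for i in range(0, len(cores), capacity)]
```

With the entries `[[1, 4], [7, 9], [12]]`, looking up core 9 inspects only two entries before finding it in the second one, and inserting core 8 yields `[[1, 4], [7, 8], [9, 12]]`.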
[0133] Example 5:
[0134] Another embodiment of this application relates to an adaptive federated system for cache coherence directory entries.
[0135] The following describes the implementation details of the adaptive federated system for cache coherence directory entries in this embodiment from the perspective of reading cache blocks. The following content is provided only for ease of understanding and is not essential for implementing this solution. The adaptive federated system for cache coherence directory entries provided in this embodiment includes:
[0136] The first target directory entry set determination module is used to obtain the first target directory entry set corresponding to the address of the target cache block from a preset directory in response to a request from the current processor to verify the read ownership of the target cache block;
[0137] The first storage module is used to store the current processor core number information in the target entry when there is a target entry in the first target directory entry set that meets the preset free space conditions.
[0138] The second storage module is used to apply for a new entry from the preset directory when there is no entry in the first target directory entry set, or when there is no target entry in the first target directory entry set with free space for storing the current processor core number information. If the new entry is successfully obtained, the current processor core number information is stored in the new entry, and the new entry is stored in the first target directory entry set.
[0139] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working process of each module in this adaptive federated system for cache coherence directory entries can be found in the corresponding process of the aforementioned method embodiments and is not repeated here.
[0140] It is worth mentioning that all modules involved in this embodiment are logical modules. In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units. Furthermore, to highlight the innovative aspects of this application, this embodiment does not introduce units that are not closely related to solving the technical problems proposed in this application; however, this does not mean that other units are absent in this embodiment.
[0141] Example 6:
[0142] Based on the above embodiments, another embodiment of this application relates to an adaptive federated system for cache coherence directory entries.
[0143] The following describes the implementation details of the adaptive federated system for cache coherence directory entries in this embodiment from the perspective of writing cache blocks. The following content is provided only for ease of understanding and is not essential for implementing this solution. The adaptive federated system for cache coherence directory entries provided in this embodiment includes:
[0144] The second target directory entry set determination module is used to obtain the second target directory entry set corresponding to the address of the target cache block from a preset directory in response to a request from the current processor to verify the write ownership of the target cache block.
[0145] The invalidation module is used to determine the corresponding processor core based on the processor core number information stored in each entry of the second target directory entry set, and issue an invalid copy instruction so that each processor core performs the operation of invalidating the copy of the target cache block;
[0146] The update module is used to issue an entry update instruction so that the second target directory entry set contains only one initial directory entry and removes redundant directory entries, and saves the number information of the current processor core in the initial directory entry.
[0147] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working process of each module in this adaptive federated system for cache coherence directory entries can be found in the corresponding process of the aforementioned method embodiments and is not repeated here.
[0148] It is worth mentioning that all modules involved in this embodiment are logical modules. In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units. Furthermore, to highlight the innovative aspects of this application, this embodiment does not introduce units that are not closely related to solving the technical problems proposed in this application; however, this does not mean that other units are absent in this embodiment.
[0149] Example 7:
[0150] Based on the above embodiments, another embodiment of this application relates to an adaptive federated system for cache coherence directory entries.
[0151] The following describes the implementation details of the adaptive federated system for cache coherence directory entries in this embodiment. The following content is provided only for ease of understanding and is not essential for implementing this solution. The adaptive federated system for cache coherence directory entries provided in this embodiment includes:
[0152] The first target directory entry set determination module is used to obtain the first target directory entry set corresponding to the address of the target cache block from a preset directory in response to a request from the current processor to verify the read ownership of the target cache block;
[0153] The first storage module is used to store the current processor core number information in the target entry when there is a target entry in the first target directory entry set that meets the preset free space conditions.
[0154] The second storage module is used to apply for a new entry from the preset directory when there is no entry in the first target directory entry set, or when there is no target entry in the first target directory entry set with free space for storing the current processor core number information. If the new entry is successfully obtained, the number information of the current processor core is stored in the new entry, and the new entry is stored in the first target directory entry set.
[0155] The second target directory entry set determination module is used to obtain the second target directory entry set corresponding to the address of the target cache block from the preset directory in response to the current processor's request to verify the write ownership of the target cache block;
[0156] The invalidation module is used to determine the corresponding processor core based on the processor core number information stored in each entry of the second target directory entry set, and issue an invalid copy instruction so that each processor core performs the operation of invalidating the copy of the target cache block;
[0157] The update module is used to issue an entry update instruction so that the second target directory entry set contains only one initial directory entry and removes redundant directory entries, and saves the number information of the current processor core in the initial directory entry.
[0158] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working process of each module in this adaptive federated system for cache coherence directory entries can be found in the corresponding process of the aforementioned method embodiments and is not repeated here.
[0159] It is worth mentioning that all modules involved in this embodiment are logical modules. In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units. Furthermore, to highlight the innovative aspects of this application, this embodiment does not introduce units that are not closely related to solving the technical problems proposed in this application; however, this does not mean that other units are absent in this embodiment.
[0160] Example 8:
[0161] Another embodiment of this application relates to an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods described in the above embodiments.
[0162] The memory and processor are connected via a bus, which can include any number of interconnecting buses and bridges, connecting various circuits of one or more processors and memories. The bus can also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and will not be described further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver can be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. Data processed by the processor is transmitted over the wireless medium via an antenna, which further receives data and transmits it to the processor.
[0163] The processor manages the bus and general processing, and also provides various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. Memory is used to store data used by the processor during operation.
[0164] Example 9:
[0165] Another embodiment of this application relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the method embodiments described above.
[0166] That is, those skilled in the art will understand that all or part of the steps in the methods of the above embodiments can be implemented by a program instructing related hardware. This program is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, chip, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0167] In some embodiments of this application, a computer program product is also provided, including a computer program that, when executed by a processor, implements the steps of the methods described in the above embodiments.
[0168] Those skilled in the art will understand that the above embodiments are specific embodiments for implementing this application, and in practical applications, various changes can be made to them in form and detail without departing from the spirit and scope of this application.
Claims
1. An adaptive coalescing method for cache coherence directory table entries, comprising: in response to a request from the current processor core for read ownership of the current cache block, obtaining a first target directory entry set corresponding to the address of the current cache block from a preset directory, wherein all entries in the first target directory entry set jointly record the sharing status of the current cache block among all processor cores; if there is a target entry in the first target directory entry set that meets the preset free space conditions, storing the number information of the current processor core in the target entry; and if no entry exists in the first target directory entry set, or if no target entry with free space for storing the current processor core's number information exists in the first target directory entry set, requesting a new entry from the preset directory, and, if the new entry is successfully requested, storing the current processor core's number information in the new entry and storing the new entry in the first target directory entry set.
2. The method of claim 1, wherein the method further includes: in response to a request from the current processor to verify write ownership of the current cache block, obtaining a second target directory entry set corresponding to the address of the current cache block from the preset directory; determining the corresponding processor cores based on the processor core number information stored in each entry of the second target directory entry set, and issuing an invalid copy instruction so that each such processor core performs the operation of invalidating its copy of the current cache block; and issuing an entry update instruction so that the second target directory entry set contains only one initial directory entry, and storing the current processor core number information in the initial directory entry.
3. The method of claim 1, wherein the method further includes: in response to the current processor core's request to swap out the current cache block, obtaining, from the preset directory, a target directory entry that corresponds to the address of the current cache block and stores the current processor core's number information; and, if the target directory entry exists, deleting the current processor core's number information from the target directory entry.
4. The method of claim 3, wherein, after deleting the current processor core number information from the target directory entry, the method further includes: deleting entries that meet preset conditions from the target directory entry set that contains the target directory entry; wherein the preset conditions include: no processor core number information is stored in the entry.
5. The method of claim 1, wherein the method further includes: if the request for the new entry is unsuccessful, issuing an entry update instruction so that the first target directory entry set contains only one initial directory entry and redundant directory entries are removed, and storing the current processor core number information in the initial directory entry in a preset basic content format.
6. The method according to any one of claims 1 to 5, characterized in that the target directory entry set includes: a homogeneous directory entry set and a heterogeneous directory entry set; wherein, for entries located in the same homogeneous directory entry set, the content format of each entry is the same; and for entries located in the same heterogeneous directory entry set, the content formats of the entries include one or more formats.
7. The method of claim 6, wherein the content format of the entries includes: a bitmap enumeration method or a number enumeration method; wherein the bitmap enumeration method records, in a bitmap format, the ownership of cache block copies by several processor cores with consecutive processor core numbers; and the number enumeration method records the ownership of cache block copies by several processor cores in the form of processor core number values.
8. The method of claim 2, wherein, in the first target directory entry set and the second target directory entry set, the entries are arranged in order according to a preset storage strategy.
9. The method of claim 8, wherein the preset storage strategy includes: for any two adjacent first and second entries in the target directory entry set, the value of the processor core number information stored in the first entry is less than or greater than the value of the processor core number information stored in the second entry.
10. An adaptive coalescing system for cache coherence directory table entries, characterized in that it comprises: a first target directory entry set determination module, used to obtain, in response to a request from the current processor core for read ownership of the current cache block, the first target directory entry set corresponding to the address of the current cache block from a preset directory, wherein all entries in the first target directory entry set jointly record the sharing status of the current cache block among all processor cores; a first storage module, used to store the current processor core number information in the target entry when there is a target entry in the first target directory entry set that meets the preset free space conditions; and a second storage module, used to request a new entry from the preset directory when there is no entry in the first target directory entry set, or when there is no target entry in the first target directory entry set with free space for storing the current processor core number information, and, if the new entry is successfully obtained, to store the current processor core number information in the new entry and store the new entry in the first target directory entry set.
11. The system of claim 10, wherein the system further includes: a second target directory entry set determination module, used to obtain the second target directory entry set corresponding to the address of the current cache block from the preset directory in response to the current processor's request to verify the write ownership of the current cache block; an invalidation module, used to determine the corresponding processor cores based on the processor core number information stored in each entry of the second target directory entry set, and to issue an invalid copy instruction so that each such processor core performs the operation of invalidating its copy of the current cache block; and an update module, used to issue an entry update instruction so that the second target directory entry set contains only one initial directory entry, and to store the number information of the current processor core in the initial directory entry.
12. A computer program product comprising a computer program, characterized in that, when executed by a processor, the computer program implements the steps of the method according to any one of claims 1 to 9.