Cache replacement methods, apparatus, electronic devices, storage media, and program products
By associating attribute information and frequency counters with cache lines and dynamically adjusting status flags, cache lines with high access frequency are prioritized for retention. This solves the problem in existing technologies of important data being evicted because it has not been accessed for a long time, thereby improving the cache hit rate and performance of the GPGPU.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI BIREN TECH CO LTD
- Filing Date
- 2026-03-27
- Publication Date
- 2026-06-30
AI Technical Summary
The existing cache replacement strategy will remove data with high access frequency if it has not been accessed for a long time, which will affect GPGPU performance.
Associate cache lines with attribute information, including replacement order information and status identifiers. Dynamically adjust the status identifiers of cache lines through a frequency counter, and prioritize the retention of cache lines with high cumulative access frequency to build a priority replacement mechanism.
It improves the cache hit rate of GPGPU in complex data access modes, reduces performance bottlenecks caused by unnecessary cache misses, and optimizes overall operating performance.
Figure CN122309401A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence chip technology, and in particular to a cache replacement method, apparatus, electronic device, storage medium, and program product.
Background Technology
[0002] In general-purpose graphics processing units (GPGPUs), caches mostly employ cache replacement strategies to allocate and replace cache lines. A well-designed cache replacement strategy can significantly reduce hardware implementation complexity and area overhead without sacrificing too much performance.
[0003] In related technical solutions, existing cache replacement strategies may evict data in a GPGPU that has a high total number of accesses simply because it has not been accessed for a long time. The evicted data must then be fetched again from the lower-level cache, which degrades GPGPU performance.
Summary of the Invention
[0004] This invention provides a cache replacement method, apparatus, electronic device, storage medium, and program product to address the shortcomings of existing cache replacement strategies that remove cached data with high total access frequency due to long periods of inactivity, thereby ensuring the retention of cached data with high access frequency and thus guaranteeing GPGPU performance.
[0005] This invention provides a cache replacement method, comprising the following steps.
[0006] Each cache line in the graphics processing unit's cache is associated with attribute information, which includes replacement order information and a status identifier. The replacement order information indicates the relative order in which multiple cache lines having the same status are replaced. The status identifier is a first identifier or a second identifier. The first identifier indicates that the cumulative access frequency of the cached data has reached a preset access number, and the second identifier indicates that the cumulative access frequency of the cached data has not reached the preset access number. In response to an access request, if a cache miss is detected and a cache line whose status identifier is the second identifier exists in the graphics processing unit, a target cache line is selected for replacement from the cache lines whose status identifier is the second identifier, based on the replacement order information; if no cache line whose status identifier is the second identifier exists in the graphics processing unit, the target cache line is selected for replacement from the cache lines whose status identifier is the first identifier, based on the replacement order information.
[0007] According to a cache replacement method provided by the present invention, the attribute information further includes a frequency counter, and the cache replacement method further includes: in response to an access request, if a cache hit is detected, determining the cache line that was hit; if the status identifier of the hit cache line is the first identifier, keeping the count value of the frequency counter and the status identifier unchanged, and updating the replacement order information in the hit cache line; if the status identifier of the hit cache line is the second identifier, updating the count value of the frequency counter, and then, if the updated count value is greater than or equal to the preset access number, changing the status identifier of the hit cache line to the first identifier and updating the replacement order information in the hit cache line, or, if the updated count value is less than the preset access number, updating the replacement order information in the hit cache line.
[0008] According to a cache replacement method provided by the present invention, the cache replacement method further includes: in response to an access request, if a cache miss is detected and there are unused cache lines in the graphics processing unit, selecting an available cache line from the unused cache lines for use based on the replacement order information; and marking the status identifier of the available cache line as the second identifier and updating the count value of the frequency counter.
[0009] According to a cache replacement method provided by the present invention, the attribute information includes a dirty cache line identifier and a frequency counter, and the step of selecting a target cache line for replacement, based on the replacement order information, from the cache lines whose status identifier is the second identifier further includes: if the dirty cache line identifier corresponding to the target cache line is valid, writing the cached data stored in the target cache line to the lower-level cache; and updating the status identifier of the target cache line to the second identifier, and updating the count value of the frequency counter and the replacement order information.
[0010] According to a cache replacement method provided by the present invention, the attribute information includes a dirty cache line identifier and a frequency counter, and the step of selecting a target cache line for replacement, based on the replacement order information, from the cache lines whose status identifier is the first identifier further includes: if the dirty cache line identifier corresponding to the target cache line is valid, writing the cached data stored in the target cache line to the lower-level cache; and updating the status identifier of the target cache line to the second identifier, and updating the count value of the frequency counter and the replacement order information.
[0011] According to a cache replacement method provided by the present invention, the replacement order information is a PLRU value determined based on a pseudo Least Recently Used replacement algorithm.
[0012] The present invention also provides a cache replacement device, comprising the following modules. The calibration module is used to associate attribute information with each cache line in the cache of the graphics processing unit. The attribute information includes replacement order information and a status identifier. The replacement order information indicates the relative order in which multiple cache lines having the same status are replaced. The status identifier is a first identifier or a second identifier. The first identifier indicates that the cumulative access frequency of the cached data has reached a preset access number, and the second identifier indicates that the cumulative access frequency of the cached data has not reached the preset access number. The processing module is configured to respond to an access request and, if a cache line with the second identifier exists in the graphics processing unit, select a target cache line for replacement from the cache lines with the second identifier based on the replacement order information; if no cache line with the second identifier exists in the graphics processing unit, select a target cache line for replacement from the cache lines with the first identifier based on the replacement order information.
[0013] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the cache replacement method as described above.
[0014] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the cache replacement method as described above.
[0015] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the cache replacement method as described above.
[0016] This invention provides a cache replacement method, apparatus, electronic device, storage medium, and program product. By introducing status identifiers to stratify cache lines based on their value, and combining this with replacement order information, a priority-based replacement mechanism is constructed within each layer. This mechanism prioritizes replacement of cache lines marked as less important, effectively protecting cache lines with high cumulative access frequency and long-term importance, preventing them from being incorrectly evicted due to short-term inactivity. This significantly improves the cache hit rate of GPGPUs when handling complex data access patterns, reduces performance bottlenecks caused by unnecessary cache misses, and ultimately optimizes the overall performance of GPGPUs.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0018] Figure 1 is the first flowchart of the cache replacement method provided by the present invention.
[0019] Figure 2 is the second flowchart of the cache replacement method provided by the present invention.
[0020] Figure 3 is a schematic diagram of the attribute information provided by the present invention.
[0021] Figure 4 is the third flowchart of the cache replacement method provided by the present invention.
[0022] Figure 5 is a schematic diagram of the cache replacement device provided by the present invention.
[0023] Figure 6 is a schematic diagram of the structure of the electronic device provided by the present invention.
[0024] Figure labels: 501: Calibration module; 602: Processing module; 610: Processor; 620: Communication interface; 630: Memory; 640: Communication bus.
Detailed Implementation
[0025] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0026] The cache replacement method, apparatus, electronic device, storage medium, and program product of the present invention are described below with reference to Figures 1 to 6.
[0027] This invention provides a cache replacement method, apparatus, electronic device, and storage medium applicable to the cache of a general-purpose graphics processing unit (GPGPU), specifically its L2 cache or higher-level cache. The execution entity of this method can be a hardware logic unit within the GPGPU, such as a cache controller, or it can be implemented by a processor executing corresponding computer program instructions.
[0028] Figure 1 is the first flowchart of the cache replacement method provided by the present invention. As shown in Figure 1, the method includes the following steps. Step 101: Associate attribute information with each cache line in the graphics processing unit's cache. The attribute information includes replacement order information and a status identifier. The replacement order information indicates the relative order in which multiple cache lines having the same status are replaced. The status identifier is either a first identifier or a second identifier. The first identifier indicates that the cumulative access frequency of the cached data has reached the preset access number, and the second identifier indicates that the cumulative access frequency of the cached data has not reached the preset access number.
[0029] A cache line is the basic unit for data storage, management and replacement in the cache, and in some architectures it can also be called a way.
[0030] The replacement order information serves to indicate the relative order in which multiple cache lines belonging to the same state are replaced.
[0031] In one embodiment, the replacement order information can be determined and updated based on a pseudo-least recently used (PLRU) replacement algorithm. For example, by maintaining a binary tree structure associated with a set of cache lines, the state of this structure (i.e., the PLRU value) can approximate the recent usage of each cache line with low hardware implementation complexity. When it is necessary to select one from multiple candidate cache lines, the cache line with the highest replacement priority can be determined based on this replacement order information, i.e., the cache line indicated by the PLRU algorithm as the least recently accessed cache line. Of course, those skilled in the art will understand that any other identifier or value that can characterize the recent access of a cache line, such as sorting information generated based on other LRU approximation algorithms, can fall within the protection scope of the replacement order information of this invention.
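The binary-tree bookkeeping described above can be illustrated with a minimal sketch. It assumes a power-of-two number of ways, a heap-style node layout, and the convention that a node bit of 0 means "look left for the victim"; the class and method names are hypothetical and are not taken from this invention.

```python
class TreePLRU:
    """Illustrative tree pseudo-LRU state for one group of cache lines."""

    def __init__(self, ways=16):
        # Assumption: the number of ways is a power of two.
        assert ways >= 2 and ways & (ways - 1) == 0
        self.ways = ways
        # One bit per internal tree node, stored in heap order (ways - 1 bits).
        self.bits = [0] * (ways - 1)

    def victim(self):
        """Follow the node bits from the root to the pseudo-LRU way."""
        node = 0
        while node < self.ways - 1:
            node = 2 * node + 1 + self.bits[node]  # 0 -> left child, 1 -> right child
        return node - (self.ways - 1)

    def touch(self, way):
        """Mark a way as recently used by pointing every ancestor away from it."""
        node = way + self.ways - 1
        while node > 0:
            parent = (node - 1) // 2
            came_from_right = node == 2 * parent + 2
            self.bits[parent] = 0 if came_from_right else 1
            node = parent
```

With 16 ways this state occupies 15 bits. Calling `victim()` and then `touch()` on the returned way walks through every way once before repeating, which is the approximation of least-recently-used order that the paragraph refers to.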
[0032] Status identifiers are used to classify cache lines based on their historical access characteristics. These identifiers can be either a first identifier or a second identifier. A first identifier indicates that the cumulative access frequency of the cached data has reached a preset number of accesses. In other words, cache lines assigned the first identifier are considered high-frequency access cache lines or warm cache lines with high long-term retention value. This means that although the data stored in these lines may not be currently in high demand, their historical access patterns indicate a high probability of being accessed again in the future.
[0033] In some embodiments, the preset number of accesses may be a threshold parameter that can be configured by system software or hardware.
[0034] Correspondingly, the second identifier is used to indicate that the cumulative access frequency of cached data has not reached the preset access count. Cache lines assigned the second identifier can be regarded as ordinary cache lines or non-warm cache lines. Their importance has not yet been verified, so they have a lower retention priority in replacement decisions.
[0035] In some embodiments, the status identifier is a warm bit, where the warm status corresponds to a first identifier and the non-warm status corresponds to a second identifier.
[0036] In a specific hardware implementation, the status identifier can be represented by a single bit. For example, logic 1 can represent the first identifier, and logic 0 can represent the second identifier.
[0037] Step 102: In response to the access request, if a cache miss is detected and a cache line with the second identifier exists in the graphics processing unit, the target cache line is selected for replacement from the cache lines with the second identifier based on the replacement order information; if no cache line with the second identifier exists in the graphics processing unit, the target cache line is selected for replacement from the cache lines with the first identifier based on the replacement order information.
[0038] In this embodiment, it is determined whether a cache line with a status identifier of the second identifier exists in the cache. If a cache line with a status identifier of the second identifier exists in the graphics processing unit, the replacement candidate range is limited to a set consisting of all cache lines with the second identifier. Subsequently, a target cache line is selected from the cache lines with the second identifier for replacement based on the replacement order information. Specifically, the cache controller queries the replacement order information (e.g., PLRU value) of all cache lines in the candidate set and selects the one indicated as the least recently accessed as the target cache line for loading newly acquired data.
[0039] Conversely, if there is no cache line with the second status identifier in the graphics processing unit, meaning all cache lines in the current cache have the first status identifier, this indicates that the cache is full of data deemed long-term important. In this case, the replacement candidate range is all cache lines. Then, based on the replacement order information, the target cache line is selected from the cache lines with the first status identifier for replacement. This step ensures that even when all cache lines are considered important, the replacement decision can still evict the least recently used data among the important data based on its relative access recency, thus freeing up space for new data.
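The two-tier selection in Step 102 can be sketched as follows. This is an illustration only: the `Line` type, the `warm` flag (True for the first identifier, False for the second), and the integer `age` used as a stand-in for the replacement order information are all assumptions made for the example; a hardware design would derive the order from the PLRU state instead.

```python
from dataclasses import dataclass


@dataclass
class Line:
    way: int
    warm: bool  # True = first identifier, False = second identifier
    age: int    # hypothetical order value: larger means less recently used


def select_victim(lines):
    """Prefer non-warm lines; fall back to warm lines only if none exist."""
    non_warm = [line for line in lines if not line.warm]
    candidates = non_warm if non_warm else lines
    # Within the chosen tier, take the least recently used line.
    return max(candidates, key=lambda line: line.age).way
```

For example, given warm lines with ages 9 and 7 and non-warm lines with ages 2 and 5, the non-warm line with age 5 is chosen even though a warm line is older. Only when every line is warm does the oldest warm line become the target.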
[0040] In this embodiment, a priority-based replacement mechanism is constructed by introducing status identifiers to stratify cache lines based on their value and combining this with replacement order information to select replacements within each layer. This mechanism prioritizes replacement of cache lines marked as less important, effectively protecting cache lines with high cumulative access frequency and long-term importance, preventing them from being incorrectly evicted due to short-term inactivity. This significantly improves the cache hit rate of GPGPUs when handling complex data access patterns, reduces performance bottlenecks caused by unnecessary cache misses, and ultimately optimizes the overall performance of GPGPUs.
[0041] In some embodiments, based on replacement order information, selecting a target cache line for replacement from cache lines with a status identifier of the second identifier includes: Based on the replacement order information, select the unused cache line as the target cache line from the cache lines with the second status identifier, and replace it; If there are no unused cache lines in the cache lines with the second status identifier, based on the replacement order information, select the used cache line as the target cache line from the cache lines with the second status identifier and replace it.
[0042] In some embodiments, the attribute information further includes a frequency counter, and the cache replacement method further includes: in response to an access request, if a cache hit is detected, determining the cache line that was hit; if the status identifier of the hit cache line is the first identifier, keeping the count value of the frequency counter and the status identifier unchanged, and updating the replacement order information in the hit cache line; if the status identifier of the hit cache line is the second identifier, updating the count value of the frequency counter, and then, if the updated count value is greater than or equal to the preset access number, changing the status identifier of the hit cache line to the first identifier and updating the replacement order information in the hit cache line, or, if the updated count value is less than the preset access number, updating the replacement order information in the hit cache line.
[0043] The frequency counter is a hardware counting unit used to quantify the frequency of access to cache lines. In its physical implementation, it can be a set of bits associated with the metadata of each cache line; for example, a 4-bit counter can record 0 to 15 accesses. The core function of this frequency counter is to provide objective evidence for determining whether the cumulative access frequency of a cache line has reached a preset number of accesses.
[0044] When the status identifier of the hit cache line is the first identifier, a cache line already marked as frequently accessed, or warm, has been hit. In this case, the frequency counter value and status identifier remain unchanged, and the replacement order information in the hit cache line is updated. Once a cache line is identified as long-term important data (first identifier), its high retention value is established, and there is no need to further accumulate its frequency. However, updating its replacement order information (e.g., updating the PLRU binary tree to mark it as recently accessed) is still necessary. This ensures that even among a group of cache lines with the same first identifier, when replacement is unavoidable, the one that has not been accessed for the longest time can still be selected for replacement based on the latest access recency.
[0045] When the status identifier of the hit cache line is the second identifier, a normal or non-warm cache line has been hit. In this case, the frequency counter is updated first. Typically, this update operation involves incrementing the frequency counter by one. This step is crucial for accumulating access history. After updating the counter, a check is performed. When the updated frequency counter value is greater than or equal to the preset access count, it indicates that the cache line has been accessed multiple times, its importance has been verified, and it meets the conditions for being promoted to long-term retention data. At this time, the status identifier of the hit cache line is changed to the first identifier, that is, its status is promoted from non-warm to warm. At the same time, the replacement order information in the hit cache line is updated.
[0046] In one embodiment, after modifying the status identifier to the first identifier, the count value of the frequency counter can also be cleared to prepare for possible subsequent policy adjustments (such as status degradation) or to save hardware state.
[0047] If the updated frequency counter value is less than the preset access count, it indicates that the cumulative access count of the cache line has not yet reached the standard for being considered long-term important data. Therefore, its status identifier remains unchanged. In this case, only the replacement order information in the hit cache line is updated to reflect this access behavior.
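The hit-handling branches above can be summarized in a short sketch. The threshold value of 4, the saturation of the counter at 15 (the maximum of a 4-bit counter), the `Meta` type, and the `touch` callback that stands in for the replacement order update are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

THRESHOLD = 4  # hypothetical preset access count


@dataclass
class Meta:
    warm: bool = False  # True = first identifier, False = second identifier
    freq_cnt: int = 0


def on_hit(meta, touch, threshold=THRESHOLD, cnt_max=15):
    """Update per-line metadata on a cache hit."""
    if not meta.warm:
        # Non-warm line: accumulate access history (assumed to saturate at cnt_max).
        meta.freq_cnt = min(meta.freq_cnt + 1, cnt_max)
        if meta.freq_cnt >= threshold:
            meta.warm = True  # promote from second identifier to first identifier
    # Warm lines keep their counter and status unchanged.
    # In every branch the replacement order information is refreshed.
    touch()
```

A line allocated with `freq_cnt = 1` becomes warm on its third hit under this threshold, and later hits leave the counter untouched while still refreshing the replacement order.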
[0048] In this embodiment, a mechanism for dynamic promotion from a second identifier to a first identifier is provided. This mechanism precisely quantifies the access history of cache lines through a frequency counter, making the division of status identifiers no longer static but adaptively evolving based on actual program access behavior. This makes the identification of long-term important data more accurate and dynamic, further enhancing the intelligence and efficiency of the cache replacement strategy, thereby more effectively improving the cache hit rate.
[0049] In some embodiments, as shown in Figure 2, the cache replacement method also includes: Step 201: In response to the access request, if a cache miss is detected and there are unused cache lines in the graphics processing unit, an available cache line is selected from the unused cache lines for use based on the replacement order information; Step 202: Mark the status identifier of the available cache line as the second identifier, and update the count value of the frequency counter.
[0050] In one specific embodiment, in response to an access request, if a cache miss is detected, the cache controller first checks if there are any available free resources in the current cache group. Specifically, if there are unused cache lines in the graphics processing unit, these free resources will be utilized first. An unused cache line can refer to a cache line whose valid bit is invalid, or a cache line that has never been filled with data since initialization.
[0051] In this scenario, an available cache line will be selected from among the unused cache lines based on replacement order information. Even when multiple cache lines are unused, a deterministic mechanism is needed to select one. Replacement order information (e.g., PLRU logic) can be extended to this scenario to select one as the target from all unused cache lines. For example, the PLRU mechanism can indicate the next available free cache line according to a preset order or its internal state. This ensures the orderliness and predictability of the allocation process.
[0052] After selecting an available cache line, its attribute information needs to be initialized so that it can be seamlessly integrated into the replacement strategy framework of this invention. Specifically, the following operations will be performed: The available cache line is marked with the second identifier. The rationale behind this step is that newly loaded data has not yet demonstrated any access history, and its long-term importance is unknown. Therefore, setting its initial state to the second identifier (i.e., non-warm or normal state) is logical. This ensures that new data will have a lower retention priority until it proves its access value, and will become a priority candidate for replacement if subsequent cache pressure increases. This operation corresponds to setting the warm bit of the new cache line to 0.
[0053] Simultaneously, the frequency counter is updated. For a newly allocated cache line, this update operation typically means initializing its frequency counter to an initial value. In one embodiment, this operation may manifest as incrementing the frequency counter from zero to one (i.e., freq_cnt++), counting this allocation and load as the first valid access.
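Steps 201 and 202 can be sketched as below. The dictionary-based line records and the choice of the lowest-indexed unused line are assumptions for the example; the text leaves the exact selection order to the replacement order information.

```python
def allocate_unused(lines):
    """Claim an unused line on a miss and initialise its attribute information.

    `lines` is a list of dicts with the keys "valid", "warm" and "freq_cnt".
    Returns the index of the claimed line, or None if every line is in use.
    """
    for way, line in enumerate(lines):
        if not line["valid"]:
            # New data starts as non-warm (second identifier), and the fill
            # itself is counted as the first access (freq_cnt goes from 0 to 1).
            line.update(valid=True, warm=False, freq_cnt=1)
            return way
    return None
```

When `allocate_unused` returns None, the caller would fall through to the replacement path that evicts an existing line.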
[0054] In this embodiment, by prioritizing the use of unused cache lines, unnecessary data write-back and replacement overhead is avoided. Simultaneously, by standardizing the attribute initialization of newly allocated cache lines (marking them with a second identifier and updating the frequency counter), it is ensured that newly arriving data can fairly participate in the subsequent dynamic replacement decision process based on access frequency and recency, thus guaranteeing the integrity and consistency of the entire cache replacement system.
[0055] In some embodiments, the attribute information includes a dirty cache line identifier and a frequency counter. Selecting a target cache line for replacement from cache lines with the second identifier based on replacement order information further includes: if the dirty cache line identifier corresponding to the target cache line is valid, writing the cached data stored in the target cache line to the lower-level cache; and updating the status identifier of the target cache line to the second identifier, and updating the count value of the frequency counter and the replacement order information.
[0056] The dirty cache line identifier is a standard technical feature in caches, used to indicate whether data in a cache line has been modified and whether the modified content has not yet been written back to the next-level cache or main memory. When this identifier is valid (e.g., a bit value of 1), it indicates that the data is dirty; otherwise, it is clean data. The function of the frequency counter has been described in previous embodiments and will not be repeated here.
[0057] In this embodiment, after selecting a target cache line from the set of cache lines with a second identifier based on the replacement order information, it is necessary to process any dirty data that may exist in the target cache line. Specifically, the dirty cache line identifier corresponding to the target cache line will be checked.
[0058] If the dirty cache line identifier corresponding to the target cache line is valid, the cached data stored in the target cache line is written to the lower-level cache. This operation, also known as write-back or eviction, is a critical step in maintaining cache consistency. If the system does not perform this operation and directly overwrites the dirty data with new data, the modifications to the original data will be permanently lost. Therefore, before overwriting the target cache line, its contents must first be synchronized to the lower-level storage hierarchy. If the dirty cache line identifier corresponding to the target cache line is invalid, meaning its data is clean, then there is no need to perform a write-back operation, and the cache line can be directly overwritten, thus saving bus bandwidth and processing time.
[0059] After processing the write-back logic for the replaced data, the target cache line is ready to load the new data requested due to the current cache miss. During or after data loading, the attribute information of the cache line needs to be reinitialized to fit the replacement strategy of this invention. First, the target cache line's status identifier is updated to the second identifier. This is consistent with the logic for newly allocated cache lines: because the access history of newly loaded data is unknown, its initial importance is set to the lowest level (i.e., non-warm state). Second, the frequency counter is updated to initialize the access count for the newly loaded data. In a specific embodiment, this operation can set the frequency counter value to 1 (e.g., by executing `freq_cnt++` when the counter's initial value is 0) to record the access caused by this cache miss. Finally, the replacement order information is updated. Because this cache line has just been accessed and filled, it should be marked as recently used. The system updates the corresponding PLRU status bits or other recency records to accurately reflect its latest access status.
[0060] In this embodiment, by introducing the judgment and processing of dirty cache line identifiers, the consistency and integrity of cached data are ensured. At the same time, by standardizing the initialization of the attributes of newly loaded data, it is ensured that it can be seamlessly integrated into the dynamic priority replacement framework based on access frequency and recency constructed in this invention, making the entire cache replacement method more robust and complete.
[0061] In some embodiments, the attribute information includes a dirty cache line identifier and a frequency counter. Selecting a target cache line for replacement from cache lines with the first identifier based on replacement order information further includes: if the dirty cache line identifier corresponding to the target cache line is valid, writing the cached data stored in the target cache line to the lower-level cache; and updating the status identifier of the target cache line to the second identifier, and updating the count value of the frequency counter and the replacement order information.
[0062] In some embodiments, the replacement order information is a PLRU value determined based on a pseudo least recently used replacement algorithm.
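The write-back and reinitialization sequence can be illustrated with a minimal sketch. The dictionary-based line record and the `write_back` and `touch` callbacks (standing in for the lower-level cache write and the replacement order update) are hypothetical names introduced for this example.

```python
def replace_line(line, new_tag, new_data, write_back, touch):
    """Evict the chosen target line and reinitialise its attribute information.

    `line` is a dict with the keys "tag", "data", "dirty", "warm" and "freq_cnt".
    """
    if line["dirty"]:
        # Modified data must reach the lower-level cache before being overwritten.
        write_back(line["tag"], line["data"])
    # The incoming data has no access history yet: non-warm, counted once.
    line.update(tag=new_tag, data=new_data, dirty=False, warm=False, freq_cnt=1)
    # The line was just filled, so mark it as recently used.
    touch()
```

The same routine applies whether the target came from the non-warm tier or, when no non-warm line exists, from the warm tier. In the latter case, setting `warm` to False is what returns the slot to the second identifier.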
[0063] In some embodiments, associating an attribute with each cache line in the graphics processing unit's cache can be achieved by configuring a dedicated storage bit in the metadata storage area corresponding to each cache line, so that each cache line carries its own unique attribute information.
[0064] For example, as shown in Figure 3, the attribute information of the i-th cache line, cacheline_i, includes, but is not limited to, other cacheline bits, plru bits, freq_cnt_bits, and the warm bit.
[0065] Specifically, other bits in the cacheline mainly include, but are not limited to, Tag (address tag), valid (valid cache line identifier), and dirty (dirty cache line identifier).
[0066] The plru bits are, for example but not limited to, 15 bits wide and represent the replacement order information in this invention. They model a binary tree (the PLRU implementation uses a binary tree structure) and are used to select the least recently used cache line for allocation. The plru bits are updated each time a cache line is hit, and each time a cache line is allocated on a miss.
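As an illustration of how 15 bits can model a binary tree over 16 ways, the following is a minimal tree-PLRU sketch. It is not taken from this invention: the heap-style node numbering (children of node n are 2n+1 and 2n+2), the bit polarity (1 means the next victim lies in the right half), and the function names `plru_touch` and `plru_victim` are all assumptions made for the example.

```python
WAYS = 16  # assumed associativity; a 16-way tree needs WAYS - 1 = 15 bits


def plru_touch(bits: int, way: int) -> int:
    """Return the tree bits after `way` is accessed.

    Every node on the path to `way` is set to point away from it.
    """
    node, lo, hi = 0, 0, WAYS
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if way < mid:
            bits |= 1 << node        # accessed left half, so victim side is right
            node, hi = 2 * node + 1, mid
        else:
            bits &= ~(1 << node)     # accessed right half, so victim side is left
            node, lo = 2 * node + 2, mid
    return bits


def plru_victim(bits: int) -> int:
    """Follow the tree bits from the root to the pseudo least recently used way."""
    node, lo, hi = 0, 0, WAYS
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if (bits >> node) & 1:
            node, lo = 2 * node + 2, mid
        else:
            node, hi = 2 * node + 1, mid
    return lo


bits = plru_touch(0, 0)          # access way 0 on an all-zero tree
after_sweep = 0
for w in range(WAYS):            # access ways 0..15 in order
    after_sweep = plru_touch(after_sweep, w)
```

Only nodes 0 through 14 are ever written, so the state fits in 15 bits, consistent with the width given above.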
[0067] The freq_cnt_bits field, which is 4 bits wide in this embodiment, stores the count value of the frequency counter. Here, freq_cnt is the frequency counter. When the count of freq_cnt reaches a preset threshold, it indicates that this cache line may be frequently accessed in the future and should be kept in the cache as much as possible.
[0068] The warm bit is, for example but not limited to, 1 bit wide and is the status identifier in this invention. In this embodiment, "warm" describes cache line data that may be frequently queried in the future and should not be evicted in the short term. That is, the data does not need to be repeatedly evicted from and reloaded into the cache within a short period, but is kept in the cache for a relatively long time.
[0069] Specifically, a Streaming Multiprocessor (SM) issues an access request that results in either a hit (the data is in the cache) or a miss (the data is not in the cache and must be read from a lower-level cache and stored in the cache). The miss case includes, but is not limited to, the following scenarios: the request misses and an unused cache line is allocated; the request misses and a used non-warm cache line is allocated; the request misses and a used warm cache line is allocated.
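To make the field widths concrete, the sketch below packs the three replacement-related fields (15 plru bits, 4 freq_cnt bits, 1 warm bit) into a single 20-bit word. The field order, the helper names `pack_meta` and `unpack_meta`, and the idea of storing them in one word are assumptions for illustration; the source only gives the widths.

```python
PLRU_BITS, FREQ_BITS, WARM_BITS = 15, 4, 1  # widths from the description


def pack_meta(plru: int, freq_cnt: int, warm: int) -> int:
    """Pack the fields as [plru | freq_cnt | warm], warm in the lowest bit (assumed order)."""
    assert 0 <= plru < (1 << PLRU_BITS)
    assert 0 <= freq_cnt < (1 << FREQ_BITS)
    assert warm in (0, 1)
    return (plru << (FREQ_BITS + WARM_BITS)) | (freq_cnt << WARM_BITS) | warm


def unpack_meta(word: int) -> tuple:
    """Inverse of pack_meta: returns (plru, freq_cnt, warm)."""
    warm = word & 1
    freq_cnt = (word >> WARM_BITS) & ((1 << FREQ_BITS) - 1)
    plru = (word >> (FREQ_BITS + WARM_BITS)) & ((1 << PLRU_BITS) - 1)
    return plru, freq_cnt, warm


word = pack_meta(plru=0x7FFF, freq_cnt=15, warm=1)   # all fields at their maximum
fields = unpack_meta(pack_meta(plru=0x1234, freq_cnt=3, warm=0))
```

A 4-bit counter can hold values 0 to 15, so any promotion threshold used with this layout would have to be at most 15.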
[0070] For ease of description, some terms are explained below.
[0071] Hit indicates a cache hit, and miss indicates a cache miss. freq_cnt represents the frequency counter, and freq_cnt++ indicates that the frequency counter is incremented by one. warm represents the first identifier of the status flag, and nonwarm represents the second identifier of the status flag: warm=1 indicates that the status flag is the first identifier (the cache line is warm), and warm=0 indicates that the status flag is the second identifier (the cache line is non-warm). freq_cnt=0 indicates that the frequency counter is reset to zero. non_warm_msk is a bitmask used to identify which non-warm cache lines in the cache are allocable, where non-warm cache lines are the cache lines with the second identifier in this invention. msk_empty indicates whether there are unused cache lines in the cache. nonwarmvalid way indicates a cache line with the second identifier that can be replaced, and warmvalid way indicates a cache line with the first identifier that can be replaced.
[0072] Specifically, as shown in Figure 4, the cache replacement method includes: Step 401: The SM sends an access request. Step 402: Determine whether the access request hits. If the result is yes, proceed to step 403; if the result is no, proceed to step 404.
[0073] Step 403: Determine whether the hit cache line is warm. If the result is yes, proceed to step 405; if the result is no, proceed to step 406.
[0074] Step 404: Find non_warm_msk.
[0075] Step 405: Keep freq_cnt and warm unchanged.
[0076] Step 406: freq_cnt++.
[0077] Step 407: Determine whether freq_cnt ≥ freq_cnt_req_thrd. If the result is yes, proceed to step 408; if the result is no, proceed to step 409.
[0078] Step 408: Set freq_cnt=0 and warm=1.
[0079] Step 409: Update the plru value.
[0080] Step 410: Check msk_empty. If the result is yes, proceed to step 411; if the result is no, proceed to step 412.
[0081] Step 411: Find warm_msk.
[0082] Step 412: Check alloc_unused way. If the result is yes, proceed to step 413; if the result is no, proceed to step 414.
[0083] Step 413: Find the cache line that can be allocated based on the msk and plru values.
[0084] Step 414: Replace the nonwarmvalid way.
[0085] Step 415: Replace the warmvalid way.
[0086] Step 416: Determine whether the replaced data is dirty. If the result is yes, proceed to step 417.
[0087] Step 417: Evict the dirty data to the next-level cache.
[0088] Step 418: Set warm=0 and perform freq_cnt++.
[0089] For the case where the access request issued by the SM hits: if the hit cache line is warm, neither freq_cnt nor warm is updated; only the plru value is updated, and the hit data is then read from the cache and returned to the SM.
[0090] If the hit cache line is not warm, freq_cnt is incremented. If the updated freq_cnt is greater than or equal to freq_cnt_req_thrd, the warm value of this cache line is set to 1 (indicating that this cache line is expected to be accessed frequently) and freq_cnt is set to 0; otherwise, the warm value remains unchanged. Finally, the plru value is updated, and the hit data is read from the cache and returned to the SM.
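The hit path described in the two paragraphs above can be sketched as a small state update. This is an illustrative sketch only: the `LineMeta` record and the `on_hit` helper are hypothetical names, the threshold value 4 is an assumption (the source names freq_cnt_req_thrd but does not give a value), and the plru update is omitted.

```python
from dataclasses import dataclass

FREQ_CNT_REQ_THRD = 4  # assumed value; the source does not specify the threshold


@dataclass
class LineMeta:
    warm: int = 0       # 1 = warm (first identifier), 0 = non-warm (second identifier)
    freq_cnt: int = 0


def on_hit(line: LineMeta, thrd: int = FREQ_CNT_REQ_THRD) -> LineMeta:
    """Apply the hit-path update (steps 403 and 405 to 408) to one cache line."""
    if line.warm:
        return line                  # warm line: keep freq_cnt and warm unchanged
    line.freq_cnt += 1               # non-warm line: count this access
    if line.freq_cnt >= thrd:
        line.warm = 1                # promote to warm
        line.freq_cnt = 0            # and reset the counter
    return line                      # the plru value would be updated next


# A freshly filled line starts with freq_cnt = 1; apply four consecutive hits.
line = LineMeta(freq_cnt=1)
history = [(on_hit(line).warm, line.freq_cnt) for _ in range(4)]
```

With the assumed threshold of 4, the line is promoted on the third hit after the fill, and the fourth hit leaves both fields untouched.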
[0091] For the case where the access request issued by the SM misses and an unused cache line is allocated: when an access request misses, the system first looks up non_warm_msk to find non-warm ways (a way is a cache line). non_warm_msk identifies the ways that can be allocated: each bit represents one way, and a value of 1 indicates that the way is valid for allocation. For example, if each set has at most 16 ways, non_warm_msk is defined as 16 bits wide, with the bits from low to high representing way0 to way15 respectively.
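The bitmask described in this paragraph can be illustrated with a short sketch. The helper names `build_non_warm_msk` and `allocable_ways` are hypothetical, and the mapping of bit i to way i is an assumption made for the example.

```python
WAYS = 16  # maximum ways per set, as in the example above


def build_non_warm_msk(warm_bits: list) -> int:
    """Return a 16-bit mask with bit i set when way i is non-warm (warm bit is 0)."""
    assert len(warm_bits) == WAYS
    msk = 0
    for way, warm in enumerate(warm_bits):
        if not warm:
            msk |= 1 << way
    return msk


def allocable_ways(msk: int) -> list:
    """List the way indices whose mask bit is 1."""
    return [way for way in range(WAYS) if (msk >> way) & 1]


warm_bits = [1, 1, 0, 1] + [1] * 11 + [0]    # only ways 2 and 15 are non-warm
msk = build_non_warm_msk(warm_bits)
all_warm_msk = build_non_warm_msk([1] * WAYS)
```

When every way is warm the mask is 0, which corresponds to the "non_warm_msk is empty" condition that sends the miss down the warm replacement path.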
[0092] If non_warm_msk is not empty and there are one or more unused_way (empty ways that have never been occupied), PLRU allocates an available way from the unused ways for the missed request. The warm value of the new cache line is set to 0, and freq_cnt is incremented.
[0093] For the case where the access request issued by the SM misses and a used non-warm cache line is allocated: when a request misses, if `non_warm_msk` is not empty and there is no `unused_way`, a valid non-warm way needs to be selected for replacement. The selection strategy is that PLRU selects a way from the non-warm ways. If the data in the replaced way is dirty, the dirty data needs to be written to the next-level cache. The `warm` value of the new cache line is set to 0, and `freq_cnt` is incremented.
[0094] For the case where the access request issued by the SM misses and a used warm cache line is allocated: when a request misses, if `non_warm_msk` is empty, a valid warm cache line needs to be selected for replacement. The selection strategy is that PLRU selects a way from the warm ways. If the data in the replaced way is dirty, the dirty data needs to be written to the next-level cache. The `warm` value of the new cache line is set to 0, and `freq_cnt` is incremented.
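The three miss scenarios above amount to a priority order: unused ways first, then non-warm ways, then warm ways. The sketch below illustrates that order. It rests on several assumptions not stated in the source: the names `masked_plru_victim` and `choose_victim`, the use of separate valid and warm masks, and in particular the way PLRU is restricted to a subset of ways (the tree walk follows the PLRU bits but switches sides when the preferred half contains no candidate).

```python
WAYS = 16


def masked_plru_victim(bits: int, mask: int) -> int:
    """Walk a 15-bit PLRU tree, staying inside the ways allowed by `mask`."""
    assert mask & ((1 << WAYS) - 1), "at least one candidate way is required"
    node, lo, hi = 0, 0, WAYS
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left_ok = mask & (((1 << mid) - 1) ^ ((1 << lo) - 1))    # candidates in [lo, mid)
        right_ok = mask & (((1 << hi) - 1) ^ ((1 << mid) - 1))   # candidates in [mid, hi)
        go_right = (bits >> node) & 1
        if go_right and not right_ok:
            go_right = 0             # preferred half has no candidate, switch sides
        elif not go_right and not left_ok:
            go_right = 1
        if go_right:
            node, lo = 2 * node + 2, mid
        else:
            node, hi = 2 * node + 1, mid
    return lo


def choose_victim(plru_bits: int, valid_msk: int, warm_msk: int) -> tuple:
    """Return (way, kind) following the order unused -> non-warm -> warm."""
    all_ways = (1 << WAYS) - 1
    unused = ~valid_msk & all_ways
    non_warm = valid_msk & ~warm_msk & all_ways
    if unused:
        return masked_plru_victim(plru_bits, unused), "unused"
    if non_warm:
        return masked_plru_victim(plru_bits, non_warm), "non_warm"
    return masked_plru_victim(plru_bits, valid_msk & warm_msk), "warm"


half_full = choose_victim(0, valid_msk=0x00FF, warm_msk=0x0000)
one_warm = choose_victim(0, valid_msk=0xFFFF, warm_msk=0x0001)
all_warm = choose_victim(0, valid_msk=0xFFFF, warm_msk=0xFFFF)
```

In the second call the plain PLRU choice would be way 0, but because way 0 is warm the walk is redirected to way 1, so the warm line survives as long as any non-warm candidate exists. After the victim is chosen, the dirty write-back and the `warm = 0`, `freq_cnt++` initialization apply as described.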
[0095] The above embodiments ensure that data unlikely to be reused is evicted first, while data that may be accessed infrequently in the short term but is used over the long term is retained in the cache as much as possible, avoiding erroneous eviction caused by a temporary lack of access. This approach can improve caching efficiency and the overall hit rate.
[0096] The cache replacement apparatus provided by the present invention is described below. The cache replacement apparatus described below and the cache replacement method described above may be referred to in correspondence with each other.
[0097] The present invention also provides a cache replacement device which, as shown in Figure 5, includes the following modules: The calibration module 501 is used to associate attribute information with each cache line in the cache of the graphics processing unit. The attribute information includes replacement order information and a status identifier. The replacement order information is used to indicate the relative order of replacement among multiple cache lines belonging to the same status. The status identifier is a first identifier or a second identifier. The first identifier is used to indicate that the cumulative access frequency of the cached data has reached a preset access number, and the second identifier is used to indicate that the cumulative access frequency of the cached data has not reached the preset access number. The processing module 502 is configured to respond to an access request and, in the event of a cache miss, if a cache line with the status identifier of the second identifier exists in the graphics processing unit, select a target cache line from the cache lines with the status identifier of the second identifier for replacement based on the replacement order information; if a cache line with the status identifier of the second identifier does not exist in the graphics processing unit, select a target cache line from the cache lines with the status identifier of the first identifier for replacement based on the replacement order information.
[0098] Figure 6 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other through the communication bus 640. The processor 610 can invoke logical instructions in the memory 630 to execute a cache replacement method. This method includes: associating attribute information with each cache line in the graphics processing unit's cache, the attribute information including replacement order information and a status identifier. The replacement order information is used to indicate the relative order of replacement among multiple cache lines belonging to the same status. The status identifier is either a first identifier or a second identifier. The first identifier indicates that the cumulative access frequency of the cached data has reached a preset access count, and the second identifier indicates that the cumulative access frequency of the cached data has not reached the preset access count. In response to an access request, when a cache miss is detected, if a cache line with the second identifier exists in the graphics processing unit, a target cache line is selected from the cache lines with the second identifier for replacement based on the replacement order information. If a cache line with the second identifier does not exist in the graphics processing unit, a target cache line is selected from the cache lines with the first identifier for replacement based on the replacement order information.
[0099] Furthermore, the logical instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0100] In another aspect, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the cache replacement method provided by the above methods. The method includes: associating attribute information with each cache line in the cache of a graphics processing unit, the attribute information including replacement order information and a status identifier, the replacement order information being used to indicate the relative order of replacement among multiple cache lines belonging to the same status, the status identifier being a first identifier or a second identifier, the first identifier being used to indicate that the cumulative access frequency of the cached data has reached a preset access number, and the second identifier being used to indicate that the cumulative access frequency of the cached data has not reached the preset access number; in response to an access request, when a cache miss is detected, if a cache line with the status identifier of the second identifier exists in the graphics processing unit, a target cache line is selected from the cache lines with the status identifier of the second identifier for replacement based on the replacement order information; if a cache line with the status identifier of the second identifier does not exist in the graphics processing unit, a target cache line is selected from the cache lines with the status identifier of the first identifier for replacement based on the replacement order information.
[0101] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the cache replacement method provided by the above methods. The method includes: associating each cache line in the cache of a graphics processing unit with attribute information, the attribute information including replacement order information and a status identifier, the replacement order information indicating the relative order of replacement among multiple cache lines belonging to the same status, the status identifier being a first identifier or a second identifier, the first identifier indicating that the cumulative access frequency of the cached data has reached a preset access count, and the second identifier indicating that the cumulative access frequency of the cached data has not reached the preset access count; in response to an access request, upon detecting a cache miss, if a cache line with the second identifier exists in the graphics processing unit, a target cache line is selected from the cache lines with the second identifier for replacement based on the replacement order information; if a cache line with the second identifier does not exist in the graphics processing unit, a target cache line is selected from the cache lines with the first identifier for replacement based on the replacement order information.
[0102] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0103] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0104] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A cache replacement method, characterized in that it comprises: associating attribute information with each cache line in the graphics processing unit's cache, the attribute information including replacement order information and a status identifier. The replacement order information is used to indicate the relative order in which multiple cache lines belonging to the same status are replaced. The status identifier is a first identifier or a second identifier. The first identifier is used to indicate that the cumulative access frequency of the cached data has reached a preset access number, and the second identifier is used to indicate that the cumulative access frequency of the cached data has not reached the preset access number. In response to an access request, when a cache miss is detected, if a cache line with the status identifier of the second identifier exists in the graphics processing unit, the target cache line is selected from the cache lines with the status identifier of the second identifier for replacement based on the replacement order information. If a cache line with the status identifier of the second identifier does not exist in the graphics processing unit, the target cache line is selected from the cache lines with the status identifier of the first identifier for replacement based on the replacement order information.
2. The cache replacement method according to claim 1, characterized in that the attribute information also includes a frequency counter, and the cache replacement method further includes: in response to an access request, when a cache hit is detected, determining the cache line that was hit; when the status identifier of the hit cache line is the first identifier, keeping the count value of the frequency counter and the status identifier unchanged, and updating the replacement order information in the hit cache line; when the status identifier of the hit cache line is the second identifier, updating the count value of the frequency counter, and, if the updated frequency counter value is greater than or equal to the preset number of accesses, modifying the status identifier of the hit cache line to the first identifier and updating the replacement order information in the hit cache line; if the updated frequency counter value is less than the preset number of accesses, updating the replacement order information in the hit cache line.
3. The cache replacement method according to claim 2, characterized in that the cache replacement method further includes: in response to an access request, when a cache miss is detected, if there are unused cache lines in the graphics processing unit, selecting an available cache line from the unused cache lines for use based on the replacement order information; and marking the status identifier of the available cache line as the second identifier, and updating the count value of the frequency counter.
4. The cache replacement method according to claim 1, characterized in that the attribute information includes a dirty cache line identifier and a frequency counter. The step of selecting a target cache line for replacement based on the replacement order information from the cache lines whose status identifier is the second identifier further includes: when the dirty cache line identifier corresponding to the target cache line is valid, writing the cached data stored in the target cache line to the lower-level cache; and updating the status identifier of the target cache line to the second identifier, and updating the count value of the frequency counter and the replacement order information.
5. The cache replacement method according to claim 1, characterized in that the attribute information includes a dirty cache line identifier and a frequency counter. The step of selecting a target cache line for replacement based on the replacement order information from the cache lines whose status identifier is the first identifier further includes: when the dirty cache line identifier corresponding to the target cache line is valid, writing the cached data stored in the target cache line to the lower-level cache; and updating the status identifier of the target cache line to the second identifier, and updating the count value of the frequency counter and the replacement order information.
6. The cache replacement method according to claim 2, characterized in that the replacement order information is a PLRU value determined based on a pseudo least recently used replacement algorithm.
7. A cache replacement device, characterized in that it comprises: a calibration module used to associate attribute information with each cache line in the cache of the graphics processing unit. The attribute information includes replacement order information and a status identifier. The replacement order information is used to indicate the relative order in which multiple cache lines belonging to the same status are replaced. The status identifier is a first identifier or a second identifier. The first identifier is used to indicate that the cumulative access frequency of the cached data has reached a preset access number, and the second identifier is used to indicate that the cumulative access frequency of the cached data has not reached the preset access number. The processing module is configured to respond to an access request and, when a cache miss is detected, if a cache line with the status identifier of the second identifier exists in the graphics processing unit, select a target cache line from the cache lines with the status identifier of the second identifier for replacement based on the replacement order information. If a cache line with the status identifier of the second identifier does not exist in the graphics processing unit, the target cache line is selected from the cache lines with the status identifier of the first identifier for replacement based on the replacement order information.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, it implements the cache replacement method as described in any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, it implements the cache replacement method as described in any one of claims 1 to 6.
10. A computer program product, comprising a computer program, characterized in that when the computer program is executed by a processor, it implements the cache replacement method as described in any one of claims 1 to 6.