A method of managing cache memory and a cache management system

By storing delayed CMO hints and dataset identifiers in cache line metadata, combined with controller management, flexible and efficient cache management is achieved, solving the problems of low efficiency and data loss risk of traditional CMO, and adapting to the management needs of multi-process and large datasets.

CN122309397A · Pending · Publication Date: 2026-06-30 · MEDIATEK INC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
MEDIATEK INC
Filing Date
2025-12-29
Publication Date
2026-06-30


Abstract

A method for managing a cache store is provided. The method includes associating a dataset with a cache maintenance operation (CMO) mode, wherein the CMO mode is a deferred mode. The method also includes storing a deferred CMO hint in the metadata portion of a cache line within the cache store; storing a dataset identifier in the metadata portion of the cache line to associate the cache line with a dataset; receiving a trigger command associated with the dataset identifier; and performing a deferred CMO on one or more cache lines in the cache store associated with the dataset identifier based on the trigger command.

Description

[Technical Field]

Embodiments of the present invention relate to a cache management mechanism, and more specifically, to a more efficient, flexible, and secure cache management mechanism.

[Background Technology]

In modern computing systems, cache memory temporarily stores data to improve system performance. Cache maintenance operations (CMO) are used to manage cache contents to improve efficiency or data consistency. To improve efficiency, a "flush" operation can be used to evict cache lines and write any dirty data back to main memory, while a "discard" operation evicts cache lines without writing them back, which is suitable for data that is no longer needed.

[0003] Traditional CMO is address-based and executes immediately upon receipt. This CMO approach has several limitations. First, it is inefficient for large data volumes. For example, a user must issue a separate command for each address range, requiring numerous commands to traverse a large dataset (e.g., 1 gigabyte), even if only a small portion resides in the cache (e.g., 1 megabyte). Therefore, traditional CMO processes are both inefficient and power-intensive.

[0004] Secondly, users face practical difficulties when applying these CMOs. Typically, users only realize a dataset is discardable after the entire job is complete, by which time the specific address associated with the job may have been forgotten. Furthermore, in jobs that may require multiple processes to complete, data is shared among these processes. A single user or process cannot determine whether it is the final entity using that data. This means that issuing a discard command in such a situation is risky, as it could lead to the loss of data for other concurrent users. However, once the entire job is complete, the system can determine that the dataset associated with that job is no longer needed and can be safely discarded.

[0005] Therefore, a more efficient, flexible, and secure cache management mechanism needs to be designed to overcome the challenges posed by traditional, real-time, address-based CMO.

[Summary of the Invention]

In one embodiment, a method for managing a cache memory is disclosed. The method includes associating a dataset with a cache maintenance operation (CMO) mode, wherein the CMO mode is a delayed mode; storing a delayed CMO hint in the metadata portion of a cache line within the cache memory; storing a dataset identifier in the metadata portion of the cache line to associate the cache line with the dataset; receiving a trigger command associated with the dataset identifier; and performing the delayed CMO on one or more cache lines in the cache memory associated with the dataset identifier based on the trigger command.

[0007] In another embodiment, a cache management system is disclosed. The cache management system includes a cache store and a controller. The cache store includes multiple cache lines. The metadata portion of at least one of the multiple cache lines is configured to store a Delayed Cache Maintenance Operation (CMO) hint and a dataset identifier to associate at least one cache line with a dataset. The controller is coupled to the cache store. The controller is configured to associate a dataset with a Delayed CMO pattern. The controller is configured to receive a trigger command associated with the dataset identifier. The controller is configured to perform a Delayed CMO on one or more of the multiple cache lines associated with the dataset identifier based on the trigger command.

[0008] These and other objectives of the invention will become apparent to those skilled in the art upon reading the detailed description of the preferred embodiments below.

[Attached Image Description]

Figure 1 is a block diagram of a cache management system 100 according to an embodiment of the present invention.

[0010] Figure 2 is a conceptual diagram illustrating two exemplary scenarios for triggering a delayed CMO in the cache management system 100 of Figure 1.

[0011] Figure 3 is a flowchart illustrating a method for handling cache maintenance operation (CMO) hints in the cache management system 100 of Figure 1.

[0012] Figure 4 is a state machine diagram illustrating how each cache line in the cache management system 100 of Figure 1 transitions between four states based on 2-bit CMO hints.

[0013] Figure 5 is a state machine diagram illustrating the transitions of the 1-bit CMO hint in the cache management system 100 of Figure 1.

[0014] Figures 6 to 10 are flowcharts illustrating the integrity check and protection mechanisms of the cache management system 100 of Figure 1.

[0015] Figure 11 is a flowchart illustrating a method by which a hardware garbage collection (GC) engine executes pending delayed CMOs in the cache management system 100 of Figure 1.

[0016] Figure 12 is a detailed data flow diagram illustrating a working example of a delayed "buffer discard" operation performed by the GC engine in the cache management system 100 of Figure 1.

[0017] Figure 13 is a flowchart illustrating the process of executing the eviction mechanism in the cache management system 100 of Figure 1.

Detailed Implementation Methods

[0019] Cache memory 20 is a hardware storage component, such as a system-level cache (SLC), configured to temporarily store data to reduce memory access latency. Cache memory 20 comprises multiple cache lines, which are the basic units for data storage and management. Each cache line includes a metadata section configured to store information critical to the Delayed CMO mechanism. The metadata includes at least a dataset identifier (e.g., a buffer ID) that associates the cache line with a specific logical dataset, and a Delayed CMO hint indicating future maintenance operations for the data in that cache line. The Delayed CMO hint can be represented by multiple status bits (e.g., 2 bits) or a single flag (e.g., a 1-bit non-discardable flag).

[0020] Controller 10 is hardware control logic coupled to cache memory 20. Controller 10 is configured to manage and execute the entire deferred CMO process. Furthermore, controller 10 is configured to receive commands from one or more upstream users (e.g., GPU, CPU, or device drivers). In standard computing systems, applications do not directly allocate hardware resources, such as memory buffers. Instead, applications send requests to the operating system (OS) or device drivers. Device drivers, as logical entities, allocate datasets (e.g., buffers) for applications and also assign corresponding dataset identifiers (buffer IDs) used by cache management system 100. Commands received by controller 10 include initial CMO hints for specific memory addresses, which controller 10 uses to encode deferred CMO hints into the metadata of the corresponding cache line.

[0021] Furthermore, controller 10 is configured to receive trigger commands (e.g., buffer flush or buffer discard commands) associated with a specific dataset identifier. Upon receiving a trigger command, controller 10 performs a delayed CMO on all cache lines in cache memory 20 associated with the specified dataset identifier. CMO execution can be performed through various embodiments managed by controller 10, such as by initiating an active background traversal process (i.e., garbage collection or GC engine) to scan cache memory 20, or passively when a cache line is selected as an eviction victim. Controller 10 also implements protection mechanisms to ensure data integrity when dataset identifiers are reused. Operational details of the cache management system 100 are shown below.

[0022] To provide flexibility, the cache management system 100 is configured to support both traditional immediate CMO and delayed CMO. The desired CMO behavior can be dynamically selected on a per-dataset basis, for example, by associating a specific CMO mode with a per-dataset identifier (or buffer ID). In one embodiment, these modes include an "immediate CMO" mode and a "delayed CMO" mode. The immediate CMO mode corresponds to traditional cache maintenance operations. When a buffer ID is configured in this mode, any received CMO command associated with that buffer ID is executed immediately without delay. The immediate CMO mode is particularly suitable for operations where data consistency is a primary concern.

[0023] Deferred CMO mode is a core aspect of the implementation. When the buffer ID is configured in this mode, the received CMO command is not executed immediately. Instead, the corresponding CMO hint is logged or encoded into the metadata of the target cache line for future execution. Deferred CMO mode can be further classified into at least two subtypes to provide different levels of operational security and efficiency, such as deferred, flush only (Type 1) and deferred, flush or discard (Type 2).

[0024] Type 1 (deferred, flush only) is a conservative deferred mode that does not allow discard-type operations. If a buffer in this mode receives a discard-type CMO, controller 10 will downgrade the request to a flush-type operation. This ensures that data, even if marked for discard, is safely written back to downstream memory, which is useful for data that may be shared or whose usage pattern is uncertain. Type 2 (deferred, flush or discard) is a more aggressive deferred mode that allows both flush-type and discard-type operations. CMO operations in Type 2 achieve maximum efficiency by enabling true discarding (i.e., evicting dirty cache lines without writing them back) when the user is certain that the data is no longer needed. Table T1 illustrates exemplary configurations of these CMO modes assigned to various upstream users or their corresponding buffers.

[0025] As shown in Table T1, the system can maintain a mapping that defines the CMO behavior for each buffer ID. For example, buffers associated with CPU or Multimedia (MM) logic can be set to "Immediate CMO". For some users, such as the CPU, CMO is initiated by software programs. In these scenarios, the software logic has already determined that the CMO needs to be performed at that precise moment, so the operation cannot be delayed. Such immediacy requirements are often related to ensuring program-level data consistency. Therefore, to meet the explicit operational requirements of the program, these requests are processed in "Immediate CMO" mode. In contrast, buffers used by the GPU can be configured to the safer "Deferred, Flush Only" mode. Buffers for the Neural Processing Unit (NPU), which may handle large amounts of rapidly stale intermediate data, can be set to "Deferred, Flush or Discard" to maximize cache efficiency. The per-buffer configuration in Table T1 enables the cache management system 100 to dynamically adapt to the specific needs of different applications and data types in the SoC.
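The per-buffer mode mapping and the Type 1 downgrade rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed hardware: the enum, the function name `effective_hint`, and every buffer ID except NPU_Buf0 are hypothetical, and the actual contents of Table T1 are not reproduced here.

```python
from enum import Enum


class CmoMode(Enum):
    IMMEDIATE = "immediate"
    DEFERRED_FLUSH_ONLY = "deferred, flush only"              # Type 1
    DEFERRED_FLUSH_OR_DISCARD = "deferred, flush or discard"  # Type 2


# Hypothetical buffer IDs; the mode assignments follow the examples in the text.
MODE_TABLE = {
    "CPU_Buf0": CmoMode.IMMEDIATE,
    "MM_Buf0": CmoMode.IMMEDIATE,
    "GPU_Buf0": CmoMode.DEFERRED_FLUSH_ONLY,
    "NPU_Buf0": CmoMode.DEFERRED_FLUSH_OR_DISCARD,
}


def effective_hint(buffer_id: str, hint: str) -> str:
    """Return the hint ('flush' or 'discard') that would actually be honored."""
    if hint not in ("flush", "discard"):
        raise ValueError(f"unknown CMO hint: {hint}")
    mode = MODE_TABLE[buffer_id]
    if mode is CmoMode.DEFERRED_FLUSH_ONLY and hint == "discard":
        return "flush"  # Type 1 never discards: downgrade to a flush
    return hint
```

With this mapping, a discard hint for a GPU buffer is recorded as a flush, while the same hint for an NPU buffer is kept as a discard.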

[0026] Figure 2 is a conceptual diagram illustrating two exemplary scenarios of delayed CMO triggering in the cache management system 100. Both scenarios depict a timeline of data buffers used in concurrent jobs, demonstrating flexible triggering granularity. The top part of Figure 2 illustrates a "buffer-based discard" scenario, where task A uses a first set of data buffers, including buffer #0 and buffer #1. In this mode of operation, a user or system can issue a separate trigger command for each individual data buffer as its specific use ends. For example, the timeline shows that a "discard buffer 0" trigger command is issued once buffer #0 is no longer in use. At a later point in time, once buffer #1 is no longer in use, a separate "discard buffer 1" trigger command is issued. The buffer-based discard method provides fine-grained control over the lifecycle of each data buffer in cache memory 20.

[0027] The bottom part of Figure 2 illustrates a "task-based discard" scenario, where task B uses a second set of data buffers, including buffer #2 and buffer #3. In this mode, a trigger command is issued for a set of buffers logically associated with a larger task or job. As shown in the timeline, the individual uses of buffers #2 and #3 may finish at different times, but the trigger command is delayed until the entire task B is completed. Upon completion of task B, a single trigger command, "Discard Buffers 2+3," is issued to perform a delayed CMO on all buffers used by task B. The task-based discard approach offers convenience by allowing users to release all resources associated with a task with a single command without having to track the state of each buffer.

[0028] Figure 2 therefore demonstrates how the granularity of trigger commands can be flexibly defined in the cache management system 100. The system supports cache cleanup on either a per-dataset (buffer-based) or grouped-dataset (task-based) basis, depending on the specific requirements of the application and the user's capabilities.
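The two trigger granularities can be contrasted with a small sketch. It assumes a hypothetical event list in which each entry is the ID of a buffer that has just finished being used; the function name `plan_triggers` and its parameters are illustrative only and do not come from the disclosure.

```python
def plan_triggers(done_events, task_buffers, per_task):
    """Return the list of trigger commands, each a list of buffer IDs.

    done_events  -- buffer IDs in the order they stop being used
    task_buffers -- all buffer IDs that belong to the task
    per_task     -- False: buffer-based triggering, True: task-based triggering
    """
    pending = set(task_buffers)
    commands = []
    for buffer_id in done_events:
        if buffer_id not in pending:
            continue  # unrelated buffer, or one that was already handled
        pending.remove(buffer_id)
        if not per_task:
            commands.append([buffer_id])           # e.g. "discard buffer 0"
        elif not pending:
            commands.append(sorted(task_buffers))  # e.g. "Discard Buffers 2+3"
    return commands
```

For task A, `plan_triggers([0, 1], [0, 1], per_task=False)` produces one command per buffer, whereas for task B, `plan_triggers([2, 3], [2, 3], per_task=True)` produces a single command covering both buffers once the last one is done.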

[0029] Figure 3 is a flowchart illustrating a method for processing CMO hints in the cache management system 100. Steps S301 to S311 are executed by the controller 10 to determine whether the CMO should be executed immediately or deferred, and how to record the hint in the cache memory 20. Any reasonable technical or hardware modification falls within the scope of this embodiment. Steps S301 to S311 are described below.

[0030] The process begins in step S301, where controller 10 receives commands with CMO hints from an upstream user. Each command may be one of several types: a standalone CMO command, or a memory access command accompanied by a CMO hint, such as a read request or a write request. Regardless of the command type, each received command and its associated CMO hint are associated with a specific memory address and its corresponding buffer ID. In step S302, controller 10 first checks the operating mode associated with the buffer ID of the incoming CMO hint. The check determines whether the dataset is configured for delayed operation. The mapping from buffer ID to CMO mode can be configured as shown in Table T1. If the buffer ID is not configured to allow delayed CMO (from the "No" branch of S302), the process continues to step S311. In step S311, controller 10 immediately performs the CMO, consistent with conventional caching operations, after which the process ends in step S307.

[0031] If the buffer ID allows for a delayed CMO (from the "Yes" branch of S302), controller 10 performs a safety check in step S303. Step S303 determines whether there is a conflict between the received CMO hint and the buffer's configured mode. Specifically, it checks whether the CMO hint is a "Discard" type CMO while the buffer's mode is the more conservative "Flush Only". If such a conflict exists (from the "Yes" branch of S303), the process proceeds to step S304, where controller 10 overrides the CMO hint, downgrading it from "Discard" to "Flush", to ensure that data is not accidentally lost. If there is no conflict (from the "No" branch of S303), step S304 is skipped.

[0032] Next, in step S305, controller 10 checks whether the data corresponding to the CMO hint address is currently stored in cache memory 20 (i.e., a cache hit). If a cache hit occurs (from the "Yes" branch of S305), the process continues to step S306. In this step, controller 10 updates the metadata portion of the existing cache line. For example, it may update a 2-bit CMO state machine or a 1-bit non-discardable flag to reflect the received (potentially downgraded) CMO hint. The process then ends in step S307.

[0033] If a cache miss occurs (from the "No" branch of S305), the process continues to step S308 to determine whether a new cache line should be allocated for the data. The decision may depend on the type of memory access (e.g., a write operation results in allocation). If controller 10 decides to allocate a new cache line (from the "Yes" branch of S308), the process continues to step S309. In step S309, the CMO hint is written to the metadata of the newly allocated cache line as the data is brought into cache memory 20. If controller 10 decides not to allocate a new line (from the "No" branch of S308), the process continues to step S310, and the CMO hint is ignored or discarded because there is no cache line to store it. After step S309 or S310, the current hint process is completed in step S307.
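The decision flow of steps S301 to S311 can be summarized in a short sketch. This is a simplified software model under stated assumptions: the cache is modeled as a dictionary from address to recorded hint, the mode strings and the `allocate` parameter are hypothetical, and step S306 simply overwrites the stored hint rather than applying a per-line state machine.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    # address -> recorded deferred hint ('flush' or 'discard')
    lines: dict = field(default_factory=dict)


def handle_hint(cache, modes, address, buffer_id, hint, allocate):
    """Model of S301-S311; returns a label for the step that handled the hint."""
    mode = modes[buffer_id]                          # S302: look up the buffer's mode
    if mode == "immediate":
        return "S311: CMO executed immediately"
    if mode == "flush_only" and hint == "discard":   # S303: conflict check
        hint = "flush"                               # S304: downgrade
    if address in cache.lines:                       # S305: cache hit?
        cache.lines[address] = hint                  # S306: update metadata
        return "S306: metadata updated"
    if allocate:                                     # S308: allocate a new line?
        cache.lines[address] = hint                  # S309: store hint with new line
        return "S309: hint stored in new line"
    return "S310: hint dropped"                      # no line to hold the hint
```

A discard hint for a flush-only buffer that misses in the cache and is allocated ends at S309 with "flush" recorded, which shows the downgrade and the allocation path working together.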

[0034] The cache management system 100 can be implemented in various ways to store deferred CMO hints in the metadata of each cache line. One embodiment utilizes a 2-bit field, hereinafter referred to as the "CMO bit," to define the deferred CMO state of each cache line. The definitions of these CMO bits are shown in Table T3, and the user-issued trigger commands that execute deferred operations are shown in Table T4.

[0035] Table T3 illustrates the four possible states represented by the two CMO bits in the cache line metadata. State "0b00" indicates that the cache line does not support deferred CMO. This state applies to data associated with buffer IDs set to "Immediate CMO" mode, where any CMO must be performed immediately. State "0b01" indicates a neutral or initial state for a deferred cache line. It indicates that deferred operations are allowed for the line, but no specific flush or discard hint has been received. State "0b10" indicates a "Deferred, Flush Only" pending state. When a cache line is in this state, it is marked for a future flush operation, requiring its data to be written back to downstream memory upon eviction. State "0b11" indicates a "Deferred, Discard Allowed" pending state. This state indicates that the cache line is marked for a future discard operation, allowing controller 10 to invalidate the line without writing back its contents, even if they are dirty.

[0036] It should be understood that Table T3 defines the states of the state machine maintained in the metadata of each individual cache line. These states are represented by the 2-bit "CMO bit" field, which records the pending deferred operation for that particular cache line. In contrast to this per-cache-line state, Table T4 defines the higher-level trigger commands that users can issue to initiate execution of all pending deferred operations across an entire dataset. These commands are stored as flags in the buffer ID state table. For example, execution of pending deferred CMOs can be initiated by setting a flag in the state table for a specific buffer ID. When both "Do Flush" and "Do Discard" are 0, the cache management system 100 remains idle and does not trigger any execution. Setting "Do Discard" to 1 (while "Do Flush" is 0) corresponds to the "Buffer Discard" trigger. The controller 10 then traverses the cache and executes pending operations, discarding cache lines in state "0b11" and flushing cache lines in state "0b10". Setting "Do Flush" to 1 (while "Do Discard" is 0) corresponds to the "Buffer Flush" trigger. This is a safer command that forces a flush of every line with a pending operation (both the "0b10" and "0b11" states). If both flags are set to 1, the state is considered undefined. To ensure data safety, the cache management system 100 defaults to the most conservative operation, i.e., flush only.
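How the two trigger flags combine with the per-line CMO bits can be sketched as two small functions. The function names are hypothetical, and the sketch only restates the rules given in the text above; it is not the controller's actual logic.

```python
def resolve_trigger(do_flush: int, do_discard: int) -> str:
    """Map the two buffer-level flags to 'idle', 'flush', or 'discard'."""
    if do_flush and do_discard:
        return "flush"    # undefined combination: fall back to the safest action
    if do_flush:
        return "flush"    # "Buffer Flush" trigger
    if do_discard:
        return "discard"  # "Buffer Discard" trigger
    return "idle"


def action_for_line(trigger: str, cmo_bits: int) -> str:
    """Per-line action for a line that belongs to the triggered buffer ID."""
    if trigger == "idle" or cmo_bits not in (0b10, 0b11):
        return "none"     # nothing pending on this line
    if trigger == "discard" and cmo_bits == 0b11:
        return "discard"  # only a discard-allowed line is truly discarded
    return "flush"
```

Note that a "Buffer Discard" trigger still flushes lines in state 0b10, so the per-line state always has the final say on whether data may be dropped.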

[0037] Figure 4 is a state machine diagram illustrating the transitions between the four states defined in Table T3, based on the CMO hints (flush request or discard request) received by the cache management system 100 for each cache line during job execution. State "0b01" is the initial state of a cache line in a deferrable buffer. Upon receiving a flush request, the cache line transitions from state "0b01" to state "0b10" ("Deferred, Flush Only"). Upon receiving a discard request, it transitions from "0b01" to state "0b11" ("Deferred, Discard Allowed"). State "0b10" is a conservative pending state. Once a cache line enters this state, it cannot become more aggressive. As shown in Figure 4, receiving a discard request while in state "0b10" does not result in a transition to "0b11"; the state remains "0b10". This ensures that cache lines marked to be safely flushed do not accidentally transition to a discardable state. State "0b11" is an aggressive pending state. If a flush request is received while a cache line is in state "0b11", its state is downgraded to the safer state "0b10". This transition provides users with a mechanism to reverse a previous discard decision and force a write-back if conditions change. Finally, state "0b00" is the absorbing state for non-deferrable cache lines. Once a cache line is in this state, no incoming request changes its state, which enforces its immediate-only behavior.
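The transitions described for Figure 4 can be written as a single transition function. This is a minimal sketch: the constant and function names are hypothetical, and two self-loops that the text does not spell out (a flush request in state 0b10 and a discard request in state 0b11) are assumed to leave the state unchanged.

```python
NOT_DEFERRABLE = 0b00   # immediate-only line, absorbing state
DEFERRED_IDLE = 0b01    # deferrable, no hint received yet
PENDING_FLUSH = 0b10    # deferred, flush only
PENDING_DISCARD = 0b11  # deferred, discard allowed


def next_state(state: int, request: str) -> int:
    """Return the next 2-bit CMO state after a 'flush' or 'discard' request."""
    if request not in ("flush", "discard"):
        raise ValueError(f"unknown request: {request}")
    if state == NOT_DEFERRABLE:
        return NOT_DEFERRABLE          # never leaves the immediate-only state
    if request == "flush":
        return PENDING_FLUSH           # 0b01 -> 0b10, 0b11 -> 0b10 (downgrade)
    if state == DEFERRED_IDLE:
        return PENDING_DISCARD         # 0b01 -> 0b11
    return state                       # 0b10 stays 0b10; 0b11 stays 0b11 (assumed)
```

The function makes the asymmetry explicit: a line can always move toward the safer flush-only state, but once there it cannot be promoted back to discard-allowed.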

[0038] Another implementation for storing deferred CMO hints is to use a single bit in the metadata of each cache line, hereinafter referred to as the non-discardable bit. In the single-bit CMO hint scenario, all deferred cache lines are considered discardable by default unless explicitly marked as non-discardable via an incoming request or system event. The definition of the non-discardable bit is given in Table T5. The execution of the deferred operation is then initiated by the trigger command previously defined in Table T4.

[0039] Table T5 illustrates the two possible states represented by a single non-discardable bit. State "0b0" is the default state for a deferred cache line, indicating that the line is discardable. When the bit is "0", either no specific hint is pending (idle) or a discard request has been received. In this state, a "buffer discard" trigger command can invalidate the line without writing it back. State "0b1" indicates that the cache line is non-discardable. When the bit is set to "1", any pending or future deferred CMOs for the line must be treated as flushes, ensuring that its data is written back to downstream memory. This state can be considered a "safe" flag for the cache line.

[0040] Figure 5 is a state machine diagram illustrating the transitions of a 1-bit CMO hint in the cache management system 100. A deferred cache line starts in an initial state "0b0", indicating that it is discardable. Receiving a discard request in this state does not change the state. It remains "0b0" because the line is already considered discardable. The transition from the discardable state "0b0" to the non-discardable state "0b1" is a one-way, irreversible transition that occurs under specific conditions to ensure data safety. These conditions include: (a) receiving a flush request and (b) an exception occurring. For (a), if a flush request is received in state "0b0", the line must be written back. Therefore, its state transitions to "0b1" to enforce this requirement. For (b), the transition can also be triggered by an exceptional system event, such as when data is touched by an unexpected user. For example, if data in a buffer expected to be used only by the GPU is also accessed by the CPU, this unexpected sharing introduces uncertainty. As a precaution, the controller 10 forces the state of the cache line to "0b1" to prevent unsafe discarding.

[0041] State "0b1" is an absorbing or immutable state. As shown in Figure 5, once the non-discardable bit is set to "1", no subsequent request, whether a flush request or a discard request, changes the state back to "0b0". This ensures that once a cache line is marked as needing to be safely written back, that requirement will not be revoked by subsequent, more aggressive CMO hints.
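The 1-bit variant reduces to a very small transition function. In this sketch the event names are hypothetical: "exception" stands for the exceptional system event described above, such as access by an unexpected user.

```python
def next_bit(bit: int, event: str) -> int:
    """Return the next non-discardable bit after 'flush', 'discard', or 'exception'."""
    if event not in ("flush", "discard", "exception"):
        raise ValueError(f"unknown event: {event}")
    if bit == 1:
        return 1  # absorbing: a non-discardable line never becomes discardable
    if event in ("flush", "exception"):
        return 1  # one-way transition 0b0 -> 0b1
    return 0      # a discard request leaves a discardable line unchanged
```

Compared with the 2-bit scheme, this encoding cannot distinguish "no hint yet" from "discard requested", which matches the statement that deferred lines are treated as discardable by default.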

[0042] To further elaborate on the implementation of the delayed CMO mechanism, the following paragraphs describe a specific implementation based on the previously introduced 2-bit CMO. This implementation relies on specific data structures in the cache memory 20 and the corresponding processing logic in the controller 10.

[0043] Table T6 shows an example structure of metadata stored in the tag SRAM for each cache line in cache memory 20. In addition to standard fields such as tag address and cache state, this embodiment includes several fields, as shown below. The Buffer ID field stores an identifier that associates the cache line with a specific dataset or user. The Priority Eviction field is a 1-bit flag that, when set, signals the cache replacement policy to prioritize the eviction of that cache line. The Priority Eviction bit can facilitate efficient, non-blocking execution of deferred operations by avoiding large amounts of immediate write-back. The CMO bit field is a 2-bit field that stores the deferred CMO state of the cache line, corresponding to one of the four states defined in Table T3 ("0b00" to "0b11"). The LRU bit field stores the Least Recently Used (LRU) information for the cache line. The LRU bit is used by the cache replacement policy to determine which cache line should be evicted when a new line needs to be allocated.

[0044] Table T7 shows an example buffer ID attribute and status table that controller 10 can maintain. For each buffer ID, the table stores its pre-configured deferred mode and its runtime status flags, "Do Flush" and "Do Discard". These flags are set to "1" when a user issues the corresponding trigger command. For example, Table T7 shows that the "buffer discard" trigger command is currently active for NPU_Buf0 because its "Do Discard" flag is set to "1". When a trigger command for a given buffer ID is active, controller 10 applies specific logic to each associated cache line during background traversal. The processing logic is as follows. When the "buffer discard" trigger command for a given buffer ID is active (i.e., "Do Discard" is "1" and "Do Flush" is "0"), the specific operation performed by controller 10 on an associated cache line depends on the state of the cache line's "CMO bit" metadata. Specifically, controller 10 sets the dirty bit of the cache line to "0" and discards the data without writing it back only if the cache line's "CMO bit" is in a state that allows discarding (e.g., the "Deferred, Discard Allowed" state "0b11"). Conversely, if the "CMO bit" of a cache line indicates a flush-only policy (e.g., state "0b10"), controller 10 performs a flush operation instead of a discard, even under a "buffer discard" command. If the command is "buffer flush" ("Do Flush" is "1"), controller 10 performs a safe flush by ensuring that the line becomes non-discardable (e.g., by changing the CMO bit from "0b11" to "0b10"), overriding the original discard hint.

[0045] It should be understood that for any cache line associated with a triggered buffer ID and having a pending deferred hint (e.g., its CMO bit is in state "0b10" or "0b11"), controller 10 sets the priority eviction bit of that cache line to "1". This operation marks the line for priority eviction, but does not immediately stall processing to perform a write-back. The priority eviction flag can be seen as a hint to the cache replacement policy, which will select this line as the preferred victim when space is needed for a future allocation.
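The per-line processing during background traversal, including the priority eviction marking, can be sketched as follows. This is a software model under stated assumptions: `LineMeta` keeps only the fields needed for the illustration, the function name `apply_trigger` is hypothetical, and the actual write-back is assumed to happen later, when the replacement policy evicts the marked line.

```python
from dataclasses import dataclass


@dataclass
class LineMeta:
    buffer_id: str
    cmo_bits: int             # 2-bit deferred CMO state (Table T3)
    dirty: bool = True
    priority_evict: bool = False


def apply_trigger(lines, buffer_id, trigger):
    """Apply an active 'flush' or 'discard' trigger to every matching line."""
    if trigger not in ("flush", "discard"):
        raise ValueError(f"unknown trigger: {trigger}")
    for line in lines:
        if line.buffer_id != buffer_id or line.cmo_bits not in (0b10, 0b11):
            continue                  # other dataset, or nothing pending
        if trigger == "discard" and line.cmo_bits == 0b11:
            line.dirty = False        # drop the data: no write-back will occur
        else:
            line.cmo_bits = 0b10      # flush path: line becomes non-discardable
        line.priority_evict = True    # preferred victim; write-back is not forced now
```

Because the traversal only flips metadata bits, the trigger itself stays cheap; the cost of any write-back is spread over later evictions.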

[0046] A key issue with delayed CMO mechanisms is ensuring data integrity under certain race conditions. Controller 10 implements a protection mechanism, or "integrity check," to handle specific hazard conditions that may occur during operation. Such a condition arises through a sequence of events. For example, first, an upstream user (such as an NPU) determines that the dataset associated with a specific buffer ID (e.g., BUF1) is no longer needed. Then, the user issues a trigger command to controller 10, such as a "buffer discard" command. In response, the cache management system 100 begins executing a delayed CMO process, for example, by initiating a background traversal to find and operate on all cache lines marked BUF1. This process takes a non-zero amount of time to complete.

[0047] Next, before the delayed CMO process of BUF1 is complete, the NPU reuses the exact same buffer ID, BUF1, for a new task and begins writing or allocating new data to cache memory 20. This is possible because the user does not need to poll the completion status of the background cleanup. At this time, cache memory 20 may simultaneously contain old data marked for discard and new valid data. Specifically, both old and new data are associated with the same buffer ID (BUF1), and controller 10 cannot distinguish between them. This ambiguity poses a significant risk: new valid data belonging to the reused buffer ID may be incorrectly discarded by the still-ongoing delayed CMO process, resulting in data corruption. Therefore, a protection mechanism is needed to prevent such unintentional data loss. Details are as follows.

[0048] When controller 10 receives a new CMO hint for a buffer ID that is already undergoing a delayed discard process, one embodiment of the protection mechanism is triggered. To ensure data integrity, controller 10 first performs an integrity check on the state of the buffer ID associated with the incoming CMO hint. If controller 10 determines that the buffer ID is currently in a "buffer discard" triggered state, it applies protection measures to the newly received CMO hint to prevent the data associated with this new hint from being accidentally discarded. Controller 10 can be configured to apply one of the following solutions.

[0049] In the first solution, controller 10 is configured to simply ignore or discard newly received CMO hints. By not encoding this new hint into the metadata of any cache line, cache management system 100 ensures that the data corresponding to the new CMO hint is not affected by ongoing old dataset discarding operations. The first solution provides a simple method to prevent data corruption.

[0050] In the second solution, controller 10 is configured to convert new CMO hints to a "flush-only" state, regardless of their original type. For example, if the incoming CMO hint is a "discard request," controller 10 will downgrade it and force the metadata of the corresponding cache line into a safe, non-discardable state (e.g., state "0b10" in the 2-bit embodiment). This second solution is advantageous because it still preserves the user's intent to perform a cache maintenance operation while completely eliminating the risk of erroneously discarding data.
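The two protection options for an incoming CMO hint can be sketched as a single function. The policy names "ignore" and "force_flush" are hypothetical labels for the first and second solutions, and `discard_active` stands for the buffer ID currently having an active "buffer discard" trigger.

```python
def protect_hint(hint, discard_active, policy):
    """Return the hint to record for a new command, or None if it is dropped."""
    if hint not in ("flush", "discard"):
        raise ValueError(f"unknown CMO hint: {hint}")
    if not discard_active:
        return hint       # no ongoing discard for this buffer ID: record as received
    if policy == "ignore":
        return None       # first solution: do not encode the hint at all
    if policy == "force_flush":
        return "flush"    # second solution: always downgrade to flush-only
    raise ValueError(f"unknown policy: {policy}")
```

Either policy keeps new data from entering a discard-allowed state while the old dataset is still being cleaned up; the second one additionally preserves the request as a pending flush.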

[0051] When controller 10 receives a new memory allocation request for a buffer ID that is already undergoing a delayed discard process, an embodiment of another protection mechanism is triggered. The allocation request may be, for example, part of a write operation to an address that is not currently in the cache. Upon receiving the allocation request, controller 10 performs an integrity check on the state of the relevant buffer ID. If controller 10 determines that the buffer ID currently has a "buffer discard" trigger active, it applies protection measures to the newly allocated data to prevent it from being erroneously discarded by the ongoing process. Controller 10 can be configured to apply one of the following solutions.

[0052] In the first solution, controller 10 is configured to handle allocation requests by forcing new data to be non-cacheable (or "non-allocation"). This means the data will not be written to cache memory 20. Instead, it may be written directly to downstream memory. This first solution effectively isolates the new data from the cleanup operations being performed by cache management system 100.

[0053] In the second solution, new data is allocated to a new cache line within cache memory 20, but controller 10 immediately forces the metadata of that new line into a non-discardable state. For example, controller 10 may set its non-discardable bit to "1" (in a 1-bit embodiment) or its CMO bit to state "0b10" (in a 2-bit embodiment). The second solution is advantageous because new data can still benefit from cache residency while being fully protected against accidental discarding.

[0054] Figures 6 to 10 are flowcharts illustrating the integrity check and protection mechanisms of the cache management system 100 of Figure 1. Figure 6 is a flowchart illustrating a first embodiment of the protection mechanism, triggered when controller 10 receives a new command with a CMO hint while a delayed discard operation for the associated buffer ID is already in progress. This mechanism ensures data integrity by preventing new, valid data from being unintentionally discarded.

[0055] The process begins in step S601, where controller 10 receives one or more new commands, each associated with a specific buffer ID and containing a corresponding CMO hint. Subsequently, in step S602, controller 10 evaluates whether the received CMO hint is valid. If the hint is determined to be invalid (from the "No" branch of S602), the hint is ignored, and the process ends in step S605.

[0056] If the hint is valid (from the "Yes" branch of S602), the process continues to the core integrity check in step S603. In step S603, controller 10 checks the status of the buffer ID associated with the incoming command to determine whether a "buffer discard" operation is currently active for that buffer ID (i.e., whether the "execute discard" flag is set to "1"). If no discard operation is active (from the "No" branch of S603), there is no immediate risk of data corruption for the new command, and the check ends in step S605.

[0057] However, if the buffer ID is currently undergoing "buffer discard" (from the "Yes" branch of S603), the protection condition is met, and the controller 10 takes protective measures in step S604. In step S604, to prevent data associated with the incoming command from being unintentionally discarded, the controller 10 discards or ignores the newly received CMO hint. The protection check process is then completed in step S605.

[0058] Figure 7 is a flowchart illustrating the second embodiment of the protection mechanism, which applies a different protection measure than the embodiment shown in Figure 6. The method is also triggered when the controller 10 receives a new command during an active delayed discard operation, but instead of discarding the hint, the controller 10 converts the hint to a safer operating state.

[0059] The process begins in step S701, where controller 10 receives one or more new commands, each associated with a specific buffer ID and containing a corresponding CMO hint. In step S702, controller 10 evaluates the type of the incoming CMO hint to determine whether it constitutes a "discard request". If the CMO hint is not a "discard request" (from the "No" branch of S702), this particular protection mechanism does not apply, and the check ends in step S705.

[0060] If the CMO hint is a "discard request" (from the "Yes" branch of S702), the process continues to the integrity check in step S703. In step S703, the controller 10 determines whether a "buffer discard" operation is currently in progress for the buffer ID associated with the new command (i.e., whether the "execute discard" flag is set to "1"). If no discard operation is active for the buffer ID (from the "No" branch of S703), the new "discard request" is considered safe, and the check ends in step S705.

[0061] However, if the protection condition is met, meaning a new "discard request" has been received for a buffer ID that is being discarded (from the "Yes" branch of S703), controller 10 performs protection measures in step S704. In step S704, controller 10 overrides or downgrades the CMO hint, changing it from a "discard request" to a "flush request." This operation ensures that the data associated with the new command will be safely written back, thus preventing data loss, while still respecting the user's general intent to perform a cache maintenance operation. After the hint is converted, the protection check process is completed in step S705.
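The two hint-protection flows of Figures 6 and 7 can be summarized in a short sketch. The Python below is illustrative only: the names `HintPolicy` and `protect_hint`, and the string encodings of the hints, are assumptions made for this example and do not appear in the embodiments. It also assumes that a hint which passes the check is forwarded unchanged for normal processing.

```python
from enum import Enum
from typing import Dict, Optional

VALID_HINTS = {"flush", "discard"}  # assumed hint encodings, for illustration


class HintPolicy(Enum):
    IGNORE = 1     # Figure 6: drop the new hint (step S604)
    DOWNGRADE = 2  # Figure 7: convert a discard request to a flush (step S704)


def protect_hint(hint: str, buffer_id: str,
                 discard_active: Dict[str, bool],
                 policy: HintPolicy) -> Optional[str]:
    """Return the hint to encode in line metadata, or None if it is dropped."""
    active = discard_active.get(buffer_id, False)
    if policy is HintPolicy.IGNORE:
        if hint not in VALID_HINTS:        # S602: invalid hints are ignored
            return None
        return None if active else hint    # S603 check, S604 protection
    # DOWNGRADE policy
    if hint != "discard":                  # S702: only discard requests matter
        return hint
    return "flush" if active else hint     # S703 check, S704 protection
```

With `discard_active = {"B0": True}`, a new discard hint for "B0" is dropped under `IGNORE` and becomes a flush under `DOWNGRADE`, while hints for any other buffer ID pass through untouched.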

[0062] Figure 8 is a flowchart illustrating a third embodiment of the protection mechanism. The process begins in step S801, where controller 10 receives one or more new commands, each associated with a specific buffer ID and containing a corresponding CMO hint. In step S802, controller 10 evaluates whether the incoming command requires the allocation of a new cache line. If the command does not require allocation, for example in a cache hit case, then no protection mechanism is needed, and the process ends in step S805.

[0063] However, if the command does require the allocation of a cache line (from the "Yes" branch of S802), the process continues to the integrity check in step S803. In step S803, the controller 10 performs a core integrity check by determining whether a "buffer discard" operation is currently active for the associated buffer ID (i.e., whether the "Execute Discard" flag is set to "1"). If no discard operation is in progress (from the "No" branch of S803), the allocation request is considered safe, the process ends in step S805, and the allocation proceeds normally.

[0064] Conversely, if the protection condition is met, i.e., an allocation is requested for a buffer ID that is being actively discarded (from the "Yes" branch of S803), controller 10 performs protection measures in step S804. In step S804, controller 10 overrides the original request and forces the operation to "cache non-allocation". This operation prevents the new data from being written to the cache memory, thereby protecting it from the ongoing discarding process. For example, the data may be written directly to a downstream memory component. After this override is completed, the protection check process is completed in step S805.

[0065] Figure 9 is a flowchart illustrating the fourth embodiment of the protection mechanism, presenting a variant of the allocation control mechanism shown in Figure 8. This method is also triggered by a command that requires allocating a cache line for a buffer ID that is currently undergoing a delayed discard operation.

[0066] The process begins in step S901, where controller 10 receives one or more new commands, each associated with a specific buffer ID and containing a corresponding CMO hint. In step S902, controller 10 determines whether the command requires the allocation of a new cache line. If no allocation is required, the process ends in step S905.

[0067] If allocation is required (from the "Yes" branch of S902), controller 10 continues with an integrity check in step S903 to determine whether a "buffer discard" operation for the associated buffer ID is active. If the operation is inactive (from the "No" branch of S903), the allocation request is considered safe, and the check ends in step S905.

[0068] However, if the protection condition is met (from the "Yes" branch of S903), controller 10 performs protection measures in step S904. In step S904, controller 10 modifies the request attribute by overriding the cache allocation request as a "non-cacheable operation". Similar to the "non-allocation" operation of Figure 8, this ensures that new data is not stored in the cache memory, thus preventing it from being affected by the ongoing discarding process. After this modification is completed, the protection check process is finished in step S905.

[0069] The "cache non-allocation" mentioned in step S804 and the "non-cacheable operation" mentioned in step S904 both describe protective measures designed to prevent new data from being written to the cache memory under the protection condition, although they may represent different implementation-level details. Specifically, the "cache non-allocation" strategy of Figure 8 refers to a decision made by the cache controller in response to a specific command that misses in the cache. For example, under a "write non-allocation" strategy, a write command that misses is sent directly to the downstream memory component without allocating a new cache line for its data. In contrast, the "non-cacheable operation" of Figure 9 means that the transaction itself carries an attribute indicating that its data must bypass the cache entirely, whether it hits or misses. Therefore, the difference may lie in the scope of the mechanism: "non-allocation" is usually a cache policy applied to misses, while "non-cacheable" can be an inherent attribute of the operation, indicating that it should not be served by the cache.
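The difference in scope between the two measures can be made concrete with a toy write-routing function. This is a minimal sketch under stated assumptions: the function name `route_write`, the three mode strings, and the modeling of the cache as a set of resident addresses are all hypothetical, and the sketch deliberately ignores how a resident copy is handled when a non-cacheable write bypasses it.

```python
from typing import Set


def route_write(addr: int, cached: Set[int], mode: str) -> str:
    """Return "cache" or "downstream" for a write to `addr`.

    mode is one of "allocate", "no_allocate", "non_cacheable" (assumed names).
    """
    if mode not in ("allocate", "no_allocate", "non_cacheable"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "non_cacheable":
        return "downstream"      # bypasses the cache on a hit or a miss
    if addr in cached:
        return "cache"           # a hit is served by the cache as usual
    if mode == "no_allocate":
        return "downstream"      # a miss is forwarded, no line is allocated
    cached.add(addr)             # normal write-allocate behavior
    return "cache"
```

In this model a "no_allocate" write that hits is still served by the cache, whereas a "non_cacheable" write never is, which mirrors the distinction drawn above.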

[0070] Figure 10 is a flowchart illustrating the fifth embodiment of the protection mechanism, providing an alternative to the allocation control methods of Figure 8 and Figure 9. In this method, new data is allowed to be allocated to the cache, but its metadata is immediately modified to ensure its safety from any ongoing discard operations.

[0071] The process begins in step S1001, where controller 10 receives a new command associated with the buffer ID. In step S1002, controller 10 determines whether the command requires the allocation of a new cache line; if no allocation is required, the process ends in step S1005. If allocation is required (from the "Yes" branch of S1002), controller 10 continues with an integrity check in step S1003. In step S1003, controller 10 checks whether the "buffer discard" operation for the buffer ID is currently active. If the operation is inactive (from the "No" branch of S1003), the allocation is considered safe, and the check ends in step S1005.

[0072] However, if the protection condition is met (from the "Yes" branch of S1003), i.e., an allocation is requested for a buffer ID that is currently undergoing a discard process, controller 10 performs protection measures in step S1004. In step S1004, controller 10 allows the new data to be allocated to a new cache line, but immediately overrides the metadata of the newly allocated line, specifically setting its "CMO bits" to a non-discardable state. For example, the state can be set to "0b10" ("deferred, flush only"). This operation ensures that the new data can benefit from cache residency while being protected against erroneous removal by the ongoing discard operation. The protection check process is then completed in step S1005.
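The Figure 10 flow amounts to a small metadata override at allocation time. The sketch below is a hedged illustration for the 2-bit embodiment; `CacheLine`, `allocate_line`, and the `discard_active` dictionary are names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict

FLUSH_ONLY = 0b10     # "deferred, flush only" (non-discardable)
ALLOW_DISCARD = 0b11  # "deferred, allow discard"


@dataclass
class CacheLine:
    buffer_id: str
    cmo_bits: int


def allocate_line(buffer_id: str, requested_bits: int,
                  discard_active: Dict[str, bool]) -> CacheLine:
    """Allocate a line; force a safe state if its buffer is being discarded."""
    line = CacheLine(buffer_id, requested_bits)   # allocation still proceeds
    if discard_active.get(buffer_id, False):      # S1003: discard in progress?
        line.cmo_bits = FLUSH_ONLY                # S1004: override metadata
    return line
```

Unlike the approaches of Figures 8 and 9, the data stays in the cache here, and only its CMO bits are changed.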

[0073] The above embodiments demonstrate how to store details of delayed CMO hints and their associated buffer IDs in the metadata of a cache line, for example, using a 2-bit CMO bit field or a single non-discardable bit. Furthermore, protection mechanisms have been described to ensure data integrity during the delayed CMO process. The following embodiments describe a mechanism for performing delayed CMO stored in cache memory 20. A first implementation of performing delayed CMO is through an active background traversal process implemented by a hardware garbage collection (GC) engine, for example, within controller 10.

[0074] The GC engine is configured to periodically and automatically scan cache memory 20 to identify and process cache lines with active delayed CMOs. The GC engine scanning mechanism provides a "fire-and-forget" function that is transparent to upstream users, eliminating the need for users to poll for completion of cleanup tasks. The detailed operation of the GC engine is described below.

[0075] To implement the GC engine, the cache management system 100 utilizes specific data structures to store information about various deferred operations. The GC engine provides a general framework for periodically scanning the cache and performing triggered, dataset-based operations. These operations can include not only cache maintenance operations (CMO) but also other tasks such as data compression or priority management. Tables T8 and T9 illustrate exemplary data structures for this mechanism.

[0076] Table T8 shows an example structure of metadata that can be stored for each cache line within cache memory 20. This structure is designed to support a variety of deferred operations. An opcode field stores a specific deferred operation hint intended for the cache line. For instance, the opcode could be set to "Do Compress" to mark the line for a future compression operation. Users can trigger this operation once the data is no longer frequently accessed, reducing the bandwidth required to write back to DRAM. In other cases, this field can store deferred CMO hints, such as discard hints. A priority offset field can be used to implement another type of deferred operation, such as allowing a user to trigger priority adjustments for all cache lines with a specific buffer ID.

[0077] Table T9 shows a sample structure of the status table monitored by the GC engine. Table T9 is configured to store all pending high-level, dataset-based requests issued by upstream users. The columns represent different categories of pending operations that can be triggered for each buffer ID.

[0078] When a user wants to trigger a delayed operation for an entire dataset, a corresponding flag is set in this table. For example, to trigger a priority change for the "CPU" buffer ID, its pending priority operation entry is set to "Set -1". Similarly, to trigger a "buffer discard," the pending opcode entry for the target buffer ID is set to indicate that a discard operation is pending. The GC engine detects these pending commands in the status table and then initiates a cache traversal process to execute the corresponding operation based on the opcode stored in each cache line.
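As a rough illustration of the two data structures, the following sketch models per-line metadata and a per-buffer status table. The contents of Tables T8 and T9 are not reproduced in this text, so every field name here (`opcode`, `priority_offset`, `opcode_pending`, `priority_delta`) and every helper function is an assumption chosen to match the prose, not the actual table layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LineMetadata:
    """Per-line metadata, loosely modeled on the description of Table T8."""
    buffer_id: str
    opcode: Optional[str] = None   # deferred hint, e.g. "discard", "compress"
    priority_offset: int = 0


@dataclass
class PendingOps:
    """One status-table row per buffer ID, loosely modeled on Table T9."""
    opcode_pending: bool = False          # e.g. a "buffer discard" trigger
    priority_delta: Optional[int] = None  # e.g. -1 for a priority change


StatusTable = Dict[str, PendingOps]


def trigger_opcode(table: StatusTable, buffer_id: str) -> None:
    """Mark the stored opcode of every line in this dataset for execution."""
    table.setdefault(buffer_id, PendingOps()).opcode_pending = True


def trigger_priority(table: StatusTable, buffer_id: str, delta: int) -> None:
    """Request a priority adjustment for every line in this dataset."""
    table.setdefault(buffer_id, PendingOps()).priority_delta = delta


def has_pending(table: StatusTable) -> bool:
    """What the GC engine would check before starting a traversal."""
    return any(p.opcode_pending or p.priority_delta is not None
               for p in table.values())
```

For instance, `trigger_priority(table, "CPU", -1)` corresponds to the priority-change example above, after which `has_pending(table)` returns `True`.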

[0079] Figure 11 is a flowchart illustrating how the hardware garbage collection (GC) engine of the cache management system 100 executes pending delayed CMOs. The GC engine provides an automated and user-transparent cache cleanup mechanism.

[0080] The process begins at step S1101, where the GC engine periodically checks the status table, for example, at the time of a regular “heartbeat” signal, to determine if any pending commands exist. At decision step S1102, if no pending commands are found (“No” branch), the GC engine enters an idle state at step S1109 and waits for the next heartbeat to repeat the check.

[0081] If one or more pending commands are detected in the status table (from the "Yes" branch of S1102), the process continues to step S1103. Step S1103 is crucial for the "fire-and-forget" function. The GC engine takes a snapshot of the entire status table, saving the state of all trigger flags (e.g., Do Flush, Do Discard) for all buffer IDs at a specific moment. The snapshot can be viewed as a record of the operations that will be fully completed in the current GC cycle. After taking the snapshot, the GC engine begins a full traversal of the cache memory 20 in step S1104, systematically scanning all cache sets and ways.

[0082] During the traversal, for each cache line encountered, the process moves to step S1105. The engine checks whether the buffer ID of the current cache line has a corresponding pending command in the live status table (not the snapshot). Consulting the live table in step S1105 allows new commands arriving midway through the traversal to be processed immediately. If a pending command exists (from the "Yes" branch of S1105), the process continues to step S1106, where controller 10 performs the appropriate operation on the metadata of the cache line. The specific operation is based on the trigger command in the status table and the delayed CMO hint (or opcode) stored in the cache line itself. The process then continues, looping through all cache lines until the traversal is complete, as determined in step S1107.

[0083] Once the entire traversal is complete (from the "Yes" branch of S1107), the process continues to step S1108. In this step, controller 10 updates the status table based on the snapshot taken in step S1103. Specifically, it clears only those trigger flags that were set in the snapshot. Any new trigger commands issued during the traversal are not in the snapshot, so their flags remain set in the status table. This ensures that new commands, which may have been only partially processed during the current traversal, are fully processed in the next GC cycle. The snapshot-based update mechanism makes the entire process transparent to the user. Finally, the current GC cycle ends, and the process moves to step S1109 to wait for the next heartbeat.
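The cycle of Figure 11 can be sketched in a few lines. The following is an illustrative software model, not the hardware implementation: the function `gc_cycle`, the dictionary-based status table, the `process_line` callback, and the `mid_cycle` hook (used only to simulate user commands arriving during a traversal) are all assumptions introduced for this example.

```python
import copy
from typing import Callable, Dict, List, Optional

Flags = Dict[str, bool]    # e.g. {"flush": False, "discard": True}
Status = Dict[str, Flags]  # buffer ID -> trigger flags


def gc_cycle(status: Status, lines: List[dict],
             process_line: Callable[[dict, Flags], None],
             mid_cycle: Optional[Callable[[int, Status], None]] = None) -> bool:
    """Run one GC cycle; return False if nothing was pending (engine idles)."""
    if not any(any(f.values()) for f in status.values()):  # S1102
        return False                                        # S1109: idle
    snapshot = copy.deepcopy(status)                        # S1103
    for i, line in enumerate(lines):                        # S1104 / S1107
        if mid_cycle is not None:
            mid_cycle(i, status)        # simulate a trigger arriving mid-cycle
        flags = status.get(line["buffer_id"], {})           # S1105: live table
        if any(flags.values()):
            process_line(line, flags)                       # S1106
    for buf, snap in snapshot.items():                      # S1108
        for name, was_set in snap.items():
            if was_set:
                status[buf][name] = False  # clear only snapshotted triggers
    return True
```

Because step S1105 reads the live table while step S1108 clears only what the snapshot recorded, a trigger that arrives mid-traversal may act on the lines still to be scanned but stays set for a complete pass in the next cycle.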

[0084] To further explain the snapshot mechanism and user-transparent update process shown in steps S1103 and S1108 of Figure 11, a specific operational example is described below with reference to Tables T10 and T11.

[0085] Table T10 represents the status table at the start of the GC cycle, when the snapshot is taken in step S1103. In this example, the "buffer discard" trigger command is active for buffer IDs "B0" and "B1". At this time, there is no active trigger command for buffer ID "B2". The snapshot records the operations for "B0" and "B1" as the tasks to be completed in this cycle. As the GC engine continues to traverse the cache memory 20 (steps S1104-S1107), the following new events occur in this example.

[0086] Event 1: A new "buffer discard" trigger command was issued for buffer ID "B2".

[0087] Event 2: A corner-case event occurred for buffer ID "B1". The upstream user reused buffer ID "B1" for a new task without waiting for the original discard operation to complete, and then issued a new "buffer flush" trigger command for "B1".

[0088] Table T11 shows the status table after the update logic of step S1108 is applied at the end of the traversal. Controller 10 compares the current state with the snapshot taken in Table T10 to determine which flags to clear. For buffer ID "B0", the discard flag is "1" in the snapshot, and no new command has been received. Therefore, the operation is considered complete, and its discard flag is cleared from "1" to "0". For buffer ID "B2", a new discard command arrives during the traversal. Since this command is not in the original snapshot, its discard flag is not cleared and remains "1". This ensures that the command for "B2" will be fully processed in the next GC cycle. For buffer ID "B1", the original discard command is in the snapshot, so its discard flag is cleared from "1" to "0". However, a new flush command arrives midway through the cycle and is not in the snapshot. Therefore, its flush flag remains set to "1".

[0089] This example demonstrates how the snapshot-based update mechanism correctly handles new commands arriving mid-cycle. The mechanism ensures that only the initially requested operations are marked as complete, while any new requests remain pending for the next cycle. It therefore guarantees the integrity of all operations and makes the entire process transparent to the user, who can issue new commands at any time without polling for completion of previous commands.
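The B0/B1/B2 example can be replayed with plain integer bit operations. The bit layout below (discard flags in bits 0 to 2, flush flags in bits 3 to 5) is an assumption made only so the example is concrete; the actual encoding of the status table is not specified here.

```python
BUFFERS = ["B0", "B1", "B2"]
# Assumed layout: one discard bit and one flush bit per buffer ID.
DISCARD = {b: 1 << i for i, b in enumerate(BUFFERS)}                 # bits 0..2
FLUSH = {b: 1 << (i + len(BUFFERS)) for i, b in enumerate(BUFFERS)}  # bits 3..5


def is_set(word: int, bit: int) -> bool:
    return bool(word & bit)


status = DISCARD["B0"] | DISCARD["B1"]  # Table T10: discard pending for B0, B1
snapshot = status                       # step S1103: snapshot at cycle start

status |= DISCARD["B2"]                 # Event 1: new discard trigger for B2
status |= FLUSH["B1"]                   # Event 2: new flush trigger for B1

status &= ~snapshot                     # step S1108: clear snapshotted flags

print(f"{status:06b}")                  # remaining flags after the update
```

After the update, only the B2 discard flag and the B1 flush flag remain set, which matches the outcome described for Table T11.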

[0090] As illustrated in the implementation example of the 2-bit and 1-bit CMO hints in the cache line embodiment, the GC engine, as part of controller 10, is configured to periodically monitor the buffer ID attribute and the status table (table T7 for example). When the GC engine detects an active trigger command, such as "Perform flush" or "Perform discard" flags being set to "1", it initiates a traversal process to scan the cache memory 20. During the scan, the GC engine reads the metadata for each cache line, structured as shown in the previously mentioned table (e.g., table T6), and applies the corresponding processing logic to perform pending deferred operations.

[0091] It should be understood that the delayed CMO of the embodiments allows for a more flexible and efficient execution approach. One embodiment of this approach uses a single priority eviction bit in the metadata of each cache line to mark data that should be evicted first. Instead of immediately invalidating a cache line when its pending delayed CMO is processed, the controller 10 can simply set the priority eviction bit of that line to "1". This operation can be seen as a hint to the cache replacement policy. Subsequently, when a new allocation request requires space in the cache, the replacement policy preferentially selects cache lines carrying the priority eviction hint as victims. The priority eviction mechanism thus frees up space for other data without causing an immediate performance stall, because the physical eviction and any associated write-back traffic are spread out over time. The priority eviction mechanism is an integral part of the GC engine execution process. When the GC engine traverses the cache memory to perform delayed CMOs, it applies the following logic. For any pending flush-type or discard-type CMO, the controller 10 sets the priority eviction bit of the corresponding cache line. If the cache line is marked as non-discardable, the controller 10 performs a flush-type operation. During a flush, it also ensures that the line is marked as non-discardable (e.g., by setting a flag or updating its state) to prevent it from being discarded in the future. When a discard-type CMO is performed on a discardable line (e.g., its non-discardable bit is "0"), controller 10 sets the dirty bit of that line to "0" to prevent a write-back during eviction.
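How the priority eviction bit might steer a replacement policy can be shown with a small victim-selection sketch. The embodiments do not name a particular replacement policy, so the LRU fallback used here, along with the names `Way`, `choose_victim`, and `last_used`, is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Way:
    tag: int
    last_used: int               # smaller value = less recently used (assumed)
    dirty: bool = False
    priority_evict: bool = False


def choose_victim(ways: List[Way]) -> Way:
    """Prefer lines carrying the priority eviction hint, else plain LRU."""
    hinted = [w for w in ways if w.priority_evict]
    pool = hinted or ways        # fall back to all ways if none are hinted
    return min(pool, key=lambda w: w.last_used)


def needs_write_back(victim: Way) -> bool:
    """A discarded line had its dirty bit cleared, so it is dropped silently."""
    return victim.dirty
```

A line whose dirty bit was cleared by a discard-type CMO is therefore evicted without write-back traffic, while a flushed line is written back only when it is actually chosen as a victim.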

[0092] The advantage of the GC engine implementation is that its operation is designed to be "transparent" to upstream software (SW) or the user. This is also known as a "fire-and-forget" mechanism, which simplifies the software's responsibility for cache management. From the user's perspective, "transparency" means that once a trigger command is issued to execute a delayed CMO for a specific buffer ID, the user can assume the task has been handed over and that the buffer ID is no longer tied to the previous task. The user does not need to poll a status register to check for completion, and neither the controller 10 nor the GC engine sends any feedback signal (e.g., an interrupt) after the operation is complete. The user can immediately proceed to other tasks. This user-transparent behavior is enabled by the GC engine's snapshot-based update process. As described in the flowchart of Figure 11, this mechanism performs the following steps. First, the GC engine takes a snapshot of the status table to record which buffer IDs have active trigger commands at the start of the traversal cycle. After the cache traversal is complete, the GC engine automatically resets, in the status table, the trigger flags of the buffer IDs recorded in the snapshot. This reset logic can be implemented using a bitwise operation such as `Status-table[n:0] &= ~Snapshot[n:0]`, which clears the bits that were set in the snapshot. By automatically managing the lifecycle of trigger commands, this mechanism relieves users of the burden of tracking the cleanup process, thereby simplifying software design and improving overall system efficiency.

[0093] Figure 12 is a detailed data flow diagram illustrating a working example of the GC engine of the cache management system 100 performing a delayed "buffer discard" operation. The diagram shows the complete data and control flow, from the initial user command that records the deferred hint, to the high-level trigger that initiates the background cleanup process, and finally to the automatic reset that makes the mechanism transparent to the user.

[0094] This process involves an upstream user, such as a processing unit (e.g., a GPU), which interacts with the system in two distinct phases. First, during job execution, the processing unit issues normal read / write commands accompanied by a CMO hint and a buffer ID. The system can record the anticipated future operation (e.g., flush or discard) in its metadata when allocating the corresponding cache line. Second, once the user determines that the dataset is no longer needed (e.g., after job completion), it issues a high-level trigger command, such as a "buffer discard" for a specific buffer ID. Instead of operating on a specific address, the trigger command modifies a flag in the status table (e.g., setting the "perform discard" flag to "1") indicating the intent to perform the previously recorded operation on the entire dataset.

[0095] Specifically, the GC initiator periodically monitors the status table (step 1, "Check Status Table") to detect whether any "Perform Flush" or "Perform Discard" flag has become active. Upon detecting an active request, a GC cycle begins. As its first action (step 2, "Perform Snapshot"), controller 10 captures a complete snapshot of the current state of the status table. The snapshot is important because it decouples the long-running traversal process from the user's ability to issue new commands, thus preventing a race condition in which a new, unrelated command might be incorrectly cleared when the current cycle completes.

[0096] After the snapshot is taken, the GC initiator instructs the GC operation engine to begin step 3, "Execute Traversal". The GC operation engine systematically scans the entire cache memory. For each cache line, it reads the line's metadata ("Cache Line Per Set Information"), including its buffer ID and its "non-discardable" status bit. It uses the buffer ID to look up the corresponding trigger command flags ("Execute Flush" and "Execute Discard") from the live status table. This set of inputs is fed into the "GC Operation Table", which contains the processing logic. A GC operation table for a 1-bit CMO hint can be represented as Table GCOP1. A GC operation table for a 2-bit CMO hint can be represented as Table GCOP2.

[0097] Table GCOP1 illustrates in detail the processing logic of the GC operation engine under the 1-bit CMO hint implementation. Table GCOP1 specifies how controller 10 modifies the tag metadata of cache lines based on the combination of the trigger commands in the status table (specifically the "Perform flush" and "Perform discard" flags) and the inherent state of the cache line (specifically its "non-discardable" bit). The logic defined in the table addresses four main operational cases.

[0098] The first row of table GCOP1 corresponds to the idle state. This occurs when the "Perform flush" and "Perform discard" flags of the associated buffer ID are both inactive (i.e., set to '0'). In this state, there are no operations pending. Therefore, controller 10 does not make any changes to the metadata of the cache line. The "Dirty", "Priority eviction", and "Non-discardable" bits all remain in their current state.

[0099] The second row of Table GCOP1 defines the behavior of the buffer flush command. This is triggered when the "Perform Flush" flag is active (set to '1'). This command performs a safe, unconditional flush. To achieve this, controller 10 preserves the "Dirty" bit to ensure that if the data is dirty, it will be written back to downstream memory. Simultaneously, it sets the "Priority Eviction" bit to '1' to signal the replacement policy to prioritize the line for eviction. Crucially, it also forces the "Non-Discardable" bit to '1', permanently marking the line as non-discardable to ensure its integrity against any subsequent, more aggressive discard commands.

[0100] The third row of Table GCOP1 handles the primary buffer discard scenario. This applies when the "Buffer Discard" command is active ("Execute Discard" is '1') and the cache line is discardable (its "Non-Discardable" bit is '0'). Here, controller 10 performs the core discard operation by setting the "Dirty" bit to '0', which prevents the data of that line from being written back during eviction. The "Priority Eviction" bit is also set to '1' to ensure that the cache line quickly becomes available for new data. The "Non-Discardable" state remains unchanged.

[0101] The fourth row of Table GCOP1 describes the protective behavior of buffer discard commands for non-discardable lines. This case is triggered when the "Buffer Discard" command is active ("Execute Discard" is '1'), but the "Non-Discardable" bit of the cache line is set to '1', indicating that it has previously been marked as requiring a safe write-back. In this case, controller 10 honors the protective "Non-Discardable" state. It keeps the "Dirty" bit unchanged, thus preventing the data from being discarded and effectively downgrading the operation to a flush. It still sets the "Priority Eviction" bit to '1' to mark the line for safe eviction. The "Non-Discardable" state remains at '1'.
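The four rows described for Table GCOP1 map directly onto a small pure function. This is a sketch of the described behavior only: `LineBits` and `gcop1` are hypothetical names, and the handling of the case where both trigger flags are set at once (flush taking precedence) is an assumption, since the four cases above do not cover it.

```python
from typing import NamedTuple


class LineBits(NamedTuple):
    dirty: bool
    priority_evict: bool
    non_discardable: bool


def gcop1(do_flush: bool, do_discard: bool, line: LineBits) -> LineBits:
    """Return the updated tag metadata for one cache line (1-bit hint)."""
    if do_flush:
        # Row 2: keep dirty, mark for eviction, force non-discardable.
        # Assumption: flush wins if both triggers are active.
        return LineBits(line.dirty, True, True)
    if do_discard:
        if line.non_discardable:
            # Row 4: protected line, the discard degrades to a flush.
            return LineBits(line.dirty, True, True)
        # Row 3: discardable line, clear dirty to suppress write-back.
        return LineBits(False, True, False)
    # Row 1: idle, nothing changes.
    return line
```

For example, `gcop1(False, True, LineBits(True, False, False))` yields a clean line marked for priority eviction, while the same call on a non-discardable line leaves its dirty bit intact.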

[0102] Table GCOP2 details the processing logic of the GC operation engine under the 2-bit CMO hint implementation. In the first case, when both the "Perform Flush" and "Perform Discard" flags are inactive ('0'), there are no pending operations, and controller 10 does not modify the "dirty," "priority eviction," or "non-discardable" metadata of cache lines. When the "Buffer Flush" command is active ("Perform Flush" is '1'), the system performs a safe flush on any line in a pending deferred state (i.e., input state "0b1x"). In this case, controller 10 preserves the "dirty" bit for potential write-back, sets the "priority eviction" bit to '1', and forces the output state to "0b10" ("deferred, flush only"), thereby downgrading any cache line previously marked as discardable to the safer flush-only state. In the case of the "Buffer Discard" command ("Perform Discard" is '1'), the operation depends on the specific 2-bit state. If a cache line is in the "deferred, allow discard" state ("0b11"), controller 10 performs a discard by setting the "dirty" bit to '0' and the "priority eviction" bit to '1'. However, if a cache line is already in the more protective "deferred, flush only" state ("0b10"), controller 10 honors this state by keeping the "dirty" bit unchanged and only setting the "priority eviction" bit to '1', treating the discard command as a flush for that specific line.
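The 2-bit variant can be sketched the same way. The function name `gcop2` and its tuple return value are hypothetical. Two behaviors are assumptions not stated above: lines whose state is not "0b1x" are left untouched, and flush takes precedence if both triggers are active.

```python
from typing import Tuple

FLUSH_ONLY = 0b10     # "deferred, flush only"
ALLOW_DISCARD = 0b11  # "deferred, allow discard"


def gcop2(do_flush: bool, do_discard: bool, dirty: bool,
          priority_evict: bool, cmo_bits: int) -> Tuple[bool, bool, int]:
    """Return (dirty, priority_evict, cmo_bits) after GC processing."""
    deferred = bool(cmo_bits & 0b10)          # input state "0b1x"
    if not deferred or not (do_flush or do_discard):
        return dirty, priority_evict, cmo_bits     # idle, or not deferred
    if do_flush:
        return dirty, True, FLUSH_ONLY        # safe flush, downgrade state
    if cmo_bits == ALLOW_DISCARD:
        return False, True, cmo_bits          # real discard: drop dirty data
    return dirty, True, cmo_bits              # flush-only line: act as flush
```

Note that only a line in state "0b11" ever loses its dirty bit; once a flush trigger has moved a line to "0b10", a later discard trigger can no longer drop its data.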

[0103] After completing the full traversal, controller 10 performs a final cleanup as step 4, "reset after traversal." It uses the previously captured snapshot to identify exactly which trigger flags were active at the start of this particular cycle. It then clears only those specific flags from the live status table, for example, via a bitwise operation (Status Table &= ~Snapshot). This precise, snapshot-based reset is key to the mechanism's "fire-and-forget" nature. It ensures that only fully completed requests are cleared, while any new requests arriving midway through the cycle remain pending. By using this mechanism, processing units can reuse buffer IDs at any time without polling for completion of the background cleanup, making the entire process robust and transparent.

[0104] The above embodiments illustrate the execution of delayed CMOs in which an active, hardware-based garbage collection (GC) engine periodically traverses the cache memory to process pending operations. While the GC engine provides a robust solution, a situation may arise in which the replacement policy selects a cache line with a pending delayed CMO as the eviction victim before the GC engine has had a chance to process that cache line. In this case, the pending delayed CMO hint may be lost when the cache line is evicted, so that the intended operation cannot be performed.

[0105] To address this situation and provide a more comprehensive solution, a second implementation of delayed CMO is disclosed. This mechanism can be implemented as an alternative to the GC engine, or as a supplementary "remedy" or "workaround." The second implementation is a passive "eviction-on-demand" mechanism. In contrast to the proactive scanning of the GC engine, the eviction-on-demand mechanism integrates delayed CMO checks directly into the standard eviction path of the cache. The detailed operation of this passive mechanism will be described later.

[0106] Figure 13 is a flowchart illustrating the process by which the cache management system 100 executes the eviction-on-demand (EDM) mechanism. The passive EDM mechanism is an alternative or complementary embodiment of the delayed CMO. Instead of actively scanning the cache, this mechanism integrates the check for pending CMOs into the standard eviction path of the cache memory. The passive EDM mechanism utilizes the exemplary data structures shown in Tables T12, T13, and T14.

[0107] Table T12 is an exemplary state table maintained by controller 10 for controlling the passive "execute on eviction" mechanism. Structurally and functionally, Table T12 is similar to the state table used by the active GC engine, as it contains "execute flush" and "execute discard" flags for each buffer ID to specify pending cache maintenance operations. However, unlike the GC engine's table, which is periodically polled to initiate a full cache traversal, Table T12 is queried only when a particular cache line has been selected as an eviction victim. Table T13 defines the states of a 2-bit "CMO bit" field stored in the metadata of each individual cache line to indicate its specific deferred CMO state. This field is read during the "execute on eviction" event to determine the appropriate action to be performed on the victim cache line. As shown in the table, state "0b00" indicates that deferred CMO is not allowed for the cache line. State "0b01" is a neutral state, indicating that while deferred CMO is allowed, no specific action is currently pending. State "0b10" indicates a pending deferred flush, which requires writing back dirty data before the cache line is invalidated. Finally, state "0b11" indicates a pending deferred discard, in which the cache line can be invalidated without being written back.

[0108] In the process of Figure 13, the execute-on-eviction mechanism begins at the operation marked as step S1301. The mechanism is triggered in step S1302 when a new data allocation request misses the cache, requiring the eviction of an existing cache line. In step S1303, the cache replacement policy selects a victim cache line for eviction.

[0109] In step S1304, controller 10 can check whether the victim cache line contains valid data. If the cache line is invalid, it can be directly overwritten without further action. If the cache line is valid (from the "Yes" branch of S1304), the process proceeds to the core decision in step S1305. Here, controller 10 can check whether an operation is allowed on the buffer ID of the victim line. It reads the buffer ID from the victim line (e.g., "GPU_0" from Set0 / Way1 in table T14) and looks up its status in the allowed operation code table (table T12).

[0110] If execution is permitted (from the "Yes" branch of S1305, e.g., "GPU_0" is set to "Execute Drop=1" in Table T12), the process proceeds to step S1306. Controller 10 reads the opcode from the metadata of the victim line (e.g., "11" for Set0 / Way1 in Table T14) and performs the corresponding operation according to the definitions in Table T13. For example, for opcode "11", controller 10 performs a drop by clearing the dirty bit of the line before it is overwritten. The process then proceeds to the replacement handler in step S1307.

[0111] If execution on the victim line's buffer ID is currently not permitted (from the "No" branch of S1305, such as "GPU_1" set to "Do Discard & Do Flush = 0" in Table T12), the process proceeds to step S1308. In this case, to prevent the loss of the delayed CMO hint, controller 10 may choose to propagate the hint (opcode) and its buffer ID as part of the write-back data to the downstream memory component. The propagated hint can then be stored in the metadata of the corresponding data entry in the downstream memory. The benefit of propagating the delayed CMO hint is that it preserves the opportunity for cache efficiency optimization at lower levels of the memory hierarchy. If propagation is not performed, once a cache line is evicted from the current cache, its delayed state (e.g., that the data is discardable) is lost. Any downstream memory component is then forced to handle the data conservatively, for example by performing a full write-back of dirty data. However, by propagating the hint from the second-level cache (L2) to the third-level cache (L3) in a multi-level caching system, the L3 cache inherits the knowledge that the data may be discardable. The L3 cache can then perform a discard operation at its own level when the appropriate triggering conditions are met, thereby extending the power and bandwidth savings of the delayed CMO mechanism throughout the memory hierarchy. The process then continues to the replacement handler in step S1307.
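The decision path of steps S1304 through S1308 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: the names `Victim` and `evict` and the dictionary layout standing in for Table T12 are hypothetical. The sketch further assumes that a drop is performed only when the "discard" flag is set and the line's opcode is "0b11", and that a hint is propagated only when a write-back actually occurs.

```python
from dataclasses import dataclass

PENDING_FLUSH = 0b10     # Table T13: pending deferred flush
PENDING_DISCARD = 0b11   # Table T13: pending deferred discard


@dataclass
class Victim:
    """Hypothetical view of the victim line's metadata (cf. Table T14)."""
    valid: bool
    dirty: bool
    buffer_id: str
    cmo_bits: int


def evict(victim: Victim, status_table: dict):
    """Return (write_back, propagated_hint) for the selected victim line."""
    if not victim.valid:                                   # S1304: invalid line
        return False, None                                 # overwrite directly
    flags = status_table.get(victim.buffer_id, {"flush": False, "discard": False})
    if flags["flush"] or flags["discard"]:                 # S1305: "Yes" branch
        # S1306. Assumption: drop only if the discard flag is set.
        if flags["discard"] and victim.cmo_bits == PENDING_DISCARD:
            victim.dirty = False                           # drop: no write-back
        return victim.dirty, None
    # S1308: no trigger yet, so carry the hint along with the write-back data.
    if victim.dirty:
        return True, (victim.buffer_id, victim.cmo_bits)
    return False, None                                     # clean line: nothing to send


# Flags corresponding to the examples given for Table T12.
t12 = {
    "GPU_0": {"flush": False, "discard": True},
    "GPU_1": {"flush": False, "discard": False},
}
dropped = evict(Victim(True, True, "GPU_0", PENDING_DISCARD), t12)
propagated = evict(Victim(True, True, "GPU_1", PENDING_DISCARD), t12)
```

With these inputs, the "GPU_0" line is dropped without a write-back, while the "GPU_1" line is written back together with its buffer ID and opcode so that a downstream cache could still act on the hint.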

[0112] The eviction-on-demand mechanism of Figure 13 offers several advantages. The primary benefit of this passive approach is its efficiency in terms of power and performance. Compared to an active GC engine that must periodically traverse the entire cache memory, the eviction-on-demand mechanism does not incur the overhead associated with a full scan, as it is activated only when an eviction occurs. This makes the mechanism simpler and more energy-efficient. Furthermore, the eviction-on-demand mechanism ensures that if a cache line with a pending delayed CMO is selected as a victim before the GC engine processes it, the pending operation is still processed during eviction, rather than being lost. Therefore, the eviction-on-demand mechanism makes the overall delayed CMO system more robust and comprehensive.

[0113] In summary, the various embodiments described above illustrate a cache management system and its operation method. This method offers a fundamental difference from conventional approaches by storing delayed CMO hints and dataset identifiers (buffer IDs) in the metadata of each cache line. Subsequently, upon receiving a single high-level trigger command associated with the dataset identifier, the controller efficiently performs the pending delayed CMO on all corresponding cache lines within the cache memory.

[0114] The dataset-based delayed CMO mechanism overcomes significant drawbacks of existing technologies. It eliminates the need for users to issue numerous address-based commands to traverse the entire data footprint, thus significantly improving efficiency and reducing power consumption. Furthermore, it frees users from the impractical burden of tracking specific memory addresses, allowing for simple and intuitive cleanup operations (e.g., "buffer discard") after job completion.

[0115] The advantages of these implementations are further enhanced by their flexibility and robustness. Cache management systems can support various operating modes (e.g., "flush only," "flush or discard") and trigger granularity (buffer-based or job-based) to meet diverse application needs. Prioritized eviction mechanisms ensure smooth execution without performance stagnation, while snapshot-based GC engines provide a transparent "once-and-forget" experience for the user. Passive "execute-on-eviction" mechanisms provide a complementary safety net, and race condition protection guarantees data integrity even when buffer IDs are reused. Therefore, the capabilities of these cache management systems offer a more advanced, efficient, and significantly more secure cache management solution than traditional methods.

[0116] Those skilled in the art will readily observe that numerous modifications and alterations can be made to the apparatus and method while retaining the teachings of the invention. Therefore, the above disclosure should be interpreted only within the scope and limits of the appended claims.

Claims

1. A method for managing a cache memory, comprising: associating a dataset with a cache maintenance operation (CMO) mode, wherein the CMO mode is a delayed mode; storing a delayed CMO hint in a metadata portion of a cache line within the cache memory; storing a dataset identifier in the metadata portion of the cache line to associate the cache line with the dataset; receiving a trigger command associated with the dataset identifier; and performing, based on the trigger command, a delayed CMO on one or more cache lines in the cache memory that are associated with the dataset identifier.

2. The method of claim 1, wherein the delayed mode is selected for the dataset from a plurality of available CMO modes, the plurality of available CMO modes further including an instant mode, and wherein an instant CMO is performed when the dataset is associated with the instant mode.

3. The method of claim 1, wherein the delayed CMO hint is represented by two bits stored in the cache line, and the two bits define a state machine of the cache line that includes at least: a flush-pending state, in which the delayed CMO is pending and requires a write-back flush; and a discard-pending state, in which the delayed CMO is pending and the cache line can be discarded without being written back.

4. The method of claim 1, wherein the delayed CMO hint is represented by a single bit stored in the cache line, and the single bit indicates a first state when set to a first value and a second state when set to a second value, the first state allowing a delayed discard operation on the cache line, and the second state disallowing the delayed discard operation or forcing a delayed flush operation on the cache line.

5. The method of claim 1, wherein the delayed CMO hint indicates whether the cache line can be discarded without a write-back or requires a write-back flush, and performing the delayed CMO comprises: selectively clearing a dirty bit of the cache line if the delayed CMO hint indicates that the cache line is discardable.

6. The method of claim 5, wherein after the dirty bit is selectively cleared, the cache line transitions to a clean state without being written back to a further memory component.

7. The method of claim 5, further comprising: initiating a write-back of the cache line in response to the delayed CMO hint indicating that the cache line requires a write-back flush.

8. The method of claim 1, wherein performing the delayed CMO comprises: setting a priority eviction indicator in the metadata portion of the one or more cache lines, wherein the priority eviction indicator signals a cache replacement policy to prioritize the one or more cache lines for subsequent eviction.

9. The method of claim 1, wherein performing the delayed CMO comprises: initiating, via a background process, a traversal of a plurality of cache lines within the cache memory; and identifying, during the traversal, the one or more cache lines based on the dataset identifier.

10. The method of claim 9, further comprising: storing the trigger command in a state table; and creating a snapshot of the state table before initiating the traversal.

11. The method of claim 10, further comprising: after the traversal is completed, clearing the state table based on the snapshot, such that the background process is transparent to a user who issued the trigger command.

12. The method of claim 1, wherein performing the delayed CMO comprises: selecting a victim cache line from the cache memory for eviction; determining whether the victim cache line is associated with the dataset identifier for which the trigger command has been received; and in response to determining that the victim cache line is associated with the dataset identifier, performing the delayed CMO on the victim cache line before its eviction.

13. The method of claim 12, wherein the delayed CMO of the victim cache line is performed before the victim cache line is overwritten by new data.

14. The method of claim 12, further comprising: in response to determining that the victim cache line is not associated with a dataset identifier for which a trigger command has been received, propagating the delayed CMO hint and the dataset identifier to a downstream storage component during a write-back of the victim cache line.

15. The method of claim 14, further comprising: storing the propagated delayed CMO hint and the propagated dataset identifier in a metadata portion of a corresponding data entry within the downstream storage component.

16. The method of claim 1, further comprising: detecting a race condition in which a new allocation request associated with the dataset identifier is received to allocate a new cache line while the delayed CMO is being performed on the one or more cache lines; and applying a protection mechanism to the new cache line in response to detecting the race condition.

17. The method of claim 16, wherein applying the protection mechanism includes forcing the new allocation request for the new cache line to become a non-allocating operation, such that new data associated with the non-allocating operation is not written to the cache memory.

18. The method of claim 16, wherein applying the protection mechanism includes at least one of: forcing the new cache line to be non-discardable, or forcing the new cache line to be non-cacheable.

19. The method of claim 18, wherein applying the protection mechanism includes skipping performance of the delayed CMO on the one or more cache lines.

20. The method of claim 18, wherein applying the protection mechanism includes converting the delayed CMO hint of the one or more cache lines into a flush-only state indication, which requires a write-back operation.

21. The method of claim 1, wherein the dataset identifier is assigned to the dataset by an operating system or a device driver in response to an application's request to allocate the dataset.

22. A cache management system, comprising: a cache memory comprising a plurality of cache lines, wherein a metadata portion of at least one cache line is configured to store a delayed cache maintenance operation (CMO) hint and a dataset identifier that associates the at least one cache line with a dataset; and a controller coupled to the cache memory, wherein the controller is configured to associate the dataset with a delayed CMO mode, to receive a trigger command associated with the dataset identifier, and to perform, based on the trigger command, a delayed CMO on one or more cache lines associated with the dataset identifier.