Cache updating method and system
By identifying and synchronizing affected cached data through a dependency graph, the problems of cached data consistency and resource waste in distributed systems are solved, and efficient cache updates are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- LENOVO (BEIJING) LTD
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-23
AI Technical Summary
In modern distributed systems, there are complex dependencies between cached data. When the underlying data changes, the relevant technologies struggle to accurately identify and invalidate all affected caches, leading to data consistency issues or wasted cache resources.
The dependency graph determines the dependencies between cached data. In response to invalidation commands, it identifies and synchronizes all affected cached data to the distributed cache, avoiding omissions or excessive invalidation.
It achieves improved consistency of cached data and resource utilization efficiency, avoids data inconsistency issues, and reduces cache resource waste.
Figure CN122268942A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of distributed technology, and in particular to a cache update method and system.
Background Technology
[0002] In modern distributed systems, there are complex dependencies between cached data. When the underlying data changes, the relevant technologies struggle to accurately identify and invalidate all affected caches, leading to data consistency issues or wasted cache resources.
Summary of the Invention
[0003] In view of this, this application provides a cache update method and system.
[0004] According to a first aspect of this application, a cache update method is provided, comprising: in response to an invalidation instruction, determining a first node in a dependency graph corresponding to the first cached data indicated by the invalidation instruction; the dependency graph is a data structure of data dependencies, including multiple nodes connected by edges; the edges represent the dependencies between nodes, and the nodes represent cached data; determining a target node that has a dependency relationship with the first node; determining the first cached data corresponding to the first node and the target cached data corresponding to the target node as invalidated; and synchronizing the first cached data and the target cached data as invalidated to a distributed cache.
[0005] A second aspect of this application provides a cache update system, comprising: a node determination module, configured to, in response to an invalidation instruction, determine a first node in a dependency graph corresponding to the first cached data indicated by the invalidation instruction; the dependency graph is a data structure of data dependencies, including multiple nodes connected by edges; edges represent dependencies between nodes, and nodes represent cached data; a relationship determination module, configured to determine a target node that has a dependency relationship with the first node; an invalidation handling module, configured to determine the first cached data corresponding to the first node and the target cached data corresponding to the target node as invalid; and a state synchronization module, configured to synchronize the first cached data and the target cached data determined as invalid to a distributed cache.
[0006] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this application, nor is it intended to limit the scope of this application. Other features of this application will become readily apparent from the following description.
Description of the Drawings
[0007] The above and other objects, features and advantages of this application will become clearer from the following description of embodiments with reference to the accompanying drawings, in which:
[0008] Figure 1 shows an application scenario of the cache update method and system provided in the embodiments of this application;
[0009] Figure 2 shows a flowchart of a cache update method provided in an embodiment of this application;
[0010] Figure 3 shows a schematic diagram of the architecture of a cache update system provided in an embodiment of this application;
[0011] Figure 4 shows a schematic diagram of the module structure of a cache update system provided in an embodiment of this application;
[0012] Figure 5 shows a block diagram of an electronic device provided in an embodiment of this application.
Detailed Description
[0013] The embodiments of this application will now be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of this application. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of this application for ease of explanation. However, it will be apparent that one or more embodiments may be implemented without these specific details. Furthermore, descriptions of well-known structures and technologies are omitted in the following description to avoid unnecessarily obscuring the concepts of this application.
[0014] This application provides a cache update method and system. The cache update method includes:
[0015] In response to an invalidation instruction, determining the first node in the dependency graph corresponding to the first cached data indicated by the invalidation instruction; the dependency graph is a data structure of data dependencies, including multiple nodes connected by edges; edges represent the dependencies between nodes, and nodes represent cached data; determining the target node that has a dependency relationship with the first node; determining the first cached data corresponding to the first node and the target cached data corresponding to the target node as invalid; and synchronizing the first cached data and the target cached data determined as invalid to the distributed cache.
[0016] By adopting the technical solution of this application, in response to an invalidation instruction, the first node corresponding to the first cached data indicated by the invalidation instruction is first determined in the dependency graph. Since the dependency graph represents cached data through nodes and the dependencies between nodes through edges, the target node that has a dependency relationship with the first node can be determined based on the edges in the graph structure. Then, the first cached data corresponding to the first node and the target cached data corresponding to the target node are uniformly determined to be in an invalidated state and synchronized to the distributed cache. The above method does not require judging or traversing all cached data one by one in the business code. Instead, it directly traces to all affected target nodes through the edges in the dependency graph, thereby accurately identifying all cached data that needs to be invalidated, avoiding omissions or excessive invalidation. This ensures data consistency, reduces the waste of cache resources, and solves the problem in related technologies of difficulty in accurately identifying and invalidating all affected caches.
[0017] The cache update method provided in this embodiment can be applied, for example, to the application scenario 100 shown in Figure 1. As shown in Figure 1, the data source 102 connects to the Internet through the first network node set 110, and the data destination 104 connects to the Internet through the third network node set 130. The first network node set 110 and the third network node set 130 are connected via the second network node set 120. Specifically, the first network node set 110, the second network node set 120, and the third network node set 130 may include cache management nodes and cache storage nodes, each of which may have at least two nodes. In Figure 1, the first network node set 110 includes a first network node 110a and a second network node 110b; the second network node set 120 includes a third network node 120a, a fourth network node 120b, a fifth network node 120c, and a sixth network node 120d; and the third network node set 130 includes a seventh network node 130a and an eighth network node 130b. The cache update method provided in this embodiment can be executed at the data source 102, at a network node set (e.g., the first network node set 110, the second network node set 120, or the third network node set 130), or at the data destination 104.
[0018] In one embodiment, the data source 102 generates an invalidation instruction, which indicates that the first cached data needs to be invalidated. The first network node 110a in the first network node set 110 maintains a dependency graph, which is a data structure representing data dependencies. This graph includes multiple nodes connected by edges, where edges represent dependencies between nodes, and nodes represent cached data. In response to the invalidation instruction, the first network node 110a determines the first node corresponding to the first cached data indicated by the invalidation instruction in the dependency graph, identifies the target node that has a dependency relationship with the first node, and determines the first cached data corresponding to the first node and the target cached data corresponding to the target node as invalid. The invalidated first cached data and target cached data are then synchronized to the seventh network node 130a and the eighth network node 130b in the third network node set 130, which serve as distributed caches.
[0019] In one embodiment, the data source 102 detects a change in the underlying data and sends an invalidation command to the third network node 120a in the second network node set 120. After receiving the invalidation command, the third network node 120a executes the cache update method provided in this embodiment, determines the first node corresponding to the first cached data in its maintained dependency graph, determines the target node that has a dependency relationship with the first node based on the edges in the dependency graph, determines the first cached data and the target cached data as invalid, and synchronizes the invalidation status to the third network node set 130 through the fourth network node 120b, the fifth network node 120c, and the sixth network node 120d.
[0020] For example, the data source 102 and the data destination 104 can be terminals or servers. Terminals can include, but are not limited to, various personal computers, laptops, smartphones, tablets, IoT devices, and portable wearable devices. IoT devices can include smart speakers, smart TVs, smart air conditioners, smart in-vehicle devices, etc. Portable wearable devices can include smartwatches, smart bracelets, head-mounted devices, etc. Servers can be servers providing various services, such as a backend management server supporting websites browsed by users using the terminal (this is just an example). The backend management server can analyze and process received user requests and other data, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal device. Servers can be independent physical servers, server clusters or distributed systems composed of multiple physical servers, or cloud servers providing cloud computing services such as cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks, and big data. Servers can also be backend servers for cache management applications, providing backend services to clients of cache management applications.
[0021] The cache update method of this embodiment is described in detail below, based on the scenario shown in Figure 1.
[0022] Figure 2 shows a flowchart of a cache update method provided in an embodiment of this application.
[0023] As shown in Figure 2, the cache update method may specifically include the following operations.
[0024] Operation S210: in response to an invalidation instruction, determine the first node in the dependency graph corresponding to the first cached data indicated by the invalidation instruction; the dependency graph is a data structure of data dependencies that includes multiple nodes connected by edges; the edges represent the dependencies between nodes, and the nodes represent cached data.
[0025] Operation S220: determine the target node that has a dependency relationship with the first node.
[0026] Operation S230: determine the first cached data corresponding to the first node and the target cached data corresponding to the target node as invalid.
[0027] Operation S240: synchronize the first cached data and the target cached data determined as invalid to the distributed cache.
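By way of a non-limiting illustration, operations S210 to S240 can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the implementation of this application: the `dependents` mapping, the `invalidate` function, the key names, and the in-memory `distributed_cache` dict that stands in for a real distributed cache are all hypothetical.

```python
from collections import deque

# Hypothetical reverse-dependency graph: key -> keys whose cached data depends on it.
dependents = {
    "article:1:comment_count": ["article:1:detail", "user:7:profile"],
    "article:1:detail": ["article:list"],
}

# Plain dict standing in for the distributed cache (an assumption for illustration).
distributed_cache = {
    "article:1:comment_count": 12,
    "article:1:detail": "<html>...</html>",
    "user:7:profile": {"name": "alice"},
    "article:list": ["article:1"],
    "user:7:settings": {"theme": "dark"},
}

def invalidate(instruction):
    # S210: locate the first node from the key carried by the instruction.
    first = instruction["key"]
    # S220: follow edges to collect every directly or indirectly dependent node.
    seen, queue = {first}, deque([first])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # S230: `seen` is the logical set of cached data determined as invalid.
    # S240: synchronize that state to the cache (here, by deleting the entries).
    for key in seen:
        distributed_cache.pop(key, None)
    return seen

invalidated = invalidate({"key": "article:1:comment_count"})
```

In this sketch only the entries reachable from the first node are removed; the unrelated `user:7:settings` entry stays in the cache, which corresponds to avoiding excessive invalidation.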
[0028] In operation S210, the invalidation instruction refers to the indication information used to indicate that specific cached data needs to be marked as invalid. It can be understood as an input signal that triggers the cache invalidation process and is used to start the invalidation process for the corresponding cached data and its dependent data.
[0029] For example, invalidation instructions may include, but are not limited to, messages containing invalidated cache data identifiers, requests carrying invalidated cache data key names, commands specifying the location of invalidated cache data, etc.
[0030] In some embodiments, an invalidation instruction may indicate that the first cached data is in an invalidated state; the first cached data refers to the initial invalidation object targeted by the invalidation instruction, which can be understood as the starting point data that triggers the cascading invalidation process.
[0031] For example, the first cached data may include, but is not limited to, business cached data such as user comment count data, article content data, product inventory data, and order status data.
[0032] In this embodiment of the application, the first cached data corresponds to the first node in the dependency graph. The dependency graph refers to a graph structure used to represent the dependency relationship between cached data. It can be understood as a topological structure that reflects the aggregation relationship of cached data and is used to record and query the dependency links between cached data.
[0033] Specifically, the dependency graph includes multiple nodes and edges connecting the nodes. Nodes represent cached data, with each node corresponding to a single copy of the cached data. Nodes can be identified by the identifier of the cached data. Edges represent the dependencies between nodes, i.e., the relationship where one piece of cached data references another piece of cached data during its generation. When a piece of cached data is generated by aggregating multiple other pieces of cached data, the node corresponding to that cached data will establish an edge connection with the node corresponding to the referenced cached data.
[0034] In this embodiment, dependency refers to the reference or aggregation relationship between cached data. It can be understood as the relationship where the generation of one cached data depends on one or more other cached data, and is used to represent the associated path that affects the validity of other cached data when the cached data is updated. Dependencies can include different types and directions.
[0035] From the perspective of the number of dependencies, dependencies can be divided into one-to-one dependencies, one-to-many dependencies, and many-to-many dependencies.
[0036] A one-to-one dependency means that the cached data corresponding to one parent node depends on the cached data corresponding to one child node. For example, cached data on the article details page (node A) depends on cached data for the article's basic information (node B). In this case, there is a one-to-one dependency between node A and node B, and node A is connected to node B through an edge.
[0037] A one-to-many dependency relationship refers to a parent node's cached data depending on the cached data of multiple child nodes. For example, a user's profile cached data (node C) depends on the user's basic information cached data (node D), the user's latest article title cached data (node E), and the article comment count cached data (node F). In this case, node C is connected to nodes D, E, and F through edges, forming a one-to-many dependency relationship.
[0038] A many-to-many dependency relationship refers to a situation where cached data corresponding to multiple parent nodes jointly depends on cached data corresponding to multiple child nodes. For example, cached data on the user's homepage (node G) and cached data on the article list page (node H) both depend on cached data on user information (node I) and cached data on article information (node J). In this case, nodes G and H are connected to nodes I and J respectively through edges, forming a many-to-many dependency relationship.
[0039] From the perspective of dependency direction, dependencies can include unidirectional dependencies and bidirectional dependencies. A unidirectional dependency refers to recording only the dependency direction from the parent node to the child node, or only the dependency direction from the child node to the parent node. For example, in a forward dependency, user homepage cache data (node K) points to user information cache data (node L), indicating that node K depends on node L, and the direction of the edge is from node K to node L.
[0040] A bidirectional dependency refers to simultaneously recording both the forward dependency from a parent node to a child node and the reverse dependency from a child node to a parent node. For example, for user homepage cache data (node M) and user information cache data (node N), not only are the forward edges from node M to node N recorded (indicating that node M depends on node N), but also the reverse edges from node N to node M are recorded (indicating that node N is depended on by node M), thus achieving the recording of a bidirectional dependency.
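As a non-limiting illustration, the node and edge structure described above, with both forward and reverse edges recorded, might be held in memory as follows. The class name `DependencyGraph`, its attributes, and the key names are assumptions chosen for this sketch and are not specified by this application.

```python
from collections import defaultdict

class DependencyGraph:
    """Hypothetical in-memory dependency graph keyed by cached-data identifier."""

    def __init__(self):
        self.forward = defaultdict(set)  # parent -> children it depends on
        self.reverse = defaultdict(set)  # child -> parents that depend on it

    def add_dependency(self, parent, child):
        # Bidirectional recording: store the forward edge and the reverse edge.
        self.forward[parent].add(child)
        self.reverse[child].add(parent)

graph = DependencyGraph()

# One-to-many: the profile (node C) depends on nodes D, E and F.
for child in ("user:basic", "user:latest_title", "article:comment_count"):
    graph.add_dependency("user:profile", child)

# Many-to-many: nodes G and H both depend on nodes I and J.
for parent in ("user:home", "article:list"):
    for child in ("user:info", "article:info"):
        graph.add_dependency(parent, child)
```

Recording the reverse edges is what allows the parents of a changed child node to be looked up directly, without scanning every parent's forward edges.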
[0041] In one feasible implementation, the invalidation instruction can be parsed to extract the identification information of the first cached data, and then a node matching the identification information can be found in the node set of the dependency graph. The found node is then determined as the first node.
[0042] In another feasible implementation, the storage structure of the dependency graph can be directly accessed through the key name or index value carried in the invalidation instruction to obtain the corresponding node information, thereby determining the first node.
[0043] It should be noted that edges in a dependency graph can be directed, representing the directionality of dependencies, such as a forward dependency (parent node pointing to child node) or a reverse dependency (child node pointing to parent node). In practical applications, unidirectional or bidirectional dependencies can be recorded according to query requirements.
[0044] Optionally, the dependency graph can be stored in a key-value database or a graph database, using the identifier of the cached data as the key to index the corresponding node and its associated edge information, so as to quickly locate the first node and perform subsequent dependency traversal.
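A minimal sketch of locating the first node through a key-value store is given below. The `kv_store` dict stands in for a key-value database, and the `dep:` key prefix, the `depended_on_by` field, and the instruction layout are hypothetical choices made only for illustration.

```python
import json

# Plain dict standing in for a key-value database; each value is a serialized node record.
kv_store = {
    "dep:user:1:basic": json.dumps({"depended_on_by": ["user:1:profile", "user:1:feed"]}),
    "dep:user:1:profile": json.dumps({"depended_on_by": []}),
}

def find_first_node(instruction):
    """Return (cache_key, node_record) for the key named in the instruction, or None."""
    cache_key = instruction.get("key")
    raw = kv_store.get(f"dep:{cache_key}")
    if raw is None:
        return None  # no node is recorded for this cached data
    return cache_key, json.loads(raw)

found = find_first_node({"type": "invalidate", "key": "user:1:basic"})
missing = find_first_node({"type": "invalidate", "key": "user:2:basic"})
```

Because the cached-data identifier is itself the lookup key, the first node is located with a single read rather than a search over the node set.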
[0045] In operation S220, the target node refers to a node in the dependency graph that has a direct or indirect dependency relationship with the first node. It can be understood as an associated node whose cached data must be invalidated together with the cached data corresponding to the first node, and it is used to determine the scope and objects of cascading invalidation.
[0046] For example, when the first node is the node corresponding to the article comment count cache data, the target nodes may include the node corresponding to the article details page cache data that depends on the comment count data, the node corresponding to the user's personal homepage cache data, etc.
[0047] For example, when the first node is the node corresponding to the user's basic information cache data, the target node may include multiple nodes corresponding to the upper-level aggregated cache data that depend on the user's basic information, such as the node corresponding to the user's personal homepage cache data, the node corresponding to the user's dynamic list cache data, and the node corresponding to the friend recommendation page cache data.
[0048] In one feasible implementation, one can start from the first node, traverse along the edges of the dependency graph, obtain the neighboring nodes directly connected to the first node, and determine these neighboring nodes as the target node.
[0049] In another feasible implementation, the dependency information of the first node recorded in the dependency graph can be queried, and the set of all nodes that depend on or are depended on by the first node can be obtained based on the dependency information. The nodes in the set of nodes can then be identified as the target nodes.
[0050] Optionally, the target node can be determined in a single step or iteratively. In the iterative determination case, the first batch of target nodes that have a direct dependency relationship with the first node can be determined first. Then, starting from the first batch of target nodes, the search continues to find the second batch of target nodes that have a dependency relationship, and so on, until all related nodes have been traversed.
[0051] Optionally, during the process of determining the target node, visited nodes can be marked to avoid repeatedly visiting the same node during traversal, thereby improving traversal efficiency and preventing infinite loops when the dependencies are complex (for example, when they form a cycle).
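The iterative, batch-by-batch determination of target nodes with visited marking can be sketched as follows. The function name `target_batches`, the `reverse_edges` mapping, and the sample keys (including the deliberately cyclic edge) are assumptions for illustration only.

```python
def target_batches(reverse_edges, first):
    """Return batches of target nodes: direct dependents first, then their dependents, and so on."""
    visited = {first}
    batches = []
    frontier = [first]
    while frontier:
        batch = []
        for node in frontier:
            for parent in reverse_edges.get(node, ()):
                if parent not in visited:  # skip nodes that are already marked
                    visited.add(parent)
                    batch.append(parent)
        if batch:
            batches.append(batch)
        frontier = batch
    return batches

reverse_edges = {
    "user:basic": ["user:profile", "user:feed"],
    "user:profile": ["friend:recommend"],
    "friend:recommend": ["user:basic"],  # cyclic edge, to exercise the visited check
}
batches = target_batches(reverse_edges, "user:basic")
```

Without the `visited` set, the cyclic edge back to `user:basic` would cause the loop to run forever; with it, each node is expanded at most once.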
[0052] In operation S230, the invalidation state refers to the state in which cached data is no longer valid and needs to be cleared or updated. It can be understood as a status marker that indicates that the cached data has expired or the content is inaccurate.
[0053] In one feasible implementation, the identifiers of the first node and the target node can be added to a set of invalidated nodes. The set of invalidated nodes is used to record all nodes on which invalidation operations need to be performed. Then, based on the set of invalidated nodes, the corresponding first cached data and target cached data are determined to be in an invalid state.
[0054] In another feasible implementation, an invalidation flag can be set for the first node and the target node respectively. The invalidation flag is used to indicate whether the cached data of the corresponding node is in an invalid state. By setting the value of the invalidation flag to a preset value that indicates invalidation, the first cached data and the target cached data are determined to be in an invalid state.
[0055] Optionally, during the process of determining the first cached data and the target cached data as invalid, an invalidation timestamp for each cached data can be recorded. This invalidation timestamp is used to identify the moment when the cached data is determined to be invalid, which facilitates subsequent auditing of invalidation operations, logging, or control of the invalidation order.
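As a non-limiting sketch, the invalidation flag and the invalidation timestamp can be combined in one per-node record. The `NodeState` dataclass, the `mark_invalid` function, and the optional `now` parameter (used so the timestamp can be controlled) are hypothetical names introduced for this illustration.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeState:
    invalid: bool = False                   # invalidation flag
    invalidated_at: Optional[float] = None  # invalidation timestamp (epoch seconds)

def mark_invalid(states, keys, now=None):
    """Logically mark nodes as invalid; return the keys that were newly marked."""
    now = time.time() if now is None else now
    marked = []
    for key in keys:
        state = states.setdefault(key, NodeState())
        if not state.invalid:  # keep the original timestamp if already invalid
            state.invalid = True
            state.invalidated_at = now
            marked.append(key)
    return marked

states = {}
first_pass = mark_invalid(states, ["user:1:basic", "user:1:profile"], now=100.0)
second_pass = mark_invalid(states, ["user:1:profile", "user:1:feed"], now=200.0)
```

This step only records state; no cached entry is deleted here, which matches the logical marking described for operation S230.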
[0056] It should be noted that determining cached data as invalid is a logical marking process. This process does not directly delete the actual cached data stored in the distributed cache. Instead, it identifies, within the dependency center or local processing environment, which cached data needs to be invalidated. The actual deletion or clearing of cached data is performed in the subsequent synchronization step.
[0057] In operation S240, the distributed cache refers to a distributed storage system composed of multiple cache management nodes. It can be understood as a cache storage cluster deployed on different servers or in different geographical locations to store and manage cached data in business systems, providing high availability and high-concurrency access capabilities.
[0058] For example, distributed caching may include, but is not limited to: a Redis cluster consisting of multiple Redis instances, a combination of local cache nodes and remote cache nodes in a multi-level caching architecture, etc.
[0059] In one feasible implementation, a synchronization message containing a first cached data identifier and a target cached data identifier can be generated and sent to each cache management node in the distributed cache, so that each cache management node performs an invalidation operation on the cached data it manages according to the cached data identifier in the synchronization message.
[0060] In another feasible implementation, the first cached data identifier and the target cached data identifier can be written into a shared message queue or event bus. Each cache management node in the distributed cache subscribes to the message queue or event bus, obtains the cached data identifier that needs to be invalidated from it, and performs invalidation operation on the corresponding cached data managed by each node.
[0061] It should be noted that the process of synchronizing to the distributed cache includes two stages: transmitting invalidation status information to the distributed cache system and actually executing the invalidation operation in the distributed cache. Invalidation operations can include deleting the corresponding cached data, marking the cached data as expired, or clearing the contents of the cached data.
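The two stages described above (transmitting the invalidation status, then executing the invalidation on each node) can be sketched with an in-process publish/subscribe bus. The `EventBus` and `CacheNode` classes and the message layout are assumptions for illustration; a real deployment would use an actual message queue or event bus, and deletion is only one of the invalidation operations mentioned above.

```python
class EventBus:
    """Hypothetical in-process stand-in for a shared message queue or event bus."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, message):
        # Stage 1: transmit the invalidation status to every subscriber.
        for handler in self.subscribers:
            handler(message)

class CacheNode:
    """Hypothetical cache management node holding part of the cached data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)

    def on_invalidation(self, message):
        # Stage 2: execute the invalidation on the data this node manages.
        for key in message["keys"]:
            self.data.pop(key, None)

bus = EventBus()
node_a = CacheNode("a", {"user:1:basic": 1, "user:1:profile": 2, "article:9": 3})
node_b = CacheNode("b", {"user:1:profile": 2})
bus.subscribe(node_a.on_invalidation)
bus.subscribe(node_b.on_invalidation)
bus.publish({"keys": ["user:1:basic", "user:1:profile"]})
```

Each node ignores identifiers it does not hold, so the same synchronization message can be sent to all cache management nodes.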
[0062] Optionally, when the distributed cache contains multiple cache levels, the invalidation status can be propagated level by level: it is first synchronized to the access layer cache node, and the access layer cache node then continues to synchronize the invalidation status to the lower layer cache node, thereby realizing cascading invalidation synchronization across multi-level caches.
[0063] By employing the above embodiments, in response to an invalidation command, the first node corresponding to the first cached data indicated by the invalidation command in the dependency graph is first determined. Then, based on the dependency graph, the target node that has a dependency relationship with the first node is determined. Subsequently, the first cached data corresponding to the first node and the target cached data corresponding to the target node are determined to be in an invalidated state. Finally, the invalidation state is synchronized to the distributed cache, realizing cascading invalidation processing based on dependency relationships. This approach can accurately identify all associated cached data that need to be synchronized and invalidated due to the invalidation of the first cached data, avoiding data inconsistency problems caused by some dependent caches not being invalidated in time, and improving data consistency in the distributed cache system.
[0064] Meanwhile, by using structured storage and querying of dependency graphs, the target cached data that needs to be invalidated can be quickly located without having to traverse all cached data in the distributed cache one by one, significantly improving the efficiency of cache invalidation handling. Furthermore, this scheme centralizes the handling of dependency maintenance and invalidation propagation logic, reducing the processing complexity of each cache management node, facilitating unified cache invalidation management in a distributed environment, and enhancing the system's maintainability and scalability.
[0065] Based on the above embodiments, as an optional embodiment, the invalidation instruction is used to indicate that at least one of the following cached data is determined to be in an invalid state:
[0066] first cached data corresponding to a data update operation;
[0067] first cached data that has expired in the distributed cache;
[0068] first cached data corresponding to a database change record;
[0069] first cached data corresponding to a node in the dependency graph whose effective duration is less than or equal to a preset threshold.
[0070] In this embodiment of the application, the invalidation instruction can be generated in a variety of scenarios, and different triggering scenarios correspond to different invalidation reasons and processing logic.
[0071] For the first cached data corresponding to a data update operation, when the business service executes the data update operation, it proactively generates an invalidation command to trigger the invalidation of the relevant cached data. Specifically, data update operations include adding, modifying, and deleting data records in the database. After the business service completes the data update operation, it can determine the cached data identifier involved in the data update operation and generate an invalidation command containing that cached data identifier.
[0072] In one feasible implementation, when performing data update operations, the business service can simultaneously call the cache update interface to generate an invalidation command while updating database records. For example, when a user modifies their basic personal information, after updating the record in the user table of the database, the business service constructs a corresponding cache data identifier based on the user identifier, generates an invalidation command, and submits it to the cache update service.
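A minimal sketch of this business-side trigger is given below. The function `update_user_profile`, the dict standing in for the user table, the key format `user:{id}:basic`, and the `submit_invalidation` callback standing in for the cache update interface are all hypothetical.

```python
def update_user_profile(db, user_id, fields, submit_invalidation):
    """Update the user record, then submit an invalidation instruction for its cached data."""
    # Update the record in the (stand-in) user table first.
    db.setdefault(user_id, {}).update(fields)
    # Construct the cached-data identifier from the user identifier.
    instruction = {"source": "data_update", "key": f"user:{user_id}:basic"}
    # Hand the instruction to the cache update service.
    submit_invalidation(instruction)
    return instruction

db = {}
outbox = []  # stands in for the cache update service's input
update_user_profile(db, 42, {"nickname": "neo"}, outbox.append)
```

Submitting the instruction only after the database write completes follows the order described above, in which the record is updated before the invalidation command is generated.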
[0073] For the first cached data that has expired in the distributed cache, when the cached data in the distributed cache reaches the preset expiration time, an invalidation command is automatically generated by listening for cache expiration events. Specifically, the expiration time of the cached data can be configured in the distributed cache database, and when the cached data expires, the distributed cache database will trigger an expiration event.
[0074] In one feasible implementation, Redis's keyspace notification feature can be used to monitor cached data expiration events. Enable keyspace notification in the Redis configuration file and configure the notification type as an expiration event. In the cache update service, subscribe to the Redis expiration event channel. When an expiration event notification is received, extract the expired cached data identifier from the notification message and generate an invalidation command containing that cached data identifier.
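For illustration, expired-key events are enabled in Redis with the `notify-keyspace-events Ex` setting and are published on channels named `__keyevent@<db>__:expired`, with the expired key name as the message payload. The sketch below converts such a message, in the dict shape that a Python Redis client's pub/sub interface typically yields, into an invalidation instruction. The function name and the instruction layout are assumptions; no Redis connection is made in this sketch.

```python
def instruction_from_expiry(message):
    """Turn a pub/sub message for an expired-key event into an invalidation instruction."""
    if message.get("type") not in ("message", "pmessage"):
        return None  # e.g. subscribe confirmations carry no key name
    channel = message["channel"]
    data = message["data"]
    # Clients may deliver bytes unless response decoding is enabled.
    if isinstance(channel, bytes):
        channel = channel.decode()
    if isinstance(data, bytes):
        data = data.decode()
    if not (channel.startswith("__keyevent@") and channel.endswith("__:expired")):
        return None  # not an expiration event
    return {"source": "cache_expiry", "key": data}

expired = instruction_from_expiry({
    "type": "pmessage",
    "pattern": b"__keyevent@*__:expired",
    "channel": b"__keyevent@0__:expired",
    "data": b"user:1:profile",
})
```

Keeping the parsing separate from the subscription loop makes this step easy to check without a running Redis instance.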
[0075] For the first cached data based on database change records, invalidation instructions are automatically generated by capturing the database change records to prevent business services from overlooking cache invalidation when performing data update operations. Specifically, database change records include the database's binary log, transaction log, change data capture records, etc., which record the change operations and content of data tables in the database.
[0076] In one feasible implementation, MySQL logs (for example, the binary log) can be used to capture database changes. A listening service is deployed to subscribe to the MySQL log stream, parse change events, and extract information such as the changed data table, change type, and changed primary key value. Based on the change information and pre-configured cache data identifier generation rules, a corresponding cache data identifier is constructed, and an invalidation command is generated.
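The mapping from a parsed change event to cached-data identifiers can be sketched as follows. The `KEY_RULES` table, the event fields (`table`, `type`, `pk`), and the key templates are hypothetical; parsing of the actual log stream is outside the scope of this sketch.

```python
# Hypothetical pre-configured rules: table name -> cached-data identifier templates.
KEY_RULES = {
    "users": ["user:{pk}:basic"],
    "articles": ["article:{pk}:content", "article:{pk}:detail"],
}

def instructions_from_change(event):
    """Build invalidation instructions from one parsed change event."""
    templates = KEY_RULES.get(event["table"], [])
    return [
        {"source": "db_change", "change_type": event["type"], "key": t.format(pk=event["pk"])}
        for t in templates
    ]

article_change = instructions_from_change({"table": "articles", "type": "update", "pk": 9})
unmapped_change = instructions_from_change({"table": "audit_log", "type": "insert", "pk": 1})
```

Tables without a configured rule produce no instructions, so changes unrelated to cached data do not trigger invalidation.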
[0077] For the first cached data corresponding to nodes whose effective duration in the dependency graph is less than or equal to a preset threshold, the effective duration of each node in the dependency graph is scanned periodically to identify nodes with excessively short effective durations. Invalidation instructions are then proactively generated to clean up these underutilized cached data. Specifically, a scheduled task or background thread can be used to traverse all nodes in the dependency graph, calculate the effective duration of the cached data corresponding to each node, and determine whether the effective duration is less than or equal to the preset threshold.
[0078] In one feasible implementation, a timestamp field can be maintained for each node in the dependency graph, recording the generation time and most recent access time of the cached data corresponding to that node. A scheduled task iterates through all nodes in the dependency graph at regular time intervals, reads the timestamp field of each node, and calculates the difference between the current time and the generation time as the valid duration. If the valid duration is less than or equal to a preset threshold, the cached data identifier corresponding to that node is obtained, and an invalidation instruction is generated.
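The periodic scan can be sketched as follows, assuming each node is a dictionary with a hypothetical `generated_at` field holding a Unix timestamp. Scheduling the scan (timer, background thread) is left out.

```python
import time


def scan_short_lived(nodes, threshold_seconds, now=None):
    """Return invalidation instructions for nodes whose valid duration <= threshold."""
    now = time.time() if now is None else now
    instructions = []
    for node_id, node in nodes.items():
        # Valid duration: current time minus generation time.
        valid_duration = now - node["generated_at"]
        if valid_duration <= threshold_seconds:
            instructions.append({"type": "invalidate", "cache_id": node_id})
    return instructions
```

Passing `now` explicitly keeps the function deterministic when it is tested or replayed.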
[0079] It should be noted that the four triggering scenarios mentioned above can be used independently or in combination. In practical applications, one or more triggering scenarios can be selected and enabled based on business needs and cache management strategies. The invalidation instructions generated by different triggering scenarios will execute the same cache invalidation handling logic in subsequent processing flows, including finding the corresponding node in the dependency graph, determining the target node, and performing the invalidation operation.
[0080] Optionally, different priorities can be set for invalidation instructions generated in different triggering scenarios. For example, invalidation instructions triggered by data update operations have the highest priority and need to be processed immediately. By setting priorities, processing resources can be allocated reasonably to ensure that important cache invalidation operations can be executed in a timely manner.
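One possible way to order invalidation instructions by priority is a binary heap, sketched below. Only the rule that data update operations come first is taken from the text; the relative order of the other three triggers in `PRIORITY` and the class name `InstructionQueue` are assumptions.

```python
import heapq
import itertools

# Hypothetical priority values; lower numbers are processed first.
PRIORITY = {"data_update": 0, "db_change": 1, "expired": 2, "duration_scan": 3}


class InstructionQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per priority

    def push(self, trigger, cache_id):
        heapq.heappush(self._heap, (PRIORITY[trigger], next(self._order), cache_id))

    def pop(self):
        _, _, cache_id = heapq.heappop(self._heap)
        return cache_id

    def __len__(self):
        return len(self._heap)
```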
[0081] By adopting the above embodiments, the generation of invalidation commands can be supported for various scenarios, such as data update operations, cache expiration, database change records, and valid duration scanning, thus realizing diversified triggering of cache invalidation processing. Different triggering scenarios can cover various cache invalidation requirements, such as proactive business triggering, automatic expiration management, passive change capture, and proactive garbage collection, improving the comprehensiveness and flexibility of cache update processing and ensuring the timeliness and consistency of data in distributed cache.
[0082] Based on the above embodiments, as an optional embodiment, the dependency relationship includes a positive dependency relationship (also referred to below as a forward dependency); a positive dependency relationship is one in which a parent node points to a child node.
[0083] Based on this, the above operation S220 may further include the following steps.
[0084] Operation S310 determines the child nodes of the first node based on the positive dependency relationship.
[0085] Operation S320 identifies the child nodes of the first node as the target node.
[0086] In the embodiments of this application, the positive dependency relationship refers to the dependency relationship represented by the directed edge from the parent node to the child node. It can be understood as recording the reference direction of which basic cache data the aggregated cache data depends on, and is used to indicate that the cache data corresponding to the parent node references the cache data corresponding to the child node during the generation process.
[0087] Specifically, in the dependency graph, the cached data corresponding to the parent node is generated by aggregating the cached data corresponding to one or more child nodes. For example, the cached data of a user's personal homepage (parent node P) is generated by aggregating the cached data of the user's basic information (child node C1), the cached data of the user's latest article title (child node C2), and the cached data of the number of article comments (child node C3). In this case, a positive dependency relationship is established in the dependency graph from the parent node P to the child nodes C1, C2, and C3, respectively.
[0088] In operation S310, a child node refers to a downstream node that is directly connected to the first node in the dependency graph through a positive dependency relationship. It can be understood as the node corresponding to other cached data referenced by the cached data corresponding to the first node when it is generated.
[0089] In one feasible implementation, the adjacency list or edge set of the first node in the dependency graph can be queried to obtain all nodes pointed to by the positive edges originating from the first node, and these nodes can be identified as child nodes of the first node.
[0090] For example, suppose the first node is node P corresponding to the user's personal homepage cached data. The dependency graph stores the positive dependency relationship edge:P→{C1, C2, C3}, where C1 corresponds to the user's basic information cached data, C2 corresponds to the user's latest article title cached data, and C3 corresponds to the article comment count cached data. When determining the child nodes of the first node, the set of child nodes {C1, C2, C3} can be obtained by querying edge:P, thereby determining that nodes C1, C2, and C3 are child nodes of the first node P.
[0091] In operation S320, the cached data corresponding to the child nodes is a component of the cached data corresponding to the first node. When the cached data corresponding to the first node becomes invalid, the data that was generated by aggregation at that node can no longer be regarded as accurate. Therefore, the cached data corresponding to the child nodes of the first node is determined to be invalid as well.
[0092] In one feasible implementation, all child nodes of the first node obtained in operation S310 can be directly added to the target node set, which is used to record the nodes that need to perform failure operations.
[0093] In another feasible implementation, each child node of the first node can be traversed, and the identifier of each child node can be recorded in the list of failed nodes. The node in the list of failed nodes can then be identified as the target node.
[0094] It should be noted that in a forward dependency traversal scenario, the first node acts as the parent node, and its failure directly invalidates the cached data of all its child nodes. Therefore, no additional failure condition check is needed for the child nodes; all child nodes can be directly identified as target nodes. This approach is based on the propagation characteristics of aggregated caching: once a parent node fails, the child node data it depends on must be retrieved or updated again.
[0095] Optionally, when storing positive dependencies in a dependency graph, key-value pairs can be used, with the parent node's identifier as the key and the set of identifiers of all child nodes of that parent node as the value. For example, in a Redis database, positive dependencies can be stored using the Set data type, with the key "edge: parent node identifier" and the set members being the identifiers of each child node, thereby enabling fast querying and updating of positive dependencies.
[0096] Optionally, after determining the child nodes of the first node, the child nodes can be recursively searched for, starting from the child nodes, to achieve multi-level positive dependency traversal and ensure that all downstream nodes in the entire dependency chain are marked as in a failed state.
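The multi-level forward traversal can be sketched as below. The forward edges are assumed to be held in an in-memory dictionary mapping a parent identifier to a set of child identifiers, mirroring the edge:P→{C1, C2, C3} layout; the sketch uses an explicit stack instead of recursion and a `seen` set so that a node shared by several parents is collected only once.

```python
def forward_targets(edges, first_node):
    """Collect every downstream node reachable from first_node via forward edges."""
    targets = []
    seen = {first_node}
    stack = [first_node]
    while stack:
        current = stack.pop()
        for child in sorted(edges.get(current, ())):
            if child not in seen:
                seen.add(child)
                targets.append(child)   # child becomes a target node
                stack.append(child)     # and is expanded in turn
    return targets
```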
[0097] By employing the above embodiments, the child nodes of the first node are determined based on the positive dependency relationship and identified as the target node, thus realizing dependency traversal and failure propagation from the parent node to the child node. This method can accurately identify all the basic cache data constituting the aggregated cache data when the aggregated cache data fails, avoiding data inconsistency problems caused by some dependent caches failing to expire synchronously.
[0098] Based on the above embodiments, as an optional embodiment, the dependency relationship includes a reverse dependency relationship, which is the dependency relationship in which a child node points to its parent node.
[0099] Based on this, the above operation S220 may further include the following steps.
[0100] Operation S410 determines the parent node of the first node based on the reverse dependency relationship.
[0101] In operation S420, if the count value of the parent node of the first node is 0, then the parent node of the first node is determined as the target node; the count value represents the number of dependencies between the child node and the parent node.
[0102] In this embodiment, the reverse dependency relationship refers to the dependency relationship represented by the directed edge from the child node to the parent node. It can be understood as the reverse reference direction that records which aggregate cache data references the basic cache data, and is used to indicate that the cache data corresponding to the child node is referenced by the cache data corresponding to the parent node.
[0103] It should be noted that reverse dependency and forward dependency are two different storage perspectives of the same dependency. When constructing a dependency graph, forward and reverse dependencies can be established simultaneously to support dependency traversal in different directions.
[0104] Optionally, in a dependency graph, when establishing a forward dependency from a parent node to a child node, a corresponding reverse dependency can be established simultaneously, i.e., a directed edge from a child node to a parent node. For example, if user profile cache data (parent node P) is generated by aggregating user basic information cache data (child node C1), then in the dependency graph, not only is a forward dependency from parent node P to child node C1 established, but also a reverse dependency from child node C1 to parent node P, indicating that child node C1 is referenced by parent node P.
[0105] In operation S410, the parent node refers to the upstream node that is directly connected to the first node in the dependency graph through a reverse dependency relationship. It can be understood as the node corresponding to other aggregated cache data that references the cache data corresponding to the first node.
[0106] In one feasible implementation, the reverse adjacency list or reverse edge set of the first node in the dependency graph can be queried to obtain the starting nodes of all reverse edges pointing to the first node, and these nodes can be determined as the parent nodes of the first node.
[0107] For example, suppose the first node is node C1 corresponding to the user's basic information cache data. The dependency graph stores the reverse dependency relationship redge:C1→{P1, P2}, where P1 corresponds to the user's personal homepage cache data and P2 corresponds to the user's summary information cache data. When determining the parent node of the first node, the set of parent nodes {P1, P2} can be obtained by querying redge:C1, thereby determining that nodes P1 and P2 are the parent nodes of the first node C1.
[0108] In operation S420, the count value records how many of the child nodes that the parent node depends on are still in a valid state, that is, have not yet been invalidated. When the count value of the parent node is 0, it means that all child nodes that the parent node depends on have become invalid, and at this time the aggregate cache data corresponding to the parent node should also become invalid.
[0109] Specifically, the count of each parent node in its initial state is equal to the total number of its child nodes. When the first node fails, for each parent node of the first node, the count of that parent node needs to be decremented by 1, indicating that the parent node has lost a valid child node. If the count of a parent node decreases to 0, it means that all of its child nodes have failed, and then that parent node is identified as the target node, and its corresponding cached data is also invalidated.
[0110] It should be noted that in scenarios involving reverse dependencies, the failure of a child node does not directly cause the parent node to fail. This is because the aggregated cache data corresponding to the parent node may be generated by aggregating multiple child nodes. The parent node only needs to fail when all the child nodes it depends on have failed. By using the count value, it can be accurately identified whether the parent node meets the failure condition, avoiding unnecessary cache invalidation operations.
[0111] In one feasible implementation, a count field can be maintained for each parent node in the dependency graph. During cache invalidation processing, whenever a child node fails, all parent nodes of that child node are retrieved, and an atomic decrement operation is performed on the count value of each parent node. The decremented count value is then checked to see if it is 0. If the decremented count value is 0, the parent node is added to the target node set.
[0112] For example, suppose the aggregated cache data corresponding to parent node P1 is generated by aggregating the cache data corresponding to child nodes C1, C2, and C3, and the initial count value of parent node P1 is 3. When child node C1 fails, the count value of parent node P1 is decremented by 1, and the count value is now 2, indicating that parent node P1 still has 2 valid child nodes, so parent node P1 is not identified as the target node. When child node C2 also fails, the count value of parent node P1 is decremented by 1 again, and the count value is now 1, so parent node P1 is still not identified as the target node. When child node C3 also fails, the count value of parent node P1 is decremented by 1 again, and the count value is now 0, indicating that all child nodes of parent node P1 have failed, so parent node P1 is identified as the target node, and its corresponding cache data is invalidated.
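The counting behaviour in this example can be reproduced with the sketch below. It assumes in-memory dictionaries for the reverse edges and the count values; the function name is hypothetical. In a shared store the decrement would need to be atomic, for instance with an atomic decrement command such as Redis `DECR`, which the plain dictionary update here does not provide.

```python
def invalidate_child(redges, counts, child):
    """Decrement each parent's count; return the parents whose count reached 0."""
    targets = []
    for parent in sorted(redges.get(child, ())):
        counts[parent] -= 1          # the parent has lost one valid child
        if counts[parent] == 0:      # all children invalid: parent becomes a target
            targets.append(parent)
    return targets
```

With P1 depending on C1, C2 and C3 and an initial count of 3, the first two calls return an empty list and only the third call returns P1.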
[0113] Optionally, when storing reverse dependencies in a dependency graph, key-value pairs can be used, with the identifier of the child node as the key and the set of identifiers of all the parent nodes of that child node as the value.
[0114] Optionally, to support the maintenance of parent node count values, a count value record can be created for each parent node during dependency graph initialization, with the initial value of the count value set to the total number of child nodes of that parent node.
[0115] Optionally, after determining the parent node of the first node, the search can continue recursively, starting from the parent node, to find the parent node's parent node, thereby achieving multi-level reverse dependency traversal. During the recursive traversal, for each level of parent node, it is necessary to check whether its count value is 0. Only parent nodes with a count value of 0 will be determined as target nodes and the traversal will continue upwards, ensuring that failure propagation in the entire dependency chain conforms to the data dependency characteristics of aggregated caching.
[0116] By employing the above embodiments, the parent node of the first node is determined based on the reverse dependency relationship, and whether the parent node should be identified as the target node is determined by checking if its count value is 0. This achieves dependency traversal from child nodes to parent nodes and conditional failure propagation. This method can accurately determine whether aggregate cache data dependent on the basic cache data needs to be invalidated when the basic cache data fails, avoiding the problem of decreased cache utilization caused by the erroneous failure of the parent node due to the failure of some child nodes.
[0117] Based on the above embodiments, as an optional embodiment, the above cache update method may further include the following operations.
[0118] Operation S510 responds to the target node's count value being 0 by determining whether the effective duration of the cached data corresponding to the target node is less than or equal to a preset threshold.
[0119] In operation S520, if the effective duration of the cached data corresponding to the target node is less than or equal to a preset threshold, the cached data corresponding to the target node is marked as invalid and the target node is deleted.
[0120] In operation S510, the effective duration refers to the time from the moment the cached data corresponding to the target node is generated to the current moment. The corresponding preset threshold can be set to a reasonable value based on the business scenario and the statistical results of cache utilization.
[0121] When the count value of the target node is 0, it indicates that the target node, as a parent node, has met the failure condition. Before the target node is determined as a node that needs to be failed, the effective duration of the cached data corresponding to the target node is first evaluated to identify whether the cached data is short-term cached data that expires soon after its generation.
[0122] Specifically, a generation timestamp can be recorded when cached data is generated, and this timestamp can be associated with and stored in the cached data. When determining the valid duration, the current timestamp is obtained, and the difference between the current timestamp and the generation timestamp is calculated; this difference is the valid duration. Then, the valid duration is compared with a preset threshold to determine whether the valid duration is less than or equal to the preset threshold.
[0123] In operation S520, if the validity period of cached data corresponding to a target node is less than or equal to a preset threshold, it indicates that the cached data expires soon after its generation and is considered low-utilization cached data. For this type of cached data, in addition to marking it as invalid, the corresponding target node also needs to be removed from the dependency graph to prevent the node from continuing to occupy storage resources and participate in subsequent dependency traversal.
[0124] Specifically, marking the cached data corresponding to the target node as invalid can be achieved by setting an invalidation flag for the cached data or deleting the corresponding cache record in the cache database. Deleting the target node can be achieved by removing the node and its related dependency edges from the dependency graph, including deleting the node's reverse dependency records, forward dependency records, and count value records.
[0125] It should be noted that deleting a target node involves not only deleting the node itself, but also updating the dependencies between other nodes and the target node. Specifically, for all child nodes of the target node, the target node's identifier needs to be removed from its reverse dependency set; for all parent nodes of the target node, the target node's identifier needs to be removed from its forward dependency set, and the parent node's count value needs to be updated as needed.
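Removing a node together with its dependency records might be sketched as follows, assuming in-memory dictionaries `edges` (forward), `redges` (reverse) and `counts`. The text leaves open how a parent's count value is updated "as needed", so the sketch deliberately does not touch the counts of the remaining parents; that policy is an assumption to be decided by the implementer.

```python
def remove_node(edges, redges, counts, node):
    """Delete node and detach it from its children and parents."""
    # Children: drop this node from each child's reverse dependency set.
    for child in edges.pop(node, set()):
        redges.get(child, set()).discard(node)
    # Parents: drop this node from each parent's forward dependency set.
    for parent in redges.pop(node, set()):
        edges.get(parent, set()).discard(node)
    # Finally drop the node's own count value record.
    counts.pop(node, None)
```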
[0126] By employing the above embodiments, when the count value of the target node is 0, short-term cached data that expires quickly after generation is identified by determining whether the effective duration of the cached data corresponding to the target node is less than or equal to a preset threshold. This cached data is then marked as invalid, and the corresponding target node is removed from the dependency graph. This method effectively reclaims storage and computing resources occupied by underutilized cached data, avoids the accumulation of a large number of invalid nodes in the dependency graph, reduces the maintenance cost and query overhead of the dependency graph, and improves the efficiency of cache update processing and the utilization rate of storage space.
[0127] Based on the above embodiments, as an optional embodiment, the above cache update method may further include the following operations.
[0128] Operation S610 determines the parent node and / or child node of the target node based on the dependency relationship of the target node.
[0129] Operation S620 determines whether the parent node and / or child node of the target node meet the failure conditions.
[0130] Operation S630 determines the parent node and / or child node of the target node as a new target node if the failure condition is met.
[0131] In operation S610, the query can be performed on forward dependencies, reverse dependencies, or both, depending on the direction of failure propagation. Specifically, if the failure needs to propagate downwards (from aggregate cache data to base cache data), the forward dependencies of the target node are queried to determine its child nodes; if the failure needs to propagate upwards (from base cache data to aggregate cache data), the reverse dependencies of the target node are queried to determine its parent node; if the failure needs to propagate both upwards and downwards simultaneously, the forward and reverse dependencies of the target node are queried simultaneously to determine its child and parent nodes, respectively.
[0132] In one feasible implementation, the positive dependency record corresponding to the target node in the dependency graph can be queried to obtain the set of identifiers of all child nodes pointed to by the target node.
[0133] In another feasible implementation, the reverse dependency record corresponding to the target node in the dependency graph can be queried to obtain the set of identifiers of all parent nodes pointing to the target node. In operation S620, the failure condition is used to determine whether the parent node and / or child node of the target node needs to fail along with the target node. The judgment logic of the failure condition differs for parent nodes and child nodes.
[0134] For child nodes, since the basic cache data corresponding to the child node is referenced by the aggregate cache data corresponding to the target node, when the target node fails, if a downward propagation failure strategy is adopted, the child node can be directly determined to meet the failure conditions.
[0135] In other scenarios, the failure condition can also be determined based on the attributes of the child node itself. For example, it can be determined whether the child node is only referenced by the current target node. If the set of reverse dependencies of the child node only contains the current target node, then the child node is determined to meet the failure condition; if the child node is also referenced by other nodes, then the child node is determined not to meet the failure condition.
[0136] For the parent node, since the aggregated cache data corresponding to the parent node depends on the cache data corresponding to the target node, when the target node fails, it is necessary to determine whether all child nodes of the parent node have failed.
[0137] In one feasible implementation, the set of child nodes and the set of parent nodes of the target node can be traversed separately, and failure condition judgment can be performed on each child node and parent node.
[0138] In operation S630, the parent node and / or child node that meets the failure condition is identified as the new target node for subsequent recursive traversal and failure handling. Specifically, the new target node can be added to the processing queue, and operations S610 to S630 can be repeated until the processing queue is empty, i.e., no new node needs to be failed.
[0139] In one feasible implementation, initially, the first node can be added to the processing queue. Each time, a node is taken from the queue as the current target node. The parent and / or child nodes of this target node are queried to determine if these nodes meet the failure conditions. Nodes that meet the failure conditions are added to the processing queue, and the current target node is marked as processed. This process is repeated until the queue is empty.
[0140] It should be noted that during the recursive traversal, it is necessary to record the nodes that have already been processed to avoid repeatedly judging and processing the same node. Specifically, a set of visited nodes can be maintained. When processing each target node, first check whether the node is already in the set of visited nodes. If it already exists, skip the node; if it does not exist, add the node to the set of visited nodes and continue processing.
[0141] Optionally, a maximum traversal depth can be set for cascading failures to prevent excessive nodes from being processed when the dependency graph is complex. For example, the maximum traversal depth can be set to 5 levels. When the traversal depth exceeds 5 levels, further expansion will stop, and only nodes already added to the processing queue will be processed. By limiting the maximum traversal depth, the impact of cascading failures can be controlled, preventing a large amount of cached data from becoming invalid simultaneously due to the failure of a single node, thus avoiding a negative impact on the overall system performance.
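The queue-based cascade, the visited set and the depth limit can be combined as in the sketch below. It assumes in-memory dictionaries for forward edges, reverse edges and count values, propagates in both directions, and treats children as meeting the failure condition directly while parents qualify only when their count reaches 0. Nodes at the depth limit are still processed but not expanded, which is one reading of the limit described above.

```python
from collections import deque


def cascade(edges, redges, counts, first_node, max_depth=5):
    """Return nodes to invalidate, in the order they were reached."""
    visited = {first_node}
    queue = deque([(first_node, 0)])
    invalidated = []
    while queue:
        node, depth = queue.popleft()
        invalidated.append(node)
        if depth >= max_depth:
            continue  # depth limit reached: process the node but do not expand it
        next_nodes = []
        # Downward propagation: children meet the failure condition directly.
        for child in sorted(edges.get(node, ())):
            if child not in visited:
                visited.add(child)
                next_nodes.append(child)
        # Upward propagation: a parent qualifies only when its count drops to 0.
        for parent in sorted(redges.get(node, ())):
            if parent in visited:
                continue
            counts[parent] -= 1
            if counts[parent] == 0:
                visited.add(parent)
                next_nodes.append(parent)
        queue.extend((n, depth + 1) for n in next_nodes)
    return invalidated
```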
[0142] By employing the above embodiments, the parent and / or child nodes of the target node are recursively determined based on the dependency relationship. It is then determined whether these nodes meet the failure conditions, and the nodes that meet the failure conditions are identified as new target nodes, thus achieving the cascading propagation of cache failure along the dependency chain. This approach can automatically identify and invalidate all affected cache data when a certain cached data fails, ensuring the consistency of cached data in the dependency graph, avoiding data inconsistency problems caused by missing dependent caches, and improving the completeness and accuracy of cache update processing.
[0143] Based on the above embodiments, as an optional embodiment, the above operation S240 may further include the following operations.
[0144] Operation S710 generates a synchronization message containing a first cached data identifier and a target cached data identifier.
[0145] Operation S720 writes the synchronization message to the message queue, so that each cache management node in the distributed cache can subscribe to the synchronization message from the message queue and perform invalidation operations on the first cache data and the target cache data under its management.
[0146] In this embodiment of the application, in the application scenario of distributed caching, cached data is typically distributed and stored across multiple cache management nodes, with each cache management node independently managing its own stored cached data. When a piece of cached data needs to be invalidated, all cache management nodes that may store that cached data need to be notified to perform an invalidation operation to ensure the consistency of data in the distributed cache.
[0147] In operation S710, synchronization messages are used to transmit cache invalidation instructions to each cache management node. The synchronization message contains a first cache data identifier and a target cache data identifier. The first cache data identifier indicates which cache data triggered the invalidation operation, and the target cache data identifier indicates which cache data needs to be invalidated.
[0148] Specifically, the first cached data identifier is a unique identifier for the cached data corresponding to the first node, and the target cached data identifier is a unique identifier for the cached data corresponding to the target node. If there are multiple target nodes, the synchronization message can contain multiple target cached data identifiers, forming a set of target cached data identifiers.
[0149] In one feasible implementation, the synchronization message can be constructed as a structured data format. The synchronization message may include a message type field, a first cached data identifier field, a target cached data identifier list field, and a timestamp field. Specifically, the message type field identifies the message as a cache invalidation synchronization message; the first cached data identifier field stores the cached data identifier corresponding to the first node; the target cached data identifier list field stores the cached data identifiers corresponding to all target nodes; and the timestamp field records the time the message was generated.
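A structured synchronization message with the four fields listed above could be sketched as a dataclass serialized to JSON. The field names, the message type string and the choice of JSON are assumptions; the text only names the fields.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class SyncMessage:
    first_cache_id: str                       # identifier of the first cached data
    target_cache_ids: list                    # identifiers of all target cached data
    message_type: str = "cache_invalidation_sync"
    timestamp: float = field(default_factory=time.time)  # message generation time

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload):
        return cls(**json.loads(payload))
```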
[0150] In operation S720, message queues are used to transmit synchronization messages to the cache management nodes, enabling the publication and subscription of cache invalidation commands. Message queues decouple senders from receivers, support asynchronous processing, and provide reliable message delivery, making them suitable for message communication scenarios in distributed systems.
[0151] Specifically, after the synchronization message is written to the message queue, each cache management node, as a message subscriber, can read the synchronization message from the message queue. Upon receiving the synchronization message, each cache management node parses the target cache data identifier in the synchronization message, determines whether it has stored the corresponding cache data, and if it has, invalidates the cache data.
[0152] Optionally, the method for executing the invalidation operation can be selected based on the type of cache storage. If the cache data is stored in local memory, the corresponding cache data object can be directly deleted from memory.
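The subscriber side can be illustrated with the sketch below. `InProcessBroker` is a toy stand-in for a real message queue that delivers each message to every subscriber synchronously, and `CacheManagementNode` models a node whose cached data lives in local memory; both class names and the message field names are hypothetical.

```python
class InProcessBroker:
    """Toy publish/subscribe stand-in for a real message queue."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


class CacheManagementNode:
    def __init__(self, entries):
        self.local_cache = dict(entries)

    def on_sync_message(self, message):
        # Invalidate the first cached data and every target held locally.
        ids = [message["first_cache_id"], *message["target_cache_ids"]]
        for cache_id in ids:
            self.local_cache.pop(cache_id, None)  # no-op when not stored here
```

Each node deletes only the entries it actually holds, so the same message can be delivered to every node without coordination between them.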
[0153] By employing the above embodiments, a synchronization message containing a first cached data identifier and a target cached data identifier is generated and written to a message queue. This allows each cache management node to subscribe to the synchronization message and perform invalidation operations, thus achieving cache invalidation synchronization in a distributed caching environment. This method decouples the sender and receiver of invalidation commands through a message queue, enabling each cache management node to asynchronously receive and process invalidation commands, avoiding the problem of slow processing by a single node affecting the overall invalidation process.
[0154] The above embodiments illustrate the cache invalidation handling process triggered by invalidation instructions. Based on the above embodiments, the following describes the process of constructing the dependency graph.
[0155] The process of constructing the dependency graph described above may also include the following operations.
[0156] Operation S810, in response to the association instruction, determines the second node in the dependency graph corresponding to the second cached data indicated by the association instruction, and the child nodes corresponding to multiple child cached data that have a dependency relationship with the second cached data.
[0157] Operation S820 establishes positive dependencies from the second node to each child node and reverse dependencies from each child node to the second node in the dependency graph.
[0158] In this embodiment, the dependency graph needs to be dynamically constructed and maintained when cached data is generated or updated. When a business service generates aggregated cached data, it will be clear which underlying cached data the cached data depends on. At this time, an association instruction can be generated to trigger the establishment of the dependency relationship.
[0159] In operation S810, the association instruction is used to indicate cached data for which a dependency relationship needs to be established in the dependency graph. The association instruction contains an identifier of the second cached data and identifiers of the multiple child cached data that the second cached data depends on. The second cached data is aggregated cached data, and the multiple child cached data are the underlying cached data referenced or depended upon by the second cached data.
[0160] Specifically, the business service can proactively generate an association instruction when generating aggregated cached data. After reading multiple pieces of underlying cached data and combining them into aggregated cached data, the business service records the identifiers of all underlying cached data referenced by the aggregated cached data, constructs an association instruction containing the identifier of the aggregated cached data and a list of the underlying cached data identifiers, and sends the association instruction to the cache update service.
[0161] After receiving the association instruction, the cache update service first determines the second node corresponding to the second cached data in the dependency graph based on the identifier of the second cached data. If the second node does not yet exist in the dependency graph, it is created in the dependency graph, using the identifier of the second cached data as the node identifier. If the second node already exists in the dependency graph, a reference to that node is directly obtained.
[0162] Similarly, based on the multiple child cached data identifiers in the association instruction, the child node in the dependency graph corresponding to each child cached data is determined one by one. For each child cached data identifier, the dependency graph is checked to see whether a corresponding node already exists. If it does not exist, a new child node is created; if it already exists, a reference to that child node is obtained.
[0163] Optionally, when a node is created, its attribute fields can be initialized at the same time, including node identifier, node type, creation time, and count value. For the second node, the node type can be marked as an aggregate node or a parent node, and the count value can be initialized to the number of child cached data. For child nodes, the node type can be marked as a base node or a leaf node.
[0164] In operation S820, after the second node and each child node are determined, dependencies between these nodes need to be established, including forward dependencies and reverse dependencies. A forward dependency points from the second node to each child node, indicating that the cached data corresponding to the second node depends on the cached data corresponding to each child node. A reverse dependency points from each child node to the second node, indicating that the cached data corresponding to each child node is referenced by the cached data corresponding to the second node.
[0165] Specifically, for the second node, a forward dependency relationship is established in the dependency graph from the second node to each of its child nodes. The node identifier of each child node can be added to the forward dependency set of the second node, forming the set of all child nodes pointed to by the second node.
[0166] For each child node, a reverse dependency relationship is established in the dependency graph from that child node to the second node. The node identifier of the second node can be added to the reverse dependency set of each child node, forming the set of all parent nodes pointing to that child node.
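Operations S810 and S820 can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the names `Node`, `DependencyGraph`, `get_or_create`, and `associate`, as well as the in-memory dictionary used to hold nodes, are assumptions introduced only for the example.

```python
from dataclasses import dataclass, field
import time


@dataclass
class Node:
    node_id: str
    node_type: str = "base"                      # "aggregate" (parent) or "base" (leaf)
    created_at: float = field(default_factory=time.time)
    count: int = 0                               # number of valid children depended on
    forward: set = field(default_factory=set)    # children this node depends on
    reverse: set = field(default_factory=set)    # parents that reference this node


class DependencyGraph:
    def __init__(self):
        self.nodes = {}

    def get_or_create(self, node_id, node_type="base"):
        # Reuse the node if it already exists, otherwise create it.
        node = self.nodes.get(node_id)
        if node is None:
            node = Node(node_id, node_type)
            self.nodes[node_id] = node
        return node

    def associate(self, second_id, child_ids):
        # S810: resolve the second node and every child node.
        second = self.get_or_create(second_id, "aggregate")
        second.node_type = "aggregate"
        for child_id in child_ids:
            child = self.get_or_create(child_id)
            # S820: forward edge parent -> child, reverse edge child -> parent.
            second.forward.add(child_id)
            child.reverse.add(second_id)
        second.count = len(second.forward)
        return second


graph = DependencyGraph()
graph.associate("page:home", ["user:1", "product:7"])
```

Keeping both edge sets on each node is what allows a later lookup in either direction without scanning the whole graph.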
[0167] Optionally, the count value of the second node can be updated while establishing dependencies. The count value represents the number of valid child nodes that the second node depends on, with an initial value equal to the total number of child cached data. Optionally, weights or priorities can be set for dependencies. Different child cached data may influence the aggregated cached data to different degrees, so a weight value can be assigned to each dependency to indicate the importance of the child cached data to the aggregated cached data. When establishing dependencies, both the child node identifier and the weight value of the dependency are recorded. In subsequent invalidation determination, the weight value can be used to decide whether the second node is invalidated immediately when a child node is invalidated, or only when the accumulated weight of invalidated child nodes reaches a certain threshold.
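The optional weight-based decision can be illustrated with a small sketch. The function name `should_invalidate_parent`, the integer weights, and the threshold value are hypothetical and chosen only for the example; the application does not specify how weights are scaled.

```python
def should_invalidate_parent(weights, invalid_children, threshold):
    """Return True once the summed weight of invalidated children reaches the threshold."""
    accumulated = sum(weights[child] for child in invalid_children)
    return accumulated >= threshold


# Hypothetical weights recorded on the dependencies of one aggregate node.
weights = {"user:1": 2, "product:7": 5, "price:7": 5}
THRESHOLD = 6

low_impact = should_invalidate_parent(weights, {"user:1"}, THRESHOLD)
combined = should_invalidate_parent(weights, {"user:1", "product:7"}, THRESHOLD)
```

Under these assumed values, a single low-weight child does not invalidate the aggregate, while the accumulated weight of two children does.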
[0168] By employing the above embodiments, the second node and the child nodes are determined in response to the association instruction, and forward and reverse dependencies between the second node and each child node are established in the dependency graph, realizing the dynamic construction and maintenance of dependencies between cached data. Because forward and reverse dependencies are established together, the dependency chain can be queried in both directions: child nodes can be found from a parent node, and parent nodes can be found from a child node. This provides complete dependency information for subsequent cache invalidation processing, improving the accuracy and efficiency of cache update processing.
[0169] Based on the above embodiments, as an optional embodiment, the above cache update method may further include the following operations.
[0170] Operation S910: based on the second node, traverse along the edges in the dependency graph to obtain at least one traversal path that includes the second node.
[0171] Operation S920: if any traversal path forms a closed loop, refuse to execute the association instruction.
[0172] In this embodiment, before establishing dependencies, loop detection is required on the dependency graph to prevent the cache invalidation process from getting stuck in an infinite loop due to circular dependencies. If establishing a dependency would create a closed loop, then the dependency is unreasonable and should be rejected.
[0173] In operation S910, a traversal path refers to a path that starts from a node in the dependency graph and visits multiple nodes in sequence along the dependency edges between the nodes. A traversal path includes each node on the path and the dependency edges between those nodes.
[0174] Specifically, starting from the second node, the traversal proceeds downwards along the forward dependency edges, visiting each child node pointed to by the second node. For each child node, the traversal continues downwards along the forward dependency edges, visiting that child node's next-level nodes. This continues until a leaf node with no children is reached, or until the number of visited nodes reaches a preset limit.
[0175] Optionally, a maximum depth limit can be set during the traversal process to prevent excessively long dependency links from causing the traversal to take too long.
[0176] In operation S920, a closed loop can be understood as a traversal path containing a repeated node. That is, the traversal starts from a certain node, proceeds along the dependency edges, and returns to that node, forming a circular path. If a closed loop exists, it indicates a circular dependency relationship between cached data, and this dependency relationship should be rejected.
[0177] Specifically, during the traversal, a set of visited nodes is maintained, recording all nodes that have been visited on the current traversal path. When a new node is visited, it is first checked whether the node already exists in the set of visited nodes. If the node already exists in the set, it means that a duplicate node has appeared in the traversal path, forming a closed loop. If the node is not in the set, it is added to the set, and the traversal continues.
[0178] Optionally, when recursively entering a node, that node is marked as being visited. When recursively returning, that node is marked as having been visited. If a node marked as being visited is visited during the traversal, it indicates that a closed loop has been formed.
[0179] If any traversal path is detected to form a closed loop, execution of the association instruction is refused, and the dependency relationship is not established in the dependency graph. An error response can be returned to the business service, indicating that a circular dependency exists and the dependency cannot be established. The error response can include information about the detected closed-loop path, which helps the business service with troubleshooting.
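The loop detection described above can be sketched as a depth-first traversal that marks nodes as "being visited" on entry and "visited" on return. The sketch below is illustrative only: `find_cycle`, `check_association`, and the shape of the returned response dictionary are assumptions, and the check is run against a tentative copy of the forward edges so that nothing is written to the graph when a loop is found.

```python
def find_cycle(forward, start):
    """Depth-first search from start along forward edges; return a closed-loop path or None."""
    VISITING, DONE = 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = VISITING          # entering: node is on the current path
        path.append(node)
        for child in sorted(forward.get(node, ())):
            if state.get(child) == VISITING:
                # Reached a node that is still on the current path: closed loop.
                return path[path.index(child):] + [child]
            if child not in state:
                loop = visit(child)
                if loop:
                    return loop
        path.pop()
        state[node] = DONE              # returning: node fully explored
        return None

    return visit(start)


def check_association(forward, second_id, child_ids):
    # Work on a copy that includes the proposed edges; the real graph is untouched.
    tentative = {k: set(v) for k, v in forward.items()}
    tentative.setdefault(second_id, set()).update(child_ids)
    loop = find_cycle(tentative, second_id)
    if loop:
        return {"accepted": False, "error": "circular dependency", "loop": loop}
    return {"accepted": True}


existing = {"page": {"widget"}, "widget": {"user"}}
rejected = check_association(existing, "user", ["page"])
accepted = check_association(existing, "page", ["banner"])
```

The recursion here is kept for readability; a very deep dependency chain would call for an iterative traversal instead.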
[0180] It should be noted that loop detection is performed before dependencies are established. The dependency establishment process proceeds only after loop detection passes. Performing loop detection first prevents circular dependencies at the source, ensuring that the dependency graph always remains a directed acyclic graph (DAG).
[0181] Optionally, the results of loop detection can be cached for a period of time. For the same association instruction, if loop detection has already been performed within the cache validity period, the cached detection result can be used directly, avoiding repeated traversal operations. The cache validity period can be set to a relatively short time to balance detection efficiency against the timeliness of dependency changes.
[0182] Optionally, a timeout period for loop detection can be set. If the traversal operation takes longer than the set timeout, the traversal operation is terminated and the association instruction is rejected. Setting a timeout period prevents loop detection on an overly complex dependency graph from running so long that it affects system response speed.
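A bounded variant of the traversal, combining the per-path visited set with a depth limit and a timeout, might look like the sketch below. The function name `bounded_loop_check`, the default limits, and the three result strings are assumptions for illustration. Treating an exceeded depth limit as a rejection is also an assumption of this sketch; the description above only states that a timeout leads to rejection.

```python
import time


def bounded_loop_check(forward, start, max_depth=32, timeout_s=1.0):
    """Return "ok", "loop", or "rejected" (depth or time budget exceeded)."""
    deadline = time.monotonic() + timeout_s
    # Each stack entry carries the node and the tuple of nodes on the path to it.
    stack = [(start, (start,))]
    while stack:
        if time.monotonic() > deadline:
            return "rejected"
        node, path = stack.pop()
        for child in forward.get(node, ()):
            if child in path:
                return "loop"            # repeated node on the current path
            if len(path) >= max_depth:
                return "rejected"        # path would exceed the depth limit
            stack.append((child, path + (child,)))
    return "ok"


chain = {"a": {"b"}, "b": {"c"}, "c": {"d"}}
ring = {"a": {"b"}, "b": {"a"}}
```

For example, `bounded_loop_check(chain, "a", max_depth=2)` gives up on the four-node chain, while the default limits let the same chain pass.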
[0183] By employing the above embodiments, before dependencies are established, the dependency graph is traversed based on the second node to obtain a traversal path containing the second node, and the traversal path is checked for a closed loop. If a closed loop is formed, the association instruction is rejected, thus avoiding the creation of circular dependencies. Through loop detection, the dependency graph retains the structure of a directed acyclic graph, preventing the cache invalidation process from falling into an infinite loop due to circular dependencies.
[0184] The above embodiments illustrate the cache invalidation handling process triggered by invalidation instructions and the dependency graph construction process triggered by association instructions. The overall architecture of the cache update system provided in the embodiments of this application is described below with reference to Figure 3. Figure 3 is a schematic diagram of the architecture of a cache update system provided in an embodiment of this application.
[0185] As shown in Figure 3, the cache update system includes a business service layer 310, a dependency management layer 320, and a distributed cache layer 330.
[0186] The business service layer 310 includes multiple business service nodes. Figure 3 illustrates, as an example, business service node A 311, business service node B 312, and business service node C 313. Each business service node is responsible for processing specific business requests, generating aggregated cached data, sending association instructions to the dependency management layer 320 when generating cached data, and sending invalidation instructions to the dependency management layer 320 when data is updated.
[0187] The dependency management layer 320 includes a dependency graph storage unit 321, a cascading invalidation unit 322, a graph maintenance unit 323, and a message synchronization unit 324.
[0188] The dependency graph storage unit 321 is used to store and query the data structure of the dependency graph, including each node and the forward and reverse dependencies between nodes.
[0189] The cascading invalidation unit 322 is used to respond to invalidation instructions, perform reverse or forward traversal in the dependency graph, determine the target nodes that need to be invalidated, and generate a synchronization message containing the identifiers of the invalidated cached data.
[0190] The graph maintenance unit 323 is used to respond to association instructions, establish new dependencies in the dependency graph, perform loop detection, and periodically perform garbage collection operations to clean up nodes with a reference count of 0 and a short validity period.
[0191] The message synchronization unit 324 is used to send the synchronization message generated by the cascading invalidation unit 322 to each cache node of the distributed cache layer 330, so as to propagate the invalidation state synchronously.
[0192] The distributed cache layer 330 includes multiple cache nodes. Figure 3 illustrates, as an example, cache node 1 331, cache node 2 332, and cache node n 333. Each cache node stores a portion of the cached data, receives synchronization messages sent by the message synchronization unit 324, and performs invalidation operations on the cached data it stores according to the cached data identifiers in the synchronization message.
[0193] In actual operation, when generating aggregated cached data, a business service node in the business service layer 310 (for example, business service node B 312) sends an association instruction to the graph maintenance unit 323 of the dependency management layer 320 via the path indicated by the arrow. After receiving the association instruction, the graph maintenance unit 323 establishes the corresponding dependency relationship in the dependency graph storage unit 321.
[0194] When a business service node in the business service layer 310 performs a data update operation or receives a cache expiration notification, it sends an invalidation instruction to the cascading invalidation unit 322 in the dependency management layer 320. The cascading invalidation unit 322 traverses the dependency relationships stored in the dependency graph storage unit 321, determines all cached data that needs to be invalidated, and passes the invalidated cached data identifier to the message synchronization unit 324.
[0195] The message synchronization unit 324 encapsulates the identifiers of the invalidated cached data into a synchronization message and sends it to cache nodes 331, 332, and 333 of the distributed cache layer 330 via the path indicated by the arrow. After receiving the synchronization message, each cache node deletes or invalidates the corresponding cached data that it stores.
[0196] By adopting the above system architecture, dependency management and cascading invalidation handling are centralized in the dependency management layer 320. The business service layer 310 and the distributed cache layer 330 do not need to maintain complex dependency logic, which reduces the coupling between layers. At the same time, the message synchronization unit 324 propagates the invalidation state synchronously among multiple cache nodes, ensuring data consistency in the distributed cache environment.
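The invalidation flow through the three layers can be illustrated with a compact sketch. Everything here is a simplified assumption: `CacheNode`, `cascade`, and `synchronize` are hypothetical names, Python's in-process `queue.Queue` stands in for the message queue, and the cascade invalidates every node reachable along reverse edges without applying the count-value condition.

```python
import queue


class CacheNode:
    """Stands in for one cache node of the distributed cache layer."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)

    def handle(self, message):
        # Drop every locally stored entry named in the synchronization message.
        for key in message["invalidated_ids"]:
            self.data.pop(key, None)


def cascade(reverse, first_id):
    """Collect first_id plus every node reachable along reverse edges."""
    seen, pending = [], [first_id]
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.append(node)
        pending.extend(sorted(reverse.get(node, ())))
    return seen


def synchronize(invalidated_ids, cache_nodes):
    # The message synchronization step: one message, delivered to every cache node.
    mq = queue.Queue()
    mq.put({"invalidated_ids": invalidated_ids})
    while not mq.empty():
        message = mq.get()
        for node in cache_nodes:
            node.handle(message)


reverse = {"user:1": {"profile:1"}, "profile:1": {"page:home"}}
nodes = [
    CacheNode("cache-1", {"user:1": "u", "page:home": "h"}),
    CacheNode("cache-2", {"profile:1": "p", "other": "x"}),
]
ids = cascade(reverse, "user:1")
synchronize(ids, nodes)
```

Neither the business service nor the cache nodes touch the dependency data in this sketch; only `cascade` reads it, which mirrors the separation of layers described above.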
[0197] Please refer to Figure 4, which is a schematic diagram illustrating the module structure of a cache update system provided in an embodiment of this application. As shown in Figure 4, the cache update system may include the following modules.
[0198] The node determination module 410 is used to determine, in response to an invalidation instruction, the first node in the dependency graph corresponding to the first cached data indicated by the invalidation instruction; the dependency graph is a data structure of data dependencies, including multiple nodes connected by edges; the edges represent the dependencies between nodes, and the nodes represent cached data.
[0199] The relationship determination module 420 is used to determine the target node that has a dependency relationship with the first node.
[0200] The invalidation handling module 430 is used to determine the first cached data corresponding to the first node and the target cached data corresponding to the target node as invalidated.
[0201] The state synchronization module 440 is used to synchronize the first cached data and the target cached data to the distributed cache after determining that they are in an invalid state.
[0202] According to an embodiment of this application, the dependency relationship includes a positive dependency relationship; the positive dependency relationship represents the dependency relationship between a parent node and a child node; the relationship determination module 420 is further used to determine the child node of the first node based on the positive dependency relationship; and determine the child node of the first node as the target node.
[0203] According to an embodiment of this application, the dependency relationship includes a reverse dependency relationship, which represents the dependency relationship between a child node and its parent node; the relationship determination module 420 is further configured to determine the parent node of the first node based on the reverse dependency relationship; if the count value of the parent node of the first node is 0, then the parent node of the first node is determined as the target node; the count value represents the number of dependencies between a child node and its parent node.
[0204] According to an embodiment of this application, the cache update system further includes a garbage collection module, which is used to determine whether the effective duration of the cached data corresponding to the target node is less than or equal to a preset threshold in response to the count value of the target node being 0; if the effective duration of the cached data corresponding to the target node is less than or equal to the preset threshold, the cached data corresponding to the target node is marked as invalid and the target node is deleted.
[0205] According to an embodiment of this application, the relationship determination module 420 is further configured to determine the parent node and/or child node of the target node based on the dependency relationship of the target node; determine whether the parent node and/or child node of the target node meets the invalidation condition; and, if the invalidation condition is met, determine the parent node and/or child node of the target node as a new target node.
[0206] According to an embodiment of this application, the state synchronization module 440 is further configured to generate a synchronization message containing a first cached data identifier and a target cached data identifier; write the synchronization message into a message queue so that each cache management node in the distributed cache subscribes to the synchronization message from the message queue and performs invalidation operations on the first cached data and the target cached data it manages.
[0207] According to an embodiment of this application, the cache update system further includes an association processing module, which is used to respond to an association instruction to determine the second node corresponding to the second cache data indicated by the association instruction in the dependency graph, and the child nodes corresponding to multiple child cache data that have a dependency relationship with the second cache data; and to establish a forward dependency relationship from the second node to each child node and a reverse dependency relationship from each child node to the second node in the dependency graph.
[0208] According to an embodiment of this application, the cache update system further includes a loop detection module, which is used to traverse along the edges of the dependency graph based on the second node to obtain at least one traversal path containing the second node, and to refuse to execute the association instruction if any traversal path forms a closed loop.
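The count-value condition and the garbage collection condition used by the modules above can be sketched together. The sketch rests on several assumptions that are not stated in this section: that a parent's count value is decremented when one of its child nodes is invalidated, that "effective duration" means the remaining time until expiry, and that nodes are plain dictionaries built by the hypothetical helper `make_node`.

```python
def make_node(count=0, forward=(), reverse=(), expires_at=0.0):
    return {"count": count, "forward": set(forward),
            "reverse": set(reverse), "expires_at": expires_at}


def parents_to_invalidate(graph, first_id):
    """Decrement each parent's count; parents whose count reaches 0 become target nodes."""
    targets = []
    for parent_id in sorted(graph[first_id]["reverse"]):
        parent = graph[parent_id]
        if parent["count"] > 0:
            parent["count"] -= 1
        if parent["count"] == 0:
            targets.append(parent_id)
    return targets


def collect_if_stale(graph, target_id, now, threshold_s):
    """Delete a target node with count 0 whose remaining validity is within the threshold."""
    node = graph[target_id]
    remaining = node["expires_at"] - now
    if node["count"] != 0 or remaining > threshold_s:
        return False
    # Remove dangling reverse references before deleting the node itself.
    for child_id in node["forward"]:
        if child_id in graph:
            graph[child_id]["reverse"].discard(target_id)
    del graph[target_id]
    return True   # caller marks the corresponding cached data as invalid


graph = {
    "user:1": make_node(reverse={"card:1", "page:home"}),
    "card:1": make_node(count=1, forward={"user:1"}, expires_at=105.0),
    "page:home": make_node(count=2, forward={"user:1", "product:7"}, expires_at=500.0),
    "product:7": make_node(reverse={"page:home"}),
}
targets = parents_to_invalidate(graph, "user:1")
removed = collect_if_stale(graph, "card:1", now=100.0, threshold_s=10.0)
```

In this example only `card:1` becomes a target node, because `page:home` still depends on one valid child after `user:1` is invalidated.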
[0209] This application also discloses an electronic device, including:
[0210] a memory, used to store computer instructions; and
[0211] a processor, wherein the computer instructions are loaded by the processor and are used to: in response to an invalidation instruction, determine the first node in the dependency graph corresponding to the first cached data indicated by the invalidation instruction; the dependency graph is a data structure of data dependencies, including multiple nodes connected by edges; the edges represent the dependencies between nodes, and the nodes represent cached data; determine the target node that has a dependency relationship with the first node; determine the first cached data corresponding to the first node and the target cached data corresponding to the target node as invalidated; and synchronize the first cached data and the target cached data, which are determined as invalidated, to the distributed cache.
[0212] Figure 5 is a block diagram of an electronic device provided in an embodiment of this application. The electronic device shown in Figure 5 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.
[0213] As shown in Figure 5, an electronic device according to an embodiment of this application includes a processor 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a memory 508 into a random access memory (RAM) 503. The processor 501 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), etc. The processor 501 may also include onboard memory for caching purposes. The processor 501 may include a single processing unit or multiple processing units for performing different actions of the method flow according to an embodiment of this application.
[0214] RAM 503 stores various programs and data required for the operation of the electronic device. Processor 501, ROM 502, and RAM 503 are interconnected via bus 504. Processor 501 executes various operations of the method flow according to embodiments of this application by executing programs in ROM 502 and / or RAM 503. It should be noted that the programs may also be stored in one or more memories other than ROM 502 and RAM 503. Processor 501 may also execute various operations of the method flow according to embodiments of this application by executing programs stored in said one or more memories.
[0215] According to embodiments of this application, the electronic device may further include an input/output (I/O) interface 505, which is also connected to the bus 504. The electronic device may also include one or more of the following components connected to the I/O interface 505: an input device 506 including a keyboard, mouse, etc.; an output device 507 including a cathode ray tube (CRT), liquid crystal display (LCD), or other display screen, and a speaker, etc.; a memory 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card, modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is installed on the drive 510 as needed so that computer programs read from it can be installed into the memory 508 as needed.
[0216] The embodiments of this application have been described above. However, these embodiments are merely illustrative and not intended to limit the scope of this application. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. The scope of this application is defined by the appended claims and their equivalents. Various substitutions and modifications can be made by those skilled in the art without departing from the scope of this application, and all such substitutions and modifications should fall within the scope of this application.
Claims
1. A cache update method, comprising: in response to an invalidation instruction, determining a first node in a dependency graph corresponding to first cached data indicated by the invalidation instruction, wherein the dependency graph is a data structure of data dependencies and includes multiple nodes connected by edges, the edges representing dependencies between nodes and the nodes representing cached data; determining a target node that has a dependency relationship with the first node; determining the first cached data corresponding to the first node and the target cached data corresponding to the target node as invalidated; and synchronizing the first cached data and the target cached data, determined as invalidated, to a distributed cache.
2. The method according to claim 1, wherein the dependency relationship includes a positive dependency relationship, the positive dependency relationship representing the dependency relationship between a parent node and a child node; and determining the target node that has a dependency relationship with the first node comprises: determining a child node of the first node based on the positive dependency relationship; and determining the child node of the first node as the target node.
3. The method according to claim 1, wherein the dependency relationship includes a reverse dependency relationship, the reverse dependency relationship representing the dependency relationship between a child node and a parent node; and determining the target node that has a dependency relationship with the first node comprises: determining a parent node of the first node based on the reverse dependency relationship; and, if the count value of the parent node of the first node is 0, determining the parent node of the first node as the target node, wherein the count value represents the number of dependencies between the child node and the parent node.
4. The method according to claim 3, further comprising: in response to the count value of the target node being 0, determining whether the effective duration of the cached data corresponding to the target node is less than or equal to a preset threshold; and, if the effective duration of the cached data corresponding to the target node is less than or equal to the preset threshold, marking the cached data corresponding to the target node as invalid and deleting the target node.
5. The method according to claim 2 or 3, further comprising: determining a parent node and/or child node of the target node based on the dependency relationship of the target node; determining whether the parent node and/or child node of the target node meets an invalidation condition; and, if the invalidation condition is met, determining the parent node and/or child node of the target node as a new target node.
6. The method according to claim 1, wherein synchronizing the first cached data and the target cached data, determined as invalidated, to the distributed cache comprises: generating a synchronization message containing a first cached data identifier and a target cached data identifier; and writing the synchronization message to a message queue, so that each cache management node in the distributed cache subscribes to the synchronization message from the message queue and performs invalidation operations on the first cached data and the target cached data that it manages.
7. The method according to claim 1, wherein the invalidation instruction is used to indicate that at least one of the following cached data is determined as invalidated: first cached data determined based on a data update operation; first cached data determined based on expiry in the distributed cache; first cached data determined based on a database change record; and first cached data determined based on a node in the dependency graph whose effective duration is less than or equal to a preset threshold.
8. The method according to claim 1, further comprising: in response to an association instruction, determining a second node in the dependency graph corresponding to second cached data indicated by the association instruction, and child nodes corresponding to multiple child cached data that have a dependency relationship with the second cached data; and establishing, in the dependency graph, a forward dependency relationship from the second node to each of the child nodes and a reverse dependency relationship from each of the child nodes to the second node.
9. The method according to claim 8, further comprising: based on the second node, traversing along the edges of the dependency graph to obtain at least one traversal path that includes the second node; and, if any of the traversal paths forms a closed loop, refusing to execute the association instruction.
10. A cache update system, comprising: a node determination module, used to determine, in response to an invalidation instruction, a first node in a dependency graph corresponding to first cached data indicated by the invalidation instruction, wherein the dependency graph is a data structure of data dependencies and includes multiple nodes connected by edges, the edges representing dependencies between nodes and the nodes representing cached data; a relationship determination module, used to determine a target node that has a dependency relationship with the first node; an invalidation handling module, used to determine the first cached data corresponding to the first node and the target cached data corresponding to the target node as invalidated; and a state synchronization module, used to synchronize the first cached data and the target cached data, determined as invalidated, to a distributed cache.