Forest for spatial optimization of graph databases

By using a space-optimized forest in a graph database system, the data entries of highly active users are split out into separate tree graphs. This reduces concurrent write conflicts without the storage waste of giving every user its own tree graph, yielding efficient storage and query performance.

CN122196235A · Pending · Publication Date: 2026-06-12 · FACE CUTE CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
FACE CUTE CO LTD
Filing Date
2025-10-10
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

In graph database systems with heavy concurrent writes, such as those serving social media platforms, traditional tree-graph storage schemes suffer from concurrent write conflicts (when a single tree graph is shared) or wasted storage space (when every user gets a separate tree graph).

Method used

A space-optimized forest graph database system splits the data entries of highly active users into separate tree graphs so that their query requests are processed independently, which reduces write conflicts while keeping storage utilization high.

Benefits of technology

It effectively reduces concurrent write conflicts, improves the concurrent write throughput of the database system, optimizes storage space utilization, and reduces storage costs.

Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122196235A_ABST

Abstract

Embodiments of the present disclosure relate to a space-optimized forest for graph databases. Implementations of a space-optimized graph database system are provided. One implementation includes a computing system for implementing a graph database system, the computing system comprising: processing circuitry and a memory storing instructions that, when executed, cause the processing circuitry to: store a graph database comprising an initial tree graph, the initial tree graph storing a plurality of data entries, each data entry comprising a respective field identifier; receive a query to update the graph database, wherein the query comprises a request to add a new data entry; determine, based on one or more predetermined criteria, a split event to be performed; generate, by splitting out a subset of the plurality of data entries of the initial tree graph, a new tree graph corresponding to the field identifier of the new data entry, the subset of the plurality of data entries comprising all data entries of the initial tree graph corresponding to the field identifier of the new data entry; and update the new tree graph in accordance with the query.

Description

Technical Field

[0001] This disclosure relates to the field of computer science and, more specifically, to a space-optimized forest for graph databases.

Background

[0002] Graph databases are databases that use a graph structure to represent and store data. A graph consists of nodes, edges, and attributes, which describe data entries and the relationships between them. This structure allows graph algorithms to analyze relationships between data in ways that are difficult with other methods, and as graph connectivity and data volume grow, graph algorithms become increasingly powerful tools for analyzing and using data cost-effectively. For example, relationship queries in a graph database can use graph traversal algorithms that exploit the connectivity of the graph, which can be more efficient than equivalent queries in a relational database. Paths, distances between nodes, and the clustering properties of nodes provide intuitive indicators of various properties of the data. Because the graph stores relationships explicitly, queries and algorithms over graph components can execute quickly, whereas a traditional relational database must compute relationships at query time through multiple basic operations (such as joins).

Summary of the Invention

[0003] This summary is provided to introduce, in a simplified form, some concepts further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to embodiments that address any or all the shortcomings pointed out in any part of this disclosure.

[0004] Implementations of a space-optimized graph database system are provided. One implementation includes a computing system for implementing the graph database system, the computing system comprising processing circuitry and a memory storing instructions that, when executed, cause the processing circuitry to: store a graph database including an initial tree graph, the initial tree graph storing multiple data entries, each data entry including a corresponding field identifier; receive a query to update the graph database, wherein the query includes a request to add a new data entry; determine, based on one or more predetermined criteria, a split event to be performed; generate a new tree graph corresponding to the field identifier of the new data entry by splitting out a subset of the multiple data entries of the initial tree graph, wherein the subset includes all data entries of the initial tree graph that correspond to the field identifier of the new data entry; and update the new tree graph according to the query.

Brief Description of the Drawings

[0005] Figure 1 shows a schematic diagram of an example computing system for implementing a graph database using a space-optimized forest graph.

[0006] Figure 2 shows a schematic diagram of an example Bw tree for storing data, which can be implemented using the example computing system of Figure 1.

[0007] Figure 3 shows a schematic diagram of a graph database implemented using a single Bw tree with multiple added incremental records.

[0008] Figures 4A and 4B show schematic diagrams of an example graph database implemented using a space-optimized Bw tree forest graph, which can be implemented using the example computing system of Figure 1.

[0009] Figure 5 shows a process flowchart of an example method for implementing a graph database using a space-optimized forest graph, which can be implemented using the example computing system of Figure 1.

[0010] Figure 6 shows a process flowchart of an example method for implementing a graph database for a social media platform using a space-optimized forest graph, which can be implemented using the example computing system of Figure 1.

[0011] Figure 7 shows a schematic diagram of an example computing system that can implement one or more of the methods and processes described herein.

Detailed Description

[0012] Graph databases can store large-scale graph data for a variety of applications. Graph databases are typically implemented using tree structures, which provide efficient querying, insertion, and deletion. Various types of tree graphs have been considered for database systems, including but not limited to binary trees, m-ary trees, B-trees, B+ trees, and Bw trees. Different applications have different design considerations that influence how the database is implemented; for example, concurrency can be a critical factor for applications that receive constant update requests. In some tree graph databases (e.g., Bw tree graph databases), data entries are stored in sorted order on leaf nodes, which enables fast querying. When multiple requests to update the same leaf node are received simultaneously, write conflicts may occur, leading to retries and waiting times.

[0013] In some applications, concurrent write conflicts are largely unavoidable. One example is a database for a social media platform, in which various aspects associated with each user are stored and updated dynamically. On social media platforms, many different interactions between users and media content can usefully be stored for various purposes. For example, information about one user "subscribing to" or "following" another user can be stored. As another example, information about each user's preferences for media content can be stored, which can support functions such as recommendations based on those preferences. A direct way to capture such preferences is to let users perform a "like" action on media content (e.g., clicking "like" on an image or video) and to record each such action as it occurs. In traditional database systems, like actions are typically stored in a single tree graph for efficiency. However, on a sufficiently popular social media platform, likes performed by many different users generate a constant stream of update requests to that tree graph, resulting in large-scale concurrent write conflicts that can substantially reduce the concurrent write throughput of the database system.

[0014] Reads and writes to different tree graphs are completely independent and do not interfere with each other. Thus, dividing the stored likes into separate tree graphs, one per user, can solve the problem of concurrent write conflicts: because a single user rarely performs two likes at exactly the same time, this storage scheme significantly reduces the risk of access conflicts within each tree graph. However, this approach wastes storage space. In this example, user activity on a social media platform typically follows a power-law distribution, with a few extremely active users and a majority of much less active ordinary users. In graph database systems, back-end storage for tree graphs is typically allocated in blocks, and to use these storage units efficiently, a single tree graph is expected to hold a large number of data entries. If a separate tree graph is allocated for every user, block-based allocation wastes significant space for the large majority of ordinary users, both through storage holes (partially filled blocks) in leaf nodes and through the memory overhead of maintaining additional data structures such as internal nodes and mapping tables.
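The storage argument above can be made concrete with a little block arithmetic. The sketch below is a hypothetical illustration, not part of the disclosure: the 4096-byte block, the 16-byte entry, the one-block per-tree overhead, and the skewed user population are all assumed values, chosen only to show the shape of the trade-off between one shared tree, one tree per user, and a hybrid in which only users above an activity threshold get their own tree.

```python
import math

# Hypothetical parameters chosen for illustration; none of these values come from the disclosure.
BLOCK_SIZE = 4096          # bytes per storage block
ENTRY_SIZE = 16            # bytes per like entry (e.g., 8-byte user ID + 8-byte media ID)
TREE_OVERHEAD_BLOCKS = 1   # assumed fixed cost per tree (root/internal nodes, mapping table)


def blocks_for(entries: int) -> int:
    """Leaf blocks needed for `entries` entries; every tree occupies at least one block."""
    return max(1, math.ceil(entries * ENTRY_SIZE / BLOCK_SIZE))


def per_user_trees(likes_per_user: list[int]) -> int:
    """Total blocks when every user gets a dedicated tree graph."""
    return sum(blocks_for(n) + TREE_OVERHEAD_BLOCKS for n in likes_per_user)


def shared_tree(likes_per_user: list[int]) -> int:
    """Total blocks when all users share one tree graph."""
    return blocks_for(sum(likes_per_user)) + TREE_OVERHEAD_BLOCKS


def hybrid_forest(likes_per_user: list[int], active_threshold: int) -> int:
    """Dedicated trees only for users above the threshold; everyone else shares one tree."""
    active = [n for n in likes_per_user if n > active_threshold]
    inactive = [n for n in likes_per_user if n <= active_threshold]
    return per_user_trees(active) + shared_tree(inactive)


# Skewed population: 10 very active users and 9,990 users with 3 likes each.
population = [5000] * 10 + [3] * 9990
print(per_user_trees(population))       # 20190
print(shared_tree(population))          # 314
print(hybrid_forest(population, 100))   # 329
```

Under these assumed numbers, per-user trees need roughly 64 times the blocks of a single shared tree, almost entirely because each of the 9,990 light users pins two mostly empty blocks, while the hybrid layout stays close to the shared-tree cost.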

[0015] Based on these observations, implementations of a database system that uses a space-optimized forest graph are provided. A space-optimized forest graph can be used as a storage engine in various ways. In some implementations, the database system includes an initial tree graph for storing data entries. Each data entry may be associated with a field identifier, such as a user identifier (e.g., username, account name, or account number). Based on the frequency of query requests, data entries associated with highly active field identifiers can be split out of the initial tree graph and stored in a new, separate tree graph. The database system may include a hash table that stores, as key-value pairs, field identifiers and pointers to their corresponding tree graphs. In the social media example above, a space-optimized forest graph can start with an initial tree graph that stores likes from all users, including new users. When a user is determined to be highly active (e.g., the user's query request rate or number of likes exceeds a predetermined threshold), the stored likes associated with that user can be split out into a separate tree graph. This approach can reduce write conflicts in the initial tree graph without the storage cost of a separate tree graph for every user.

[0016] Turning now to the accompanying figures, implementations of a database system that uses space-optimized forest graphs are described in further detail. Figure 1 shows a schematic diagram of an example computing system 100 for implementing a graph database 102 using a space-optimized forest graph 104. The example computing system 100 includes processing circuitry 106 and a memory 108 storing instructions that, when executed, cause the processing circuitry 106 to perform the processes described herein. The example computing system 100 can be implemented using various types of computing devices, including but not limited to personal computers, servers, and mobile devices. For example, the computing system 100 may include multiple computing devices, and the processing circuitry 106 and memory 108 may each include multiple components distributed across those devices (e.g., the processing circuitry 106 may include multiple processors within a single device or distributed across multiple devices). The devices may be located locally or remotely. In some embodiments, the computing system 100 is implemented as a cloud storage server. The example computing system 100 may also include components, not depicted, that provide various functionalities, including components on the respective computing devices.

[0017] The example computing system 100 includes a space-optimized graph database module 110, which implements the graph database 102 and serves query requests 112 made on the graph database 102. Query requests 112 can include any type of database query, including queries for storing, manipulating, and/or retrieving data. Module 110 can implement the graph database 102 in various ways. In the example shown, the graph database 102 is implemented using a tree structure. Various types of tree structures can be used, including but not limited to binary trees, B-trees, B+ trees, and Bw trees.

[0018] Module 110 can implement the graph database 102 by initializing the space-optimized forest graph 104 with an initial tree graph for storing data entries (e.g., key-value pairs). When certain predetermined criteria are met, new tree graphs can be generated in the space-optimized forest graph 104 by splitting a portion off the initial tree graph. Module 110 further initializes the graph database 102 with a hash table 114 that stores identifiers and pointers to the individual tree graphs within the forest graph 104. The space-optimized forest graph 104 can be implemented in various ways. In Figure 1, each tree graph in the space-optimized forest graph 104 is depicted logically as connected nodes, where leaf nodes point to data blocks containing the stored data entries. Physically, each tree graph within the space-optimized forest graph 104 can be implemented using a mapping table that stores node identifiers and pointers to the physical addresses of the corresponding nodes. Various other designs can also be used.

[0019] Upon receiving a query request 112 that includes a request to add a new data entry to the graph database 102, module 110 determines the tree graph in the space-optimized forest graph 104 to which the new data entry should be added. Initially, the space-optimized forest graph 104 includes only the initial tree graph, to which all new data entries are added. As query requests become more frequent, module 110 determines split events to be performed, which split the initial tree graph into a forest graph of multiple trees. Read and write requests to different tree graphs can be completely independent and do not interfere with each other. Thus, splitting the initial tree graph according to the frequency of certain access requests can help mitigate future concurrent write conflicts.

[0020] Module 110 can determine split events in various ways. In some implementations, a split event is performed when predetermined criteria associated with the field identifier of a new data entry are met. To mitigate concurrent write conflicts, the field identifier associated with the source of an update request can be used to determine whether the new entry should be added to the initial tree graph. If the source of the update request is determined to be a high-activity source, module 110 can perform a split event that splits the data entries associated with that source out of the initial tree graph, forming a separate tree graph. The update request can then be performed on the new, separate tree graph. In other implementations, the update request is performed on the initial tree graph before the split event. Subsequent queries issued by high-activity sources can then use the separate tree graph instead of the initial tree graph, while queries issued by the remaining (low-activity) sources continue to use the initial tree graph, which should now see fewer concurrent write conflicts. In some implementations, module 110 continuously monitors the graph database 102 to determine split events. In some implementations, module 110 determines whether to perform a split event in response to receiving a request to add a new data entry to the graph database 102.
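The split-on-insert flow described above can be sketched in Python as follows. This is a minimal, single-threaded illustration under stated assumptions: each tree graph is modeled as a sorted Python list rather than a real B-tree or Bw tree, the hash table is a dict, and the predetermined criterion is assumed to be a simple count of existing entries per field identifier. The names TreeGraph, SpaceOptimizedForest, and split_threshold are hypothetical. The usage at the end reproduces the transitions of Figures 4A and 4B, with user A and then user B being split out.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class TreeGraph:
    """Stand-in for one tree graph: (field_id, value) entries kept in sorted order."""
    name: str
    entries: list = field(default_factory=list)

    def insert(self, field_id: str, value: str) -> None:
        bisect.insort(self.entries, (field_id, value))


class SpaceOptimizedForest:
    """Hypothetical forest: an initial tree plus split-out trees, routed by a hash table."""

    def __init__(self, split_threshold: int):
        self.initial = TreeGraph("initial")
        self.hash_table: dict[str, TreeGraph] = {}  # field identifier -> tree graph
        self.split_threshold = split_threshold      # assumed criterion: entries per identifier

    def tree_for(self, field_id: str) -> TreeGraph:
        # New identifiers get an entry pointing at the initial tree, like B and C in Figure 4A.
        return self.hash_table.setdefault(field_id, self.initial)

    def _should_split(self, field_id: str) -> bool:
        if self.tree_for(field_id) is not self.initial:
            return False  # already has its own tree graph
        count = sum(1 for fid, _ in self.initial.entries if fid == field_id)
        return count >= self.split_threshold

    def _split(self, field_id: str) -> None:
        # Move *all* of this identifier's entries out of the initial tree into a new tree.
        new_tree = TreeGraph(f"tree[{field_id}]")
        new_tree.entries = [e for e in self.initial.entries if e[0] == field_id]
        self.initial.entries = [e for e in self.initial.entries if e[0] != field_id]
        self.hash_table[field_id] = new_tree  # repoint the hash-table entry

    def add(self, field_id: str, value: str) -> None:
        """Serve a query that adds one data entry, running a split event first if warranted."""
        if self._should_split(field_id):
            self._split(field_id)
        self.tree_for(field_id).insert(field_id, value)


forest = SpaceOptimizedForest(split_threshold=2)
# Figure 4A: user A becomes highly active and is split out.
for user, media in [("A", "v1"), ("B", "v2"), ("A", "v3"), ("C", "v4"), ("A", "v5")]:
    forest.add(user, media)
print(forest.tree_for("A").entries)  # [('A', 'v1'), ('A', 'v3'), ('A', 'v5')]
print(forest.initial.entries)        # [('B', 'v2'), ('C', 'v4')]

# Figure 4B: user B also crosses the threshold and gets its own tree.
forest.add("B", "v6")
forest.add("B", "v7")
print(forest.initial.entries)        # [('C', 'v4')]
```

This sketch splits before applying the update; the alternative in which the update is applied to the initial tree graph first would simply swap the order of the two steps in add().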

[0021] Graph database 102 can be implemented for various applications. Returning to the social media platform example, graph database 102 can store likes performed by users of the platform. In this case, each new data entry to be stored represents a like performed by a given user and can include, for example, the user identifier of the user who performed the like, the media content that was liked, and various other properties (e.g., time and date). Initially, the initial tree graph can store likes performed by all users, including new users. As access requests become more frequent, module 110 can determine to split off the portion of the initial tree graph associated with highly active users. Future access requests (future likes) from highly active users are then directed to their new tree graphs, independently of queries to the initial tree graph. It will be readily understood that graph database 102 can be implemented to store any type of data. In some implementations, graph database 102 stores relationships between users (e.g., subscribers, followers).

[0022] Criteria for executing split events can be determined in various ways. In some implementations, a split event is determined when a user's access frequency and/or the number of likes performed by the user exceeds a predetermined threshold. In a further implementation, the predetermined threshold includes a threshold like rate that exceeds the rate of approximately 80% of all users. Additionally or alternatively, the predetermined threshold may include a threshold number of likes that exceeds the number performed by approximately 80% of all users. It will be readily understood that any other percentage threshold can also be used.
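The 80% criterion above can be read as a percentile rule: a user qualifies when their like rate (or like count) is higher than that of about 80% of all users. The sketch below assumes that reading; the function name is_highly_active and the sample rates are hypothetical. The same function works unchanged for like counts instead of rates.

```python
def is_highly_active(user_rate: float, all_rates: list[float], fraction: float = 0.8) -> bool:
    """True if user_rate is higher than the rates of at least `fraction` of all users."""
    if not all_rates:
        return False
    lower = sum(1 for rate in all_rates if rate < user_rate)
    return lower / len(all_rates) >= fraction


# Hypothetical like rates (likes per hour) for five users.
rates = {"A": 120.0, "B": 2.0, "C": 1.0, "D": 0.5, "E": 3.0}
flagged = sorted(user for user, rate in rates.items() if is_highly_active(rate, list(rates.values())))
print(flagged)  # ['A']
```

A percentile threshold adapts automatically as overall platform activity grows, whereas a fixed absolute threshold would need retuning; which one the disclosure intends is not stated beyond the 80% figure.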

[0023] As mentioned above, various types of tree structures can be used for space-optimized forest databases, and different tree designs offer different advantages. For example, the Bw tree is well suited to modern hardware such as multi-core processors and flash storage. Figure 2 shows a schematic diagram of an example Bw tree 200 for storing data. The Bw tree 200 is organized using a mapping table 202, which records the physical locations of the nodes (also referred to as pages) of the Bw tree 200. Mapping table 202 includes an identifier column listing the identifiers of the nodes of the Bw tree 200 and a pointer column listing the corresponding pointers to the physical locations of those nodes. The example Bw tree 200 includes at least a root node N1, internal nodes N2 and N3, and a leaf node N4. In the example shown, data entries 204 are stored at leaf node N4, which includes at least three data entries D1-D3. In other embodiments, each leaf node of the Bw tree includes a pointer to a data block storing the data entries.

[0024] Data entries 204 can be in any format. In some implementations, each data entry is a key-value pair. In the social media platform example described above, each data entry 204 may describe a user's "like" action on media content; for example, it may include a user identifier and a media content identifier. In some implementations, the user identifier is stored as the key and the media content identifier as the value. In other implementations, the user identifier and the media content identifier together form the key, and other characteristics are stored as values. These characteristics may include any type of information, such as the time at which the "like" action was performed.

[0025] In some implementations, the Bw tree 200 stores the edge information of a graph in which nodes represent users and media content, and an edge between two nodes represents a user (the first node) performing a "like" action on media content (the second node). In this case, whenever a user likes a piece of media content, an edge is established between the node representing the user and the node representing the media content. This edge identifies its source and target nodes and can be stored in the Bw tree 200 as the key of a data entry representing the like. Edge characteristics (e.g., the time the like was performed) can be stored as the value associated with the key. Additionally or alternatively, the Bw tree 200 can store information describing relationships between users (e.g., subscribers, followers). For example, the Bw tree 200 can store edges between user nodes rather than edges between user nodes and media content nodes. Directed edges can be used to identify follower-followee relationships. In some implementations, the direction of the relationship is defined by how the information is stored (e.g., the first node indicates the follower).
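To show why key choice matters for splitting, the sketch below assumes one of the encodings described above: the key is the composite (user identifier, media content identifier) and the value is the like timestamp. Because keys sort by user identifier first, all of one user's likes are contiguous and can be located with a single range scan, which is what a split event needs. The identifiers, timestamps, and function names are hypothetical, and a sorted list stands in for a tree's leaf level.

```python
import bisect

# Hypothetical like entries: key = (user_id, media_id), value = like timestamp in epoch seconds.
likes = sorted([
    (("u2", "m9"), 1_700_000_300),
    (("u1", "m7"), 1_700_000_200),
    (("u1", "m4"), 1_700_000_100),
    (("u3", "m1"), 1_700_000_400),
])


def likes_of(entries, user_id):
    """Range-scan one user's (media_id, timestamp) pairs; keys sort by user_id first."""
    keys = [key for key, _ in entries]
    start = bisect.bisect_left(keys, (user_id, ""))  # "" sorts before every media_id
    result = []
    for (uid, media_id), timestamp in entries[start:]:
        if uid != user_id:
            break  # past the contiguous run for this user
        result.append((media_id, timestamp))
    return result


print(likes_of(likes, "u1"))  # [('m4', 1700000100), ('m7', 1700000200)]

# Directed follow edges between users: the first node is the follower, the second the followee.
follows = sorted([("u3", "u1"), ("u1", "u2"), ("u3", "u2")])
print([dst for src, dst in follows if src == "u3"])  # ['u1', 'u2']
```

If instead the key were (media_id, user_id), one user's likes would be scattered across the tree, and splitting them out would require a full scan.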

[0026] Bw trees are designed to provide several features. As in B+ trees, data entries are stored in leaf nodes in sorted order, while internal nodes hold information that guides the search for specific data entries. Bw tree nodes are implemented as logical pages and therefore do not have a fixed size. For example, in Figure 2, mapping table 202 provides the physical locations of the nodes (pages) of the Bw tree 200. This allows nodes to reside anywhere in memory and allows the size of a given node to change. Another significant feature of the Bw tree is its update process. Instead of modifying the tree in place, an update is made by adding an incremental (delta) record that describes the update to an existing page: the new incremental record points to the physical address of the page, and the mapping-table pointer for the page is redirected to the incremental record.

[0027] Redirecting the pointer can be performed with an atomic operation (e.g., compare-and-swap). In this way, if several concurrent attempts try to add different incremental records to the same page, only one succeeds. Failed attempts can be retried, and a retry adds its incremental record on top of the previously added incremental record. After several updates, a chain of incremental records forms, and as the chain grows, search performance may degrade. To address this, page consolidation can be performed periodically, creating a new base page that applies the updates recorded in the chain of incremental records.
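The incremental-record mechanism of paragraphs [0026] and [0027] can be simulated in a few dozen lines. This is a simplified, hypothetical model, not the Bw tree's actual implementation: a lock-protected compare_and_swap method stands in for the hardware atomic instruction, pages only support inserts of unique keys, and the page ID and keys are made up. It shows a chain of delta records forming, a stale writer losing the compare-and-swap (and therefore needing a retry), and consolidation into a new base page.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePage:
    """Consolidated page: sorted (key, value) data entries."""
    entries: tuple


@dataclass(frozen=True)
class DeltaInsert:
    """Incremental (delta) record describing one insert, prepended to an existing page state."""
    key: str
    value: str
    next_page: object  # BasePage or another DeltaInsert


class MappingTable:
    """Maps logical page IDs to the current head of each page's delta chain."""

    def __init__(self):
        self._slots = {}
        self._lock = threading.Lock()  # stands in for a hardware compare-and-swap instruction

    def get(self, pid):
        return self._slots[pid]

    def install(self, pid, page):
        self._slots[pid] = page

    def compare_and_swap(self, pid, expected, new):
        """Atomically repoint pid to new only if it still points to expected."""
        with self._lock:
            if self._slots[pid] is not expected:
                return False
            self._slots[pid] = new
            return True


def insert(table, pid, key, value):
    """Prepend a delta record, retrying if another writer won the race; returns attempts used."""
    attempts = 0
    while True:
        attempts += 1
        head = table.get(pid)
        if table.compare_and_swap(pid, head, DeltaInsert(key, value, head)):
            return attempts


def chain_length(page):
    """Number of delta records in front of the base page."""
    length = 0
    while isinstance(page, DeltaInsert):
        length += 1
        page = page.next_page
    return length


def consolidate(table, pid):
    """Fold the delta chain into a new base page; returns False if a writer got in first."""
    head = table.get(pid)
    page, added = head, []
    while isinstance(page, DeltaInsert):
        added.append((page.key, page.value))
        page = page.next_page
    merged = tuple(sorted(page.entries + tuple(added)))
    return table.compare_and_swap(pid, head, BasePage(merged))


# Hypothetical leaf page N_X with page ID 1, holding one entry for user A.
table = MappingTable()
table.install(1, BasePage((("A", "m1"),)))
for key, value in [("B", "m2"), ("C", "m3"), ("A", "m4")]:
    insert(table, 1, key, value)
print(chain_length(table.get(1)))  # 3

stale = table.get(1)               # writer 1 reads the current head...
insert(table, 1, "B", "m5")        # ...but writer 2 installs its delta first
print(table.compare_and_swap(1, stale, DeltaInsert("C", "m6", stale)))  # False, so writer 1 must retry

print(consolidate(table, 1))       # True
print(table.get(1).entries)
```

Because consolidation also goes through compare-and-swap, a delta installed between reading the head and swapping in the new base page makes consolidation fail rather than silently dropping that update.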

[0028] While incremental-record updates preserve database integrity, system performance can suffer from retries and latency when many concurrent attempts to add incremental records occur. Using the social media platform example again, a Bw tree database could be implemented to store user "likes." Traditionally, likes and similar data are stored in a single Bw tree. Figure 3 shows a schematic diagram of a graph database implemented using a single Bw tree 300 with multiple added incremental records. The Bw tree 300 stores data entries, each including a field identifier key and an associated value. For example, the field identifier key could be the user identifier of the user who performed the like, and the value could be the identifier of the liked media content. In some implementations, the field identifier key describes both the user and the media content, and the value field describes other characteristics, such as the time at which the like was performed.

[0029] The depicted portion of Bw tree 300 shows a leaf node N_X storing key-value data entries. In the example shown, leaf node N_X stores at least four data entries with three distinct field identifiers ("A", "B", and "C"), which in the example above can correspond to three different users. The Bw tree 300 currently includes a chain of three incremental records 304A-304C, which respectively update (add) three data entries with different field identifiers. The latest incremental record 304C serves as the current memory address of leaf node N_X. Accordingly, mapping table 306 includes an entry for leaf node N_X with a pointer to the latest incremental record 304C.

[0030] The Bw tree 300 of Figure 3 illustrates how query requests from different users affect the same leaf node. With thousands or millions of users, concurrent access requests are practically certain, and the more conflicts occur, the more the resulting retries and wait times reduce concurrent write throughput. To mitigate this problem, a forest graph can be implemented with multiple tree graphs, each corresponding to a different user. Because access requests to different tree graphs are independent of each other, this resolves conflicts between users. However, the back-end storage for such database systems is typically block-based, so allocating a separate tree, and its storage blocks, for every user is impractical.

[0031] This disclosure provides a hybrid solution that achieves a space-optimized forest graph in database systems. Using the example of Figure 3, suppose there are three users, A, B, and C. Users B and C are ordinary, inactive users, while A is an active user who performs many "like" actions per day. In this case, it is preferable to store user A's likes in a separate tree graph and to keep the likes of users B and C in the initial tree graph. Because B and C are inactive, query requests associated with them are unlikely to be concurrent. This approach scales to any number of users.
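The effect of the hybrid layout on conflicts can be illustrated with a toy count. The sketch assumes, for simplicity, that each tree graph has a single hot leaf and that two writes conflict when they hit the same tree in the same time slot; the timeline of likes and the layout function names are invented for illustration. Under these assumptions, isolating only user A removes every conflict in the sample, just as one tree per user would, while B and C continue to share the initial tree.

```python
from collections import Counter


def count_conflicts(writes, tree_of) -> int:
    """Writes that must retry because another write hit the same tree in the same time slot."""
    per_slot = Counter((slot, tree_of(user)) for slot, user in writes)
    return sum(n - 1 for n in per_slot.values() if n > 1)


def single_tree(user: str) -> str:
    return "initial"


def hybrid(user: str) -> str:
    # Only the highly active user A gets a separate tree; B and C share the initial tree.
    return "tree[A]" if user == "A" else "initial"


def per_user(user: str) -> str:
    return f"tree[{user}]"


# Hypothetical (time_slot, user) likes: A likes in almost every slot, B and C rarely.
writes = [(0, "A"), (0, "B"), (1, "A"), (1, "C"), (2, "A"), (3, "A"), (3, "B"), (4, "C")]
for layout in (single_tree, hybrid, per_user):
    print(layout.__name__, count_conflicts(writes, layout))
# single_tree 3
# hybrid 0
# per_user 0
```

Inactive users sharing the initial tree can still collide occasionally (for example, B and C liking in the same slot), which is the residual cost the hybrid accepts in exchange for not allocating a tree per user.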

[0032] Figures 4A and 4B show schematic diagrams of an example graph database implemented using a space-optimized Bw tree forest graph 400. In the example shown in Figure 4A, the Bw tree forest graph 400 includes an initial Bw tree graph and a separate Bw tree graph corresponding to user identifier A. As shown, the separate Bw tree graph includes a leaf node N_Y storing at least two data entries associated with user identifier A, as well as incremental records for updating/adding data entries associated with user identifier A. Query requests associated with user identifiers other than A (including B and C) currently point to the initial Bw tree graph. As shown, the initial Bw tree graph includes a leaf node N_X, which stores a data entry associated with user identifier B and a data entry associated with user identifier C. The two Bw tree graphs are logically organized with corresponding mapping tables 402 and 404, and the organization of the entire forest graph 400 is managed by hash table 406. Hash table 406 includes an identifier column and a pointer column, and each entry associates a user identifier with a pointer to the corresponding Bw tree. In the example shown, the entry for user identifier A points to the separate Bw tree graph, and the entries for user identifiers B and C point to the initial Bw tree graph.

[0033] Figure 4A depicts a state of the space-optimized forest graph database in which user identifier A is classified as a highly active user and is therefore associated with a separate tree graph. In the social media platform example described above, each data entry records a like and the identifier of the user who performed it, and user identifier A belongs to a user who frequently performs likes on the platform. In this case, the data entries stored in the tree graph associated with user identifier A do not need to include the user identifier; the association of that tree graph with user identifier A already implies that all of its entries belong to A. User identifiers B and C are still considered inactive and are therefore associated with the initial Bw tree graph. When a new highly active user is identified, a new separate Bw tree can be formed for that user.
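The space saving described here, dropping the user identifier from entries in a split-out tree, can be sketched as follows. Lists stand in for tree graphs and a dict for the hash table; the names split_out and read_likes and the sample data are hypothetical. A user with no hash-table entry is assumed to live in the initial tree, whereas Figure 4A shows explicit entries for B and C; either convention works for this illustration.

```python
def split_out(initial: list[tuple[str, str]], user_id: str):
    """Move one user's (user_id, media_id) entries into a new tree that stores media IDs only."""
    separate = [media for uid, media in initial if uid == user_id]
    remaining = [(uid, media) for uid, media in initial if uid != user_id]
    return separate, remaining


def read_likes(hash_table: dict, initial: list[tuple[str, str]], user_id: str) -> list[str]:
    """Return the media IDs liked by user_id, wherever that user's entries live."""
    tree = hash_table.get(user_id)
    if tree is None:
        # Not split out: filter the shared initial tree by the stored user ID.
        return [media for uid, media in initial if uid == user_id]
    # Split out: every entry belongs to user_id, so the ID is implied by the hash-table key.
    return list(tree)


initial = [("A", "m1"), ("A", "m2"), ("B", "m3"), ("C", "m4")]
tree_a, initial = split_out(initial, "A")
hash_table = {"A": tree_a}
print(tree_a)                                # ['m1', 'm2']
print(read_likes(hash_table, initial, "A"))  # ['m1', 'm2']
print(read_likes(hash_table, initial, "B"))  # ['m3']
```

For a highly active user with many entries, omitting the identifier from every entry in the separate tree saves its size once per entry, which partly offsets the fixed cost of the extra tree.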

[0034] Highly active users can be determined in various ways. In some implementations, a user is identified as highly active when one or more predetermined thresholds are met, and any type of threshold can be used. For example, once the number of data entries associated with a user identifier exceeds a predetermined threshold number, that user's entries are split out of the initial Bw tree into a separate tree. In some implementations, a split event is performed once the rate of query requests received from a user identifier exceeds a predetermined threshold rate.
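For the rate-based criterion, one common way to measure a request rate is a sliding time window. The sketch below assumes that approach (the disclosure does not specify how rates are measured): RateMonitor, window_seconds, and max_requests are hypothetical names, and the threshold rate is assumed to be max_requests per window_seconds. The count-based criterion is the simpler case of counting entries per identifier.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flags field identifiers whose request count in a sliding window exceeds a threshold."""

    def __init__(self, window_seconds: float, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests       # threshold rate = max_requests / window_seconds
        self._times = defaultdict(deque)       # field identifier -> recent request timestamps

    def record(self, field_id: str, now: float) -> bool:
        """Record one request at time `now`; return True if a split event should be triggered."""
        times = self._times[field_id]
        times.append(now)
        # Drop requests that have fallen out of the window (now - window_seconds, now].
        while times and times[0] <= now - self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests


monitor = RateMonitor(window_seconds=60, max_requests=3)
print([monitor.record("A", t) for t in (0, 10, 20, 30)])    # [False, False, False, True]
print([monitor.record("B", t) for t in (0, 100, 200, 300)])  # [False, False, False, False]
```

Memory per identifier is bounded by the number of requests inside one window, so the monitor stays small for the many low-activity users.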

[0035] In Figure 4A, user identifier A has already been identified as a highly active user and has its own separate Bw tree. When a new highly active user is identified, the corresponding data entries can be split into another separate Bw tree. Figure 4B depicts such a new, separate Bw tree split from the initial Bw tree. In the example shown, user identifier B is identified as a highly active user, so the data entries associated with user identifier B in the initial Bw tree are split into a separate Bw tree with its own mapping table 408. Forest graph 400 now includes the initial Bw tree and two separate Bw trees. Inactive users (such as user identifier C) and new users use the initial Bw tree, while user identifiers A and B use their respective separate Bw trees.

[0036] Although Figures 4A and 4B describe an example graph database implemented using Bw trees, other tree structures can generally be used as well; for example, a forest graph can be created from B+ trees. Figure 5 shows a flowchart of an example method 500 for implementing a graph database using a space-optimized forest graph in general. At step 502, method 500 includes storing a graph database comprising an initial tree graph that stores multiple data entries. Each data entry can store various types of data. In some embodiments, each data entry in the initial tree graph includes a corresponding field identifier, which can include information identifying a given attribute of the stored data. For example, the field identifier could be a user identifier identifying a specific user, such as a user of a social media platform. In some embodiments, the data entries store data as key-value pairs; for example, the field identifier (such as a user identifier) can be stored as the key. In such cases, the field identifier can be used to index, search, and update the initial tree graph (e.g., the tree graph can be sorted by key). In some embodiments, the data entries include information describing the relationship between the corresponding user identifier and a media content identifier; for example, a given data entry could describe a user's "like" action on media content on a social media platform. In a further embodiment, such information is stored as a key, while edge information is stored as the corresponding values.

[0037] Graph databases can be implemented in various ways. In some implementations, a graph database includes a hash table and a forest graph that includes an initial tree graph. The initial tree graph can be implemented as a specific type of tree graph, and various types of tree structures can be used. Examples include, but are not limited to, binary trees, B-trees, B+ trees, and Bw trees. In some implementations, a graph database comprises multiple initial tree graphs. Hash tables can also be implemented in various ways. Generally, a hash table comprises a key column and a value column. In some implementations, field identifiers (e.g., user identifiers) are used as keys, and the values are pointers to the tree graph associated with the respective field identifier. In some implementations, the graph database uses a block-based allocation scheme for storage.
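The routing role of the hash table can be sketched as follows. This is a minimal Python illustration under stated assumptions: plain dictionaries stand in for the tree graphs, identifiers that have not been split are assumed to point at the shared initial tree, and the class and method names (`ForestGraph`, `register`, `attach_tree`, `tree_for`) are hypothetical.

```python
from typing import Any, Dict, Hashable

TreeGraph = Dict[Hashable, Any]  # a plain dict stands in for a B+ tree or Bw tree


class ForestGraph:
    """Hypothetical forest graph: one shared initial tree plus a hash table
    mapping each field identifier to the tree graph that holds its entries."""

    def __init__(self) -> None:
        self.initial: TreeGraph = {}
        self.index: Dict[str, TreeGraph] = {}  # field identifier -> tree graph

    def register(self, field_id: str) -> None:
        # New identifiers point at the shared initial tree by default;
        # setdefault leaves an existing (already split) mapping untouched.
        self.index.setdefault(field_id, self.initial)

    def attach_tree(self, field_id: str, tree: TreeGraph) -> None:
        # Repoint the identifier at its own tree, e.g. after a split event.
        self.index[field_id] = tree

    def tree_for(self, field_id: str) -> TreeGraph:
        # A single hash lookup routes a query to the right tree graph.
        return self.index[field_id]


forest = ForestGraph()
for user in ("userA", "userB", "userC"):
    forest.register(user)
forest.attach_tree("userA", {})
print(forest.tree_for("userA") is forest.initial)  # False
print(forest.tree_for("userC") is forest.initial)  # True
```

Keeping the pointer in a hash table means a split only has to update one mapping for the affected identifier; queries for all other identifiers are routed exactly as before.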

[0038] Method 500 includes receiving a query to update the graph database in step 504. The query can be any type of database query. In some embodiments, the query includes a request to add a new data entry including a field identifier. In further embodiments, the new data entry also includes a value. The field identifier can be any type of identifier. In some embodiments, the field identifier includes a user identifier. For example, the field identifier can be a username, account, or any other identifier of the user. Depending on the application, the data entry can include different types of information. In applications used to store information on social media platforms, the data entry can include information describing a user's actions. In this case, the data entry can include a user identifier that identifies the user. In some embodiments, the data entry includes information describing a user's "like" action on media content (such as an image or video). In this case, the data entry can include information describing the user, the media content, and/or any other information (such as the time and/or date of the "like" action).

[0039] In step 506, method 500 includes determining, based on one or more predetermined criteria, a split event to be performed. Any type of criterion can be used. In some embodiments, the one or more predetermined criteria include one or more thresholds. In a further embodiment, the one or more thresholds include a threshold number of data entries in the initial tree graph that have the same field identifier as the new data entry. Additionally or alternatively, the one or more thresholds may include a threshold rate of received queries that update the graph database with data entries having the same field identifier as the new data entry. In some embodiments, the threshold rate is higher than the query rates of approximately 80% of the different field identifiers in the initial tree graph, so that only the most active field identifiers trigger a split.
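The two thresholds described above can be combined in a short Python sketch. It assumes per-identifier query rates and entry counts are already tracked somewhere; the function names, the nearest-rank percentile method, and the example numbers are illustrative assumptions, since the text only says the rate threshold is higher than the rates of "approximately 80%" of identifiers.

```python
import math
from typing import Mapping, Optional


def rate_threshold(rates: Mapping[str, float], fraction: float = 0.8) -> float:
    """Return a rate at least as high as the query rates of `fraction` of the
    identifiers, using a nearest-rank percentile (an assumed choice)."""
    if not rates:
        return math.inf
    ordered = sorted(rates.values())
    k = max(1, math.ceil(fraction * len(ordered)))  # 1-based nearest rank
    return ordered[k - 1]


def should_split(field_id: str,
                 entry_counts: Mapping[str, int],
                 rates: Mapping[str, float],
                 count_threshold: Optional[int] = None,
                 fraction: float = 0.8) -> bool:
    """Split if either the entry-count test or the query-rate test fires."""
    if count_threshold is not None and entry_counts.get(field_id, 0) >= count_threshold:
        return True
    return rates.get(field_id, 0.0) > rate_threshold(rates, fraction)


rates = {"A": 120.0, "B": 3.0, "C": 1.0, "D": 2.0, "E": 0.5}  # illustrative queries/s
print(rate_threshold(rates))                                        # 3.0
print(should_split("A", {}, rates))                                 # True
print(should_split("B", {}, rates))                                 # False
print(should_split("C", {"C": 5000}, rates, count_threshold=1000))  # True
```

With five identifiers, the fourth-lowest rate (3.0) becomes the threshold, so only identifier A exceeds it; identifier C qualifies through the count test instead.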

[0040] In step 508, method 500 includes generating a new tree graph. The new tree graph may correspond to the field identifier of the new data entry and can be generated in various ways. The new tree graph may use the same type of tree structure as the initial tree graph. In some implementations, the new tree graph is generated by splitting out of the initial tree graph a subset comprising all data entries whose field identifier matches the field identifier of the new data entry. In some implementations, the data in the new tree graph may be stored differently than in the initial tree graph. For example, the initial tree graph may store data entries in a key-value pair format, where information describing the relationship between user identifiers and media content identifiers is stored as keys, while edge information describing that relationship is optionally stored as values. Because the new tree graph stores only the data entries corresponding to a single field identifier (e.g., a single user identifier), that identifier can be omitted from the entries in the new tree graph. In the previous example, instead of storing the relationship between a user identifier and a media content identifier, each data entry in the new tree graph may only need to store the media content identifier to represent the same information. Such schemes allow for higher storage efficiency.
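The split-with-compaction step can be sketched in Python as below. This assumes the initial tree is keyed by (user identifier, media content identifier) pairs and uses a plain dict as a stand-in for the tree graph; `split_out` is a hypothetical helper name. In an ordered tree the moved keys would form one contiguous range, whereas the dict stand-in simply scans for them.

```python
from typing import Any, Dict, Tuple


def split_out(initial: Dict[Tuple[str, str], Any], user_id: str) -> Dict[str, Any]:
    """Move every (user_id, media_id) entry of `user_id` out of the initial tree
    and return a new tree keyed by media_id alone."""
    moved = [key for key in initial if key[0] == user_id]
    new_tree: Dict[str, Any] = {}
    for key in moved:
        _, media_id = key
        # The user identifier is dropped: the whole new tree belongs to one user.
        new_tree[media_id] = initial.pop(key)
    return new_tree


initial = {
    ("userA", "video1"): {"liked_at": 1},
    ("userA", "video2"): {"liked_at": 2},
    ("userC", "video1"): {"liked_at": 3},
}
tree_a = split_out(initial, "userA")
print(tree_a)         # {'video1': {'liked_at': 1}, 'video2': {'liked_at': 2}}
print(list(initial))  # [('userC', 'video1')]
```

The saving comes from the shorter keys: every entry in the new tree omits the repeated user identifier.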

[0041] In step 510, method 500 includes updating the new tree graph based on the query. The update may be, for example, adding the new data entry to the new tree graph. In some implementations, the update is performed before the split event occurs. Steps 504-510 may be repeated when a new query is received. For example, upon receiving a second query to update the graph database, method 500 may include determining a second split event to be performed and generating a second new tree graph. The second new tree graph may correspond to the field identifier of a second new data entry included in the second query. Thus, in operation, the graph database generates a new tree graph for inclusion in the forest graph whenever it determines that a field identifier (e.g., a user) meets the one or more predetermined criteria, which typically reflect the activity level of that field identifier. This separates out the more active field identifiers, which are more likely to cause concurrent write conflicts, so that queries associated with those field identifiers are executed on individual trees and can be performed independently of each other.
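Steps 504-510, including repeated split events, can be tied together in one small Python sketch. It assumes a count-based criterion only (the rate criterion is omitted for brevity), uses plain dicts in place of tree graphs, and ignores concurrency control entirely; the class `Forest`, the `update` method, and the `count_threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Tuple


class Forest:
    """Hypothetical sketch of steps 504-510, assuming a count-based criterion."""

    def __init__(self, count_threshold: int) -> None:
        self.count_threshold = count_threshold            # assumed tuning knob
        self.initial: Dict[Tuple[str, str], Any] = {}     # (user_id, media_id) -> edge
        self.split_trees: Dict[str, Dict[str, Any]] = {}  # user_id -> {media_id: edge}
        self.counts: Counter = Counter()                  # entries per user in initial

    def update(self, user_id: str, media_id: str, edge: Any) -> None:
        # Step 504: a query asks to add a new data entry.
        tree = self.split_trees.get(user_id)
        if tree is None and self.counts[user_id] >= self.count_threshold:
            # Step 506: split event determined. Step 508: move all of this
            # user's entries into a new tree with compacted keys.
            tree = {m: e for (u, m), e in self.initial.items() if u == user_id}
            for m in tree:
                del self.initial[(user_id, m)]
            del self.counts[user_id]
            self.split_trees[user_id] = tree
        if tree is not None:
            tree[media_id] = edge  # Step 510: update the user's own tree.
        else:
            # Below threshold: keep writing to the shared initial tree,
            # counting only genuinely new keys.
            if (user_id, media_id) not in self.initial:
                self.counts[user_id] += 1
            self.initial[(user_id, media_id)] = edge


forest = Forest(count_threshold=3)
for m in ("v1", "v2", "v3", "v4"):
    forest.update("userA", m, {"like": True})
forest.update("userC", "v1", {"like": True})
print(sorted(forest.split_trees["userA"]))  # ['v1', 'v2', 'v3', 'v4']
print(list(forest.initial))                 # [('userC', 'v1')]
```

Each later user who crosses the threshold triggers its own split in the same way, which corresponds to the second split event described above.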

[0042] In example method 500 as illustrated, the split event is determined based on a received query for updating the graph database. Additionally or alternatively, the graph database may monitor the state of the initial tree graph to determine the split event. For example, the graph database may continuously monitor the initial tree graph to determine whether one or more preset criteria (such as the criteria described above) are met. In some implementations, the one or more preset criteria include a size threshold for the initial tree graph. When the initial tree graph reaches a specific size (i.e., stores a specific amount of data), a split event can be performed to split a subset of its data entries into a new, separate tree graph. This subset can be selected in various ways. In some implementations, the subset includes all data entries in the initial tree graph corresponding to the field identifier that appears most frequently in the initial tree graph. For example, the subset may include all data entries associated with the user identifier that has the most data entries, so that a new tree graph is formed to store information about that user, who is identified as a highly active user.
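The size-triggered variant can be sketched in Python as follows. It measures size as a number of entries rather than bytes, and picks the most frequent user identifier with `collections.Counter.most_common`; both choices, the helper name `split_largest_if_full`, and the sample data are assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, Optional, Tuple


def split_largest_if_full(initial: Dict[Tuple[str, str], Any],
                          split_trees: Dict[str, Dict[str, Any]],
                          size_threshold: int) -> Optional[str]:
    """If the initial tree holds more than `size_threshold` entries, split out the
    user identifier with the most entries. Returns that identifier, or None."""
    if len(initial) <= size_threshold:
        return None
    counts = Counter(user_id for user_id, _ in initial)
    top_user, _ = counts.most_common(1)[0]  # most frequent field identifier
    new_tree = {m: e for (u, m), e in initial.items() if u == top_user}
    for m in new_tree:
        del initial[(top_user, m)]
    split_trees[top_user] = new_tree
    return top_user


initial = {("A", "v1"): 1, ("A", "v2"): 1, ("A", "v3"): 1, ("B", "v1"): 1, ("C", "v1"): 1}
trees: Dict[str, Dict[str, Any]] = {}
print(split_largest_if_full(initial, trees, size_threshold=10))  # None
print(split_largest_if_full(initial, trees, size_threshold=4))   # A
print(sorted(trees["A"]), len(initial))                          # ['v1', 'v2', 'v3'] 2
```

A background monitor could call such a function periodically, independently of incoming queries.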

[0043] Different applications may implement different variations of example method 500 of Figure 5. For example, a graph database for a social media platform could track user identifiers and the "like" actions those users perform. Figure 6 shows a flowchart of an example method 600 for implementing a graph database using a forest graph optimized for a social media platform. In step 602, method 600 includes receiving a request to add new users. New users are frequently added on social media platforms, and they should be added to and recorded in the forest graph database.

[0044] In step 604, method 600 includes inserting the new users into the initial tree graph. New users are considered inactive by default, so they can be inserted into the initial tree graph until information indicating otherwise becomes available. Various types of tree structures can be used. Examples include, but are not limited to, binary trees, B-trees, B+ trees, and Bw trees.

[0045] In step 606, method 600 includes detecting the activity level of each user. User activity can be monitored continuously. In some implementations, this determination is made upon receiving a query associated with a given user. For example, upon receiving a query to add a "like" action performed by a given user, method 600 can determine whether that user is a highly active user based on one or more predetermined criteria, such as the criteria described for method 500 of Figure 5.

[0046] In step 608, method 600 includes identifying whether a user is a highly active user. The process continuously checks for highly active users. Upon detecting a highly active user, the process proceeds to step 610, in which the portion of the initial tree graph corresponding to that user is split into a separate tree. The process then returns to step 606 to check for further highly active users.
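The loop of method 600 can be sketched in Python as shown here. As an assumed stand-in for the "highly active" criteria, it counts each user's likes inside a sliding time window kept in a `collections.deque`; the class `SocialForest` and the `max_likes` and `window_s` parameters are hypothetical, and dicts stand in for tree graphs.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class SocialForest:
    """Hypothetical sketch of method 600; the sliding-window like count is an
    assumed stand-in for the "highly active" criteria."""

    def __init__(self, max_likes: int, window_s: float) -> None:
        self.max_likes = max_likes   # assumed: likes allowed per window
        self.window_s = window_s     # assumed: window length in seconds
        self.initial: Dict[Tuple[str, str], float] = {}   # (user, video) -> like time
        self.own_trees: Dict[str, Dict[str, float]] = {}  # highly active users only
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)

    def add_user(self, user_id: str) -> None:
        # Steps 602-604: a new user is inactive by default, so its entries
        # go to the shared initial tree.
        self.recent.setdefault(user_id, deque())

    def like(self, user_id: str, video_id: str, ts: float) -> None:
        if user_id in self.own_trees:
            self.own_trees[user_id][video_id] = ts
            return
        self.initial[(user_id, video_id)] = ts
        # Step 606: track activity over a sliding time window.
        window = self.recent[user_id]
        window.append(ts)
        while window and window[0] <= ts - self.window_s:
            window.popleft()
        # Step 608: highly active? If so, step 610 splits the user out.
        if len(window) > self.max_likes:
            self._split(user_id)

    def _split(self, user_id: str) -> None:
        tree = {v: t for (u, v), t in self.initial.items() if u == user_id}
        for v in tree:
            del self.initial[(user_id, v)]
        self.own_trees[user_id] = tree
        self.recent.pop(user_id, None)


sf = SocialForest(max_likes=3, window_s=60.0)
sf.add_user("userA")
sf.add_user("userC")
for i in range(4):
    sf.like("userA", f"v{i}", float(i))      # 4 likes within 4 seconds
for i in range(4):
    sf.like("userC", f"v{i}", 100.0 * i)     # 1 like every 100 seconds
print(sorted(sf.own_trees))  # ['userA']
```

Only userA exceeds three likes within the 60-second window, so only userA is moved to its own tree; userC stays in the shared initial tree.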

[0047] The methods and implementations described herein provide a graph database system that uses forest graphs. A forest graph comprises multiple tree graphs, including an initial tree graph, which is the default tree graph for storing data entries. As the database grows, data entries associated with highly active sources are identified and split into separate tree graphs. This allows queries from highly active sources to be performed independently of each other, while storage space is still used efficiently by keeping the remaining low-activity sources in the shared initial tree graph. Low-activity sources issue queries infrequently and are therefore unlikely to issue concurrent queries that would lead to write conflicts.

[0048] In some embodiments, the methods and processes described herein may be associated with a computing system of one or more computing devices. Specifically, such methods and processes may be implemented as computer applications or services, application programming interfaces (APIs), libraries, and/or other computer program products.

[0049] Figure 7 schematically illustrates a non-limiting embodiment of a computing system 700 that can implement one or more of the methods and processes described above. Computing system 700 is shown in simplified form. Computing system 700 can embody computing system 100 described above and illustrated in Figure 1. Components of computing system 700 may be included in one or more personal computers, server computers, tablet computers, home entertainment computers, network computing devices, video game devices, mobile computing devices, mobile communication devices (e.g., smartphones), and/or other computing devices.

[0050] Computing system 700 includes processing circuit system 702, volatile memory 704, and non-volatile storage device 706. Computing system 700 may optionally include display subsystem 708, input subsystem 710, communication subsystem 712, and/or other components not shown in Figure 7.

[0051] The processing circuit system 702 includes a logic processor, which can be implemented using one or more physical devices configured to execute instructions. For example, the processing circuit system 702 can be configured to execute instructions that are part of one or more applications, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions can be implemented to perform tasks, implement data types, change the state of one or more components, achieve technical effects, or otherwise achieve desired results.

[0052] Processing circuit system 702 may include one or more physical processors configured to execute software instructions. Additionally or alternatively, processing circuit system 702 may include one or more hardware logic circuits or firmware devices configured to execute hardware-implemented logic or firmware instructions. The processors of processing circuit system 702 may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. The various components of processing circuit system 702 may optionally be distributed across two or more separate devices that may be remotely located and/or configured for coordinated processing. Aspects of processing circuit system 702 may be virtualized and executed by remotely accessible networked computing devices configured in a cloud computing configuration. In this context, it will be understood that these virtualized aspects run on different physical logic processors of various different machines.

[0053] The non-volatile storage device 706 includes one or more physical devices configured to store instructions executable by the processing circuitry system 702 to implement the methods and processes described herein. When implementing such methods and processes, the state of the non-volatile storage device 706 can be transformed—for example, to store different data.

[0054] Non-volatile storage device 706 may include removable and/or built-in physical devices. Non-volatile storage device 706 may include optical memory, semiconductor memory, and/or magnetic memory, or other high-capacity storage device technologies. Non-volatile storage device 706 may include non-volatile, dynamic, static, read/write, read-only, sequential-access, location-addressable, file-addressable, and/or content-addressable devices. It will be understood that non-volatile storage device 706 is configured to retain instructions even when power to non-volatile storage device 706 is cut off.

[0055] Volatile memory 704 may include a physical device that includes random access memory. Volatile memory 704 is typically used by processing circuitry system 702 to temporarily store information during the processing of software instructions. It will be understood that when power to volatile memory 704 is cut off, volatile memory 704 will typically not continue storing instructions.

[0056] Aspects of processing circuitry 702, volatile memory 704, and non-volatile storage device 706 can be integrated together into one or more hardware logic components. Such hardware logic components may include, for example, field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASICs/ASICs), program- and application-specific standard products (PSSPs/ASSPs), systems-on-a-chip (SoCs), and complex programmable logic devices (CPLDs).

[0057] The terms "module," "program," and "engine" can be used to describe an aspect of computing system 700 that is typically implemented in software by a processor to perform a particular function using portions of volatile memory, where that function involves transformative processing that specially configures the processor to perform it. Thus, a module, program, or engine can be instantiated via processing circuitry 702 executing instructions held by non-volatile storage device 706, using portions of volatile memory 704. It will be understood that different modules, programs, and/or engines can be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine can be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms "module," "program," and "engine" can encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.

[0058] When included, the display subsystem 708 can be used to present a visual representation of the data stored by the non-volatile storage device 706. The visual representation may take the form of a graphical user interface (GUI). When the methods and processes described herein change the data stored by the non-volatile storage device, and thus change the state of the non-volatile storage device, the state of the display subsystem 708 can also be changed to visually represent the change in the underlying data. The display subsystem 708 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with the processing circuitry system 702, the volatile memory 704, and/or the non-volatile storage device 706 in a shared housing, or such display devices may be peripheral display devices.

[0059] When included, the input subsystem 710 may include or interface with one or more user input devices, such as a keyboard, mouse, touchscreen, camera, or microphone.

[0060] When included, the communication subsystem 712 can be configured to communicatively couple the various computing devices described herein to each other and to other devices. The communication subsystem 712 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As a non-limiting example, the communication subsystem can be configured to communicate via wired or wireless local area networks or wide area networks, broadband cellular networks, etc. In some embodiments, the communication subsystem may allow the computing system 700 to send messages to and/or receive messages from other devices via a network such as the Internet.

[0061] The following paragraphs provide additional description of the subject matter of this disclosure. One example provides a computing system for implementing a graph database system, the computing system comprising a processing circuit system and a memory storing instructions that, when executed, cause the processing circuit system to: store a graph database including an initial tree graph storing a plurality of data entries, each data entry including a corresponding field identifier; receive a query to update the graph database, wherein the query includes a request to add a new data entry; determine, based on one or more predetermined criteria, a split event to be performed; generate a new tree graph corresponding to the field identifier of the new data entry by splitting out a subset of the plurality of data entries of the initial tree graph, wherein the subset includes all data entries of the initial tree graph corresponding to the field identifier of the new data entry; and update the new tree graph according to the query. In this example, additionally or alternatively, the graph database further includes a hash table having different field identifiers as keys, and the values of the hash table point to the corresponding tree graphs storing the data entries for the respective field identifiers. In this example, additionally or alternatively, the field identifiers of the initial tree graph include user identifiers, and each of the plurality of data entries of the initial tree graph includes information describing the relationship between the corresponding user identifier and a corresponding media content identifier. In this example, additionally or alternatively, generating the new tree graph includes storing the subset of the plurality of data entries of the initial tree graph with the corresponding media content identifiers and without the corresponding user identifiers.
In this example, additionally or alternatively, the one or more predetermined criteria include one or more thresholds for: the number of data entries in the initial tree graph corresponding to the field identifier of the new data entry; or the rate of received queries for updating the graph database corresponding to the field identifier of the new data entry. In this example, additionally or alternatively, the field identifiers include user identifiers. In this example, additionally or alternatively, the initial tree graph includes data entries with a plurality of different user identifiers, and the threshold for the rate of received queries is higher than the query rates of approximately 80% of the plurality of different user identifiers in the initial tree graph. In this example, additionally or alternatively, the initial tree graph includes a B+ tree or a Bw tree. In this example, additionally or alternatively, the instructions, when executed, further cause the processing circuit system to: receive a second query to update the graph database, wherein the second query includes a request to add a second new data entry; determine a second split event to be performed; generate a second new tree graph corresponding to the field identifier of the second new data entry by splitting out a second portion of the initial tree graph, wherein the second portion includes a second subset of the plurality of data entries of the initial tree graph corresponding to the field identifier of the second new data entry; and update the second new tree graph according to the second query.

[0062] Another example provides a method for implementing a graph database system, the method comprising: storing a graph database including an initial tree graph, the initial tree graph storing a plurality of data entries, each data entry including information describing the relationship between a corresponding user identifier and a corresponding media content identifier; determining, based on one or more predetermined criteria, a split event to be performed; and generating a new tree graph by splitting out a subset of the plurality of data entries of the initial tree graph, wherein the subset includes all data entries of the initial tree graph corresponding to the same user identifier. In this example, additionally or alternatively, generating the new tree graph includes storing the subset of the plurality of data entries of the initial tree graph with the corresponding media content identifiers and without the corresponding user identifier. In this example, additionally or alternatively, the one or more predetermined criteria include a size threshold for the initial tree graph. In this example, additionally or alternatively, the same user identifier corresponding to the subset has the highest frequency of occurrence in the initial tree graph. In this example, additionally or alternatively, the one or more predetermined criteria include one or more thresholds for: the number of data entries in the initial tree graph corresponding to the same user identifier; or the rate of received queries for updating the graph database corresponding to the same user identifier.
In this example, additionally or alternatively, the plurality of data entries in the initial tree graph have a plurality of different user identifiers, and the threshold for the rate of the received queries is higher than the query rates of approximately 80% of the plurality of different user identifiers in the initial tree graph. In this example, additionally or alternatively, the initial tree graph includes a B+ tree or a Bw tree. In this example, additionally or alternatively, the graph database further includes a hash table having different user identifiers as keys, and the values of the hash table point to the corresponding tree graphs storing the data entries for the respective user identifiers.

[0063] Another example provides a method for storing user data of a social media platform using a graph database system. The method includes: in response to an event of a user performing an action on the social media platform, receiving a query for updating a graph database including an initial Bw tree graph, wherein the query includes a request to add a new data entry, the new data entry including a user identifier corresponding to the user and information describing the action performed by the user; determining, based on one or more predetermined criteria, a split event to be performed; generating a new Bw tree graph corresponding to the user identifier of the new data entry by splitting out a portion of the initial Bw tree graph, wherein the portion includes the data entries of the initial Bw tree graph corresponding to the user identifier of the new data entry; and updating the new Bw tree graph according to the query. In this example, additionally or alternatively, the information describing the action performed by the user includes information indicating that the user performed a "like" operation on a video. In this example, additionally or alternatively, the one or more predetermined criteria include one or more thresholds for: the number of data entries in the initial Bw tree graph corresponding to the user identifier of the new data entry; or the rate of received queries for updating the graph database corresponding to the user identifier of the new data entry.

[0064] It will be understood that the configurations and/or methods described herein are exemplary in nature, and these specific embodiments or examples are not to be considered limiting, as many variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. Therefore, the various actions illustrated and/or described may be performed in the illustrated and/or described order, in another order, in parallel, or omitted. Similarly, the order of the above processes may be changed.

[0065] The subject matter of this disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations disclosed herein, as well as any and all equivalents thereof.

Claims

1. A computing system for implementing a graph database system, the computing system comprising:
a processing circuit system and a memory storing instructions that, when executed, cause the processing circuit system to:
store a graph database including an initial tree graph that stores a plurality of data entries, each data entry including a corresponding field identifier;
receive a query to update the graph database, wherein the query includes a request to add a new data entry;
determine, based on one or more predetermined criteria, a split event to be performed;
generate a new tree graph corresponding to the field identifier of the new data entry by splitting out a subset of the plurality of data entries of the initial tree graph, wherein the subset of the plurality of data entries includes all data entries of the initial tree graph corresponding to the field identifier of the new data entry; and
update the new tree graph based on the query.

2. The computing system of claim 1, wherein the graph database further comprises a hash table with different field identifiers as keys, and the values of the hash table point to corresponding tree graphs storing data entries corresponding to the respective field identifiers.

3. The computing system of claim 1, wherein the field identifier of the initial tree graph includes a user identifier, and wherein each of the plurality of data entries of the initial tree graph includes information describing the relationship between the corresponding user identifier and a corresponding media content identifier.

4. The computing system of claim 3, wherein generating the new tree graph comprises:
storing the subset of the plurality of data entries of the initial tree graph with the corresponding media content identifiers and without the corresponding user identifiers.

5. The computing system of claim 1, wherein the one or more predetermined criteria include one or more thresholds for:
a number of data entries in the initial tree graph corresponding to the field identifier of the new data entry; or
a rate of received queries for updating the graph database corresponding to the field identifier of the new data entry.

6. The computing system according to claim 5, wherein the field identifier includes a user identifier.

7. The computing system of claim 6, wherein the initial tree graph comprises data entries having a plurality of different user identifiers, and wherein the threshold for the rate of the received queries is higher than the rate of queries from user identifiers among the plurality of different user identifiers in the initial tree graph.

8. The computing system according to claim 1, wherein the initial tree graph comprises a B+ tree or a Bw tree.

9. The computing system of claim 1, wherein the instructions, when executed, further cause the processing circuit system to:
receive a second query to update the graph database, wherein the second query includes a request to add a second new data entry;
determine a second split event to be performed;
generate a second new tree graph corresponding to the field identifier of the second new data entry by splitting out a second portion of the initial tree graph, wherein the second portion includes a second subset of the plurality of data entries of the initial tree graph corresponding to the field identifier of the second new data entry; and
update the second new tree graph based on the second query.

10. A method for implementing a graph database system, the method comprising:
storing a graph database including an initial tree graph, the initial tree graph storing a plurality of data entries, each data entry including information describing the relationship between a corresponding user identifier and a corresponding media content identifier;
determining, based on one or more predetermined criteria, a split event to be performed; and
generating a new tree graph by splitting out a subset of the plurality of data entries of the initial tree graph, wherein the subset of the plurality of data entries includes all data entries of the initial tree graph that correspond to the same user identifier.

11. The method of claim 10, wherein generating the new tree graph comprises:
storing the subset of the plurality of data entries of the initial tree graph with the corresponding media content identifiers and without the corresponding user identifier.

12. The method of claim 10, wherein the one or more predetermined criteria include a size threshold for the initial tree graph.

13. The method of claim 12, wherein the same user identifier corresponding to the subset of the plurality of data entries of the initial tree graph has the highest frequency of occurrence in the initial tree graph.

14. The method of claim 10, wherein the one or more predetermined criteria include one or more thresholds for:
a number of data entries in the initial tree graph corresponding to the same user identifier; or
a rate of received queries for updating the graph database corresponding to the same user identifier.

15. The method of claim 14, wherein the plurality of data entries of the initial tree graph have a plurality of different user identifiers, and wherein the threshold for the rate of the received queries is higher than the query rates of user identifiers among the plurality of different user identifiers in the initial tree graph.

16. The method of claim 10, wherein the initial tree graph comprises a B+ tree or a Bw tree.

17. The method of claim 10, wherein the graph database further comprises a hash table having different user identifiers as keys, and wherein the values of the hash table point to corresponding tree graphs storing data entries corresponding to the respective user identifiers.

18. A method for storing user data of a social media platform using a graph database system, the method comprising:
in response to an event in which a user performs an action on the social media platform, receiving a query to update a graph database including an initial Bw tree graph, wherein the query includes a request to add a new data entry, the new data entry comprising:
a user identifier corresponding to the user; and
information describing the action performed by the user;
determining, based on one or more predetermined criteria, a split event to be performed;
generating a new Bw tree graph corresponding to the user identifier of the new data entry by splitting out a portion of the initial Bw tree graph, wherein the portion includes the data entries of the initial Bw tree graph corresponding to the user identifier of the new data entry; and
updating the new Bw tree graph based on the query.

19. The method of claim 18, wherein the information describing the action performed by the user includes information indicating that the user performed a "like" operation on a video.

20. The method of claim 18, wherein the one or more predetermined criteria include one or more thresholds for:
a number of data entries in the initial Bw tree graph corresponding to the user identifier of the new data entry; or
a rate of received queries for updating the graph database corresponding to the user identifier of the new data entry.