A high-performance memory-based software method for querying graph data
By constructing fixed-length storage structures and improving the graph traversal algorithm of the graph database, the method addresses the performance bottleneck of graph databases in complex relationship queries, achieving efficient memory utilization and faster queries.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING YIRUTUZHEN TECH CO LTD
- Filing Date
- 2023-01-31
- Publication Date
- 2026-06-16
AI Technical Summary
Existing graph databases suffer from query performance bottlenecks when processing complex relational data, especially in multi-level and diverse complex relationship queries and analyses. These bottlenecks include graph traversal performance, node/edge traversal status judgment, and memory usage issues.
We employ a high-performance memory-based query method. By constructing a fixed-length storage structure of node table, edge table, and string mapping table, and combining in-memory computing and external storage caching, we use an improved graph traversal algorithm to optimize memory space utilization and query performance.
It significantly improves graph traversal speed and query performance, reduces memory usage, and supports efficient querying of tens of millions or even hundreds of millions of graph data.
Figure CN116069983B_ABST
Description
Technical Field
[0001] This invention relates to the technical field of computer applications and software engineering, specifically to a high-performance memory-based software method for querying graph data.
Background Technology
[0002] In the concepts of graph databases and graph computing, a graph refers to a graphical representation of a given set of points and lines connecting those points. This type of graph is typically used to describe specific relationships between things and is also a primary subject of study in graph theory, a branch of mathematics. Graph data is a data structure based on nodes (points) and relationships (edges), suitable for representing topological relationships between data.
[0003] Graph databases use graph data as storage units, aiming for efficient storage and retrieval of graph data. They are well suited to handling complex relational data and belong to the category of NoSQL databases. As demand for intelligent applications grows across industries, so does the need, beyond storing larger-scale data, for capabilities such as handling heterogeneous data, aggregating new data, and analyzing complex relationships. Graph databases can compensate for the shortcomings of traditional databases, and the two complement each other to better meet market demands.
[0004] Graph databases differ significantly from traditional relational databases in their modeling methods. Graphs are modeled based on real-world entities and relationships, making them more direct and easier to understand; traditional relational databases require a higher level of abstraction and are more complex. Because graph databases use entities and relationships as their basic units, they are particularly suitable for querying and analyzing multi-level, diverse, and complex relationships. Relational databases, on the other hand, struggle with complex relationship queries, especially those involving multi-table joins or recursive queries, finding it difficult to respond promptly. Therefore, relational databases are no longer adequate for querying and analyzing massive amounts of multi-level, diverse, and complex relationships, while graph databases, thanks to their entity and relationship modeling design, are better suited for such needs.
[0005] Graph queries in graph databases can be mainly divided into three categories. First, subgraph queries, applicable to scenarios such as knowledge graphs. Second, path queries, applicable to scenarios such as risk control. Third, neighbor queries, applicable to scenarios such as social relationships.
[0006] Subgraph queries, path queries, and neighbor queries all share the common goal of expanding and exploring a graph, based on a single starting point or multiple starting points. A core issue encountered during graph expansion and exploration is exploration performance, which is also one of the bottlenecks in query performance. Taking a neighbor query from a single starting point as an example, when querying its 1-6 hop neighbors, it's necessary to first obtain its 1-hop neighbors based on the starting point, then obtain its 2-hop neighbors based on its 1-hop neighbors, and so on, until all 1-6 hop neighbors are obtained. Efficiently traversing relevant nodes during this process is key to improving exploration performance.
[0007] Furthermore, if a node is a 1-hop neighbor of the originating node, then that node should not be considered a more distant hop neighbor of the originating node. Efficiently determining whether a node has been traversed during the query process is a crucial factor in improving query performance.
[0008] In path querying, while exploring and expanding the graph, it's also necessary to record intermediate results and perform node / edge deduplication based on the business scenario. The purpose of recording intermediate results is to recombine the data and generate complete path data after the query is complete. The purpose of node / edge deduplication for intermediate results is to remove loops from the path based on the actual business scenario. Therefore, controlling the memory usage of intermediate results and efficiently performing deduplication are advisable methods to improve path query performance.
[0009] In summary, the present invention addresses three problems: first, improving graph traversal performance; second, efficiently determining the traversal status of nodes / edges; and third, controlling memory usage during graph queries.
Summary of the Invention
[0010] To address the shortcomings of existing technologies, this invention provides a high-performance memory-based software method for querying graph data.
[0011] The present invention provides a high-performance memory-based software method for querying graph data, comprising:
[0012] A data structure for graph entities is constructed in memory. The data structure includes a node table structure, an edge table structure, and a string mapping table structure. The node table structure and the edge table structure are stored in a fixed-length storage space, and the string mapping table structure is stored in a variable-length storage space. A graph query request is obtained, and the address information of the storage location of the graph entity to be queried is determined according to the graph query request.
[0013] The graph entity to be queried is read from the fixed-length storage space based on the address information.
[0014] Preferably, the data structure for constructing graph entities in memory includes:
[0015] Obtain the first fixed-length storage space and the second fixed-length storage space from memory;
[0016] Store the node entities in the first fixed-length storage space in order, and store the edge entities in the second fixed-length storage space in order.
[0017] The length of the storage unit in the first fixed-length storage space is determined by the largest node entity among all node entities, and the length of the storage unit in the second fixed-length storage space is determined by the largest edge entity among all edge entities.
[0018] Preferably, the node entity includes a type and a slot, and the edge entity includes a type, an edge ID, an endpoint ID, and a slot, wherein the slot is used to continuously store attribute information of different types.
[0019] Preferably, determining the address information of the storage location of the entity to be queried in the graph based on the graph query request includes:
[0020] The query parameters are obtained according to the graph query request, and the query parameters include, but are not limited to, node ID, edge ID, and attribute conditions;
[0021] Based on the query parameters and the length of the storage unit in the fixed-length storage space, the offset address of the storage unit where the graph entity to be queried is located is calculated to obtain the address information.
[0022] Preferably, calculating the offset address of the storage unit containing the entity to be queried, based on the query parameters and the length of the storage unit in the fixed-length storage space, includes:
[0023] The starting address and offset number of the fixed-length storage space are determined based on the query parameters;
[0024] Based on the starting address, the number of offsets, and the length of the storage unit in the fixed-length storage space, calculate the offset address of the storage unit where the graph entity to be queried is located.
[0025] Preferably, obtaining query parameters based on the graph query request includes:
[0026] Obtain the key-value pair to be queried, look up its key in the memory table or the sorted string table, and determine the query parameters from the value associated with that key; the memory table is a KV structure stored in memory, and the sorted string table is a KV structure stored on external storage.
[0027] The beneficial effects of this invention are as follows. The high-performance memory-based software method for querying graph data provided by this invention significantly improves graph traversal speed through memory-based computing, external storage caching, and an improved graph traversal algorithm. Through a deeply customized memory structure, the graph data occupies less memory space, so query performance on large amounts of data is effectively improved while memory is used efficiently.
Attached Figure Description
[0028] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the accompanying drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. In all the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, the elements or parts are not necessarily drawn to scale.
[0029] Figure 1 is a schematic diagram of the node entity provided in an embodiment of the present invention;
[0030] Figure 2 is a schematic diagram of the edge entity provided in an embodiment of the present invention;
[0031] Figure 3 is a schematic diagram of the node table structure provided in an embodiment of the present invention;
[0032] Figure 4 is a schematic diagram of the edge table structure provided in an embodiment of the present invention;
[0033] Figure 5 is a schematic diagram of the slot structure provided in an embodiment of the present invention;
[0034] Figure 6 is a flowchart of the high-performance memory-based software method for querying graph data provided in an embodiment of the present invention.
Detailed Implementation
[0035] The embodiments of the technical solution of the present invention will now be described in detail with reference to the accompanying drawings. These embodiments are merely illustrative of the technical solution of the present invention and are therefore not intended to limit the scope of protection of the present invention.
[0036] It should be noted that, unless otherwise stated, the technical or scientific terms used in this application should have the ordinary meaning as understood by one of ordinary skill in the art to which this invention pertains.
[0037] Example
[0038] Modern CPU caches are typically divided into three levels: L1, L2, and L3. Their read/write latencies usually range from a few nanoseconds to tens of nanoseconds, and their capacities from tens of KB to tens of MB. Cache reads and writes are managed by the hardware rather than by application code, so it is difficult for user-space programs to control the cache directly. Main memory, also known as internal memory, temporarily stores the data processed by the CPU and serves as a bridge between external storage and the CPU. Its read/write latencies typically range from tens to hundreds of nanoseconds, and its capacity from several GB to hundreds of GB. External storage, also known as secondary storage, refers to storage other than CPU cache and main memory; it generally retains data after power is off. Common examples are solid-state drives (SSDs) and hard disk drives (HDDs), with read/write latencies from hundreds of nanoseconds to tens of milliseconds and capacities from tens of GB to several TB. Of the three, CPU cache is closest to the CPU and fastest, but it is small and hard to program directly; external storage has ample capacity but is the slowest; main memory strikes a good balance between capacity and speed. An implementation based on main memory is therefore the core of this invention. Furthermore, although user-space programs cannot easily control the CPU cache directly, this invention arranges data in main memory so as to raise the CPU cache hit rate, thereby improving both CPU processing speed and graph query performance.
[0039] In real-world scenarios, graph nodes and edges often carry attribute data. Assume a server with 128 GB of memory in which each node, including its attribute data, occupies 128 bytes. Then 100 million nodes require about 12.8 × 10^9 bytes (roughly 11.9 GiB), taken here as approximately 11 GB. If edge data occupies five times the memory of node data (about 55 GB), the graph as a whole uses around 66 GB, leaving 62 GB free.
[0040] A graph query typically consumes far less memory than the graph data itself. Assuming a query needs one-third of the memory occupied by the graph (about 22 GB), total memory usage in the example above peaks at roughly 88 GB while a single query runs. Querying graph data with tens of millions of records, and even the 100 million nodes assumed here, is therefore feasible on a server with 128 GB of memory. Modern servers also support larger capacities, such as 256 GB. Thus, memory-based graph query solutions can support over 99% of graph data scales.
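The estimate above can be checked with a few lines of arithmetic. The sketch below uses only the assumptions stated in the text (128 bytes per node, edge data five times node data, a query using one-third of the graph's memory) and reports sizes in GiB (2^30 bytes). Without rounding the node data down to 11 GB, the peak comes out somewhat higher than 88 GB, but still within 128 GB.

```python
# Worked version of the memory estimate. All inputs are the document's
# assumptions; GIB = 2**30 bytes.
GIB = 2**30

node_count = 100_000_000
bytes_per_node = 128

node_gib = node_count * bytes_per_node / GIB   # node data
edge_gib = 5 * node_gib                        # edges assumed 5x node data
graph_gib = node_gib + edge_gib                # whole graph in memory
query_gib = graph_gib / 3                      # query assumed 1/3 of graph
peak_gib = graph_gib + query_gib               # graph plus one running query

print(f"nodes {node_gib:.1f} GiB, graph {graph_gib:.1f} GiB, peak {peak_gib:.1f} GiB")
```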
[0041] Based on the above description, embodiments of the present invention provide a high-performance memory-based software method for querying graph data. As shown in Figure 1, the method includes:
[0042] A data structure for constructing graph entities in memory is provided, the data structure including a node table structure, an edge table structure, and a string mapping table structure. The node table structure and the edge table structure are stored in a fixed-length storage space, and the string mapping table structure is stored in a variable-length storage space.
[0043] It should be noted that the data structure for constructing graph entities in memory includes: obtaining a first fixed-length storage space and a second fixed-length storage space from memory; storing node entities in the first fixed-length storage space in order, and storing edge entities in the second fixed-length storage space in order; wherein, the length of the storage unit of the first fixed-length storage space is determined according to the largest node entity among all node entities, and the length of the storage unit of the second fixed-length storage space is determined according to the largest edge entity among all edge entities.
[0044] The above method can achieve memory alignment based on graph entities, improve memory read and write performance, and provide a foundation for lock-free concurrent query and read.
[0045] Furthermore, as shown in Figure 1, the node entity includes a type and a slot; as shown in Figure 2, the edge entity includes a type, an edge ID, an endpoint ID, and a slot. The slot is used to continuously store attribute information of different types.
[0046] Computer memory is addressed in bytes. Variables of a given type are typically accessed at addresses that are multiples of a type-specific boundary, which restricts where such data may be placed in memory. Data of different types is therefore laid out according to certain rules rather than packed one after another; this is called memory alignment. For example, consider a C++ structure with two members: a char occupying 1 byte and an int occupying 4 bytes. With alignment, the structure occupies 8 bytes; without alignment, it would occupy only 5 bytes. In a database with tens or even hundreds of millions of records, the wasted memory adds up to a significant amount. Therefore, this invention proposes the concept of slots. As shown in Figure 5, apart from the necessary type and ID fields, all attribute data of a graph entity is placed in a contiguous memory space (the slot).
[0047] The above method can achieve non-memory alignment at the byte level, which can effectively save memory overhead.
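The char-plus-int example can be reproduced with Python's standard `struct` module, whose `@` prefix applies the platform's native alignment (as a C/C++ compiler does by default) and whose `=` prefix packs fields with no padding. This is only an illustration of the size difference, not the patent's implementation.

```python
import struct

# '@' = native sizes and alignment (default C/C++ struct layout).
# '=' = standard sizes with no padding between fields (packed layout).
aligned = struct.calcsize("@ci")   # char followed by int, with padding
packed = struct.calcsize("=ci")    # same two fields, back to back

wasted_per_record = aligned - packed
print(aligned, packed, wasted_per_record)
```

On common 64-bit platforms the aligned size is 8 bytes against 5 bytes packed, so 3 bytes per record are padding.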
[0048] Obtain a graph query request and determine the address information of the storage location of the graph entity to be queried based on the graph query request; read the graph entity to be queried from the fixed-length storage space based on the address information.
[0049] In this embodiment of the invention, determining the address information of the storage location of the graph entity to be queried according to the graph query request includes: obtaining query parameters according to the graph query request, the query parameters including but not limited to node ID, edge ID, and attribute conditions; calculating the offset address of the storage unit where the graph entity corresponding to the value in the key-value pair to be queried is located according to the query parameters and the length of the storage unit in the fixed-length storage space, and obtaining the address information.
[0050] In this embodiment of the invention, calculating the offset address of the storage unit where the graph entity to be queried is located, based on the query parameters and the length of the storage unit in the fixed-length storage space, includes: determining the starting address and the number of offsets of the fixed-length storage space based on the query parameters; and calculating the offset address of the storage unit where the graph entity to be queried is located based on the starting address, the number of offsets, and the length of the storage unit in the fixed-length storage space.
[0051] Since the graph entities in this embodiment are stored in fixed-length storage spaces, each graph entity lies at an integer multiple of the storage unit length from the start of its space. A graph entity can therefore be located as follows: graph entity address = first address of the contiguous space + number of offsets × length of the storage unit in the fixed-length storage space. Reads located this way do not need to worry about resource conflicts, so they can be implemented without locks.
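The address formula can be sketched as a small node table backed by one contiguous buffer. The unit layout (a 2-byte type code plus a 14-byte slot) and the names `UNIT` and `NodeTable` are assumptions made for illustration; the patent does not specify them.

```python
import struct

# Hypothetical fixed-length storage unit: 2-byte type code + 14-byte slot.
UNIT = struct.Struct("<H14s")

class NodeTable:
    def __init__(self, capacity):
        # One contiguous block; its start plays the role of the first address.
        self.buf = bytearray(capacity * UNIT.size)

    def address(self, node_id):
        # start address (0 within the buffer) + offset count * unit length
        return node_id * UNIT.size

    def write(self, node_id, type_code, slot):
        UNIT.pack_into(self.buf, self.address(node_id), type_code, slot)

    def read(self, node_id):
        return UNIT.unpack_from(self.buf, self.address(node_id))

table = NodeTable(capacity=4)
table.write(2, 7, b"alice")
type_code, slot = table.read(2)
```

Because the node ID doubles as the offset count, a lookup is a single multiplication and a read, with no search and no per-entity pointer.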
[0052] In this embodiment of the invention, obtaining query parameters according to the graph query request includes: obtaining a key-value pair to be queried, looking up its key in a memory table or a sorted string table, and determining the query parameters from the value associated with that key.
[0053] Specifically, the memory table is a key-value structure that resides in memory and is used for fast in-memory retrieval. The sorted string table is an immutable, sorted, persistent key-value structure in which both keys and values are arbitrary byte strings; it supports looking up the value for a given key as well as retrieving all key-value pairs within a given key range. An index of the data is stored at the end of the sorted string table; this index is loaded into memory when the end of the sorted string table is read, so that data can be located quickly on disk.
[0054] In summary, this invention provides a high-performance memory-based software method for querying graph data. By accelerating the process through in-memory computing and external storage caching, and by using an improved graph traversal algorithm, the graph traversal speed is significantly improved. Through a deeply customized memory structure, the graph data occupies less memory space, effectively improving the query performance of massive amounts of data with good memory utilization.
[0055] To better understand the solutions of the embodiments of the present invention, the following provides a more detailed explanation from four aspects: the memory-based implementation, the deeply customized memory structure, external storage caching, and the graph traversal algorithm.
[0056] 1. Memory-based
[0057] The memory-based implementation mainly focuses on how to store large-scale graph data in memory and ensure sufficient memory space to support the computational overhead of graph queries. The specific implementation method is as follows:
[0058] Step 1: Define the numerical mapping of node / edge types in memory.
[0059] Step 2: Construct the necessary data structures for the graph entity in memory, including a node table structure for storing node data (as shown in Figure 3), an edge table structure for storing edge data (as shown in Figure 4), and a string mapping table structure for storing the mapping between numerical values and strings (as shown in Figure 5).
[0060] Step 3: When adding, deleting, or modifying graph data, the data is written to disk while simultaneously storing necessary data in memory. There are two states when adding data to memory: First, if there is no available free space in the contiguous node / edge table structure, memory space is allocated, along with some additional space. After allocating the space, the data is written to memory. Second, if there is available free space in the contiguous node / edge table structure, the data is written directly to memory. This free space is either the space released by deleting node / edge data or the previously allocated extra space.
[0061] Step 4: During graph lookup, the corresponding graph entity data is retrieved from memory based on the directly passed node / edge ID, or the node / edge ID obtained from the passed index and attribute conditions. Because the table structure storing node / edge data in memory is contiguous, where the node ID in the node table is the offset of the node in contiguous memory, and the edge start point ID in the edge table is the offset of edges with the same start point in contiguous memory, the actual location of the target graph entity in memory can be quickly obtained using the formula: Graph entity address = Table structure start address + Offset.
[0062] Step 5: After obtaining the graph entity, quickly verify its type against the node / edge type mapping defined in Step 1. If the type check passes and other attribute values also need to be verified, first try to read the attribute from the slot. If the target attribute is not present in the slot, retrieve the complete attributes of the graph entity through the external storage cache and external storage, and verify against them.
[0063] Step 6: Construct detachable data structures for graph entities in memory, including an adjacency list structure for storing only node IDs and an island mapping table structure for storing node island data.
[0064] Step 7: Load the adjacency list structure based on the necessary data structures constructed in Step 2.
[0065] Step 8: Based on the necessary data structures constructed in Step 2, calculate and load the island mapping table using the strongly connected component algorithm.
[0066] Step 9: When performing queries without restrictions such as type or attribute, use the adjacency list structure, which stores only node IDs.
[0067] Step 10: When performing connectivity queries, use the island mapping table to directly retrieve and compare islands; when performing path queries, first retrieve and compare islands using the island mapping table. If connectivity exists, perform a path query; otherwise, no path exists.
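Steps 8 and 10 can be illustrated with a small island mapping table. The text names a strongly connected component algorithm; the sketch below instead uses union-find over undirected connectivity, a swapped-in simplification under which "different islands" reliably means "no path". The function names and the sample graph are hypothetical.

```python
def build_island_map(node_count, edges):
    """Return a list mapping node ID -> island ID (undirected connectivity)."""
    parent = list(range(node_count))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for src, dst in edges:
        parent[find(src)] = find(dst)      # merge the two islands

    return [find(n) for n in range(node_count)]

def may_have_path(island_of, a, b):
    # Cheap pre-check: nodes on different islands cannot be connected,
    # so the full path query can be skipped.
    return island_of[a] == island_of[b]

edges = [(0, 1), (1, 2), (3, 4)]
island_of = build_island_map(5, edges)
```

A path query would call `may_have_path` first and run the traversal only when it returns `True`.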
[0068] 2. Deeply customized memory structure
[0069] a. Memory compression
[0070] The main purpose of memory compression is to allow limited memory space to store more graph entity data, which is also a crucial foundation for this invention to support tens of millions or even hundreds of millions of data points. Its specific implementation is as follows:
[0071] Step 1: Construct the graph definition in memory, which includes information such as the type, attributes, and index of nodes / edges. Two fields will be used to represent the type of a node / edge: a complete string and a numerical mapping.
[0072] Step 2: Construct a string mapping table in memory (as shown in Figure 5). The key is a numeric type, and the value is a string type. There is a one-to-one correspondence between keys and values.
[0073] Step 3: Store data in the graph entities using a mapping method. Use numerical values to store their types, and obtain their specific types through the graph definition described in Step 1 when detailed types are needed. Use numerical values to store string type attributes, and obtain their string content through the string mapping table described in Step 2 when their actual content is needed.
[0074] Step 4: Simplify the graph entity structure in memory. As shown in Figure 1, the node structure stores only a numerical value representing the type and slots for attribute information. The node ID is represented by the node's offset in contiguous memory and is not stored separately. Similarly, as shown in Figure 2, the starting point ID of an edge is not stored in the edge structure; it is obtained from the index of the array in which the edge is located.
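The string mapping table of Steps 2 and 3 amounts to string interning: each distinct string is stored once and graph entities hold only its numeric key. A minimal sketch follows, with the class name `StringTable` and its methods chosen for illustration.

```python
class StringTable:
    """One-to-one mapping between numeric keys and strings."""

    def __init__(self):
        self._by_code = []   # numeric key -> string (key is the list index)
        self._by_text = {}   # string -> numeric key

    def encode(self, text):
        # Return the existing key, or assign the next free one.
        code = self._by_text.get(text)
        if code is None:
            code = len(self._by_code)
            self._by_code.append(text)
            self._by_text[text] = code
        return code

    def decode(self, code):
        return self._by_code[code]

strings = StringTable()
city = strings.encode("Beijing")
same = strings.encode("Beijing")     # repeated value reuses the same key
other = strings.encode("Shanghai")
```

An entity that would otherwise repeat a long string stores a small integer instead, and the string content is fetched only when a result is assembled.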
[0075] b. Memory alignment based on graph entities
[0076] Memory alignment based on graph entities is fundamental to achieving lock-free concurrency, providing strong support for efficient graph queries. Its specific implementation is as follows:
[0077] Step 1: Define node entities in memory (as shown in Figure 1).
[0078] Step 2: As shown in Figure 3, a contiguous memory space, i.e., a one-dimensional array, is allocated for storing the nodes in memory. The size of this space is determined by the size of the graph data and the node entity defined in Step 1, ensuring that it can store all node entities, i.e., the space is greater than or equal to the sum of all node data.
[0079] Step 3: Arrange the graph data in the corresponding positions according to the node entity size defined in Step 1, with the rule that the node ID and the index of its position are the same.
[0080] Step 4: Define edge entities in memory (as shown in Figure 2).
[0081] Step 5: As shown in Figure 4, multiple contiguous and interconnected memory spaces, i.e., a two-dimensional array, are allocated for storing the edges in memory. The size of these spaces is determined by the size of the graph data and the edge entity defined in Step 4, ensuring that all edge entities can be stored, i.e., the space is greater than or equal to the sum of all edge data.
[0082] Step 6: Arrange the graph data in the corresponding positions according to the edge entity size defined in Step 4. The rule is that the subscript of the outer dimension array represents the starting point ID, and the inner dimension array it points to contains all edge data starting from that point.
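The two-dimensional edge layout of Steps 5 and 6 can be sketched as an outer list indexed by start-point ID, where each inner buffer holds that node's fixed-length edge records back to back. The record layout (type code, edge ID, end-point ID) and the helper names are assumptions; slots are left out for brevity.

```python
import struct

# Hypothetical edge unit: type code, edge ID, end-point ID. The start-point
# ID is not stored; it is the index into the outer array.
EDGE = struct.Struct("<HII")

def build_edge_table(node_count, edges):
    """edges: iterable of (start_id, type_code, edge_id, end_id)."""
    table = [bytearray() for _ in range(node_count)]
    for src, type_code, edge_id, dst in edges:
        table[src] += EDGE.pack(type_code, edge_id, dst)
    return table

def out_edges(table, src):
    # Walk the inner buffer one fixed-length unit at a time.
    row = table[src]
    return [EDGE.unpack_from(row, off) for off in range(0, len(row), EDGE.size)]

edge_table = build_edge_table(
    3, [(0, 1, 100, 1), (0, 1, 101, 2), (2, 2, 102, 0)]
)
```

Fetching all edges of a start point is then one index into the outer array followed by a linear scan of a contiguous buffer.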
[0083] c. Byte-level non-memory alignment
[0084] Byte-level non-memory alignment is a product of the trade-off between speed and memory usage, and it greatly improves memory utilization. At the same time, minimizing its performance overhead is also a highlight of this invention. Its specific implementation is as follows:
[0085] Step 1: Define the slot size and layout in memory for each type of graph entity (as shown in Figure 5).
[0086] Step 2: When defining graph entities, explicitly declare them as not memory aligned, since structures are memory aligned by default; in C++, for example, this can be done with `#pragma pack(1)`.
[0087] Step 3: Keep only the necessary fields, such as type, as separate members of the graph entity, and store all other attribute data in a contiguous and controllable memory space, i.e., slots.
[0088] Step 4: Based on the definition in Step 1, store the attribute values of different types in the slots according to byte order.
[0089] Step 5: When reading an attribute from the slot, obtain its position in the slot according to the definition in Step 1. If it is a basic type such as a number, it can be used directly. If it is a string type, the actual string content is obtained through the string mapping table (as shown in Figure 5).
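The slot steps above can be sketched with a packed layout in which attributes of different types sit back to back and string attributes are stored as numeric keys. The "person" layout, the field names, and the dictionary standing in for the string mapping table are all hypothetical.

```python
import struct

# Hypothetical slot layout for a "person" node: name key (uint32, a key into
# the string mapping table), age (uint8), score (float64).
# The '<' prefix disables padding, so the fields are stored back to back.
PERSON_SLOT = struct.Struct("<IBd")
string_table = {0: "Alice"}   # stand-in for the string mapping table

slot = bytearray(PERSON_SLOT.size)
PERSON_SLOT.pack_into(slot, 0, 0, 30, 4.5)   # name key 0, age 30, score 4.5

# Reading: positions come from the layout definition; numbers are usable
# directly, the string is resolved through the mapping table.
name_key, age, score = PERSON_SLOT.unpack_from(slot, 0)
name = string_table[name_key]
```

The packed slot takes 13 bytes, whereas a natively aligned layout of the same three fields would be padded to a larger size on common platforms.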
[0090] 3. External storage cache
[0091] Typically, after a graph query is completed, it's necessary to assemble the complete query results for the requesting client, including but not limited to a reasonable data structure and comprehensive graph entity data. External storage caching is an effective way to quickly retrieve complete data. Its specific implementation is as follows:
[0092] Step 1: Create a memory table in memory to temporarily store data.
[0093] Step 2: When adding, deleting, or modifying graph data, first operate on the in-memory table.
[0094] Step 3: When the size of the memory table in memory exceeds a certain limit, the data in the memory table is written to the sorted string table on external storage in the background.
[0095] Step 4: When querying graph data, the system first searches the memory table.
[0096] Step 5: If the target data does not exist in the memory table, perform a binary search on the index table of the sorted string table in memory.
[0097] Step 6: If the index table for the sorted string table is not in memory, it is read from disk and stored in memory. If there is enough memory, the sorted string table is mapped into memory.
[0098] Step 7: If the sorted string table is not in memory, read the data from disk at the location indicated by the index table.
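The lookup order in the steps above (memory table first, then a binary search over the sorted string table's index) can be sketched as follows. This is a simplified stand-in: the sorted tables are kept as in-memory lists rather than files, the flush happens synchronously rather than in the background, deletions are not handled, and the names `SortedStringTable`, `Store`, and `memtable_limit` are assumptions.

```python
import bisect

class SortedStringTable:
    """Immutable sorted key-value run; stands in for a file on external storage."""

    def __init__(self, items):
        pairs = sorted(items.items())
        self.keys = [k for k, _ in pairs]      # plays the role of the index table
        self.values = [v for _, v in pairs]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)  # binary search on the index
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

class Store:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []                      # oldest first, newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) > self.limit:     # size limit exceeded: flush
            self.sstables.append(SortedStringTable(self.memtable))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:                # search the memory table first
            return self.memtable[key]
        for table in reversed(self.sstables):   # then newest sorted table first
            value = table.get(key)
            if value is not None:
                return value
        return None

store = Store(memtable_limit=2)
for k, v in [(b"n1", b"A"), (b"n2", b"B"), (b"n3", b"C"), (b"n4", b"D")]:
    store.put(k, v)
```

After these four writes, the first three keys have been flushed to one sorted table and the fourth is still in the memory table.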
[0099] 4. Graph Traversal Algorithm
[0100] The graph traversal algorithm is the core of graph querying. In this invention, the graph traversal algorithm is improved for two kinds of queries: neighbor-like queries and path-like queries.
[0101] The specific implementation of the improved traversal algorithm for path-like lookup is as follows:
[0102] Step 1: Define the data structure of the intermediate results, so that the structure only contains the direction, type, edge ID and end point ID of the edge, and the start point ID can be obtained from the predecessor edge.
[0103] Step 2: Create two corresponding instances in memory based on the data structure of the intermediate results. One instance is used to store the intermediate results to be traversed; the other instance is used to store the intermediate results that meet the expectations.
[0104] Step 3: Enable multi-threaded mode and use breadth-first search to find the path data of the first two hops of the starting point. Data that meets the parameter constraints is stored in the intermediate results to be traversed.
[0105] Step 4: Based on the path data obtained in step 3, perform a depth-first search in multi-threaded mode.
[0106] Step 5: During the depth-first search, if a path that meets the parameter constraints is encountered, it is saved in the expected intermediate results. When encountering node / edge deduplication constraints, the IDs in memory are directly compared.
[0107] Step 6: After the depth-first search ends due to parameter limitations, the expected intermediate results are extracted, aligned, and encapsulated. At this point, the query is complete.
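The six steps above can be sketched in single-threaded form as follows. This is an illustration under stated assumptions rather than the patented implementation: the names `Hop`, `find_paths`, `adj`, and `bfs_hops` are hypothetical, the multi-threaded mode is omitted, the parameter constraints are reduced to a target node and a maximum hop count, and only node deduplication is applied.

```python
from collections import namedtuple

# Step 1: a record stores no start point ID; `prev` is the index of the
# predecessor record, from which the start point can be recovered.
Hop = namedtuple("Hop", "direction type edge_id end_id prev")


def find_paths(adj, start, target, max_hops, bfs_hops=2):
    records = []    # Step 2: intermediate results to be traversed
    expected = []   # Step 2: indices of records that complete a matching path

    def path_nodes(i):
        # Walk the predecessor chain back to the starting point.
        nodes = []
        while i is not None:
            nodes.append(records[i].end_id)
            i = records[i].prev
        nodes.append(start)
        return nodes[::-1]

    def expand(i, tail):
        # Append one record per edge leaving `tail`; return the new indices.
        visited = set(path_nodes(i))   # node deduplication by comparing IDs
        new = []
        for direction, etype, edge_id, end_id in adj.get(tail, ()):
            if end_id in visited:
                continue
            records.append(Hop(direction, etype, edge_id, end_id, i))
            idx = len(records) - 1
            if end_id == target:
                expected.append(idx)   # Step 5: save a matching path
            else:
                new.append(idx)
        return new

    # Step 3: breadth-first search over the first `bfs_hops` hops.
    frontier, depth = [(None, start)], 0
    while frontier and depth < min(bfs_hops, max_hops):
        nxt = []
        for i, tail in frontier:
            nxt.extend((j, records[j].end_id) for j in expand(i, tail))
        frontier, depth = nxt, depth + 1

    # Steps 4 and 5: depth-first search from every record left by the BFS.
    stack = [(i, tail, depth) for i, tail in frontier]
    while stack:
        i, tail, d = stack.pop()
        if d >= max_hops:
            continue
        stack.extend((j, records[j].end_id, d + 1) for j in expand(i, tail))

    # Step 6: extract and encapsulate the expected results as node-ID paths.
    return sorted(path_nodes(i) for i in expected)
```

Because each record keeps only a predecessor index, a full path is rebuilt on demand instead of being copied into every intermediate result, which is the memory saving that Step 1 aims at.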
[0108] The specific implementation of the improved traversal algorithm for neighbor-like lookup is as follows:
[0109] Step 1: Before traversing the graph data, allocate a contiguous block of memory with an array as the data structure. The size of the array is equal to the number of nodes in the graph data, and it is used to store the traversal state of each node.
[0110] Step 2: Find the starting point in the array and mark it as the starting point state.
[0111] Step 3: Locate the starting point in the edge table structure and obtain all its associated edge data.
[0112] Step 4: Obtain the associated node ID based on the acquired edge data and mark it as 1 in the array, which means 1-hop neighbor.
[0113] Step 5: Traverse the 1-hop neighbors in the array and, using logic similar to Steps 3 and 4, obtain their 1-hop neighbors, i.e. the 2-hop neighbors of the starting point, and mark them as 2.
[0114] Step 6: Using logic similar to Steps 3, 4, and 5, obtain the N-hop neighbors hop by hop until the constraint conditions are met, at which point the process terminates.
[0115] Step 7: Traverse the data in the contiguous memory space allocated in Step 1, read the neighbor nodes with different hop counts according to their marked content and encapsulate them. At this point, the query ends.
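The neighbor-like traversal above can be sketched as follows. This is a minimal sketch under assumptions: node IDs are taken to be the integers 0 to N-1 so that an ID doubles as an array index, the edge table is replaced by a hypothetical adjacency list `edges_of`, the constraint is reduced to a maximum hop count, and the marker values `UNVISITED` and `START` are illustrative choices.

```python
from array import array

UNVISITED = -1   # assumed marker for nodes not reached yet
START = 0        # assumed marker for the starting point state


def n_hop_neighbors(edges_of, num_nodes, start, max_hops):
    # Step 1: one contiguous array with one traversal-state slot per node.
    state = array("i", [UNVISITED]) * num_nodes
    # Step 2: mark the starting point.
    state[start] = START

    # Steps 3 to 6: for each hop, scan the array for nodes marked with the
    # previous hop count and mark their unvisited neighbors with this one.
    for hop in range(1, max_hops + 1):
        found = False
        for node in range(num_nodes):
            if state[node] != hop - 1:
                continue
            for neighbor in edges_of[node]:
                if state[neighbor] == UNVISITED:
                    state[neighbor] = hop
                    found = True
        if not found:
            break   # no new neighbors, so further hops would be empty

    # Step 7: read the array back and group node IDs by hop count.
    result = {}
    for node_id, mark in enumerate(state):
        if mark > START:
            result.setdefault(mark, []).append(node_id)
    return result
```

Checking whether a node has already been visited is a single array read, with no hash set of visited IDs to maintain. A separate frontier list would avoid rescanning the array on every hop, at the cost of extra memory.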
[0116] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention, and they should all be covered within the scope of the claims and specification of the present invention.
Claims
1. A high-performance memory-based software method for querying graph data, characterized in that it comprises: constructing, in memory, a data structure for graph entities, the data structure including a node table structure, an edge table structure, and a string mapping table structure, wherein the node table structure and the edge table structure are stored in fixed-length storage space and the string mapping table structure is stored in variable-length storage space; obtaining a graph query request, determining, based on the graph query request, the address information of the storage location of the graph entity to be queried, and reading the graph entity to be queried from the fixed-length storage space based on the address information; wherein constructing the data structure for graph entities in memory includes: obtaining a first fixed-length storage space and a second fixed-length storage space from memory; storing the node entities in order in the first fixed-length storage space, and storing the edge entities in order in the second fixed-length storage space; the length of the storage unit in the first fixed-length storage space is determined by the largest node entity among all node entities, and the length of the storage unit in the second fixed-length storage space is determined by the largest edge entity among all edge entities, so that memory is aligned per graph entity, providing a foundation for lock-free concurrent query and read operations; the node entity includes a type and a slot, and the edge entity includes a type, an edge ID, an endpoint ID, and a slot; the slot is a contiguous memory space in the graph entity that holds all attribute data other than the type and ID fields, packed at byte granularity without memory alignment; the slot holds two kinds of attribute values, basic types (including numeric values) and string types; a basic-type value is used directly, whereas for a string-type value the actual string content is obtained through the string mapping table.
2. The high-performance memory-based software method for querying graph data according to claim 1, characterized in that determining the address information of the storage location of the graph entity to be queried based on the graph query request includes: obtaining query parameters according to the graph query request, the query parameters including, but not limited to, node ID, edge ID, and attribute conditions; and calculating, based on the query parameters and the length of the storage unit in the fixed-length storage space, the offset address of the storage unit where the graph entity to be queried is located, to obtain the address information.
3. The high-performance memory-based software method for querying graph data according to claim 2, characterized in that calculating, based on the query parameters and the length of the storage unit in the fixed-length storage space, the offset address of the storage unit where the graph entity to be queried is located includes: determining the starting address of the fixed-length storage space and the number of offsets based on the query parameters; and calculating, based on the starting address, the number of offsets, and the length of the storage unit in the fixed-length storage space, the offset address of the storage unit where the graph entity to be queried is located.
4. The high-performance memory-based software method for querying graph data according to claim 3, characterized in that obtaining the query parameters according to the graph query request includes: obtaining the key-value pair to be queried, looking up the key of that key-value pair in the memory table or the sorted string table, and determining the query parameters from the value corresponding to that key; the memory table is a KV structure stored in memory, and the sorted string table is a KV structure stored on external storage.