System and computer-implemented method for graph node sampling
By introducing first and second data structures into the solid-state drive, the attribute data of graph nodes can be accessed and loaded directly. This reduces the bandwidth wasted when sampling nodes for graph neural networks, makes more efficient use of computing resources, and saves energy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ALIBABA INNOVATION PRIVATE LIMITED
- Filing Date
- 2021-04-26
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies use coarse-grained SSDs for node sampling in graph neural networks, resulting in significant bandwidth waste, inefficient use of computing resources, and increased energy consumption.
The first and second data structures store node attributes and the addresses of neighbor-node attributes, respectively. The sampling unit on the solid-state drive can access and load the required attribute data directly, reducing unnecessary data transmission.
It reduces bandwidth and computing resource consumption, lowers energy consumption, and improves the efficiency of the computing system.

Figure CN115249057B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of computers and, more particularly, to a system for sampling graph nodes, a non-transitory computer-readable medium, and a computer-implemented method.
Background
[0002] A graph is a type of data structure or database that is stored and executed by a computing system and used to model a set of objects and the connections (relationships) between them. Each object is represented as a node (or vertex), and the nodes are connected or linked by edges in the graph. The properties or attributes of an object are associated with the node used to represent that object.
[0003] Graphs can be used to identify dependencies, clusters, similarities, matches, categories, flows, costs, centralities, and more in large datasets. Graphs are used in a wide range of applications, including but not limited to graph analytics and graph neural networks (GNNs), and more specifically, applications such as online shopping engines, social networks, recommendation engines, mapping engines, fault analysis, network management, and search engines. Graphs allow for faster retrieval and navigation of complex hierarchical structures in relational systems that are difficult to model.
[0004] Graph data typically includes node structure data and attribute data. Node structure data may include, for example, information identifying a node (e.g., the root node) and information identifying the root node's neighboring nodes (e.g., nodes linked to the root node by a single edge). Attribute data may include the features or attributes of an object that are associated with the node used to represent that object. For example, if the object represents a person, its features might include the person's age and gender; in this case, the attribute data would include a value representing age and a value representing gender.
[0005] Graph data can be stored in one or more solid-state drives (SSDs) connected to main memory. Node structure data is typically stored separately from attribute data because the same attribute data may apply to multiple nodes; storing it once and letting different nodes reference it avoids redundancy and consumes less storage space.
[0006] Data is stored in SSDs in page format. Figure 1 shows examples of a conventional structure page 100 that stores structural information and a conventional attribute page 110 that stores attribute values. In this example, structure page 100 includes a first entry that contains the node identifier (ID) of the root node V1 and the node ID of a neighboring node V2, and identifies attribute attr2 of node V1 and attribute attr1 of node V2. Structure page 100 also includes a second entry that contains the node ID of the root node V2 and the node ID of the neighboring node V1, and identifies attributes attr1 and attr2 of node V2. Structure page 100 may include more than two entries, and each entry may list more than one neighboring node.
[0007] In the example shown in Figure 1, attribute page 110 includes multiple entries containing data (values) that characterize attributes attr1 and attr2. Attribute page 110 may include more than two entries; that is, it may include attribute data for nodes other than nodes V1 and V2.
[0008] Figure 2 shows a typical procedure 200 for sampling nodes in a GNN. In step 202, the structure page containing the sampled node is retrieved from the SSD. In the example shown in Figure 2, node V1 is sampled, so the entire structure page 100 is read, and all of the structure information in that page is transferred from the SSD to main memory.
[0009] In step 204, the structure information is decoded. In step 206, the attribute page containing the attributes of the sampled nodes is read from the SSD. In the example shown in Figure 2, the entire attribute page 110 is read, and all of the attribute data in that page is transferred from the SSD to main memory. In step 208, the attribute data is processed.
[0010] A typical page size is 16 kilobytes (KB). However, the size of the structure data and attribute data of each node is much smaller than the page size. The size of the node's structure data can be as small as one byte per entry, and the size of the node's attribute data can range from 10 to 400 bytes per entry.
[0011] As in the example shown in Figure 1, each page can hold multiple entries corresponding to multiple nodes, and whenever data stored in a page is read, the entire page is read. The SSD's access granularity is therefore too coarse for efficient GNN sampling. More specifically, sampling a node may require only a portion of the data in a page, yet the entire page, including data of nodes unrelated to the sampled node, is read and transferred to main memory. Bandwidth is therefore wasted transferring unneeded node structure and attribute data from the SSD to main memory.
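The size of this waste can be illustrated with a short worked example. The sketch below is illustrative only: it assumes that sampling one node conventionally transfers one 16 KB structure page and one 16 KB attribute page, and that the sampler actually needs 1 byte of structure data and 400 bytes of attribute data (the upper end of the per-entry sizes given in [0010]). The helper name `useful_fraction` is hypothetical.

```python
PAGE_SIZE = 16 * 1024  # bytes; typical SSD page size given in the text


def useful_fraction(needed_bytes: int, pages_read: int, page_size: int = PAGE_SIZE) -> float:
    """Fraction of the bytes moved from the SSD that the sampler actually uses."""
    return needed_bytes / (pages_read * page_size)


# Illustrative assumption: 1 byte of structure data + 400 bytes of attribute data are needed,
# but one full structure page and one full attribute page are transferred.
needed = 1 + 400
frac = useful_fraction(needed, pages_read=2)
print(f"{frac:.2%} of transferred bytes used; {2 * PAGE_SIZE - needed} bytes wasted")
# prints: 1.22% of transferred bytes used; 32367 bytes wasted
```

Even at the largest attribute size, roughly 98.8% of the transferred bytes go unused under these assumptions, and smaller attributes make the ratio worse.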
[0012] In applications such as GNNs, graphs can contain billions of nodes, so the amount of bandwidth wasted can be substantial. Reducing the bandwidth consumed by GNN sampling would allow computing resources to be used more efficiently and fewer resources to be required, lowering cost and energy consumption.
Summary of the Invention
[0013] Embodiments of this disclosure provide solutions to the problems described above. In general, the methods and systems (e.g., computer systems) introduced according to embodiments of this disclosure reduce the bandwidth consumed by sampling nodes in large-scale applications such as graph neural networks.
[0014] In some embodiments, information is stored in a first set of data structures (referred to herein as first data structures) and a second set of data structures (referred to herein as second data structures). The information stored in each first data structure includes the values of attributes of one or more nodes in the graph and information for locating the data of each attribute within that first data structure. The information stored in each second data structure includes addresses of attributes of a subset of nodes (e.g., the subset may include a particular node and its neighboring nodes) and information for locating, within the first data structures, the attributes (of the subset of nodes) addressed by those addresses.
[0015] More specifically, in some embodiments, the first data structure is a page in a solid-state drive (SSD), and the address in the second data structure is a page number. In response to a command to sample a specific node (which may be referred to as the root node), the following operations are performed: accessing the second data structure associated with the root node to identify the corresponding page numbers of the pages that store the attributes of the root node and the attributes of the root node's neighboring nodes in the graph; loading the pages identified by those page numbers; reading attributes from the loaded pages; and transferring the attributes read from the pages to main memory. In embodiments employing an SSD, these operations are performed by the SSD. In some embodiments, these operations are performed by a sampling unit within the SSD.
[0016] According to a first aspect of this disclosure, a computer-implemented method is provided, comprising: storing first information associated with a graph in a plurality of first data structures in a computer system memory, wherein the graph includes a plurality of nodes, and the first information includes: attribute value data representing values of attributes of one or more nodes in the graph, and information for locating the attribute value data of each attribute in the first data structures; and storing second information associated with nodes in the graph in a plurality of second data structures in the computer system memory, wherein the second information includes: addresses of attributes of a subset of nodes of the plurality of nodes, and information for locating the attribute addressed by the addresses in the first data structures.
[0017] In some embodiments, each of the second data structures is associated with a corresponding root node among the plurality of nodes, and the subset of nodes includes: the corresponding root node, and neighboring nodes in the graph that are adjacent to the corresponding root node.
[0018] In some embodiments, the computer system memory includes a solid-state drive.
[0019] In some embodiments, the method further includes: receiving a command for sampling a specific node among the plurality of nodes; accessing a second data structure among the plurality of second data structures associated with the specific node; and accessing one or more first data structures among the plurality of first data structures based on an address in the second data structure associated with the specific node.
[0020] In some embodiments, the command includes an identifier that identifies the particular node, a sampling method, and the number of neighboring nodes of the particular node to be sampled.
[0021] In some embodiments, the computer system memory includes a solid-state drive, the first data structure includes pages in the solid-state drive, and the address in the second data structure includes a page number. The method further includes: receiving a command to sample a specific node among the plurality of nodes; accessing a second data structure associated with the specific node among the plurality of second data structures to identify a corresponding page number of a page that stores attributes of the specific node and attributes of neighboring nodes of the specific node in the graph; loading the page identified by the corresponding page number; reading attribute values from the loaded page; and transferring the attribute values read from the page to the main memory of the computer system coupled to the solid-state drive.
[0022] In some embodiments, the command includes an identifier for the specific node, a sampling method, and the number of neighboring nodes of the specific node to be sampled.
[0023] In some embodiments, the first information may further include an identifier for each of the attributes.
[0024] According to a second aspect of this disclosure, a system for sampling graph nodes is provided, comprising: a processor; and a memory coupled to the processor. The memory stores: a plurality of first data structures for storing first information associated with a graph, wherein the graph includes a plurality of nodes, and the first information in each of the first data structures includes: attribute value data representing the value of an attribute of one or more nodes in the graph, and information for locating the attribute value data of each attribute within the first data structure; and a plurality of second data structures for storing second information associated with nodes in the graph. The second information includes: addresses of attributes of a subset of nodes of the plurality of nodes, and information for locating the attribute addressed by the addresses within the first data structures.
[0025] In some embodiments, each of the second data structures is associated with a corresponding root node among the plurality of nodes, and the subset of nodes includes: the corresponding root node, and neighboring nodes in the graph that are adjacent to the corresponding root node.
[0026] In some embodiments, the system further includes a controller for performing a node sampling process in response to a command to sample a specific node among the plurality of nodes to perform: accessing a second data structure associated with the specific node among the plurality of second data structures; and reading attributes of the specific node and attributes of the specific node’s neighboring nodes in the graph from one or more first data structures based on addresses in the second data structure associated with the specific node.
[0027] In some embodiments, the memory includes main memory and a solid-state drive.
[0028] In some embodiments, the first data structure includes pages in the solid-state drive, and the address in the second data structure includes a page number. The solid-state drive performs a node sampling process in response to a command to sample a specific node among the plurality of nodes, the process including: accessing a second data structure among the plurality of second data structures to identify a corresponding page number of a page that stores attributes of the specific node and attributes of neighboring nodes of the specific node in the graph; loading the page identified by the corresponding page number; reading the values of the attributes from the loaded page; and transferring the values of the attributes read from the page to main memory.
[0029] In some embodiments, the solid-state drive includes a sampling unit that receives the command and performs the node sampling process.
[0030] In some embodiments, the solid-state drive includes a software application programming interface (API), and the command includes parameters, including an identifier for the specific node, a sampling method, and the number of neighboring nodes of the specific node to be sampled. When the API is invoked, the parameters are written to the registers of the solid-state drive.
[0031] In some embodiments, the first information may further include an identifier for each of the attributes.
[0032] According to a third aspect of this disclosure, a computer-implemented method is also provided, comprising: receiving a command for sampling a specific node among a plurality of nodes, wherein a graph includes the plurality of nodes; and in response to the command, performing: accessing a second data structure in computer system memory, wherein the second data structure stores second information, the second information including: an address of a first data structure in the computer system memory, the first data structure including attributes of the specific node and attributes of other nodes, the other nodes being neighbor nodes of the specific node in the graph, and location information for locating the attributes of the specific node and the other nodes in the first data structure addressed by the address; and accessing the first data structure, the first data structure storing first information associated with the graph, each of the first data structures storing: attribute value data representing the values of one or more attributes of the specific node and the other nodes, and information for locating the attributes in each of the first data structures based on the location information in the second data structure.
[0033] In some embodiments, the other nodes are the neighboring nodes of the specific node.
[0034] In some embodiments, the command includes an identifier that identifies the particular node, a sampling method, and the number of neighboring nodes of the particular node to be sampled.
[0035] In some embodiments, the computer system memory includes a solid-state drive and main memory, the first data structure includes pages in the solid-state drive, and the address in the second data structure includes a page number; accessing the second data structure includes identifying a corresponding page number of a page that stores attributes of the specific node and attributes of the other nodes; accessing one or more first data structures includes loading the page identified by the corresponding page number and reading the value of an attribute from the loaded page; and the method further includes transferring the value of the attribute read from the page to the main memory.
[0036] According to a fourth aspect of this disclosure, a non-transitory computer-readable medium is also provided, which stores a set of instructions executable by one or more processors of the device to cause the device to initiate any method according to this disclosure.
[0037] According to embodiments of this disclosure, the processing of unnecessary structural and attribute data is eliminated when sampling nodes, thereby reducing the consumption of bandwidth and other computing resources (e.g., memory). The sampled attribute data can be directly accessed from main memory, which reduces latency associated with processing operations on the host computer system. Energy consumption is also reduced because the processor and memory on the solid-state drive require less power than the central processing unit of the host computer system. In general, embodiments of this disclosure can provide functional improvements to computing systems.
[0038] These and other objects and advantages of the various embodiments of the present invention will be recognized by those skilled in the art after reading the following detailed description of the embodiments illustrated in the accompanying drawings.
Brief Description of the Drawings
[0039] The accompanying drawings, which form part of this specification and in which like reference numerals depict the same or similar elements, illustrate some embodiments of the present disclosure and, together with the detailed description, serve to explain the principles of the present disclosure.
[0040] Figure 1 shows examples of a conventional attribute page and a conventional node structure page used to store information about nodes in a graph;
[0041] Figure 2 illustrates a conventional procedure for sampling nodes in a graph;
[0042] Figure 3 is a block diagram illustrating an example of a system on which embodiments of the present disclosure can be implemented;
[0043] Figure 4 shows an example of a data structure that includes node structure information, in an embodiment according to this disclosure;
[0044] Figure 5 shows an example of a data structure that includes attribute data corresponding to nodes, in an embodiment according to this disclosure;
[0045] Figure 6 is a flowchart illustrating an example of a method for sampling nodes according to embodiments of this disclosure; and
[0046] Figure 7 is a flowchart illustrating an example of a method for sampling nodes according to embodiments of this disclosure.
Detailed Description
[0047] Reference will now be made in detail to various embodiments of this disclosure, examples of which are illustrated in the accompanying drawings. Although described in conjunction with these embodiments, it should be understood that they are not intended to limit this disclosure to these embodiments. Rather, this disclosure is intended to cover alternatives, modifications, and equivalents that may be included within the spirit and scope of this disclosure as defined by the appended claims. Furthermore, numerous specific details are set forth in the following detailed description of this disclosure in order to provide a thorough understanding of the disclosure. However, it should be understood that this disclosure may be practiced without these specific details. On the other hand, well-known methods, processes, components, and circuits have not been described in detail to avoid unnecessarily obscuring aspects of this disclosure.
[0048] Some portions of the detailed description that follows are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In this application, a procedure, logic block, process, or the like is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those utilizing physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computing system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as transactions, bits, values, elements, symbols, characters, samples, pixels, or the like.
[0049] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to those quantities. Unless specifically stated otherwise, as is apparent from the following discussion, it should be understood that throughout this disclosure, discussions using terms such as “access,” “store,” “sample,” “send,” “write,” “read,” “transmit / transfer,” “receive,” “load,” etc., refer to the actions and processes (e.g., the methods shown in Figures 6 and 7) of a device or computing system or similar electronic computing device or system (e.g., the system shown in Figure 3). A computing system or similar electronic computing device manipulates and transforms data represented as physical (electronic) quantities within memories, registers, or other such devices used for information storage, transmission, or display.
[0050] Some of the elements or embodiments described herein can be discussed in the general context of computer-executable instructions embodied on some form of computer-readable storage medium (e.g., a program module) that is executed by one or more computers or other devices. By way of example and not limitation, a computer-readable storage medium may include non-transitory computer storage media and communication media. Typically, a program module includes routines, programs, objects, components, data structures, etc., for performing a particular task or implementing a particular abstract data type. In various embodiments, the functionality of a program module may be combined or distributed as needed.
[0051] Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, double data rate (DDR) memory, random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory (such as an SSD) or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed to retrieve that information.
[0052] Communication media can embody computer-executable instructions, data structures, and program modules, and includes any medium for transmitting information. By way of example and not limitation, communication media includes wired media such as wired networks or direct wired connections, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media. Any combination of the above may also be included within the scope of computer-readable media.
[0053] Figure 3 is a block diagram illustrating an example of a system 300 (e.g., a computing system) on which embodiments of the present disclosure may be implemented. System 300 may include elements or components other than those shown and described below, and these elements or components may be coupled as shown or in a different way. Some blocks in the example system 300 are described in terms of the functions they perform. Although described and shown as separate blocks, the invention is not limited thereto; for example, combinations of these blocks / functions may be integrated into a single block that performs multiple functions.
[0054] In one embodiment, system 300 is an example of a system for implementing the methods disclosed herein (e.g., the methods shown in Figures 6 and 7). In the example of Figure 3, system 300 includes a central processing unit (CPU) 301, main memory 303, and a solid-state drive (SSD) 305. Main memory 303 may be, for example, DRAM. SSD 305 also includes a RAM buffer 314.
[0055] SSD 305 includes multiple storage elements, specifically multiple dies or chips 318a-318n for storing data. In one embodiment, dies 318a-318n are NAND dies; therefore, SSD 305 may be referred to as a NAND flash memory device, and the multiple dies may be referred to as a NAND flash memory package 318. The NAND flash memory package 318 stores pages of data, including node structure data (see Figure 4) and attribute data (see Figure 5).
[0056] The SSD 305 of Figure 3 also includes an SSD controller 307, which includes a processor 310 and a flash memory controller 316. Importantly, in contrast to conventional SSD controllers, the SSD controller 307 also includes a sampling unit 312 located in the read path. In the embodiment illustrated in Figure 3, the sampling unit 312 is a hardware unit within the SSD controller 307; however, the invention is not limited thereto. In general, the sampling unit 312 is used to support or perform the sampling of nodes in the graph of a graph neural network (GNN).
[0057] Processor 310 receives commands from CPU 301 and passes these commands to flash controller 316 and sampling unit 312. In some embodiments, dedicated sampling commands are used for node sampling. Each sampling command includes parameters including an identifier (or simply ID) for the node being sampled (which may be referred to as the node of interest or the root node), and also includes information for identifying the sampling method and the number of neighboring nodes of the root node to be sampled. Neighboring nodes of the root node are nodes separated from the root node by a defined number of edges (e.g., a single edge). Sampling methods include, for example, random sampling and weighted sampling.
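The disclosure names random sampling and weighted sampling but does not say how either is carried out. The Python sketch below shows one plausible way a sampling unit could choose `k` neighbors of the root node: uniform sampling without replacement via the standard library's `random.Random.sample`, and weighted sampling without replacement using the Efraimidis-Spirakis key method (each neighbor gets key `u ** (1 / w)` for a uniform `u`, and the `k` largest keys are kept). The function name, the string method codes, and the use of edge weights as sampling weights are all assumptions.

```python
import random
from typing import List, Optional, Sequence


def sample_neighbors(neighbors: Sequence[int], k: int, method: str,
                     weights: Optional[Sequence[float]] = None,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Pick up to k neighbor IDs of a root node (hypothetical helper)."""
    rng = rng or random.Random()
    if method not in ("random", "weighted"):
        raise ValueError(f"unknown sampling method: {method!r}")
    if k >= len(neighbors):
        return list(neighbors)  # fewer neighbors than requested: take them all
    if method == "random":
        # Uniform sampling without replacement.
        return rng.sample(list(neighbors), k)
    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # key_i = u_i ** (1 / w_i); keep the k neighbors with the largest keys.
    if weights is None or len(weights) != len(neighbors) or any(w <= 0 for w in weights):
        raise ValueError("weighted sampling needs one positive weight per neighbor")
    keys = [rng.random() ** (1.0 / w) for w in weights]
    top = sorted(range(len(neighbors)), key=lambda i: keys[i], reverse=True)[:k]
    return [neighbors[i] for i in top]
```

For example, `sample_neighbors([2, 3, 4], 2, "weighted", weights=[0.5, 1.0, 2.0])` returns two distinct neighbors, with node 4 the most likely to be included.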
[0058] In some embodiments, the SSD controller 307 includes an application programming interface (API) 308. The processor 310 transmits sampling commands to the sampling unit 312 and writes command parameters to the appropriate registers when the API 308 is invoked.
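The disclosure states only that invoking API 308 writes the command parameters to registers; it does not define a register layout. As a hypothetical illustration, the sketch below uses Python's `struct` module to pack the three parameters into a 12-byte little-endian image (a `u64` node ID, a `u8` method code, one padding byte, and a `u16` neighbor count) and to unpack it again as the sampling unit might. The field widths, method codes, and the class name `SamplingCommand` are all assumptions.

```python
import struct
from dataclasses import dataclass

# Hypothetical method codes and register layout (not specified in the disclosure).
METHOD_CODES = {"random": 0, "weighted": 1}
CODE_METHODS = {code: name for name, code in METHOD_CODES.items()}
REGISTER_LAYOUT = struct.Struct("<QBxH")  # u64 node_id, u8 method, pad byte, u16 num_neighbors


@dataclass(frozen=True)
class SamplingCommand:
    node_id: int        # ID of the root node to be sampled
    method: str         # sampling method, e.g. "random" or "weighted"
    num_neighbors: int  # number of the root node's neighbors to sample

    def to_registers(self) -> bytes:
        """Serialize the parameters as an API call might write them to the SSD registers."""
        return REGISTER_LAYOUT.pack(self.node_id, METHOD_CODES[self.method], self.num_neighbors)

    @classmethod
    def from_registers(cls, raw: bytes) -> "SamplingCommand":
        """Decode the register image as the sampling unit might read it."""
        node_id, code, num_neighbors = REGISTER_LAYOUT.unpack(raw)
        return cls(node_id, CODE_METHODS[code], num_neighbors)
```

A fixed-width layout like this keeps decoding trivial for a hardware sampling unit; the `<` prefix disables implicit alignment padding, so the explicit `x` byte is the only padding.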
[0059] Figure 4 shows an example of a data structure 400 that includes the structural information (data) of a node (the root node) in an embodiment according to this disclosure. Data structure 400 may also be referred to herein as a node data structure or a second data structure.
[0060] A node data structure similar to data structure 400 is associated with each node of the GNN graph. In other words, each node in the graph can be identified by a corresponding node ID, and that node ID can be used to identify and access the data structure 400 associated with the node. In the SSD embodiment, the node data structure 400 is a page in the NAND flash memory package 318 (Figure 3).
[0061] In the embodiment illustrated in Figure 4, the node data structure 400 includes the address of an attribute associated with the root node (e.g., attribute addr1) and the addresses of attributes associated with the root node's neighboring nodes. The node data structure 400 also includes location information that can be used to locate the value of each attribute (which can be represented by attribute value data) within the attribute data structure that stores the attribute value data. The location information can be the ID of each attribute (e.g., Attrid1) and / or the offset value of each attribute. The node data structure 400 may also include other information associated with GNN sampling, such as (but not limited to) node degree and edge weights.
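A node data structure of the kind shown in Figure 4 might be modeled as follows. This Python sketch is an illustration under stated assumptions, not the patented layout: each attribute reference carries the owning node ID, the page number (the address), and both forms of location information mentioned above (attribute ID and offset); node degree is derived from the neighbor list rather than stored; and all class and method names are hypothetical. The hypothetical `pages_needed` method shows why the structure helps: from it alone, the sampler knows exactly which pages hold the attributes of the root and the chosen neighbors.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(frozen=True)
class AttrRef:
    node_id: int  # node whose attribute is referenced
    page_no: int  # address: page (first data structure) holding the value
    attr_id: int  # location info: attribute ID within that page
    offset: int   # location info: byte offset of the value within that page


@dataclass
class NodeDataStructure:
    """Hypothetical model of data structure 400 for one root node."""
    root: AttrRef
    neighbors: List[AttrRef] = field(default_factory=list)
    edge_weights: List[float] = field(default_factory=list)  # optional GNN sampling info

    @property
    def degree(self) -> int:
        return len(self.neighbors)

    def pages_needed(self, sampled_neighbor_ids: Iterable[int]) -> List[int]:
        """Sorted page numbers holding attributes of the root and the sampled neighbors."""
        wanted = {self.root.node_id, *sampled_neighbor_ids}
        refs = [self.root, *self.neighbors]
        return sorted({ref.page_no for ref in refs if ref.node_id in wanted})
```

For a root whose own attribute and one neighbor's attribute share a page, sampling only that neighbor needs a single page load.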
[0062] In some embodiments, attribute data is stored in a data structure similar to the data structure 500 shown in Figure 5. In the SSD embodiment, data structure 500 is one of the pages in the NAND flash memory package 318 (Figure 3), and the address in data structure 400 (e.g., attribute addr1) is a page number.
[0063] Figure 5 shows an example of a data structure 500 that includes data representing node attributes in an embodiment according to this disclosure. In the illustrated embodiment, the attribute data structure 500 includes the attribute IDs (e.g., Id1) of the attributes stored in the data structure, a corresponding offset value (e.g., Offset1) associated with each attribute ID, and a value (data, e.g., data1) for each attribute ID.
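The attribute-ID, offset, and value layout of Figure 5 can be sketched as a byte-level page format. In this hypothetical Python version, a page begins with an entry count, followed by one `(attr_id, offset, length)` record per attribute and then the concatenated values, zero-padded to 16 KB. The `length` field, the field widths, and the function names are assumptions added so that a value can be sliced out of the page without parsing its contents.

```python
import struct
from typing import Dict

PAGE_SIZE = 16 * 1024             # typical page size cited in the text
HEADER = struct.Struct("<H")      # number of attribute entries in the page
ENTRY = struct.Struct("<III")     # attr_id, offset, length (length is an assumption)


def build_attribute_page(values: Dict[int, bytes]) -> bytes:
    """Lay out attribute IDs, offsets, and values in one zero-padded page."""
    data_start = HEADER.size + len(values) * ENTRY.size
    header = bytearray(HEADER.pack(len(values)))
    body = bytearray()
    for attr_id, value in values.items():
        header += ENTRY.pack(attr_id, data_start + len(body), len(value))
        body += value
    page = bytes(header + body)
    if len(page) > PAGE_SIZE:
        raise ValueError("attribute values do not fit in one page")
    return page.ljust(PAGE_SIZE, b"\0")


def read_attribute(page: bytes, attr_id: int) -> bytes:
    """Find an attribute by ID in the page directory and return only its value."""
    (count,) = HEADER.unpack_from(page, 0)
    for i in range(count):
        entry_id, offset, length = ENTRY.unpack_from(page, HEADER.size + i * ENTRY.size)
        if entry_id == attr_id:
            return page[offset:offset + length]
    raise KeyError(attr_id)
```

Because each value's offset is recorded, a sampling unit holding the page in its buffer can return a few dozen bytes instead of the whole 16 KB page.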
[0064] Referring again to Figure 3, the sampling command identifies the node (root node) to be sampled. In response to the sampling command, the sampling unit 312 accesses or reads (gets) the structure data of that node from the node's data structure 400 (Figure 4).
[0065] More specifically, in some embodiments, the node data structure 400 of the root node includes the page numbers of the pages in the NAND flash memory package 318 (Figure 3) where the root node's attribute data is stored. The sampling unit 312 obtains these page numbers from the node structure page.
[0066] The root node's structural data also identifies its neighboring nodes. Therefore, the sampling unit 312 can also obtain the page numbers of the pages in the NAND flash memory package 318 where the neighboring nodes' attribute data is stored.
[0067] The sampling unit 312 then directs the flash memory controller 316 to load the pages of attribute data identified by these page numbers. Figure 5 shows an example of a data structure that includes attribute data.
[0068] Referring again to Figure 3, the pages of attribute data (including the attribute data of the sampled neighbor nodes and of the root node) are loaded into RAM buffer 314 by the flash controller 316. These pages may also include attribute data of nodes other than the root node and the sampled neighbor nodes. In either case, sampling unit 312 then selects, reads, and collects only the attribute data of the root node and the sampled neighbor nodes, and outputs that attribute data to main memory 303. In some embodiments, sampling unit 312 may output the selected and collected attribute data to RAM buffer 314, from which it is then written to main memory 303.
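The read path of paragraphs [0064] to [0068] can be summarized in a small simulation. The Python sketch below is hypothetical throughout: dictionaries stand in for the NAND pages, the RAM buffer, and the per-node data structures; neighbors are chosen by uniform random sampling; and the returned dictionary stands in for the data written to main memory. It illustrates the key point that attributes of unrelated nodes may be loaded into the RAM buffer but are never sent to the host.

```python
import random
from typing import Dict, List, NamedTuple, Optional


class AttrRef(NamedTuple):
    node_id: int  # node whose attribute is referenced
    page_no: int  # address in the node data structure: a page number
    attr_id: int  # location info: attribute ID inside that page


# Hypothetical stand-ins for on-drive state.
NandPages = Dict[int, Dict[int, bytes]]  # page_no -> {attr_id: value}
NodeIndex = Dict[int, List[AttrRef]]     # root node ID -> [root ref, neighbor refs...]


def sample_in_ssd(root_id: int, num_neighbors: int, node_index: NodeIndex,
                  nand: NandPages, rng: Optional[random.Random] = None) -> Dict[int, bytes]:
    rng = rng or random.Random()
    # 1. Read the root node's data structure (second data structure).
    root_ref, *neighbor_refs = node_index[root_id]
    # 2. Choose neighbors (uniform random sampling assumed here).
    k = min(num_neighbors, len(neighbor_refs))
    chosen = [root_ref] + rng.sample(neighbor_refs, k)
    # 3. Flash controller loads only the referenced pages into the RAM buffer.
    ram_buffer = {ref.page_no: nand[ref.page_no] for ref in chosen}
    # 4. Gather just the requested attributes; other data in the buffered pages is dropped.
    return {ref.node_id: ram_buffer[ref.page_no][ref.attr_id] for ref in chosen}
```

In a two-page example where page 5 also holds an attribute of some unrelated node, that attribute is loaded into the buffer with page 5 but does not appear in the returned result.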
[0069] Compared to conventional methods and systems, embodiments according to this disclosure consume less bandwidth and storage space. More specifically, by performing sampling in SSD 305 (in sampling unit 312), the amount of data transferred to main memory 303 is reduced, thereby reducing the consumption of storage space and bandwidth. In some cases, up to 50% bandwidth can be saved.
[0070] Furthermore, since the attribute data in main memory 303 consists only of those attributes associated with the sampled node and its neighboring nodes, this data can be directly accessed without additional processing by CPU 301. This reduces the latency associated with the additional processing. Also, since the processing and storage components in SSD 305 consume less power than CPU 301, the power consumption of system 300 is also reduced. Overall, embodiments of this disclosure provide improvements in computing system functionality.
[0071] Figure 6 is a flowchart 600 of an example of a method for sampling nodes according to embodiments of the present disclosure. All or some of the operations represented by the boxes in flowchart 600 can be implemented as computer-executable instructions residing on some form of non-transitory computer-readable storage medium, and can be executed by, for example, the system 300 of Figure 3.
[0072] In box 602 of Figure 6, information is stored in a first set of data structures (which may be referred to herein as first data structures or attribute data structures; for example, the data structure 500 shown in Figure 5). The information stored in each first data structure includes: the values of attributes of one or more nodes in the graph, and information for locating the data of each attribute within that first data structure.
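One possible layout for a first (attribute) data structure is sketched below in Python. It assumes a page begins with a small directory of `(node ID, attribute ID, offset, length)` entries followed by the packed attribute values; the field widths and the names `build_page` and `read_attr` are hypothetical. The directory entries play the role of the "information for locating the data of each attribute" described above, in the spirit of the offset values and attribute IDs mentioned in [0078].

```python
import struct

# Hypothetical page layout (not specified by the disclosure):
#   u16 entry_count
#   entry_count x (u32 node_id, u16 attr_id, u16 offset, u16 length)
#   packed attribute value bytes; offsets are relative to the page start
HEADER = struct.Struct("<H")
ENTRY = struct.Struct("<IHHH")


def build_page(values):
    """values: list of (node_id, attr_id, value_bytes) -> page bytes."""
    data_start = HEADER.size + ENTRY.size * len(values)
    entries, blob = [], bytearray()
    for node_id, attr_id, value in values:
        entries.append(ENTRY.pack(node_id, attr_id, data_start + len(blob), len(value)))
        blob += value
    return HEADER.pack(len(values)) + b"".join(entries) + bytes(blob)


def read_attr(page, node_id, attr_id):
    """Use the directory to locate one attribute value inside the page."""
    (count,) = HEADER.unpack_from(page, 0)
    for i in range(count):
        n, a, off, length = ENTRY.unpack_from(page, HEADER.size + i * ENTRY.size)
        if n == node_id and a == attr_id:
            return page[off:off + length]
    raise KeyError((node_id, attr_id))


page = build_page([(1, 0, b"alice"), (1, 1, b"\x2a"), (2, 0, b"bob")])
print(read_attr(page, 1, 0))  # b'alice'
print(read_attr(page, 2, 0))  # b'bob'
```

The directory is scanned linearly for simplicity; a real layout could sort the entries or use fixed-size records so that an offset can be computed directly.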
[0073] In box 604 of Figure 6, information is stored in a second set of data structures (which may be referred to herein as second data structures or node data structures; for example, the data structure 400 shown in Figure 4). The information stored in each second data structure includes: the addresses of the attributes of a subset of nodes (e.g., the subset may include a specific node and its neighboring nodes), and information for locating, within the first data structures, the attributes of that subset of nodes addressed by those addresses.
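A second (node) data structure can be sketched as a per-root index. The Python below assumes a hypothetical `Locator` holding a page number (the address) plus an offset and length (the locating information), and builds one `NodeEntry` per root node covering the root and its neighbors, as described above. The names and fields are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    page: int    # address of the first data structure (a page number on an SSD)
    offset: int  # start of the node's attribute record inside that page
    length: int  # record length in bytes


@dataclass
class NodeEntry:
    """Hypothetical second data structure for one root node."""
    root: int
    members: dict[int, Locator]  # root and each neighbor -> where its attributes live


def build_node_entries(adjacency: dict[int, list[int]],
                       attr_index: dict[int, Locator]) -> dict[int, NodeEntry]:
    """Create one entry per root node covering the root and its neighbors."""
    entries = {}
    for root, neighbors in adjacency.items():
        subset = [root, *neighbors]
        entries[root] = NodeEntry(root, {n: attr_index[n] for n in subset})
    return entries


adjacency = {1: [2, 3], 2: [1], 3: [1]}
attr_index = {1: Locator(7, 0, 4), 2: Locator(7, 4, 4), 3: Locator(9, 0, 4)}
entries = build_node_entries(adjacency, attr_index)
print(sorted(entries[1].members))  # [1, 2, 3]
print(sorted(entries[2].members))  # [1, 2]
```

Because each node's locator is repeated in the entry of every neighbor, this layout trades extra storage for a single lookup per sampling command.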
[0074] Figure 7 is a flowchart 700 of an example of a method for sampling nodes according to embodiments of this disclosure. All or some of the operations represented by the boxes in flowchart 700 can be implemented as computer-executable instructions residing on some form of non-transitory computer-readable storage medium, and can be executed by, for example, the system 300 of Figure 3. In some embodiments, the operations shown in Figure 7 are performed by the SSD 305 of Figure 3, and more specifically by the sampling unit 312.
[0075] In box 702 of Figure 7, a command is received for sampling a specific node (the root node).
[0076] In box 704, in response to the command, the node data structure associated with the root node (the second data structure; for example, the data structure 400 shown in Figure 4) is accessed. The second data structure includes the addresses of the root node's attributes and the addresses of the attributes of other nodes in the graph that are neighbors of the root node. In embodiments using an SSD, the addresses are page numbers, and the data structure 400 (Figure 4) is accessed to identify the page numbers used to store the attributes of the root node and the attributes of the root node's neighboring nodes in the graph.
[0077] In box 706 of Figure 7, the attribute data structure (the first data structure; for example, the data structure 500 shown in Figure 5) addressed by the second data structure accessed in box 704 is accessed. The first data structure includes attribute value data representing the attribute values of the root node and its neighboring nodes. In embodiments using an SSD, the pages identified by the page numbers are loaded.
[0078] In box 708, attribute value data is read from the loaded first data structure (page). More specifically, based on location information (e.g., offset values and/or attribute IDs) in the second data structure, only the attribute value data of the root node and the sampled neighbor nodes is selected, read, and collected.
[0079] In box 710, the selected, read, and collected attribute value data is output (e.g., the attribute value data is transferred to main memory, or transferred to a buffer and then to main memory).
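Boxes 702 through 710 can be strung together as follows. This Python sketch is a hypothetical illustration, not the disclosed implementation: `sample_node`, the `Locator` record, the dictionary-based lookups, and the `read_page` callable standing in for the flash controller are all assumptions, and neighbors are chosen uniformly at random without replacement, which is only one possible sampling method.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    page: int    # page number (address of a first data structure)
    offset: int  # start of the node's attribute record within the page
    length: int  # record length in bytes


def sample_node(root, k, node_entries, neighbors_of, read_page, rng):
    """Sketch of boxes 702-710 for one sampling command (root, k).

    node_entries: root id -> {node id: Locator}   (second data structures)
    neighbors_of: root id -> list of neighbor ids
    read_page:    page number -> bytes             (stands in for the flash controller)
    """
    # Box 704: access the second data structure of the root node.
    locs = node_entries[root]
    # Pick the neighbors to sample (uniform, without replacement: an assumption).
    nbrs = neighbors_of[root]
    sampled = rng.sample(nbrs, min(k, len(nbrs)))
    wanted = [root, *sampled]
    # Box 706: load each addressed page once.
    pages = {p: read_page(p) for p in {locs[n].page for n in wanted}}
    # Box 708: select only the root's and sampled neighbors' attribute bytes.
    out = {}
    for n in wanted:
        loc = locs[n]
        out[n] = pages[loc.page][loc.offset:loc.offset + loc.length]
    # Box 710: return the collected data; a caller would copy it to main memory.
    return out


flash = {7: b"AAAABBBBZZZZ----", 9: b"CCCC------------"}
entries = {1: {1: Locator(7, 0, 4), 2: Locator(7, 4, 4), 3: Locator(9, 0, 4)}}
result = sample_node(1, 1, entries, {1: [2, 3]}, flash.__getitem__, random.Random(0))
print(sorted(result))  # [1, 2] or [1, 3], depending on the random draw
```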
[0080] In summary, embodiments of this disclosure eliminate the need to process unnecessary structural and attribute data when sampling nodes, thereby reducing the consumption of bandwidth and other computing resources (e.g., storage space). Attribute data can be accessed directly from main memory, which reduces latency associated with processing by the host computer system. Energy consumption is also reduced because the processor and memory on the solid-state drive require less energy than the central processing unit of the host computer system. Generally, embodiments of this disclosure can provide functional improvements to computing systems.
[0081] While the foregoing disclosure sets forth various embodiments using specific block diagrams, flowcharts, and examples, each block diagram component, flowchart step, operation, and/or component described and/or illustrated herein can be implemented, individually and/or collectively, using a wide range of configurations. Furthermore, any disclosure of components contained within other components should be considered exemplary, as the same functionality can be achieved through many other architectures.
[0082] Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in this disclosure is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing this disclosure.
[0083] Embodiments of the present invention have thus been described. Although the invention has been described in particular embodiments, it should not be construed as limited to these embodiments, but rather construed according to the appended claims.
Claims
1. A computer-implemented method, comprising: storing first information associated with a graph in a plurality of first data structures in a computer system memory, wherein the graph includes a plurality of nodes, and the first information includes: attribute value data representing the attribute values of one or more nodes in the graph, and information for locating the attribute value data of each attribute within the first data structure; and storing second information associated with nodes in the graph in a plurality of second data structures in the computer system memory, wherein the second information includes: addresses of the attributes of a subset of nodes of the plurality of nodes, and information for locating, within the first data structures, the attributes addressed by the addresses; wherein each of the second data structures is associated with a corresponding root node among the plurality of nodes, and the subset of nodes includes the corresponding root node and the neighboring nodes adjacent to the corresponding root node in the graph.
2. The computer-implemented method of claim 1, further comprising: receiving a command for sampling a specific node among the plurality of nodes; accessing the second data structure, among the plurality of second data structures, associated with the specific node; and accessing one or more of the plurality of first data structures based on the addresses in the second data structure associated with the specific node.
3. The computer-implemented method of claim 1, wherein the computer system memory includes a solid-state drive, the first data structures include pages in the solid-state drive, and the addresses in the second data structures include page numbers, the method further comprising: receiving a command to sample a specific node among the plurality of nodes; accessing the second data structure, among the plurality of second data structures, associated with the specific node to identify corresponding page numbers, the corresponding page numbers being used to store the attributes of the specific node and the attributes of the specific node's neighboring nodes in the graph; loading the pages identified by the corresponding page numbers; reading the values of the attributes from the loaded pages; and transferring the values of the attributes read from the pages to a main memory of the computer system coupled to the solid-state drive.
4. A system for graph node sampling, comprising: a processor; and a memory coupled to the processor, the memory storing: a plurality of first data structures for storing first information associated with a graph, wherein the graph includes a plurality of nodes, and the first information in each of the first data structures includes: attribute value data representing the attribute values of one or more nodes in the graph, and information for locating the attribute value data of each attribute within the first data structure; and a plurality of second data structures for storing second information associated with nodes in the graph, wherein the second information includes: addresses of the attributes of a subset of nodes of the plurality of nodes, and information for locating, within the first data structures, the attributes addressed by the addresses; wherein each of the second data structures is associated with a corresponding root node among the plurality of nodes, and the subset of nodes includes the corresponding root node and the neighboring nodes adjacent to the corresponding root node in the graph.
5. The system of claim 4, further comprising a controller for performing a node sampling process, wherein, in response to a command to sample a specific node among the plurality of nodes, the controller: accesses the second data structure, among the plurality of second data structures, associated with the specific node; and reads, based on the addresses in the second data structure associated with the specific node, the attributes of the specific node and the attributes of the neighboring nodes of the specific node in the graph from one or more of the first data structures.
6. The system of claim 4, wherein the memory includes a main memory and a solid-state drive.
7. The system of claim 6, wherein the first data structures include pages in the solid-state drive, and the addresses in the second data structures include page numbers; and wherein, in response to a command to sample a specific node among the plurality of nodes, the solid-state drive performs a node sampling process comprising: accessing the second data structure among the plurality of second data structures to identify corresponding page numbers, the corresponding page numbers being used to store the attributes of the specific node and the attributes of the neighboring nodes of the specific node in the graph; loading the pages identified by the corresponding page numbers; reading the values of the attributes from the loaded pages; and transferring the values of the attributes read from the pages to the main memory.
8. The system of claim 7, wherein the solid-state drive includes a sampling unit that receives the command and performs the node sampling process.
9. The system of claim 8, wherein the solid-state drive includes a software application programming interface (API); the command includes parameters comprising an identifier of the specific node, a sampling method, and the number of neighboring nodes of the specific node to be sampled; and, when the API is invoked, the parameters are written to registers of the solid-state drive.
10. The system of claim 4, wherein the first information further includes an identifier for each of the attributes.
11. A method implemented by a computer, comprising: receiving a command for sampling a specific node among a plurality of nodes, wherein a graph includes the plurality of nodes; and, in response to the command: accessing a second data structure in a computer system memory, wherein the second data structure stores second information including: an address of a first data structure in the computer system memory, the first data structure including the attributes of the specific node and the attributes of other nodes, the other nodes being neighboring nodes of the specific node in the graph, and location information for locating the attributes of the specific node and of the other nodes within the first data structure addressed by the address; and accessing the first data structure, which stores first information associated with the graph, wherein each of the first data structures stores: attribute value data representing the values of one or more attributes of the specific node and the other nodes, and information for locating the attributes in each of the first data structures based on the location information in the second data structure.
12. The computer-implemented method of claim 11, wherein the computer system memory includes a solid-state drive and a main memory, the first data structure includes pages in the solid-state drive, and the address in the second data structure includes page numbers; the accessing of the second data structure includes identifying corresponding page numbers, which are used to store the attributes of the specific node and the attributes of the other nodes; the accessing of one or more first data structures includes loading the pages identified by the corresponding page numbers and reading the values of the attributes from the loaded pages; and the method further comprises transferring the values of the attributes read from the pages to the main memory.