A database heterogeneous acceleration method based on memory pooling

By combining management nodes and computing nodes, and utilizing memory expansion modules designed with CXL switches and FPGAs, efficient parallel computing and storage access for database query tasks are achieved. This solves the problems of excessive database computing tasks and resource waste, and improves query efficiency and response speed.

CN118606355B · Active · Publication Date: 2026-06-19 · SHANDONG INSPUR SCI RES INST CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SHANDONG INSPUR SCI RES INST CO LTD
Filing Date: 2024-06-12
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

When query computation becomes too heavy, existing databases offload it to heterogeneous acceleration systems, but this leads to wasted resources, reduced query performance, and high memory access latency.

Method used

Management nodes are combined with compute nodes, and a CXL switch connects the compute nodes to a storage pool so that storage and heterogeneous computing resources can be managed and allocated. Memory expansion modules implemented on FPGAs perform large-scale parallel computing. Using the CXL protocol, they provide both storage access and heterogeneous computing control, and the allocation of computing power between the CPU and the heterogeneous computing units is optimized.

Benefits of technology

The method improves database query efficiency, reduces system operating costs, and increases resource utilization and query response speed.


Abstract

This invention provides a database heterogeneous acceleration method based on memory pooling, belonging to the field of memory pooling and heterogeneous acceleration. This invention achieves optimized allocation of CPU and heterogeneous computing power under high bandwidth and low latency pooled storage access by offloading high computing power-demand tasks in database query sub-operations to heterogeneous computing units and using a memory access method based on the CXL protocol. This improves database query efficiency and reduces the overall operating cost of the system.

Description

Technical Field

[0001] This invention relates to the field of memory pooling and heterogeneous acceleration, and in particular to a database heterogeneous acceleration method based on memory pooling.

Background Technology

[0002] With the continuous development of the CXL (Compute Express Link) protocol and switch technology, memory pooling has become technically feasible. Current research and applications of memory pooling focus mainly on the management and elastic scaling of pooled memory resources. By building ultra-large memory pools and multi-node switching configurations, they aim to maximize access bandwidth and reduce memory access latency, so that access to pooled memory approaches the performance of local memory.

[0003] Memory pooling based on the CXL protocol can provide elastic memory for database query systems, solving the problems of elastic memory expansion and resource waste in database query applications. However, existing databases still suffer from excessive computational loads and often need to offload and accelerate query tasks through heterogeneous acceleration systems such as GPUs. Running CXL-based memory access and GPU-based heterogeneous acceleration as independent subsystems not only wastes resources but also degrades query performance, because data must move between the different devices.

Summary of the Invention

[0004] To address the above technical problems, this invention provides a database heterogeneous acceleration method based on memory pooling.

[0005] The technical solution of this invention is:

[0006] A database heterogeneous acceleration method based on memory pooling includes the following. Data query tasks are handled by a combination of management nodes and compute nodes, which receive external query tasks and perform parallel query computations on different compute nodes. A CXL switch connects the compute nodes to a storage pool and manages and allocates storage and heterogeneous computing resources for the compute nodes, enabling the database to access the underlying storage nodes and to control heterogeneous computing. A memory expansion module implemented on an FPGA provides CXL-based storage access and access to the heterogeneous computing control register group and, based on the configuration parameters in that register group, schedules the data decompression, bitonic sorting, filtering computation, and hash computation modules for large-scale parallel computation, completing part of the computational work of the database query operation. Finally, a data collector uploads the computation results to the compute nodes while simultaneously caching them in a local volatile storage unit. By offloading the tasks in the database query sub-operations that demand high computing power to heterogeneous computing units and using a memory access method based on the CXL protocol, this invention achieves an optimized allocation of CPU and heterogeneous computing power under high-bandwidth, low-latency pooled storage access, improving database query efficiency and reducing overall system operating costs.

[0007] Furthermore,

[0008] The database query system is deployed on the management node and compute node servers. The management node records the storage index of all memory expansion modules, receives query tasks, and allocates storage, heterogeneous computing, and compute node resources to the query tasks. The storage access tasks are distributed to different memory expansion modules through the CXL switch, and the accessed data results are routed by task tag to different compute nodes for further query computation. Finally, the computation results are transmitted to the management node through the CXL switch for result feedback.

[0009] A management node can receive parallel queries from several query tasks simultaneously. The management node generates storage access control for query tasks by indexing the data storage address in the non-volatile memory, and issues storage access tasks and heterogeneous acceleration tasks through the CXL controller.

[0010] The management node records and queries the working status of the compute nodes, distributes the query task to the idle compute nodes through the CXL switch, and loads the compute node information into the storage access task to match the storage access task with the host compute task.

[0011] Furthermore,

[0012] The CXL switch establishes a task queue for each memory expansion module access and manages the storage access tasks of the memory expansion module through task queues with priority tags.

[0013] When the CXL switch receives a task instruction submitted by the management node, it breaks down the task instruction into memory access tasks, heterogeneous acceleration tasks, and data path tasks, and stores them independently. The CXL switch detects the progress of memory access and heterogeneous acceleration tasks of different memory expansion modules through the local task record table. Based on the task requirements to be executed in the task queue, it finds and determines the currently available memory expansion modules and sends memory access and heterogeneous computing tasks to those memory expansion modules.

[0014] The CXL switch allows for the pipelined execution of tasks. The CXL switch sends several task streams to the same memory expansion module, marks the maximum number of tasks that can exist simultaneously in a single memory expansion module, and records the number of task types currently existing in the memory expansion module. The CXL switch receives task completion feedback information from the memory expansion module in real time. When it detects that a task process in the task queue has ended, it marks the task as invalid and decrements the recorded number of tasks in the memory expansion module by one.

[0015] The memory expansion module accesses the heterogeneous acceleration controller via the cache protocol in the CXL protocol controller, stores several heterogeneous acceleration tasks in the local cache, and decodes and executes them sequentially. The decoded instruction information is sent directly to the decompression module, bitonic sorting array, filtering calculation array, hash calculation array, and data collector to control chip selection and path execution across the different heterogeneous computing function modules.

[0016] During a database query task, the data is compressed before storage, and the data must be decompressed to obtain the data that meets the computational requirements. In addition, the database query task contains computational processes that can be executed in parallel. Some computational processes in these operators are offloaded and accelerated, and then calculated sequentially in a pipeline manner.

[0017] When data is stored in volatile storage devices, the CXL protocol controller receives a memory access task, uses the mapping between virtual and physical addresses to directly access the data in the local volatile storage, and uploads the read results to the CXL switch.

[0018] When data is stored in a non-volatile storage device, the local non-volatile storage controller directly reads the data from the non-volatile storage and sends it to the decompression module. The decompression module performs a decompression algorithm based on the hardware circuitry to obtain the decompressed data. For some data that does not need to be decompressed, the decompression module sends the data directly to the next processing unit in bypass mode.

[0019] The filtering computation array performs basic filtering computations, including logical comparisons and relational comparisons. The computations are completed in order and without interruption as the data streams through, and the results are sent to the next processing unit. The bitonic sorting array supports bitonic sorting of up to 64 data items. Through fixed-period data comparison operations, it outputs 64 data results in ascending or descending order, completing part of the computational tasks in the query operator computation process. The hash computation supports several hash functions, with the number of instances of each hash function matched to the access bandwidth of the non-volatile memory, so that the full width of data read from the non-volatile memory can be hashed in a single clock cycle.

[0020] The data collector receives data transmitted through the chip select path and uploads the data to the CXL switch. At the same time, based on the existing data write time and data read frequency in the volatile memory, it overwrites the new data into the volatile memory and uploads the data and address information to the management node server, providing storage access index information for the management node's next data access.

[0021] While the data collector reads data from volatile memory, the heterogeneous acceleration controller can perform data reading and computation tasks from non-volatile memory and write the computation results to volatile memory in an idle state. By masking some of the memory read and write latency, the overall storage access latency of the query task is reduced.

[0022] The CXL switch parses the message information uploaded by the memory expansion module and provides a data routing path. It transmits the data preprocessed by the heterogeneous computing unit of the memory expansion module to the computing node specified by the management node. The computing node receives the data information read by CXL from different memory expansion modules, performs other calculation processes for the query operation, and forwards the calculation results to the management node to achieve the final query feedback.

[0023] The beneficial effects of this invention are as follows:

[0024] By constructing pooled storage of non-volatile and volatile storage resources through CXL switches and memory expansion modules, and caching intermediate results of query calculations in volatile storage resources, the database's response speed to hot query content is improved. Through the scheduling and management of multiple memory expansion modules by CXL switches, as well as the task scheduling within the memory expansion modules, the utilization rate of heterogeneous computing resources is improved, thereby enhancing the parallel computing capability of database query tasks.

Description of the Drawings

[0025] Figure 1 is a schematic diagram of the database heterogeneous acceleration architecture based on memory pooling;

[0026] Figure 2 is a schematic diagram of the heterogeneous computing and storage access architecture of a memory expansion module.

Detailed Implementation

[0027] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.

[0028] This invention provides a database heterogeneous acceleration method based on memory pooling. A database query system is deployed on management nodes and compute node servers. The management node records the storage indexes of all memory expansion modules, receives query tasks, and allocates storage, heterogeneous computing, and compute node resources to the query tasks. Storage access tasks are distributed to different memory expansion modules via a CXL switch. The accessed data results are routed by task tag to different compute nodes for further query computation. Finally, the computation results are transmitted to the management node via the CXL switch for result feedback.

[0029] As shown in Figure 1, this invention employs a management node that can simultaneously receive parallel queries from multiple query tasks. The management node generates storage access control for query tasks by indexing the data storage address in the non-volatile memory, and issues storage access tasks and heterogeneous acceleration tasks through the CXL controller. The management node records and queries the working status of the computing nodes, issues query tasks through the CXL switch to idle computing nodes with sufficient computing resources, and loads the computing node information into the storage access task, so that the storage access task is matched with the host computing task.
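The dispatch step in this paragraph can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `ComputeNode`, `StorageAccessTask`, `ManagementNode`, and `dispatch`, as well as the dictionary used as a storage index, are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    node_id: int
    busy: bool = False


@dataclass
class StorageAccessTask:
    query_id: int
    module_id: int        # memory expansion module that holds the data
    address: int          # address inside that module's non-volatile memory
    compute_node_id: int  # tag used to route results to the matching host


@dataclass
class ManagementNode:
    nodes: list
    # Hypothetical storage index: table name -> (module_id, address)
    storage_index: dict = field(default_factory=dict)

    def dispatch(self, query_id, table):
        """Pick an idle compute node and build a tagged storage access task."""
        idle = next((n for n in self.nodes if not n.busy), None)
        if idle is None:
            return None  # no idle node; the caller may retry later
        idle.busy = True
        module_id, address = self.storage_index[table]
        return StorageAccessTask(query_id, module_id, address, idle.node_id)


manager = ManagementNode(
    nodes=[ComputeNode(0, busy=True), ComputeNode(1)],
    storage_index={"orders": (2, 0x1000)},
)
task = manager.dispatch(query_id=7, table="orders")
```

Carrying the compute node identifier inside the storage access task is what lets a later stage route the returned data to the node that was reserved for the query.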

[0030] S1: The CXL switch manages and issues storage access tasks.

[0031] The CXL switch establishes a task queue for each memory expansion module access. This queue, marked with priority, manages storage access tasks for the memory expansion modules. When the CXL switch receives a task instruction from the management node, it breaks it down into memory access tasks, heterogeneous acceleration tasks, and data path tasks, storing them independently. The CXL switch detects the progress of memory access and heterogeneous acceleration tasks for different memory expansion modules using a local task log table. Based on the task requirements in the task queue, it locates and determines the currently available memory expansion modules and sends memory access and heterogeneous computing tasks to those modules.

[0032] The CXL switch allows for the pipelined execution of tasks. The switch sends multiple task streams to the same memory expansion module, marks the maximum number of tasks that can exist simultaneously in a single memory expansion module, and records the number of task types currently existing in the memory expansion module. The CXL switch receives task completion feedback information from the memory expansion module in real time. When it detects that a task process in the task queue has ended, it marks the task as invalid and decrements the recorded number of tasks in the memory expansion module by one.
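The bookkeeping described for the switch (a priority-tagged queue per memory expansion module, a cap on simultaneously resident tasks, and a counter decremented on completion feedback) might look like the sketch below. The class name `ModuleQueue`, its methods, and the convention that a lower number means higher priority are assumptions made for illustration only.

```python
import heapq
from itertools import count


class ModuleQueue:
    """Priority task queue and in-flight counter for one memory expansion module."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight  # cap on simultaneously resident tasks
        self.in_flight_count = 0            # tasks currently inside the module
        self.valid = {}                     # task_id -> True while still running
        self._heap = []
        self._seq = count()                 # tie-breaker keeps FIFO order per priority

    def submit(self, priority, task_id):
        # Assumption of this sketch: lower number = higher priority.
        heapq.heappush(self._heap, (priority, next(self._seq), task_id))

    def issue(self):
        """Send the highest-priority task if the module still has capacity."""
        if not self._heap or self.in_flight_count >= self.max_in_flight:
            return None
        _, _, task_id = heapq.heappop(self._heap)
        self.valid[task_id] = True
        self.in_flight_count += 1
        return task_id

    def complete(self, task_id):
        """On completion feedback, mark the task invalid and free one slot."""
        if self.valid.get(task_id):
            self.valid[task_id] = False
            self.in_flight_count -= 1


queue = ModuleQueue(max_in_flight=2)
queue.submit(1, "scan-a")
queue.submit(0, "scan-b")
queue.submit(1, "scan-c")
issued = [queue.issue(), queue.issue(), queue.issue()]
queue.complete("scan-b")
next_task = queue.issue()
```

In this run the third `issue()` call returns `None` because the module is already at its cap, and `"scan-c"` is only sent after the completion feedback for `"scan-b"` frees a slot.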

[0033] S2: The memory expansion module performs local storage access and heterogeneous computing.

[0034] The memory expansion module accesses the heterogeneous acceleration controller via the cache protocol in the CXL protocol controller, storing multiple acceleration tasks in the local cache and decoding and executing them sequentially. The decoded instruction information is sent directly to the decompression module, bitonic sorting array, filtering calculation array, hash calculation array, and data collector to control chip selection and path execution across the different heterogeneous computing functional modules. An example of a chip selection path is as follows: Decompression module → Filtering calculation array → Hash calculation array → Data collector.
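A software analogy of the chip selection path may help: the decoded instruction selects an ordered subset of functional modules, and the data flows through them in sequence. In the sketch below, `zlib` stands in for the hardware decompression circuit, and the filter predicate, the multiplicative hash, and the names `STAGES` and `run_path` are all hypothetical choices for illustration.

```python
import zlib


def decompress_stage(data):
    # zlib is only a stand-in for the hardware decompression circuit.
    return list(zlib.decompress(data))


def filter_stage(values):
    # Hypothetical predicate: keep values greater than 10.
    return [v for v in values if v > 10]


def hash_stage(values):
    # Hypothetical hash: 32-bit multiplicative hash reduced to 16 buckets.
    return [(v, ((v * 2654435761) % 2**32) >> 28) for v in values]


def collect_stage(values):
    # The data collector simply gathers whatever reaches the end of the path.
    return list(values)


STAGES = {
    "decompress": decompress_stage,
    "filter": filter_stage,
    "hash": hash_stage,
    "collect": collect_stage,
}


def run_path(path, data):
    """Push data through the stages named in `path`, in order."""
    for name in path:
        data = STAGES[name](data)
    return data


raw = zlib.compress(bytes([3, 12, 40, 7]))
result = run_path(("decompress", "filter", "hash", "collect"), raw)

# A path that omits the decompression stage, for data stored uncompressed.
bypassed = run_path(("filter", "collect"), [5, 11])
```

Representing the path as a tuple of stage names mirrors the idea that one decoded instruction configures which modules are enabled and in what order.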

[0035] During database query tasks, data is compressed before storage to maximize storage density. The data must be decompressed to obtain the format required for computation. Furthermore, database query tasks, such as sorting, aggregation, and filtering, involve numerous computational processes that can be executed in parallel. This invention offloads and accelerates some of these computational processes, performing them sequentially in a pipelined manner.

[0036] When data is stored in volatile storage devices, the CXL protocol controller receives a memory access task, uses the mapping between virtual and physical addresses to directly access the data in the local volatile storage, and uploads the read results to the CXL switch.

[0037] When data is stored in a non-volatile storage device, the local non-volatile storage controller directly reads the data from the non-volatile memory and sends it to the decompression module. The decompression module performs a decompression algorithm using hardware circuitry to obtain decompressed data. For data that does not need to be decompressed, the decompression module bypasses the decompression and directly sends the data to the next processing unit.

[0038] The filtering array performs basic filtering calculations, including relational and logical comparisons such as >, <=, !=, NOR, and AND. The calculations are performed in order and without delay as the data streams through, and the results are sent to the next processing unit. The bitonic sorting array supports bitonic sorting of up to 64 data items. Through fixed-period data comparison operations, it outputs 64 data results in ascending or descending order, completing part of the computational tasks in the query operator calculation process, such as sorting and aggregation. The hash calculation supports multiple hash functions, with the number of instances of each hash function matched to the access bandwidth of the non-volatile memory, so that the full width of data read from the non-volatile memory can be hashed in a single clock cycle.
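For readers unfamiliar with bitonic sorting, the following is a minimal software sketch of a bitonic sorting network. It is not the FPGA design from the patent; the function name `bitonic_sort` and the requirement that the input length be a power of two are assumptions of this sketch. Each pass of the inner `for` loop corresponds to one stage of comparators that could run in parallel in hardware, and for 64 items there are 6 × 7 / 2 = 21 such stages regardless of the input, which is consistent with a fixed-period design.

```python
import random


def bitonic_sort(values, ascending=True):
    """Sort a power-of-two-length sequence with a bitonic sorting network."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # comparator distance within this stage
            for i in range(n): # all comparators in a stage are independent
                partner = i ^ j
                if partner > i:
                    # Direction alternates between blocks of size k.
                    up = ((i & k) == 0) == ascending
                    if (a[i] > a[partner]) == up:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


data = random.sample(range(1000), 64)
sorted_up = bitonic_sort(data)
sorted_down = bitonic_sort(data, ascending=False)
```

Because the comparison pattern depends only on the indices and never on the data, the same network can be laid out as fixed wiring, which is why this algorithm suits hardware arrays.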

[0039] The data collector receives data transmitted through the chip select path and uploads the data to the CXL switch. At the same time, based on the existing data write time and data read frequency in the volatile memory, it overwrites the new data into the volatile memory and uploads the data and address information to the management node server, providing storage access index information for the management node's next data access.
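The overwrite decision based on write time and read frequency can be illustrated with a small replacement policy. The patent does not specify how the two signals are combined; the sketch below assumes the victim is the entry with the fewest reads, with ties broken by the oldest write time. The names `Slot` and `VolatileCache` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    key: str
    data: bytes
    write_time: int
    read_count: int = 0


class VolatileCache:
    """Toy model of the volatile memory managed by the data collector."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = {}
        self.clock = 0  # logical time, advanced on every write

    def read(self, key):
        slot = self.slots.get(key)
        if slot is None:
            return None
        slot.read_count += 1
        return slot.data

    def write(self, key, data):
        """Store a result, overwriting a victim if full. Returns the evicted key."""
        self.clock += 1
        evicted = None
        if key not in self.slots and len(self.slots) >= self.capacity:
            # Assumed policy: fewest reads first, then oldest write time.
            victim = min(self.slots.values(),
                         key=lambda s: (s.read_count, s.write_time))
            evicted = victim.key
            del self.slots[evicted]
        self.slots[key] = Slot(key, data, self.clock)
        return evicted


cache = VolatileCache(capacity=2)
cache.write("q1", b"a")
cache.write("q2", b"b")
cache.read("q1")
evicted = cache.write("q3", b"c")
```

Here `"q2"` is overwritten because it has never been read, while `"q1"` survives. In the described system the resulting data and address information would then be reported to the management node so that its storage access index stays current.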

[0040] As an optimization, while the data collector reads data from volatile memory, the heterogeneous acceleration controller can carry out data reading and computation tasks from non-volatile memory and write the computation results to volatile memory when it is idle. This reduces the overall storage access latency of the query task by hiding part of the memory read / write latency.
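As a rough model of this overlap (an idealization that is not stated in the patent), let $T_v$ be the time the data collector spends reading volatile memory and $T_{nv}$ the time the heterogeneous acceleration controller needs to read and process data from non-volatile memory. Both symbols are introduced here for illustration only. If the two activities ran back to back, the total would be their sum; with perfect overlap it is bounded by the longer of the two:

```latex
T_{\text{seq}} = T_v + T_{nv}, \qquad
T_{\text{ovl}} = \max(T_v, T_{nv}), \qquad
T_{\text{seq}} - T_{\text{ovl}} = \min(T_v, T_{nv})
```

Under this assumption, the latency that can be hidden is at most the shorter of the two activities; contention for the volatile memory would reduce the saving in practice.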

[0041] S3: The CXL switch establishes data paths between compute nodes and storage nodes.

[0042] The CXL switch parses the message information uploaded by the memory expansion modules and provides a data routing path. It transmits the data preprocessed by the heterogeneous computing units of the memory expansion modules to the computing nodes specified by the management node. The computing nodes are conventional server nodes. They gather the data read over CXL from the different memory expansion modules, perform the remaining computation for the query operation, and forward the results to the management node for final query feedback.

[0043] The above description is merely a preferred embodiment of the present invention and is used only to illustrate the technical solution of the present invention, and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention are included within the scope of protection of the present invention.

Claims

1. A database heterogeneous acceleration method based on memory pooling, characterized in that the method includes: The data query task adopts a combination of management nodes and computing nodes to receive external query tasks and perform parallel query calculations on different computing nodes. CXL switches connect compute nodes and storage pools, manage and allocate storage and heterogeneous computing resources for compute nodes, and enable databases to access storage and control heterogeneous computing on the underlying storage nodes. An FPGA-based memory expansion module is designed to enable storage access and access to the heterogeneous computing control register group based on the CXL protocol. According to the configuration parameters of the heterogeneous computing control register group, the data decompression, bitonic sorting, filtering calculation, and hash calculation modules are scheduled to perform parallel computing to complete part of the computing tasks in the database query operation. Finally, the data collector uploads the computation results to the compute nodes while caching them in the local volatile storage unit. The database query system is deployed on the management node and compute node servers. The management node records the storage index of all memory expansion modules, receives query tasks, and allocates storage, heterogeneous computing, and compute node resources to the query tasks. The storage access tasks are distributed to different memory expansion modules through the CXL switch, and the accessed data results are routed by task tag to different compute nodes for further query computation. Finally, the computation results are transmitted to the management node through the CXL switch for result feedback. A management node can receive parallel queries from several query tasks simultaneously. 
The management node generates storage access control for query tasks by indexing the data storage address in the non-volatile memory, and issues storage access tasks and heterogeneous acceleration tasks through the CXL controller. The management node records and queries the working status of the compute nodes, distributes the query task to the idle compute nodes through the CXL switch, and loads the compute node information into the storage access task to match the storage access task with the host compute task. The CXL switch establishes a task queue for each memory expansion module access and manages the storage access tasks of the memory expansion module through task queues with priority tags. When the CXL switch receives a task instruction submitted by the management node, it breaks down the task instruction into memory access tasks, heterogeneous acceleration tasks, and data path tasks, and stores them independently. The CXL switch detects the progress of memory access and heterogeneous acceleration tasks of different memory expansion modules through the local task record table. Based on the task requirements to be executed in the task queue, it finds and determines the currently available memory expansion module and sends memory access and heterogeneous computing tasks to that memory expansion module. The CXL switch allows for the pipelined execution of tasks. The CXL switch sends several task streams to the same memory expansion module, marks the maximum number of tasks that can exist simultaneously in a single memory expansion module, and records the number of task types currently existing in the memory expansion module. The CXL switch receives task completion feedback information from the memory expansion module in real time. When it detects that a task process in the task queue has ended, it marks the task as invalid and decrements the recorded number of tasks in the memory expansion module by one. 
The memory expansion module accesses the heterogeneous acceleration controller through the cache protocol in the CXL protocol controller, stores several heterogeneous acceleration tasks in the local cache, and decodes and executes them sequentially; the decoded instruction information is sent directly to the decompression module, bitonic sorting array, filtering calculation array, hash calculation array, and data collector to control chip selection and path execution across the different heterogeneous computing function modules. During a database query task, the data is compressed before storage, and the data must be decompressed to obtain the data that meets the computational requirements. In addition, the database query task contains computational processes that can be executed in parallel. Some computational processes in these operators are offloaded and accelerated, and then computed sequentially in a pipeline manner. When data is stored in volatile storage devices, the CXL protocol controller receives a memory access task, uses the mapping between virtual and physical addresses to directly access the data in the local volatile storage, and uploads the read results to the CXL switch. When data is stored in a non-volatile storage device, the local non-volatile storage controller directly reads the data from the non-volatile storage and sends it to the decompression module. The decompression module performs a decompression algorithm based on the hardware circuitry to obtain the decompressed data. For some data that does not need to be decompressed, the decompression module sends the data directly to the next processing unit in bypass mode. The filtering computation array performs basic filtering computations, including logical comparisons and relational comparisons. The computations are completed in order and without interruption as the data streams through, and the results are sent to the next processing unit. The bitonic sorting array supports bitonic sorting of up to 64 data items. 
Through data comparison computation at fixed intervals, it outputs 64 data results in ascending / descending order, completing part of the computational tasks in the query operator computation process. The hash computation supports several hash functions, with the number of each hash function matching the access bandwidth of the non-volatile memory. It can complete the calculation of the total amount of data read from the non-volatile memory in a single clock cycle. The data collector receives data transmitted through the chip select path and uploads the data to the CXL switch. At the same time, based on the existing data write time and data read frequency in the volatile memory, it overwrites the new data into the volatile memory and uploads the data and address information to the management node server, providing storage access index information for the management node's next data access.

2. The method according to claim 1, characterized in that, while the data collector reads data from volatile memory, the heterogeneous acceleration controller can perform data reading and computation tasks from non-volatile memory and write the computation results to volatile memory in an idle state. By masking some memory read and write latency, the overall storage access latency of the query task is reduced.

3. The method according to claim 1 or 2, characterized in that the CXL switch parses the message information uploaded by the memory expansion module and provides a data routing path to transmit the data preprocessed by the heterogeneous computing unit of the memory expansion module to the computing node specified by the management node. The compute node receives, over CXL, the data read from different memory expansion modules, performs other computational processes for the query operation, and forwards the computational results to the management node to achieve the final query feedback.