Method and apparatus for reading data held in a tree data structure

By employing the Near Data Processing (NDP) approach in cloud-native databases, and leveraging expected log sequence numbers and shared lock mechanisms, the poor performance and scalability of traditional databases in cloud environments are addressed. This enables efficient data reading and writing, reduces storage costs, and supports concurrent read and write operations.

CN117120998B · Active · Publication Date: 2026-06-19 · HUAWEI CLOUD COMPUTING TECHNOLOGIES CO LTD

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HUAWEI CLOUD COMPUTING TECHNOLOGIES CO LTD
Filing Date: 2022-03-17
Publication Date: 2026-06-19

Application Information

Patent Timeline

17 Mar 2022: Application
19 Jun 2026: Publication (CN117120998B)

IPC: G06F16/2455

AI Tagging

Application Domain

Digital data information retrieval Special data processing applications

Technology Topics

Parallel computing Theoretical computer science

Similar Technology Patents

An FPGA-based DFM Pattern Match hardware accelerator and method
CN122389774A · Computer hardware, Pattern matching
Instruction cache management method, instruction cache, computing device and system
CN122018985B · Computer architecture, Parallel computing
A GC mechanism processing method, device and medium
CN114968584B · Processing speed, Relieve stress, Resource allocation, Memory addressing/allocation/relocation, Parallel computing, Multithreading
Extending temporal coherency within msoc to improve cache replacement policies for msoc
US20260178490A1 · Cache memory details, Parallel computing, Control cell
A dot code
CN122366488A · Dot matrix, Algorithm

AI Technical Summary

Technical Problem

Traditional database architectures suffer from poor performance and scalability in cloud environments, resulting in high storage costs and wasted resources.

Method used

By adopting the Near Data Processing (NDP) approach in cloud-native databases, data processing is performed on storage nodes. Utilizing expected log sequence numbers (LSNs) and shared lock mechanisms, efficient data reading and writing are achieved, concurrent processing is enabled, and network transmission and input/output operations are reduced.

Benefits of technology

It enables efficient reading of data in tree data structures in cloud-native databases, reducing storage costs and resource waste, while supporting concurrent read and write operations, improving performance and scalability.

✦ Generated by Eureka AI based on patent content.

Figure CN117120998B_ABST

Abstract

This invention provides a method for reading data stored in a tree data structure, such as a B+ tree, using near data processing (NDP) in a cloud-native database. According to an embodiment, an expected LSN is used for NDP page reads on a primary compute node (e.g., a primary SQL node). When the primary compute node reads a regular page, the maximum expected LSN for that page (e.g., the latest page version number) is used. The embodiment combines the expected LSN with page locking: the correct version of a page is obtained by using the expected LSN associated with the page together with page locks, so that tree-structure reads are consistent and read/write concurrency is good.

Description

[0001] Cross-references to related applications

[0002] This application claims priority to and the benefit of U.S. Patent Application No. 17/218,937, filed March 31, 2021, which is incorporated herein by reference in its entirety.

Technical Field

[0003] This invention relates to the field of database technology, and in particular to a method and apparatus for reading data stored in a tree data structure using near data processing (NDP) in a cloud-native database.

Background

[0004] As more and more of the computing applications, platforms, and infrastructure used by companies and government agencies move to the cloud, demand for cloud-based Database as a Service (DBaaS) is growing rapidly. Following this shift in markets and business models, DBaaS offerings have emerged from Amazon™, Microsoft™, Google™, Alibaba™, Huawei™, and other cloud providers. Initially, these providers offered DBaaS using traditional monolithic database software; that is, they simply ran a regular version of the database system on virtual machines in the cloud, using local or cloud storage. Although simple to implement, this approach may not provide the functionality customers expect from a database service. For example, such a database system may not offer good performance and scalability, and may incur high storage costs.

[0005] Figure 1 illustrates a typical database deployment using a traditional database architecture. Referring to Figure 1, a traditional database architecture 100 includes a Structured Query Language (SQL) master node 110, SQL replicas 120, and storage nodes 131 and 132. The SQL master node 110 and the SQL replicas 120 are communicatively connected to each other. The SQL master node 110 and each SQL replica 120 are connected via network 140 to storage nodes 131 and 132, respectively. The SQL master node 110 writes logs to its own storage node 131 and transmits the logs to the SQL replicas 120. Each SQL replica 120 applies the logs on its respective storage node 132. Database data pages are therefore stored redundantly on three storage nodes: the primary storage node and the storage node associated with each replica. Assuming that each storage node in the cloud is typically replicated three times, data pages in database architecture 100 are stored nine times, which wastes significant resources.
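The nine-fold figure is simple multiplication. The short Python sketch below reproduces it, assuming (as the paragraph implies) one storage node for the SQL master plus one for each of two SQL replicas, with every storage node replicated three times; the function name `total_page_copies` is hypothetical. For contrast, it also evaluates a single shared storage layer under the same three-way replication, which is an illustrative assumption rather than a figure stated in this document.

```python
def total_page_copies(storage_nodes: int, replicas_per_node: int) -> int:
    """Physical copies of each data page across all storage nodes."""
    return storage_nodes * replicas_per_node


# Traditional deployment (Figure 1): one storage node for the SQL master plus
# one per SQL replica (assumed: two replicas), each replicated three times.
traditional = total_page_copies(storage_nodes=1 + 2, replicas_per_node=3)

# Illustrative assumption: one shared storage layer with the same
# three-way replication, as in the cloud-native designs discussed later.
shared = total_page_copies(storage_nodes=1, replicas_per_node=3)

print(traditional, shared)  # 9 3
```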

[0006] Therefore, there is a need for a method and apparatus for reading data from a database that is not affected by one or more limitations of the prior art.

[0007] The purpose of the background is to disclose information that the applicant believes may be relevant to the present invention. No admission is necessarily intended, nor should it be construed, that any of the foregoing information constitutes prior art against the present invention.

Summary of the Invention

[0008] An object of this invention is to provide a method and apparatus for reading data stored in a tree data structure using near data processing (NDP) in a cloud-native database. According to an embodiment of the invention, a method is provided for obtaining one or more pages in response to a query using NDP in a cloud-native database. The method includes: receiving a query that includes information indicating one or more desired pages, the information further indicating a version of each desired page; and scanning a general buffer pool to identify one or more of the desired pages. Upon identifying one or more of the desired pages in the general buffer pool, the method further includes copying the identified pages to a dedicated buffer pool associated with the query. Upon determining that no further desired pages are present in the general buffer pool, the method further includes sending a request to one or more storage nodes for the one or more remaining desired pages (those not identified in the general buffer pool). The method further includes receiving the one or more remaining desired pages and copying them to the dedicated buffer pool.

[0009] In some embodiments, the method further includes applying a shared lock to the identified one or more desired pages before copying them. In some embodiments, the version of the one or more desired pages is defined by a log sequence number (LSN). In some embodiments, the method further includes applying a shared lock to the root of the B+ tree associated with the one or more desired pages. In some embodiments, the method further includes copying the received one or more remaining desired pages into the general buffer pool.
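The page-acquisition method of [0008] and [0009] can be sketched in Python as follows. This is a single-threaded illustration under stated assumptions, not the patented implementation: `Page`, `fetch_desired_pages`, and the `fetch_from_storage` callback are hypothetical names; a cached page is treated as a match only when its page LSN equals the requested version (an assumed matching rule); and the shared lock is modeled as a simple holder counter.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass
class Page:
    page_id: int
    lsn: int                # page version, identified by its log sequence number
    data: bytes
    shared_locks: int = 0   # number of shared-lock holders (illustrative counter)


# Hypothetical callback: sends {page_id: lsn} to storage nodes, returns those versions.
StorageFetch = Callable[[Dict[int, int]], List[Page]]


def fetch_desired_pages(desired: Dict[int, int],
                        general_pool: Dict[int, Page],
                        fetch_from_storage: StorageFetch,
                        cache_in_general_pool: bool = False) -> Dict[int, Page]:
    """Collect the desired page versions into a dedicated, per-query buffer pool."""
    dedicated: Dict[int, Page] = {}

    # Scan the general buffer pool for desired pages at the desired version.
    for page_id, lsn in desired.items():
        cached = general_pool.get(page_id)
        if cached is not None and cached.lsn == lsn:
            cached.shared_locks += 1          # shared lock held while copying
            try:
                dedicated[page_id] = replace(cached, shared_locks=0)  # private copy
            finally:
                cached.shared_locks -= 1      # release the shared lock

    # Request every desired page not found in the general pool from storage.
    remaining = {pid: lsn for pid, lsn in desired.items() if pid not in dedicated}
    if remaining:
        for page in fetch_from_storage(remaining):
            dedicated[page.page_id] = page
            if cache_in_general_pool:         # optional step from [0009]
                general_pool[page.page_id] = replace(page)
    return dedicated
```

Copying into a per-query dictionary keeps the query's view stable even if the general pool later replaces a page with a newer version.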

[0010] According to embodiments of the present invention, a method is provided for reading data in a data tree using NDP in a cloud-native database. The method includes: applying shared page locks on the root page and internal pages of the data tree in a top-down manner until reaching P0, where P0 is a page at the level immediately above the leaf level of the data tree; and obtaining an expected log sequence number (LSN) while holding the shared page lock on P0. After the expected LSN is obtained, for each child page to be fetched, the method further includes: allocating a page from a free list of a buffer pool to an NDP cache region of the buffer pool, the NDP cache region being designated for a specific query; upon allocating the page to the NDP cache region, determining whether the child page is found in a regular page region of the buffer pool; and, based on that determination, pinning the child page to the page allocated in the NDP cache region. The method further includes: releasing the shared page lock held on P0; processing the pages allocated to the NDP cache region; and, after processing, releasing each of those pages back to the free list of the buffer pool.
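A minimal Python sketch of the [0010] read path follows. Everything named here (`SharedLock`, `TreePage`, `Frame`, `BufferPool`, `ndp_read_batch`, and the `read_from_storage` callback) is hypothetical. The sketch further assumes that ancestor locks are released as the descent proceeds (lock coupling), that the descent follows the leftmost child instead of a search key, that the expected LSN is the database's current LSN at the moment P0 is share-locked, that the free list always has enough frames, and that leaves not found in the regular region are fetched from storage at the expected LSN after the lock on P0 is released.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class SharedLock:
    """Minimal shared/exclusive page lock (Python's stdlib has no RW lock)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self) -> None:
        with self._cond:
            while self._writer:               # wait out any exclusive holder
                self._cond.wait()
            self._readers += 1

    def release_shared(self) -> None:
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_exclusive(self) -> None:      # used by tree modifications (not shown)
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


@dataclass
class TreePage:
    page_id: int
    level: int                                # 0 = leaf, 1 = P0's level, ...
    children: List[int] = field(default_factory=list)
    lock: SharedLock = field(default_factory=SharedLock)


@dataclass
class Frame:
    page_id: Optional[int] = None
    lsn: Optional[int] = None
    data: Optional[bytes] = None


class BufferPool:
    def __init__(self, n_frames: int) -> None:
        self.free_list: List[Frame] = [Frame() for _ in range(n_frames)]
        self.regular: Dict[int, Frame] = {}       # regular page region
        self.ndp: Dict[str, List[Frame]] = {}     # NDP cache region per query


def ndp_read_batch(tree: Dict[int, TreePage], root_id: int, pool: BufferPool,
                   query_id: str, current_lsn: int,
                   read_from_storage: Callable[[int, int], bytes]) -> List[bytes]:
    # 1. Share-lock pages top-down (lock coupling) until reaching P0 (level 1).
    page = tree[root_id]
    page.lock.acquire_shared()
    while page.level > 1:
        child = tree[page.children[0]]        # hypothetical: leftmost descent
        child.lock.acquire_shared()
        page.lock.release_shared()
        page = child
    p0 = page
    region = pool.ndp.setdefault(query_id, [])
    missing: List[Frame] = []
    try:
        expected_lsn = current_lsn            # 2. expected LSN taken under P0's lock
        for leaf_id in p0.children:
            frame = pool.free_list.pop()      # 3a. allocate from the free list
            frame.page_id, frame.lsn = leaf_id, expected_lsn
            region.append(frame)
            cached = pool.regular.get(leaf_id)    # 3b. check the regular region
            if cached is not None:
                frame.lsn, frame.data = cached.lsn, cached.data  # 3c. reuse, no IO
            else:
                missing.append(frame)
    finally:
        p0.lock.release_shared()              # 4. release the lock on P0
    for frame in missing:                     # uncached leaves: read at expected LSN
        frame.data = read_from_storage(frame.page_id, expected_lsn)
    results = [frame.data for frame in region]    # 5. "process" the batch
    for frame in region:                      # 6. return frames to the free list
        frame.page_id = frame.lsn = frame.data = None
        pool.free_list.append(frame)
    del pool.ndp[query_id]
    return results
```

In this sketch, because storage reads are pinned to the expected LSN, a writer may take an exclusive lock on P0 as soon as step 4 completes without changing the page versions this batch observes.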

[0011] According to an embodiment of the present invention, a device is provided that supports near data processing (NDP) in cloud-native databases. The device includes: a network interface for receiving data from and sending data to devices connected to the network; a processor; and a machine-readable storage device for storing machine-executable instructions. When executed by the processor, the instructions cause the device to perform one or more methods defined above or elsewhere herein.

[0012] Embodiments have been described above in conjunction with aspects of the invention and can be implemented based on those aspects. Those skilled in the art will understand that embodiments may be implemented in conjunction with the aspect with which they are described, and may also be implemented together with other embodiments of that aspect, except where embodiments are mutually exclusive or otherwise incompatible, as will be apparent to those skilled in the art. Some embodiments may be described in conjunction with one aspect but may also be applicable to other aspects, as will be apparent to those skilled in the art.

Brief Description of the Drawings

[0013] Further features and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:

[0014] Figure 1 illustrates a typical database deployment using a traditional database architecture according to the prior art.

[0015] Figure 2 shows an example architecture of a cloud-native relational database system according to embodiments.

[0016] Figure 3 illustrates the architecture of a cloud-native database system according to an embodiment.

[0017] Figure 4 shows the architecture of a cloud-native database system that performs near data processing (NDP) according to an embodiment.

[0018] Figure 5 illustrates potential problems caused by B+ tree modifications.

[0019] Figure 6 illustrates another potential problem caused by B+ tree modifications.

[0020] Figure 7 shows an example SQL node buffer pool containing regular pages and NDP pages.

[0021] Figure 8 shows a method for executing queries using near data processing (NDP) in a cloud-native database according to an embodiment.

[0022] Figure 9 shows a method for reading data stored in a B+ tree data structure using NDP according to an embodiment.

[0023] Figure 10 is a schematic diagram of an electronic device according to an embodiment.

[0024] It should be noted that throughout the accompanying drawings, the same features are identified by the same reference numerals.

Detailed Description

[0025] In this invention, the term "database page" or "page" refers to the basic internal structure used to organize data within a database file. For example, a database page can be a storage unit whose size can be configured on a system-wide, database-wide, or group-specific basis. Pages can be identified by an identifier (ID), such as a page ID or space ID. Data in the database is organized based on database pages. The size of a database page can vary, for example, 4KB, 8KB, 16KB, or 64KB. Typically, database pages are organized based on a tree structure (e.g., a "B-tree").

[0026] In this invention, the term "redo log" refers to a record of all changes made to a database (e.g., a file). Typically, a redo log record (or, more generally, a database log record) indicates one or more physical changes to page content, such as "write 5 bytes of content on page 11 at offset 120". Each redo log contains one or more redo log records (or, more generally, database log records). A redo log record, also called a redo entry or log entry, stores a set of change vectors, each describing a change made to a single block or page in the database. The term "redo log" may originate from a specific database management system (DBMS) model; however, it can also be used generically to refer to a database log. MySQL™ is an exemplary database model that uses the term redo log and that can be used to implement the examples described herein. It should be understood that the invention can equally be applied to other database models. A database model is a data model that defines the logical structure of a database and how data can be stored, organized, and manipulated. An example of a database model is the relational model, which uses a table-based format to store, organize, and manipulate data.

[0027] In this invention, the term "log sequence number (LSN)" refers to a number indicating the position of a log record in a log file. LSN order is the logical order in which log records were generated. Log records are ordered in the global redo log buffer by LSN, typically in ascending order. LSN order must be strictly respected; otherwise, data retrieved from the database will be incorrect. Each page also carries an LSN value, called the page LSN, which is the LSN of the redo log record that last updated the page.
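To make the relationship between redo records, LSNs, and the page LSN concrete, the following Python sketch applies byte-level redo records such as the "write 5 bytes on page 11 at offset 120" example. The record layout and names (`RedoRecord`, `DbPage`, `apply_redo`) are hypothetical, and skipping records whose LSN does not exceed the current page LSN is an assumed idempotence rule, not one stated in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RedoRecord:
    lsn: int          # position of the record in the log
    page_no: int
    offset: int       # byte offset within the page
    payload: bytes    # bytes written at `offset`


@dataclass
class DbPage:
    page_no: int
    content: bytearray
    page_lsn: int = 0  # LSN of the redo record that last updated this page


def apply_redo(page: DbPage, records: List[RedoRecord]) -> DbPage:
    """Apply this page's redo records in ascending LSN order."""
    for rec in sorted(records, key=lambda r: r.lsn):
        if rec.page_no != page.page_no or rec.lsn <= page.page_lsn:
            continue  # another page's record, or already reflected (assumed rule)
        page.content[rec.offset:rec.offset + len(rec.payload)] = rec.payload
        page.page_lsn = rec.lsn
    return page


# "Write 5 bytes of content on page 11, offset 120":
page = DbPage(page_no=11, content=bytearray(16 * 1024))
apply_redo(page, [RedoRecord(lsn=100, page_no=11, offset=120, payload=b"hello")])
print(page.page_lsn)  # 100
```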

[0028] In this invention, a compute node (e.g., an SQL node) uses an "expected LSN" to request a specific page version from one or more database components that store multiple page versions (e.g., the page stores of a Huawei Taurus™ database). The expected LSN can also be used to represent the physical state of the entire database at a given point in time. It will be readily understood that other identifiers, such as timestamps or hashes, can be used to identify a specific page version.
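A page store that keeps several versions of each page can answer an expected-LSN request by returning the newest version whose LSN does not exceed the expected LSN. The Python sketch below illustrates that lookup with the standard `bisect` module; the class and method names are hypothetical, and the "newest version at or before the expected LSN" rule is an assumption about how such a store might behave. `latest()` corresponds to a regular read that uses the maximum expected LSN.

```python
import bisect
from typing import Dict, List, Tuple


class VersionedPageStore:
    """Keeps several versions of each page, ordered by LSN."""

    def __init__(self) -> None:
        self._lsns: Dict[int, List[int]] = {}
        self._data: Dict[int, List[bytes]] = {}

    def put(self, page_id: int, lsn: int, data: bytes) -> None:
        lsns = self._lsns.setdefault(page_id, [])
        datas = self._data.setdefault(page_id, [])
        i = bisect.bisect_right(lsns, lsn)    # keep versions sorted by LSN
        lsns.insert(i, lsn)
        datas.insert(i, data)

    def get(self, page_id: int, expected_lsn: int) -> Tuple[int, bytes]:
        """Newest version whose LSN is <= expected_lsn."""
        lsns = self._lsns.get(page_id, [])
        i = bisect.bisect_right(lsns, expected_lsn)
        if i == 0:
            raise KeyError(f"page {page_id} has no version at or before LSN {expected_lsn}")
        return lsns[i - 1], self._data[page_id][i - 1]

    def latest(self, page_id: int) -> Tuple[int, bytes]:
        """What a regular read (maximum expected LSN) would see."""
        return self._lsns[page_id][-1], self._data[page_id][-1]
```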

[0029] In this invention, the term "near data processing (NDP)" refers to a type of database processing in which data is processed where it is stored rather than being transferred to a processor. Some database processing operations (such as column projection, predicate evaluation, and data aggregation) can be pushed down to storage nodes (such as the page stores of a Huawei Taurus™ database), which reduces network traffic between compute nodes (such as SQL nodes) and storage nodes.

[0030] In some embodiments of the invention, the term "node" refers to a physical or electronic device for performing one or more actions defined herein and associated with a "node". In some embodiments of the invention, a "node" may be configured as a logical structure, such as one or more of software, hardware, and firmware, for performing one or more actions defined herein and associated with a "node".

[0031] This invention provides a method and apparatus for reading/searching data stored in a tree data structure (e.g., a B+ tree) using near data processing (NDP) in a cloud-native database. According to embodiments, NDP scans (data reading/searching in NDP) do not block concurrent modifications to the B+ tree, and consistent B+ tree pages are retrieved even while the B+ tree is being concurrently modified. According to embodiments, NDP scans can also share the same memory pool (e.g., a buffer pool) as regular data scans. Furthermore, NDP can use pages already present in the buffer pool (e.g., a memory pool shared with regular data scans), thus avoiding input/output (IO) for those pages. A key advantage is therefore that searching and reading the data tree does not block concurrent B+ tree modifications by other queries, yet still returns consistent B+ tree pages.

[0032] This invention provides a method for reading data stored in a tree data structure, such as a B+ tree, using near data processing (NDP) in a cloud-native database. According to an embodiment, an expected LSN is used for NDP page reads on a primary compute node (e.g., a primary SQL node). When the primary compute node reads a regular page, the maximum expected LSN for that page (e.g., the latest page version number) is used. Embodiments of this invention make use of the expected LSN and page locking: the correct version of a page can be obtained by using the expected LSN associated with the page together with page locking, achieving consistent tree-structure reads and good read/write concurrency.

[0033] Embodiments of the present invention reuse regular pages already in the buffer pool as NDP pages. This reduces input/output (IO) and ensures that storage nodes can serve the page versions requested by the primary compute node (primary SQL node).

[0034] According to embodiments, regular pages and NDP pages (e.g., custom NDP data) can share the same memory pool (i.e., the buffer pool). In various embodiments, the buffer pool is shared by dividing it into logical regions: a regular page region and, for each NDP query, an NDP cache region (a dedicated buffer pool). If no NDP query is currently executing, no NDP cache region exists, so all buffer pool memory can be used for regular pages.
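The shared buffer pool of [0034] can be sketched as one frame budget split into a regular region and per-query NDP regions. In this hypothetical Python sketch (`SharedBufferPool` and its methods are illustrative names), capacity is counted in frames, regular pages are evicted in FIFO order when room is needed (an assumed policy), and an NDP region disappears when its query ends, returning its frames to regular pages.

```python
from typing import Dict


class SharedBufferPool:
    """One frame budget shared by a regular region and per-query NDP regions."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.regular: Dict[int, bytes] = {}                 # regular page region
        self.ndp_regions: Dict[str, Dict[int, bytes]] = {}  # one per NDP query

    def free_frames(self) -> int:
        used = len(self.regular) + sum(len(r) for r in self.ndp_regions.values())
        return self.capacity - used

    def _make_room(self) -> None:
        # Assumed policy: evict the oldest regular page (FIFO) when full.
        if self.free_frames() == 0:
            if not self.regular:
                raise MemoryError("buffer pool fully occupied by NDP regions")
            del self.regular[next(iter(self.regular))]

    def put_regular(self, page_id: int, data: bytes) -> None:
        if page_id not in self.regular:
            self._make_room()
        self.regular[page_id] = data

    def put_ndp(self, query_id: str, page_id: int, data: bytes) -> None:
        region = self.ndp_regions.setdefault(query_id, {})
        if page_id not in region:
            self._make_room()
        region[page_id] = data

    def end_ndp_query(self, query_id: str) -> None:
        # Dropping the region frees its frames for regular pages again.
        self.ndp_regions.pop(query_id, None)
```

With no NDP query running, `ndp_regions` is empty and the whole capacity is available to `regular`, matching the last sentence above.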

[0035] Embodiments of the present invention provide read/write concurrency even when processing is distributed across storage nodes. For example, the B+ tree can be modified concurrently during NDP reads that use an expected LSN, while the storage nodes and the SQL node are processing the current batch of leaf pages.

[0036] Embodiments of the present invention provide read/write concurrency while making efficient use of memory and I/O resources. When no NDP reads are in progress, the entire buffer pool is available for regular pages. Furthermore, regular pages already in the buffer pool can be used by NDP reads.

[0037] Embodiments of the present invention can use short-lived expected LSNs. Specifically, a particular expected LSN value is valid only while the current batch of leaf pages is being read (e.g., while the leaf pages under a specific level-1 page are read). This is advantageous because the storage node does not need to retain a large number of old page versions. Long-lived expected LSNs would place a heavy burden (i.e., overhead) on the storage node, for example when the database receives a large number of updates, because the storage node would have to retain many old versions of the pages.
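Why short-lived expected LSNs help can be seen from a simple retention rule. Assume, hypothetically, that a page store must keep the latest version of a page plus, for every expected LSN still in use, the newest version at or before it. The Python sketch below (`versions_to_keep` is an illustrative name, and the rule itself is an assumption, not the method of this document) shows that with no active expected LSNs only one version is retained, while several long-lived expected LSNs pin several old versions.

```python
import bisect
from typing import Iterable, List


def versions_to_keep(version_lsns: List[int], active_expected_lsns: Iterable[int]) -> List[int]:
    """Version LSNs a page store must retain under the assumed rule."""
    versions = sorted(version_lsns)
    if not versions:
        return []
    keep = {versions[-1]}                     # latest version serves regular reads
    for expected in active_expected_lsns:
        i = bisect.bisect_right(versions, expected)
        if i > 0:
            keep.add(versions[i - 1])         # version visible at this expected LSN
    return sorted(keep)


print(versions_to_keep([10, 20, 30, 40, 50], []))            # [50]
print(versions_to_keep([10, 20, 30, 40, 50], [15, 35, 45]))  # [10, 30, 40, 50]
```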

[0038] As mentioned above, traditional database architectures can, at least in some cases, lead to poor performance and scalability. Cloud providers have therefore recently built new cloud-native relational database systems designed specifically for cloud infrastructure. Many cloud-native databases separate the compute and storage layers and offer beneficial features such as read replica support, fast failover and recovery, hardware sharing, and scalability (for example, up to 100 terabytes (TB)).

[0039] At a conceptual level, cloud-native relational database systems share a similar architecture, as shown in Figure 2. Figure 2 shows example cloud-native relational database system architectures, such as those of Amazon Aurora™, Microsoft Socrates™, and Huawei Taurus™. Referring to Figure 2, in the cloud-native database architecture 200, the software is divided into two separate layers, a compute layer 210 and a storage layer 220, which communicate via a network 230. The compute layer 210 includes a single SQL master (controller) node 211 and multiple read-only SQL replica nodes 212. The SQL master node 211 and each SQL replica node 212 are communicatively connected to the storage layer 220; in effect, they share the storage layer 220. Compared with traditional database architectures, sharing the storage layer 220 saves storage costs. The SQL master node 211 handles all update transactions and transmits log records to the storage layer 220 via the network 230. The storage layer 220 writes the log records to reliable shared cloud storage 221. The storage layer 220 has three main functions: storing database pages, applying received log records to keep pages up to date, and responding to page read requests. The storage layer 220 partitions the database pages and maintains them across several storage nodes.

[0040] Although each cloud-native database system can have different components, further details of cloud-native database systems, such as their components and alternative implementations, are provided below.

[0041] Figure 3 illustrates the architecture of a cloud-native database system according to an embodiment. The architecture shown in Figure 3 is similar to that of the Huawei Taurus™ database system.

[0042] According to an embodiment, the cloud-native database system 300 contains four main components. These four main components are a log store 321, a page store 322, a storage abstraction layer (SAL) 330, and a database front-end. These four main components are distributed between two physical layers (i.e., the compute layer 310 and the storage layer 320). Because there are only two physical layers, the amount of data sent over the network 334 can be minimized, and request latency can be reduced (e.g., a request can be completed with a single call over the network). The SAL 330 is a logical layer separating the compute layer 310 and the storage layer 320.

[0043] The compute layer 310 comprises a single SQL master node (database (DB) master node) 311 and multiple read replicas (DB replicas) 312. The SQL master node 311 handles both read and write queries and is therefore responsible for all database updates and data definition language (DDL) operations. The read replicas 312 can only execute read queries and are therefore responsible for read-only transactions. A read replica's view of the database can be regarded as slightly lagging behind that of the SQL master node 311. Database transactions, which include multiple statements such as insert, update, delete, and select (i.e., read requests), are handled by the database nodes 311 and 312 in the compute layer 310. It should be noted that database nodes 311 and 312 need not be physical hardware servers; they may be software running on physical processing resources in the cloud. Database nodes 311 and 312 can be software (e.g., instances of database nodes 311 and 312) running on virtual machines (e.g., abstract machines) or containers (e.g., abstract operating systems) provided by the cloud. In general, instances of database nodes 311 and 312 can be considered physical, since every instance is implemented on a physical machine.

[0044] Each of database nodes 311 and 312 communicates with the storage abstraction layer (SAL) 330 through its corresponding primary SAL module 331 or replica SAL module 332. SAL 330 can be considered as spanning database services and virtualized storage resources, providing an abstraction layer that aggregates physical resources to serve both database services and virtualized storage resources. It should be noted that SAL 330 is not a typical traditional database service layer (e.g., a database service provided by a traditional cloud service provider). The cloud-native database system 300 includes SAL 330 and can use SAL 330 to implement the functionality discussed further below and elsewhere in this document. Each SAL module 331, 332 can be a software instance implemented within SAL 330. For simplicity, instances of SAL modules 331, 332 can be simply referred to as SAL modules 331, 332. SAL modules 331, 332 provide the functionality of the logical SAL 330. In some examples, one or more functions of SAL modules 331, 332 may alternatively be implemented in storage layer 320. SAL 330 is used to isolate the client-facing front end (provided by compute layer 310) from the organization and management of the database.

[0045] Data (including database redo logs and pages) is stored in storage tier 320. Storage tier 320 is accessible via a network, such as through Remote Direct Memory Access (RDMA). Storage tier 320 can be a distributed storage system, for example, provided by a virtualization layer, offering relatively fast, reliable, and scalable storage. Storage tier 320 includes a log storage server 321 and a page storage server 322. In other words, log storage 321 and page storage 322 are entirely part of storage tier 320. Log storage 321 is primarily used to store log records generated by the SQL master node 311. Log storage 321 can store database transaction logs, all redo logs, and write-ahead logs (WAL). Page storage 322 is primarily used to serve page read requests (i.e., requests to read data from one or more pages) from the database (DB) master node 311 or read replica node 312. Page storage 322 can construct pages based on redo logs from the SQL master node 311 and provide page lookup services. Page storage 322 can recreate versions of pages that database nodes 311 and 312 can request (e.g., earlier or current versions). In the cloud-based database system 300, one or more page storage servers 322 are operated by a storage resource cluster. Each page storage server 322 receives or has access to all log records generated for the pages it is responsible for. The page storage server 322 then merges the logs into the database pages.

[0046] Although described above in the context of a single database, it should be understood that in some examples two or more databases may be managed using the cloud-native database system 300 (e.g., using logical separation between the databases). Each database is divided into smaller, fixed-size subsets of pages called slices (e.g., 10 gigabytes (GB) each). Each page store 322 is responsible for a number of slices, and the slices managed by a single page store 322 may contain pages from different databases. Furthermore, each page store server 322 receives only the (redo) logs of pages belonging to the slices it manages. Typically, a database has multiple slices, and each slice may be replicated to multiple page stores 322 for durability and availability. For example, if a particular page store 322 is unavailable, another page store 322 to which the slice has been replicated may continue servicing requests to access data from that slice (i.e., read data from it) or to modify data stored in it (i.e., write data to it). In some embodiments, the functionality of a log store 321 may be integrated into the page store 322.
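The slicing and failover described above can be sketched as follows. This document does not specify how pages map to slices; the Python sketch assumes, hypothetically, that slices are contiguous page-number ranges, that pages are 16 KB, and that "10 GB" means 10 GiB, and it routes a read to the first available replica of the page's slice. `slice_of`, `route_read`, and the page store names are illustrative.

```python
from typing import Dict, List, Set

PAGE_SIZE = 16 * 1024              # assumption: 16 KB pages
SLICE_SIZE = 10 * 1024 ** 3        # assumption: "10 GB" slices, binary units
PAGES_PER_SLICE = SLICE_SIZE // PAGE_SIZE   # 655,360 pages per slice


def slice_of(page_no: int) -> int:
    """Hypothetical mapping: slices are contiguous page-number ranges."""
    return page_no // PAGES_PER_SLICE


def route_read(page_no: int, replicas: Dict[int, List[str]], unavailable: Set[str]) -> str:
    """Pick the first available page store holding a replica of the page's slice."""
    for store in replicas[slice_of(page_no)]:
        if store not in unavailable:
            return store
    raise RuntimeError(f"no available page store for slice {slice_of(page_no)}")
```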

[0047] Operations performed by the primary SAL module 331 include: sending redo log record updates from the database master node 311 to one or more database replica nodes 312 (arrow 302); and sending information about the physical location of the redo log (i.e., identifying the log store 321) so that the one or more database replica nodes 312 know where to access (i.e., read) the latest redo log records, as also shown by arrow 302. The primary SAL module 331 may also access (i.e., read) pages from one or more page stores 322 (dashed arrow 304), and write redo log records to one or more log stores 321 (arrow 305) and one or more page stores 322 (arrow 306). Operations performed by the replica SAL module 332 include receiving redo log record updates from the database master node 311 (arrow 302) and receiving redo log record updates from one or more log stores 321 (arrow 308).

[0048] The storage abstraction layer (SAL) is a library that isolates the code of existing database front ends (such as MySQL™ or Postgres™) from the underlying complexities of remote storage, database slicing, recovery, and most database replica synchronization. SAL 330 is responsible for writing log data to log storage 321 and page storage 322, and for reading pages from page storage 322. SAL 330 is also responsible for creating, managing, and destroying slices in one or more page stores 322 and for allocating pages to slices. In some embodiments, each SAL module 331, 332 may be linked to database nodes 311, 312. In some embodiments, each SAL module 331, 332 may be implemented by another component in the cloud-native database system 300 (e.g., by another server not linked to database nodes 311, 312, such as a storage server). In some embodiments, as an alternative implementation, SAL 330 may run on a single node.

[0049] Each database node 311, 312 can be served by a corresponding SAL module 331, 332 (as shown in the figure). In some examples, a single SAL module instance providing the functionality of both the primary SAL module 331 and the replica SAL module 332 can serve two or more database nodes 311, 312. In some examples, a single instance of such a SAL module can serve all database nodes 311, 312 in the cloud-native database system 300. In some examples, SAL 330 can be implemented using one or more independent SAL modules 331, 332 running on virtual machines, containers, or physical servers. In some embodiments, the functionality of SAL 330 can be integrated directly into existing database code without a separate library. However, it should be understood that the functionality of parsing logs and sending the parsed logs to different page stores 322 does not exist in traditional database kernel code.

[0050] Whenever a database is created or expanded, SAL 330 selects page stores 322 and creates slices on them. Whenever the database front end decides to flush log records, the log records in the global flush buffer are sent to SAL 330. SAL 330 first writes the log records to the currently active log store replicas 321 to ensure their durability. Once the log records have been successfully written to all log store replicas 321, the write is acknowledged to the database, and the log records are parsed and distributed to per-slice flush buffers. Each slice buffer is then flushed when it is full or when its timeout expires.
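The SAL flush path in [0050] (persist to the log store replicas, acknowledge, parse, distribute to per-slice buffers, flush on size or timeout) is sketched below in Python. `SalFlushPipeline`, the `(lsn, page_no, payload)` record layout, the in-memory lists standing in for log store replicas, and the size and timeout thresholds are all hypothetical; a real SAL would send each flushed batch to the page stores that own the slice.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Tuple[int, int, bytes]   # hypothetical layout: (lsn, page_no, payload)


class SalFlushPipeline:
    def __init__(self, log_replicas: List[List[Record]],
                 slice_of: Callable[[int], int],
                 buffer_limit: int = 64, timeout_s: float = 0.01) -> None:
        self.log_replicas = log_replicas          # stand-ins for log store replicas
        self.slice_of = slice_of
        self.buffer_limit = buffer_limit
        self.timeout_s = timeout_s
        self.slice_buffers: Dict[int, List[Record]] = defaultdict(list)
        self.buffer_started: Dict[int, float] = {}
        self.flushed: Dict[int, List[List[Record]]] = defaultdict(list)  # sent to page stores

    def write(self, records: List[Record]) -> bool:
        # 1. Persist to every log store replica first.
        for replica in self.log_replicas:
            replica.extend(records)
        acknowledged = True                       # durable: acknowledge the database
        # 2. Parse records and distribute them to per-slice buffers.
        now = time.monotonic()
        for rec in records:
            sid = self.slice_of(rec[1])
            self.buffer_started.setdefault(sid, now)
            self.slice_buffers[sid].append(rec)
            if len(self.slice_buffers[sid]) >= self.buffer_limit:
                self._flush(sid)                  # 3a. flush a full buffer
        return acknowledged

    def poll(self) -> None:
        # 3b. Flush buffers whose timeout has expired.
        now = time.monotonic()
        for sid in list(self.slice_buffers):
            if now - self.buffer_started[sid] >= self.timeout_s:
                self._flush(sid)

    def _flush(self, sid: int) -> None:
        self.flushed[sid].append(self.slice_buffers.pop(sid))
        self.buffer_started.pop(sid, None)
```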

[0051] In traditional data processing, a Structured Query Language (SQL) query requires retrieving all relevant data from data storage over the network to a compute node (such as an SQL node), where projection, predicate computation, and aggregation are then performed. This can be costly, especially for analytical queries, which may need to access and aggregate large amounts of data. Furthermore, such analytical queries may require full table scans, which can be very expensive because analytical queries are typically ad hoc and therefore lack pre-built indexes to serve them.

[0052] As mentioned above, near data processing (NDP) can process data locally, for example, by incorporating SQL operators into the storage node. Therefore, NDP performs pre-evaluation within the storage node to filter out unnecessary datasets, and then returns only matching subsets of this data to the compute node (such as the SQL node) for further processing.

[0053] Deploying NDP can keep analytical queries from generating heavy page traffic. For example, part of an NDP query's work (database processing tasks such as projection, predicate computation, and aggregation) can be moved to storage nodes, freeing up compute-node resources such as SQL-layer central processing unit (CPU) time. As a result, with NDP, analytical queries have a smaller impact on online transaction processing (OLTP) operations. In some cases, transaction processing (e.g., OLTP) and analytical processing (e.g., online analytical processing (OLAP)) can run concurrently to some extent. For example, this is possible when OLAP tasks use fewer compute-node (e.g., SQL node) resources, leaving more of those resources available for OLTP tasks.

[0054] As mentioned above, NDP reduces page reads and other network input/output (I/O) operations, thus speeding up queries. Because both the data volume and the number of page reads are reduced, the required bandwidth and input/output operations per second (IOPS) are reduced accordingly.

[0055] The following query example illustrates how to perform NDP:

[0056] SELECT c_name FROM customer WHERE c_age>20

[0057] Referring to the query above, considering column projection, only the column c_name needs to be sent from the storage node to the compute node (e.g., the SQL node). Considering predicate computation, only rows that satisfy "c_age > 20" need to be sent from the storage node to the compute node (e.g., the SQL node).
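The storage-node side of this example can be sketched as a filter followed by a projection. This is an illustrative sketch only: `ndp_scan` is a hypothetical helper, rows are modeled as Python dictionaries, and the `c_city` column is invented so that the projection visibly drops data.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def ndp_scan(rows: Iterable[Row], columns: List[str],
             predicate: Callable[[Row], bool]) -> List[Row]:
    """Evaluated on the storage node: keep matching rows, then project columns."""
    return [{col: row[col] for col in columns} for row in rows if predicate(row)]


# Rows of a hypothetical `customer` page held by the storage node.
page = [
    {"c_name": "Alice", "c_age": 34, "c_city": "Paris"},
    {"c_name": "Bob", "c_age": 19, "c_city": "Oslo"},
    {"c_name": "Chen", "c_age": 25, "c_city": "Lima"},
]

# SELECT c_name FROM customer WHERE c_age > 20
result = ndp_scan(page, ["c_name"], lambda row: row["c_age"] > 20)
print(result)  # [{'c_name': 'Alice'}, {'c_name': 'Chen'}]
```

Only `result` would cross the network to the compute node; the `c_age` and `c_city` values and the non-matching row stay on the storage node.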

[0058] Here is another query example illustrating how to perform NDP while considering aggregation.

[0059] SELECT count(*) FROM customer

[0060] The query above returns the number of rows, so it is only necessary to send the "number of rows" from the storage node to the compute node (e.g., the SQL node).
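Aggregation pushdown can be sketched the same way: each storage node counts the rows it holds and returns a single integer, and the compute node only adds up the partial counts. For simplicity, the sketch assumes every stored row is visible to the query (it ignores the MVCC visibility checks a real engine would need); `storage_node_count` and `compute_node_count` are hypothetical names.

```python
from typing import Dict, List

Page = List[Dict[str, object]]  # a page is modeled as a list of rows


def storage_node_count(pages: List[Page]) -> int:
    """Runs on a storage node: count rows locally, ship back one number."""
    return sum(len(page) for page in pages)


def compute_node_count(per_node_pages: List[List[Page]]) -> int:
    """Runs on the compute node: add up the partial counts."""
    return sum(storage_node_count(pages) for pages in per_node_pages)


node_a = [[{"c_name": "Alice"}, {"c_name": "Bob"}], [{"c_name": "Chen"}]]
node_b = [[{"c_name": "Dana"}, {"c_name": "Eli"}, {"c_name": "Fay"}]]
print(compute_node_count([node_a, node_b]))  # 6
```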

[0061] Figure 4 shows the architecture of a cloud-native database system that performs near data processing (NDP), according to an embodiment. The architecture of the cloud-native database system shown in Figure 4 is similar to that of GaussDB™ (for MySQL™). System 400 includes compute nodes (SQL nodes) 410 and storage nodes 420.

[0062] The compute node (SQL node) 410 includes a parser query engine 411, an InnoDB storage engine 412, and a SAL SQL module 413. The parser query engine 411 is an SQL optimizer that analyzes SQL queries (e.g., parses and identifies them) and determines efficient execution mechanisms such as NDP. The parser query engine 411 supports multi-threaded parallel execution of some SQL operations (e.g., sorting, aggregation, joins, and grouping) and provides code generation (e.g., based on the Low Level Virtual Machine (LLVM)) to improve query execution performance. The InnoDB storage engine 412 supports pushing batches of pages down to the storage node, thereby reducing I/O requests. The InnoDB storage engine 412 is a multi-version concurrency control (MVCC) storage engine, allowing multiple versions of a single row to coexist. The SAL SQL module 413 enables the compute node 410 to interact with the underlying storage node 420.

[0063] Storage node 420 includes a page store 421 that serves page read requests from compute node (SQL node) 410. Storage node 420 supports NDP operations such as projection, predicate computation, aggregation, and MVCC.

[0064] When compute node 410 and storage node 420 are separate and there is no near data processing (NDP), data must traverse the network between them. For example, data processed by compute node 410 may be written back to storage node 420 several times, which is inefficient. With NDP, compute node 410 transmits redo logs to storage node 420, and storage node 420 returns only the matching subset (e.g., specific pages) to compute node 410, filtering out unnecessary data. Because large datasets do not need to cross the network, the required network bandwidth and I/O resources can be significantly reduced.

[0065] Compute nodes (such as SQL nodes) have local page caches (or buffer pools) that hold the pages database transactions need to access. In traditional query processing (i.e., without NDP), all B+ tree pages are read into the buffer pool, and page locking is used to coordinate concurrent reads and writes on the same data page. However, because NDP data received from storage nodes is tailored to a specific query, it should not be shared with other queries. Furthermore, distributed cloud storage nodes can receive updates to data pages at different times. These factors make it challenging to coordinate NDP reads of the B+ tree with other transactions that concurrently modify it.

[0066] Figure 5 illustrates a potential problem caused by modifications to a B+ tree. Referring to Figure 5, B+ tree 500 includes an internal page P1 510 and four child pages P3 530, P4 540, P5 550, and P6 560, which can be leaf pages. The data of page P5 550 resides on storage node 501, while the data of page P6 560 resides on storage node 502. In other words, page P5 550 is served by storage node 501, and page P6 560 is served by storage node 502.

[0067] In the scenario of Figure 5, assume that NDP requests have been issued for pages P5 550 and P6 560 but have not yet reached storage nodes 501 and 502. Now suppose a concurrent transaction modifies pages P5 550 and P6 560, and storage node 501 has received the data of the new page P5 551 while storage node 502 has not yet received the data of the new page P6 561. The NDP request then returns the new page P5 551 from storage node 501 and the old page P6 560 from storage node 502. As a result, the number "20" cannot be found in the old page P6 560 during the scan, producing an incorrect query result.

[0068] Figure 6 illustrates another potential problem caused by B+ tree modifications. Similar to B+ tree 500 shown in Figure 5, B+ tree 600 in Figure 6 includes internal page P1 610 and its four child pages P3 630, P4 640, P5 650, and P6 660, which can be leaf pages.

[0069] In the scenario of Figure 6, assume that the NDP request for page P4 640 has completed and a query is processing rows in page P4 640. Also assume that, while the query is being processed, another transaction causes page P5 650 to be deleted.

[0070] As part of deleting page P5 650, the "next page" identifier (ID) of page P4 640 is changed to the page ID of page P6 660, as shown at the bottom of Figure 6. However, because the copy of page P4 640 was tailored for the query above (i.e., the query processing rows in page P4 640), that copy still identifies page P5 650 as the next page. When the NDP query goes on to read page P5 650, it produces incorrect results because, for example, page P5 650 may have been reassigned to another part of B+ tree 600, or even to a different B+ tree.

[0071] Amazon Aurora™ offers "parallel query," a feature similar to near data processing. Amazon Aurora™ does not use the buffer pool when processing parallel queries: it bypasses the buffer pool when reading B+ tree pages for a parallel query. In other words, even if a page is already in the buffer pool, the query still accesses the storage node to read that page.

[0072] The parallel query feature provided by Amazon Aurora™ has some issues. One is the overhead added to SQL nodes by tracking how many pages containing table data are in the buffer pool. Another is that Aurora decides whether running a parallel query is more beneficial based on insufficiently precise information. For example, when the buffer pool holds pages containing "key1 < 50" but the query requires "key1 > 100", a parallel query may still be more beneficial even though a large amount of the table's data is in the buffer pool.

[0073] Embodiments of the present invention provide a method for reading data stored in a tree data structure, such as a B+ tree, using near data processing (NDP) in a cloud-native database. According to embodiments, NDP scans (e.g., data reads/searches in NDP) do not block concurrent modifications to the B+ tree, and consistent B+ tree pages are retrieved even while the B+ tree is being modified concurrently. NDP scans also share the same memory pool (e.g., the same buffer pool) as regular data scans. Furthermore, NDP can use pages already present in the buffer pool (the memory pool shared with regular data scans) and thereby avoid input/output (I/O) for those pages.

[0074] According to an embodiment, buffer pool memory is used to store NDP pages received from the storage node. Buffer pool pages used by the NDP of a specific query are not registered in the buffer pool management structures of the compute node (e.g., the SQL node), such as the hash map, the least recently used (LRU) list, and the flush list. Therefore, the NDP pages of a specific query are effectively hidden from, or separated from, other database transactions and are unaffected by the buffer pool flushing and eviction processes in the compute node (e.g., the SQL node). The allocation and release of NDP pages are controlled entirely by the query that requests the NDP. The advantage is that regular scans and NDP scans can share one memory pool while NDP pages and regular pages remain separate.

[0075] According to an embodiment, an expected log sequence number (LSN) is included in the NDP request to the storage node. Because the expected LSN indicates a page version, the storage node can use it to return a specific version of the requested page. The NDP of this query does not see concurrent modifications to these pages, because it reads only that specific version (i.e., versioned page reads). Conceptually, reading pages at an expected LSN is like taking a snapshot of the B+ tree at a point in time. The advantage of using the expected LSN is that NDP reads of the B+ tree see consistent pages even if the B+ tree is modified concurrently while the storage node and compute node (e.g., SQL node) are processing the NDP pages.
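A versioned page read can be sketched as a per-page list of versions ordered by LSN, from which the storage node returns the newest version at or below the expected LSN. This is a simplified sketch: `VersionedPageStore` is a hypothetical class, and it assumes the store has already applied all log records up to the requested expected LSN. The usage example replays the Figure 5 situation, in which page P5 has been updated but the read is pinned to an earlier LSN.

```python
import bisect
from typing import Dict, List, Tuple


class VersionedPageStore:
    """Hypothetical storage-node page store that keeps several versions per page."""

    def __init__(self) -> None:
        # page id -> [(lsn, content), ...] kept sorted by LSN
        self.versions: Dict[int, List[Tuple[int, str]]] = {}

    def apply(self, page_id: int, lsn: int, content: str) -> None:
        bisect.insort(self.versions.setdefault(page_id, []), (lsn, content))

    def read(self, page_id: int, expected_lsn: int) -> str:
        """Return the newest version of the page whose LSN <= expected_lsn."""
        versions = self.versions[page_id]
        lsns = [lsn for lsn, _ in versions]
        idx = bisect.bisect_right(lsns, expected_lsn)
        if idx == 0:
            raise LookupError(f"page {page_id} has no version at LSN {expected_lsn}")
        return versions[idx - 1][1]


store = VersionedPageStore()
store.apply(5, 4000, "P5 (old)")
store.apply(6, 4000, "P6 (old)")
store.apply(5, 5100, "P5 (new)")  # concurrent modification after the snapshot

expected_lsn = 5000
print(store.read(5, expected_lsn), "|", store.read(6, expected_lsn))  # P5 (old) | P6 (old)
print(store.read(5, 6000))  # P5 (new)
```

Both pages are read at the same LSN, so the scan sees one consistent snapshot even though P5 already has a newer version on its storage node.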

[0076] According to an embodiment, before an NDP request is sent to a storage node, the buffer pool is searched to determine whether the requested page is already present. If it is, the page is copied to the pre-allocated NDP page memory (which is also part of the buffer pool), avoiding one or more I/O operations to the storage node for that page.

[0077] Figure 7 shows an SQL node buffer pool containing regular pages and NDP pages, according to an embodiment of the present invention. Referring to Figure 7, SQL node buffer pool 700 includes a free list 710, a hash map 720, an LRU list 730, and a flush list 740. Logically, SQL node buffer pool 700 is divided into a public regular page area 760 (or general buffer pool) and dedicated NDP cache areas 770 and 780, each of which is a dedicated buffer pool associated with a specific query. Regular pages reside in the public regular page area 760, and NDP pages reside in the dedicated NDP cache areas 770 and 780. Regular pages are managed by buffer pool structures such as hash map 720, LRU list 730, and flush list 740. NDP pages are managed by the (NDP) queries that request them, Q1 751 and Q2 752. Both regular pages and NDP pages are obtained from free list 710 and are returned to free list 710 when evicted or released. In this way, regular pages and NDP pages share SQL node buffer pool 700.
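The separation shown in Figure 7 can be sketched as a buffer pool whose frames all come from one free list, where only regular pages are registered in the hash map, LRU list, and flush list. This is a minimal sketch: `BufferPool`, its methods, and the integer frame numbers are hypothetical; eviction, dirty-page tracking, and locking are omitted.

```python
from collections import OrderedDict
from typing import Dict, List


class BufferPool:
    """Sketch of Figure 7: regular and NDP pages share one free list."""

    def __init__(self, n_frames: int) -> None:
        self.free_list: List[int] = list(range(n_frames))  # free frame numbers
        self.hash_map: Dict[int, int] = {}   # page id -> frame, regular pages only
        self.lru = OrderedDict()             # regular page ids in LRU order
        self.flush_list: List[int] = []      # dirty regular pages (unused here)
        self.ndp_areas: Dict[str, Dict[int, int]] = {}  # query id -> {page id: frame}

    def load_regular(self, page_id: int) -> int:
        frame = self.free_list.pop()
        self.hash_map[page_id] = frame
        self.lru[page_id] = None
        return frame

    def alloc_ndp(self, query_id: str, page_id: int) -> int:
        # Same free list, but the frame is never registered in hash_map, lru,
        # or flush_list, so flushing and eviction cannot see it.
        frame = self.free_list.pop()
        self.ndp_areas.setdefault(query_id, {})[page_id] = frame
        return frame

    def release_ndp(self, query_id: str, page_id: int) -> None:
        # Only the owning query returns its NDP frames to the free list.
        self.free_list.append(self.ndp_areas[query_id].pop(page_id))


pool = BufferPool(n_frames=8)
pool.load_regular(3)         # P3 in the public regular page area
pool.alloc_ndp("Q2", 3)      # a separate copy of P3 in Q2's NDP cache area
print(sorted(pool.hash_map), len(pool.free_list))  # [3] 6
```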

[0078] According to an embodiment, the expected LSN is used for NDP page reads. When the primary compute node (e.g., the primary SQL node) reads a regular page, it uses the maximum expected LSN for that page (i.e., it requests the latest page version). In the scenario of Figure 7, NDP query Q2 752 reads page P3 from its dedicated NDP cache area 780, and the NDP page read carries an expected LSN, for example LSN = 5000. Another transaction in the primary SQL node reads page P3 from the regular page area 760. Assuming SQL node buffer pool 700 is in the primary SQL node, regular page reads (such as reading page P3 from the regular page area 760) always carry the maximum expected LSN, meaning that the latest version of the requested page is required.

[0079] According to an embodiment, if a page requested by an NDP query already exists in the public regular page area, the existing regular page is copied to the NDP query's dedicated cache area. Referring to Figure 7, suppose NDP query Q1 751 needs to read pages P2, P3, and P4. Because page P3 already exists in the public regular page area 760, it is copied to the dedicated NDP cache area 770 of NDP query Q1 751, while NDP query Q1 751 reads pages P2 and P4 using an expected LSN (e.g., LSN = 6000).

[0080] Figure 8 shows a method, according to an embodiment, for retrieving pages for a query using near data processing (NDP) in a cloud-native database. Here, NDP refers to a database processing paradigm in which data is processed close to where it is stored, for example by incorporating SQL operators into storage nodes.

[0081] The method includes receiving (810) a query that includes information indicating one or more desired pages, where the information also indicates a version of each desired page. The query may define a question, or a set of actions to be performed, on one or more desired pages stored in a database. The method also includes scanning (820) a general buffer pool to identify the one or more desired pages. Here, a scan refers to one or more actions performed to determine, locate, or identify whether a desired page is present in the general buffer pool. For example, a scan may search for tags, hashes, log sequence numbers, or other identifiers associated with a desired page, or may compare, in whole or in part, the pages in the general buffer pool with the desired pages. The type of scan may depend on the information included in the query.

[0082] When one or more desired pages are identified in the general buffer pool, the method further includes copying (830) the identified pages to a dedicated buffer pool associated with the query. According to an embodiment, the dedicated buffer pool is part of a buffer pool associated with a node and is allocated specifically to the query. Because it is allocated to one query, the dedicated buffer pool is isolated from, and inaccessible to, other queries.

[0083] When one or more desired pages are not identified in the general buffer pool, the method further includes sending (840) a request to one or more storage nodes for these remaining desired pages. The method also includes receiving (850) the one or more remaining desired pages and copying (860) them to the dedicated buffer pool. According to an embodiment, the remaining desired pages are the desired pages that are not present in the general buffer pool. Because the query requires them, the node must still retrieve these pages in order to perform the actions associated with the query. For example, if the query indicates that pages 1 and 2 are desired pages, and a scan of the general buffer pool finds only page 1, then page 2 is a remaining desired page.
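Steps 810 to 860 can be sketched as a single function. This is an illustrative sketch under assumptions: pages are modeled as `(page_id, lsn, content)` tuples, the desired pages arrive as a mapping from page ID to expected LSN, and `fetch_for_query` and the `request_from_storage` callback are hypothetical. The usage example mirrors query Q1 of Figure 7, where page P3 is already cached and pages P2 and P4 are fetched at LSN 6000.

```python
from typing import Callable, Dict, List, Tuple

Page = Tuple[int, int, str]  # (page id, lsn, content)


def fetch_for_query(desired: Dict[int, int],
                    general_pool: Dict[int, Page],
                    dedicated_pool: Dict[int, Page],
                    request_from_storage: Callable[[Dict[int, int]], List[Page]]
                    ) -> List[int]:
    """Fill the query's dedicated pool; return the ids fetched from storage."""
    remaining: Dict[int, int] = {}
    for page_id, expected_lsn in desired.items():   # 820: scan the general pool
        page = general_pool.get(page_id)
        if page is not None:
            # 830: tuples are immutable, so storing one stands in for a page copy
            dedicated_pool[page_id] = page
        else:
            remaining[page_id] = expected_lsn
    if remaining:
        # 840/850: one request to storage for the remaining pages at their versions
        for page in request_from_storage(remaining):
            dedicated_pool[page[0]] = page           # 860: copy into dedicated pool
    return sorted(remaining)


calls = []

def fake_storage(request: Dict[int, int]) -> List[Page]:
    calls.append(dict(request))
    return [(pid, lsn, f"P{pid}@{lsn}") for pid, lsn in request.items()]


general = {3: (3, 5800, "P3")}
dedicated: Dict[int, Page] = {}
fetched = fetch_for_query({2: 6000, 3: 6000, 4: 6000}, general, dedicated, fake_storage)
print(fetched, sorted(dedicated))  # [2, 4] [2, 3, 4]
```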

[0084] According to an embodiment, once the required page exists in a dedicated buffer pool, other actions can be taken to satisfy or respond to the query. For example, the query may define a question related to some data stored in a database, and the other actions may define one or more steps to be performed in conjunction with the required page in order to determine, evaluate, or compute the answer or result of the query.

[0085] In some embodiments, the method further includes applying a shared lock to the identified desired pages before copying them. A shared lock prevents a page from being modified while still allowing other readers to access it. In some embodiments, the method further includes releasing the shared lock once the identified desired pages have been copied.
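Python's standard library has no shared/exclusive lock, so the lock semantics assumed here can be sketched with `threading.Condition`. This is a simplified sketch: `SharedExclusiveLock` and `copy_page_shared` are hypothetical names, and the lock makes no fairness guarantee (a steady stream of shared holders can starve a writer).

```python
import threading


class SharedExclusiveLock:
    """Many shared holders, or exactly one exclusive holder, at a time."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self) -> None:
        with self._cond:
            while self._writer:          # wait until no one is modifying the page
                self._cond.wait()
            self._readers += 1

    def release_shared(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def acquire_exclusive(self) -> None:
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def copy_page_shared(lock: SharedExclusiveLock, page: bytearray) -> bytes:
    """Copy a page while modifications are blocked, then release the lock."""
    lock.acquire_shared()
    try:
        return bytes(page)
    finally:
        lock.release_shared()
```

A transaction holding the exclusive lock delays `copy_page_shared` until it calls `release_exclusive()`, after which the copy reflects that transaction's changes.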

[0086] In some embodiments, the method further includes applying a shared lock on the root of the B+ tree associated with the one or more desired pages. In some embodiments, the method further includes releasing the shared lock on the root of the B+ tree.

[0087] In some embodiments, the version of one or more required pages is defined by a log sequence number (LSN). In some embodiments, the method further includes copying the received one or more remaining required pages to a general buffer pool.

[0088] Figure 9 shows a method 900, according to an embodiment of the present invention, for reading data stored in a B+ tree data structure using near data processing (NDP). The steps described below can be performed by the components of the cloud-native database system 300 shown in Figure 3. For example, steps involving communication with the database master node can be performed by the master SAL module 331 (i.e., the SAL module associated with database master node 311), and steps involving communication with database replica nodes can be performed by the replica SAL module 332 (i.e., the SAL module associated with database replica node 312).

[0089] In step 910, a compute node (e.g., an SQL node) of the database system traverses the B+ tree from the root downward, applying shared page locks on the root and on each internal page until it reaches a level 1 page. "Level 1" refers to the level immediately above the leaf level (i.e., level 0) of the B+ tree. The level 1 page is hereinafter referred to as page P0. According to an embodiment, all pages cached in the query's NDP cache area (e.g., a dedicated buffer pool associated with the query) are leaf pages (i.e., level 0 pages).

[0090] On page P0, the compute node searches for the record pointing to the next level 0 page (leaf page) to be visited, hereinafter referred to as page Ps. In various embodiments, the B+ tree search may be started to locate either the minimum value or a specific value in the tree.

[0091] In step 920, if the compute node is the primary SQL node, it records the latest LSN of the page as the expected LSN for the NDP request to be submitted in step 940, while maintaining the shared lock on page P0. If the compute node is a replica SQL node, it uses the expected LSN carried by its page read.

[0092] According to an embodiment, the compute node records the latest LSN while maintaining a shared lock on page P0. The shared lock prevents another query from modifying page P0 so that the LSN can represent a consistent subtree rooted at page P0 (e.g., ensuring the consistency of the tree structure). In other words, the shared lock ensures that no other transaction (e.g., actions from other queries) can change the tree structure on the subtree rooted at page P0, because any such changes would require an exclusive lock on page P0. For example, no other transaction can delete a leaf page of page P0 or move a row from a leaf to its sibling. Therefore, the LSN represents a consistent subtree structure rooted at page P0.

[0093] According to an embodiment, a specific expected LSN value is valid only while the leaf pages of one specific level 1 page are being read. Because each expected LSN value is short-lived, the storage node does not need to retain all older page versions. If expected LSNs were long-lived, the storage node would bear a heavy burden (e.g., increased overhead), especially when the database receives many updates, because it would have to retain many older page versions to serve those expected LSNs.

[0094] Starting with the record after the one pointing to page Ps, the remaining records on page P0 are then processed. Each record on page P0 indicates the page ID of a child page Pi of page P0, where 1 ≤ i ≤ n and n is the number of child pages of page P0. Note that steps 910 and 920 are executed only once for a given P0: once P0 is shared-locked, iteration over its records continues, and P0 remains shared-locked throughout. P0 is therefore not released and relocked during the iteration over its records.

[0095] In step 930, the compute node extracts the page ID from each record on page P0. Each extracted page ID is the page ID of a child page Pi; in other words, the compute node extracts the child page IDs from page P0.

[0096] For each page Pi, the tasks of steps 940 to 950 below are repeated until all records in page P0 have been processed or a predefined condition is met. In step 940, the compute node allocates a page from the buffer pool free list (e.g., free list 710 in Figure 7) to store page Pi. The compute node assigns pages taken from the free list to the query's NDP cache area (e.g., the dedicated buffer pool associated with the query). These pages are not registered in the hash map (e.g., hash map 720 in Figure 7), the LRU list (e.g., LRU list 730 in Figure 7), or the flush list (e.g., flush list 740 in Figure 7).

[0097] Once a page has been allocated from the free list, the compute node searches the regular page area of the buffer pool for the page ID of page Pi. If page Pi is found there, the compute node applies a shared lock to page Pi, copies page Pi into the allocated page, and then releases the shared lock on page Pi. A shared lock prevents a page from being modified but still allows other readers to access it. However, if page Pi holds an exclusive lock, indicating that another transaction is modifying it, the compute node cannot apply the shared lock until the exclusive lock is released. An exclusive lock prevents other transactions from reading or modifying the page until it is released. In this case, the copy of page Pi placed in the NDP cache area may be newer than the expected LSN. This delay is acceptable, however, because modifications made to page Pi under the exclusive lock do not change the subtree structure: structural changes would require an exclusive lock on page P0, which is held in shared mode.

[0098] If, on the other hand, page Pi is not found in the regular page area of the buffer pool, the compute node submits an NDP request for page Pi, carrying the expected LSN, to the storage node. This request can be asynchronous. The compute node then receives page Pi from the storage node and stores it in the page allocated from the free list, i.e., in the query's NDP cache area (the dedicated buffer pool associated with the query).
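The per-child work of step 940 can be sketched as below. This is a synchronous simplification under stated assumptions: `Frame`, `fill_ndp_cache`, and the `submit_ndp` callback are hypothetical; the regular page area is modeled as a dictionary of `(lock, data)` pairs; and a plain `threading.Lock` stands in for the shared page lock, since only one reader is modeled here.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Frame:
    page_id: int = -1
    data: bytes = b""


def fill_ndp_cache(child_ids: List[int], expected_lsn: int,
                   regular_area: Dict[int, Tuple[threading.Lock, bytearray]],
                   free_list: List[Frame],
                   submit_ndp: Callable[[int, int], bytes],
                   ndp_cache: List[Frame]) -> List[int]:
    """Step 940 for each child Pi of P0; returns ids sent to storage."""
    requested = []
    for page_id in child_ids:
        # Take a frame from the free list; it is not registered in the
        # hash map, LRU list, or flush list.
        frame = free_list.pop()
        frame.page_id = page_id
        if page_id in regular_area:
            lock, data = regular_area[page_id]
            with lock:                         # stand-in for the shared page lock
                frame.data = bytes(data)       # copy the regular page
        else:
            frame.data = submit_ndp(page_id, expected_lsn)  # versioned NDP read
            requested.append(page_id)
        ndp_cache.append(frame)                # kept in sibling (logical) order
    return requested


free = [Frame() for _ in range(4)]
regular = {3: (threading.Lock(), bytearray(b"P3"))}
cache: List[Frame] = []
sent = fill_ndp_cache([2, 3, 4], 6000, regular, free,
                      lambda pid, lsn: f"P{pid}@{lsn}".encode(), cache)
print(sent, [f.data for f in cache])  # [2, 4] [b'P2@6000', b'P3', b'P4@6000']
```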

[0099] According to an embodiment, searching the regular page area of the buffer pool for the page ID of page Pi (the search performed in step 940) reduces input/output (I/O). Furthermore, if the compute node is the primary SQL node, the search ensures that the storage node can serve the expected LSN requested for the page. In some embodiments, if the page is not in the primary SQL node's buffer pool, the storage node can provide the latest version of the page; if the page is in the primary SQL node's buffer pool, the buffer pool holds the latest version. The expected LSN value must therefore be determined before page Pi is searched for in the regular page area, to avoid a race condition: for example, after step 940 concludes that page Pi is not in the regular page area, another transaction or query might bring page Pi into the regular page area.

[0100] According to an embodiment, the last page sent to the storage node in steps 930 and 940 is marked as the "last page". When the storage node processes this page, in step 950 it sends the last row of the last page (Pn) back to the compute node (SQL node), even if the query does not need that row. This value (the last record on the page) is used in step 970. According to an embodiment, returning the last row of Pn does not require a separate I/O request; it can be handled as part of the request for the last page.

[0101] According to an embodiment, the pages produced in steps 930 and 940 are linked in a list in their logical order (e.g., the sibling order of the leaf pages). If page Ps is not yet in the regular page area, the compute node loads it there. Shared page locks are then released, except the shared lock on page Ps. Note that page Ps (i.e., the first page accessed during the next NDP B+ tree traversal) is read as a regular page.

[0102] According to an embodiment, if multiple pages need to be submitted to the storage node in steps 930 to 950, they can be included in a single I/O request. This batching reduces the input/output operations per second (IOPS) between the compute node (SQL node) and the storage node.
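One way to realize this batching is to group the child pages by the storage node that holds them, so that each node receives one request carrying the expected LSN and the "last page" marker of [0100]. This is a hypothetical sketch: the request dictionary format, the `node_of` placement function, and the per-node grouping (for pages spread over several storage nodes, as in Figure 5) are assumptions, not the embodiment's wire format.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def batch_ndp_requests(page_ids: List[int], expected_lsn: int,
                       node_of: Callable[[int], str]) -> List[Dict]:
    """Build one I/O request per storage node instead of one per page."""
    if not page_ids:
        return []
    by_node: Dict[str, List[int]] = defaultdict(list)
    for page_id in page_ids:
        by_node[node_of(page_id)].append(page_id)
    last_page = page_ids[-1]
    requests = []
    for node, pages in by_node.items():
        # Only the request containing Pn carries the "last page" marker.
        marker: Optional[int] = last_page if last_page in pages else None
        requests.append({"node": node, "expected_lsn": expected_lsn,
                         "pages": pages, "last_page": marker})
    return requests


reqs = batch_ndp_requests([3, 4, 5, 6], 6000,
                          node_of=lambda pid: "sn501" if pid % 2 else "sn502")
for r in reqs:
    print(r["node"], r["pages"], r["last_page"])
# sn501 [3, 5] None
# sn502 [4, 6] 6
```

Four page reads become two requests here; with all pages on one storage node, they would become one.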

[0103] In step 960, the compute node releases the shared lock on page P0, allowing concurrent modifications to the subtree. According to an embodiment, if asynchronous I/O is used in step 940, the shared lock on P0 can be released as soon as the I/O request for Pn (the last page) has been submitted; the compute node does not need to wait for the I/O to complete. Releasing the lock before I/O completion is especially beneficial for NDP, because NDP I/O typically takes longer than regular I/O: the storage node must also perform SQL-related computation.
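The early lock release of step 960 can be sketched with `concurrent.futures`: all NDP reads are submitted while the P0 lock is held, and the lock is dropped as soon as the last submission returns, before any read completes. In this sketch `submit_children` and `slow_ndp_read` are hypothetical, a `threading.Lock` stands in for the shared lock on P0, and a `threading.Event` simulates a slow storage node.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


def submit_children(p0_lock: threading.Lock, child_ids: List[int],
                    expected_lsn: int, ndp_read: Callable[[int, int], object],
                    executor: ThreadPoolExecutor) -> List[Future]:
    with p0_lock:  # stand-in for the shared lock on P0
        futures = [executor.submit(ndp_read, pid, expected_lsn) for pid in child_ids]
    # Step 960: the lock is released once the request for Pn is submitted,
    # without waiting for any NDP I/O to complete.
    return futures


storage_ready = threading.Event()

def slow_ndp_read(page_id: int, lsn: int):
    storage_ready.wait(timeout=5)  # simulate slow NDP processing on storage
    return (page_id, lsn)


p0_lock = threading.Lock()
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = submit_children(p0_lock, [2, 4], 6000, slow_ndp_read, executor)
    print(p0_lock.locked(), any(f.done() for f in futures))  # False False
    storage_ready.set()
    print([f.result() for f in futures])  # [(2, 6000), (4, 6000)]
```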

[0104] After processing the rows on page Ps, in step 970 the compute node processes the first cached page (an NDP page produced in steps 930 and 940, i.e., a page already copied to the dedicated buffer pool associated with the query). Cached (NDP) pages are processed in logical order, following the list in which they are linked. When processing of a cached page is complete, the compute node returns that page to the buffer pool's free list. Step 970 is repeated until all cached pages have been processed.
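Step 970 can be sketched as draining a queue of cached pages in sibling order, handing each row to the query and returning each frame to the free list as soon as its page is finished. `drain_ndp_cache` and the `(frame, rows)` representation are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

CachedPage = Tuple[int, List[Dict[str, int]]]  # (frame number, rows)


def drain_ndp_cache(cached: Deque[CachedPage], free_list: List[int],
                    on_row: Callable[[Dict[str, int]], None]) -> None:
    while cached:
        frame, rows = cached.popleft()   # next cached page in logical order
        for row in rows:
            on_row(row)
        free_list.append(frame)          # release the page right after use


seen: List[int] = []
free: List[int] = []
cached = deque([(7, [{"k": 10}, {"k": 15}]), (6, [{"k": 30}])])
drain_ndp_cache(cached, free, lambda row: seen.append(row["k"]))
print(seen, free)  # [10, 15, 30] [7, 6]
```

Returning frames one page at a time keeps the query's share of the buffer pool small while later pages are still being processed.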

[0105] When no cached pages remain (i.e., after the rows on NDP pages P1 through Pn have been processed), the compute node determines the next page ID according to whether the current page is in the NDP cache area or the regular page area (i.e., it repositions itself in the B+ tree). If the current page is in the NDP cache area, the value of the last record on the page is used to start a new B+ tree search (i.e., returning to step 910), because the next page ID recorded in page Pn cannot be trusted: page Pn+1, the next sibling leaf, may already have been removed from the B+ tree. Another B+ tree search is therefore needed to locate the next sibling leaf page.

[0106] If, on the other hand, the current page is a regular page, its next page ID is reliable because a shared lock is held on the regular page, so the next page ID recorded in page Pn is used. If the compute node is the primary SQL node, it records the latest LSN of the required page as the expected LSN while maintaining the shared lock on page P0; if it is a replica SQL node, it uses the expected LSN carried by the page read. The compute node then releases the shared lock on the current page and performs step 940 as described above. The page ID of page Pn can be marked as the "last page". The compute node then performs step 970.
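The repositioning rule of [0105] and [0106] can be sketched as follows, replaying the Figure 6 case in which page P5 is deleted while the query holds a stale copy of page P4. This is a simplified sketch: `next_page` and `next_leaf_by_key` are hypothetical, and the root-to-leaf re-search is reduced to a scan over the current leaf order and keys, which stand in for the live B+ tree.

```python
from typing import Dict, List, Optional


def next_leaf_by_key(leaf_order: List[int], leaf_keys: Dict[int, List[int]],
                     last_key: int) -> Optional[int]:
    """Find, in the current tree, the first leaf holding a key > last_key."""
    for page_id in leaf_order:
        keys = leaf_keys[page_id]
        if keys and keys[-1] > last_key:
            return page_id
    return None


def next_page(current_is_ndp: bool, recorded_next_id: Optional[int], last_key: int,
              leaf_order: List[int], leaf_keys: Dict[int, List[int]]) -> Optional[int]:
    if current_is_ndp:
        # NDP copy: its next-page pointer may be stale, so search again by key.
        return next_leaf_by_key(leaf_order, leaf_keys, last_key)
    # Regular page read under a shared lock: its pointer can be trusted.
    return recorded_next_id


# Current tree after P5 was deleted: leaves P3, P4, P6.
leaf_order = [3, 4, 6]
leaf_keys = {3: [1, 5], 4: [10, 15], 6: [30, 35]}
# The NDP copy of P4 still says its next page is P5; its last key is 15.
print(next_page(True, 5, 15, leaf_order, leaf_keys))  # 6
```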

[0107] Figure 10 is a schematic diagram of an electronic device 1000, according to different embodiments of the present invention, capable of performing any or all of the operations of the methods and features described explicitly or implicitly herein. For example, dedicated hardware capable of executing instructions for these operations may be configured as electronic device 1000. Electronic device 1000 may be a device forming part of a coordinator, platform controller, physical machine or server, physical storage server, or data storage device.

[0108] As shown in the figure, the device includes a processor 1010, such as a central processing unit (CPU) or a dedicated processor such as a graphics processing unit (GPU) or other such processor unit; memory 1020; non-transient mass storage 1030; I/O interface 1040; network interface 1050; and transceiver 1060, all of which are communicatively coupled via a bidirectional bus 1070. According to some embodiments, all of the depicted elements, or only a subset of them, may be used. Furthermore, device 1000 may include multiple instances of certain elements, such as multiple processors, memories, or transceivers. Additionally, elements of the hardware device may be coupled directly to other elements without the bidirectional bus. Besides the processor and memory, other electronic components, such as integrated circuits, may be used to perform the required logical operations.

[0109] Memory 1020 may include any type of non-transitory memory, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous DRAM (SDRAM), read-only memory (ROM), or a combination thereof. Mass storage 1030 may include any type of non-transitory storage device, such as a solid-state drive, hard disk drive, magnetic disk drive, optical disk drive, USB drive, or any computer program product for storing data and machine-executable program code. According to some embodiments, memory 1020 or mass storage 1030 may have recorded thereon statements and instructions executable by processor 1010 for performing any of the methods described above.

[0110] Based on the description of the above embodiments, the present invention may be implemented solely in hardware, or by using software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product. The software product may be stored in a non-volatile or non-transitory storage medium, such as a compact disc read-only memory (CD-ROM), a USB flash drive, or a removable hard disk. The software product includes a number of instructions that enable a computer device (a personal computer, server, or network device) to perform the methods provided in the embodiments of the present invention. For example, such execution may correspond to a simulation of the logical operations described herein. According to exemplary embodiments, the software product may additionally or alternatively include instructions that enable a computer device to perform operations for configuring or programming digital logic devices.

[0111] Although the invention has been described with reference to specific features and embodiments thereof, it will be apparent that various modifications and combinations thereof can be made without departing from the invention. The specification and drawings are to be regarded merely as an illustration of the invention as defined by the appended claims, and are contemplated to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of this specification.

Claims

1. A method for retrieving one or more pages in response to a query using near data processing (NDP) in a cloud-native database, characterized in that the method includes: receiving the query, the query including information indicating one or more required pages, the information further indicating a version of each of the one or more required pages; scanning a general buffer pool to identify one or more of the required pages; when one or more of the required pages are identified in the general buffer pool, copying the identified one or more required pages to a dedicated buffer pool associated with the query; when it is determined that no further required pages are in the general buffer pool, sending, to one or more storage nodes, a request for one or more of the remaining required pages that were not identified; receiving the one or more remaining required pages; and copying the received one or more remaining required pages into the dedicated buffer pool.

2. The method according to claim 1, characterized in that the method further includes applying a shared lock to the identified one or more required pages before copying them.

3. The method according to claim 2, further including releasing the shared lock after the identified one or more required pages have been copied.

4. The method according to claim 1, wherein the version of the one or more required pages is defined by a log sequence number (LSN).

5. The method according to claim 4, wherein the latest version is defined by the maximum LSN value.

6. The method according to claim 1, characterized in that the method further includes applying a shared lock on the root of a B+ tree associated with the one or more required pages.

7. The method according to claim 6, further including releasing the shared lock on the root of the B+ tree.

8. The method according to claim 1, further including copying the received one or more remaining required pages into the general buffer pool.

9. A device supporting near data processing (NDP) in a cloud-native database, characterized in that the device includes: a network interface for receiving data from and sending data to devices connected to a network; a processor; and a machine-readable storage device storing machine-executable instructions which, when executed by the processor, cause the device to: receive a query, the query including information indicating one or more required pages, the information further indicating a version of each of the one or more required pages; scan a general buffer pool to identify one or more of the required pages; when one or more of the required pages are identified in the general buffer pool, copy the identified one or more required pages to a dedicated buffer pool associated with the query; when it is determined that no further required pages are in the general buffer pool, send, to one or more storage nodes, a request for one or more of the remaining required pages that were not identified; receive the one or more remaining required pages; and copy the received one or more remaining required pages into the dedicated buffer pool.

10. A method for reading data from a data tree using near data processing (NDP) in a cloud-native database, characterized in that the method includes: applying shared page locks to the root page and internal pages of the data tree in a top-down manner until P0 is reached, where P0 is the page located at the level immediately above the leaf level of the data tree; obtaining an expected log sequence number (LSN) while holding the shared page lock on P0; after obtaining the expected LSN, for each extracted subpage: allocating a page from the free list of the buffer pool to an NDP cache region of the buffer pool, the NDP cache region being designated for a particular query; when allocating the page to the NDP cache region, determining whether the subpage is found in the regular page region of the buffer pool; and, based on the determination, fixing the subpage to the page allocated in the NDP cache region; releasing the shared page lock applied on P0; processing the pages allocated to the NDP cache region; and, after processing is complete, releasing each page in the NDP cache region back to the free list of the buffer pool.

11. The method according to claim 10, characterized in that the method further includes: after processing the pages allocated to the NDP cache region, determining the next page to be accessed based on whether the last subpage was in the NDP cache region or in the regular page region.

12. The method according to claim 11, characterized in that the last subpage is in the NDP cache region, and the next page to be accessed is determined by another search of the data tree initiated using the last record on the last subpage.

13. The method according to claim 11, wherein the last subpage is in the regular page region, and the next page to be accessed is determined based on the next page identifier (ID) of the last subpage.

14. The method according to claim 10, wherein the subpage is found in the regular page region of the buffer pool, and fixing the subpage includes, for each subpage: applying a shared page lock to the subpage; copying the subpage found in the regular page region to the page allocated in the NDP cache region; and releasing the shared page lock on the subpage.

15. The method according to claim 10, wherein, if the subpage is not found in the regular page region of the buffer pool, fixing the subpage includes submitting, using the expected LSN, an asynchronous NDP request for the subpage to a storage node.

16. The method according to claim 15, wherein multiple subpages are submitted to the storage node through a single input/output (IO) request.

17. The method according to claim 10, wherein the expected LSN is valid only while the subpages of P0 are being read.

18. The method according to claim 10, wherein the allocation and release of pages assigned to the NDP cache region are performed solely under the control of the query that requested the NDP.

19. The method according to claim 10, wherein the data tree is a B+ tree.

20. A device supporting near data processing (NDP) in a cloud-native database, characterized in that the device includes: a network interface for receiving data from and sending data to devices connected to a network; a processor; and a machine-readable storage device storing machine-executable instructions which, when executed by the processor, cause the device to: apply shared page locks to the root page and internal pages of a data tree in a top-down manner until P0 is reached, where P0 is the page located at the level immediately above the leaf level of the data tree; obtain an expected log sequence number (LSN) while holding the shared page lock on P0; after obtaining the expected LSN, for each extracted subpage: allocate a page from the free list of the buffer pool to an NDP cache region of the buffer pool, the NDP cache region being designated for a particular query; when allocating the page to the NDP cache region, determine whether the subpage is found in the regular page region of the buffer pool; and, based on the determination, fix the subpage to the page allocated in the NDP cache region; release the shared page lock applied on P0; process the pages allocated to the NDP cache region; and, after processing is complete, release each page in the NDP cache region back to the free list of the buffer pool.
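For claim 1, with the optional step of claim 8, the lookup flow can be sketched in Python as below. Purely for illustration, it assumes that the general buffer pool is a dictionary keyed by page ID and then by LSN, that a cached page matches only if its LSN equals the requested version, and that `fetch_from_storage` is a hypothetical callback standing in for the request to the storage nodes. `retrieve_pages` is likewise an illustrative name, and locking is omitted.

```python
from typing import Callable, Dict

Page = bytearray  # mutable page image, so copies are real copies


def retrieve_pages(
    required: Dict[int, int],                  # page_id -> requested version (LSN)
    general_pool: Dict[int, Dict[int, Page]],  # page_id -> {LSN: page image}
    fetch_from_storage: Callable[[Dict[int, int]], Dict[int, Page]],
    also_cache: bool = False,                  # claim 8: also place fetched pages in the general pool
) -> Dict[int, Page]:
    dedicated_pool: Dict[int, Page] = {}  # private to this query
    # Scan the general buffer pool and copy each required page found at its requested version.
    for page_id, lsn in required.items():
        versions = general_pool.get(page_id, {})
        if lsn in versions:
            dedicated_pool[page_id] = bytearray(versions[lsn])
    # Request every page that was not found from storage in a single call.
    remaining = {pid: lsn for pid, lsn in required.items() if pid not in dedicated_pool}
    if remaining:
        for page_id, page in fetch_from_storage(remaining).items():
            dedicated_pool[page_id] = page
            if also_cache:
                general_pool.setdefault(page_id, {})[remaining[page_id]] = bytearray(page)
    return dedicated_pool
```

Because the dedicated pool holds copies, later changes to the general buffer pool do not alter what the query sees.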
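Claims 2 and 3 bracket each copy with a shared lock. Python's standard library has no reader-writer lock, so the sketch below builds a minimal one from `threading.Condition`. `SharedExclusiveLock` and `copy_under_shared_lock` are hypothetical names, and a real buffer pool would use its own page latch.

```python
import threading
from contextlib import contextmanager


class SharedExclusiveLock:
    """Minimal reader-writer lock: many shared holders or one exclusive holder."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def shared(self):
        with self._cond:
            while self._writer:          # wait until no exclusive holder remains
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:   # last reader out wakes any waiting writer
                    self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


def copy_under_shared_lock(lock: SharedExclusiveLock, page: bytearray) -> bytearray:
    # Claim 2: take the shared lock before copying; claim 3: release it once the copy is done.
    with lock.shared():
        return bytearray(page)
```

This lock prefers readers, so a steady stream of shared holders can starve an exclusive holder. It is meant only to show that the copy happens while the shared lock is held and that the lock is released as soon as the copy completes.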
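Claims 4 and 5 identify page versions by LSN, with the latest version being the one with the maximum LSN, and the first function below encodes exactly that. The second function encodes an assumed read rule that the claims do not state: a read at a given expected LSN returns the newest version whose LSN does not exceed it. Page versions are modelled as a hypothetical dictionary from LSN to page bytes, and both function names are illustrative.

```python
from typing import Dict, Optional


def latest_version(versions: Dict[int, bytes]) -> int:
    """Claim 5: the latest version is the one with the maximum LSN."""
    return max(versions)


def version_at(versions: Dict[int, bytes], expected_lsn: int) -> Optional[bytes]:
    """Assumed rule: the newest version whose LSN does not exceed expected_lsn."""
    eligible = [lsn for lsn in versions if lsn <= expected_lsn]
    return versions[max(eligible)] if eligible else None
```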
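The steps of claim 10, with the copy of claim 14 and the storage request of claim 15 as its two branches, can be sketched in Python as below. This is a single-threaded model under several assumptions. Shared locks are represented by a `shared_holders` counter rather than a real latch. The descent uses lock coupling (each parent is released once its child is locked) and follows the first child instead of a search key. Every child of P0 is treated as an extracted subpage, and the tree is assumed to have at least two levels. `get_expected_lsn`, `request_ndp`, and `process` are hypothetical callbacks, and `TreePage`, `BufferPool`, and `ndp_read` are illustrative names only.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TreePage:
    page_id: int
    level: int                                         # 0 = leaf level; P0 is level 1
    children: List[int] = field(default_factory=list)  # child page IDs (internal pages)
    records: List[int] = field(default_factory=list)   # keys (leaf pages)
    lsn: int = 0
    shared_holders: int = 0                            # stand-in for a shared page latch


class BufferPool:
    def __init__(self, regular: Dict[int, TreePage], free_frames: int) -> None:
        self.regular = regular                     # regular page region: page_id -> page
        self.free_list = list(range(free_frames))  # free frame numbers
        self.ndp_region: Dict[str, Dict[int, Optional[TreePage]]] = {}  # query -> frame -> page

    def allocate_ndp(self, query_id: str) -> int:
        frame = self.free_list.pop()               # take a frame from the free list
        self.ndp_region.setdefault(query_id, {})[frame] = None
        return frame

    def release_ndp(self, query_id: str) -> None:
        for frame in self.ndp_region.pop(query_id, {}):
            self.free_list.append(frame)           # give every frame back to the free list


def ndp_read(
    pool: BufferPool,
    root_id: int,
    query_id: str,
    get_expected_lsn: Callable[[TreePage], int],
    request_ndp: Callable[[List[Tuple[int, int]], int], Dict[int, TreePage]],
    process: Callable[[TreePage], int],
) -> List[int]:
    # Shared locks top-down with lock coupling (assumed) until P0, the level above the leaves.
    page = pool.regular[root_id]
    page.shared_holders += 1
    while page.level > 1:
        child = pool.regular[page.children[0]]  # a real search would follow the search key
        child.shared_holders += 1
        page.shared_holders -= 1
        page = child
    p0 = page
    expected_lsn = get_expected_lsn(p0)  # obtained while P0 is still share-locked

    missing: List[Tuple[int, int]] = []  # (frame, child page ID) pairs to fetch from storage
    for child_id in p0.children:
        frame = pool.allocate_ndp(query_id)
        cached = pool.regular.get(child_id)
        if cached is not None:
            # Found in the regular page region: copy it under a shared lock (claim 14).
            cached.shared_holders += 1
            pool.ndp_region[query_id][frame] = replace(
                cached, children=list(cached.children), records=list(cached.records), shared_holders=0
            )
            cached.shared_holders -= 1
        else:
            missing.append((frame, child_id))
    if missing:
        # Not cached: ask storage for these pages as of the expected LSN (claim 15).
        for frame, fetched in request_ndp(missing, expected_lsn).items():
            pool.ndp_region[query_id][frame] = fetched

    p0.shared_holders -= 1  # release the shared lock on P0
    results = [process(p) for p in pool.ndp_region.get(query_id, {}).values()]
    pool.release_ndp(query_id)  # return every NDP frame to the free list
    return results
```

Releasing P0 once every subpage has been either copied or requested presumably lets writers proceed on P0 while the query works on its private NDP copies. In this sketch the storage reply is awaited synchronously before the lock is released, whereas claim 15 describes an asynchronous request.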
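Claims 15 and 16 describe submitting NDP requests asynchronously, with several subpages carried by one IO request. For illustration, the asyncio sketch below assumes that each page ID maps to one storage node (`placement`) and that one request is sent per node. `NdpRequest`, `FakeStorageNode`, and `fetch_subpages` are hypothetical, and the fake node only counts requests and returns placeholder strings.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NdpRequest:
    storage_node: str
    page_ids: Tuple[int, ...]   # several subpages carried by one IO request
    expected_lsn: int


class FakeStorageNode:
    """Stand-in for a storage node; counts the IO requests it receives."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.io_requests = 0

    async def read(self, req: NdpRequest) -> Dict[int, str]:
        self.io_requests += 1
        await asyncio.sleep(0)  # simulated IO suspension point
        return {pid: f"page {pid} @ LSN {req.expected_lsn}" for pid in req.page_ids}


async def fetch_subpages(
    page_ids: List[int],
    placement: Dict[int, str],
    nodes: Dict[str, FakeStorageNode],
    expected_lsn: int,
) -> Dict[int, str]:
    # Group subpages by the storage node that holds them: one IO request per node.
    by_node: Dict[str, List[int]] = defaultdict(list)
    for pid in page_ids:
        by_node[placement[pid]].append(pid)
    requests = [NdpRequest(node, tuple(pids), expected_lsn) for node, pids in by_node.items()]
    # Submit all requests concurrently and wait for every reply.
    replies = await asyncio.gather(*(nodes[r.storage_node].read(r) for r in requests))
    merged: Dict[int, str] = {}
    for reply in replies:
        merged.update(reply)
    return merged
```

With three subpages spread over two nodes, this sends two IO requests rather than three.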

Citation Information

Patent Citations

Dynamic hash table for efficient data access in relational database system
CN102362273A
Systems and methods for database management using append-only storage devices
CN111936977A
