A method and system for realizing high availability of a massively parallel database
By using persistent memory devices and the RDMA protocol in the MPP database, the problems of slow data transfer speed and increased latency are solved, achieving efficient data access and synchronous transmission, and improving the reliability and scalability of the system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INSPUR SUZHOU INTELLIGENT TECH CO LTD
- Filing Date
- 2022-08-26
- Publication Date
- 2026-06-19
AI Technical Summary
Existing MPP databases suffer from slow data transfer speeds when using conventional hard disks and UDP network connections, impacting CPU computing efficiency and data persistence efficiency, and also experience increased latency due to multiple replications.
Persistent memory devices are used as storage media, and the RDMA protocol is used for data transmission to reduce data copying within the node and send data directly from the user-space cache to the target node's cache area.
This improves the data access rate and the speed of synchronous transmission, simplifies the data transmission process, and enhances system reliability and scalability.
[Figure CN115510024B_ABST: abstract drawing]
Description
Technical Field
[0001] This invention relates to the field of database high availability design technology, and specifically to a method and system for achieving high availability of massively parallel databases.
Background Technology
[0002] Massively Parallel Processing (MPP) is a technology that utilizes a shared-nothing database cluster. In this cluster, each database node has its own independent hard disk storage and memory system. The business system distributes data across these nodes according to the database model. These nodes are interconnected via dedicated or general-purpose networks, collaborating to provide database services as a whole. Shared-nothing database clusters offer advantages such as full scalability, high availability, high performance, excellent cost-effectiveness, and resource sharing. Simply put, MPP distributes database query tasks in parallel across multiple servers and nodes. After computation on each node, the results are aggregated to obtain the final result (similar to a distributed file system). MPP technology consists of a cluster of multiple parallel nodes, each with computing, storage, and network resources. Computational tasks are distributed across the nodes in the cluster for parallel computation. After computation, the results are returned to the coordinating node via the network. An MPP database is a distributed parallel structured database cluster with a shared-nothing architecture. It boasts high performance, high availability, and high scalability, providing a cost-effective general-purpose computing platform for ultra-large-scale data management and is widely used to support various data warehouse systems, BI systems, and decision support systems. MPP employs a fully parallel and shared-nothing distributed flat architecture. In this architecture, each node is independent, self-sufficient, and peer-to-peer. Moreover, there are no single points of failure in the entire system, resulting in very strong scalability.
[0003] The current high-availability solution for MPP databases deploys a backup coordinating node S in the MPP database cluster, configured identically to the primary coordinating node M. Coordinating node M and coordinating node S are connected via the InterConnect UDP network. While coordinating node M is running, it synchronizes system metadata to coordinating node S in real time to keep the data on the two nodes consistent. When coordinating node M fails, coordinating node S becomes the master coordinating node of the MPP database.
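The master / standby arrangement described above can be sketched as follows. This is a minimal illustration only: the names `Coordinator`, `write_metadata` and `active_coordinator` are hypothetical, the synchronization is modelled as an in-process copy, and the text does not specify how a failure of node M is detected.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical coordinating node holding system metadata."""
    name: str
    role: str                      # "master" or "standby"
    alive: bool = True
    metadata: dict = field(default_factory=dict)


def write_metadata(master: Coordinator, standby: Coordinator, key: str, value: str) -> None:
    """Apply a metadata change on M and mirror it to S right away."""
    master.metadata[key] = value
    standby.metadata[key] = value  # stands in for the real-time sync over the interconnect


def active_coordinator(master: Coordinator, standby: Coordinator) -> Coordinator:
    """Return the node that should serve requests, promoting S if M has failed."""
    if master.alive:
        return master
    standby.role = "master"
    return standby


m = Coordinator("M", "master")
s = Coordinator("S", "standby")
write_metadata(m, s, "table:orders", "segments=4")
m.alive = False                    # simulate a failure of coordinating node M
current = active_coordinator(m, s)
```

Because S already holds the same metadata as M at the moment of failure, it can take over without a separate recovery step in this simplified model.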
[0004] Current MPP databases use conventional hard disks (SSD, SCSI, SATA, SAS, etc.) as storage media for persistence. However, the read / write speed of conventional hard disks is far lower, and their latency far higher, than those of the CPU and memory, which limits CPU computing efficiency and data persistence efficiency. Current high-availability solutions for MPP databases connect the primary coordinating node, backup coordinating node, and segment nodes via an InterConnect UDP network. During data transmission, the UDP network requires data to be copied from user space to kernel space, and then from kernel space to the network card cache. The reverse process is required at the target node. This results in multiple data copies within the nodes, increasing data transmission latency and reducing data transmission speed.
Summary of the Invention
[0005] In view of the above problems, namely that conventional hard disks limit CPU computing efficiency and data persistence efficiency, and that transmission over the InterConnect UDP network causes multiple data copies within the nodes, which increases latency and reduces transmission speed, this invention provides a method and system for achieving high availability of massively parallel databases.
[0006] In a first aspect, the present invention provides a method for achieving high availability of a massively parallel database, based on a high-availability architecture for the massively parallel database. The high-availability architecture includes several nodes, the nodes comprising a coordinating node and compute nodes that communicate with the coordinating node. The method includes the following steps:
[0007] Install persistent memory devices on all nodes and configure persistent memory parameters in the operating system;
[0008] Install a network interface card (NIC) that supports the RDMA protocol on the coordinating node and configure the NIC parameters in the operating system;
[0009] Configure the RDMA interface in the metadata management module of the coordinating node;
[0010] When the coordinating node receives a data write request, it configures the persistent memory interface to write the data to the persistent memory device of the compute node and writes the metadata file location data to the MPP library file manager.
[0011] When the coordinating node receives a read request, it configures the read interface to read the relevant data.
[0012] When coordinating nodes synchronize metadata files, the RDMA protocol is enabled, and the RDMA interface set by the metadata management module is called to transfer the metadata files.
[0013] Preferably, when the coordinating node receives a data write request, the steps of configuring the persistent memory interface to write the data to the persistent memory device of the compute node and writing the metadata file location data to the MPP library file manager include:
[0014] When the coordinating node receives a data write request, it determines whether each node has installed a persistent memory device.
[0015] If not, use the hard drive interface to write the data to the hard drive;
[0016] If so, call the persistent memory interface to write the data to the persistent memory device of the compute node, record the metadata to the metadata file, and write the metadata file location data to the MPP library file manager.
[0017] Preferably, the method further includes:
[0018] Real-time monitoring of the remaining storage space of persistent memory devices;
[0019] When the remaining storage space of the persistent memory device is less than the set first threshold, the persistent memory device data transfer algorithm is activated to transfer the data in the persistent memory device to the hard disk.
[0020] Preferably, the steps of calling the persistent memory interface to write data to the persistent memory device of the compute node and recording metadata to the metadata file include:
[0021] The coordinating node calls the persistent memory interface to forward the write request to the compute node where the persistent memory interface is located;
[0022] The persistent memory interface requests storage space from the persistent memory manager of the compute node.
[0023] If the request succeeds, the data is written to the persistent memory device and the metadata is recorded to the metadata file.
[0024] If storage space is insufficient, the persistent memory device data transfer algorithm is activated to transfer the data in the persistent memory device to the hard disk, thereby expanding the remaining storage space of the persistent memory device;
[0025] The persistent memory manager then allocates storage space in response to the persistent memory interface request, and the following step is executed: the data is written to the persistent memory device and the metadata is recorded to the metadata file.
[0026] Preferably, the step of initiating a persistent memory device data transfer algorithm to transfer data from the persistent memory device to the hard disk includes:
[0027] The size of the data to be transferred is determined based on the requested storage space and the remaining storage space; where, the size of the data to be transferred = requested storage space - remaining storage space + first threshold.
[0028] Sort the data in persistent memory devices according to their popularity;
[0029] Data is selected from low to high popularity to generate data blocks, and the size of each data block is equal to the determined size of the data to be transferred.
[0030] The generated data blocks are transferred to the hard drive, and the transferred data is deleted from the persistent memory device.
[0031] Preferably, the coordination nodes include a primary coordination node and a backup coordination node;
[0032] When coordinating nodes synchronize metadata files, the steps for transferring metadata files by enabling the RDMA protocol and calling the RDMA interface set by the metadata management module include:
[0033] When the primary coordinating node synchronizes metadata files to the backup coordinating node, it determines whether the coordinating node is configured with a network interface card that supports the RDMA protocol.
[0034] If so, enable the RDMA protocol, call the metadata management module to set the RDMA interface, and transfer the metadata file of the primary coordinating node to the backup coordinating node;
[0035] Otherwise, the metadata file of the primary coordinating node is transmitted to the backup coordinating node via the TCP / IP protocol.
[0036] Preferably, the step of determining whether the coordinating node is configured with a network interface card that supports the RDMA protocol includes:
[0037] Read system network card configuration parameters;
[0038] Read the persistent memory configuration parameters of the coordinating node;
[0039] Based on the read network card configuration parameters and the coordinator node's persistent memory configuration parameters, determine whether the coordinator node is configured with a network card that supports the RDMA protocol.
[0040] Preferably, when the coordinating node receives a read request, the steps for configuring the read interface to read the relevant data include:
[0041] When the coordinating node receives a request to read metadata, it configures the metadata file reading interface to read the metadata file directory of the MPP library file manager;
[0042] When the coordinating node receives a read data request, it determines the data storage location;
[0043] When data is in persistent memory, configure the persistent memory read interface to read data from the persistent memory device;
[0044] When the data is on the hard drive, configure the hard drive read interface to read the data on the hard drive.
[0045] In a second aspect, the technical solution of the present invention provides a system for realizing high availability of a massively parallel database, including several nodes, wherein the nodes include a coordinating node and compute nodes that communicate with the coordinating node;
[0046] Each node has a persistent memory device installed, and the coordinating node has a network card that supports the RDMA protocol.
[0047] The coordinating node is equipped with an RDMA interface. When coordinating nodes synchronize metadata files, the RDMA protocol is enabled and the RDMA interface is called to transfer the metadata files.
[0048] The persistent memory device is equipped with a persistent memory interface, which is configured to write data to the persistent memory device of the compute node when the coordinating node receives a data write request.
[0049] Preferably, the coordinating node is used to call the persistent memory interface to forward write requests to the computing node where the persistent memory interface is located; request storage space from the persistent memory manager of the computing node through the persistent memory interface; if the request is successful, write the data to the persistent memory device and record the metadata to the metadata file; if the storage space is insufficient, start the persistent memory device data transfer algorithm to transfer the data in the persistent memory device to the hard disk and expand the remaining storage space of the persistent memory device; for the persistent memory interface request, the persistent memory manager allocates storage space.
[0050] As can be seen from the above technical solutions, this invention has the following advantages. First, using persistent memory (PMM) as the persistence medium for the system database in the MPP database coordinating node greatly improves data access speed compared with the current use of conventional hard disks (SSD, SCSI, SATA, SAS, etc.) as storage media for MPP database persistence. Second, this invention uses RDMA (Remote Direct Memory Access) technology as the data transmission scheme between the primary and backup coordinating nodes, reducing data copying within the nodes. Under the RDMA protocol, data is sent directly from the user-space cache of the MPP database source node to the cache area of the MPP database target node, improving the speed of data synchronization.
[0051] Furthermore, the design principle of this invention is reliable, the structure is simple, and it has a very wide range of application prospects.
[0052] Therefore, it is evident that the present invention has outstanding substantive features and significant progress compared with the prior art, and the beneficial effects of its implementation are also obvious.
Attached Figure Description
[0053] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0054] Figure 1 is a schematic flowchart of a method according to an embodiment of the present invention.
[0055] Figure 2 is a schematic block diagram of a system according to an embodiment of the present invention.
Detailed Implementation
[0056] Current MPP database persistence uses conventional hard disks (SSD, SCSI, SATA, SAS, etc.) as storage media. However, the read / write speed of conventional hard disks is far lower, and their latency far higher, than those of the CPU and memory, which limits CPU computing efficiency and data persistence efficiency. Current high-availability solutions for MPP databases connect the primary coordinating node, backup coordinating node, and segment nodes (compute nodes) via an InterConnect UDP network. During data transmission, the UDP network requires copying from user space to kernel space, then from kernel space to the network card cache, and the reverse at the target node. This results in multiple data copies within the node, increasing latency and reducing speed. This invention uses a PMM persistent memory device as the system database persistence medium in the MPP database coordinating node. Compared with current MPP database persistence on conventional hard disks, this significantly improves data access speed. In addition, this invention uses RDMA (Remote Direct Memory Access) technology for data transmission between the primary and backup coordinating nodes, reducing data copying within the node. Under the RDMA protocol, data is sent directly from the user-space cache of the MPP database source node to the cache of the MPP database target node, improving the speed of data synchronization. To enable those skilled in the art to better understand the technical solutions of this invention, the technical solutions of the embodiments of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this invention, and not all of them. Based on the embodiments of this invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this invention.
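The difference between the two transmission paths described above can be made concrete by counting the in-node copies each one performs. The following is a simplified bookkeeping model, not a network implementation; the list names are hypothetical, and the RDMA entries are empty on the assumption stated in the text that data moves directly from the user-space cache of the source to the cache of the target.

```python
# Each entry is one in-node copy of the payload, as listed in the description.
UDP_SENDER = ["user space -> kernel space", "kernel space -> network card cache"]
UDP_RECEIVER = ["network card cache -> kernel space", "kernel space -> user space"]

# Assumption of this model: with RDMA the network card reads the source's
# user-space cache and writes the target's cache directly, so no intermediate
# in-node copies are counted on either side.
RDMA_SENDER: list[str] = []
RDMA_RECEIVER: list[str] = []


def in_node_copies(sender: list[str], receiver: list[str]) -> int:
    """Total number of in-node copies for one end-to-end transfer."""
    return len(sender) + len(receiver)


udp_copies = in_node_copies(UDP_SENDER, UDP_RECEIVER)
rdma_copies = in_node_copies(RDMA_SENDER, RDMA_RECEIVER)
```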
[0057] It should be noted that PMM (Persistent Memory Module) is a persistent memory device (non-volatile memory).
[0058] RDMA (Remote Direct Memory Access);
[0059] MPP (Massively Parallel Processing)
[0060] CPU (central processing unit)
[0061] DB (Database);
[0062] A segment is a subset of data divided according to logical rules.
[0063] Memory;
[0064] A Node is a computer / server.
[0065] PB (petabyte) is a unit of storage capacity: 1 PB = 1024 TB and 1 TB = 1024 GB, where GB stands for gigabyte.
[0066] Big data;
[0067] SQL (Structured Query Language)
[0068] API (Application Programming Interface) is an interface used for application programming (development).
[0069] As shown in Figure 1, this embodiment of the invention provides a method for achieving high availability of a massively parallel database. The method is based on a high-availability architecture for the massively parallel database; the architecture includes several nodes, the nodes including a coordinating node and compute nodes that communicate with the coordinating node. The method includes the following steps:
[0070] Step 1: Install persistent memory devices on all nodes and configure persistent memory parameters in the operating system;
[0071] Step 2: Install a network card that supports the RDMA protocol on the coordinating node and configure the network card parameters in the operating system;
[0072] Step 3: Configure the RDMA interface in the metadata management module of the coordinating node;
[0073] Step 4: When the coordinating node receives a data write request, it configures the persistent memory interface to write the data to the persistent memory device of the compute node and writes the metadata file location data to the MPP library file manager;
[0074] Step 5: When the coordinating node receives a read request, it configures the read interface to read the relevant data;
[0075] Step 6: When coordinating nodes synchronize metadata files, enable the RDMA protocol and call the RDMA interface set by the metadata management module to transfer the metadata files.
[0076] It should be noted that massively parallel databases (MPP) possess the following technical characteristics:
[0077] 1) Low hardware cost: Servers are built entirely on x86 / ARM architecture, eliminating the need for expensive servers and disk arrays;
2) Cluster architecture and deployment: Fully parallel, massively parallel, and shared-nothing distributed architecture, deployed with a coordinating node architecture and a flat, peer-to-peer node structure;
3) Massive data distributed compression storage: Capable of handling PB-level or higher structured data, using hash distribution and random storage strategies for data storage; advanced compression algorithms reduce the space required for data storage by 1-20 times, and correspondingly improve I / O performance;
4) High data loading efficiency: Policy-based data loading mode, with an overall cluster loading speed of up to 2TB / h;
5) High scalability and high reliability: Supports scaling up and down cluster nodes, and supports full and incremental backup / recovery;
6) High availability and easy maintenance: Data is protected by redundancy through replicas, with automatic fault detection and management, and automatic synchronization of metadata and business data. Provides graphical tools to simplify database management for administrators;
7) High concurrency: Read and write operations are not mutually exclusive, supporting data loading and querying simultaneously, with a single node having a concurrency capacity of more than 300 users;
8) Standardization: Supports SQL92 standard and interface specifications such as C API, ODBC, JDBC, and ADO.NET.
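The hash distribution strategy mentioned in item 3) can be illustrated with a short sketch. The function names and the choice of CRC32 as the hash are assumptions made for illustration; the text does not say which hash function an MPP database uses.

```python
import zlib


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution key to a segment index with a stable hash."""
    # zlib.crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(keys: list[str], num_segments: int) -> dict[int, list[str]]:
    """Group keys by the segment that would store them."""
    placement: dict[int, list[str]] = {i: [] for i in range(num_segments)}
    for key in keys:
        placement[segment_for(key, num_segments)].append(key)
    return placement


placement = distribute([f"order-{i}" for i in range(1000)], num_segments=4)
```

Every row with the same key lands on the same segment, which is what lets each node work on its own share of the data independently.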
[0078] Each node in a high-availability architecture for a massively parallel database includes the following software modules:
[0079] Parser: The coordinating node acts as the brain of the entire distributed system cluster, responsible for receiving client connections and processing requests. Similar to a single-machine database node, for each connection request, a database engine process is started on the coordinating node to handle the query statement submitted for that connection. For each incoming query statement, the parser in the database engine process performs syntax analysis and lexical analysis to generate a parse tree.
[0080] Optimizer: The optimizer generates a query plan based on the parse tree produced by the parser. The query plan describes how to execute the query. The quality of the query plan directly affects the execution efficiency of the query.
[0081] Scheduler: In the MPP database, the scheduler is responsible for allocating the computing resources needed to process queries and sending query plans to each compute node. In the MPP database, compute nodes are called Segment nodes, and each Segment instance is essentially a database instance. The scheduler determines the computing resources required for the execution plan based on the query plan generated by the optimizer, and then sends connection requests to each Segment instance via the libpq protocol, creating query processes through the database processes on the Segment instances. The scheduler is also responsible for the entire lifecycle of these query processes.
[0082] Executor: After each query process receives the query plan sent from the scheduler, it executes the task assigned to it through the executor. An additional Motion operation node is added to handle data exchange between different query processes.
[0083] Interconnect, the data exchange component: A key difference between MPP databases and single-machine databases when executing queries is the data exchange between different computing nodes. In the MPP database system architecture, the Interconnect component is responsible for data exchange. To ensure data transmission efficiency and system scalability, MPP databases implement a data exchange component based on the UDP protocol. As mentioned earlier when discussing the executor, the Motion operation nodes introduced by MPP databases redistribute data among different computing nodes through the Interconnect component.
[0084] Distributed Transactions: MPP databases ensure system information consistency through distributed transactions, more specifically, through two-phase commit to ensure the consistency of system metadata. A distributed transaction manager on the coordinating node coordinates commit and rollback operations on the segment nodes. Each segment instance has its own transaction log, determining when to commit and rollback its transactions. Local transaction state is stored in its local transaction log.
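The two-phase commit described above can be sketched as follows. This is a minimal in-memory illustration: the `Segment` class, its `can_commit` flag and the log format are hypothetical, and failure handling such as timeouts and coordinator recovery is left out.

```python
class Segment:
    """Hypothetical segment instance with its own local transaction log."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.log: list[str] = []

    def prepare(self, txid: int) -> bool:
        self.log.append(f"{txid}:{'prepared' if self.can_commit else 'refused'}")
        return self.can_commit

    def commit(self, txid: int) -> None:
        self.log.append(f"{txid}:committed")

    def rollback(self, txid: int) -> None:
        self.log.append(f"{txid}:rolled_back")


def two_phase_commit(txid: int, segments: list) -> bool:
    """Run by the distributed transaction manager on the coordinating node."""
    # Phase 1: every segment votes and records the vote in its own log.
    votes = [seg.prepare(txid) for seg in segments]
    # Phase 2: commit only if all voted yes; otherwise roll back everywhere.
    if all(votes):
        for seg in segments:
            seg.commit(txid)
        return True
    for seg in segments:
        seg.rollback(txid)
    return False


ok_result = two_phase_commit(1, [Segment("seg1"), Segment("seg2")])
mixed = [Segment("seg1"), Segment("seg2", can_commit=False)]
failed_result = two_phase_commit(2, mixed)
```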
[0085] Metadata Tables: Metadata tables are responsible for storing and managing metadata for databases, tables, fields, etc. The metadata table on the coordinating node contains metadata for global database objects, called the global metadata table; each Segment instance also has a local copy of the metadata for local database objects, called the local metadata table. Stateless components such as the parser, optimizer, scheduler, executor, and Interconnect need to access metadata table information during runtime to determine the logic to execute. Because metadata tables are distributed across different nodes, maintaining consistency of information in these tables is an extremely challenging task. If inconsistencies occur in the metadata tables, the entire distributed database system will not function properly.
[0086] Persistence: In a narrow sense, persistence refers only to permanently saving domain objects to the database; in a broad sense, persistence includes all kinds of database-related operations.
[0087] Save: Permanently saves the domain object to the database.
[0088] Update: Update the state of the domain objects in the database.
[0089] Delete: Remove a domain object from the database.
[0090] Loading: Loads a domain object from the database into memory based on a specific domain primary key.
[0091] Query: Based on specific query conditions, load one or more domain objects that meet the query conditions from the database into memory. In this article, persistence is defined in a broad sense, including the processes of saving, updating, deleting, loading, and querying data.
[0092] In some embodiments, step 4, in which the coordinating node, upon receiving a data write request, configures the persistent memory interface to write data to the persistent memory device of the compute node and writes the metadata file location data to the MPP library file manager, includes:
[0093] Step 41: When the coordinating node receives a data write request, it determines whether each node has installed a persistent memory device;
[0094] If not, proceed to step 42; if yes, proceed to step 43.
[0095] Step 42: Use the hard disk interface to write data to the hard disk;
[0096] Step 43: Call the persistent memory interface to write data to the persistent memory device of the compute node, record metadata to the metadata file, and write the metadata file location data to the MPP library file manager.
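Steps 41 to 43 amount to a routing decision on the write path, which can be sketched as below. The dictionary layout of a node, the helper `make_node` and the metadata path are hypothetical, and the sketch checks only the target node for a persistent memory device, which is one reading of step 41.

```python
def make_node(name: str, pmem_installed: bool) -> dict:
    """Hypothetical in-memory stand-in for one compute node."""
    return {"name": name, "pmem_installed": pmem_installed, "pmem": {}, "disk": {},
            "metadata_file": {}, "metadata_path": f"/{name}/metadata"}


def handle_write(key: str, data: bytes, node: dict, file_manager: dict) -> str:
    """Route one write request; returns the medium used ("pmem" or "disk")."""
    if not node["pmem_installed"]:                 # step 41: no persistent memory
        node["disk"][key] = data                   # step 42: hard disk interface
        return "disk"
    node["pmem"][key] = data                       # step 43: persistent memory interface
    node["metadata_file"][key] = {"medium": "pmem", "size": len(data)}
    # Record where the metadata file lives so the coordinating node can find it later.
    file_manager[node["name"]] = node["metadata_path"]
    return "pmem"


file_manager: dict = {}
seg1 = make_node("seg1", pmem_installed=True)
seg2 = make_node("seg2", pmem_installed=False)
r1 = handle_write("k", b"abc", seg1, file_manager)
r2 = handle_write("k", b"abc", seg2, file_manager)
```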
[0097] In some embodiments, to ensure the availability of persistent memory devices, the method further includes:
[0098] Step 101: Monitor the remaining storage space of persistent memory devices in real time;
[0099] Step 102: When the remaining storage space of the persistent memory device is less than the set first threshold, start the persistent memory device data transfer algorithm to transfer the data in the persistent memory device to the hard disk.
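Steps 101 and 102 can be sketched as a periodic check. The sketch assumes that the persistent memory device is exposed to the operating system as a mounted filesystem, so that free space can be read with `shutil.disk_usage`; the function names and the `start_transfer` callback are hypothetical.

```python
import shutil


def free_bytes(mount_point: str) -> int:
    """Remaining space on the filesystem assumed to back the persistent memory device."""
    return shutil.disk_usage(mount_point).free


def needs_transfer(free: int, first_threshold: int) -> bool:
    """Step 102: a transfer is triggered when free space drops below the threshold."""
    return free < first_threshold


def check_once(mount_point: str, first_threshold: int, start_transfer) -> bool:
    """Step 101, one monitoring pass; returns True if a transfer was started."""
    free = free_bytes(mount_point)
    if needs_transfer(free, first_threshold):
        start_transfer(mount_point, free)
        return True
    return False
```

In a running system `check_once` would be called on a timer or from the write path; the polling interval is a deployment choice that the text does not fix.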
[0100] Accordingly, step 43, which involves calling the persistent memory interface to write data to the persistent memory device of the compute node and recording metadata to the metadata file, includes:
[0101] Step 431: The coordinating node calls the persistent memory interface to forward the write request to the compute node where the persistent memory interface is located;
[0102] Step 432: The persistent memory interface requests storage space from the persistent memory manager of the compute node.
[0103] If the request succeeds, proceed to step 433; if the storage space is insufficient, proceed to step 434.
[0104] Step 433: Write the data to the persistent memory device and record the metadata to the metadata file.
[0105] Step 434: Activate the persistent memory device data transfer algorithm to transfer the data in the persistent memory device to the hard disk, expanding the remaining storage space of the persistent memory device; proceed to step 435;
[0106] Step 435: The persistent memory manager allocates storage space in response to the persistent memory interface request, and the following is then executed: write the data to the persistent memory device and record the metadata to the metadata file.
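The allocate, transfer, and retry flow of steps 432 to 435 can be sketched as follows. `PmemManager` and `write_with_allocation` are hypothetical names, capacity is tracked as a plain byte counter, and the actual write is reduced to a returned status string.

```python
class PmemManager:
    """Hypothetical persistent memory manager tracking capacity in bytes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.capacity - self.used

    def allocate(self, size: int) -> bool:
        """Reserve `size` bytes; returns False when space is insufficient."""
        if size > self.remaining:
            return False
        self.used += size
        return True

    def release(self, size: int) -> None:
        """Called after data has been moved to the hard disk."""
        self.used -= size


def write_with_allocation(manager: PmemManager, size: int, transfer_to_disk) -> str:
    if manager.allocate(size):               # step 432 succeeded
        return "written"                     # step 433
    transfer_to_disk(manager, size)          # step 434: free space first
    if manager.allocate(size):               # step 435: allocate, then write
        return "written_after_transfer"
    raise MemoryError("persistent memory still full after transfer")


manager = PmemManager(capacity=100)
first = write_with_allocation(manager, 60, lambda m, s: None)
second = write_with_allocation(manager, 60, lambda m, s: m.release(30))
```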
[0107] In some embodiments, step 434, which involves initiating a persistent memory device data transfer algorithm to transfer data from the persistent memory device to the hard disk, includes:
[0108] Step 4341: Determine the size of the data to be transferred based on the requested storage space and the remaining storage space; where, the size of the data to be transferred = requested storage space - remaining storage space + first threshold;
[0109] Step 4342: Sort the data in the persistent memory device according to its popularity;
[0110] Step 4343: Select data from low to high popularity to generate data blocks. The size of the data block is equal to the determined size of the data to be transferred.
[0111] Step 4344: Transfer the generated data blocks to the hard drive and delete the transferred data from the persistent memory device.
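Steps 4341 to 4344 can be sketched as below. Two assumptions are made for illustration: "popularity" is represented by a per-item access count, and whole items are moved until at least the computed size has been freed, since whole items rarely add up to the exact size. The function and parameter names are hypothetical.

```python
def transfer_size(requested: int, remaining: int, first_threshold: int) -> int:
    """Step 4341: bytes to move = requested - remaining + first threshold."""
    return requested - remaining + first_threshold


def transfer_to_disk(pmem: dict, hits: dict, disk: dict, target: int) -> int:
    """Steps 4342-4344: move the coldest items until at least `target` bytes are freed."""
    moved = 0
    # sorted() builds a new list, so removing items from pmem inside the loop is safe.
    for key in sorted(pmem, key=lambda k: hits.get(k, 0)):   # least accessed first
        if moved >= target:
            break
        disk[key] = pmem.pop(key)     # write to disk, then delete from persistent memory
        moved += len(disk[key])
    return moved


pmem = {"a": b"x" * 40, "b": b"x" * 30, "c": b"x" * 50}
hits = {"a": 9, "b": 1, "c": 3}
disk: dict = {}
target = transfer_size(requested=60, remaining=20, first_threshold=10)
moved = transfer_to_disk(pmem, hits, disk, target)
```

Adding the first threshold to the shortfall means that, once the new data has been written, the device is left with at least the threshold amount of free space, so the monitoring step is not triggered again immediately.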
[0112] In some embodiments, the coordinating node includes a primary coordinating node and a backup coordinating node;
[0113] In step 6, when coordinating nodes synchronize metadata files, the steps of enabling the RDMA protocol and calling the RDMA interface set by the metadata management module to transfer metadata files include:
[0114] Step 61: When the primary coordinating node synchronizes the metadata file to the backup coordinating node, determine whether the coordinating node is configured with a network card that supports the RDMA protocol;
[0115] If yes, proceed to step 62; otherwise, proceed to step 63.
[0116] Step 62: Enable the RDMA protocol, call the metadata management module to set the RDMA interface, and transfer the metadata file of the primary coordinating node to the backup coordinating node;
[0117] Step 63: Transmit the primary coordinating node metadata file to the backup coordinating node via TCP / IP protocol.
[0118] It should be noted that step 61, which involves determining whether the coordinating node is configured with a network interface card that supports the RDMA protocol, includes:
[0119] Step 611: Read the system network card configuration parameters;
[0120] Step 612: Read the persistent memory configuration parameters of the coordinating node;
[0121] Step 613: Determine whether the coordinating node is configured with a network card that supports the RDMA protocol based on the read network card configuration parameters and the coordinating node persistent memory configuration parameters.
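Steps 61 to 63, together with the check in steps 611 to 613, can be sketched as a capability test followed by a transport choice. The parameter keys (`rdma_capable`, `link_state`, `enabled`) are hypothetical, as is the rule that both parameter sets must be favourable; the text says only that the decision is based on both. The two send functions are passed in as callables, so no real RDMA or socket API is used.

```python
def rdma_ready(nic_params: dict, pmem_params: dict) -> bool:
    """Steps 611-613: decide from the two parameter sets whether RDMA can be used."""
    nic_ok = bool(nic_params.get("rdma_capable")) and nic_params.get("link_state") == "up"
    pmem_ok = bool(pmem_params.get("enabled"))
    return nic_ok and pmem_ok


def sync_metadata(payload: bytes, nic_params: dict, pmem_params: dict,
                  send_rdma, send_tcp) -> str:
    """Steps 61-63: send the metadata file over RDMA if available, else over TCP/IP."""
    if rdma_ready(nic_params, pmem_params):
        send_rdma(payload)            # step 62
        return "rdma"
    send_tcp(payload)                 # step 63
    return "tcp"


sent = []
used_fast = sync_metadata(b"meta", {"rdma_capable": True, "link_state": "up"},
                          {"enabled": True},
                          lambda p: sent.append(("rdma", p)),
                          lambda p: sent.append(("tcp", p)))
used_slow = sync_metadata(b"meta", {"rdma_capable": False, "link_state": "up"},
                          {"enabled": True},
                          lambda p: sent.append(("rdma", p)),
                          lambda p: sent.append(("tcp", p)))
```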
[0122] In some embodiments, step 5, in which the coordinating node, upon receiving a read request, configures the read interface to read the relevant data, includes:
[0123] Step 51a: When the coordinating node receives a request to read metadata, it configures the metadata file reading interface to read the MPP library file manager's metadata file directory;
[0124] Step 51b: When the coordinating node receives a read data request, it determines the data storage location;
[0125] When the data is in persistent memory, execute step 51b1; when the data is on the hard drive, execute step 51b2.
[0126] Step 51b1: Configure the persistent memory read interface to read data from the persistent memory device;
[0127] Step 51b2: Configure the hard drive read interface to read data from the hard drive.
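The read path of steps 51a to 51b2 can be sketched as a dispatch on the request type and on the storage location. The request format, the `location` map that records where each item is stored, and the function name are hypothetical.

```python
def handle_read(request: dict, file_manager: dict, location: dict,
                pmem: dict, disk: dict):
    """Dispatch one read request received by the coordinating node."""
    if request["kind"] == "metadata":              # step 51a
        return sorted(file_manager.values())       # metadata file directory
    key = request["key"]                           # step 51b: find where the data lives
    if location[key] == "pmem":
        return pmem[key]                           # step 51b1: persistent memory read
    return disk[key]                               # step 51b2: hard drive read


file_manager = {"seg1": "/seg1/metadata", "seg2": "/seg2/metadata"}
location = {"hot": "pmem", "cold": "disk"}
pmem = {"hot": b"recent"}
disk = {"cold": b"archived"}

directory = handle_read({"kind": "metadata"}, file_manager, location, pmem, disk)
hot_value = handle_read({"kind": "data", "key": "hot"}, file_manager, location, pmem, disk)
cold_value = handle_read({"kind": "data", "key": "cold"}, file_manager, location, pmem, disk)
```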
[0128] As shown in Figure 2, this embodiment of the invention provides a system for achieving high availability of a massively parallel database, comprising several nodes, the nodes including a coordinating node and compute nodes that communicate with the coordinating node;
[0129] Each node has a persistent memory device installed, and the coordinating node has a network card that supports the RDMA protocol.
[0130] The coordinating node is equipped with an RDMA interface. When coordinating nodes synchronize metadata files, the RDMA protocol is enabled and the RDMA interface is called to transfer the metadata files.
[0131] The persistent memory device is provided with a persistent memory interface. When the coordinating node receives a data write request, the persistent memory interface is configured to write the data to the persistent memory device of the compute node. The CPU within each node is connected to the persistent memory device and the hard drive.
[0132] The coordinating node is used to call the persistent memory interface to forward write requests to the compute node where the persistent memory interface is located; it requests storage space from the persistent memory manager of the compute node through the persistent memory interface; if the request is successful, the data is written to the persistent memory device and the metadata is recorded to the metadata file; if the storage space is insufficient, the persistent memory device data transfer algorithm is started to transfer the data in the persistent memory device to the hard disk, expanding the remaining storage space of the persistent memory device; the persistent memory manager allocates storage space for the persistent memory interface request.
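The data transfer algorithm invoked when storage space is insufficient can be sketched as follows, using the formula stated in claim 1 (size of the data to be transferred = requested storage space - remaining storage space + first threshold). This is a minimal sketch, not the patented implementation: the `Item` record, the integer popularity score, and the choice to move whole items until the target size is reached (which may slightly exceed it) are assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    name: str
    size: int        # size in arbitrary units
    popularity: int  # hypothetical access-frequency score; lower means colder


def transfer_size(requested: int, remaining: int, first_threshold: int) -> int:
    # Formula from claim 1.
    return requested - remaining + first_threshold


def select_cold_items(items: List[Item], target: int) -> List[Item]:
    # Walk the items from lowest to highest popularity and collect them
    # until at least `target` units have been selected for transfer.
    chosen: List[Item] = []
    total = 0
    for item in sorted(items, key=lambda i: i.popularity):
        if total >= target:
            break
        chosen.append(item)
        total += item.size
    return chosen
```

For example, a request for 100 units with 60 units remaining and a first threshold of 20 gives a transfer size of 60, so the coldest items are moved to the hard drive until at least 60 units have been freed.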
[0133] Although the present invention has been described in detail with reference to the accompanying drawings and preferred embodiments, the invention is not limited thereto. Various equivalent modifications or substitutions can be made to the embodiments of the invention by those skilled in the art without departing from the spirit and essence of the invention, and such modifications or substitutions should all be within the scope of the invention. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the invention should also be covered within the protection scope of the invention. Therefore, the protection scope of the invention should be determined by the scope of the claims.
Claims
1. A method for achieving high availability of a massively parallel database, characterized in that the method is based on a high-availability architecture for a massively parallel database, the high-availability architecture comprising several nodes, the nodes including a coordinating node and computing nodes that communicate with the coordinating node; the method includes the following steps:
installing persistent memory devices on all nodes and configuring persistent memory parameters in the operating system;
installing a network interface card (NIC) that supports the RDMA protocol on the coordinating node and configuring the NIC parameters in the operating system;
configuring the RDMA interface in the metadata management module of the coordinating node;
when the coordinating node receives a data write request, configuring the persistent memory interface to write the data to the persistent memory device of the compute node, and writing the metadata file location data to the MPP library file manager;
when the coordinating node receives a read request, configuring the read interface to read the relevant data;
when coordinating nodes synchronize metadata files, enabling the RDMA protocol and calling the RDMA interface set by the metadata management module to transfer the metadata files;
wherein the step of configuring the persistent memory interface to write the data to the persistent memory device of the compute node and writing the metadata file location data to the MPP library file manager when the coordinating node receives a data write request includes:
when the coordinating node receives a data write request, determining whether each node has a persistent memory device installed;
if not, using the hard drive interface to write the data to the hard drive;
if so, calling the persistent memory interface to write the data to the persistent memory device of the compute node, recording the metadata to the metadata file, and writing the metadata file location data to the MPP library file manager;
wherein the step of calling the persistent memory interface to write the data to the persistent memory device of the compute node and recording the metadata to the metadata file includes:
the coordinating node calling the persistent memory interface to forward the write request to the compute node where the persistent memory interface is located;
the persistent memory interface requesting storage space from the persistent memory manager of the compute node;
if the request is successful, writing the data to the persistent memory device and recording the metadata to the metadata file;
if the storage space is insufficient, starting the persistent memory device data transfer algorithm to transfer data in the persistent memory device to the hard disk, thereby expanding the remaining storage space of the persistent memory device;
the persistent memory manager allocating storage space in response to the persistent memory interface request, and then executing the step of writing the data to the persistent memory device and recording the metadata to the metadata file;
wherein the step of starting the persistent memory device data transfer algorithm to transfer data in the persistent memory device to the hard disk includes:
determining the size of the data to be transferred based on the requested storage space and the remaining storage space, where size of the data to be transferred = requested storage space - remaining storage space + first threshold;
sorting the data in the persistent memory device by popularity;
selecting data in order from low to high popularity to generate data blocks, the size of each data block being equal to the determined size of the data to be transferred;
transferring the generated data blocks to the hard drive and deleting the transferred data from the persistent memory device;
wherein the coordinating nodes include a primary coordinating node and a backup coordinating node;
wherein the step of enabling the RDMA protocol and calling the RDMA interface set by the metadata management module to transfer the metadata files when coordinating nodes synchronize metadata files includes:
when the primary coordinating node synchronizes metadata files to the backup coordinating node, determining whether the coordinating node is configured with a network interface card that supports the RDMA protocol;
if so, enabling the RDMA protocol, calling the RDMA interface set by the metadata management module, and transferring the metadata file of the primary coordinating node directly from the user-space cache to the cache area of the backup coordinating node, so as to improve the speed of data synchronization and transmission;
otherwise, transmitting the metadata file of the primary coordinating node to the backup coordinating node via the TCP/IP protocol;
the method further includes:
monitoring the remaining storage space of the persistent memory device in real time;
when the remaining storage space of the persistent memory device is less than the set first threshold, starting the persistent memory device data transfer algorithm to transfer data in the persistent memory device to the hard disk;
wherein the step of determining whether the coordinating node is configured with a network interface card that supports the RDMA protocol includes:
reading the system network card configuration parameters;
reading the persistent memory configuration parameters of the coordinating node;
determining whether the coordinating node is configured with a network card that supports the RDMA protocol based on the read network card configuration parameters and the coordinating node persistent memory configuration parameters;
wherein the step of configuring the read interface to read the relevant data when the coordinating node receives a read request includes:
when the coordinating node receives a request to read metadata, configuring the metadata file reading interface to read the metadata file directory of the MPP library file manager;
when the coordinating node receives a request to read data, determining the data storage location;
when the data is in persistent memory, configuring the persistent memory read interface to read the data from the persistent memory device;
when the data is on the hard drive, configuring the hard drive read interface to read the data from the hard drive.
2. A system for achieving high availability of a large-scale parallel database, characterized in that the system includes several nodes, the nodes including a coordinating node and computing nodes that communicate with the coordinating node;
each node has a persistent memory device installed, and the coordinating node has a network card that supports the RDMA protocol;
the coordinating node is provided with an RDMA interface; when coordinating nodes synchronize metadata files, the RDMA protocol is enabled and the RDMA interface is called to transfer the metadata files;
the persistent memory device is provided with a persistent memory interface; when the coordinating node receives a data write request, the persistent memory interface is configured to write the data to the persistent memory device of the compute node, and the metadata file location data is written to the MPP library file manager;
wherein configuring the persistent memory interface to write the data to the persistent memory device of the compute node and writing the metadata file location data to the MPP library file manager when the coordinating node receives a data write request includes:
when the coordinating node receives a data write request, determining whether each node has a persistent memory device installed;
if not, using the hard drive interface to write the data to the hard drive;
if so, calling the persistent memory interface to write the data to the persistent memory device of the compute node, recording the metadata to the metadata file, and writing the metadata file location data to the MPP library file manager;
wherein calling the persistent memory interface to write the data to the persistent memory device of the compute node and recording the metadata to the metadata file includes:
the coordinating node calling the persistent memory interface to forward the write request to the compute node where the persistent memory interface is located;
the persistent memory interface requesting storage space from the persistent memory manager of the compute node;
if the request is successful, writing the data to the persistent memory device and recording the metadata to the metadata file;
if the storage space is insufficient, starting the persistent memory device data transfer algorithm to transfer data in the persistent memory device to the hard disk, thereby expanding the remaining storage space of the persistent memory device;
the persistent memory manager allocating storage space in response to the persistent memory interface request, and then executing the step of writing the data to the persistent memory device and recording the metadata to the metadata file;
wherein starting the persistent memory device data transfer algorithm to transfer data in the persistent memory device to the hard disk includes:
determining the size of the data to be transferred based on the requested storage space and the remaining storage space, where size of the data to be transferred = requested storage space - remaining storage space + first threshold;
sorting the data in the persistent memory device by popularity;
selecting data in order from low to high popularity to generate data blocks, the size of each data block being equal to the determined size of the data to be transferred;
transferring the generated data blocks to the hard drive and deleting the transferred data from the persistent memory device;
wherein the coordinating nodes include a primary coordinating node and a backup coordinating node;
wherein enabling the RDMA protocol and calling the RDMA interface to transfer the metadata files when coordinating nodes synchronize metadata files includes:
when the primary coordinating node synchronizes metadata files to the backup coordinating node, determining whether the coordinating node is configured with a network interface card that supports the RDMA protocol;
if so, enabling the RDMA protocol, calling the RDMA interface, and transferring the metadata file of the primary coordinating node directly from the user-space cache to the cache area of the backup coordinating node, so as to improve the speed of data synchronization and transmission;
otherwise, transmitting the metadata file of the primary coordinating node to the backup coordinating node via the TCP/IP protocol;
the coordinating node is used to call the persistent memory interface to forward write requests to the compute node where the persistent memory interface is located; to request storage space from the persistent memory manager of the compute node through the persistent memory interface; if the request is successful, to write the data to the persistent memory device and record the metadata to the metadata file; and if the storage space is insufficient, to start the persistent memory device data transfer algorithm to transfer data in the persistent memory device to the hard disk, thereby expanding the remaining storage space of the persistent memory device, after which the persistent memory manager allocates storage space for the persistent memory interface request;
wherein determining whether the coordinating node is configured with a network interface card that supports the RDMA protocol includes:
reading the system network card configuration parameters;
reading the persistent memory configuration parameters of the coordinating node;
determining whether the coordinating node is configured with a network card that supports the RDMA protocol based on the read network card configuration parameters and the coordinating node persistent memory configuration parameters;
wherein configuring the read interface to read the relevant data when the coordinating node receives a read request includes:
when the coordinating node receives a request to read metadata, configuring the metadata file reading interface to read the metadata file directory of the MPP library file manager;
when the coordinating node receives a request to read data, determining the data storage location;
when the data is in persistent memory, configuring the persistent memory read interface to read the data from the persistent memory device;
when the data is on the hard drive, configuring the hard drive read interface to read the data from the hard drive.