Database cluster deployment system, method, electronic device, and storage medium

By optimizing the deployment of database clusters in a cloud-native environment through DRBD replication technology, the problems of resource silos and excessive network I/O consumption are solved, achieving high availability and rapid fault recovery, and reducing downtime of business systems.

CN117453652B · Active · Publication Date: 2026-06-23 · E SURFING VISION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
E SURFING VISION TECHNOLOGY CO LTD
Filing Date
2023-10-26
Publication Date
2026-06-23


Abstract

The application discloses a database cluster deployment system and method, electronic equipment and storage medium, and is used for solving the technical problems of poor database cluster deployment effect and easy occurrence of "resource island" in related technologies. The database cluster deployment system provided by the application is located in a K8s cluster and at least comprises a cluster topology controller and a storage management controller in communication connection with the cluster topology controller. The cluster topology controller is used for detecting a to-be-deployed database cluster created by a user, and a first cloud native resource and a second cloud native resource are associated with the to-be-deployed database cluster. The storage management controller is used for binding a local storage resource object and a write service to the first cloud native resource, and binding a shared storage resource object and a read service to the second cloud native resource, so that the data can be directly synchronized from a physical log layer, the write amplification problem is reduced, and the "resource island" phenomenon is further reduced.

Description

Technical Field

[0001] This invention relates to the field of database technology, and in particular to a database cluster deployment system, method, electronic device, and storage medium.

Background Technology

[0002] With the continuous development of internet technology, more and more enterprises are choosing to use containerization to migrate applications to the cloud. Kubernetes (K8s, a container orchestration system) technology possesses cross-environment consistency characteristics, enabling various applications to achieve automatic elastic scaling, rolling updates, and self-healing in different cloud-native environments. To meet the ever-increasing demands, applications experiencing sudden traffic surges, such as video, 5G, live streaming, and large-scale financial services, have become extremely demanding in terms of resource elasticity. Databases, as stateful applications closely linked to core business operations, naturally expect to possess the same elasticity, standardization, and automation capabilities as K8s. Consequently, this demand places higher requirements on high-availability database solutions in terms of data reliability and performance.

[0003] To maximize disk I / O (Input / Output) performance, database cluster architectures typically use local storage. However, this approach often presents the following problems:

[0004] First, when a node failure occurs or a new instance needs to be added for cluster expansion, data files and logical logs (BINLOG) are transferred over the network. These actions generate a large amount of network I / O, which is detrimental to the timeliness and stability of cluster expansion and fault recovery. In particular, the larger the data volume, the more difficult the disaster recovery becomes, which has a huge impact on the continuity of online services.

[0005] Secondly, different instances have different disk loads. Some instances have scarce storage resources, while others have idle storage resources. Storage resources cannot be shared, resulting in a "resource silo" problem.

Summary of the Invention

[0006] This invention provides a database cluster deployment system, method, electronic device, and storage medium to solve or partially solve the technical problems of poor database cluster deployment performance and the tendency to create "resource silos" in existing related technologies.

[0007] This invention provides a database cluster deployment system located on a Kubernetes cluster. The system includes a cluster topology controller and a storage management controller communicatively connected to the cluster topology controller.

[0008] The cluster topology controller is used to detect the database cluster to be deployed created by the user, and to associate and create a first cloud-native resource and a second cloud-native resource for the database cluster to be deployed.

[0009] The storage management controller is used to bind a local storage resource object and a write service to the first cloud-native resource, and to bind a shared storage resource object and a read service to the second cloud-native resource.

[0010] Optionally, the first cloud-native resource includes a primary instance container group and a backup instance container group, the primary instance container group corresponding to a first local storage root directory, and the backup instance container group corresponding to a second local storage root directory. The storage management controller is further configured to:

[0011] Based on the DRBD replication method, a primary-backup relationship is established between the primary instance container group and the backup instance container group;

[0012] When data synchronization between primary and backup instances is required, the DRBD replication method is used to synchronize the data between the first local storage root directory and the second local storage root directory in real time.

[0013] Optionally, the second cloud-native resource includes a plurality of slave instance container groups, each slave instance container group corresponding to a shared storage subdirectory, and the storage management controller is further configured to:

[0014] When data synchronization between backup and slave instances is required, the DRBD replication method is used to synchronize the data between the shared storage subdirectory and the second local storage root directory in real time.

[0015] Optionally, an NFS server is deployed on the Kubernetes cluster, and the storage management controller is further configured to:

[0016] Set the node control mode of the K8s cluster to a three-node mode;

[0017] Set the machine node where the main instance container group is located as the main node;

[0018] The machine node where the backup instance container group is located and the machine node where the NFS server is located are respectively set as secondary nodes, wherein the primary node and the two secondary nodes maintain three-node synchronization.

[0019] Optionally, the storage management controller is further configured to:

[0020] The system monitors the storage capacity and utilization of each database cluster in the K8s cluster in real time, identifies database clusters with storage capacity lower than a preset storage capacity threshold as target database clusters, and issues a storage space shortage warning signal to the target database clusters.

[0021] Optionally, the storage management controller is further configured to:

[0022] Real-time monitoring of the node running status of each machine node in the K8s cluster;

[0023] Machine nodes that experience operational failures are identified as faulty nodes, and resources are reallocated to these faulty nodes according to custom rules.

[0024] Optionally, the database cluster deployment system further includes an instance scheduling controller, which is communicatively connected to the cluster topology controller and the storage management controller; wherein, the instance scheduling controller is used for:

[0025] Obtain the running status data of each machine node in the K8s cluster;

[0026] Based on the operational status data, the database instance scheduling results of each machine node are adjusted to make real-time adjustments to the native scheduling mechanism of the K8s cluster.

[0027] Optionally, the instance scheduling controller is further configured to:

[0028] When the running status data indicates that there are machine nodes in the K8s cluster that meet the overload conditions, the machine nodes that meet the overload conditions are identified as overloaded nodes. When selecting machine node instance scheduling, the overloaded nodes are automatically skipped, and the instance elastic scaling resource object corresponding to the overloaded nodes is automatically generated.

[0029] The instance elastic scaling resource object is bound to the second cloud-native resource, and the instance elastic scaling resource object is used to dynamically add or delete instances of the second cloud-native resource.

[0030] Optionally, the instance scheduling controller is further configured to:

[0031] Configure a scheduling policy, which includes at least setting a disk capacity usage threshold and a disk IOPS upper limit.

[0032] Optionally, the cluster topology controller is further configured to:

[0033] Detect multiple machine nodes in the K8s cluster and identify the disk type of each machine node;

[0034] For each machine node, a local storage resource object corresponding to the machine node is automatically created according to the disk type.

[0035] Optionally, the cluster topology controller is further configured to:

[0036] The addition and deletion of instances in the K8s cluster and the role distribution of each instance are tracked in real time to maintain the topology of the K8s cluster.

[0037] This invention also provides a database cluster deployment method, which is applied to a database cluster deployment system located in a Kubernetes cluster. The database cluster deployment system includes a cluster topology controller and a storage management controller communicatively connected to the cluster topology controller. The method includes:

[0038] The cluster topology controller detects the database cluster to be deployed created by the user and associates and creates a first cloud-native resource and a second cloud-native resource with the database cluster to be deployed.

[0039] The storage management controller binds a local storage resource object and a write service to the first cloud-native resource, and binds a shared storage resource object and a read service to the second cloud-native resource.

[0040] Optionally, the method further includes:

[0041] Install the cluster topology controller, instance scheduling controller, and storage management controller into the Kubernetes cluster, and establish mutual communication between the cluster topology controller, the instance scheduling controller, and the storage management controller.

[0042] Optionally, the method further includes:

[0043] The instance is distributed to each machine node of the K8s cluster, and DRBD compatibility testing is performed on each machine node.

[0044] Machine nodes that do not have the DRBD program installed are identified as machine nodes to be installed, and deployment actions are performed on the machine nodes to be installed.

[0045] If an installation failure occurs during deployment, the reason for the failure will be recorded in the error output stream.
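The pre-deployment check described in [0043] to [0045] can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function name `ensure_drbd`, the `nodes` mapping, and the `install` callable are hypothetical, and the sketch assumes the installer signals failure by raising an exception.

```python
import sys
from typing import Callable, Dict, List, TextIO


def ensure_drbd(
    nodes: Dict[str, bool],
    install: Callable[[str], None],
    err: TextIO = sys.stderr,
) -> List[str]:
    """Install DRBD on nodes that lack it and return the nodes that are ready.

    `nodes` maps a machine node name to whether DRBD is already installed.
    `install` is a hypothetical deployment action that raises on failure.
    """
    ready = []
    for name, has_drbd in nodes.items():
        if has_drbd:
            ready.append(name)
            continue
        try:
            install(name)  # deployment action on a "node to be installed"
            ready.append(name)
        except Exception as exc:
            # Record the reason for the failure in the error output stream.
            print(f"DRBD install failed on {name}: {exc}", file=err)
    return ready
```

Passing the error stream as a parameter keeps the sketch easy to exercise without touching the real standard error stream.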

[0046] The present invention also provides an electronic device, the device comprising a processor and a memory:

[0047] The memory is used to store program code and transmit the program code to the processor;

[0048] The processor is used to execute the database cluster deployment method as described above, according to the instructions in the program code.

[0049] The present invention also provides a computer-readable storage medium for storing program code for executing the database cluster deployment method as described in any of the preceding claims.

[0050] As can be seen from the above technical solutions, the present invention has the following advantages:

[0051] To address the problems existing in current technologies, a database cluster deployment system and method are proposed. The database cluster deployment system is located in a Kubernetes cluster and includes at least a cluster topology controller and a storage management controller that communicates with the cluster topology controller. Specifically, the cluster topology controller detects the database cluster to be deployed created by the user and associates and creates a first cloud-native resource and a second cloud-native resource with the database cluster to be deployed. The storage management controller binds a local storage resource object and a write service to the first cloud-native resource, and binds a shared storage resource object and a read service to the second cloud-native resource. This enables direct data synchronization from the physical log layer, reduces write amplification problems, and further alleviates the "resource silo" phenomenon.

Attached Figure Description

[0052] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0053] Figure 1 is a schematic diagram of the structure of a database cluster deployment system provided in an embodiment of the present invention;

[0054] Figure 2 is a schematic diagram of the topology of a database cluster provided in an embodiment of the present invention;

[0055] Figure 3 is a flowchart illustrating the steps of a database cluster deployment method provided in an embodiment of the present invention.

Detailed Implementation

[0056] This invention provides a database cluster deployment system, method, electronic device, and storage medium to solve or partially solve the technical problems of poor database cluster deployment performance and the tendency to create "resource silos" in existing related technologies.

[0057] To make the objectives, features, and advantages of this invention more apparent and understandable, the technical solutions of the embodiments of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the embodiments described below are only some embodiments of this invention, and not all embodiments. Based on the embodiments of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this invention.

[0058] To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, some technical features involved in the embodiments of the present invention are explained and described below:

[0059] K8s (Kubernetes, a container orchestration system): An open-source container orchestration platform that primarily provides the ability to automatically deploy, scale, and manage containerized applications across host clusters.

[0060] Pod (container group): The basic orchestration unit in Kubernetes, which is a tightly linked collection of containers.

[0061] DRBD (Distributed Replicated Block Device): An open-source software for replicating block devices within a cluster. DRBD allows block devices to be transparently replicated to other nodes across the network, achieving high availability and redundancy for storage.

[0062] Operator: An extension of Kubernetes that encapsulates the operational knowledge for managing Kubernetes applications.

[0063] StatefulSet: A workload controller used by Kubernetes to manage stateful applications.

[0064] StorageClass: In Kubernetes, StorageClass is a resource object used to define persistent storage. It provides an abstraction layer for describing and managing persistent storage of different storage types.

[0065] CRD (Custom Resource Definition): A custom resource definition in Kubernetes that allows users to define their own objects and manage their lifecycle on Kubernetes.

[0066] NFS Storage Volume External Provisioner: Allows Kubernetes clusters to fully utilize the isolated directory capabilities of NFS storage servers, supports multi-tenancy and storage performance isolation, and is a good solution for connecting Kubernetes and external NFS storage, and is widely used in production environments.

[0067] Machine node: In a distributed system, each machine that makes up the cluster can be a node, and therefore can be called a machine node.

[0068] As an example, for database cluster deployments, in order to maximize disk I / O performance, local storage is typically used in current database cluster architectures. However, this approach often presents the following problems:

[0069] First, when a node failure occurs or a new instance needs to be added for cluster expansion, data files and logical logs (BINLOG) are transferred over the network. These actions generate a large amount of network I / O, which is detrimental to the timeliness and stability of cluster expansion and fault recovery. In particular, the larger the data volume, the more difficult the disaster recovery becomes, which has a huge impact on the continuity of online services.

[0070] Secondly, different instances have different disk loads. Some instances have scarce storage resources, while others have idle storage resources. Storage resources cannot be shared, resulting in a "resource silo" problem.

[0071] Therefore, one of the core inventive points of this invention is: addressing the problems existing in the prior art, it proposes a cloud-native database cluster deployment system and method based on DRBD replication. First, through the collaboration of the cluster topology controller, instance scheduling controller, and Kubernetes orchestration service, when an instance fails, if it is a primary instance, it can be automatically transferred to a backup instance; if it is a slave instance, the instance replica can be quickly rebuilt. This allows for corresponding remedial measures to be taken based on different instance failures. When a new slave instance is added, it can be automatically scheduled to a machine node with low disk load. This reduces downtime of business systems and balances the utilization of storage resources through automated and highly stable operation and maintenance. Second, by regulating database storage resources through the storage management controller, instances can be scheduled to available nodes where other data volumes of DRBD replication are located in a very short time. This avoids the problem that data migration may require a large amount of network I / O due to the size of user data. At the same time, establishing data synchronization relationships between primary, backup, and slave instances based on DRBD replication can reduce write amplification and further alleviate the "resource silo" phenomenon. By adopting the technical solution of the present invention, data can be directly synchronized from the physical log layer, avoiding the previous logical log replay operation. It can achieve automated, highly stable, and near-real-time fault self-healing that is not affected by the size of the data volume, which can greatly improve database availability and minimize the downtime of business systems.

[0072] Referring to Figure 1, a schematic diagram of the structure of a database cluster deployment system provided in an embodiment of the present invention is shown.

[0073] The database cluster deployment system resides on a Kubernetes cluster. As can be seen from Figure 1, the database cluster deployment system mainly includes a cluster topology controller 101, a storage management controller 102, and an instance scheduling controller 103, wherein:

[0074] The functions implemented by the cluster topology controller 101 can be summarized as topology discovery, resource association, and role management. Specifically, the cluster topology controller 101 is mainly used to discover the database cluster to be deployed, monitor changes in instances within the cluster, and maintain the cluster's topology status. Specifically, through the cluster topology controller 101, newly added and deleted instances in the K8s cluster can be tracked, and the distribution of roles (master, standby, slave) among instances can be monitored to manage different roles. Furthermore, the cluster topology controller 101 can automatically create local storage resource objects corresponding to machine nodes based on the disk type of the machine nodes in the cluster to achieve resource association. In this configuration, the master and standby instances in the database cluster are carried by the same workload (defined as the first cloud-native resource StatefulSet in this embodiment), while the slave instance is carried by a different workload (defined as the second cloud-native resource StatefulSet in this embodiment).

[0075] The main functions implemented by the storage management controller 102 can be summarized as space alerting, categorized mounting, and data recovery. Specifically, the storage management controller 102 is mainly used to manage the lifecycle of storage resources in the database cluster, including the following aspects:

[0076] (1) Storage space management: Real-time monitoring of the storage capacity and utilization of each database cluster in the K8s cluster. When the storage space is insufficient, an early warning is issued. Specifically, database clusters with storage capacity lower than the preset storage capacity threshold can be identified as target database clusters, and an early warning signal for insufficient storage space can be issued for the target database clusters.

[0077] (2) Storage Classification: Supports setting different storage classes for primary (backup) workloads and slave workloads respectively, so as to connect to different storage backends.

[0078] (3) Fault detection and recovery: Real-time monitoring of the node operation status of each machine node in the K8s cluster. When a problem occurs, the machine node that has a running failure can be identified as a fault node. Resources will be reallocated for the fault node and the resource allocation work will be started according to the custom rules. This process does not require manual intervention. The custom rules refer to the allocation rules that the operation and maintenance personnel set in advance according to the actual situation. For example, it can be set that when a fault node occurs, the storage management controller 102 can automatically add the fault node to the list of nodes that are temporarily not allocated resources and issue a fault warning signal to inform the operation and maintenance personnel to troubleshoot in time. At the same time, the storage management controller 102 can continue to allocate the unallocated resources to the remaining nodes that have not failed. Or, in order to avoid resource processing errors, the resources that have been allocated to the fault node can be reallocated to other nodes that are idle or have large storage space to continue processing, etc. It is understood that the present invention does not limit this.
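One of the custom rules mentioned in item (3) above, namely moving resources already allocated to a faulty node onto a node with large storage space, can be sketched as follows. The function name, the input dictionaries, and the choice of "the healthy node with the most free space" as the target are assumptions made for illustration; the description deliberately leaves the rules open.

```python
from typing import Dict, Set, Tuple


def handle_faulty_nodes(
    node_healthy: Dict[str, bool],
    assignments: Dict[str, str],
    free_space_gb: Dict[str, float],
) -> Tuple[Set[str], Dict[str, str]]:
    """Return (faulty nodes, updated resource -> node assignments)."""
    # Nodes reporting a running failure are identified as faulty nodes;
    # they also form the "temporarily not allocated" list.
    faulty = {n for n, ok in node_healthy.items() if not ok}
    healthy = [n for n, ok in node_healthy.items() if ok]
    updated = dict(assignments)
    if not healthy:
        return faulty, updated  # nothing to move resources to
    # Assumed rule: reallocate to the healthy node with the most free space.
    target = max(healthy, key=lambda n: free_space_gb.get(n, 0.0))
    for resource, node in assignments.items():
        if node in faulty:
            updated[resource] = target
    return faulty, updated
```

The returned set of faulty nodes can also drive the fault warning signal sent to operation and maintenance personnel.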

[0079] The main functions implemented by the instance scheduling controller 103 can be summarized as status monitoring, policy intervention, and replica scaling. Specifically, the instance scheduling controller 103 is mainly used to monitor the status of instances in the database cluster, obtain the instance load and resource usage, and then determine whether the machine node where an instance resides is overloaded based on a preset load threshold. When selecting machine nodes for instance scheduling, the instance scheduling controller 103 automatically skips overloaded nodes. The instance scheduling controller 103 also supports the configuration of scheduling policies, such as disk capacity usage thresholds and disk IOPS (Input / Output Operations Per Second) upper limits. Based on load metrics such as QPS (Queries Per Second) and TPS (Transactions Per Second), it can also automatically scale slave instances, that is, determine from the current load metrics whether to increase or decrease slave instance replicas to optimize system efficiency. For instance, when TPS is high, it indicates that the current data processing speed is fast, and the number of slave instance replicas can be appropriately reduced to further accelerate the data synchronization rate and improve efficiency. When TPS is low, it indicates that the current data processing speed is slow, and in this case, the number of instance replicas can be appropriately increased to improve the data synchronization rate. It is understood that this invention does not impose any limitations on this.
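The scheduling policy and overload check described in [0079] can be sketched as follows. The class and field names are hypothetical, and the sketch assumes a node counts as overloaded as soon as it reaches either the disk capacity usage threshold or the disk IOPS upper limit; the description does not fix the exact overload condition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SchedulingPolicy:
    disk_usage_threshold: float  # fraction of disk capacity in use, 0.0 to 1.0
    disk_iops_limit: int         # upper limit on disk IOPS


@dataclass(frozen=True)
class NodeStatus:
    disk_usage: float  # current fraction of disk capacity in use
    disk_iops: int     # current disk IOPS


def schedulable_nodes(
    status: Dict[str, NodeStatus], policy: SchedulingPolicy
) -> List[str]:
    """Return the nodes that are not overloaded under the policy, sorted by name."""
    return sorted(
        name
        for name, s in status.items()
        if s.disk_usage < policy.disk_usage_threshold
        and s.disk_iops < policy.disk_iops_limit
    )
```

Nodes missing from the returned list are the ones the controller would skip during instance scheduling.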

[0080] In this embodiment of the invention, a database cluster deployment system based on DRBD replication in a cloud-native environment is provided. Combined with the database cluster deployment method provided in this embodiment of the invention, data can be directly synchronized from the physical log layer, avoiding the previous logical log replay operation. It can achieve automation, high stability, and near real-time fault self-healing unaffected by the data volume, greatly improving database availability and minimizing the downtime of business systems.

[0081] Based on the foregoing introduction of the database cluster deployment system, and to enable those skilled in the art to better understand it, the following embodiments of the present invention describe the use of the database cluster deployment system with reference to Figure 2, which illustrates the topology of the database cluster, and further explain the main deployment process.

[0082] First, the database cluster deployment system can be initialized into the K8s cluster, and the Operator mode can be started to create related custom resources (CRDs) and their controllers, which may include cluster topology controllers, instance scheduling controllers, and storage management controllers.

[0083] Secondly, through the cluster topology controller, the user-created custom resource DatabaseCluster can be automatically discovered, and the custom resource DatabaseCluster can be used as the database cluster to be deployed. Two native resources StatefulSet are created for the database cluster to be deployed. To better distinguish them, in this embodiment of the invention, these two native resources StatefulSet are defined as the first cloud native resource and the second cloud native resource, respectively. The first cloud native resource is the master StatefulSet, and the second cloud native resource is the slave StatefulSet.
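The creation of the two StatefulSets for a discovered DatabaseCluster can be sketched as plain manifest dictionaries. This is an illustrative sketch under several assumptions: the StorageClass names `local-pv` and `shared-pv`, the labels, the mount path, the requested size, and the container name are all hypothetical, while the replica counts follow the embodiment (a primary and a backup instance in the master StatefulSet, one initial slave instance).

```python
from typing import Any, Dict, List


def build_statefulsets(cluster_name: str, image: str) -> List[Dict[str, Any]]:
    """Build the master and slave StatefulSet manifests for one database cluster."""

    def statefulset(role: str, replicas: int, storage_class: str) -> Dict[str, Any]:
        labels = {"app": cluster_name, "role": role}
        name = f"{cluster_name}-{role}"
        return {
            "apiVersion": "apps/v1",
            "kind": "StatefulSet",
            "metadata": {"name": name, "labels": labels},
            "spec": {
                "serviceName": name,
                "replicas": replicas,
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {
                        "containers": [{
                            "name": "db",
                            "image": image,
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/var/lib/db"}
                            ],
                        }]
                    },
                },
                "volumeClaimTemplates": [{
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "storageClassName": storage_class,  # assumed name
                        "resources": {"requests": {"storage": "10Gi"}},
                    },
                }],
            },
        }

    # First cloud-native resource: primary + backup instance on local storage.
    # Second cloud-native resource: slave instances on shared storage.
    return [statefulset("master", 2, "local-pv"), statefulset("slave", 1, "shared-pv")]
```

A real controller would submit these objects through the Kubernetes API; the sketch stops at building them so that it stays self-contained.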

[0084] Then, through the storage management controller, a local storage resource object (StorageClass) LocalPV (shown as LocalStoragePV in Figure 2) can be bound to the main StatefulSet. It is used, based on DRBD replication, to establish the primary-backup relationship between the two instances (Pods) in the main StatefulSet and to synchronize data in real time.

[0085] Next, through the storage management controller, a shared storage resource object (StorageClass) SharedPV (shown as SharedStoragePV in Figure 2) can be bound to the slave StatefulSet, whose initial instance replica count is one.

[0086] The slave instance mounts a shared storage subdirectory, while the primary and standby instances of the primary StatefulSet each mount a local storage root directory. To distinguish them, the local storage root directory mounted by the primary instance is designated as the first local storage root directory, and the local storage root directory mounted by the standby instance is designated as the second local storage root directory. Data synchronization between the shared storage subdirectory and the second local storage root directory is also based on DRBD replication.

[0087] Meanwhile, in this embodiment of the invention, two Services are associated with the database cluster to be deployed to achieve read-write separation. Specifically, one Service is bound as the main StatefulSet to provide write services to the outside world, and the other Service is bound as the slave StatefulSet to provide read services to the outside world.
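The two Services used for read-write separation can be sketched as manifest dictionaries. The `app` and `role` labels and the default port 3306 are assumptions introduced for illustration (the port is a guess based on the mention of BINLOG and is not stated in the description).

```python
from typing import Any, Dict


def build_services(cluster_name: str, port: int = 3306) -> Dict[str, Dict[str, Any]]:
    """Build one write Service and one read Service for a database cluster."""

    def service(purpose: str, role: str) -> Dict[str, Any]:
        return {
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": f"{cluster_name}-{purpose}"},
            "spec": {
                # Pods are selected by the (assumed) labels of each StatefulSet.
                "selector": {"app": cluster_name, "role": role},
                "ports": [{"port": port, "targetPort": port}],
            },
        }

    return {
        "write": service("write", "master"),  # bound to the master StatefulSet
        "read": service("read", "slave"),     # bound to the slave StatefulSet
    }
```

A label selector alone would match both the primary and the backup Pod, so restricting write traffic to the currently active instance would need an extra mechanism, such as an additional label or a readiness condition, that this sketch does not model.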

[0088] In the processing flow of the primary StatefulSet on the left of Figure 2, the data direction arrow from the write service to the backup instance Pod is represented by a dashed line. This can be understood as follows: the backup instance Pod is only needed when the primary instance Pod fails and a failover is performed. Therefore, the data flow from the write service to the instances on the left is as follows: when the primary instance is running normally, there is only write service → primary instance; when the primary instance fails and switches to the backup instance, there is only write service → backup instance.
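The routing rule behind the solid and dashed arrows can be written down as a small selection function. The function name and the representation of health as a set of Pod names are hypothetical; the sketch only captures the rule that exactly one of the two instances receives writes at any time.

```python
from typing import Optional, Set


def select_write_pod(primary: str, backup: str, healthy: Set[str]) -> Optional[str]:
    """Return the Pod that should receive write traffic, or None if neither can."""
    if primary in healthy:
        return primary  # normal operation: write service -> primary instance
    if backup in healthy:
        return backup   # after failover: write service -> backup instance
    return None
```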

[0089] In its implementation, the cluster topology controller can detect user-created database clusters to be deployed and associate and create first and second cloud-native resources for them. Simultaneously, the cluster topology controller can also detect multiple machine nodes in the Kubernetes cluster and identify the disk type of each machine node. For each machine node, it automatically creates the corresponding local storage resource object based on the disk type. During cluster operation, the cluster topology controller can also track the addition and deletion of Kubernetes cluster instances and the role distribution of each instance in real time to maintain the topology state of the Kubernetes cluster.
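The automatic creation of a local storage resource object per machine node can be sketched as follows. The naming scheme and the `disk-type` label are hypothetical; the `kubernetes.io/no-provisioner` provisioner and the `WaitForFirstConsumer` binding mode are the values Kubernetes documents for statically provisioned local volumes, and using them here is an assumption about how the embodiment models local storage.

```python
from typing import Any, Dict, List


def build_local_storage_classes(node_disks: Dict[str, str]) -> List[Dict[str, Any]]:
    """Build one StorageClass manifest per machine node from its disk type.

    `node_disks` maps a node name to a disk type such as "ssd" or "hdd".
    """
    classes = []
    for node, disk_type in sorted(node_disks.items()):
        disk = disk_type.lower()
        classes.append({
            "apiVersion": "storage.k8s.io/v1",
            "kind": "StorageClass",
            "metadata": {
                "name": f"local-{disk}-{node}",   # assumed naming scheme
                "labels": {"disk-type": disk},    # assumed label key
            },
            "provisioner": "kubernetes.io/no-provisioner",
            # Delay binding until a Pod is scheduled, so the volume and the
            # Pod end up on the same machine node.
            "volumeBindingMode": "WaitForFirstConsumer",
        })
    return classes
```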

[0090] The storage management controller can be used to bind a local storage resource object and a write service to the first cloud-native resource, and a shared storage resource object and a read service to the second cloud-native resource.

[0091] Furthermore, based on the foregoing, the first cloud-native resource can include a primary instance container group and a standby instance container group. The primary instance container group corresponds to the first local storage root directory, and the standby instance container group corresponds to the second local storage root directory. The second cloud-native resource includes several secondary instance container groups, each corresponding to a shared storage subdirectory. Therefore, the storage management controller can also be used to: establish a primary-standby relationship between the primary instance container group and the standby instance container group based on DRBD replication; when data synchronization between the primary and standby instances is required, DRBD replication is used to synchronize the data between the first and second local storage root directories in real time; and when data synchronization between the standby and secondary instances is required, DRBD replication is used to synchronize the data between the shared storage subdirectory and the second local storage root directory in real time.

[0092] Furthermore, to achieve storage performance isolation between Kubernetes and external NFS, an NFS server can be deployed on the Kubernetes cluster. To achieve data synchronization, the NFS server can be used to configure primary and secondary nodes according to the importance of the primary and secondary instances. The storage management controller can also be used for:

[0093] First, set the node control mode of the K8s cluster to three-node mode; then set the machine node where the primary instance container group is located as the primary node; at the same time, set the machine node where the backup instance container group is located and the machine node where the NFS server is located as secondary nodes, and keep the primary node and the two secondary nodes synchronized.

[0094] During data synchronization, the primary instance container group, acting as the primary node, and the NFS server, acting as a secondary node, continuously synchronize data over the network at the kernel level. Therefore, by configuring the node hierarchy through the storage management controller, a standby instance no longer needs to pull the full data set from the primary node when it starts, which greatly improves synchronization efficiency.

[0095] Furthermore, since the storage management controller can also perform global control over the cluster's storage and resource allocation, it can also be used for:

[0096] For storage management, the system can monitor the storage capacity and utilization of each database cluster in the Kubernetes cluster in real time, identify database clusters with storage capacity below the preset storage capacity threshold as target database clusters, and issue a storage space shortage warning signal for the target database clusters.

[0097] Regarding resource allocation, the system can monitor the running status of each machine node in the K8s cluster in real time. When a fault occurs, the machine node that is experiencing a running fault can be identified as a faulty node, and resources can be reallocated to the faulty node according to custom rules.
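The storage shortage check described in paragraph [0096] can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ClusterStorage` type, the function names, and the reading of "storage capacity below the threshold" as remaining free space in GiB are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterStorage:
    """Hypothetical per-cluster storage sample collected by the controller."""
    name: str
    capacity_gib: float  # total provisioned capacity
    used_gib: float      # space currently in use

    @property
    def free_gib(self) -> float:
        return self.capacity_gib - self.used_gib

    @property
    def utilization(self) -> float:
        # Guard against division by zero for empty or unprovisioned clusters.
        return self.used_gib / self.capacity_gib if self.capacity_gib else 0.0


def find_target_clusters(clusters: List[ClusterStorage],
                         min_free_gib: float) -> List[ClusterStorage]:
    """Return the clusters whose remaining capacity is below the threshold."""
    return [c for c in clusters if c.free_gib < min_free_gib]


def shortage_warnings(clusters: List[ClusterStorage],
                      min_free_gib: float) -> List[str]:
    """Build one warning message per target database cluster."""
    return [
        f"storage space shortage: {c.name} "
        f"({c.free_gib:.1f} GiB free, {c.utilization:.0%} used)"
        for c in find_target_clusters(clusters, min_free_gib)
    ]
```

In a real controller the samples would come from a monitoring source and the warnings would be emitted as events or alerts; both are outside the scope of this sketch.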

[0098] Then, the instance scheduling controller can pull and parse the running status data of each machine node in the K8s cluster (since the database is an I/O-heavy application, the IOPS metric is prioritized by default), and automatically intervene in the native K8s scheduling mechanism based on the running status data to adjust the scheduling results of the database instance (Pod), so that the topology of the instance is as dispersed as possible to avoid disk I/O resource contention.

[0099] Finally, when the instance scheduling controller detects that the database load exceeds the set threshold, it can automatically generate an instance elastic scaling resource object HPA (Horizontal Pod Autoscaler, a mechanism for automatically scaling the number of Kubernetes workload replicas). The instance elastic scaling resource object HPA is then bound to a StatefulSet, enabling the dynamic addition and deletion of instance replicas (Pods). Based on shared storage, new slave instances can provide read services without requiring a large amount of network data transfer to recover data.
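As an illustration of the HPA object described in paragraph [0099], the following sketch builds a HorizontalPodAutoscaler manifest that targets a StatefulSet. The `autoscaling/v2` field layout is the standard Kubernetes one; the function name, the object naming scheme, and the choice of CPU utilization as the load metric are assumptions, since the text does not say which metric represents "database load".

```python
from typing import Any, Dict


def build_hpa(statefulset_name: str, namespace: str,
              min_replicas: int, max_replicas: int,
              cpu_target_percent: int) -> Dict[str, Any]:
    """Return an autoscaling/v2 HPA manifest bound to a StatefulSet."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("require 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {
            "name": f"{statefulset_name}-hpa",  # hypothetical naming scheme
            "namespace": namespace,
        },
        "spec": {
            # Binding the HPA to the StatefulSet of slave instances.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "StatefulSet",
                "name": statefulset_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Assumption: CPU utilization stands in for the database load.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_target_percent,
                    },
                },
            }],
        },
    }
```

Because the slave instances read from shared storage, scaling this StatefulSet up only adds Pods; no per-replica data copy is implied by the manifest itself.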

[0100] In the specific implementation, the instance scheduling controller can be used to: first obtain the running status data of each machine node in the K8s cluster; and then adjust the database instance scheduling results of each machine node based on the running status data, so as to make real-time adjustments to the native scheduling mechanism of the K8s cluster.

[0101] Furthermore, the instance scheduling controller can also be used to: when the running status data indicates that there are machine nodes in the K8s cluster that meet the overload conditions (such as exceeding a certain preset load threshold), identify the machine nodes that meet the overload conditions as overload nodes, automatically skip the overload nodes when selecting machine node instances for scheduling, and automatically generate the instance elastic scaling resource object (HPA) corresponding to the overload node; then bind the instance elastic scaling resource object (HPA) to the second cloud-native resource, wherein the instance elastic scaling resource object (HPA) can be used to dynamically add or delete instances of the second cloud-native resource.

[0102] Additionally, the instance scheduling controller can also be used to configure scheduling policies, such as setting disk capacity usage thresholds, disk IOPS limits, and so on.
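The scheduling behavior in paragraphs [0098] to [0102] can be sketched as a filter followed by a ranking. This is only an illustrative reading: the `SchedulingPolicy` and `NodeStatus` types, their field names, and the tie-breaking order are assumptions; the text itself only states that IOPS is prioritized, that overloaded nodes are skipped, and that instances should be spread out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SchedulingPolicy:
    """Hypothetical policy with the two thresholds named in the text."""
    max_disk_usage: float  # fraction of disk capacity in use, 0.0 to 1.0
    max_iops: int          # upper limit on observed disk IOPS


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical running status sample for one machine node."""
    name: str
    disk_usage: float
    iops: int
    db_instances: int      # database Pods already placed on the node


def is_overloaded(node: NodeStatus, policy: SchedulingPolicy) -> bool:
    """A node is overloaded if it exceeds either policy threshold."""
    return node.disk_usage > policy.max_disk_usage or node.iops > policy.max_iops


def rank_candidates(nodes: List[NodeStatus],
                    policy: SchedulingPolicy) -> List[NodeStatus]:
    """Skip overloaded nodes, then prefer low IOPS and few database Pods."""
    eligible = [n for n in nodes if not is_overloaded(n, policy)]
    # IOPS first (prioritized by default), then instance count to spread Pods.
    return sorted(eligible, key=lambda n: (n.iops, n.db_instances, n.name))
```

A real controller would feed such a ranking into the Kubernetes scheduler, for example through node affinity or a scheduler extension; the mechanism is not specified in the text.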

[0103] In this embodiment of the invention, combined with the database cluster deployment system provided by this embodiment, a method for deploying a database cluster based on DRBD replication in a cloud-native environment is provided. First, through the collaboration of the cluster topology controller, instance scheduling controller, and Kubernetes orchestration service, when an instance fails, if it is a primary instance, it can be automatically transferred to a backup instance; if it is a slave instance, the instance replica can be quickly rebuilt. This allows for corresponding remedial measures to be taken based on different instance failures. When a new slave instance is added, it can be automatically scheduled to a machine node with low disk load. This reduces downtime of the business system and balances the utilization of storage resources through automated and highly stable operation and maintenance. Second, by regulating database storage resources through the storage management controller, instances can be scheduled to available nodes where other data volumes of DRBD replication are located in a very short time. This avoids the problem that data migration may require a large amount of network I/O due to the size of user data. At the same time, establishing data synchronization relationships between primary, backup, and slave instances based on DRBD replication reduces write amplification and further alleviates the "resource silo" phenomenon.

[0104] For ease of understanding, the following description uses a specific example to illustrate an embodiment of the present invention.

[0105] Step 1: Install the dependent software packages.

[0106] In this context, dependent packages are the packages and libraries that the deployment relies on to function properly. On each machine node of the Kubernetes cluster, LVM2 (Logical Volume Manager version 2) and the kernel-devel package (kernel headers and build files) matching the currently running kernel version are installed and configured. Independent logical network interfaces are also planned to enable DRBD-based data synchronization between machine nodes.

[0107] Step 2: Start the Operator program.

[0108] Install the cluster topology controller, instance scheduling controller, and storage management controller to the Kubernetes cluster. Then, distribute task instances to each machine node for DRBD compatibility testing. If the machine node does not have the DRBD program installed, perform the deployment action. If the installation fails, record the error reason to the error output stream to facilitate troubleshooting by operations and maintenance personnel.
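The per-node DRBD check in Step 2 can be sketched as follows. This is a simplified illustration: `probe` and `install` are hypothetical callables standing in for the task instances the Operator distributes, and checking for the `drbdadm` binary on the PATH is only one possible way to decide whether DRBD is installed.

```python
import shutil
import sys
from typing import Callable, Iterable, List, TextIO, Tuple


def drbd_installed_locally() -> bool:
    """Local probe: treat the node as ready if the drbdadm CLI is on PATH."""
    return shutil.which("drbdadm") is not None


def ensure_drbd(nodes: Iterable[str],
                probe: Callable[[str], bool],
                install: Callable[[str], None],
                err: TextIO = sys.stderr) -> Tuple[List[str], List[str]]:
    """Check each node, install DRBD where missing, and log failures.

    Returns (ready_nodes, failed_nodes). Failure reasons are written to
    the error output stream so that operators can troubleshoot them.
    """
    ready: List[str] = []
    failed: List[str] = []
    for node in nodes:
        if probe(node):
            ready.append(node)
            continue
        try:
            install(node)  # deployment action for nodes without DRBD
            ready.append(node)
        except Exception as exc:
            print(f"DRBD install failed on {node}: {exc}", file=err)
            failed.append(node)
    return ready, failed
```

Passing the probe and installer as parameters keeps the control flow testable without touching a real node.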

[0109] Step 3: Prepare storage resource objects.

[0110] The storage management controller can automatically discover the disk type of each machine node and automatically create a local storage resource object (StorageClass) named LocalPV.

[0111] The instance scheduling controller can obtain the disk operation status and remaining disk size of each machine node in the K8s cluster. After sorting and analyzing, it sends the machine node information (IP address 192.168.11.22) suitable for deploying the NFS server to the storage management controller.
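One way to read the node selection in paragraph [0111] is sketched below. The `DiskInfo` type and the rule "healthy disk with the most free space" are assumptions for illustration; the text only says that disk operation status and remaining disk size are sorted and analyzed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DiskInfo:
    """Hypothetical disk report for one machine node."""
    node_ip: str
    healthy: bool     # disk operation status
    free_gib: float   # remaining disk size


def pick_nfs_node(disks: List[DiskInfo], required_gib: float) -> Optional[str]:
    """Return the IP of the healthy node with the most free space, if any."""
    candidates = [d for d in disks if d.healthy and d.free_gib >= required_gib]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.free_gib).node_ip
```

With the sample address used in this example, a report list in which 192.168.11.22 has the largest healthy disk would yield that node as the NFS server location.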

[0112] Next, start the NFS server program using StatefulSet as the workload and LocalPV as the storage resource object on the appropriate machine node.

[0113] Install the external Provisioner for NFS storage volumes and create a shared storage resource object named SharedPV-sample, with the nfs.server parameter set to 192.168.11.22 and the nfs.path parameter set to /mysql/sample.

[0114] Step 4: Create a database cluster.

[0115] Users can create or update DatabaseCluster instances, which are custom resources, using the "kubectl apply -f" command. For example, part of a sample request is described below:

[0116]

[0117]

[0118] Wherein, name represents the name of the DatabaseCluster instance; namespace represents the namespace of the DatabaseCluster instance; topology represents the desired configuration of the cluster topology, including whether to enable standby instances and the number of replicas of slave instances; template represents the template of the database instance, including the image name, desired specification quota and storage space size, as well as disk type and file system type.
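To make the fields in paragraph [0118] concrete, the sketch below assembles a DatabaseCluster object as a Python dictionary and prints it as JSON, which "kubectl apply -f" accepts in addition to YAML. The original request listing is not reproduced here, so everything except the four top-level concepts (name, namespace, topology, template) is hypothetical: the API group `example.com/v1alpha1`, the nested key names, and all sample values are assumptions.

```python
import json
from typing import Any, Dict


def build_database_cluster(name: str, namespace: str,
                           standby_enabled: bool, slave_replicas: int,
                           image: str, cpu: str, memory: str,
                           storage_size: str, disk_type: str,
                           fs_type: str) -> Dict[str, Any]:
    """Return a hypothetical DatabaseCluster custom resource."""
    if slave_replicas < 0:
        raise ValueError("slave_replicas must be non-negative")
    return {
        "apiVersion": "example.com/v1alpha1",  # hypothetical API group
        "kind": "DatabaseCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Desired cluster topology: standby switch and slave replica count.
            "topology": {
                "standbyEnabled": standby_enabled,
                "slaveReplicas": slave_replicas,
            },
            # Template of a database instance.
            "template": {
                "image": image,
                "resources": {"cpu": cpu, "memory": memory},
                "storage": {
                    "size": storage_size,
                    "diskType": disk_type,
                    "fsType": fs_type,
                },
            },
        },
    }


manifest = build_database_cluster(
    name="sample", namespace="default",
    standby_enabled=True, slave_replicas=3,
    image="mysql:8.0", cpu="2", memory="4Gi",
    storage_size="100Gi", disk_type="ssd", fs_type="ext4",
)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it would only work in a cluster where a matching custom resource definition has been installed.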

[0119] Step 5: Wait for the automatic deployment to finish.

[0120] When the cluster topology controller detects a DatabaseCluster resource object, it generates a list of creation requests for the primary StatefulSet and the secondary StatefulSets.

[0121] Before submitting the request to the Kubernetes orchestration service, the storage management controller intercepts the request and populates the LocalPV and SharedPV's StorageClass into the VolumeClaimTemplates field of both the master and slave StatefulSets.
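The interception step in paragraph [0121] can be sketched as a pure function over StatefulSet manifests. `spec.volumeClaimTemplates[].spec.storageClassName` is the standard Kubernetes field; the function names are hypothetical, and the StorageClass names are taken from this example as written (a real cluster requires lowercase object names, so they would need adjusting).

```python
import copy
from typing import Any, Dict, List, Tuple

Manifest = Dict[str, Any]


def inject_storage_class(statefulset: Manifest, storage_class: str) -> Manifest:
    """Return a copy of the StatefulSet with storageClassName filled in."""
    patched = copy.deepcopy(statefulset)  # leave the original request intact
    claims = patched.setdefault("spec", {}).setdefault("volumeClaimTemplates", [])
    for claim in claims:
        claim.setdefault("spec", {})["storageClassName"] = storage_class
    return patched


def bind_storage(primary: Manifest, slaves: List[Manifest],
                 local_sc: str = "LocalPV",
                 shared_sc: str = "SharedPV-sample") -> Tuple[Manifest, List[Manifest]]:
    """Primary StatefulSet gets local storage; slave StatefulSets get shared."""
    return (
        inject_storage_class(primary, local_sc),
        [inject_storage_class(s, shared_sc) for s in slaves],
    )
```

In an actual deployment this kind of rewrite would typically run in a mutating admission webhook or inside the controller before the create request is sent; the text does not specify which.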

[0122] Based on the example in step three above, if three slave instances are set up at this point, a total of four Pods will need to be created and scheduled (the backup instances will only be created during failover). Therefore, in the end, the primary instance (Pod) and three slave instances (Pods) will be in the RUNNING state.

[0123] Step 6: Rebuild the storage replication relationship.

[0124] Using the storage management controller, you can first set the node control mode of the Kubernetes cluster to three-node mode using "drbdadm adjust". Then, use "drbdadm primary --force" to set the node containing the primary instance of the primary StatefulSet as the primary node. Next, use "drbdadm secondary" to set the nodes containing the backup instance of the primary StatefulSet and the NFS server as secondary nodes. Finally, check whether the three nodes are currently synchronized. If they are synchronized, the deployment of the database cluster based on DRBD replication is complete. If they are not synchronized, you can repeat the above steps until the three nodes are synchronized.
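The command sequence in Step 6 can be sketched as a plan builder that lists, per node, the drbdadm invocations named in the text. It only constructs argument lists and does not execute anything. The resource name (for example `r0`) and node names are hypothetical, and how the commands reach each node is left open.

```python
from typing import Dict, List, Sequence

Command = List[str]


def drbd_role_plan(resource: str, primary_node: str,
                   secondary_nodes: Sequence[str]) -> Dict[str, List[Command]]:
    """Map each node to the drbdadm commands it should run, in order."""
    if len(secondary_nodes) != 2:
        raise ValueError("three-node mode needs exactly two secondary nodes")
    if primary_node in secondary_nodes or len(set(secondary_nodes)) != 2:
        raise ValueError("the three nodes must be distinct")

    plan: Dict[str, List[Command]] = {}
    # Every node first applies the (three-node) resource configuration.
    for node in [primary_node, *secondary_nodes]:
        plan[node] = [["drbdadm", "adjust", resource]]
    # The node hosting the primary instance is promoted.
    plan[primary_node].append(["drbdadm", "primary", "--force", resource])
    # The backup-instance node and the NFS server node stay secondary.
    for node in secondary_nodes:
        plan[node].append(["drbdadm", "secondary", resource])
    return plan
```

After the plan has been carried out, the synchronization check described above decides whether the steps need to be repeated.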

[0125] It should be noted that in this embodiment of the invention, since the data storage used by the slave instances is shared, each slave instance is a copy of the others. Horizontal scaling can be achieved by rebuilding a copy with the exact same resource specifications and parameter configuration as the slave instance. During data processing, if a host-level failure occurs, the slave instance will automatically migrate to another host. However, when there are insufficient host resources, only the original slave instance can provide services to the outside world.

[0126] Referring to Figure 3, a flowchart of a database cluster deployment method according to an embodiment of the present invention is shown. The method is applied to a database cluster deployment system located in a Kubernetes cluster. The database cluster deployment system includes a cluster topology controller and a storage management controller communicatively connected to the cluster topology controller. Specifically, the method may include the following steps:

[0127] Step 301: Detect the database cluster to be deployed created by the user through the cluster topology controller, and create a first cloud-native resource and a second cloud-native resource associated with the database cluster to be deployed;

[0128] Step 302: Bind a local storage resource object and a write service to the first cloud-native resource through the storage management controller, and bind a shared storage resource object and a read service to the second cloud-native resource.

[0129] In an optional embodiment, the method further includes:

[0130] Install the cluster topology controller, instance scheduling controller, and storage management controller into the Kubernetes cluster, and establish mutual communication between the cluster topology controller, the instance scheduling controller, and the storage management controller.

[0131] In an optional embodiment, the method further includes:

[0132] Task instances are distributed to each machine node of the K8s cluster, and DRBD compatibility testing is performed on each machine node.

[0133] Machine nodes that do not have the DRBD program installed are identified as machine nodes to be installed, and deployment actions are performed on the machine nodes to be installed.

[0134] If an installation failure occurs during deployment, the reason for the failure will be recorded in the error output stream.

[0135] In one optional embodiment, the first cloud-native resource includes a primary instance container group and a backup instance container group, the primary instance container group corresponding to a first local storage root directory, and the backup instance container group corresponding to a second local storage root directory; the method further includes:

[0136] Based on the DRBD replication method, the primary and backup relationships between the primary instance container group and the backup instance container group are established through the storage management controller.

[0137] When data synchronization between primary and backup instances is required, the DRBD replication method is used, and the data between the first local storage root directory and the second local storage root directory is synchronized in real time through the storage management controller.

[0138] In one optional embodiment, the second cloud-native resource includes a plurality of slave instance container groups, each slave instance container group corresponding to a shared storage subdirectory; the method further includes:

[0139] When data synchronization between backup and slave instances is required, the DRBD replication method is used, and the data between the shared storage subdirectory and the second local storage root directory is synchronized in real time through the storage management controller.

[0140] In one optional embodiment, an NFS server is deployed on the Kubernetes cluster, and the method further includes:

[0141] The storage management controller is used to set the node control mode of the K8s cluster to a three-node mode.

[0142] The machine node where the primary instance container group is located is set as the primary node;

[0143] The machine node where the backup instance container group is located and the machine node where the NFS server is located are respectively set as secondary nodes, wherein the primary node and the two secondary nodes maintain three-node synchronization.

[0144] In an optional embodiment, the method further includes:

[0145] The storage management controller monitors the storage capacity and utilization of each database cluster in the K8s cluster in real time, identifies database clusters with storage capacity lower than a preset storage capacity threshold as target database clusters, and issues a storage space shortage warning signal to the target database clusters.

[0146] In an optional embodiment, the method further includes:

[0147] The storage management controller monitors the running status of each machine node in the K8s cluster in real time.

[0148] Machine nodes that experience operational failures are identified as faulty nodes, and resources are reallocated to these faulty nodes according to custom rules.

[0149] In an optional embodiment, the database cluster deployment system further includes an instance scheduling controller, which is communicatively connected to the cluster topology controller and the storage management controller; wherein, the method further includes:

[0150] The instance scheduling controller obtains the running status data of each machine node in the K8s cluster.

[0151] Based on the operational status data, the database instance scheduling results of each machine node are adjusted to make real-time adjustments to the native scheduling mechanism of the K8s cluster.

[0152] In an optional embodiment, the method further includes:

[0153] When the running status data indicates that there are machine nodes in the K8s cluster that meet the overload conditions, the machine nodes that meet the overload conditions are identified as overloaded nodes.

[0154] When selecting machine node instance scheduling, the overloaded node is automatically skipped, and the instance scheduling controller automatically generates the instance elastic scaling resource object corresponding to the overloaded node.

[0155] The instance scheduling controller binds the instance elastic scaling resource object to the second cloud-native resource, and the instance elastic scaling resource object is used to dynamically add or delete instances of the second cloud-native resource.

[0156] In an optional embodiment, the method further includes:

[0157] The instance scheduling controller configures a scheduling policy, which includes at least setting a disk capacity usage threshold and a disk IOPS upper limit.

[0158] In an optional embodiment, the method further includes:

[0159] The cluster topology controller detects multiple machine nodes in the K8s cluster and identifies the disk type of each machine node.

[0160] For each machine node, a local storage resource object corresponding to the machine node is automatically created by the cluster topology controller according to the disk type.

[0161] In an optional embodiment, the method further includes:

[0162] The cluster topology controller tracks the addition and deletion of instances in the K8s cluster and the role distribution of each instance in real time to maintain the topology status of the K8s cluster.

[0163] As the method embodiments are basically similar to the device embodiments, they are described in a relatively simple manner. For relevant details, please refer to the description of the aforementioned device embodiments.

[0164] In this embodiment of the invention, in conjunction with the database cluster deployment system provided in this embodiment, a database cluster deployment method based on DRBD replication in a cloud-native environment is provided. By adopting the method, data can be directly synchronized from the physical log layer, avoiding the previous logical log replay operation. It can achieve automation, high stability, and near real-time fault self-healing unaffected by the data volume, greatly improving database availability and minimizing the downtime of business systems.

[0165] This invention also provides an electronic device, which includes a processor and a memory:

[0166] The memory is used to store program code and transfer the program code to the processor;

[0167] The processor is used to execute the database cluster deployment method of any embodiment of the present invention according to the instructions in the program code.

[0168] This invention also provides a computer-readable storage medium for storing program code for executing the database cluster deployment method of any embodiment of this invention.

[0169] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0170] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between apparatuses or units through some interfaces, and may be electrical, mechanical, or other forms.

[0171] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0172] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0173] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0174] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A database cluster deployment system, characterized in that, The database cluster deployment system is located on a Kubernetes cluster, and the database cluster deployment system includes a cluster topology controller and a storage management controller communicatively connected to the cluster topology controller; wherein: The cluster topology controller is used to detect the database cluster to be deployed created by the user, and to associate and create a first cloud-native resource and a second cloud-native resource for the database cluster to be deployed. The storage management controller is used to bind a local storage resource object and a write service to the first cloud-native resource, and to bind a shared storage resource object and a read service to the second cloud-native resource; The first cloud-native resource includes a backup instance container group, the backup instance container group corresponding to a second local storage root directory; the second cloud-native resource includes several slave instance container groups, the several slave instance container groups corresponding to a shared storage subdirectory; the storage management controller is further configured to: When data synchronization between backup and slave instances is required, the DRBD replication method is used to synchronize the data between the shared storage subdirectory and the second local storage root directory in real time.

2. The database cluster deployment system according to claim 1, characterized in that, The first cloud-native resource also includes a primary instance container group, which corresponds to the first local storage root directory; the storage management controller is further configured to: Based on the DRBD replication method, a primary-backup relationship is established between the primary instance container group and the backup instance container group; When data synchronization between primary and backup instances is required, the DRBD replication method is used to synchronize the data between the first local storage root directory and the second local storage root directory in real time.

3. The database cluster deployment system according to claim 2, characterized in that, The Kubernetes cluster has an NFS server deployed on it, and the storage management controller is also used for: Set the node control mode of the K8s cluster to a three-node mode; Set the machine node where the primary instance container group is located as the primary node; The machine node where the backup instance container group is located and the machine node where the NFS server is located are respectively set as secondary nodes, wherein the primary node and the two secondary nodes maintain three-node synchronization.

4. The database cluster deployment system according to claim 1, characterized in that, The storage management controller is also used for: The system monitors the storage capacity and utilization of each database cluster in the K8s cluster in real time, identifies database clusters with storage capacity lower than a preset storage capacity threshold as target database clusters, and issues a storage space shortage warning signal to the target database clusters.

5. The database cluster deployment system according to claim 1, characterized in that, The storage management controller is also used for: Real-time monitoring of the node running status of each machine node in the K8s cluster; Machine nodes that experience operational failures are identified as faulty nodes, and resources are reallocated to these faulty nodes according to custom rules.

6. The database cluster deployment system according to claim 1, characterized in that, The database cluster deployment system further includes an instance scheduling controller, which communicates with the cluster topology controller and the storage management controller; wherein, the instance scheduling controller is used for: Obtain the running status data of each machine node in the K8s cluster; Based on the operational status data, the database instance scheduling results of each machine node are adjusted to make real-time adjustments to the native scheduling mechanism of the K8s cluster.

7. The database cluster deployment system according to claim 6, characterized in that, The instance scheduling controller is also used for: When the running status data indicates that there are machine nodes in the K8s cluster that meet the overload conditions, the machine nodes that meet the overload conditions are identified as overloaded nodes. When selecting machine node instance scheduling, the overloaded nodes are automatically skipped, and the instance elastic scaling resource object corresponding to the overloaded nodes is automatically generated. The instance elastic scaling resource object is bound to the second cloud-native resource, and the instance elastic scaling resource object is used to dynamically add or delete instances of the second cloud-native resource.

8. The database cluster deployment system according to claim 6, characterized in that, The instance scheduling controller is also used for: Configure a scheduling policy, which includes at least setting a disk capacity usage threshold and a disk IOPS upper limit.

9. The database cluster deployment system according to any one of claims 1 to 8, characterized in that, The cluster topology controller is also used for: Detect multiple machine nodes in the K8s cluster and identify the disk type of each machine node; For each machine node, a local storage resource object corresponding to the machine node is automatically created according to the disk type.

10. The database cluster deployment system according to claim 9, characterized in that, The cluster topology controller is also used for: The addition and deletion of instances in the K8s cluster and the role distribution of each instance are tracked in real time to maintain the topology of the K8s cluster.

11. A database cluster deployment method, characterized in that, The method is applied to a database cluster deployment system, which is located in a K8s cluster. The database cluster deployment system includes a cluster topology controller and a storage management controller communicatively connected to the cluster topology controller. The method includes: The cluster topology controller detects the database cluster to be deployed created by the user and associates and creates a first cloud-native resource and a second cloud-native resource with the database cluster to be deployed. The storage management controller binds a local storage resource object and a write service to the first cloud-native resource, and binds a shared storage resource object and a read service to the second cloud-native resource. Wherein, the first cloud-native resource includes a backup instance container group, the backup instance container group corresponding to a second local storage root directory; the second cloud-native resource includes a plurality of slave instance container groups, the plurality of slave instance container groups corresponding to a shared storage subdirectory; the method further includes: When data synchronization between backup and slave instances is required, the storage management controller uses DRBD replication to synchronize the data between the shared storage subdirectory and the second local storage root directory in real time.

12. The database cluster deployment method according to claim 11, characterized in that, Also includes: Install the cluster topology controller, instance scheduling controller, and storage management controller into the Kubernetes cluster, and establish mutual communication between the cluster topology controller, the instance scheduling controller, and the storage management controller.

13. The database cluster deployment method according to claim 11 or 12, characterized in that, Also includes: The instance is distributed to each machine node of the K8s cluster, and DRBD compatibility testing is performed on each machine node. Machine nodes that do not have the DRBD program installed are identified as machine nodes to be installed, and deployment actions are performed on the machine nodes to be installed. If an installation failure occurs during deployment, the reason for the failure will be recorded in the error output stream.

14. An electronic device, characterized in that, The device includes a processor and a memory: The memory is used to store program code and transmit the program code to the processor; The processor is used to execute the database cluster deployment method according to any one of claims 11-13 according to the instructions in the program code.

15. A computer-readable storage medium, characterized in that, The computer-readable storage medium is used to store program code for executing the database cluster deployment method according to any one of claims 11-13.