Partition balancing method and apparatus for distributed database

By constructing balanced group information and partition transfer, the problem of clustering adjacent partitions or non-partitioned tables in distributed databases is solved, thus improving system performance.

WO2026124696A1 · PCT designated stage · Publication Date: 2026-06-18 · BEIJING OCEANBASE TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: BEIJING OCEANBASE TECHNOLOGY CO LTD
Filing Date: 2026-02-10
Publication Date: 2026-06-18

AI Technical Summary

Technical Problem

In existing technologies, distributed databases cannot effectively prevent adjacent partitions or non-partitioned tables from clustering in a single log stream when performing data balancing, leading to access hotspots and system performance degradation.

Method used

By constructing balance group information, the partitions in the database are divided into corresponding balance groups, and intra-group and inter-group balance processing is performed, including transferring specific partitions between the source log stream and the destination log stream, to ensure that adjacent partitions or non-partitioned tables are scattered across different log streams.

Benefits of technology

This achieves balanced log stream volume, avoids hotspot access issues, and improves system performance.



Abstract

Provided in the present description are a partition balancing method and apparatus for a distributed database. Data stored in the database is partitioned into several partitions, and data operations on the partitions are recorded in several log streams. The method comprises: constructing balance group information, so as to classify partitions in a database into corresponding balance groups; determining whether the number of partitions of each balance group on each log stream is balanced, and if not, performing intra-group balancing on the balance group; and determining whether the total number of partitions on each log stream is balanced, and if not, performing inter-group balancing on the balance groups, wherein the intra-group balancing and the inter-group balancing comprise: transferring a specified partition between a source log stream and a destination log stream.

Description

Distributed database partition balancing method and device

Technical Field

[0001] This specification relates to the field of database technology, and more particularly to a method and apparatus for partition balancing of a distributed database.

Background Technology

[0002] With the development and integration of artificial intelligence and mobile internet technologies, users are generating more and more data in their work and lives, making data management increasingly important and crucial. Currently, there are more and more types of databases used for data management, such as cloud-based encrypted databases and distributed databases. A distributed database is a data storage system where data is distributed across multiple physical locations, but logically treated as a single database. These physical locations can be different computers within the same local area network (LAN) or different geographical locations within a wide area network (WAN). The design purpose of distributed databases is to improve the speed, reliability, and availability of data access, while supporting large-scale data processing and high-concurrency access.

[0003] Data in distributed databases needs to be balanced to achieve the aforementioned effects, such as supporting high-concurrency access. However, the data balancing performance of related technologies is generally not very good.

Summary of the Invention

[0004] In view of the above, one or more embodiments of this specification provide a partition balancing method and apparatus for a distributed database, an electronic device, and a storage medium.

[0005] To achieve the above objectives, one or more embodiments of this specification provide the following technical solutions:

[0006] According to a first aspect of one or more embodiments of this specification, a partition balancing method for a distributed database is proposed; wherein the data stored in the database is divided into several partitions; data operations on the partitions are recorded in several log streams; the method includes:

[0007] constructing balance group information to assign each partition in the database to its corresponding balance group;

[0008] determining whether the number of partitions of each balance group on each log stream is balanced, and if not, performing intra-group balancing on that balance group;

[0009] determining whether the total number of partitions on each log stream is balanced, and if not, performing inter-group balancing on the balance groups;

[0010] wherein the intra-group balancing and the inter-group balancing include: transferring specified partitions between a source log stream and a destination log stream.

[0011] According to a second aspect of one or more embodiments of this specification, a partition balancing device for a distributed database is provided; wherein the data stored in the database is divided into several partitions; data operations on the partitions are recorded in several log streams; the device includes:

[0012] The balance group construction module constructs balance group information to divide each partition in the database into its corresponding balance group.

[0013] The intra-group balancing module determines whether the number of partitions of each balance group on each log stream is balanced. If not, it performs intra-group balancing on that balance group.

[0014] The inter-group balancing module determines whether the total number of partitions on each log stream is balanced. If not, it performs inter-group balancing processing on each balanced group.

[0015] The intra-group balancing process and the inter-group balancing process include: transferring specific partitions between the source log stream and the destination log stream.

[0016] According to a third aspect of one or more embodiments of this specification, a computer program product is provided, comprising a computer program / instructions that, when executed by a processor, implement the steps of the method described in the first aspect.

[0017] According to a fourth aspect of one or more embodiments of this specification, an electronic device is provided, comprising:

[0018] processor;

[0019] Memory used to store processor-executable instructions;

[0020] The processor implements the method as described in the first aspect by running the executable instructions.

[0021] According to a fifth aspect of one or more embodiments of this specification, a computer-readable storage medium is provided that stores computer instructions thereon, which, when executed by a processor, implement the steps of the method as described in the first aspect.

[0022] The technical solutions provided in the embodiments of this specification may include the following beneficial effects:

[0023] In the partition balancing method provided in the embodiments of this specification, the data stored in the database is divided into several partitions, and data operations on the partitions are recorded in several log streams. Balance group information is constructed to assign each partition in the database to a corresponding balance group. It is then determined, on the one hand, whether the number of partitions of each balance group on each log stream is balanced; if not, intra-group balancing is performed on that balance group. On the other hand, it is determined whether the total number of partitions on each log stream is balanced; if not, inter-group balancing is performed on the balance groups. Both the intra-group balancing and the inter-group balancing include transferring specified partitions between a source log stream and a destination log stream. By assigning each partition to a balance group and then performing balancing first within each balance group and then between balance groups, adjacent partitions within a balance group can be distributed across different log streams, and the difference in the number of partitions on different log streams can meet preset requirements.

Attached Figure Description

[0024] Figure 1 is a schematic diagram of the architecture of a distributed database provided in an exemplary embodiment.

[0025] Figure 2 is a flowchart of a partition balancing method for a distributed database provided in an exemplary embodiment.

[0026] Figure 3 is a diagram of the balancing effect of a balancing group consisting of multiple non-partitioned tables provided in an exemplary embodiment.

[0027] Figure 4 is a diagram illustrating the balancing effect of a balancing group consisting of multiple partitions of a partition table, provided by an exemplary embodiment.

[0028] Figure 5 is a schematic diagram of the structure of a device provided in an exemplary embodiment.

[0029] Figure 6 is a block diagram of a partition balancing device for a distributed database provided in an exemplary embodiment.

Detailed Implementation

[0030] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with one or more embodiments of this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of one or more embodiments of this specification as detailed in the appended claims.

[0031] It should be noted that the steps of the corresponding methods are not necessarily performed in the order shown and described in this specification in other embodiments. In some other embodiments, the methods may include more or fewer steps than described in this specification. Furthermore, a single step described in this specification may be broken down into multiple steps in other embodiments; and multiple steps described in this specification may be combined into a single step in other embodiments.

[0032] The following explains some of the concepts involved in this specification.

[0033] Partition (or Tablet): In a distributed database, a table is divided into multiple groups of data according to specified rules. Each group of data is called a partition. For example, a partition can be a non-partitioned table; or, a partition can be a specific partition within a partitioned table.

[0034] Log Stream: An organizational structure of logs in a distributed database system, containing several partitions. A log stream is a logical or physical sequence of logs used to record data change operations sequentially in a database system. It serves as the basic unit for log replication, persistence, and fault recovery. In distributed environments, it is typically bound to a consistency protocol (e.g., Paxos, Raft) to form an independent log replication unit.

[0035] Migration: Moving data from one log stream to another, one partition at a time.

[0036] Partition balancing groups: In distributed databases, this describes the relationship between partitions and their distribution. For example, all non-partitioned tables under a tenant form a partition balancing group, all first-level partitions of a first-level partitioned table form a partition balancing group, and all second-level partitions under each first-level partition of a second-level partitioned table form a partition balancing group. During partition balancing, the partitions in each partition balancing group are distributed across different log streams.

[0037] In related technologies, when a distributed database performs data balancing, the number of partitions or non-partitioned tables on different log streams is kept relatively balanced, but there is no guarantee that adjacent partitions or non-partitioned tables are scattered. That is, adjacent non-partitioned tables (such as non-partitioned tables under the same database of a tenant) cannot be prevented from clustering in one log stream, nor can adjacent partitions (such as adjacent partitions in a tenant's partitioned table). As a result, some log streams become access hotspots, which degrades system performance.

[0038] Based on the above-mentioned technical problems, at least one embodiment of this specification provides a partition balancing method for distributed databases. When a distributed database performs data balancing (e.g., partition balancing), this method aims to scatter adjacent partitions or non-partitioned tables across log streams while keeping the number of partitions on each log stream balanced, thereby avoiding access hotspots caused by adjacent partitions or non-partitioned tables clustering in one log stream.

[0039] For example, this method can be applied to the distributed database system shown in Figure 1. In practical applications, this method can also be applied to a centralized database system that includes multiple database instances, and this specification does not impose any special limitations on it.

[0040] Please refer to Figure 2, which exemplarily illustrates a flowchart of a partition balancing method for a distributed database, including steps S201 to S203.

[0041] In this embodiment, the database system may include several distributed databases, and the data stored in each database may be divided into several partitions; the database system may also include several log streams, which are used to record data operations on data tables (e.g., INSERT / UPDATE / DELETE operations that modify data in data tables).

[0042] It should be noted that data tables can generally be divided into non-partitioned tables and partitioned tables. Accordingly, a partition can be a non-partitioned table; or, a partition can be a specific partition within a partitioned table.

[0043] In step S201, balance group information is constructed to divide each partition in the database into its corresponding balance group.

[0044] The database system can be shared by multiple tenants, and the data of different tenants is isolated from each other at the physical and / or logical levels. A tenant's data can be divided into multiple database instances, and the data within different database instances under a tenant is isolated from each other at the physical and / or logical levels. For example, this embodiment can specifically perform partitioning and data balancing for a particular tenant's data within the database system.

[0045] Data requiring partition balancing can include several non-partitioned tables and / or several partitioned tables, with each partitioned table further divided into several specific partitions. To achieve partition balancing on the log stream of a database system, several balancing groups can be constructed based on the data requiring partition balancing.

[0046] Specifically, balancing group information can be constructed to assign each partition in the database to its corresponding balancing group. For example, each non-partitioned table in the database can be assigned to a first-class balancing group, and each partition in each partitioned table in the database can be assigned to a second-class balancing group corresponding to that partitioned table.

[0047] In one possible embodiment, the partition may include partitions of non-partitioned tables and / or partitioned tables. Accordingly, the constructed balanced groups may include a first type of balanced group formed by non-partitioned tables, and / or a second type of balanced group formed by partitions in partitioned tables, such as a balanced group corresponding to the first partitioned table formed by all first-level partitions in a first-level partitioned table, or a balanced group corresponding to the second-level partitioned table formed by all second-level partitions under each first-level partitioned table in a second-level partitioned table.
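By way of illustration only, the construction of balance groups in step S201 might be sketched in Python as follows. The names `Partition` and `build_balance_groups` and the tuple-shaped group keys are hypothetical and do not appear in this specification. The sketch covers only non-partitioned tables and first-level partitioned tables; a second-level partitioned table would need one group per first-level partition.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Partition:
    database: str                 # database instance that owns the table
    table: str                    # table name
    part: Optional[str] = None    # None means the table is non-partitioned


def build_balance_groups(partitions):
    """Map a balance group key to the list of partitions in that group."""
    groups = defaultdict(list)
    for p in partitions:
        if p.part is None:
            # First type: all non-partitioned tables share one balance group.
            key = ("non_partitioned",)
        else:
            # Second type: one balance group per partitioned table.
            key = ("partitioned", p.database, p.table)
        groups[key].append(p)
    return dict(groups)


parts = [
    Partition("DB1", "T1"),
    Partition("DB1", "T2"),
    Partition("DB2", "T4"),
    Partition("DB1", "T9", "P0"),
    Partition("DB1", "T9", "P1"),
]
groups = build_balance_groups(parts)
```

Here the three non-partitioned tables fall into a single first-type group, and the two partitions of `T9` form their own second-type group.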

[0048] In step S202, it is determined whether the number of partitions of each balance group on each log stream is balanced. If not, intra-group balancing is performed on that balance group.

[0049] In step S203, it is determined whether the total number of partitions on each log stream is balanced. If they are not balanced, inter-group balancing is performed on each balanced group.

[0050] With the above balance groups constructed, on the one hand, for any balance group, it can be determined whether the number of partitions of that balance group on each log stream is balanced. If not, intra-group balancing is performed on that balance group.

[0051] On the other hand, for any log stream, it can be determined whether the total number of partitions on the log stream is balanced. If it is not balanced, then inter-group balancing can be performed on each balanced group.

[0052] It should be noted that the intra-group balancing process and the inter-group balancing process include: transferring specific partitions between the source log stream and the destination log stream.

[0053] In one possible embodiment, during intra-group balancing, for any balancing group, the source log stream corresponding to the balancing group can be the log stream containing the most partitions in the balancing group, and the destination log stream corresponding to the balancing group can be the log stream containing the fewest partitions in the balancing group. If the difference between the number of partitions on the log stream with the most partitions (i.e., the source log stream corresponding to the balancing group) and the number of partitions on the log stream with the fewest partitions (i.e., the destination log stream corresponding to the balancing group) is greater than a preset threshold, it can be determined that the number of partitions in the balancing group is unbalanced across the log streams. Therefore, at least one partition from the log stream with the most partitions can be transferred to the log stream with the fewest partitions to split adjacent partitions on the log stream with the most partitions, ensuring that the difference between the number of partitions on the log stream with the most partitions and the number of partitions on the log stream with the fewest partitions is not greater than the preset threshold.

[0054] The preferred preset threshold is 1.

[0055] In other words, during intra-group balancing, the convergence condition is that the difference in the number of partitions in any two log streams within the balancing group does not exceed a preset threshold. For example, when the difference between the number of partitions in the log stream with the most partitions and the number of partitions in the log stream with the fewest partitions exceeds the preset threshold, partition transfer can be performed as follows to achieve convergence:

[0056] First, the number of partitions to be transferred is determined as the smaller of two values: the difference between the number of partitions on the log stream with the most partitions and the average number of partitions per log stream, and the difference between that average and the number of partitions on the log stream with the fewest partitions. The average number of partitions per log stream is the ratio of the number of partitions in the balancing group to the number of log streams in the database system.

[0057] The number of partitions to be transferred, X, is determined by the following formula:

[0058] X = min(part_num_max - part_num / ls_num, part_num / ls_num - part_num_min)

[0059] In the above formula, part_num_max is the number of partitions on the log stream with the most partitions, part_num_min is the number of partitions on the log stream with the fewest partitions, part_num is the number of partitions in the balance group, and ls_num is the number of log streams in the database system.
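As a non-limiting illustration, the balance check and the formula for X might be sketched in Python as below. The function names are hypothetical. This specification does not state how a fractional X is rounded, so rounding down while always moving at least one partition is an assumption of this sketch.

```python
def is_balanced(counts, threshold=1):
    """counts: number of partitions of one balance group on each log stream."""
    return max(counts) - min(counts) <= threshold


def partitions_to_transfer(counts):
    """X = min(part_num_max - part_num / ls_num, part_num / ls_num - part_num_min)."""
    part_num = sum(counts)
    ls_num = len(counts)
    # Work in units of 1/ls_num so the arithmetic stays exact.
    above = max(counts) * ls_num - part_num   # (part_num_max - average) * ls_num
    below = part_num - min(counts) * ls_num   # (average - part_num_min) * ls_num
    # Assumption: round X down, but move at least one partition.
    return max(1, min(above, below) // ls_num)


counts = [9, 0, 0]   # nine partitions, all on the first of three log streams
x = partitions_to_transfer(counts)
```

For the counts `[9, 0, 0]` the average is 3, so X = min(9 - 3, 3 - 0) = 3.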

[0060] Similarly, during inter-group balancing, the source log stream can be the log stream with the largest total number of partitions, and the destination log stream can be the log stream with the smallest total number of partitions. If the difference between the number of partitions in the log stream with the most partitions and the number of partitions in the log stream with the fewest partitions is greater than a preset threshold, it can be determined that the total number of partitions on each log stream is unbalanced. Therefore, at least one partition from the log stream with the most partitions can be transferred to the log stream with the fewest partitions to split adjacent partitions on the log stream with the most partitions, ensuring that the difference between the number of partitions in the log stream with the most partitions and the number of partitions in the log stream with the fewest partitions is not greater than the preset threshold.

[0061] The preferred preset threshold is 1.

[0062] In other words, the inter-group balancing process iterates based on the convergence condition that the difference in the total number of partitions on any two log streams does not exceed a preset threshold. For example, when the difference between the number of partitions on the log stream with the most partitions and the log stream with the fewest partitions exceeds the preset threshold, partition shifting can be performed as follows to achieve convergence:

[0063] First, the number of partitions to be transferred is determined as the smaller of two values: the difference between the number of partitions on the log stream with the most partitions and the average number of partitions per log stream, and the difference between that average and the number of partitions on the log stream with the fewest partitions. The average number of partitions per log stream is the ratio of the total number of partitions in all balanced groups to the number of log streams in the database system.

[0064] The number of partitions to be transferred, X, is determined by the following formula:

[0065] X = min(part_num_max - part_num / ls_num, part_num / ls_num - part_num_min)

[0066] In the above formula, part_num_max is the number of partitions on the log stream with the most partitions, part_num_min is the number of partitions on the log stream with the fewest partitions, part_num is the total number of partitions in all balanced groups, and ls_num is the number of log streams in the database system.
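The following Python sketch, offered only as an example, shows why inter-group balancing can still be needed after every group has passed the intra-group check. The group names and counts are invented for illustration. Each group differs by at most 1 across log streams, yet the totals differ by 3.

```python
# Partitions of each balance group on log streams LS1, LS2, LS3 (made-up data).
group_counts = {
    "group 1": [2, 2, 1],
    "group 2": [2, 1, 1],
    "group 3": [2, 1, 1],
}
threshold = 1

# Each group passes the intra-group check on its own.
intra_ok = all(max(c) - min(c) <= threshold for c in group_counts.values())

# Total number of partitions on each log stream, summed over all groups.
totals = [sum(column) for column in zip(*group_counts.values())]
inter_ok = max(totals) - min(totals) <= threshold

# X = min(part_num_max - part_num / ls_num, part_num / ls_num - part_num_min)
part_num, ls_num = sum(totals), len(totals)
average = part_num / ls_num
x = min(max(totals) - average, average - min(totals))
```

Here the totals are `[6, 4, 3]` and X evaluates to 4/3. How a fractional X is rounded is not specified in this description; rounding down would move one partition from LS1 to LS3 in this round.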

[0067] In one possible implementation, the partitions within each balance group can be divided into several subgroups. Further, based on the log stream on which each partition of the balance group currently resides, distribution information of the balance group on each log stream can be established (this can be part of the balance group information). This distribution information records the partitions of the balance group on each log stream and the subgroups to which they belong.

[0068] For example, if a database system has n log streams, this step can establish n distribution information entries for each balance group, corresponding one-to-one with the n log streams. Each distribution information entry can be a tree structure with the log stream as the root node and the subgroups as leaf nodes; the distribution information can therefore also be called a distribution tree. If a balance group has a partition on a certain log stream, the distribution information of that balance group on that log stream records that partition and the subgroup to which it belongs.

[0069] Therefore, as long as empty subgroups are not deleted or hidden, the subgroups of a balance group indicated by its distribution information are usually the same on every log stream, while the partitions of that balance group differ from one log stream to another. For example, the distribution information of balance group 1 on log stream 1 indicates that balance group 1 has subgroup 1 and subgroup 2 on log stream 1; subgroup 1 can contain partition 1, and subgroup 2 can be empty. The distribution information of balance group 1 on log stream 2 indicates that balance group 1 has subgroup 1 and subgroup 2 on log stream 2; subgroup 1 can be empty, and subgroup 2 can contain partition 2 and partition 3.
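As one possible illustration, the distribution information could be held as a nested mapping from log stream to subgroup to partitions. The function `build_distribution` and its parameters are hypothetical; the data reproduces the example of group 1 on log streams 1 and 2 given above.

```python
def build_distribution(log_streams, subgroup_of, location):
    """Return {log_stream: {subgroup: [partitions]}} for one group.

    subgroup_of: {partition: subgroup}
    location:    {partition: log stream on which it currently resides}
    """
    subgroups = sorted(set(subgroup_of.values()))
    # Every log stream lists every subgroup, even when it is empty there.
    dist = {ls: {sg: [] for sg in subgroups} for ls in log_streams}
    for part, ls in location.items():
        dist[ls][subgroup_of[part]].append(part)
    return dist


subgroup_of = {
    "partition 1": "subgroup 1",
    "partition 2": "subgroup 2",
    "partition 3": "subgroup 2",
}
location = {
    "partition 1": "log stream 1",
    "partition 2": "log stream 2",
    "partition 3": "log stream 2",
}
dist = build_distribution(["log stream 1", "log stream 2"], subgroup_of, location)
```

Keeping empty subgroups in every entry means a subgroup's count on any log stream can be read directly, including a count of zero.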

[0070] In one possible implementation, for a balanced group formed by non-partitioned tables, the data tables are non-partitioned tables, and the partitions are also non-partitioned tables. All non-partitioned tables can form a balanced group. Adjacent data within this type of balanced group refers to non-partitioned tables belonging to the same database instance. For example, a tenant has two database instances, database instance 1 and database instance 2. Database instance 1 contains non-partitioned tables T1, T2, and T3, while database instance 2 contains non-partitioned tables T4, T5, and T6. In this case, non-partitioned tables T1, T2, and T3 are adjacent partitions, and non-partitioned tables T4, T5, and T6 are also adjacent partitions.

[0071] For example, for any balanced group formed by non-partitioned tables, the distribution information of the balanced group on each log stream can be established as follows: non-partitioned tables in the same database instance within the balanced group are divided into a subgroup; that is, non-partitioned tables in the same database instance belong to the same subgroup, and non-partitioned tables in different database instances belong to different subgroups.

[0072] In other words, for a balanced group formed by non-partitioned tables, the distribution information of the non-partitioned tables on each log stream causes the non-partitioned tables under each database instance to be clustered into a subgroup.
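A short Python sketch of this subgrouping, using the T1 to T6 example above, might look as follows. The function name `subgroups_by_database` is hypothetical.

```python
from collections import defaultdict


def subgroups_by_database(tables):
    """tables: iterable of (database_instance, table_name) pairs."""
    subgroups = defaultdict(list)
    for database, table in tables:
        # Tables of the same database instance share one subgroup.
        subgroups[database].append(table)
    return dict(subgroups)


tables = [
    ("database instance 1", "T1"),
    ("database instance 1", "T2"),
    ("database instance 1", "T3"),
    ("database instance 2", "T4"),
    ("database instance 2", "T5"),
    ("database instance 2", "T6"),
]
subgroups = subgroups_by_database(tables)
```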

[0073] Since all partitions of a partitioned table must reside within the same database instance, for a balanced group formed by the partitions of a partitioned table, the data table is a partitioned table, and the partitions are partitions of the partitioned table. Each partitioned table can form a balanced group. Adjacent partitions within this type of balanced group refer to adjacent partitions within the partitioned table itself. For example, if a tenant's database instance 1 contains four first-level partitions P0, P1, P2, and P3 in a first-level partitioned table T1, then P0 and P1 are adjacent partitions, P1 and P2 are adjacent partitions, and P2 and P3 are adjacent partitions.

[0074] For example, for any balanced group formed by partitions of a partitioned table, the distribution information of the balanced group on each log stream can be established as follows: the partitions within the balanced group (usually all partitions of the partitioned table) are divided into as many subgroups as there are log streams in the database system. For example, assuming there are 3 log streams in the database system, the partitions within the balanced group can be divided into 3 subgroups. Specifically, the partitions are assigned to these subgroups so that adjacent partitions reside in different subgroups.

[0075] In other words, for a balanced group formed by partitions of a partitioned table, its distribution information on each log stream ensures that adjacent partitions are distributed in different subgroups. For example, if a database system has three log streams, and a balanced group consists of nine first-level partitions (P1, P2, P3, P4, P5, P6, P7, P8, and P9) on a certain log stream, then the distribution information of this balanced group on that log stream can include three subgroups: subgroup 1, subgroup 2, and subgroup 3. Subgroup 1 contains three non-adjacent partitions (P1, P4, and P7); subgroup 2 contains three non-adjacent partitions (P2, P5, and P8); and subgroup 3 contains three non-adjacent partitions (P3, P6, and P9).
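One assignment that satisfies this requirement and reproduces the P1 to P9 example is a round-robin over the partitions in order. The description only requires that adjacent partitions land in different subgroups, so round-robin is an assumption of this sketch, as is the function name. The sketch presumes at least two log streams.

```python
def subgroups_round_robin(partitions, ls_num):
    """Assign the i-th partition (in partition order) to subgroup i mod ls_num."""
    subgroups = [[] for _ in range(ls_num)]
    for index, part in enumerate(partitions):
        subgroups[index % ls_num].append(part)
    return subgroups


partitions = [f"P{i}" for i in range(1, 10)]   # P1 .. P9
subgroups = subgroups_round_robin(partitions, 3)
```

With three log streams this yields the subgroups (P1, P4, P7), (P2, P5, P8), and (P3, P6, P9).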

[0076] It should be understood that this method, when performing partition balancing, aims to distribute non-partitioned tables within the same database instance across different log streams, and to distribute adjacent partitions within partitioned tables across different log streams. The established distribution information can aggregate non-partitioned tables within the same database instance and split adjacent partitions within partitioned tables. Subsequent steps in data balancing can employ different balancing strategies to achieve the distribution of adjacent partitions within balancing groups formed by multiple non-partitioned tables and balancing groups formed by multiple partitions within partitioned tables, respectively.

[0077] In one possible embodiment, when performing intra-group balancing, if the balancing group is a first type of balancing group composed of non-partitioned tables, the non-partitioned tables belonging to the same database instance in the first type of balancing group can be divided into the same subgroup, and the target subgroup with the smallest ratio of the number of partitions on the destination log stream corresponding to the first type of balancing group to the number of partitions on the source log stream corresponding to the first type of balancing group can be determined, so as to transfer some partitions of the target subgroup on the source log stream to the destination log stream.

[0078] Correspondingly, if the balancing group is a second type of balancing group composed of partitions in the partition table, the partitions in the second type of balancing group can be divided into several subgroups according to the preset balancing target, and the target subgroup with the largest ratio of the number of partitions on the destination log stream corresponding to the second type of balancing group to the number of partitions on the source log stream corresponding to the second type of balancing group can be determined, so as to transfer several partitions of the target subgroup on the source log stream to the destination log stream.

[0079] In one possible embodiment, when transferring several partitions of the target subgroup on the source log stream to the destination log stream, the following step may be repeated until the number of partitions transferred from the source log stream to the destination log stream reaches the number of partitions to be transferred: transferring any partition of the target subgroup on the source log stream to the destination log stream.

[0080] Taking a first-type balance group for intra-group balancing as an example, each time, a partition from the log stream with the most partitions (i.e., the source log stream corresponding to the first-type balance group) is transferred to the log stream with the fewest partitions (i.e., the destination log stream corresponding to the first-type balance group) in the following manner, until the number of partitions transferred reaches the partition transfer limit:

[0081] The subgroup with the smallest first balance ratio in the first type of balance group is identified as the target subgroup, and one partition of the target subgroup on the log stream with the most partitions is transferred to the log stream with the fewest partitions; wherein the first balance ratio of a subgroup is the ratio of the number of partitions of that subgroup on the log stream with the fewest partitions (the destination log stream) to the number of partitions of that subgroup on the log stream with the most partitions (the source log stream).

[0082] That is, the target subgroup u is determined each time according to the following formula:

[0083] u = argmin_i (d_i / s_i)

[0084] In the above formula, d_i is the number of partitions of the i-th subgroup on the log stream with the fewest partitions (the destination log stream), and s_i is the number of partitions of the i-th subgroup on the log stream with the most partitions (the source log stream).

[0085] Taking a second-type balance group for intra-group balancing as an example, each time, a partition from the log stream with the most partitions (i.e., the source log stream corresponding to the second-type balance group) is transferred to the log stream with the fewest partitions (i.e., the destination log stream corresponding to the second-type balance group) in the following manner, until the number of partitions transferred reaches the partition transfer limit:

[0086] The subgroup with the largest second balance ratio in the second-type balance group is identified as the target subgroup, and one partition of the target subgroup on the log stream with the most partitions is transferred to the log stream with the fewest partitions; wherein the second balance ratio of a subgroup is the ratio of the number of partitions of that subgroup on the log stream with the fewest partitions to the number of partitions of that subgroup on the log stream with the most partitions.

[0087] That is, the target subgroup k is determined each time according to the following formula:

[0088] $$k = \arg\max_{i} \frac{d_i}{s_i}$$

[0089] In the above formula, $d_i$ is the number of partitions of the i-th subgroup on the log stream with the fewest partitions (the destination log stream), and $s_i$ is the number of partitions of the i-th subgroup on the log stream with the most partitions (the source log stream).

[0090] Taking a first-type load balancing group formed by multiple non-partitioned tables as an example, the number of partitions to be transferred can be determined according to the method in this embodiment during the load balancing process within the group. Each time a non-partitioned table is transferred, a target subgroup is determined according to the method in the example above, and one non-partitioned table within that target subgroup is transferred from the log stream with the most non-partitioned tables to the log stream with the fewest non-partitioned tables. For example, in the load balancing shown in Figure 3, the load balancing group includes nine non-partitioned tables: DB1:T1, DB1:T2, DB1:T3, DB2:T1, DB2:T2, DB2:T3, DB3:T1, DB3:T2, and DB3:T3. DB1:T1, DB1:T2, and DB1:T3 belong to database instance DB1; DB2:T1, DB2:T2, and DB2:T3 belong to database instance DB2; and DB3:T1, DB3:T2, and DB3:T3 belong to database instance DB3. The database system has three log streams: LS1, LS2, and LS3. Before load balancing, all nine non-partitioned tables are on log stream LS1. After load balancing, DB1:T1, DB2:T1, and DB3:T1 are on log stream LS1, DB1:T2, DB2:T2, and DB3:T2 are on log stream LS2, and DB1:T3, DB2:T3, and DB3:T3 are on log stream LS3. In other words, the nine non-partitioned tables are evenly distributed across the three log streams, and the non-partitioned tables within the same database instance are scattered across different log streams.
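The scattering described for Figure 3 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `balance_non_partitioned`, the threshold of 1, the rounding of the transfer number, and the tie-breaking by name are all assumptions made for illustration. Each transfer picks the database-instance subgroup with the smallest ratio of tables on the destination log stream to tables on the source log stream.

```python
from collections import Counter
from fractions import Fraction

def balance_non_partitioned(placement, log_streams, threshold=1):
    """Scatter non-partitioned tables; placement maps (db, table) -> log stream."""
    while True:
        totals = {ls: 0 for ls in log_streams}
        for ls in placement.values():
            totals[ls] += 1
        src = max(log_streams, key=totals.get)  # log stream with the most tables
        dst = min(log_streams, key=totals.get)  # log stream with the fewest tables
        if totals[src] - totals[dst] <= threshold:
            return placement
        avg = Fraction(len(placement), len(log_streams))
        # Transfer number: the smaller of (src - avg) and (avg - dst).
        # Rounding down and moving at least one table are assumptions.
        count = max(1, int(min(totals[src] - avg, avg - totals[dst])))
        for _ in range(count):
            s = Counter(db for (db, _name), ls in placement.items() if ls == src)
            d = Counter(db for (db, _name), ls in placement.items() if ls == dst)
            # Target subgroup: smallest d_i / s_i; ties broken by name (assumption).
            target = min(s, key=lambda db: (Fraction(d[db], s[db]), db))
            # "Any" table of the target subgroup may move; pick the first by name.
            table = min(t for t, ls in placement.items()
                        if ls == src and t[0] == target)
            placement[table] = dst

tables = [(f"DB{i}", f"T{j}") for i in range(1, 4) for j in range(1, 4)]
placement = balance_non_partitioned({t: "LS1" for t in tables},
                                    ["LS1", "LS2", "LS3"])
for ls in ["LS1", "LS2", "LS3"]:
    print(ls, sorted(t for t, where in placement.items() if where == ls))
```

Because the text allows any table of the target subgroup to be moved, the concrete placement produced by this sketch may differ from the one drawn in Figure 3, while still placing three tables on each log stream with one table from each database instance.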

[0091] Taking the second type of balanced group formed by multiple partitions of the partition table as an example, the number of partitions to be transferred can be determined in the manner of this embodiment when balancing within the group, and the target subgroup can be determined in the manner of the above example each time a partition is transferred, and one partition in the target subgroup can be transferred from the log stream with the most partitions to the log stream with the fewest partitions. For example, the intra-group balancing shown in Figure 4 includes nine partitions: T1:P1, T1:P2, T1:P3, T1:P4, T1:P5, T1:P6, T1:P7, T1:P8, and T1:P9. The database system has three log streams: LS1, LS2, and LS3. Before balancing, all nine partitions are on log stream LS1. After balancing, T1:P1, T1:P4, and T1:P7 are on log stream LS1, T1:P2, T1:P5, and T1:P8 are on log stream LS2, and T1:P3, T1:P6, and T1:P9 are on log stream LS3. That is, the nine partitions are evenly distributed across the three log streams, and adjacent partitions are scattered across different log streams.
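A similar sketch can illustrate the Figure 4 case. The specification does not state how the preset balancing target divides partitions into subgroups, so the rule used here (partitions whose indexes are congruent modulo the number of log streams share a subgroup) is purely an assumption, as are the function names, the threshold, and the tie-breaking. Each transfer picks the subgroup with the largest ratio of partitions on the destination log stream to partitions on the source log stream, which keeps a subgroup together and thereby separates adjacent partitions.

```python
from collections import Counter
from fractions import Fraction

LOG_STREAMS = ["LS1", "LS2", "LS3"]

def subgroup_of(index):
    # Assumed grouping rule: P1, P4, P7 share a subgroup, and so on.
    return (index - 1) % len(LOG_STREAMS)

def balance_partitioned(placement, threshold=1):
    """placement maps partition index -> log stream; modified in place."""
    while True:
        totals = {ls: 0 for ls in LOG_STREAMS}
        for ls in placement.values():
            totals[ls] += 1
        src = max(LOG_STREAMS, key=totals.get)
        dst = min(LOG_STREAMS, key=totals.get)
        if totals[src] - totals[dst] <= threshold:
            return placement
        avg = Fraction(len(placement), len(LOG_STREAMS))
        count = max(1, int(min(totals[src] - avg, avg - totals[dst])))
        for _ in range(count):
            s = Counter(subgroup_of(p) for p, ls in placement.items() if ls == src)
            d = Counter(subgroup_of(p) for p, ls in placement.items() if ls == dst)
            # Target subgroup: largest d_i / s_i; ties go to the lowest subgroup id.
            target = max(s, key=lambda g: (Fraction(d[g], s[g]), -g))
            moved = min(p for p, ls in placement.items()
                        if ls == src and subgroup_of(p) == target)
            placement[moved] = dst

placement = balance_partitioned({p: "LS1" for p in range(1, 10)})
for ls in LOG_STREAMS:
    print(ls, [f"T1:P{p}" for p in sorted(placement) if placement[p] == ls])
```

Under these assumptions each log stream ends up with three partitions and no two adjacent partitions share a log stream, although which subgroup lands on which log stream depends on the tie-breaking.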

[0092] In one possible embodiment, when performing inter-group balancing, a number of partitions equal to the partition transfer number can be transferred from the source log stream containing the most partitions to the destination log stream containing the fewest partitions.

[0093] In one possible embodiment, each balance group can be traversed, and a balance group whose number of partitions on the source log stream is greater than its number of partitions on the destination log stream can be determined as the target balance group. Any partition of the target balance group on the source log stream can be transferred to the destination log stream, until the number of partitions transferred from the source log stream to the destination log stream reaches the partition transfer number.

[0094] That is, if a traversed balance group meets the following condition, a partition of that balance group is moved from the log stream with the most partitions to the log stream with the fewest partitions:

[0095] bg_part_num_ls_max>bg_part_num_ls_min

[0096] In the above formula, bg_part_num_ls_max is the number of partitions of the traversed balance group on ls_max, where ls_max is the source log stream with the largest total number of partitions; bg_part_num_ls_min is the number of partitions of the traversed balance group on ls_min, where ls_min is the destination log stream with the smallest total number of partitions.
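The condition bg_part_num_ls_max > bg_part_num_ls_min can be expressed as a small filter. The following is a sketch only; the function name `eligible_balance_groups`, the dictionary layout, and the sample group names are hypothetical.

```python
def eligible_balance_groups(group_counts, ls_max, ls_min):
    """Return the balance groups that satisfy the transfer condition.

    group_counts maps balance group -> {log stream: partition count}.
    """
    return [bg for bg, counts in group_counts.items()
            if counts.get(ls_max, 0) > counts.get(ls_min, 0)]

group_counts = {
    "non_partitioned": {"LS1": 3, "LS2": 3},  # equal: not eligible
    "T1": {"LS1": 5, "LS2": 2},               # more on ls_max: eligible
    "T2": {"LS1": 1, "LS2": 2},               # fewer on ls_max: not eligible
}
targets = eligible_balance_groups(group_counts, ls_max="LS1", ls_min="LS2")
print(targets)
```

Only groups that hold strictly more partitions on ls_max than on ls_min qualify, so a transfer never takes a partition from a group that is already at or below its count on the destination.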

[0097] In one possible embodiment, similar to intra-group balancing, when the target balancing group is a first-type balancing group composed of non-partitioned tables, the non-partitioned tables belonging to the same database instance in the first-type balancing group can be divided into the same subgroup. The target subgroup with the smallest ratio of the number of partitions on the destination log stream (which is the log stream with the fewest total number of partitions at this time) to the number of partitions on the source log stream (which is the log stream with the most total number of partitions at this time) can be determined so that any partition of the target subgroup on the source log stream can be transferred to the destination log stream.

[0098] When the target balancing group is a second type of balancing group composed of partitions in the partition table, the partitions in the second type of balancing group can be divided into several subgroups according to the preset balancing target, and the target subgroup with the largest ratio of the number of partitions on the destination log stream to the number of partitions on the source log stream can be determined, so that any partition of the target subgroup on the source log stream can be transferred to the destination log stream.

[0099] In the partition balancing method for a distributed database provided in the embodiments of this specification, the data stored in the database is divided into several partitions, and data operations on the partitions are recorded in several log streams. Balance group information can be constructed to assign each partition in the database to a corresponding balance group. Then, on the one hand, it can be determined whether the number of partitions of each balance group on each log stream is balanced; if not, intra-group balancing is performed on that balance group. On the other hand, it can be determined whether the total number of partitions on each log stream is balanced; if not, inter-group balancing is performed on the balance groups. The intra-group balancing and the inter-group balancing include: transferring a specified partition between a source log stream and a destination log stream. By assigning each partition to a balance group and then performing intra-group balancing followed by inter-group balancing, adjacent partitions within a balance group can be distributed across different log streams, and the difference in the number of partitions on different log streams can meet preset requirements.

[0100] Figure 5 is a schematic structural diagram of a device provided in an exemplary embodiment. Referring to Figure 5, at the hardware level, the device includes a processor 502, an internal bus 504, a network interface 506, a memory 508, and a non-volatile memory 510, and may also include other hardware required for tasks. One or more embodiments of this specification can be implemented in software, for example, the processor 502 reads the corresponding computer program from the non-volatile memory 510 into the memory 508 and then runs it. Of course, in addition to software implementation, one or more embodiments of this specification do not exclude other implementation methods, such as logic devices or a combination of hardware and software, etc. That is to say, the execution subject of the following processing flow is not limited to each logic unit, but can also be hardware or logic devices.

[0101] Please refer to Figure 6. The partition balancing device for a distributed database can be applied to the device shown in Figure 5 to implement the technical solution of this specification. The data stored in the database is divided into several partitions; data operations on these partitions are recorded in several log streams. This partition balancing device for a distributed database may include:

[0102] A balance group construction module 602, which constructs balance group information to assign each partition in the database to a corresponding balance group;

[0103] An intra-group balancing module 604, which determines whether the number of partitions of each balance group on each log stream is balanced and, if not, performs intra-group balancing on that balance group;

[0104] An inter-group balancing module 606, which determines whether the total number of partitions on each log stream is balanced and, if not, performs inter-group balancing on the balance groups.

[0105] The intra-group balancing process and the inter-group balancing process include: transferring specific partitions between the source log stream and the destination log stream.

[0106] In one possible embodiment of this specification, the partition includes a non-partitioned table and partitions within a partitioned table;

[0107] Constructing the balance group information to assign each partition in the database to a corresponding balance group includes:

[0108] Constructing balance group information to assign each non-partitioned table in the database to a first-type balance group, and each partition of each partitioned table in the database to a second-type balance group corresponding to that partitioned table.
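The grouping rule can be sketched as follows, assuming a hypothetical input of (table, partition) pairs in which a non-partitioned table has no partition name. The group keys `"non_partitioned"` and `"table:<name>"` are illustrative labels, not identifiers from the specification.

```python
from collections import defaultdict

def build_balance_groups(entries):
    """Group entries into balance groups.

    entries is an iterable of (table, partition) pairs; partition is None
    for a non-partitioned table (an assumed encoding).
    """
    groups = defaultdict(list)
    for table, partition in entries:
        if partition is None:
            # All non-partitioned tables share one first-type balance group.
            groups["non_partitioned"].append(table)
        else:
            # Each partitioned table gets its own second-type balance group.
            groups[f"table:{table}"].append(f"{table}:{partition}")
    return dict(groups)

groups = build_balance_groups([
    ("DB1:T1", None), ("DB2:T1", None),
    ("T5", "P1"), ("T5", "P2"), ("T6", "P1"),
])
print(groups)
```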

[0109] In one possible embodiment of this specification, the source log stream corresponding to each balance group is the log stream containing the most partitions of that balance group; the destination log stream corresponding to each balance group is the log stream containing the fewest partitions of that balance group.

[0110] Determining whether the number of partitions of each balance group on each log stream is balanced includes:

[0111] For each balance group, determining whether the difference between the number of partitions of the balance group on its source log stream and the number of partitions of the balance group on its destination log stream is greater than a preset threshold; if the difference is greater than the threshold, determining that the balance group is unbalanced.
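The per-group check reduces to comparing the largest and smallest per-log-stream counts against the threshold. A minimal sketch, in which the function name and the threshold value of 1 are assumptions:

```python
def is_group_unbalanced(counts, threshold):
    """counts maps every log stream (including empty ones) to the number of
    partitions of one balance group on it."""
    source_count = max(counts.values())       # log stream with the most partitions
    destination_count = min(counts.values())  # log stream with the fewest partitions
    return source_count - destination_count > threshold

print(is_group_unbalanced({"LS1": 5, "LS2": 1, "LS3": 3}, threshold=1))
print(is_group_unbalanced({"LS1": 3, "LS2": 3, "LS3": 2}, threshold=1))
```

Log streams that hold no partition of the group must still appear in `counts` with a value of 0, otherwise the minimum would be overstated.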

[0112] In one possible embodiment of this specification, when the balancing group is a first type of balancing group composed of non-partitioned tables, intra-group balancing processing is performed on the first type of balancing group, including:

[0113] Non-partitioned tables belonging to the same database instance in the first type of balanced group are divided into the same subgroup, and a target subgroup with the smallest ratio of the number of partitions on the destination log stream to the number of partitions on the source log stream is determined, so as to transfer some partitions of the target subgroup on the source log stream to the destination log stream.

[0114] In one possible embodiment of this specification, when the balancing group is a second type of balancing group composed of partitions in a partition table, intra-group balancing processing is performed on the second type of balancing group, including:

[0115] According to the preset balancing target, the partitions in the second type of balancing group are divided into several subgroups, and the target subgroup with the largest ratio of the number of partitions on the destination log stream to the number of partitions on the source log stream is determined, so as to transfer several partitions of the target subgroup on the source log stream to the destination log stream.

[0116] In one possible embodiment of this specification, the intra-group balancing module is further configured to:

[0117] Calculate the average number of partitions of the balance group across the log streams;

[0118] Calculate a first difference between the number of partitions in the source log stream and the average number of partitions, and calculate a second difference between the average number of partitions and the number of partitions in the destination log stream.

[0119] The smaller of the first difference and the second difference is determined as the number of partitions transferred from the source log stream to the destination log stream.
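The partition transfer number can be worked through with a short sketch. The function name is hypothetical, and rounding a fractional result down is an assumption, since the text does not say how a non-integer average is handled.

```python
from fractions import Fraction

def partition_transfer_number(counts):
    """counts maps log stream -> number of partitions of one balance group."""
    average = Fraction(sum(counts.values()), len(counts))
    first_difference = max(counts.values()) - average   # source minus average
    second_difference = average - min(counts.values())  # average minus destination
    # Rounding down is an assumption for non-integer averages.
    return int(min(first_difference, second_difference))

print(partition_transfer_number({"LS1": 9, "LS2": 0, "LS3": 0}))
print(partition_transfer_number({"LS1": 6, "LS2": 5, "LS3": 1}))
```

In the first call the average is 3, the differences are 6 and 3, and three partitions are moved. In the second the average is 4, the differences are 2 and 3, and two partitions are moved. Taking the smaller difference means the source never drops below the average and the destination never rises above it.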

[0120] In one possible embodiment of this specification, transferring several partitions of the target subgroup on the source log stream to the destination log stream includes:

[0121] Repeat the following steps until the number of partitions transferred from the source log stream to the destination log stream reaches the partition transfer number: transfer any partition of the target subgroup on the source log stream to the destination log stream.

[0122] In one possible embodiment of this specification, the source log stream is the log stream containing the most partitions; the destination log stream is the log stream containing the fewest partitions.

[0123] The step of determining whether the total number of partitions on each log stream is balanced, and if not, performing inter-group balancing processing on each balanced group, includes:

[0124] Determine whether the difference between the number of partitions in the source log stream and the number of partitions in the destination log stream is greater than a preset threshold;

[0125] If the difference is greater than the threshold, then the log streams are determined to be unbalanced;

[0126] Transfer several partitions from the source log stream to the destination log stream.

[0127] In one possible embodiment of this specification, the inter-group balancing module is further configured to:

[0128] Calculate the average number of partitions on the aforementioned log streams;

[0129] Calculate a first difference between the number of partitions on the source log stream and the average number of partitions, and calculate a second difference between the average number of partitions and the number of partitions on the destination log stream;

[0130] The smaller of the first difference and the second difference is determined as the number of partitions transferred from the source log stream to the destination log stream.

[0131] In one possible embodiment of this specification, the inter-group balancing module is used for:

[0132] If, for each balance group, the difference between its numbers of partitions on any two log streams is no greater than a preset threshold, then the partitions on the log streams of the database system are balanced based on the distribution of each balance group across the log streams, so that the partitions on different log streams meet the inter-group balancing requirement.

[0133] In one possible embodiment of this specification, transferring several partitions from the source log stream to the destination log stream includes:

[0134] Each balance group is traversed, and a balance group whose number of partitions on the source log stream is greater than its number of partitions on the destination log stream is determined as the target balance group. Any partition of the target balance group on the source log stream is transferred to the destination log stream, until the number of partitions transferred from the source log stream to the destination log stream reaches the partition transfer number.
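The traversal can be sketched as a loop that stops once the partition transfer number is reached. This is an illustration under stated assumptions: the function name, the data layout, re-evaluating the eligible groups after every single transfer, and visiting groups in name order are not specified by the text.

```python
from collections import Counter

def inter_group_transfer(placement, group_of, ls_max, ls_min, transfer_number):
    """Move up to transfer_number partitions from ls_max to ls_min.

    placement maps partition -> log stream (modified in place);
    group_of maps partition -> balance group.
    """
    moved = []
    while len(moved) < transfer_number:
        on_max = Counter(group_of[p] for p, ls in placement.items() if ls == ls_max)
        on_min = Counter(group_of[p] for p, ls in placement.items() if ls == ls_min)
        eligible = [bg for bg in sorted(on_max) if on_max[bg] > on_min[bg]]
        if not eligible:
            break  # no balance group satisfies the transfer condition
        target = eligible[0]
        partition = min(p for p, ls in placement.items()
                        if ls == ls_max and group_of[p] == target)
        placement[partition] = ls_min
        moved.append(partition)
    return moved

placement = {"a1": "LS1", "a2": "LS1", "a3": "LS2",
             "b1": "LS1", "b2": "LS1", "b3": "LS2",
             "c1": "LS1", "d1": "LS1"}
group_of = {p: p[0] for p in placement}  # balance group = first letter (hypothetical)
moved = inter_group_transfer(placement, group_of, "LS1", "LS2", transfer_number=2)
print(moved, Counter(placement.values()))
```

In this example the totals start at six and two partitions, so the transfer number is 2. One partition each is taken from groups a and b, which leaves both log streams with four partitions while every group stays within a difference of one across the two log streams.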

[0135] In one possible embodiment of this specification, transferring any partition of the target balance group on the source log stream to the destination log stream includes:

[0136] When the target balance group is a first-type balance group composed of non-partitioned tables, the non-partitioned tables belonging to the same database instance in the first-type balance group are divided into the same subgroup, and the target subgroup with the smallest ratio of the number of partitions on the destination log stream to the number of partitions on the source log stream is determined, so that any partition of the target subgroup on the source log stream can be transferred to the destination log stream.

[0137] When the target balancing group is a second type of balancing group composed of partitions in the partition table, the partitions in the second type of balancing group are divided into several subgroups according to the preset balancing target, and the target subgroup with the largest ratio of the number of partitions on the destination log stream to the number of partitions on the source log stream is determined, so that any partition of the target subgroup on the source log stream can be transferred to the destination log stream.

[0138] One or more embodiments of this specification also provide a computer program product, including a computer program / instructions that, when executed by a processor, implement the steps of the method provided in any of the above embodiments.

[0139] One or more embodiments of this specification also provide a computer-readable storage medium having computer instructions stored thereon that, when executed by a processor, implement the steps of the method provided in any of the above embodiments.

[0140]

[0141] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, which can take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email sending and receiving device, game console, tablet computer, wearable device, or any combination of these devices.

[0142] In a typical configuration, a computer includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.

[0143] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0144] Computer-readable media, including both permanent and non-permanent, removable and non-removable media, can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0145] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0146] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0147] The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the scope of one or more embodiments of this specification. The singular forms “a,” “said,” and “the” used in one or more embodiments of this specification and in the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used herein refers to and includes any or all possible combinations of one or more associated listed items.

[0148] The user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this specification are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation portals are provided for users to choose to authorize or refuse.

[0149] It should be understood that although the terms first, second, third, etc., may be used to describe various information in one or more embodiments of this specification, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information without departing from the scope of one or more embodiments of this specification, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to a determination."

[0150] The above description is merely a preferred embodiment of one or more embodiments of this specification and is not intended to limit the scope of one or more embodiments of this specification. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of one or more embodiments of this specification should be included within the protection scope of one or more embodiments of this specification.

Claims

1. A partition balancing method for a distributed database, wherein data stored in the database is divided into several partitions, and data operations on the partitions are recorded in several log streams; the method comprising: constructing balance group information to assign each partition in the database to a corresponding balance group; determining whether the number of partitions of each balance group on each log stream is balanced, and if not, performing intra-group balancing on that balance group; and determining whether the total number of partitions on each log stream is balanced, and if not, performing inter-group balancing on the balance groups; wherein the intra-group balancing and the inter-group balancing comprise: transferring a specified partition between a source log stream and a destination log stream.

2. The method according to claim 1, wherein the partitions include non-partitioned tables and partitions in partitioned tables; and constructing the balance group information to assign each partition in the database to a corresponding balance group comprises: constructing balance group information to assign each non-partitioned table in the database to a first-type balance group, and each partition of each partitioned table in the database to a second-type balance group corresponding to that partitioned table.

3. The method according to claim 1, wherein the source log stream corresponding to each balance group is the log stream containing the most partitions of that balance group, and the destination log stream corresponding to each balance group is the log stream containing the fewest partitions of that balance group; and determining whether the number of partitions of each balance group on each log stream is balanced comprises: for each balance group, determining whether the difference between the number of partitions of the balance group on its source log stream and the number of partitions of the balance group on its destination log stream is greater than a preset threshold; and if the difference is greater than the threshold, determining that the balance group is unbalanced.

4. The method according to claim 3, wherein, when the balance group is a first-type balance group composed of non-partitioned tables, performing intra-group balancing on the first-type balance group comprises: dividing non-partitioned tables belonging to the same database instance in the first-type balance group into the same subgroup, and determining a target subgroup with the smallest ratio of the number of partitions on the destination log stream to the number of partitions on the source log stream, so as to transfer several partitions of the target subgroup on the source log stream to the destination log stream.

5. The method according to claim 3, wherein, when the balance group is a second-type balance group composed of partitions in a partitioned table, performing intra-group balancing on the second-type balance group comprises: dividing the partitions in the second-type balance group into several subgroups according to a preset balancing target, and determining a target subgroup with the largest ratio of the number of partitions on the destination log stream to the number of partitions on the source log stream, so as to transfer several partitions of the target subgroup on the source log stream to the destination log stream.

6. The method according to claim 3, further comprising: calculating the average number of partitions of the balance group across the log streams; calculating a first difference between the number of partitions on the source log stream and the average number of partitions, and a second difference between the average number of partitions and the number of partitions on the destination log stream; and determining the smaller of the first difference and the second difference as the partition transfer number from the source log stream to the destination log stream.

7. The method according to claim 6, wherein transferring several partitions of the target subgroup on the source log stream to the destination log stream comprises: repeating the following step until the number of partitions transferred from the source log stream to the destination log stream reaches the partition transfer number: transferring any partition of the target subgroup on the source log stream to the destination log stream.

8. The method according to claim 1, wherein the source log stream is the log stream containing the most partitions, and the destination log stream is the log stream containing the fewest partitions; and determining whether the total number of partitions on each log stream is balanced, and if not, performing inter-group balancing on the balance groups, comprises: determining whether the difference between the number of partitions on the source log stream and the number of partitions on the destination log stream is greater than a preset threshold; if the difference is greater than the threshold, determining that the log streams are unbalanced; and transferring several partitions on the source log stream to the destination log stream.

9. The method according to claim 8, further comprising: calculating the average number of partitions on the log streams; calculating a first difference between the number of partitions on the source log stream and the average number of partitions, and a second difference between the average number of partitions and the number of partitions on the destination log stream; and determining the smaller of the first difference and the second difference as the partition transfer number from the source log stream to the destination log stream.

10. The method according to claim 9, wherein transferring several partitions on the source log stream to the destination log stream comprises: traversing each balance group, and determining a balance group whose number of partitions on the source log stream is greater than its number of partitions on the destination log stream as a target balance group; and transferring any partition of the target balance group on the source log stream to the destination log stream, until the number of partitions transferred from the source log stream to the destination log stream reaches the partition transfer number.

11. The method according to claim 10, wherein transferring any partition of the target balance group on the source log stream to the destination log stream comprises: when the target balance group is a first-type balance group composed of non-partitioned tables, dividing the non-partitioned tables belonging to the same database instance in the first-type balance group into the same subgroup, and determining a target subgroup with the smallest ratio of the number of partitions on the destination log stream to the number of partitions on the source log stream, so as to transfer any partition of the target subgroup on the source log stream to the destination log stream; and when the target balance group is a second-type balance group composed of partitions in a partitioned table, dividing the partitions in the second-type balance group into several subgroups according to a preset balancing target, and determining a target subgroup with the largest ratio of the number of partitions on the destination log stream to the number of partitions on the source log stream, so as to transfer any partition of the target subgroup on the source log stream to the destination log stream.

12. A partition balancing device for a distributed database, wherein data stored in the database is divided into several partitions, and data operations on the partitions are recorded in several log streams; the device comprising: a balance group construction module, which constructs balance group information to assign each partition in the database to a corresponding balance group; an intra-group balancing module, which determines whether the number of partitions of each balance group on each log stream is balanced and, if not, performs intra-group balancing on that balance group; and an inter-group balancing module, which determines whether the total number of partitions on each log stream is balanced and, if not, performs inter-group balancing on the balance groups; wherein the intra-group balancing and the inter-group balancing comprise: transferring a specified partition between a source log stream and a destination log stream.

13. A computer program product comprising a computer program / instructions that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 11.

14. An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor implements the method according to any one of claims 1 to 11 by executing the executable instructions.

15. A computer-readable storage medium having computer instructions stored thereon, which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 11.