A distributed database data synchronization method, apparatus, device, and readable medium
By setting rewrite version numbers and log sequence numbers as markers in the distributed database, data consistency issues in cases of network partitions and system crashes are resolved. This enables handshakes to be completed under any circumstances, reducing the frequency and cost of full synchronization and improving database availability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ALIBABA CLOUD COMPUTING CO LTD
- Filing Date
- 2023-03-09
- Publication Date
- 2026-06-12
AI Technical Summary
Existing distributed databases struggle to maintain data consistency in situations such as network partitioning, changes in instance topology, or system crashes, leading to frequent full synchronizations that incur high costs and performance losses.
By setting markers for rewrite version numbers and log sequence numbers, incremental or full synchronization can be determined based on these markers, ensuring that a handshake can be completed under any circumstances and reducing the impact of network partitions, instance topology changes, and system crashes on database availability.
It enables the handshake to be completed with a single point (position marker) under any circumstances, significantly reducing the impact of network partitions, instance topology changes, and downtime on database availability, as well as the frequency and cost of full synchronization.

Figure CN116361387B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of distributed database technology, and in particular relates to a distributed database data synchronization method, apparatus, device, and readable medium.
Background Technology
[0002] In-memory databases typically adopt a distributed deployment to improve availability, which requires position markers (points) to be compared between the distributed instances to verify data consistency.
[0003] Existing solutions mark positions only with a monotonically increasing number, which easily triggers full resynchronization, and full resynchronization is costly. Furthermore, because only limited point information is kept in memory, a requested point is often unavailable, so data consistency typically cannot be maintained after network interruptions or changes in instance topology.
[0004] The information disclosed in this background section is intended only to enhance the understanding of the overall background of the invention and should not be construed as an admission or in any way implying that the information constitutes prior art known to those skilled in the art.
Summary of the Invention
[0005] The purpose of this invention is to provide a distributed database data synchronization method, apparatus, device, and readable medium. Rewrite version numbers and log sequence numbers are used to mark logs in the database, and incremental or full synchronization is chosen based on these points, so that the handshake can be completed with a single point under any circumstances, significantly reducing the impact of network partitions, instance topology changes, and downtime on database availability.
[0006] To achieve the above objectives, one aspect of this invention provides a distributed database data synchronization method, comprising the following steps performed on the primary database: receiving a backup database position sent by a backup database, wherein the backup database position is the position corresponding to the latest log in the backup database; searching for the corresponding log in the primary database based on the rewrite version number and log sequence number in the backup database position; and determining whether the position of the log in the primary database is consistent with the position of the backup database; if the position of the log in the primary database is consistent with the position of the backup database, sending a continue flag to the backup database, and starting from the next log corresponding to the position, sending data to the backup database for incremental synchronization.
[0007] In some implementations, the following steps are also performed on the primary database: in response to receiving a user update command, writing the command to the incremental database, setting a point for the command, and generating a log from the command and its point, wherein the database sequence number in the point is set to the unique identifier of the primary database, the rewrite version number in the point is set to 0, and the log sequence number in the point is set to the maximum log sequence number in the incremental database plus one; periodically synchronizing the logs in the incremental database to the full database; in response to each synchronization operation, updating the rewrite version number in all points of the full database to the pre-synchronization rewrite version number plus one, and assigning the points of the newly synchronized logs consecutive log sequence numbers starting from the pre-synchronization maximum log sequence number plus one; and periodically deleting the already-synchronized logs from the incremental database in fixed-size batches.
[0008] In some implementations, searching for the corresponding log in the primary database based on the rewrite version number and log sequence number in the standby database position includes: determining, based on the rewrite version number in the standby database position, whether the corresponding log is in the incremental database of the primary database; if the rewrite version number is zero, confirming that the corresponding log is in the incremental database of the primary database and searching the incremental database for it based on the log sequence number in the standby database position; if the rewrite version number is not zero, confirming that the corresponding log is in the full database of the primary database and searching the full database for it based on the log sequence number in the standby database position.
[0009] In some implementations, if the log position in the primary database matches the position in the standby database, a continue flag is sent to the standby database, and incremental synchronization is performed starting from the next log entry corresponding to the specified position. This includes: if the log position in the primary database matches the position in the standby database and the log is in the full database, a continue flag is sent to the standby database, and incremental synchronization is performed starting from the next log entry corresponding to the specified position until the last log entry in the full database is sent, and then logs are continuously sent to the standby database starting from the first log entry in the incremental database; if the log position in the primary database matches the position in the standby database and the log is in the incremental database, a continue flag is sent to the standby database, and incremental synchronization is performed starting from the next log entry corresponding to the specified position.
[0010] In some implementations, the method further includes: if the log position in the primary database is inconsistent with the log position in the backup database, sending a full synchronization flag to the backup database to clear its logs; sending logs to the backup database starting from the first log in the full database of the primary database until the last log in the full database is sent; and continuously sending logs to the backup database starting from the first log in the incremental database of the primary database.
[0011] In some implementations, the method further includes: in response to a master-slave switchover operation, updating the database sequence number in the points of the full database of the new master database to the unique identifier sequence of the new master database, and updating the rewrite version in the points of the full database of the new master database to 1.
[0012] In some implementations, the method further includes: in response to a primary / standby switchover operation, using the point corresponding to the latest log in the new standby database as the standby database point, and sending the standby database point to the new primary database to request data synchronization.
[0013] In another aspect of this invention, a distributed database data synchronization device is provided, comprising: a judgment module configured to receive a standby database position sent by a standby database, wherein the standby database position is the position corresponding to the latest log in the standby database, to search for the corresponding log in the primary database based on the rewrite version number and log sequence number in the standby database position, and to determine whether the position of the log in the primary database is consistent with the standby database position; and an incremental synchronization module configured to, if the position of the log in the primary database is consistent with the standby database position, send a continue flag to the standby database and perform incremental synchronization by sending data to the standby database starting from the next log after that position.
[0014] In another aspect of the present invention, a computer device is provided, comprising: at least one processor; and a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, implementing the steps of the above-described method.
[0015] In another aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method steps.
[0016] The present invention has at least the following beneficial technical effects: by using the rewrite version number and log sequence number to mark the logs in the database and choosing incremental or full synchronization based on the point, the handshake can be completed with a single point under any circumstances, significantly reducing the impact of network partitions, instance topology changes, downtime, and similar events on database availability.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other embodiments can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a schematic diagram of an embodiment of the distributed database data synchronization method provided by the present invention;
[0019] Figure 2 is a schematic diagram of the master database processing flow of an embodiment of the distributed database data synchronization method provided by the present invention;
[0020] Figure 3 is a schematic diagram of the backup database processing flow of an embodiment of the distributed database data synchronization method provided by the present invention;
[0021] Figure 4 is a schematic diagram of the database storage state of an embodiment of the distributed database data synchronization method provided by the present invention;
[0022] Figure 5 is a schematic diagram of the database storage state of an embodiment of the distributed database data synchronization method provided by the present invention;
[0023] Figure 6 is a schematic diagram of the database storage state of an embodiment of the distributed database data synchronization method provided by the present invention;
[0024] Figure 7 is a schematic diagram of the database storage state of an embodiment of the distributed database data synchronization method provided by the present invention;
[0025] Figure 8 is a schematic diagram of the database storage state of an embodiment of the distributed database data synchronization method provided by the present invention;
[0026] Figure 9 is a schematic diagram of an embodiment of the distributed database data synchronization device provided by the present invention;
[0027] Figure 10 is a schematic diagram of an embodiment of the computer device provided by the present invention;
[0028] Figure 11 is a schematic diagram of an embodiment of the computer-readable storage medium provided by the present invention.
Detailed Implementation
[0029] To make the objectives, technical solutions, and advantages of this invention clearer, the specific embodiments of this invention are described in detail below. However, it should be understood that the scope of protection of this invention is not limited to the specific embodiments.
[0030] It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two non-identical entities or parameters that share the same name. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; subsequent embodiments will not repeat this explanation.
[0031] Unless otherwise expressly stated, throughout the specification and claims, the term "comprising" or its variations such as "including" or "comprises" shall be understood to include the stated elements or components without excluding other elements or other components.
[0032] To help those skilled in the art understand the technical solutions provided in the embodiments of this application, the relevant technologies are briefly described below. Existing distributed databases such as Redis maintain data consistency by marking the position of the in-memory database with replication_id:offset (replication ID and offset). However, this approach easily triggers full synchronization, and only full synchronization is possible after a system crash, after the master database restarts, or during a master-slave switchover. Full synchronization in turn blocks engine execution, consumes a lot of CPU, takes a long time to package the snapshot, and consumes a large amount of memory.
[0033] Based on the above objectives, a first aspect of the present invention provides an embodiment of a distributed database data synchronization method. Figure 1 is a schematic diagram of an embodiment of the distributed database data synchronization method provided by the present invention. As shown in Figure 1, the distributed database data synchronization method of this invention includes the following steps:
[0034] 001. Receive the standby database position sent by the standby database. The standby database position is the position corresponding to the latest log in the standby database. Based on the rewrite version number and log sequence number in the standby database position, find the corresponding log in the primary database and determine whether the position of the log in the primary database is consistent with the standby database position.
[0035] In this embodiment, both the primary and standby databases are in-memory databases, meaning all data is stored in memory. The primary database accepts user read and write requests, while the standby database typically does not accept user requests. Data in the primary database is continuously synchronized to the standby database, so the data in the standby database is generally consistent with the primary database. The point is the Global Binlog Identifier (GBID), whose format is replica_id-rewritten_version-sequence_id (database sequence number-rewrite version number-log sequence number). For example, 16D1BF15B92B0DB7-5-2353426 denotes the 2353426th log entry after the 5th rewrite of the database identified by 16D1BF15B92B0DB7. A GBID uniquely identifies a position in the complete binary log (binlog), that is, a data offset within the database, so different instances can compare their GBIDs to determine whether their data is consistent. Adding the rewrite version number field ensures that every log position is marked by a unique point, which allows the database to complete a handshake with a single GBID under any circumstances.
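The GBID format above can be modeled as a small value type. The sketch below is a hedged illustration rather than Tair's implementation; the class name `GBID` and the `parse` helper are hypothetical. Using a named tuple makes equality compare all three fields, which is exactly the consistency rule stated in [0037].

```python
from typing import NamedTuple


class GBID(NamedTuple):
    """Hypothetical model of a point: replica_id-rewritten_version-sequence_id."""

    replica_id: str         # database sequence number of the instance that wrote the log
    rewritten_version: int  # rewrite version number (0 = incremental database)
    sequence_id: int        # log sequence number

    @classmethod
    def parse(cls, text: str) -> "GBID":
        # Split from the right so only the last two '-' separate the fields.
        replica_id, version, seq = text.rsplit("-", 2)
        return cls(replica_id, int(version), int(seq))

    def __str__(self) -> str:
        return f"{self.replica_id}-{self.rewritten_version}-{self.sequence_id}"


point = GBID.parse("16D1BF15B92B0DB7-5-2353426")
print(point.rewritten_version, point.sequence_id)  # 5 2353426
```

Splitting from the right is an assumption that keeps the parser robust if a replica_id ever contained a hyphen; the hexadecimal identifiers in the examples never do.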
[0036] 002. If the log position in the primary database is consistent with that in the standby database, a continuation flag is sent to the standby database, and incremental synchronization is performed starting from the next log entry corresponding to the position.
[0037] In this embodiment, the standby database sends its latest GBID point to the primary database. The primary database checks if there is a corresponding GBID point in the full binary log on the persistent medium. If so, it continuously sends the full binary log after this point to the standby database for incremental resynchronization. GBID point consistency is confirmed if and only if the database sequence number, rewrite version number, and log sequence number are all equal.
[0038] Figure 2 is a schematic diagram of the master database processing flow of an embodiment of the distributed database data synchronization method provided by the present invention. In this embodiment, as shown in Figure 2, the primary database receives the GBID point sent by the standby database, referred to as GBID_A, and checks the value of rewritten_version in it. If the value is 0, the primary database searches the incremental database incr.binlog for the entry with the same log sequence number sequence_id and takes that entry's GBID, referred to as GBID_B. It then determines whether GBID_A is consistent with GBID_B. If they are consistent, it sends the continue flag +CONTINUE to the standby database and continuously sends binlog entries to the standby database starting from the entry after GBID_B.
[0039] In this embodiment, with continued reference to Figure 2, if rewritten_version in GBID_A is 0 and GBID_A is inconsistent with GBID_B, the primary database enters the full resynchronization phase and sends the full synchronization flag +FULLRESYNC to the standby database. It sends binlog entries to the standby database starting from the first entry in the full database base.binlog; after the full database log has been sent, it continuously sends binlog entries starting from the first entry in the incremental database incr.binlog.
[0040] In this embodiment, with continued reference to Figure 2, if rewritten_version in GBID_A is greater than 0, the primary database searches the full database base.binlog for the entry with the same sequence_id and takes that entry's GBID, referred to as GBID_C. It then determines whether GBID_A is consistent with GBID_C. If they are consistent, it sends the continue flag +CONTINUE to the standby database and sends binlog entries starting from the entry after GBID_C; after the full database log has been sent, it continuously sends binlog entries starting from the first entry in the incremental database incr.binlog.
[0041] In this embodiment, with continued reference to Figure 2, if rewritten_version in GBID_A is greater than 0 and GBID_A is inconsistent with GBID_C, the primary database enters the full resynchronization phase and sends the full synchronization flag +FULLRESYNC to the standby database. It sends binlog entries to the standby database starting from the first entry in the full database base.binlog; after the full database log has been sent, it continuously sends binlog entries starting from the first entry in the incremental database incr.binlog.
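The four branches of the primary's processing flow reduce to one lookup-and-compare routine. The sketch below is a hypothetical in-memory model: each binlog file is a Python list of `(GBID, command)` pairs, and the function returns the flag plus the entries to send, whereas the described system streams entries from the persistent medium. Treating a missing `sequence_id` as a mismatch follows [0058]; the function and variable names are illustrative only.

```python
from typing import List, NamedTuple, Optional, Tuple


class GBID(NamedTuple):
    replica_id: str
    rewritten_version: int
    sequence_id: int


Log = Tuple[GBID, str]  # (point, command)


def find(store: List[Log], sequence_id: int) -> Optional[int]:
    """Index of the entry carrying sequence_id in one binlog file, if any."""
    for i, (gbid, _) in enumerate(store):
        if gbid.sequence_id == sequence_id:
            return i
    return None


def handshake(gbid_a: GBID, base: List[Log], incr: List[Log]) -> Tuple[str, List[Log]]:
    """Primary-side reply to the standby's point GBID_A: (flag, entries to send)."""
    in_incr = gbid_a.rewritten_version == 0  # version 0 lives in incr.binlog
    store = incr if in_incr else base
    i = find(store, gbid_a.sequence_id)
    # GBID_B / GBID_C must equal GBID_A on all three fields.
    if i is not None and store[i][0] == gbid_a:
        if in_incr:
            return "+CONTINUE", incr[i + 1:]
        return "+CONTINUE", base[i + 1:] + incr  # rest of base, then all of incr
    return "+FULLRESYNC", base + incr  # resend everything from the first entry


# Figure 5 state: ten base.binlog entries at version 2, incr.binlog holding 6..10.
base = [(GBID("XXXXXA", 2, s), f"base-cmd-{s}") for s in range(1, 11)]
incr = [(GBID("XXXXXA", 0, s), f"incr-cmd-{s}") for s in range(6, 11)]
```

With this state, `handshake(GBID("XXXXXA", 2, 5), base, incr)` returns `+CONTINUE` with base entries 6 to 10 followed by the five incremental entries, matching [0055]. The same routine also illustrates the resumable full resynchronization of [0045]: a standby interrupted after applying base entry XXXXXA-2-7 hands that point back and receives `+CONTINUE` from entry 8 onward. The linear `find` keeps the sketch short; an index keyed by `sequence_id` would be the natural choice for large logs.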
[0042] Figure 3 is a schematic diagram of the backup database processing flow of an embodiment of the distributed database data synchronization method provided by the present invention. As shown in Figure 3, the standby database initiates a handshake request to the primary database, sends its current GBID point, and checks the primary database's response. If the response is the continue flag +CONTINUE, the standby database receives the binlog and executes the commands in it. If the response is the full synchronization flag +FULLRESYNC, the standby database clears all of its data, then receives the binlog and executes the commands in it.
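The standby side of the handshake is a simple dispatch on the reply flag. In this hypothetical sketch the standby's data is a Python dict, entries arrive as `(GBID string, command)` pairs, and only `SET key value` commands are executed; none of these names or formats come from the source. The standby records the GBID of the last applied entry so it can present that point in its next handshake.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, str]  # (GBID string, command): hypothetical wire format


class Standby:
    """Hypothetical standby that applies the primary's handshake reply."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.latest_gbid: Optional[str] = None

    def on_reply(self, flag: str, stream: List[Entry]) -> None:
        if flag == "+FULLRESYNC":
            # Full resynchronization: discard all local data before replaying.
            self.data.clear()
            self.latest_gbid = None
        elif flag != "+CONTINUE":
            raise ValueError(f"unexpected handshake reply: {flag!r}")
        for gbid, command in stream:
            parts = command.split(" ", 2)
            if len(parts) != 3 or parts[0] != "SET":  # sketch executes SET only
                raise ValueError(f"unsupported command: {command!r}")
            self.data[parts[1]] = parts[2]
            self.latest_gbid = gbid  # point to present in the next handshake


standby = Standby()
standby.on_reply("+CONTINUE", [("XXXXXA-0-1", "SET K_011 V_001")])
standby.on_reply("+FULLRESYNC", [("XXXXXA-2-1", "SET K_001 V_001")])
print(standby.data, standby.latest_gbid)  # {'K_001': 'V_001'} XXXXXA-2-1
```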
[0043] In this embodiment, after the database restarts, it loads all binlogs from the persistent media and uses the last binlog entry as its latest GBID. When this database, acting as a backup, attempts to connect to other databases, it sends this GBID for handshaking. When this database, acting as the primary, accepts handshakes from other databases using GBIDs, it also checks the persistent media for a matching GBID. Therefore, the GBID-based point system is restart-safe, and restarting will not lead to the loss of points.
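Recovering the latest GBID after a restart amounts to replaying the persisted binlogs in order and keeping the point of the last entry. A minimal sketch, assuming a hypothetical one-entry-per-line `<GBID> <command>` text layout (the source does not describe the on-disk format) and that base.binlog is replayed before incr.binlog:

```python
from typing import Iterable, Optional


def latest_gbid(*binlogs: Iterable[str]) -> Optional[str]:
    """Replay binlogs in order and return the GBID of the last entry seen.

    Assumes the hypothetical layout "<GBID> <command>" on each non-empty line.
    Returns None when every binlog is empty.
    """
    last: Optional[str] = None
    for binlog in binlogs:
        for line in binlog:
            line = line.strip()
            if line:
                last = line.split(maxsplit=1)[0]
    return last


# Usage with files on the persistent medium (file names from the description):
# with open("base.binlog") as base, open("incr.binlog") as incr:
#     point = latest_gbid(base, incr)
```

Accepting any iterable of lines means an open file object and an in-memory list behave the same, which keeps the replay logic independent of how the binlog is stored.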
[0044] Existing Redis methods are prone to full resynchronization because they store relatively little point information in memory, making it easy to encounter situations where points cannot be found. In contrast, the point system in this application guarantees that all current point information is stored on the persistent medium, in addition to retaining a certain amount of historical point information, virtually eliminating the possibility of point loss.
[0045] Full resynchronization in existing Redis is costly, whereas the full resynchronization proposed in this application is cheaper: first, it does not trigger a fork; second, it naturally avoids packaging dump.rdb; third, it does not buffer binlog during the process, since all binlog is appended to the persistent medium; fourth, because base.binlog also carries point information, an interrupted full resynchronization can resume from where it failed last time.
[0046] In existing Redis implementations, a master restart or crash, and the master-slave failover that follows, leaves only full resynchronization available. This application persists point information to a persistent storage medium, so a master restart does not lose point information and the slave can still perform low-cost partial resynchronization. When the master crashes for any reason, external components can promote the slave to master. When the old master later restarts and handshakes with the new master, it can still enter the partial resynchronization phase, because both master and slave can retrieve point information from the persistent storage medium after a restart.
[0047] In some embodiments of the present invention, the following steps are further performed on the master database: in response to receiving a user update command, writing the command to the incremental database, setting a point for the command, and generating a log from the command and its point, wherein the database sequence number in the point is set to the unique identifier of the master database, the rewrite version number in the point is set to 0, and the log sequence number in the point is set to the maximum log sequence number in the incremental database plus one; periodically synchronizing the logs in the incremental database to the full database; in response to each synchronization operation, updating the rewrite version number in all points of the full database to the pre-synchronization rewrite version number plus one, and assigning the points of the newly synchronized logs consecutive log sequence numbers starting from the pre-synchronization maximum log sequence number plus one; and periodically deleting the already-synchronized logs from the incremental database in fixed-size batches.
[0048] Figure 4 is a schematic diagram of the database storage state of an embodiment of the distributed database data synchronization method provided by the present invention. As shown in Figure 4, in this embodiment XXXXXA is the database sequence number replica_id; because all binlog entries are currently generated by this database, replica_id is always XXXXXA. Five commands (SET K_001 V_001, SET K_002 V_002, SET K_003 V_003, SET K_004 V_004, SET K_005 V_005) are written to the database and, through a binlog rewrite, into the full database base.binlog. Five user update commands (SET K_011 V_001, SET K_012 V_002, SET K_013 V_003, SET K_014 V_004, SET K_015 V_005) are then received and written to the incremental database incr.binlog. At this point the database has performed only one binlog rewrite, so rewritten_version in base.binlog is 1, while rewritten_version in incr.binlog is always 0. In both base.binlog and incr.binlog, sequence_id increments from 1.
[0049] Tair, a cloud-native in-memory database, implements a log-based synchronization protocol, ensuring that all database data is stored in binlog format on persistent media. Because binlog is written sequentially and append-only, it can handle the high write pressure of an in-memory database. Since the incremental database `incr.binlog` records all real-time user requests, Tair periodically rewrites the binlog to prevent it from continuously expanding. This rewriting process deletes all binlog entries on persistent media and generates new full database `base.binlog` and incremental database `incr.binlog`.
[0050] Figure 5 is a schematic diagram of the database storage state of an embodiment of the distributed database data synchronization method provided by the present invention. As shown in Figure 5, in this embodiment, after the second binlog rewrite, rewritten_version in every point of the full database is incremented by 1, to 2. The rewrite also writes the commands from the incremental database into the full database, so after the second rewrite the full database holds 10 commands. After the rewrite, five more commands (SET K_016 V_001, SET K_017 V_002, SET K_018 V_003, SET K_019 V_004, SET K_020 V_005) are written and recorded in the incremental database, with sequence_id continuing to auto-increment.
[0051] In some embodiments of the present invention, finding the corresponding log in the primary database based on the rewrite version number and log sequence number in the standby database position includes: determining, based on the rewrite version number in the standby database position, whether the corresponding log is in the incremental database of the primary database; in response to the rewrite version number being zero, confirming that the corresponding log is in the incremental database of the primary database and searching the incremental database for it based on the log sequence number in the standby database position; in response to the rewrite version number not being zero, confirming that the corresponding log is in the full database of the primary database and searching the full database for it based on the log sequence number in the standby database position.
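The point-assignment rules of [0047] can be checked against Figures 4 and 5. The sketch below is a hypothetical in-memory model with illustrative class and method names: `write` assigns version-0 points, `rewrite` copies not-yet-synchronized incremental logs into the full database and bumps every full-database version, and `trim` stands in for the periodic deletion of synchronized incremental logs. When `trim` runs is an assumption; the schedule in the demo is one that reproduces both figures, including the incremental log XXXXXA-0-6 cited in [0056].

```python
from typing import List, NamedTuple


class GBID(NamedTuple):
    replica_id: str
    rewritten_version: int
    sequence_id: int


class Entry(NamedTuple):
    gbid: GBID
    command: str


class PrimaryBinlog:
    """Hypothetical model of point assignment in base.binlog / incr.binlog."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.base: List[Entry] = []  # full database: rewritten_version >= 1
        self.incr: List[Entry] = []  # incremental database: rewritten_version == 0
        self.synced = 0              # leading incr entries already copied to base

    def write(self, command: str) -> GBID:
        # sequence_id = maximum sequence_id currently in incr.binlog + 1
        seq = max((e.gbid.sequence_id for e in self.incr), default=0) + 1
        gbid = GBID(self.replica_id, 0, seq)
        self.incr.append(Entry(gbid, command))
        return gbid

    def rewrite(self) -> None:
        old = self.base[0].gbid.rewritten_version if self.base else 0
        version = old + 1
        # Every existing full-database point moves to the new rewrite version.
        self.base = [Entry(e.gbid._replace(rewritten_version=version), e.command)
                     for e in self.base]
        # Newly synchronized logs continue from the previous maximum sequence_id.
        next_seq = max((e.gbid.sequence_id for e in self.base), default=0) + 1
        for e in self.incr[self.synced:]:
            self.base.append(Entry(GBID(e.gbid.replica_id, version, next_seq), e.command))
            next_seq += 1
        self.synced = len(self.incr)

    def trim(self) -> None:
        # Periodic deletion of incr entries already present in base.binlog.
        del self.incr[:self.synced]
        self.synced = 0


p = PrimaryBinlog("XXXXXA")
for i in range(1, 6):
    p.write(f"SET K_{i:03d} V_{i:03d}")
p.rewrite()
p.trim()  # Figure 4: base.binlog at version 1, then K_011..K_015 in incr
for i in range(1, 6):
    p.write(f"SET K_{10 + i:03d} V_{i:03d}")
p.rewrite()  # second rewrite: ten entries at version 2
for i in range(1, 6):
    p.write(f"SET K_{15 + i:03d} V_{i:03d}")
p.trim()  # Figure 5 state
```

After the demo, `p.base` holds XXXXXA-2-1 to XXXXXA-2-10 and `p.incr` holds XXXXXA-0-6 to XXXXXA-0-10, consistent with the lookups worked through in [0052] to [0060].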
[0052] In this embodiment, with continued reference to Figure 5, when the primary database receives the GBID point XXXXXA-0-8 from the standby database, the rewrite version number is 0, so the primary database confirms that the corresponding log is in its incremental database and searches the incremental database for the log with log sequence number 8.
[0053] In this embodiment, with continued reference to Figure 5, if the primary database receives the GBID point XXXXXA-2-5 from the standby database, the rewrite version number is 2, so the primary database confirms that the corresponding log is in its full database and searches the full database for the log with log sequence number 5.
[0054] In some embodiments of the present invention, if the log position in the primary database is consistent with the log position in the standby database, a continue flag is sent to the standby database, and incremental synchronization is performed starting from the next log entry corresponding to the position. This includes: if the log position in the primary database is consistent with the log position in the standby database and the log is in the full database, a continue flag is sent to the standby database, and incremental synchronization is performed starting from the next log entry corresponding to the position until the last log entry in the full database is sent, and logs are continuously sent to the standby database starting from the first log entry in the incremental database; if the log position in the primary database is consistent with the log position in the standby database and the log is in the incremental database, a continue flag is sent to the standby database, and incremental synchronization is performed starting from the next log entry corresponding to the position.
[0055] In this embodiment, with continued reference to Figure 5, when the primary database receives the GBID point XXXXXA-2-5 from the standby database, it finds that the fifth log in its full database has the GBID XXXXXA-2-5. Since the two GBIDs match, it sends logs to the standby database starting from the sixth log in the full database until the tenth log in the full database has been sent, and then sends the logs in the incremental database one by one. Incremental synchronization is complete once the five logs in the incremental database have been sent.
[0056] In this embodiment, with continued reference to Figure 5, when the primary database receives the GBID point XXXXXA-0-6 from the standby database, it finds that the first log in its incremental database has the GBID XXXXXA-0-6. Since the two GBIDs match, it sends data to the standby database starting from the second log in the incremental database until the fifth log in the incremental database has been sent, completing the incremental synchronization.
[0057] In some embodiments of the present invention, the method further includes: if the log position in the primary database is inconsistent with the log position in the standby database, a full synchronization flag is sent to the standby database to clear its logs; logs are sent to the standby database starting from the first log in the full database of the primary database until the last log in the full database is sent; and logs are continuously sent to the standby database starting from the first log in the incremental database of the primary database.
[0058] In this embodiment, the standby database sends its latest GBID point to the primary database. The primary database checks whether there is a corresponding GBID point in the full binary log on the persistent medium. If not, it sends all full binary logs to the standby database from the beginning to perform full resynchronization.
[0059] In this embodiment, with continued reference to Figure 5, when the primary database receives the GBID point XXXXXA-1-5 from the standby database, it finds that the GBID of the fifth log in its full database is XXXXXA-2-5. Since the two GBIDs do not match, the primary database sends a full synchronization flag to the standby database and sends logs starting from the first log in its full database. After the 10 logs in the full database have been sent, it sends the logs in the incremental database one by one; full synchronization is complete once the 5 logs in the incremental database have been sent.
[0060] In this embodiment, with continued reference to Figure 5, when the primary database receives the GBID point XXXXXB-0-6 from the standby database, it finds that the GBID of the first log in its incremental database is XXXXXA-0-6. Since the two GBIDs do not match, the primary database sends a full synchronization flag to the standby database and sends logs starting from the first log in its full database. After the 10 logs in the full database have been sent, it sends the logs in the incremental database one by one; full synchronization is complete once the 5 logs in the incremental database have been sent.
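The choice between the continue flag and the full synchronization flag in [0055] to [0060] can be sketched as below. This is a hedged illustration, not the patented implementation: points are modeled as a hypothetical `Point(replica_id, rewrite_version, seq)` tuple parsed from the `<replica>-<rewrite version>-<seq>` form used in the examples, the database to search is chosen by the rule in claim 1 (rewrite version zero means the incremental database, otherwise the full database), and the flag names `CONTINUE` and `FULL_SYNC` are placeholders. The example data reuses an assumed layout of ten full-database points and five incremental points.

```python
from typing import List, NamedTuple, Optional


class Point(NamedTuple):
    replica_id: str       # database sequence number (id of the primary that wrote the log)
    rewrite_version: int  # 0 = incremental database, non-zero = full database
    seq: int              # log sequence number


def parse_point(text: str) -> Point:
    # "XXXXXA-2-5" -> Point("XXXXXA", 2, 5); rsplit tolerates hyphens in the id.
    replica_id, version, seq = text.rsplit("-", 2)
    return Point(replica_id, int(version), int(seq))


def locate(full: List[Point], incr: List[Point], standby: Point) -> Optional[Point]:
    # Claim 1: the rewrite version selects the database, the sequence number selects the log.
    source = incr if standby.rewrite_version == 0 else full
    return next((p for p in source if p.seq == standby.seq), None)


def decide(full: List[Point], incr: List[Point], standby_text: str) -> str:
    standby = parse_point(standby_text)
    found = locate(full, incr, standby)
    # Every field of the point must match; otherwise (or if nothing was found) resync fully.
    return "CONTINUE" if found == standby else "FULL_SYNC"


def full_sync_stream(full: List[Point], incr: List[Point]) -> List[Point]:
    # After the standby clears its logs: the whole full database, then the whole incremental one.
    return full + incr


full = [Point("XXXXXA", 2, i) for i in range(1, 11)]
incr = [Point("XXXXXA", 0, i) for i in range(6, 11)]

for received in ["XXXXXA-2-5", "XXXXXA-0-6", "XXXXXA-1-5", "XXXXXB-0-6"]:
    print(received, decide(full, incr, received))
```

With this data the first two points match, as in [0055] and [0056]. The third differs in rewrite version and the fourth in database sequence number, as in [0059] and [0060], so both lead to resending all 15 logs.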
[0061] In some embodiments of the present invention, the method further includes: in response to a primary/standby switchover operation, updating the database sequence number in the points of the new primary database's full database to the unique identifier sequence of the new primary database, and updating the rewrite version in those points to 1.
[0062] Figures 6 to 8 illustrate the database storage states in an embodiment of the distributed database data synchronization method provided by the present invention. As shown in Figure 6, in this embodiment the left side is the primary database and the right side is the standby database. When a primary/standby switchover occurs, as shown in Figure 8, the left side becomes the new standby database and the right side becomes the new primary database. The full database of the new primary database is rewritten: the database sequence number is updated to the new primary database's unique identifier sequence XXXXXB, and the rewrite version is updated to 1.
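The re-stamping of the new primary database's full database in [0061] and [0062] amounts to a single pass over its points. The sketch below is a minimal illustration with a hypothetical `Point` tuple and helper name; the text does not say whether log sequence numbers change at switchover, so this sketch assumes they are kept.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    replica_id: str       # database sequence number
    rewrite_version: int  # 0 = incremental database, non-zero = full database
    seq: int              # log sequence number


def rewrite_on_switchover(full: List[Point], new_primary_id: str) -> List[Point]:
    """Re-stamp every full-database point on the new primary ([0061]).

    Database sequence number -> new primary's unique id; rewrite version -> 1.
    Assumption: log sequence numbers are left unchanged.
    """
    return [p._replace(replica_id=new_primary_id, rewrite_version=1) for p in full]


# Full database held by XXXXXB before it is promoted (illustrative points).
full_on_b = [Point("XXXXXA", 2, i) for i in range(1, 4)]
restamped = rewrite_on_switchover(full_on_b, "XXXXXB")
print(restamped)
```

Because the rewrite version stays non-zero, these points remain on the full-database branch of the lookup in claim 1, while incremental-database points are not touched.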
[0063] In some embodiments of the present invention, the method further includes: in response to a primary / standby switchover operation, using the point corresponding to the latest log in the new standby database as the standby database point, and sending the standby database point to the new primary database to request data synchronization.
[0064] In this embodiment, with continued reference to Figures 6 to 8: as shown in Figure 6, the left side is the primary database and the right side is the standby database. The standby database also rewrites its binlog, so its full database may differ from that of the primary database. As shown in Figure 7, the standby database has rewritten its binlog, and the user has then written three commands (SET K_021V_001, SET K_022V_002, SET K_023V_003). At this point the full databases of the primary and standby databases are inconsistent, but the last few entries in their incremental databases are still consistent, so data consistency can still be determined from those entries. After each binlog rewrite, incr.binlog should therefore retain as many binlog entries as possible to improve the reconnection success rate. When the primary/standby switchover occurs, as shown in Figure 8, the new primary database is XXXXXB: all requests are sent to XXXXXB, and the replica_id in subsequent binlog entries also becomes XXXXXB. If the user continues to write three commands (SET K_014V_001, SET K_015V002, SET K_016V003), the replica_id field of these three entries is already XXXXXB, because the current primary database is XXXXXB and these three binlog entries were generated by XXXXXB and sent to XXXXXA. The rewritten_version field in the incremental database is always 0, indicating a user write request, while the sequence_id field keeps incrementing. Therefore, whether a primary/standby switchover or a crash and restart occurs, the standby database can always determine its current position, namely the GBID of its last binlog entry, and use it to handshake with the primary database; the primary database can then locate the corresponding position and begin partial resynchronization.
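How incremental points evolve across a switchover, as described in [0063] and [0064], can be sketched as follows. The hypothetical `Binlog` tuple mirrors the replica_id, rewritten_version and sequence_id fields named in the text; the helper names, the command strings, and the behavior when the incremental log is empty (numbering restarts at 1) are assumptions for illustration. For simplicity one list stands for the incremental log as replicated to the standby.

```python
from typing import List, NamedTuple


class Binlog(NamedTuple):
    replica_id: str         # primary that generated the entry
    rewritten_version: int  # always 0 in incr.binlog (user write request)
    sequence_id: int        # keeps incrementing, including across switchovers
    command: str


def append_user_write(incr: List[Binlog], primary_id: str, command: str) -> Binlog:
    # Next sequence id = current maximum in the incremental log + 1 (claim 2).
    # Assumption: an empty incremental log starts numbering at 1.
    next_seq = max((e.sequence_id for e in incr), default=0) + 1
    entry = Binlog(primary_id, 0, next_seq, command)
    incr.append(entry)
    return entry


def standby_point(logs: List[Binlog]) -> str:
    # The standby reports the GBID of its last binlog entry when it (re)connects.
    last = logs[-1]
    return f"{last.replica_id}-{last.rewritten_version}-{last.sequence_id}"


# Entries written while XXXXXA was primary (illustrative contents).
incr = [Binlog("XXXXXA", 0, s, f"SET a{s} v{s}") for s in range(1, 4)]

# After the switchover, XXXXXB is primary and stamps new writes with its own id.
for key in ("b1", "b2", "b3"):
    append_user_write(incr, "XXXXXB", f"SET {key} v")

print(standby_point(incr))  # → XXXXXB-0-6
```

Since sequence_id never resets and rewritten_version stays 0, the last entry always yields a point the new primary can look up in its incremental database.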
[0065] It should be noted that the steps in each embodiment of the above-mentioned distributed database data synchronization method can be interleaved, substituted, added, or deleted. Therefore, these reasonable permutations and combinations of the distributed database data synchronization method should also fall within the protection scope of this invention, and the protection scope of this invention should not be limited to the embodiments.
[0066] Based on the above objectives, a second aspect of the present invention provides a distributed database data synchronization device. Figure 9 is a schematic diagram of an embodiment of the distributed database data synchronization device provided by the present invention. As shown in Figure 9, the device of this embodiment includes the following modules: a judgment module 011, configured to receive the standby database position sent by the standby database, where the standby database position is the position corresponding to the latest log in the standby database, to search for the corresponding log in the primary database based on the rewrite version number and log sequence number in the standby database position, and to determine whether the position of that log in the primary database is consistent with the standby database position; and an incremental synchronization module 012, configured to send a continue flag to the standby database if the two positions are consistent, and to send logs to the standby database starting from the log after that position to perform incremental synchronization.
[0067] In view of the above objectives, a third aspect of the present invention provides a computer device. Figure 10 is a schematic diagram of an embodiment of the computer device provided by the present invention. As shown in Figure 10, the computer device of this embodiment includes: at least one processor 021; and a memory 022 storing computer instructions 023 executable on the processor, where the instructions, when executed by the processor, implement the steps of the above method.
[0068] The present invention also provides a computer-readable storage medium. Figure 11 is a schematic diagram of an embodiment of the computer-readable storage medium provided by the present invention. As shown in Figure 11, the computer-readable storage medium 031 stores a computer program 032 that, when executed by a processor, performs the method described above.
[0069] Finally, it should be noted that those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program for the distributed database data synchronization method can be stored in a computer-readable storage medium. When executed, the program can include the processes of the embodiments of the above methods. The storage medium for the program can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc. The above computer program embodiments can achieve the same or similar effects as any of the corresponding foregoing method embodiments.
[0070] Furthermore, the method disclosed in the embodiments of the present invention can also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. When the computer program is executed by the processor, it performs the functions defined in the method disclosed in the embodiments of the present invention.
[0071] Furthermore, the above-described method steps and system units can also be implemented using a controller and a computer-readable storage medium for storing a computer program that enables the controller to perform the functions of the above-described steps or units.
[0072] Those skilled in the art will also understand that the various exemplary logic blocks, modules, circuits, and algorithm steps described in conjunction with the disclosure herein can be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability between hardware and software, the functionality of various illustrative components, blocks, modules, circuits, and steps has been generally described. Whether this functionality is implemented as software or as hardware depends on the specific application and the design constraints imposed on the system as a whole. Those skilled in the art can implement the functionality in various ways for each specific application, but such implementation decisions should not be construed as departing from the scope of the embodiments disclosed herein.
[0073] In one or more exemplary designs, functionality may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, functionality may be stored as one or more instructions or code on or transmitted via a computer-readable medium. Computer-readable media include computer storage media and communication media, including any medium that facilitates the transfer of a computer program from one location to another. Storage media may be any available medium accessible to a general-purpose or special-purpose computer. By way of example, and not limitation, computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage devices, disk storage devices or other magnetic storage devices, or any other medium that may be used to carry or store the required program code in the form of instructions or data structures and is accessible to a general-purpose or special-purpose computer or a general-purpose or special-purpose processor. Furthermore, any connection may be appropriately referred to as computer-readable media. For example, if software is transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the aforementioned coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are all included in the definition of media. As used herein, disks and optical discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks typically reproduce data magnetically, while optical discs reproduce data optically using lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0074] The above are exemplary embodiments disclosed in this invention. However, it should be noted that various changes and modifications can be made without departing from the scope of the embodiments of this invention as defined by the claims. The functions, steps, and/or actions of the methods according to the disclosed embodiments described herein do not need to be performed in any particular order. Furthermore, although the elements disclosed in the embodiments of this invention may be described or claimed in the singular, they may be understood as plural unless explicitly limited to the singular.
[0075] It should be understood that, as used herein, the singular form “a” is intended to include the plural form as well, unless the context clearly supports an exception. It should also be understood that, as used herein, “and / or” refers to any and all possible combinations of one or more of the associated listed items.
[0076] The embodiment numbers disclosed in the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0077] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.
[0078] Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the invention (including the claims) is limited to these examples. Within the framework of the invention, technical features of the above embodiments or different embodiments can be combined, and many other variations of different aspects of the invention exist, which are not provided in the details for the sake of brevity. Therefore, any omissions, modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the invention should be included within the protection scope of the invention.
Claims
1. A method for synchronizing data in a distributed database, the method comprising performing the following steps on a primary database: receiving a standby database position sent by a standby database, the standby database position being the position corresponding to the latest log in the standby database; based on the rewrite version number and log sequence number in the standby database position, finding the corresponding log in the primary database and determining whether the position of that log in the primary database is consistent with the standby database position; and, if the log position in the primary database matches the standby database position, sending a continue flag to the standby database and performing incremental synchronization starting from the log entry after that position; wherein finding the corresponding log in the primary database based on the rewrite version number and log sequence number in the standby database position comprises: determining, based on the rewrite version number in the standby database position, whether the corresponding log is in the incremental database of the primary database; if the rewrite version number in the standby database position is zero, confirming that the corresponding log is in the incremental database of the primary database, and searching the incremental database for the corresponding log based on the log sequence number in the standby database position; and if the rewrite version number in the standby database position is not zero, confirming that the corresponding log is in the full database of the primary database, and searching the full database for the corresponding log based on the log sequence number in the standby database position.
2. The distributed database data synchronization method according to claim 1, further comprising performing the following steps on the primary database: in response to receiving a user update command, writing the command to the incremental database, setting a point for the command, and generating a log based on the command and its point, wherein the database sequence number in the point is set to the unique identifier sequence of the primary database, the rewrite version in the point is set to 0, and the log sequence number in the point is set to the maximum log sequence number in the incremental database plus one; periodically synchronizing logs from the incremental database to the full database; in response to each synchronization operation, updating the rewrite version number in all points of the full database to the rewrite version number of the full database points before the synchronization operation plus one, and setting the log sequence numbers in the points of the synchronized logs to increment sequentially starting from the maximum log sequence number before the synchronization operation plus one; and periodically deleting, in fixed quantities, the logs in the incremental database that have already been synchronized.
3. The distributed database data synchronization method according to claim 1 or 2, wherein sending a continue flag to the standby database and performing incremental synchronization starting from the log entry after the matched position, if the log position in the primary database matches the standby database position, comprises: if the log position in the primary database is consistent with the standby database position and the log is in the full database, sending a continue flag to the standby database, sending logs to the standby database starting from the log after that position until the last log in the full database has been sent, and then continuing to send logs to the standby database starting from the first log in the incremental database; and if the log position in the primary database matches the standby database position and the log is in the incremental database, sending a continue flag to the standby database and continuing to send logs to the standby database starting from the log after that position.
4. The distributed database data synchronization method according to claim 1 or 2, further comprising: if the log position in the primary database is inconsistent with the standby database position, sending a full synchronization flag to the standby database so that the standby database clears its logs; sending logs to the standby database starting from the first log entry in the primary database's full database until the last log entry in the full database has been sent; and continuing to send logs to the standby database starting from the first log entry in the primary database's incremental database.
5. The distributed database data synchronization method according to claim 1, further comprising: in response to a primary/standby switchover operation, updating the database sequence number in the points of the new primary database's full database to the unique identifier sequence of the new primary database, and updating the rewrite version in those points to 1.
6. The distributed database data synchronization method according to claim 1, further comprising: in response to a primary/standby switchover operation, using the point corresponding to the latest log in the new standby database as the standby database point, and sending the standby database point to the new primary database to request data synchronization.
7. A distributed database data synchronization device, comprising: a judgment module configured to receive a standby database position sent by a standby database, the standby database position being the position corresponding to the latest log in the standby database, to search for the corresponding log in a primary database based on the rewrite version number and log sequence number in the standby database position, and to determine whether the position of that log in the primary database is consistent with the standby database position; and an incremental synchronization module configured to, if the log position in the primary database is consistent with the standby database position, send a continue flag to the standby database and send logs to the standby database starting from the log after that position to perform incremental synchronization; wherein searching for the corresponding log in the primary database based on the rewrite version number and log sequence number in the standby database position comprises: determining, based on the rewrite version number in the standby database position, whether the corresponding log is in the incremental database of the primary database; if the rewrite version number in the standby database position is zero, confirming that the corresponding log is in the incremental database of the primary database, and searching the incremental database for the corresponding log based on the log sequence number in the standby database position; and if the rewrite version number in the standby database position is not zero, confirming that the corresponding log is in the full database of the primary database, and searching the full database for the corresponding log based on the log sequence number in the standby database position.
8. A computer device, comprising: at least one processor; and a memory storing computer instructions executable on the processor, wherein the instructions, when executed by the processor, implement the steps of the method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
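The periodic rewrite of claim 2 can be sketched as follows, again with a hypothetical `Point` tuple. The sketch assumes that all points in the full database share one rewrite version, that synchronized logs keep their database sequence number, and that the synchronized incremental logs are the oldest ones. The `batch` parameter standing in for the fixed deletion quantity is hypothetical; [0064] suggests keeping as many incremental entries as practical so that reconnections can still find a matching point.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    replica_id: str       # database sequence number
    rewrite_version: int  # 0 = incremental database, non-zero = full database
    seq: int              # log sequence number


def rewrite_full(full: List[Point], pending: List[Point]) -> List[Point]:
    """Synchronize `pending` incremental logs into the full database once."""
    # Rewrite version = previous full-database version + 1 (1 if the full database is empty).
    new_version = (full[0].rewrite_version if full else 0) + 1
    # Synchronized logs continue numbering from the previous maximum + 1.
    next_seq = max((p.seq for p in full), default=0) + 1
    rewritten = [p._replace(rewrite_version=new_version) for p in full]
    rewritten += [Point(p.replica_id, new_version, next_seq + i) for i, p in enumerate(pending)]
    return rewritten


def trim_synced(incr: List[Point], synced_count: int, batch: int) -> List[Point]:
    """Drop at most `batch` of the oldest incremental logs that are already synchronized."""
    return incr[min(synced_count, batch):]


full = [Point("XXXXXA", 2, i) for i in range(1, 4)]
incr = [Point("XXXXXA", 0, i) for i in range(7, 10)]
full = rewrite_full(full, incr[:2])               # first two incremental logs synchronized
incr = trim_synced(incr, synced_count=2, batch=1)  # delete one of them this round
```

Starting the full database at rewrite version 1 or higher keeps its points distinguishable from incremental points, whose rewrite version is always 0.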