Data replication method and apparatus, electronic device, and computer storage medium
By obtaining the binary log transaction and position information of the MySQL database, identifying and recording unexecuted transactions in the target database, the data replication problem of GTID unavailability is solved, and an efficient and reliable data replication process is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT CLOUD COMPUTING (BEIJING) CO LTD
- Filing Date
- 2021-03-19
- Publication Date
- 2026-06-30
AI Technical Summary
In MySQL databases, when GTID is unavailable, existing technologies cannot effectively and quickly replicate database data, leading to duplicate transaction execution and data corruption during the data replication process.
By obtaining transaction and position information from the binary log of the database to be replicated, it is determined whether the transaction has been executed in the target database. Transactions that have not been executed are executed in the target database and their position information is recorded to prevent duplicate execution. Two database tables are used to handle insert and merge operations respectively to improve efficiency.
Ensuring the reentrancy and correctness of the data replication process guarantees data consistency even when multiple failures occur, avoiding data corruption caused by repeated transaction execution.
Figure CN115114258B_ABST
Description
Technical Field
[0001] This application relates to the field of database technology, and more specifically, to a data copying method, apparatus, electronic device, and computer storage medium.
Background Technology
[0002] In MySQL databases, the binary log records all statements that modify or alter the database. Replaying the binary log is primarily used for data replication between databases, such as creating a slave database from a master database and copying data from one database to the other.
[0003] Existing technologies use GTID (Global Transaction Identifier), a globally unique identifier that the database associates with each committed transaction. Each transaction is identified by its GTID so that transactions that have already been executed are not executed again.
[0004] When GTID is unavailable, for example because the database version is too low to support it, GTID is explicitly disabled in the database, or a cloud service provider prohibits its use in certain versions, GTID cannot be used for binary log replay. Furthermore, GTID requires authorization when interacting with databases outside of MySQL. In these scenarios where GTID is not applicable, efficient and rapid data replication from the database is impossible.
Summary of the Invention
[0005] The present invention provides a data copying method, apparatus, electronic device, and computer storage medium that overcomes or at least partially solves the above-mentioned problems.
[0006] Firstly, a data replication method is provided, the method comprising:
[0007] Obtain at least one binary log from the database to be replicated. Each binary log includes at least one transaction and the location information of each transaction, which indicates the storage location of the transaction in the database to be replicated.
[0008] Traverse the transactions in the binary log. For each target transaction, determine whether the target transaction's position information is stored in the preset first database table.
[0009] If it is determined that the position information of the target transaction is not stored in the first database table, then the target transaction is executed in the target database, and the position information of the target transaction is recorded in the first database table, until all transactions in the binary log have been traversed.
[0010] In this case, the data generated after the target transaction is executed in the target database is the same as the data generated when the target transaction is executed in the database to be replicated.
[0011] In one possible implementation, traversing transactions in the binary log also includes:
[0012] If the location information of the target transaction is determined to be stored in the first database table, then continue to traverse the next transaction.
[0013] In another possible implementation, before the target transaction is executed in the target database, the method further includes:
[0014] Generate target operation instructions, which, when executed, store the location information of the target transaction in the first database table;
[0015] Insert the target operation instruction into the operation sequence of the transaction to be executed.
[0016] In another possible implementation, the binary log containing the target transaction is used as the target binary log, and the location information of the target transaction includes the file name of the target binary log and the storage location of the target transaction in the target binary log.
[0017] The location information of the target transaction is stored in the first database table, including:
[0018] Store the location information of the target transaction in a pre-defined second database table;
[0019] The filename of the target binary log is searched in the first database table. If the filename of the target binary log is found in the first database table, the storage location of the target transaction in the target binary log is merged into the storage location of all executed transactions in the target binary log in the first database table.
[0020] In yet another possible implementation, after the filename of the target binary log is looked up in the first database table, the method further includes:
[0021] If it is determined that the filename of the target binary log does not exist in the first database table, then the filename of the target binary log is created in the first database table, and the storage location of all executed transactions in the target binary log is updated to the storage location of the target transaction in the target binary log.
[0022] In another possible implementation, determining whether the location information of the target transaction is stored in a preset first database table includes:
[0023] If the filename of the target binary log is stored in the first database table, then determine whether the target storage location is recorded in the storage location of all transactions executed in the target binary log in the first database table;
[0024] If the target storage location is recorded in the storage location of all executed transactions in the target binary log in the first database table, then the location information of the target transaction is stored in the first database table.
[0025] The target storage location is the location where the target transaction is stored in the target binary log.
[0026] In yet another possible implementation, the target transaction is stored in the binary log in chronological order of its recording time.
[0027] Continue iterating through the next transaction, including:
[0028] Based on the order of the target transaction in the binary log, the transactions adjacent to the target transaction in the binary log are traversed sequentially.
[0029] Secondly, a data copying apparatus is provided, the apparatus comprising:
[0030] The acquisition module is used to acquire at least one binary log in the database to be replicated. Each binary log includes at least one transaction and the position information of each transaction. The position information is used to indicate the storage location of the transaction in the database to be replicated.
[0031] The judgment module is used to traverse the transactions in the binary log. For each target transaction, it determines whether the target transaction's position information is stored in the preset first database table.
[0032] The execution module is used to execute the target transaction in the target database and record the target transaction's position information in the first database table if it is determined that the position information of the target transaction is not stored in the first database table, until all transactions in the binary log have been traversed.
[0033] In this case, the data generated after the target transaction is executed in the target database is the same as the data generated when the target transaction is executed in the database to be replicated.
[0034] Thirdly, embodiments of the present invention provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method provided in the first aspect.
[0035] Fourthly, embodiments of the present invention provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method provided in the first aspect.
[0036] The data replication method, apparatus, electronic device, and storage medium provided in this invention obtain binary logs from the database to be replicated, determine the transactions and transaction position information recorded in the binary logs, and uniquely identify transactions using the transaction position information recorded in the binary logs themselves. During data replication between databases, it determines which transactions have been executed in the target database by judging which transaction position information has been stored in a preset first database table, preventing the same transaction from being executed multiple times in the target database. Even if the program fails, data corruption will not occur, and the reentrancy of operations during data replication is guaranteed, ensuring data correctness even if multiple failures occur.
Attached Figure Description
[0037] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below.
[0038] Figure 1 is a schematic flowchart of a data copying method provided in an embodiment of this application;
[0039] Figure 2 is a format diagram of a binary log file provided in an embodiment of this application;
[0040] Figure 3 is a format diagram of transaction location information provided in an embodiment of this application;
[0041] Figure 4 is a format diagram of a transaction location information set provided in an embodiment of this application;
[0042] Figure 5 is a relationship diagram of transactions and events provided in an embodiment of this application;
[0043] Figure 6 is a schematic diagram illustrating position information merging provided in an embodiment of this application;
[0044] Figure 7 is a schematic diagram of the structure of a data copying device provided in an embodiment of this application;
[0045] Figure 8 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0046] The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain this application, and should not be construed as limiting the invention.
[0047] Those skilled in the art will understand that, unless explicitly stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms. It should be further understood that the term “comprising” as used in the specification of this application means the presence of features, integers, steps, operations, elements, and / or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof. It should be understood that when we say an element is “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or there may be intermediate elements. Furthermore, “connected” or “coupled” as used herein can include wireless connections or wireless coupling. The term “and / or” as used herein includes all or any units and all combinations of one or more associated listed items.
[0048] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.
[0049] First, we will introduce and explain several terms involved in the embodiments of this application:
[0050] Cloud technology refers to a hosting technology that unifies hardware, software, and network resources within a wide area network (WAN) or local area network (LAN) to achieve data computing, storage, processing, and sharing. Cloud technology is a collective term for network technologies, information technologies, integration technologies, management platform technologies, and application technologies applied to cloud computing business models. It can form resource pools, providing flexible and convenient on-demand access. Cloud computing technology will become a crucial support. Backend services of technical network systems require substantial computing and storage resources, such as video websites, image websites, and many portal websites. With the rapid development and application of the internet industry, every item may have its own identification mark in the future, requiring transmission to backend systems for logical processing. Data at different levels will be processed separately, and various industry data will require robust system support, which can only be achieved through cloud computing.
[0051] A database, simply put, can be viewed as an electronic filing cabinet—a place to store electronic files, where users can perform operations such as adding, querying, updating, and deleting data. A "database" is a collection of data stored together in a certain way, capable of being shared by multiple users, with minimal redundancy, and independent of application programs.
[0052] A Database Management System (DBMS) is a computer software system designed to manage databases, generally possessing basic functions such as storage, retrieval, security, and backup. DBMSs can be classified according to the database model they support, such as relational or XML (Extensible Markup Language); or according to the type of computer they support, such as server clusters or mobile devices; or according to the query language used, such as SQL (Structured Query Language) or XQuery; or according to performance priorities, such as maximum scale or maximum operating speed; or other classification methods. Regardless of the classification method used, some DBMSs can cross categories, for example, simultaneously supporting multiple query languages.
[0053] A transaction is a logical unit in the execution process of a database management system. A transaction in a database typically contains a sequence of read / write operations on the database. Transactions have four main properties: atomicity, consistency, isolation, and durability.
[0054] A binary log is a binary file that records all operations that modify the database or cause changes to the data in the database. In other words, the binary log records the write operations to the database in a transaction, and these operations are individual events.
[0055] Binary logs are primarily used for database data recovery and master-slave replication. For example, a slave database can be created from a master database, and data can be migrated or copied from the master database to the slave database. When using binary logs for data replication between databases, various factors can cause failures, leading to interruptions in the replication process, such as sudden power outages or aging electronic equipment. In such scenarios, existing technologies typically use binary logs based on GTID (Global Transaction ID) to resume data replication after the failure is recovered.
[0056] Specifically, the key is how to record the "position" of data replication—that is, where it has been replicated, how much data had been replicated before the failure, and where it should start again after the failure is repaired. In other words, which transactions in the database to be replicated have already been re-executed in the target database, and which transactions have not yet been executed? GTID-based replication identifies transactions that have already been re-executed in the target database and merges them into a set. After the failure is repaired, it determines which transactions have already been executed, ensuring that those already executed are not repeated, and resumes the data replication process from before the failure.
[0057] However, GTID may not be usable in many situations. For example, it may be unavailable if the database version is too low to support GTID, GTID is disabled or cannot be enabled in the database's explicit settings, or the cloud service provider disables GTID in certain database versions. Even if GTID can be used, insufficient permissions may occur, such as requiring Super privileges to use the GTID mechanism. In these scenarios, some transactions may be executed multiple times while others are not executed, leading to data corruption and other problems.
[0058] The data copying method, apparatus, electronic device, and computer storage medium provided in this application are intended to solve the above-mentioned technical problems of the prior art.
[0059] This application's embodiments can be applied to data replication or migration between various databases using binary logs. Specifically, in cases where the data replication process is suddenly interrupted or malfunctions due to unforeseen circumstances such as power outages, network failures, or aging or malfunctioning electronic devices, the data replication method provided in this application's embodiments can ensure that the previous replication process continues after the fault is recovered in such application scenarios.
[0060] The technical solution of this application and how the technical solution of this application solves the above-mentioned technical problems are described in detail below with specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of this application will now be described with reference to the accompanying drawings.
[0061] This application provides a data copying method. Figure 1 is a flowchart of a data copying method provided in this application embodiment. As shown in Figure 1, the method includes:
[0062] S101. Obtain at least one binary log from the database to be replicated. Each binary log includes at least one transaction and the location information of each transaction. The location information is used to indicate the storage location of the transaction in the database to be replicated.
[0063] When performing database replication, the first step is to obtain the binary log from the database to be replicated. The binary log records the operations that generate data in the database to be replicated or modify the data in the database to be replicated. In other words, these are the transactions that generate or modify data in the database to be replicated. These transactions are the target transactions to be executed in the target database.
[0064] The database to be replicated contains multiple binary logs. Each binary log records multiple transactions, and a transaction in a binary log consists of multiple write operations to the database, each operation being an event. It should be noted that a transaction begins with a "BEGIN" event and ends with a "COMMIT" event. For example, the transaction "Student A purchased 100 yuan worth of goods at store B" contains four events: the "BEGIN" event, the "Student A's account decreased by 100 yuan" event, the "Store B's account increased by 100 yuan" event, and the "COMMIT" event.
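The BEGIN ... COMMIT framing described above can be sketched as a small grouping routine. This is a minimal illustration only: events are modeled as plain strings, and the function name `group_transactions` is hypothetical rather than part of the described method.

```python
from typing import Iterable, List


def group_transactions(events: Iterable[str]) -> List[List[str]]:
    """Split a flat event stream into transactions delimited by BEGIN ... COMMIT.

    A trailing transaction without a COMMIT event is treated as incomplete
    and is not returned.
    """
    transactions: List[List[str]] = []
    current: List[str] = []
    in_transaction = False
    for event in events:
        if event == "BEGIN":
            current = [event]
            in_transaction = True
        elif in_transaction:
            current.append(event)
            if event == "COMMIT":
                transactions.append(current)
                in_transaction = False
    return transactions


# The purchase example from the text: four events form one transaction.
purchase_events = [
    "BEGIN",
    "Student A's account decreased by 100 yuan",
    "Store B's account increased by 100 yuan",
    "COMMIT",
]
purchase_transactions = group_transactions(purchase_events)
```

Dropping an unfinished trailing transaction is a design choice of this sketch, made so that only complete transactions are ever replayed.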
[0065] Figure 2 is a format diagram of a binary log file provided in an embodiment of this application. As shown in the figure, the Log_name column represents the filename of the binary log file, i.e., "binlog.000103" in the example of the figure; the Position column represents the starting storage position of the transaction in the binary log file, which is expressed as a number and is strictly increasing, i.e., "950", "1034", "1148", etc. in the example of the figure; the Event_type column represents the type of event, i.e., "Query", "Xid", etc. in the example of the figure; the Server_id column represents the unique identifier of the database server, i.e., "1" in the example of the figure; the End_log_position column represents the ending storage position of the transaction in the binary log file, which is also expressed as a number, i.e., "1034", "1148", "1179", etc. in the example of the figure; the Info column represents the basic information of the event, i.e., "BEGIN", "use `huige`; INSERT INTO temp VALUES(2,'jesen')", "COMMIT /* xid=415 */", etc. in the example of the figure.
[0066] The combination of information recorded in the binary log in the above diagram can uniquely identify a transaction; that is, it constitutes the transaction's position information. The transaction's position information includes the database's unique identifier, the filename of the binary log containing the transaction, and the transaction's storage location within the binary log. Specifically, the position information of a transaction is represented by the information that the binary log itself records for that transaction. For example, the last transaction (Xid) in Figure 2 can be represented as “binlog.000103:1148-1179”, where “binlog.000103” is the filename of the binary log, “000103” is the filename sequence number, and “1148-1179” is the position range of the transaction Xid in the file.
[0067] This application provides a specific format for representing the position information of transactions in a binary log. Figure 3 is a format diagram of transaction location information provided in an embodiment of this application. As shown in the figure, Server_id represents the unique identifier of the database server, i.e., “3E11FA47-71CA-11E1-9E33-C80AA9429562” in the example figure; filenumber represents the sequence number of the binary log filename, which is strictly increasing, i.e., “001” in the example figure; Transaction_range represents the position range of the transaction in the binary log, i.e., 0-4 in the example figure, which can be read as the left-closed, right-open interval [0,4). These three pieces of information can be separated by colons and written as “3E11FA47-71CA-11E1-9E33-C80AA9429562:001:0-4”, which serves as the location information of the transaction and uniquely identifies it.
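The colon-separated format above can be formatted and parsed with a few lines of code. The sketch below assumes, as in the example, that the server identifier and file number contain no colons; the class name `TxnPosition` and the function `parse_position` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnPosition:
    """Position information of one transaction: Server_id:filenumber:start-end."""
    server_id: str
    file_number: str  # kept as text so that leading zeros ("001") survive
    start: int        # inclusive
    end: int          # exclusive, i.e. the half-open interval [start, end)

    def __str__(self) -> str:
        return f"{self.server_id}:{self.file_number}:{self.start}-{self.end}"


def parse_position(text: str) -> TxnPosition:
    """Parse 'Server_id:filenumber:start-end' into a TxnPosition."""
    server_id, file_number, transaction_range = text.split(":")
    start, end = transaction_range.split("-")
    return TxnPosition(server_id, file_number, int(start), int(end))


example_position = parse_position("3E11FA47-71CA-11E1-9E33-C80AA9429562:001:0-4")
```

Because the dataclass is frozen, instances are hashable and can be used directly as set members or dictionary keys.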
[0068] Furthermore, Figure 4 is a format diagram of a transaction location information set provided in an embodiment of this application. It differs from Figure 3 in that the Transaction_range set represents the set of position ranges of multiple transactions in the binary log. Multiple transactions in the same binary log file in the same database can have their position ranges in the binary log written together, separated by commas to form a set, i.e., "0-4,10-14" in the example in the figure. Similarly, these three pieces of information can also be separated by colons, and the whole set can be represented as "3E11FA47-71CA-11E1-9E33-C80AA9429562:001:0-4,10-14".
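The set form can be handled the same way, with the range list split on commas. This is a sketch under the same assumption that identifiers contain no colons; `parse_position_set` and `format_position_set` are hypothetical helper names.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def parse_position_set(text: str) -> Tuple[str, str, List[Range]]:
    """Parse 'Server_id:filenumber:s1-e1,s2-e2,...' into its three parts."""
    server_id, file_number, range_text = text.split(":")
    ranges: List[Range] = []
    for part in range_text.split(","):
        start, end = part.split("-")
        ranges.append((int(start), int(end)))
    return server_id, file_number, ranges


def format_position_set(server_id: str, file_number: str, ranges: List[Range]) -> str:
    """Inverse of parse_position_set."""
    range_text = ",".join(f"{start}-{end}" for start, end in ranges)
    return f"{server_id}:{file_number}:{range_text}"


example_set_text = "3E11FA47-71CA-11E1-9E33-C80AA9429562:001:0-4,10-14"
example_set = parse_position_set(example_set_text)
```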
[0069] S102. Traverse the transactions in the binary log. For each target transaction, determine whether the target transaction's position information is stored in the preset first database table.
[0070] Specifically, the system can start from a user-specified position and traverse the binary log to search for transactions, read and parse the transactions, and then run them in the target database. However, due to interruptions caused by failures during data replication, some transactions may have already been re-executed in the target database, while others may not. This embodiment pre-establishes a first database table to store the position information of transactions that have already been re-executed in the target database. For each traversed transaction as the target transaction, the system can determine whether the target transaction's position information is stored in the first database table based on the position information stored therein, thus determining whether the target transaction has already been re-executed in the target database. This is a simple range judgment and is very fast.
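The range judgment mentioned above can be sketched as a lookup by binary log filename followed by a binary search over that file's executed ranges. The in-memory dictionary below merely stands in for the first database table, and `already_executed` is a hypothetical name; the ranges are assumed to be sorted, non-overlapping, half-open intervals.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def already_executed(first_table: Dict[str, List[Range]],
                     filename: str, start: int, end: int) -> bool:
    """Return True if [start, end) lies inside a recorded range for `filename`."""
    ranges = first_table.get(filename)
    if not ranges:
        return False  # filename not recorded: nothing from this file was executed
    # Index of the last recorded range whose start is <= the transaction's start.
    index = bisect_right(ranges, (start, float("inf"))) - 1
    if index < 0:
        return False
    low, high = ranges[index]
    return low <= start and end <= high


# Stand-in for the first database table: filename -> executed ranges.
executed_table = {"binlog.000103": [(950, 1179), (1500, 1800)]}
```

With merged ranges, the cost per check grows with the number of disjoint ranges per file rather than with the number of executed transactions.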
[0071] S103. If it is determined that the position information of the target transaction is not stored in the first database table, then the target transaction is executed in the target database, and the position information of the target transaction is recorded in the first database table until all transactions in the binary log are traversed.
[0072] In this case, the data generated by the target transaction after execution in the target database is the same as the data generated by the target transaction when executed in the database to be replicated.
[0073] If it is determined that the location information of the target transaction is not stored in the preset first database table, that is, the target transaction has not yet been executed in the target database, then the target transaction is executed in the target database. The data generated after the target transaction is executed in the target database is the same as the data generated when the target transaction is executed in the database to be replicated, so that the data in the target database is the same as the data in the database to be replicated, thus realizing data replication between databases.
[0074] It should be noted that after any target transaction is executed in the target database, the position information of the target transaction is recorded in the first database table, so that the first database table records the position information of all transactions executed in the target database. This is a dynamic process; each time a transaction is executed, its position information is recorded in the first database table until all transactions in the binary log have been traversed. At this point, the data in the target database is identical to the data in the database to be replicated.
[0075] This application embodiment obtains the binary log in the database to be copied, determines the transactions and transaction position information recorded in the binary log, and uses the transaction position information recorded in the binary log itself to uniquely identify the transaction. During data copying between databases, it determines which transactions have been executed in the target database by judging which transaction position information has been stored in a preset first database table, preventing the same transaction from being executed multiple times in the target database. Even if the program fails, data corruption will not occur, and the reentrancy of operations during data copying is guaranteed. Even if multiple failures occur, the correctness of the data can be guaranteed.
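Steps S101 to S103 can be sketched as a single loop. In this simplified illustration a Python set stands in for the first database table, each transaction is a pair of a position string and its statements, and `replicate` and `apply` are hypothetical names. Applying a transaction and recording its position are two separate steps here; this sketch does not make them atomic.

```python
from typing import Callable, Iterable, List, Set, Tuple

Transaction = Tuple[str, List[str]]  # (position information, statements)


def replicate(transactions: Iterable[Transaction],
              executed: Set[str],
              apply: Callable[[List[str]], None]) -> int:
    """Replay every transaction whose position is not yet recorded.

    Returns the number of transactions applied during this run.
    """
    applied = 0
    for position, statements in transactions:
        if position in executed:
            continue              # already executed in the target: skip it
        apply(statements)         # execute in the target database
        executed.add(position)    # record the position information
        applied += 1
    return applied


binlog = [
    ("binlog.000103:950-1179", ["UPDATE accounts SET balance = balance - 100 WHERE name = 'A'"]),
    ("binlog.000103:1179-1400", ["UPDATE accounts SET balance = balance + 100 WHERE name = 'B'"]),
]
target_statements: List[str] = []   # stand-in for the target database
executed_positions: Set[str] = set()

first_run = replicate(binlog, executed_positions, target_statements.extend)
second_run = replicate(binlog, executed_positions, target_statements.extend)
```

Running the loop a second time over the same binary log applies nothing, which is the reentrancy property the text describes.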
[0076] This application embodiment also provides a possible implementation method, which involves traversing transactions in a binary log, and further includes:
[0077] If the location information of the target transaction is determined to be stored in the first database table, then continue to traverse the next transaction.
[0078] If the location information of the target transaction obtained by traversing the binary log has been stored in the first database table, that is, the target transaction has been executed in the target database and the corresponding data has been generated, and there is no need to execute it again, then continue to traverse the next transaction in the binary log until all transactions in the binary log have been judged.
[0079] This embodiment of the application continuously traverses the binary log and uses the first database table to determine if a transaction has already been executed in the target database before proceeding to determine the next transaction. This ensures the efficiency of the data replication process and avoids repeatedly executing the same transaction.
[0080] This application embodiment also provides a possible implementation in which, before the target transaction is executed in the target database, the method further includes:
[0081] Generate target operation instructions, which, when executed, store the location information of the target transaction in the first database table;
[0082] Insert the target operation instruction into the operation sequence of the target transaction.
[0083] Figure 5 is a transaction and event relationship diagram provided in an embodiment of this application. As shown in Figure 5, BEGIN, EV1, EV2, COMMIT, etc. represent individual events. Each transaction begins with a BEGIN event and ends with a COMMIT event.
[0084] Therefore, a target operation instruction can be generated based on the position information of the target transaction and inserted into the operation sequence of the target transaction. Specifically, in this embodiment, the target operation instruction is inserted before the last event of the target transaction, namely the COMMIT event. Inserting the target operation instruction before the COMMIT event will not affect the execution of other operations in the transaction. When the transaction executes the COMMIT event, all operations before the COMMIT event have been completed. The COMMIT event is taken as the end of the transaction execution. Therefore, when the transaction execution ends, the target operation instruction will also be executed. This is determined by the atomicity of the transaction, that is, all operations in a transaction are either all executed or none are executed, and there will be no intermediate state. It can be guaranteed that after the target transaction is executed in the target database, the target operation instruction will also be executed. The target operation instruction is generated based on the position information of the target transaction and is used to store the position information of the target transaction in the first database table when it is executed.
[0085] This application embodiment inserts the target operation instruction into the operation sequence of the target transaction, and ensures that the target operation instruction is also executed after the target transaction is executed, based on the atomicity of the transaction, and stores the position information of the target transaction in the first database table, thereby avoiding the target transaction from being executed repeatedly in the target database, and logically ensuring the correctness of the data by utilizing the characteristics of the transaction itself.
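The effect of placing the target operation instruction before COMMIT can be illustrated with a small runnable sketch. It uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for the MySQL target database; the table names `accounts` and `executed_positions` and the function `apply_with_position` are assumptions made for illustration only.

```python
import sqlite3
from typing import List


def apply_with_position(conn: sqlite3.Connection,
                        statements: List[str], position: str) -> None:
    """Run the transaction's statements and record its position atomically."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        # The "target operation instruction", inserted just before COMMIT.
        conn.execute("INSERT INTO executed_positions (position) VALUES (?)",
                     (position,))
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # neither the data nor the position is kept
        raise


# isolation_level=None leaves transaction control to the explicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE executed_positions (position TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES ('A', 500), ('B', 0)")

apply_with_position(conn, [
    "UPDATE accounts SET balance = balance - 100 WHERE name = 'A'",
    "UPDATE accounts SET balance = balance + 100 WHERE name = 'B'",
], "binlog.000103:950-1179")
```

Because the position row is written inside the same transaction as the data changes, a failure at any point leaves both unchanged. In this sketch the primary key on `position` additionally rejects a second attempt to apply the same transaction.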
[0086] This application embodiment also provides a possible implementation method, which uses the binary log where the target transaction is located as the target binary log, and the location information of the target transaction includes the file name of the target binary log and the storage location of the target transaction in the target binary log;
[0087] The location information of the target transaction is stored in the first database table, including:
[0088] Store the location information of the target transaction in a pre-defined second database table;
[0089] The filename of the target binary log is searched in the first database table. If the filename of the target binary log is found in the first database table, the storage location of the target transaction in the target binary log is merged into the storage location of all executed transactions in the target binary log in the first database table.
[0090] The database to be replicated contains multiple binary log files, and each binary log file contains multiple transactions. A transaction identified by traversing the binary logs is designated as the target transaction, and the binary log containing the target transaction is designated as the target binary log. Before storing the target transaction's position information in the first database table, it is necessary to first store the target transaction's position information in a pre-defined second database table, and then merge the position information stored in the second database table into the first database table.
[0091] This embodiment uses two database tables because, owing to the locking mechanism, only one operation can be performed on a single database table at a time; specifically, inserting data and merging data cannot be performed simultaneously. In actual implementation, inserting position information into a database table is very fast, potentially tens of thousands of times per second. If position information also had to be merged in the same table at the same time, lock waits would arise and reduce efficiency. This embodiment therefore uses the second database table for inserting position information and the first database table for merging it, which effectively improves program performance.
[0092] When replication resumes after a failure, the position information in the second database table must first be merged into the first database table. This ensures reentrancy, meaning that the target transaction will not be executed repeatedly even if multiple failures occur. If the second database table stores too much data, the position information of the target transaction that has already been executed in the target database can be deleted periodically.
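The division of work between the two tables, and the merge that must run before replication resumes, can be sketched as follows. This is a simplified illustration under stated assumptions: SQLite stands in for the target database, the table and column names are hypothetical, and the first table here keeps one row per position range rather than the aggregated set described in this application.

```python
import sqlite3

# Hypothetical schema: the second table is append-only; the first table
# holds the merged record of executed positions.
SCHEMA = """
CREATE TABLE first_table  (binlog_file TEXT, start_pos INTEGER, end_pos INTEGER,
                           PRIMARY KEY (binlog_file, start_pos, end_pos));
CREATE TABLE second_table (binlog_file TEXT, start_pos INTEGER, end_pos INTEGER);
"""


def record_position(conn, binlog_file, start_pos, end_pos):
    """Hot path: a plain insert into the second table only."""
    conn.execute(
        "INSERT INTO second_table VALUES (?, ?, ?)",
        (binlog_file, start_pos, end_pos),
    )
    conn.commit()


def merge_pending(conn):
    """Move every pending row from the second table into the first table.

    Safe to call repeatedly, for example after each restart: rows already
    present in the first table are ignored.
    """
    with conn:  # one transaction: commit on success, roll back on error
        pending = conn.execute("SELECT COUNT(*) FROM second_table").fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO first_table "
            "SELECT binlog_file, start_pos, end_pos FROM second_table"
        )
        conn.execute("DELETE FROM second_table")
    return pending
```

Calling `merge_pending` a second time finds nothing to move and leaves the first table unchanged, which is the reentrancy property the paragraph above relies on.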
[0093] It should be noted that after a transaction is executed, its position information is inserted into the second database table. This position information is discretely distributed, that is, stored one entry at a time; the example shown in Figure 3 is "3E11FA47-71CA-11E1-9E33-C80AA9429562:001:0-4". When the position information stored in the second database table is merged into the first database table, the position information in the first database table is aggregated into a set. This set can be represented based on the filename of the binary log file; the example shown in Figure 4 is "3E11FA47-71CA-11E1-9E33-C80AA9429562:001:0-4,10-14".
[0094] Figure 6 is a schematic diagram illustrating the merging of position information provided in an embodiment of this application. "3E11FA47-71CA-11E1-9E33-C80AA9429562:001:0-4,10-14" refers to the transactions within the position ranges "0-4,10-14" of binary log "001" in the database to be replicated, whose database server has the unique identifier "3E11FA47-71CA-11E1-9E33-C80AA9429562". Within one database to be replicated, the unique identifier of the database server is the same for all transactions, and transactions within the same binary log file also share the same binary log filename. Therefore, what is actually merged is the position range of the transactions within the binary log. Specifically, in this embodiment, "0" as the end of a range represents the end of the file; for example, "14-0" represents the positions from "14" to the end of the file.
[0095] As shown in Figure 6, the first merge combines the transactions in the "0-4, 10-14" position range of binary log file "001", the transactions in the "6-10" position range of binary log file "001", and the transactions in the "0-6" position range of binary log file "002", resulting in the transactions in the "0-4, 6-14" position range of binary log file "001" and the transactions in the "0-6" position range of binary log file "002".
[0096] The second merge combines the transactions in the "0-4, 6-14" position range of binary log file "001", the transactions in the "0-6" position range of binary log file "002", the transactions in the "4-6" position range of binary log file "001", and the transactions in the "14-0" position range of binary log file "001", resulting in the transactions in the "0-6" position range of binary log file "002". Notably, the position range of binary log file "001" is merged into "0-0", which covers the entire file, and is therefore no longer displayed.
[0097] This application embodiment improves program efficiency and ensures reentrancy by combining the first and second database tables to perform different tasks. Furthermore, by merging transaction location information within the same binary log file based on the filename, it facilitates the determination of whether a target transaction has been executed in the target database, thus accelerating the determination process.
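The two merges of Figure 6 can be reproduced with a short interval-merging sketch. The function names below are hypothetical, and the treatment of an end position of "0" as "end of file" follows the convention stated above; representing it internally as infinity is an assumption of this example.

```python
import math


def parse_ranges(text):
    """Parse "0-4,10-14" into [(0, 4), (10, 14)].

    An end position of 0 denotes the end of the file and is represented
    internally as infinity.
    """
    ranges = []
    for part in text.split(","):
        start, end = (int(x) for x in part.strip().split("-"))
        ranges.append((start, math.inf if end == 0 else end))
    return ranges


def merge_ranges(ranges):
    """Merge overlapping or touching position ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def format_ranges(ranges):
    """Render ranges back to text, writing end of file as 0."""
    return ",".join(f"{s}-{0 if e == math.inf else e}" for s, e in ranges)


# First merge for binary log "001": "0-4,10-14" plus "6-10".
first = merge_ranges(parse_ranges("0-4,10-14") + parse_ranges("6-10"))
print(format_ranges(first))   # 0-4,6-14

# Second merge for binary log "001": add "4-6" and "14-0".
second = merge_ranges(first + parse_ranges("4-6") + parse_ranges("14-0"))
print(format_ranges(second))  # 0-0
```

Sorting before merging keeps the pass linear after the sort, and a single merged range per file is what makes the later "has this position been executed" check fast.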
[0098] This application embodiment also provides a possible implementation method, which involves searching for the filename of the target binary log in the first database table, and then further includes:
[0099] If it is determined that the filename of the target binary log does not exist in the first database table, then the filename of the target binary log is created in the first database table, and the storage location of all executed transactions in the target binary log is updated to the storage location of the target transaction in the target binary log.
[0100] According to the format of the transaction position information stored in the first database table mentioned above, the unique identifier of the database server is the same as the unique identifier of the database server to be replicated. That is, in this embodiment, the unique identifier of the database server is the same in the position information of all transactions. However, for transactions in different binary log files, the filenames of the binary logs in their position information are different, and of course, the position ranges of their position information in the binary log files are also different.
[0101] Therefore, to find the target binary log in the first database table, it is first necessary to determine whether the filename of the target binary log exists in the first database table. If it does not, the filename of the target binary log is created in the first database table, and the storage location of all executed transactions in the target binary log is updated to the storage location of the target transaction in the target binary log. That is, the position information of transactions that have been executed in the target database is stored under the filename of the target binary log created in the first database table, and the position information is merged by filename to form a set of position information whose name includes the filename of the target binary log.
[0102] This application embodiment establishes a storage location by creating, in the first database table, the filename of the target binary log in which the target transaction is located. This storage location holds the position information of all transactions whose binary log filename is that of the target binary log, which makes the data in the first database table more concise and clear and makes it easier to determine whether the position information of a transaction is stored in the first database table.
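The create-or-merge behaviour described above can be sketched with an in-memory stand-in for the first database table. The dictionary layout (filename mapped to a sorted list of position ranges) and the function name `store_position` are assumptions for illustration only, and end positions are taken to be finite here.

```python
def store_position(first_table, binlog_file, start, end):
    """Record one executed position range under its binary log filename.

    `first_table` is a dict mapping a filename to a sorted list of
    non-overlapping (start, end) ranges (hypothetical layout).
    """
    ranges = first_table.get(binlog_file)
    if ranges is None:
        # Filename not yet present: create it, with the executed set equal
        # to the target transaction's own storage location.
        first_table[binlog_file] = [(start, end)]
        return

    # Filename present: merge the new range into the existing set.
    merged = []
    for s, e in sorted(ranges + [(start, end)]):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    first_table[binlog_file] = merged


table = {}
store_position(table, "001", 0, 4)
store_position(table, "001", 10, 14)
store_position(table, "002", 0, 6)
store_position(table, "001", 6, 10)
print(table)  # {'001': [(0, 4), (6, 14)], '002': [(0, 6)]}
```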
[0103] This application embodiment also provides a possible implementation method for determining whether the location information of the target transaction is stored in a preset first database table, including:
[0104] If the filename of the target binary log is stored in the first database table, then determine whether the target storage location is recorded in the storage location of all transactions executed in the target binary log in the first database table;
[0105] If the target storage location is recorded in the storage location of all executed transactions in the target binary log in the first database table, then the location information of the target transaction is stored in the first database table.
[0106] The target storage location is the location where the target transaction is stored in the target binary log.
[0107] To determine whether the location information of the target transaction is stored in the preset first database table, specifically, it is necessary to first determine whether the file name of the target binary log in which the target transaction is located exists in the first database table, that is, whether the file name of the target binary log has been created in the first database table.
[0108] If the filename of the target binary log exists in the first database table, since the position information of the target transaction also includes the storage location of the target transaction in the target binary log, the storage location of the target transaction in the target binary log is used as the target storage location. It is also necessary to determine whether the target storage location is recorded in the storage location of all transactions executed in the target binary log in the first database table, that is, whether the position information of the target transaction has been stored in the first database table.
[0109] If the target storage location is recorded in the storage location of all executed transactions in the target binary log in the first database table, then the location information of the target transaction is stored in the first database table.
[0110] This application embodiment can accurately determine whether the target transaction's location information is stored in the first database table by judging the file name of the target binary log in the target transaction's location information and the storage location of the target transaction in the target binary log. This determines whether the target transaction has been executed in the target database, avoiding the problem of data corruption caused by the target transaction being executed repeatedly in the target database. It ensures that the data in the target database is the same as the data in the database to be copied, thus improving data accuracy.
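The two-step check described above (filename first, then storage location) might look like the following sketch. It reuses the hypothetical dictionary layout of filename to position ranges, and it assumes that "recorded" means the transaction's range lies entirely inside one stored range.

```python
def is_executed(first_table, binlog_file, start, end):
    """Return True if the position range is already recorded.

    `first_table` maps a binary log filename to a list of merged
    (start, end) ranges (hypothetical layout).
    """
    ranges = first_table.get(binlog_file)
    if ranges is None:
        # Step 1: the filename has not been created in the first table.
        return False
    # Step 2: the target storage location must fall inside a stored range.
    return any(s <= start and end <= e for s, e in ranges)


first_table = {"001": [(0, 4), (6, 14)]}
print(is_executed(first_table, "001", 6, 10))  # True
print(is_executed(first_table, "001", 4, 6))   # False
print(is_executed(first_table, "002", 0, 6))   # False
```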
[0111] This application embodiment also provides a possible implementation method in which the storage location of the target transaction in the binary log is arranged according to the recording time order;
[0112] Continue iterating through the next transaction, including:
[0113] Based on the order of the target transaction in the binary log, the transactions adjacent to the target transaction in the binary log are traversed sequentially.
[0114] Transactions are typically recorded in a binary log in chronological order. Traversal of the binary log can start from a position specified by the user. After the target transaction has been traversed, the next transaction adjacent to it is determined and traversed, following chronological order. In this way, the binary log file can be traversed from beginning to end, and all transactions and their position information can be read.
[0115] This application embodiment reads all transactions and their position information from the binary log by sequentially traversing it, which facilitates quick and easy further judgment.
[0116] This application also provides a possible implementation method in which the filenames of the binary logs in the database to be copied are identified by sequentially increasing Arabic numerals.
[0117] Binary log filenames are strictly incremental, typically using Arabic numerals arranged in strictly ascending order. For example, as shown in Figure 5, the filenames of binary logs are identified by sequentially increasing Arabic numerals, such as "001" and "002". In this embodiment, the filenames of binary logs are directly represented by numbers.
[0118] The embodiments of this application use strictly incremental Arabic numerals to represent the filenames of binary logs, which is very concise and clear, and can also accurately express the filenames of the binary logs containing the transaction's location information.
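Taken together, numeric filenames and chronological ordering within each file give a simple traversal order. The generator below is a hedged sketch: the input layout (a dict from filename to a list of position ranges already in recording order) and the parameter names `start_file` and `start_pos` are assumptions of this example.

```python
def traverse(binlogs, start_file=None, start_pos=0):
    """Yield (filename, start, end) in replication order.

    `binlogs` maps a numeric filename such as "001" to a list of
    (start, end) positions already in recording order. Traversal may
    begin at a user-specified file and position.
    """
    for name in sorted(binlogs, key=int):  # strictly ascending filenames
        if start_file is not None and int(name) < int(start_file):
            continue
        for start, end in binlogs[name]:
            if name == start_file and start < start_pos:
                continue  # before the user-specified starting position
            yield name, start, end


binlogs = {"002": [(0, 6)], "001": [(0, 4), (4, 6), (6, 10)]}
print(list(traverse(binlogs, start_file="001", start_pos=4)))
# [('001', 4, 6), ('001', 6, 10), ('002', 0, 6)]
```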
[0119] This application provides a data copying device. As shown in Figure 7, the device may include an acquisition module 11, a judgment module 12, and an execution module 13. Specifically:
[0120] The acquisition module 11 is used to acquire at least one binary log in the database to be replicated. Each binary log includes at least one transaction and the position information of each transaction. The position information is used to indicate the storage location of the transaction in the database to be replicated.
[0121] The judgment module 12 is used to traverse the transactions in the binary log. For each target transaction, it determines whether the position information of the target transaction is stored in the preset first database table.
[0122] Execution module 13 is used to execute the target transaction in the target database and record the target transaction's position information in the first database table if it is determined that the position information of the target transaction is not stored in the first database table, until all transactions in the binary log have been traversed.
[0123] Specifically, the data generated after the target transaction is executed in the target database is the same as the data generated when the target transaction is executed in the database to be replicated.
[0124] The data replication apparatus provided in this embodiment of the invention specifically executes the process described in the above-described method embodiment. For details, please refer to the content of the above-described data replication method embodiment; further details will not be repeated here. The data replication apparatus provided in this embodiment of the invention obtains binary logs from the database to be replicated, determines the transactions recorded in the binary logs and their position information, and uniquely identifies transactions using the position information of the transactions recorded in the binary logs themselves. During data replication between databases, it determines which transactions have already been executed in the target database by judging which transaction position information has been stored in a preset first database table, preventing the same transaction from being executed multiple times in the target database. Even if a program malfunctions, data corruption will not occur, and the reentrancy of operations during data replication is guaranteed, ensuring data correctness even if multiple malfunctions occur.
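The cooperation of the three modules can be sketched as a single loop. This is an illustrative simplification: the `Transaction` type, the in-memory `executed` set standing in for the first database table, and the `execute` callback are all hypothetical, and the sketch records the position after execution rather than atomically inside the same transaction as the embodiment does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    binlog_file: str   # filename of the binary log holding the transaction
    start: int         # start position within that file
    end: int           # end position within that file
    statements: tuple  # operations to replay on the target database


def replicate(transactions, executed, execute):
    """Replay each transaction at most once; return how many were applied.

    transactions: iterable produced by the acquisition step
    executed:     set of (file, start, end) keys, standing in for the
                  first database table
    execute:      callable that applies one transaction to the target
    """
    applied = 0
    for txn in transactions:
        key = (txn.binlog_file, txn.start, txn.end)
        if key in executed:   # judgment: already executed, skip
            continue
        execute(txn)          # execution on the target database
        executed.add(key)     # record the position information
        applied += 1
    return applied
```

Running `replicate` a second time over the same binary log with the same `executed` set applies nothing, which is the reentrancy behaviour described above.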
[0125] In one possible implementation, the judgment module 12 includes:
[0126] The traversal module is used to continue traversing the next transaction if the location information of the target transaction is determined to be stored in the first database table.
[0127] In another possible implementation, execution module 13 includes:
[0128] The insertion module is used to generate a target operation instruction which, when executed, stores the location information of the target transaction in the first database table;
[0129] and to insert the target operation instruction into the operation sequence of the transaction to be executed.
[0130] In yet another possible implementation, execution module 13 also includes:
[0131] The storage module is used to use the binary log containing the target transaction as the target binary log. The location information of the target transaction includes the file name of the target binary log and the storage location of the target transaction in the target binary log.
[0132] The location information of the target transaction is stored in the first database table, including:
[0133] Store the location information of the target transaction in a pre-defined second database table;
[0134] The filename of the target binary log is searched in the first database table. If the filename of the target binary log is found in the first database table, the storage location of the target transaction in the target binary log is merged into the storage location of all executed transactions in the target binary log in the first database table.
[0135] In yet another possible implementation, the storage module includes:
[0136] A creation sub-unit, used to create the filename of the target binary log in the first database table if it is determined that the filename of the target binary log does not exist in the first database table, and to update the storage location of all executed transactions in the target binary log to the storage location of the target transaction in the target binary log.
[0137] In yet another possible implementation, the storage module also includes:
[0138] The judgment sub-unit is used to determine whether the location information of the target transaction is stored in a preset first database table, including:
[0139] If the filename of the target binary log is stored in the first database table, then determine whether the target storage location is recorded in the storage location of all transactions executed in the target binary log in the first database table;
[0140] If the target storage location is recorded in the storage location of all executed transactions in the target binary log in the first database table, then the location information of the target transaction is stored in the first database table.
[0141] The target storage location is the location where the target transaction is stored in the target binary log.
[0142] In yet another possible implementation, the traversal module includes:
[0143] A sequential traversal sub-unit, for the case in which the storage locations of the target transactions in the binary log are arranged in order of recording time;
[0144] Continue iterating through the next transaction, including:
[0145] Based on the order of the target transaction in the binary log, the transactions adjacent to the target transaction in the binary log are traversed sequentially.
[0146] This application provides an electronic device comprising a memory, a processor, and at least one program stored in the memory which, when executed by the processor, achieves the following compared to the prior art. Binary logs are acquired from the database to be copied, and the transactions recorded in the binary logs and their position information are determined. The position information recorded in the binary logs themselves is used to uniquely identify the transactions. During data copying between databases, the transactions that have already been executed in the target database are determined by checking which transaction position information has been stored in a preset first database table. This prevents the same transaction from being executed multiple times in the target database, so that data corruption does not occur even if the program fails, and it guarantees the reentrancy of operations during data copying, ensuring data correctness even if multiple failures occur.
[0147] In one alternative embodiment, an electronic device is provided. As shown in Figure 8, the electronic device 4000 includes a processor 4001 and a memory 4003. The processor 4001 and the memory 4003 are connected, for example, via a bus 4002. Optionally, the electronic device 4000 may also include a transceiver 4004. It should be noted that in practical applications, the transceiver 4004 is not limited to one type, and the structure of this electronic device 4000 does not constitute a limitation on the embodiments of this application.
[0148] Processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute the various exemplary logic blocks, modules, and circuits described in conjunction with the disclosure of this application. Processor 4001 may also be a combination that implements computational functions, such as including one or more microprocessor combinations, a combination of a DSP and a microprocessor, etc.
[0149] Bus 4002 may include a pathway for transmitting information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, etc. Bus 4002 can be divided into address bus, data bus, control bus, etc. For ease of representation, the bus is shown in Figure 8 as a single thick line, but this does not mean that there is only one bus or one type of bus.
[0150] The memory 4003 may be ROM (Read Only Memory) or other types of static storage devices capable of storing static information and instructions, RAM (Random Access Memory) or other types of dynamic storage devices capable of storing information and instructions, or EEPROM (Electrically Erasable Programmable Read Only Memory), CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but not limited thereto.
[0151] The memory 4003 stores application code that executes the scheme of this application, and its execution is controlled by the processor 4001. The processor 4001 executes the application code stored in the memory 4003 to implement the content shown in the foregoing method embodiments.
[0152] This application provides a computer-readable storage medium storing a computer program that, when run on a computer, enables the computer to execute the corresponding content in the aforementioned method embodiments. Compared with the prior art, binary logs are obtained from the database to be copied, the transactions recorded in the binary logs and their position information are determined, and the position information recorded in the binary logs themselves is used to uniquely identify the transactions. During data copying between databases, the transactions that have already been executed in the target database are determined by checking which transaction position information has been stored in a preset first database table, which prevents the same transaction from being executed multiple times in the target database. Even if the program fails, data corruption will not occur, and the reentrancy of operations during data copying is guaranteed, ensuring data correctness even if multiple failures occur.
[0153] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0154] The above are only some embodiments of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A data copying method, characterized in that, The method includes: Obtain at least one binary log from the database to be replicated, each binary log including at least one transaction and position information of each transaction, the position information being used to indicate the storage location of the transaction in the database to be replicated; The transactions in the binary log are traversed. For each target transaction, it is determined whether the position information of the target transaction is stored in a preset first database table. The first database table is used to store the position information of transactions that have been re-executed in the target database. If it is determined that the location information of the target transaction is not stored in the first database table, then the target transaction is executed in the target database, and the location information of the target transaction is recorded in the first database table, until all transactions in the binary log have been traversed. Wherein, the data generated by the target transaction after execution in the target database is the same as the data generated by the target transaction when executed in the database to be replicated.
2. The data copying method according to claim 1, characterized in that, The process of traversing the transactions in the binary log also includes: If it is determined that the location information of the target transaction is stored in the first database table, then continue to traverse the next transaction.
3. The data copying method according to claim 1, characterized in that, Before executing the target transaction in the target database, the process also includes: Generate a target operation instruction, which, when executed, stores the location information of the target transaction in the first database table; Insert the target operation instruction into the operation sequence of the transaction to be executed.
4. The data copying method according to any one of claims 1-3, characterized in that, The binary log containing the target transaction is used as the target binary log. The location information of the target transaction includes the file name of the target binary log and the storage location of the target transaction in the target binary log. The step of storing the location information of the target transaction in the first database table includes: The location information of the target transaction is stored in a preset second database table; The filename of the target binary log is searched in the first database table. If the filename of the target binary log exists in the first database table, the storage location of the target transaction in the target binary log is merged into the storage location of all executed transactions in the target binary log in the first database table.
5. The data copying method according to claim 4, characterized in that, The step of searching for the filename of the target binary log in the first database table further includes: If it is determined that the filename of the target binary log does not exist in the first database table, then the filename of the target binary log is created in the first database table, and the storage location of all executed transactions in the target binary log is updated to the storage location of the target transaction in the target binary log.
6. The data copying method according to claim 4, characterized in that, The step of determining whether the location information of the target transaction is stored in a preset first database table includes: If the filename of the target binary log is stored in the first database table, then determine whether the target storage location is recorded in the storage location of all transactions executed in the target binary log in the first database table; If the target storage location is recorded in the storage location of all executed transactions in the target binary log in the first database table, then the location information of the target transaction is stored in the first database table. The target storage location is the storage location of the target transaction in the target binary log.
7. The data copying method according to claim 2, characterized in that, The target transaction is stored in the binary log in chronological order of its recording time. The process of continuing to traverse the next transaction includes: Based on the order of the target transaction in the binary log, the transactions adjacent to the target transaction in the binary log are sequentially determined and traversed.
8. A data copying device, characterized in that, The device includes: The acquisition module is used to acquire at least one binary log in the database to be replicated, each binary log including at least one transaction and the position information of each transaction, the position information being used to indicate the storage location of the transaction in the database to be replicated; The judgment module is used to traverse the transactions in the binary log. For each target transaction traversed, it determines whether the position information of the target transaction is stored in a preset first database table. The first database table is used to store the position information of transactions that have been re-executed in the target database. An execution module is configured to, if it is determined that the location information of the target transaction is not stored in the first database table, execute the target transaction in the target database and record the location information of the target transaction in the first database table, until all transactions in the binary log have been traversed. Wherein, the data generated by the target transaction after execution in the target database is the same as the data generated by the target transaction when executed in the database to be replicated.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the program, it implements the steps of the data copying method in the database as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer instructions that cause the computer to perform the steps of the data copying method in the database as described in any one of claims 1 to 7.