Data synchronization method, apparatus, storage medium, and electronic apparatus

A data synchronization and data storage space technology, applied in the computer field, that addresses a cumbersome synchronization process and incremental synchronization tools that fail to work properly, achieving the effects of avoiding loss, simplifying the data synchronization process, and ensuring accuracy

Active Publication Date: 2019-01-18
NEUSOFT CORP
6 Cites 27 Cited by

AI-Extracted Technical Summary

Problems solved by technology

The whole process requires a large amount of manual work, and continuously monitoring the export and import of backup files is cumbersome. At the same time, for some large databases, an initialization proc...

Method used

While the data to be backed up is being exported, the acquired incremental data can be stored in a preset cache queue, which avoids incremental backup errors caused by the source database's transaction log being cleaned up when the backup takes too long. Finally, data synchronization can be achieved by importing the data to be backed up from the source database and the incremental data in the cache queue into the target database...

Abstract

The invention relates to a data synchronization method, a device, a storage medium and an electronic device, which can perform data initialization and incremental synchronization simultaneously, simplifies the overall database synchronization process, and avoids incremental synchronization errors caused by cleanup of transaction logs. The method comprises the following steps: while the data to be backed up is exported from the source database, monitoring the transaction log of the source database to obtain incremental data generated during the backup process; storing the incremental data in a cache queue; and importing the exported data to be backed up and the incremental data in the cache queue into the target database.

Technology Topic

Incremental backup, Data synchronization


Example Embodiment

[0071] The specific embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only used to illustrate and explain the present disclosure, and are not used to limit the present disclosure.
[0072] Figure 1 is a flowchart of a data synchronization method according to an exemplary embodiment. As shown in Figure 1, the data synchronization method can be applied to a computer and includes the following steps.
[0073] Step S11: While exporting the data to be backed up from the source database, monitor the transaction log of the source database to obtain incremental data during the backup process.
[0074] Step S12: Store the incremental data in the cache queue.
[0075] Step S13: Import the exported data to be backed up and the incremental data in the cache queue into the target database.
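Steps S11 to S13 can be illustrated with the following minimal Python sketch. It is not the patented implementation: in-memory lists stand in for the source snapshot and the transaction log, and `SNAPSHOT`, `LOG_CHANGES` and all function names are hypothetical. A `queue.Queue` plays the role of the cache queue, and the full export and the log monitor run concurrently before the import step replays the queued increments on top of the snapshot.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the source snapshot and the transaction log.
SNAPSHOT = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
LOG_CHANGES = [{"op": "I", "row": {"id": 3, "name": "c"}}]

def export_snapshot(out: list) -> None:
    """Step S11 (full part): copy every row of the consistent snapshot."""
    for row in SNAPSHOT:
        out.append(dict(row))

def monitor_log(cache_queue: queue.Queue) -> None:
    """Step S11 (incremental part) and S12: push each log change into the cache queue."""
    for change in LOG_CHANGES:
        cache_queue.put(change)

def synchronize() -> dict:
    exported: list = []
    cache_queue: queue.Queue = queue.Queue()
    # Run the full export and the log monitor at the same time.
    workers = [
        threading.Thread(target=export_snapshot, args=(exported,)),
        threading.Thread(target=monitor_log, args=(cache_queue,)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Step S13: load the exported snapshot, then replay the queued increments.
    target = {row["id"]: row for row in exported}
    while not cache_queue.empty():
        change = cache_queue.get()
        if change["op"] in ("I", "U"):
            target[change["row"]["id"]] = change["row"]
        elif change["op"] == "D":
            target.pop(change["row"]["id"], None)
    return target
```

Here `synchronize()` yields rows 1 and 2 from the snapshot plus row 3 from the log. A real system would keep monitoring the log after the export finishes; the sketch stops once the hypothetical log list is drained.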
[0076] The data stored in the database (including the data to be backed up) is in the form of data tables. The database transaction log is a set of sequentially written, append-only files that record the changes made to the tables in the database; it is mainly used as an important basis for data recovery after a database failure. The transaction log stores the transaction to which each change belongs, the owner of the changed data table, the data table where the change occurred, the type of change, and the changed data.
[0077] In the embodiment of the present disclosure, the source database is a database connected to the production system and used to store transactional data generated by the production system, and the target database may be a database used to back up the source database. To synchronize data between the source database and the target database, two steps are needed: exporting the data to be backed up from the source database (i.e., full synchronization), and parsing the database transaction log to obtain incremental data (i.e., incremental synchronization). When querying of the data to be backed up from the source database begins, the source database generates a snapshot (mirror) of the current data, so the data to be backed up does not change as the source database continues to operate; the incremental data that the source database generates during full synchronization is captured at the same time through incremental synchronization.
[0078] While the data to be backed up is being exported, the acquired incremental data can be stored in a preset cache queue, which avoids incremental backup errors caused by the source database's transaction log being cleaned up when the backup runs for a long time. Finally, the data to be backed up obtained from the source database and the incremental data in the cache queue are imported into the target database to achieve data synchronization.
[0079] Optionally, before the exported data to be backed up and the incremental data in the cache queue are imported into the target database, both the data to be backed up and the incremental data can be converted into standard data objects. A standard data object includes at least owner information, data table information, metadata information, specific data, a change type, and a timestamp; the change type of all data to be backed up is insert.
[0080] Data synchronization using a database's built-in initialization tools is often limited by factors such as database type and version. For example, files exported from an Oracle 11g source database can only be imported into an Oracle 11g target database; they cannot be imported into an Oracle 10g target database, let alone a MySQL database. To synchronize data between different types of databases, the present disclosure defines standard data objects and uses JDBC (Java Database Connectivity) to connect to the databases for import and export, which makes it easier to import the synchronized data into various types of target databases.
[0081] When exporting the data to be backed up from the source database, separate data extraction threads can be started for different data tables so that extraction runs in parallel, and the extraction results are converted into standard data objects containing the data owner (OWNER), data table information (TABLE), metadata information (such as column metadata, COLUMNS), specific data (DATA), change type (OP), and timestamp (SCN). To facilitate data lookup, DATA can be a key-value structure in which the key is the column name and the value is the column's value. According to the specific change, OP is one of insert (I), update (U), delete (D), or table structure change (DDL). For the data to be backed up from the source database, OP is fixed to insert (I). The SCN of every exported data object is the database's current change sequence number at the time this batch of data is exported. Taking the Oracle database as an example, this value is the current_scn returned by select current_scn from v$database when the batch is queried.
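A standard data object with the fields listed above might be modeled as follows. This is a minimal Python sketch under stated assumptions: the class and function names are hypothetical, COLUMNS is reduced to a list of column names, and the batch SCN is passed in by the caller (in the Oracle case it would come from the `v$database` query above).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple

@dataclass
class StandardDataObject:
    owner: str              # OWNER: data owner
    table: str              # TABLE: data table name
    columns: List[str]      # COLUMNS: column metadata (names only in this sketch)
    data: Dict[str, Any]    # DATA: column name -> value
    op: str                 # OP: "I", "U", "D" or "DDL"
    scn: int                # SCN: change sequence number

def rows_to_objects(owner: str, table: str, columns: Sequence[str],
                    rows: Sequence[Tuple[Any, ...]], batch_scn: int) -> List[StandardDataObject]:
    """Convert one batch of exported rows; OP is always insert and SCN is the batch SCN."""
    return [
        StandardDataObject(owner, table, list(columns), dict(zip(columns, row)), "I", batch_scn)
        for row in rows
    ]
```

Keeping DATA as a dictionary keyed by column name is what makes the later per-dialect SQL generation independent of the source database's column order.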
[0082] The transaction log of the source database is parsed to extract incremental data, which includes transaction information (XID), data owner (OWNER), data table information (TABLE), change data (DATA), change type (OP), and the timestamp of the change (SCN). OP covers insert (I), update (U), and delete (D) on data tables, as well as transaction start (START), transaction rollback (ROLLBACK), transaction commit (COMMIT), and table structure change (DDL).
[0083] The identified change data is converted into the same standard data object as the data to be backed up, where the SCN of the change data is the sequence value, extracted from the transaction log, at which the change occurred in the database. This makes the order of the change transactions known during later synchronization and avoids incorrect synchronization results caused by applying changes out of order.
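Converting parsed log records into standard data objects while keeping each record's own SCN could look like the minimal Python sketch below. It assumes, hypothetically, that the log parser yields dictionaries keyed by the field names of [0082]; control records (START, COMMIT, ROLLBACK) carry no row data and are filtered out, and sorting one batch by SCN illustrates how the preserved SCN recovers the order of changes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class StandardDataObject:
    owner: str
    table: str
    data: Dict[str, Any]
    op: str
    scn: int   # SCN taken from the log record, not from the export batch

def log_records_to_objects(records: List[dict]) -> List[StandardDataObject]:
    """Keep row-level and DDL changes and order them by their log SCN."""
    objs = [
        StandardDataObject(r["OWNER"], r["TABLE"], r["DATA"], r["OP"], r["SCN"])
        for r in records
        if r["OP"] in ("I", "U", "D", "DDL")  # START/COMMIT/ROLLBACK are control records
    ]
    return sorted(objs, key=lambda o: o.scn)
```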
[0084] Once the data to be backed up and the incremental data have been converted into standard data objects, they can be imported into various types of target databases. Optionally, the database type of the target database may be determined first; the data to be backed up and the incremental data, as standard data objects, are then loaded into SQL statements conforming to that database type and written into the target database.
[0085] For example, for Oracle and MySQL target databases, the standard data objects can be converted into SQL statements conforming to the target database type, and the data can then be written to the target database over JDBC.
[0086] When importing into the target database, the change type (OP) in the standard data object must also be examined: an insert (I) is converted into an INSERT SQL operation; an update (U) into an UPDATE SQL operation; a delete (D) into a DELETE SQL operation; and for a table structure change (DDL), the kind of DDL operation must be determined further, e.g. an add-column operation is converted into an ALTER TABLE ADD COLUMN SQL operation, and so on.
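The mapping from OP to SQL can be sketched in Python as below. This is an illustration, not the disclosure's code: `to_sql`, the `QUOTE` table, the `key="ID"` primary-key column used for UPDATE and DELETE, and the `{"action": "ADD_COLUMN", ...}` shape of DDL data are all hypothetical. The statements use `?` placeholders, as a JDBC `PreparedStatement` would. Two real dialect differences are shown: MySQL quotes identifiers with backticks while Oracle uses double quotes, and Oracle adds a column with `ADD (col type)` rather than `ADD COLUMN`.

```python
from typing import Any, Dict, List, Tuple

# Identifier quoting differs by dialect: MySQL uses backticks, Oracle double quotes.
QUOTE = {"mysql": "`{}`", "oracle": '"{}"'}

def to_sql(dialect: str, owner: str, table: str, op: str,
           data: Dict[str, Any], key: str = "ID") -> Tuple[str, List[Any]]:
    """Build a parameterized statement ('?' placeholders) for one standard data object."""
    q = QUOTE[dialect].format
    target = f"{q(owner)}.{q(table)}"
    cols = list(data)
    if op == "I":
        col_list = ", ".join(q(c) for c in cols)
        marks = ", ".join("?" for _ in cols)
        return f"INSERT INTO {target} ({col_list}) VALUES ({marks})", [data[c] for c in cols]
    if op == "U":
        sets = [c for c in cols if c != key]
        assignments = ", ".join(q(c) + " = ?" for c in sets)
        sql = f"UPDATE {target} SET {assignments} WHERE {q(key)} = ?"
        return sql, [data[c] for c in sets] + [data[key]]
    if op == "D":
        return f"DELETE FROM {target} WHERE {q(key)} = ?", [data[key]]
    if op == "DDL" and data.get("action") == "ADD_COLUMN":
        col = f"{q(data['column'])} {data['type']}"
        # Oracle has no ADD COLUMN keyword; it uses ADD (col type).
        clause = f"ADD COLUMN {col}" if dialect == "mysql" else f"ADD ({col})"
        return f"ALTER TABLE {target} {clause}", []
    raise ValueError(f"unsupported OP: {op}")
```

For example, `to_sql("mysql", "HR", "EMP", "I", {"ID": 1, "NAME": "Ann"})` produces ``INSERT INTO `HR`.`EMP` (`ID`, `NAME`) VALUES (?, ?)`` with parameters `[1, "Ann"]`. Column type names in DDL would also need per-dialect translation, which this sketch leaves to the caller.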
[0087] With the above approach, the data synchronization method of the present disclosure can be applied to different types of databases and is broadly applicable.
[0088] Optionally, when monitoring the transaction log of the source database to obtain incremental data during the backup process, each time a piece of change data in the transaction log is detected, it is determined, based on the transaction information of the change record, whether a data storage space has been allocated for the corresponding transaction. If no data storage space has been allocated for that transaction, a new data storage space is allocated and the change data is stored in it; if one has already been allocated, the change data is stored in that transaction's data storage space. The change data stored in the same data storage space belongs to the same transaction, and the change data stored in the data storage spaces is the incremental data.
[0089] Change records parsed from the transaction log are extracted in the order in which the database recorded them. However, a transactional database must synchronize data downstream transaction by transaction to ensure data consistency, and because transactions are committed sequentially, they must also be synchronized downstream in commit order. Each time a change record is extracted, it can be regrouped according to its transaction information (XID): change records with the same XID are gathered in the same data storage space, with the XID as the storage space's unique identifier, and all subsequent insert (I), update (U), delete (D), transaction rollback (ROLLBACK), transaction commit (COMMIT), and table structure change (DDL) records with the same XID are placed in that space. This process may be called transaction composition.
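Transaction composition as described in [0088] and [0089] can be sketched in Python with a dictionary keyed by XID, where each list plays the role of one data storage space. The function name and the dictionary-shaped change records are assumptions for illustration.

```python
from typing import Dict, List

def compose_transactions(changes: List[dict]) -> Dict[str, List[dict]]:
    """Group change records by XID; each XID gets its own storage space (a list)."""
    spaces: Dict[str, List[dict]] = {}
    for change in changes:
        xid = change["XID"]
        if xid not in spaces:          # no storage space allocated for this transaction yet
            spaces[xid] = []           # allocate a new one, identified by the XID
        spaces[xid].append(change)     # later changes with the same XID land here
    return spaces
```

Records keep their log order within each storage space, so interleaved transactions in the log end up separated but internally ordered.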
[0090] Optionally, for change data whose type is transaction rollback, the data storage space corresponding to the transaction information of that change data can be cleared. For change data whose type is transaction commit, it is determined whether the data storage space corresponding to its transaction information contains change data of the transaction start type; if not, the other change data of the corresponding transaction is traced back to earlier positions in the source database's transaction log until the change data whose type is transaction start is found, and the change data recovered in this way is stored in the data storage space where the commit change data is located. Storing the incremental data in the cache queue can then consist of placing the data tables contained in the incremental data into the cache queue in transaction commit order.
[0091] In other words, only committed transactions are synchronized, and rolled-back transactions are discarded. A transaction rollback (ROLLBACK) or transaction commit (COMMIT) change marks the end of each transaction: on ROLLBACK, the transaction's data storage space is deleted; on COMMIT, the transaction was committed normally and can be synchronized normally.
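The end-of-transaction handling above can be sketched as a small Python class that buffers changes per XID, drops a transaction on ROLLBACK, and emits it on COMMIT, so the output list is naturally in commit order. `TransactionAssembler` and its fields are hypothetical names; the START check and transaction compensation described in [0090] are left out of this sketch.

```python
from typing import Dict, List

class TransactionAssembler:
    """Buffers changes per XID and releases whole transactions in commit order."""

    def __init__(self) -> None:
        self.spaces: Dict[str, List[dict]] = {}
        self.committed: List[List[dict]] = []   # output, in commit order

    def on_change(self, change: dict) -> None:
        xid, op = change["XID"], change["OP"]
        if op == "ROLLBACK":
            self.spaces.pop(xid, None)          # discard the rolled-back transaction
        elif op == "COMMIT":
            # Committed normally: pass the buffered changes on in commit order.
            self.committed.append(self.spaces.pop(xid, []))
        else:
            # START, I, U, D and DDL are buffered in the transaction's storage space.
            self.spaces.setdefault(xid, []).append(change)
```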
[0092] In practice, however, a transaction may have no transaction start (START) change. This is because when extraction of the data to be backed up begins, that is, when monitoring of the source database's transaction log begins, the log reading position may fall in the middle of a transaction's changes, so the transaction acquired is incomplete. To obtain the complete transaction data, the log must be traced back to earlier positions; this is referred to as transaction compensation.
[0093] After a COMMIT change is received, it must be checked whether the transaction contains a START change; if it does not, the transaction compensation mechanism is triggered. Transaction compensation starts a separate log analysis thread that extracts only that transaction's log data. Because the position of the missing transaction start in the transaction log is unknown, a stepwise backtracking approach can be used: first trace back a small section of the log (e.g. 16K); if the start is not found, enlarge the backtracking range (e.g. to 32K), up to at most the size of one transaction log file. This approach works because business transactions in a transactional database usually do not last long, so the START change is usually close to the COMMIT change, which saves considerable resources and improves efficiency.
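Stepwise backtracking can be sketched as a loop over a growing window. In this minimal Python sketch, `read_back(n)` is a hypothetical callback returning the parsed records found in the `n` bytes of log preceding the COMMIT position, `max_window` is an assumed stand-in for the size of one transaction log file, and doubling the window (16K, 32K, 64K, ...) is one reasonable growth rule consistent with the example sizes above, not necessarily the one used in the disclosure.

```python
from typing import Callable, List, Optional

def find_transaction_start(
    read_back: Callable[[int], List[dict]],
    xid: str,
    first_window: int = 16 * 1024,
    max_window: int = 64 * 1024 * 1024,   # assumed size of one transaction log file
) -> Optional[List[dict]]:
    """Trace back from the COMMIT position with a growing window until START is found."""
    window = first_window
    while True:
        records = [r for r in read_back(window) if r["XID"] == xid]
        if any(r["OP"] == "START" for r in records):
            return records                     # complete history of this transaction
        if window >= max_window:
            return None                        # give up at the size of one log file
        window = min(window * 2, max_window)   # e.g. 16K -> 32K -> 64K ...
```

Because most transactions are short, the first one or two windows usually suffice, which is the efficiency argument made in the text.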
[0094] It should be understood that while transaction compensation is being performed, the main process continues to analyze the log normally. To preserve transaction commit order when transactions are sent downstream, however, transactions committed after the compensated transaction must wait until its compensation completes before being synchronized downstream.
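Holding later transactions behind one that is still being compensated can be sketched as an ordered release buffer. In this hypothetical Python sketch, `OrderedEmitter` records each commit in arrival order; a transaction whose changes are `None` is still awaiting compensation, and nothing behind it is released until `compensated()` fills it in.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

class OrderedEmitter:
    """Releases transactions in commit order, holding later ones behind a pending compensation."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[str, Optional[List[dict]]]] = deque()
        self.output: List[str] = []   # XIDs released downstream, in commit order

    def committed(self, xid: str, changes: Optional[List[dict]]) -> None:
        # changes is None while the transaction is still being compensated.
        self.pending.append((xid, changes))
        self._drain()

    def compensated(self, xid: str, changes: List[dict]) -> None:
        self.pending = deque((x, changes if x == xid else c) for x, c in self.pending)
        self._drain()

    def _drain(self) -> None:
        # Release from the head only while the head transaction is complete.
        while self.pending and self.pending[0][1] is not None:
            xid, _ = self.pending.popleft()
            self.output.append(xid)
```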
[0095] When synchronizing downstream (that is, placing data into the cache queue), the transaction must also be split a second time by the data tables it contains, and the changes are grouped into per-table data caches in transaction commit order. All change records in the transaction are traversed first, with the combination of owner (OWNER) and data table (TABLE) serving as the unique identifier of a data cache; after standardization, incremental data with the same identifier is converted into standard data objects and placed in the cache queue.
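The second split by data table can be sketched in Python as follows. The function name and dictionary-shaped changes are assumptions; committed transactions are assumed to arrive already in commit order, so appending preserves that order within each (OWNER, TABLE) cache.

```python
from typing import Dict, List, Tuple

def split_by_table(committed_txns: List[List[dict]]) -> Dict[Tuple[str, str], List[dict]]:
    """Route each change of each committed transaction to the cache keyed by (OWNER, TABLE)."""
    caches: Dict[Tuple[str, str], List[dict]] = {}
    for txn in committed_txns:                # transactions already in commit order
        for change in txn:
            if change["OP"] in ("START", "COMMIT", "ROLLBACK"):
                continue                      # control records carry no table data
            key = (change["OWNER"], change["TABLE"])
            caches.setdefault(key, []).append(change)
    return caches
```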
[0096] The cache queue can be a persistent data queue that follows the first-in, first-out principle. Here, a persistent data queue is one with a configurable capacity: when the amount of data stored in the queue exceeds this capacity, the excess data can be written to a file. Standard data objects can be written to the file using any mainstream serialization method, such as Java serialization.
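A persistent FIFO queue that spills to a file once its in-memory capacity is reached might look like the Python sketch below. Python's `pickle` stands in here for the Java serialization mentioned in the text, and each spilled item is written with an 8-byte length prefix so it can be read back exactly; `SpillingQueue` and its methods are hypothetical names. Once anything is on disk, new items also go to disk so that FIFO order is preserved.

```python
import os
import pickle
import tempfile
from collections import deque
from typing import Any

class SpillingQueue:
    """FIFO queue that keeps up to `capacity` items in memory and spills the rest to a file."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.memory: deque = deque()
        fd, self.path = tempfile.mkstemp(suffix=".spill")
        os.close(fd)
        self.read_pos = 0   # byte offset of the oldest spilled item
        self.spilled = 0    # number of items currently on disk

    def put(self, item: Any) -> None:
        # Once anything is on disk, new items must follow it there to keep FIFO order.
        if self.spilled == 0 and len(self.memory) < self.capacity:
            self.memory.append(item)
            return
        payload = pickle.dumps(item)
        with open(self.path, "ab") as f:
            f.write(len(payload).to_bytes(8, "big"))   # length prefix
            f.write(payload)
        self.spilled += 1

    def get(self) -> Any:
        if not self.memory and self.spilled:
            self._refill()
        return self.memory.popleft()   # raises IndexError when the queue is empty

    def _refill(self) -> None:
        # Move spilled items back into memory, oldest first, up to capacity.
        with open(self.path, "rb") as f:
            f.seek(self.read_pos)
            while self.spilled and len(self.memory) < self.capacity:
                size = int.from_bytes(f.read(8), "big")
                self.memory.append(pickle.loads(f.read(size)))
                self.spilled -= 1
            self.read_pos = f.tell()

    def __len__(self) -> int:
        return len(self.memory) + self.spilled

    def close(self) -> None:
        os.remove(self.path)
```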
[0097] When exporting the data to be backed up from the source database, each data extraction thread sends an end signal downstream when it finishes. The signal includes the data table owner (OWNER) and data table name (TABLE). When importing data into the target database, after the end signal of a given data table is received, a new thread can be started to take the corresponding incremental data, already converted into standard data objects, from the data cache and synchronize the data table and its incremental data to the target database.
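Reacting to a table's end signal can be sketched in Python as below: a loader thread writes the table's exported rows first and then drains that table's incremental cache. `on_end_signal` and the `write` callback (which would issue the SQL against the target database) are hypothetical. The sketch drains the cache once; a real loader would keep consuming the table's cache as new increments arrive.

```python
import threading
from typing import Callable, Dict, List, Tuple

TableKey = Tuple[str, str]   # (OWNER, TABLE)

def on_end_signal(key: TableKey,
                  exported: Dict[TableKey, List[dict]],
                  caches: Dict[TableKey, List[dict]],
                  write: Callable[[TableKey, List[dict]], None]) -> threading.Thread:
    """Start a loader thread for one table once its full export has finished."""
    def load() -> None:
        write(key, exported.get(key, []))   # full data for the table first
        write(key, caches.pop(key, []))     # then the table's incremental data
    t = threading.Thread(target=load, name=f"load-{key[0]}.{key[1]}")
    t.start()
    return t
```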
[0098] Optionally, the main data synchronization process can also be monitored for interruption during synchronization, and automatically restarted when it is interrupted.
[0099] Because data synchronization takes a long time and the network may fluctuate along the way, the synchronization process may be interrupted. Therefore, throughout synchronization, the main process can be monitored for interruption; if it is interrupted, a restart command can be issued automatically to restart the main process and resume synchronization. In this way, abnormal interruptions are recovered from automatically, without manual intervention.
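Automatic restart can be approximated in-process by a watchdog loop around the main routine, as in this Python sketch. The disclosure describes monitoring the main process and issuing a restart command; here an exception stands in for the interruption, and `run_with_restart`, `max_restarts` and `delay_s` are hypothetical. The restart cap is an added safeguard, not something the text specifies.

```python
import time
from typing import Callable

def run_with_restart(main: Callable[[], None], max_restarts: int = 5, delay_s: float = 1.0) -> int:
    """Run the synchronization main process and restart it automatically if it is interrupted."""
    restarts = 0
    while True:
        try:
            main()
            return restarts                  # finished normally; report restarts used
        except Exception as exc:             # e.g. a dropped network connection
            if restarts >= max_restarts:
                raise                        # give up and surface the error
            restarts += 1
            print(f"main process interrupted ({exc!r}); restart {restarts}")
            time.sleep(delay_s)
```

For resumption to be useful, the main routine would need to restart from a recorded position (for example the last SCN written) rather than from scratch.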
[0100] Referring to Figure 2, based on the same inventive concept, an embodiment of the present disclosure provides a data synchronization device 200, which may include:
[0101] The data extraction module 201 is used to export the data to be backed up from the source database;
[0102] The log analysis module 202 is configured to monitor the transaction log of the source database while the data extraction module 201 exports the data to be backed up from the source database to obtain incremental data during the backup process;
[0103] The cache module 203 is configured to store the incremental data in a cache queue;
[0104] The data collection module 204 is configured to import the exported data to be backed up and the incremental data in the cache queue into the target database.
[0105] Optionally, the data extraction module 201 is further configured to convert the data to be backed up into standard data objects before importing the exported data to be backed up and the incremental data in the cache queue into the target database;
[0106] The log analysis module 202 is further configured to convert the incremental data into the standard data object before importing the exported data to be backed up and the incremental data in the cache queue into the target database;
[0107] Wherein, the standard data object includes at least owner information, data table information, metadata information, specific data, change types, and time stamps; the change types of the data to be backed up are all insertion types.
[0108] Optionally, as shown in Figure 3, the device 200 further includes:
[0109] The data loading module 205 is configured to determine the database type of the target database before importing the exported data to be backed up and the incremental data in the cache queue into the target database;
[0110] load the data to be backed up and the incremental data, converted into the standard data objects, into structured query language (SQL) statements conforming to the database type, and write them into the target database.
[0111] Optionally, the log analysis module 202 is further configured to:
[0112] When each piece of change data in the transaction log is detected, determine, according to the transaction information of the change record, whether a data storage space has been allocated for the corresponding transaction;
[0113] If the data storage space for the corresponding transaction is not allocated, a new data storage space is allocated and the changed data is stored;
[0114] If the data storage space of the corresponding transaction has been allocated, store the changed data in the data storage space of the corresponding transaction;
[0115] Wherein, the change data stored in the same data storage space belongs to the same transaction, and the change data stored in each data storage space is the incremental data.
[0116] Optionally, the log analysis module 202 is further configured to:
[0117] For the change data whose type is transaction rollback, clear the data storage space corresponding to the transaction information of the change data;
[0118] For change data whose type is transaction commit:
[0119] Determine whether the data storage space corresponding to the transaction information of the change data stores the change data of the transaction start type;
[0120] If no change data of the transaction start type is stored, trace back through the transaction log of the source database for the other change data of the corresponding transaction until the change data whose type is transaction start is obtained;
[0121] Store the other change data obtained by tracing back in the data storage space where the change data is located;
[0122] The cache module 203 is configured to:
[0123] Put each data table included in the incremental data into the cache queue in the order of transaction submission.
[0124] Optionally, as shown in Figure 4, the device 200 further includes an error recovery module 206, configured to:
[0125] In the process of data synchronization, monitor whether the main process of data synchronization is interrupted;
[0126] When the main process is interrupted, the main process is automatically restarted.
[0127] Regarding the device in the foregoing embodiment, the specific manner in which each module performs operations has been described in detail in the method embodiment and will not be elaborated here. It should be understood that the present disclosure is described with the above modules integrated in one computer. In actual applications, the modules may be distributed in different ways, for example with each module deployed on a separate computer; this is not limited herein.
[0128] Figure 5 is a block diagram of an electronic device 500 according to an exemplary embodiment. As shown in Figure 5, the electronic device 500 may include a processor 501 and a memory 502. The electronic device 500 may further include one or more of a multimedia component 503, an input/output (I/O) interface 504, and a communication component 505.
[0129] The processor 501 is used to control the overall operation of the electronic device 500 to complete all or part of the steps of the above data synchronization method. The memory 502 is used to store various types of data to support operation of the electronic device 500. Such data may include, for example, instructions of any application or method running on the electronic device 500, and application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 502 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 503 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signal may be further stored in the memory 502 or transmitted through the communication component 505. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 504 provides an interface between the processor 501 and other interface modules, such as a keyboard, mouse, or buttons; the buttons may be virtual or physical. The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 505 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
[0130] In an exemplary embodiment, the electronic device 500 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above data synchronization method.
[0131] In another exemplary embodiment, there is also provided a computer-readable storage medium including program instructions, which, when executed by a processor, implement the steps of the aforementioned data synchronization method. For example, the computer-readable storage medium may be the aforementioned memory 502 including program instructions, which can be executed by the processor 501 of the electronic device 500 to complete the aforementioned data synchronization method.
[0132] The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details in the above-mentioned embodiments. Within the scope of the technical concept of the present disclosure, various simple modifications can be made to the technical solutions of the present disclosure. These simple modifications all belong to the protection scope of the present disclosure.
[0133] In addition, it should be noted that the specific technical features described in the above embodiments can be combined in any suitable manner, provided there is no contradiction. To avoid unnecessary repetition, the various possible combinations are not described separately in the present disclosure.
[0134] In addition, the various embodiments of the present disclosure can also be combined arbitrarily; as long as such combinations do not depart from the concept of the present disclosure, they should likewise be regarded as content disclosed by the present disclosure.