MapReduce-based CDC (Change Data Capture) method of MYSQL database

A change data capture and database technology, applied in the field of data capture, that addresses delayed synchronization operations and complex algorithm implementation.

Inactive Publication Date: 2013-12-11
JINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

The CDC method based on snapshot difference can store snapshot files on a system other than the production system, so it does not have any impact on the execut

Examples

Embodiment

[0091] To address the I/O cost of reading database files, the invention modifies the open-source MySQL code to implement a SQL query that produces a digest (the "summary") for each row, thereby reducing the I/O cost of generating a snapshot file that contains digests. To improve the speed and correctness of snapshot differencing on large data volumes, the invention proposes, as shown in Figure 1, a snapshot difference algorithm based on the Hadoop MapReduce parallel framework.

[0092] 1. The SQL query 'select into outfile' with a digest

[0093] Finding the difference means comparing the corresponding rows of the two snapshot files. One method is to compare the attribute values in the corresponding rows one by one; if the number of attributes is large, the cost of this comparison is very high. Another method is to generate a digest from the attribute values of each row and compare the digests of corresponding rows directly to obtain the difference result. The existing method needs to read and write the f...
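The two comparison strategies in [0093] can be illustrated with a short Python sketch. It is not the patented MySQL modification; it only shows why comparing one digest per row is cheaper than comparing every attribute. The separator "|", the function names, and the tab-separated "key, digest" line layout are assumptions made for this illustration; only the use of an MD5 value and an attribute value separator comes from the text.

```python
import hashlib

SEP = "|"  # hypothetical attribute value separator (assumption)


def row_digest(values, sep=SEP):
    """Join the attribute values with the separator and return their MD5 hex digest."""
    joined = sep.join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def snapshot_line(key, values, sep=SEP):
    """Build one snapshot line as 'key<TAB>digest' (layout is an assumption)."""
    return f"{key}\t{row_digest(values, sep)}"


def differ_by_attributes(old_values, new_values):
    """Attribute-by-attribute comparison: cost grows with the number of attributes."""
    if len(old_values) != len(new_values):
        return True
    return any(a != b for a, b in zip(old_values, new_values))


def differ_by_digest(old_digest, new_digest):
    """Digest comparison: a single string comparison per row."""
    return old_digest != new_digest


if __name__ == "__main__":
    old_row = [1, "alice", "Guangzhou", 30]
    new_row = [1, "alice", "Shenzhen", 30]
    print(differ_by_attributes(old_row, new_row))
    print(differ_by_digest(row_digest(old_row), row_digest(new_row)))
    print(snapshot_line(1, new_row))
```

The separator matters in this sketch: without it, the rows ("ab", "c") and ("a", "bc") would be concatenated to the same string and receive the same digest.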


Abstract

The invention discloses a MapReduce-based CDC (Change Data Capture) method for a MySQL database. The method comprises the following steps: (1) extending the query statement 'select into outfile' so that it generates a digest, and setting a flag bit according to the FIELDS clause; inserting an attribute value separator into each row of tuples that 'select into outfile' retrieves from the database; generating the digest md5value and, according to the flag bit value, generating the output format for the query result of 'select into outfile'; and writing the query result to the disk file outfile; (2) calculating the difference with the Hadoop MapReduce parallel framework: the map end reads the two snapshot files old.txt and new.txt, the shuffle function of MapReduce stores the values that share the same key in the key/value structure in an iterator, and the reduce output files are combined into an insert file and a delete file, which together form the CDC result. The invention improves both the syntax and the implementation of the query statement in MySQL, so that a snapshot file with digests can be generated by querying the database data file in a single step. Generating one snapshot file saves one I/O pass, and a series of consecutive snapshot difference runs saves a large amount of I/O.
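The map, shuffle, and reduce stages named in step (2) can be sketched in plain Python without a Hadoop cluster. This is a minimal single-process simulation under stated assumptions, not the patent's implementation: each snapshot line is assumed to have the form "key<TAB>digest", the map step is assumed to tag each record with its source file, and a changed row is assumed to be reported as a delete of the old version plus an insert of the new one. All function names are hypothetical.

```python
from collections import defaultdict


def map_snapshot(lines, tag):
    """Map step: emit (key, (tag, digest)) for every 'key<TAB>digest' line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, digest = line.split("\t", 1)
        yield key, (tag, digest)


def shuffle(pairs):
    """Shuffle step: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_diff(key, values):
    """Reduce step: decide which records go to the insert and delete outputs."""
    old = [digest for tag, digest in values if tag == "old"]
    new = [digest for tag, digest in values if tag == "new"]
    inserts, deletes = [], []
    if old and not new:        # key only in old.txt: row was deleted
        deletes.append((key, old[0]))
    elif new and not old:      # key only in new.txt: row was inserted
        inserts.append((key, new[0]))
    elif old[0] != new[0]:     # digests differ: treat update as delete + insert
        deletes.append((key, old[0]))
        inserts.append((key, new[0]))
    return inserts, deletes


def snapshot_diff(old_lines, new_lines):
    """Run map, shuffle, and reduce over both snapshots and merge the outputs."""
    pairs = list(map_snapshot(old_lines, "old")) + list(map_snapshot(new_lines, "new"))
    inserts, deletes = [], []
    for key, values in sorted(shuffle(pairs).items()):
        ins, dels = reduce_diff(key, values)
        inserts.extend(ins)
        deletes.extend(dels)
    return inserts, deletes


if __name__ == "__main__":
    old_txt = ["1\taaa", "2\tbbb", "3\tccc"]
    new_txt = ["1\taaa", "2\tbbX", "4\tddd"]
    inserts, deletes = snapshot_diff(old_txt, new_txt)
    print("insert:", inserts)
    print("delete:", deletes)
```

In a real Hadoop job the grouping by key is done by the framework between the map and reduce phases; the `shuffle` function here only stands in for that behaviour.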

Description

Technical field

[0001] The invention relates to the technical field of data capture, in particular to a MapReduce-based change data capture method for a MySQL database.

Background

[0002] Change data capture (CDC) is one of the main problems to be solved in the ETL (Extract, Transform, Load) process. CDC captures the data affected by update operations (such as insert, delete, and update) in the production database, and provides incremental data extraction services for synchronizing enterprise application databases such as OLAP databases, report databases, data warehouses, and business intelligence databases.

[0003] Existing CDC methods can be summarized into five categories:

[0004] (1) CDC method based on time attribute

[0005] If the database table is an append-only table (a table that only allows insertion and does not allow deletion and update operations is called an append-only table), th...


Application Information

IPC(8): G06F17/30
Inventor: 邹先霞, 李鹏, 杜威
Owner: JINAN UNIVERSITY