Method and apparatus for realizing real-time increment synchronization of data

An incremental data synchronization technology, applied in the field of big data, that addresses the long run time of existing synchronization and the extra burden it places on the Hadoop distributed system, thereby reducing that burden and improving the user experience.

Active Publication Date: 2016-01-13
北京明智和术科技有限公司

AI Technical Summary

Problems solved by technology

Importing the full data set from the relational database each time not only increases the burden on the Hadoop distributed system but also takes a lot of time.
At present, there is no method to realize the real-time in...

Examples

Embodiment 1

[0085] In this embodiment, the relational database MySQL is used as an example to explain in detail how to implement real-time incremental data updates from a MySQL database to the HBase database in Hadoop.

[0086] First, configure the target MySQL database: turn on MySQL binary logging and set it to row mode. In the table building module, configure the information of the MySQL database to be synchronized. After the configuration is completed, run the table building module, which creates in Hive and HBase the associated tables corresponding to the relational database and, at the same time, generates the mapping relationship file between the data tables in the relational database and the data tables in HBase for use by the data update module. Suppose there is a table info in the target database with the following table structure:

[0087]

Field Name    Field Type     Description
id            bigint         Self-incrementing primary key
name          varchar(10)
age           int

[...
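The mapping step described in [0086] can be illustrated with a minimal Python sketch. The source does not show the format of the mapping relationship file, so everything about the layout here is an assumption: the JSON structure, the single column family named cf, the use of the primary key as the HBase row key, and the helper name build_mapping are all hypothetical. Only the info table structure is taken from the text.

```python
import json

# Table structure of `info`, copied from the table in [0087].
INFO_COLUMNS = [
    {"name": "id", "type": "bigint", "primary_key": True},
    {"name": "name", "type": "varchar(10)", "primary_key": False},
    {"name": "age", "type": "int", "primary_key": False},
]


def build_mapping(table, columns, family="cf"):
    """Describe how one relational table maps onto an HBase table.

    Assumptions (not from the source): the primary key becomes the HBase
    row key, and every other column becomes a qualifier in one column
    family.
    """
    keys = [c["name"] for c in columns if c["primary_key"]]
    if len(keys) != 1:
        raise ValueError("this sketch expects exactly one primary-key column")
    return {
        "source_table": table,
        "hbase_table": table,
        "row_key": keys[0],
        "columns": {
            c["name"]: f"{family}:{c['name']}"
            for c in columns
            if not c["primary_key"]
        },
    }


mapping = build_mapping("info", INFO_COLUMNS)
# Serialized form that a data update module could load later.
mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```

Keeping the mapping as plain data separates the table building step from the update step: the update module only needs to read the file and does not have to query the relational schema again.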

Abstract

The invention discloses a method and an apparatus for realizing real-time incremental synchronization of data. The method comprises: generating, according to the table structure information of a relational database, a mapping relation file corresponding to the relational database in the distributed column-oriented open source database HBase; obtaining the operation log of the relational database in real time; obtaining the change data of the relational database from the obtained operation log; and updating the obtained change data into the HBase of Hadoop according to the established mapping relation file. Real-time incremental synchronization of data from the relational database to Hadoop is thereby realized, so that the burden on the Hadoop platform is effectively reduced and the user experience is improved.
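The update step of the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the event dictionaries stand in for row-level changes already decoded from the operation log, a nested dict stands in for HBase, and the mapping layout, the cf column family, the function name apply_event, and the sample rows are all made up for the example.

```python
from collections import defaultdict

# Hypothetical mapping: primary key -> row key, other columns -> qualifiers.
MAPPING = {
    "info": {
        "hbase_table": "info",
        "row_key": "id",
        "columns": {"name": "cf:name", "age": "cf:age"},
    },
}


def apply_event(store, mapping, event):
    """Apply one decoded row-level change event to the in-memory store."""
    table_map = mapping[event["table"]]
    row = event["row"]
    row_key = str(row[table_map["row_key"]])
    target = store[table_map["hbase_table"]]
    if event["op"] == "delete":
        target.pop(row_key, None)
        return
    if event["op"] not in ("insert", "update"):
        raise ValueError(f"unsupported operation: {event['op']}")
    # Insert and update both become a write of the mapped cells.
    cells = target.setdefault(row_key, {})
    for column, qualifier in table_map["columns"].items():
        if column in row:
            cells[qualifier] = row[column]


# In-memory stand-in for HBase: table -> row key -> {qualifier: value}.
store = defaultdict(dict)

# Made-up sample events, in the order they would appear in the log.
events = [
    {"table": "info", "op": "insert", "row": {"id": 1, "name": "Ann", "age": 30}},
    {"table": "info", "op": "update", "row": {"id": 1, "name": "Ann", "age": 31}},
    {"table": "info", "op": "insert", "row": {"id": 2, "name": "Bob", "age": 25}},
    {"table": "info", "op": "delete", "row": {"id": 2}},
]
for event in events:
    apply_event(store, MAPPING, event)

print(dict(store))
```

Only the changed rows are touched, which is the point of the incremental approach: the cost of each synchronization is proportional to the number of changes rather than to the size of the table.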

Description

Technical field

[0001] The invention relates to the technical field of big data, in particular to a method and device for realizing real-time incremental data synchronization.

Background technique

[0002] The rapid development of the Internet has produced a sharply growing volume of data. The emergence of massive data and changes in data structure have brought huge challenges to various industries in management, analysis and processing. Traditional processing methods based on relational databases can no longer effectively store and analyze the growing variety of business data. For this reason, many industries have begun to use a distributed system infrastructure (Hadoop) to analyze and process data. The current mainstream method of synchronizing relational database data to the Hadoop platform mainly uses Sqoop to perform a one-time full import of the data. Sqoop is an efficient data transfer tool that exists between relational databases and distribu...

Application Information

IPC(8): G06F17/30
Inventor 杨威, 白军伟, 王啸风, 冯是聪, 吴明辉
Owner 北京明智和术科技有限公司