Distributed high-fault-tolerance data real-time synchronization method from MongoDB to HBase

A distributed real-time synchronization technology applied in the database field. It addresses the lack of a general method for synchronizing data from MongoDB to HBase, and it avoids single points of failure, improves robustness, and achieves high synchronization efficiency.

Active Publication Date: 2019-09-27
SHANGHAI DATATOM INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

Although many synchronization methods exist between various databases, there is still no general method for data synchronization between MongoDB and HBase.



Embodiment Construction

[0032] The present invention will be further described below in conjunction with the drawings.

[0033] Referring to figure 1, the distributed, highly fault-tolerant real-time data synchronization method from MongoDB to HBase of the present invention includes the following steps:

[0034] Step S1: Enable MongoDB's oplog operation logging (it is enabled by default in replica-set and sharded deployments; a single-node deployment requires manual configuration to enable it), and ensure that the MongoDB and HBase databases start from an initial state with identical data.

[0035] The oplog is a collection that MongoDB uses to implement data replication (backup). Collections in MongoDB store documents in a JSON-like format. The main function of the oplog is to record write operations in MongoDB, which are divided into inserts, deletes, updates, collection creation, database commands, system no-ops, and other types.
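
The operation types listed in [0035] correspond to the single-letter `op` codes that MongoDB records in each oplog entry (`i`, `u`, `d`, `c`, `n`). The following is a minimal sketch of how a consumer might separate the entries relevant to synchronization from the rest. The function name `classify_oplog_entry`, the `SYNC_RELEVANT` set, and the sample entry are illustrative assumptions, not the implementation described in the patent.

```python
# Illustrative sketch. The op codes are MongoDB's own; the names below and
# the rule for what counts as "relevant" are assumptions.
OP_NAMES = {
    "i": "insert",
    "u": "update",
    "d": "delete",
    "c": "command",  # e.g. creating or dropping a collection
    "n": "noop",     # periodic system no-op
}

# Assumption: only document-level writes are forwarded to HBase.
SYNC_RELEVANT = {"insert", "update", "delete"}


def classify_oplog_entry(entry):
    """Return (operation_name, is_sync_relevant) for one oplog document."""
    name = OP_NAMES.get(entry.get("op"), "other")
    return name, name in SYNC_RELEVANT


# Hypothetical oplog entry: an insert into the "orders" collection of "shop".
sample = {"op": "i", "ns": "shop.orders", "o": {"_id": 1, "total": 25}}
print(classify_oplog_entry(sample))
```

Keeping the mapping in a plain dictionary means unknown codes fall through to `"other"` rather than raising, which suits a long-running consumer.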

[0036] Step S2: The producer reads the location timestamp as a checkpoint, and reads the...


Abstract

The invention discloses a distributed, highly fault-tolerant real-time data synchronization method from MongoDB to HBase. The method comprises the following steps: MongoDB's oplog operation logging function is enabled; a producer reads the oplog records that follow the checkpoint in MongoDB, pushes each oplog record to a predetermined Kafka topic, and writes it to Redis at the same time; after an oplog record enters Storm, its operation type is determined, and for operation types relevant to data synchronization, a key suitable for HBase storage is obtained, together with either the value and the data positioning information or only the data positioning information; and the HBase writer receives the transmitted key-value and data positioning information and processes them accordingly. The method carries out real-time data synchronization from MongoDB to HBase efficiently, ensures the accuracy of the synchronized data, and can meet the complex real-time synchronization requirements of a business system.
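
The flow in the abstract can be sketched in a few lines, under heavy simplification. In this sketch a plain list stands in for the Kafka topic, a dictionary stands in for Redis, integers stand in for oplog timestamps, and the Storm stage is reduced to a single function. The row key layout `<collection>:<_id>`, the `location` dictionary, and all function names are hypothetical, and updates are assumed to arrive in the `$set` form.

```python
def read_after_checkpoint(oplog, checkpoint):
    """Producer step: return the entries strictly newer than the checkpoint."""
    return [e for e in oplog if e["ts"] > checkpoint]


def to_hbase_mutation(entry):
    """Map one oplog entry to (action, location, row_key, value), or None to skip."""
    op = entry["op"]
    if op not in ("i", "u", "d"):
        return None  # commands and no-ops are not synchronized in this sketch
    db, _, collection = entry["ns"].partition(".")
    location = {"namespace": db, "table": collection}  # data positioning info
    # Updates carry the target _id in "o2"; inserts and deletes carry it in "o".
    doc_id = entry["o2"]["_id"] if op == "u" else entry["o"]["_id"]
    row_key = f"{collection}:{doc_id}"  # hypothetical key layout
    if op == "d":
        return ("delete", location, row_key, None)  # positioning info only
    fields = entry["o"].get("$set", entry["o"])
    value = {k: v for k, v in fields.items() if k != "_id"}
    return ("put", location, row_key, value)


oplog = [
    {"ts": 1, "op": "i", "ns": "shop.orders", "o": {"_id": 7, "total": 25}},
    {"ts": 2, "op": "n", "ns": "", "o": {"msg": "periodic noop"}},
    {"ts": 3, "op": "u", "ns": "shop.orders",
     "o": {"$set": {"total": 30}}, "o2": {"_id": 7}},
    {"ts": 4, "op": "d", "ns": "shop.orders", "o": {"_id": 7}},
]

kafka_topic = []  # stand-in for the predetermined Kafka topic
redis_copy = {}   # stand-in for the copy written to Redis

for entry in read_after_checkpoint(oplog, checkpoint=1):
    kafka_topic.append(entry)
    redis_copy[entry["ts"]] = entry

mutations = [m for m in map(to_hbase_mutation, kafka_topic) if m is not None]
for m in mutations:
    print(m)
```

A delete yields only the key and positioning information, while an insert or update also carries a value, which mirrors the two cases the abstract distinguishes.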

Description

Technical field

[0001] The invention relates to the technical field of databases, and in particular to a distributed, highly fault-tolerant real-time data synchronization method from MongoDB to HBase.

Background technique

[0002] MongoDB is a commonly used non-relational database. As a database suited to agile development, its data model can be updated flexibly as applications evolve. MongoDB can make enterprises more agile and scalable: enterprises of all sizes can use it to create new applications, improve work efficiency, and reduce business costs. HBase is a distributed column-oriented database built on the Hadoop (distributed computing) file system, which provides fast random access to massive structured data. Although many synchronization methods exist between various databases, there is still no general method for data synchronization between MongoDB and HBase.

Summary of the invention

[0003] The purpose of the present ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/25, G06F16/27
CPC: G06F16/258, G06F16/27
Inventor: 任旭波, 谢赟, 陈大伟
Owner: SHANGHAI DATATOM INFORMATION TECH CO LTD