Big data synchronization method and device based on Binlog, HBase and Hive

A data synchronization technique for big data, applied in the database field. It addresses problems such as the performance cost of data synchronization, and achieves a small data volume, strong data query support, and highly real-time data.

Active Publication Date: 2021-01-29
武汉物易云通网络科技有限公司

AI Technical Summary

Problems solved by technology

[0013] The technical problem addressed by the present invention is an overall performance improvement covering the performance cost of data synchronization, synchronization timeliness, synchronization of physical deletions, concurrent consumption of Binlog data, and write ordering.


Examples

Embodiment 1

[0051] Embodiment 1 of the present invention provides a big data synchronization method based on Binlog+HBase+Hive which, as shown in Figure 1, includes:

[0052] In step 201, monitor the Binlog log file in the relational database to obtain real-time changing data.

[0053] Wherein, the relational database includes: one or more of Oracle, DB2, Microsoft SQL Server, Microsoft Access and MySQL.

[0054] In step 202, after the data synchronization device obtains the Binlog log file data, it parses the data to obtain the database name db, the table name t, the operation type o, the primary key k, and all field values. When storing the Binlog log file data in HBase, the database name is used as the namespace of the HBase table, and the table db:t is created.

[0055] In HBase, the naming rule for creating a table is "namespace":"table name", which gives the db:t above.
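The parsing and naming in step 202 can be sketched as follows. This is a minimal illustration, not the patented implementation: the event layout (keys `database`, `table`, `type`, `pk`, `row`) and the names `ChangeRecord`, `parse_event`, and `hbase_table_name` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ChangeRecord:
    db: str                  # database name
    t: str                   # table name
    o: str                   # operation type, e.g. "INSERT", "UPDATE", "DELETE"
    k: str                   # primary key value
    fields: Dict[str, Any]   # all field values of the row


def parse_event(event: Dict[str, Any]) -> ChangeRecord:
    """Extract db, t, o, k and all field values from a decoded Binlog event.

    The dict layout used here is hypothetical; a real Binlog client
    will expose the same information under its own field names.
    """
    row = dict(event["row"])
    return ChangeRecord(
        db=event["database"],
        t=event["table"],
        o=event["type"].upper(),
        k=str(row[event["pk"]]),
        fields=row,
    )


def hbase_table_name(record: ChangeRecord) -> str:
    """Build the HBase table name in "namespace:table" form, i.e. db:t."""
    return f"{record.db}:{record.t}"


# Example event (made-up values).
event = {
    "database": "orders_db",
    "table": "orders",
    "type": "update",
    "pk": "id",
    "row": {"id": 42, "status": "PAID"},
}
record = parse_event(event)
print(hbase_table_name(record), record.o, record.k)
```

Keeping the database name as the namespace means tables with the same name in different source databases do not collide in HBase.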

[0056] In step 203, the primary key k is used as the rowkey of the ...
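According to the abstract, using the primary key as the rowkey lets every change to a source row land on the same HBase row, so later values overwrite earlier ones. The sketch below imitates that behaviour with an in-memory dict standing in for one HBase table; `table` and `upsert` are hypothetical names, and how delete operations are handled is not shown here because the excerpt does not describe it.

```python
from typing import Any, Dict

# Stand-in for one HBase table: rowkey -> {column: latest value}.
# A real deployment would write through an HBase client instead.
table: Dict[str, Dict[str, Any]] = {}


def upsert(rowkey: str, fields: Dict[str, Any]) -> None:
    """Write the latest field values under rowkey, overwriting older ones."""
    table.setdefault(rowkey, {}).update(fields)


# Two changes to the same source row (primary key 42) arrive in order.
upsert("42", {"id": 42, "status": "CREATED"})
upsert("42", {"id": 42, "status": "PAID"})

# Only one row exists, and it holds the most recent values.
print(len(table), table["42"]["status"])
```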

Embodiment 2

[0071] This embodiment approaches the method described in Embodiment 1 from another key point and presents it in combination with specific tools. The corresponding overall process is shown in Figure 3.

[0072] Part 1: Real-time Binlog acquisition

[0073] Preferably, the Canal tool can be used to monitor the MySQL database Binlog logs and feed the Binlog log data into the message middleware Kafka, after which an application is developed to consume and process the data. Alternatively, Alibaba's Logtail tool can be used, followed by a consumer program developed in the same way.
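The consumer side of this pipeline can be sketched as below. The sketch assumes Canal is configured to publish flat JSON messages to Kafka with the fields `database`, `table`, `type`, `pkNames`, `data`, and `isDdl`; these field names should be checked against the Canal version in use. The Kafka client itself is left out, so `sample` stands in for one message value, and joining composite primary keys with `_` is an assumption made for this example.

```python
import json
from typing import Any, Dict, Iterator, Tuple

Change = Tuple[str, str, str, str, Dict[str, Any]]


def iter_changes(raw: bytes) -> Iterator[Change]:
    """Yield (db, t, o, k, fields) for each changed row in one message value."""
    msg = json.loads(raw)
    if msg.get("isDdl"):
        return  # schema changes carry no row data to synchronize
    pk_names = msg.get("pkNames") or []
    for row in msg.get("data") or []:
        # Assumption: composite primary keys are joined with "_".
        k = "_".join(str(row[name]) for name in pk_names)
        yield msg["database"], msg["table"], msg["type"], k, row


# Made-up message value in the assumed flat JSON layout.
sample = json.dumps({
    "database": "orders_db",
    "table": "orders",
    "type": "UPDATE",
    "isDdl": False,
    "pkNames": ["id"],
    "data": [{"id": "42", "status": "PAID"}],
}).encode("utf-8")

changes = list(iter_changes(sample))
for change in changes:
    print(change)
```

In a real consumer, `iter_changes` would be called on each message value returned by the Kafka client's poll loop.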

[0074] That is, compared with the data query and export scheme, obtaining real-time changing data by monitoring the MySQL Binlog log files does not affect database performance, and it solves the problems of data synchronization performance and timeliness.

[0075] Part 2: Real-time Binlog processing

[0076] The data synchronization device is characterized in that i...

Embodiment 3

[0090] Figure 4 is a schematic diagram of the architecture of the big data synchronization device based on Binlog+HBase+Hive according to an embodiment of the present invention. The device in this embodiment includes one or more processors 21 and a memory 22. Figure 4 takes one processor 21 as an example.

[0091] The processor 21 and the memory 22 can be connected by a bus or by other means; Figure 4 takes connection via a bus as an example.

[0092] The memory 22, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs and non-volatile computer-executable programs, such as the big data synchronization method based on Binlog+HBase+Hive in Embodiment 1. The processor 21 executes that method by running the non-volatile software programs and instructions stored in the memory 22.

[0093] The memory 22 may include a hi...

Abstract

The invention relates to the technical field of databases, and provides a big data synchronization method and device based on Binlog, HBase and Hive. The method comprises the following steps: a Binlog file in a relational database is monitored to obtain real-time changing data; after a data synchronization device acquires the Binlog file data, it parses the data to obtain the database name, table name, operation type, primary key and all field values; when the Binlog file data is stored in HBase, the database name is used as the namespace of the HBase table, and the table is created; and the primary key is used as the rowkey of the HBase data records, so that all changes to a database record can be located through the rowkey and the data can be updated by overwriting. With this method and device, the real-time table does not retain the full historical data, so the data volume is small and real-time data queries are fast.

Description

Technical field

[0001] The invention relates to the technical field of databases, and in particular to a big data synchronization method and device based on Binlog+HBase+Hive.

Background

[0002] As data volumes grow, a traditional stand-alone database can no longer meet storage and analysis needs at the hundreds-of-millions scale. The current practice in the industry is to use a stand-alone database as real-time hot data storage in the production environment, archive historical data, and synchronize the data to a big data warehouse for complex analysis. However, the current solutions have some disadvantages:

[0003] 1. Data query and export scheme

[0004] Full export scheme

[0005] A task scheduling tool such as Oozie periodically runs an ETL tool such as Sqoop, which connects to the database over the JDBC protocol, queries and exports all table data in the database, and then completely overwrites the corresponding table data in the big data warehouse. This so...

Application Information

IPC(8): G06F16/22, G06F16/2458, G06F16/27
CPC: G06F16/2282, G06F16/2471, G06F16/27
Inventor 吴凡
Owner 武汉物易云通网络科技有限公司