Incremental data synchronization method and device based on map reduce

An incremental data synchronization technology applied in the field of data warehouses. It addresses the low efficiency of incremental comparison and synchronization by avoiding large-scale data sorting operations, thereby improving the efficiency of data comparison and synchronization.

Active Publication Date: 2020-11-27
WUHAN DAMENG DATABASE

AI Technical Summary

Problems solved by technology

[0005] To address the above defects or improvement needs of the prior art, the present invention solves the problem that existing incremental comparison synchronization methods are inefficient because they require large-scale data sorting and single-threaded comparison.



Examples


Embodiment 1

[0031] When performing incremental synchronization of ETL data, it is necessary to compare the corresponding data in the incremental source and the incremental destination one by one. In order to match the data in the incremental source and the incremental destination, the data needs to be sorted first. In common usage scenarios, the scale of incremental data that needs to be synchronized is very large, and sorting consumes a lot of time and resources. Moreover, the comparison process after sorting is generally single-threaded, and the efficiency is low.
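The sort-then-compare baseline described in this paragraph can be sketched as follows. This is only an illustration, not the patent's code: the record shape `(key, payload)`, the function name `sorted_merge_diff`, and the operation labels `insert`, `update`, and `delete` are assumptions introduced here.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # hypothetical record: (unique key, payload)


def sorted_merge_diff(source: List[Row], dest: List[Row]) -> List[Tuple[str, int]]:
    """Single-threaded baseline: sort both sides by key, then compare in lockstep.

    Assumes keys are unique within each side.
    """
    # Sorting both inputs is the expensive step when the data is very large.
    src = sorted(source, key=lambda row: row[0])
    dst = sorted(dest, key=lambda row: row[0])

    ops: List[Tuple[str, int]] = []
    i = j = 0
    while i < len(src) and j < len(dst):
        (src_key, src_val), (dst_key, dst_val) = src[i], dst[j]
        if src_key == dst_key:
            if src_val != dst_val:
                ops.append(("update", src_key))  # same key, different payload
            i += 1
            j += 1
        elif src_key < dst_key:
            ops.append(("insert", src_key))      # present only in the source
            i += 1
        else:
            ops.append(("delete", dst_key))      # present only in the destination
            j += 1

    # Whatever remains on either side has no counterpart.
    ops.extend(("insert", key) for key, _ in src[i:])
    ops.extend(("delete", key) for key, _ in dst[j:])
    return ops
```

Both the sort and the lockstep walk run on a single thread here, which is the bottleneck the paragraph points to.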

[0032] Hadoop map reduce is a computing model, framework, and platform for parallel processing of big data. It can be used for parallel computation over large-scale data sets (larger than 1 TB), and it allows ordinary commodity servers to form a distributed, parallel computing cluster of dozens, hundreds, or even thousands of nodes, providing a large but well-designed parallel computing software ...

Embodiment 2

[0068] In the specific usage scenario of Embodiment 1, to accommodate the comparison and synchronization of large amounts of data in big data scenarios, improve the platform versatility of the method, make the solution easier to implement, and facilitate use in distributed systems, this embodiment of the present invention selects map reduce as the specific implementation of concurrent execution.

[0069] In the specific usage scenario of this embodiment, following the technical details of the Hadoop map reduce framework shown in Figure 5, steps 101 to 105 in Embodiment 1, together with the corresponding specific implementation steps and optimization methods, are executed concurrently by a map reduce job. The map stage includes steps 101-1 and step 103, and the reduce stage includes steps 104-106.

[0070] The following method can be used for specific execution. As shown in Figure 5, according to the attribute information in the split and the process...

Embodiment 3

[0108] On the basis of the map reduce-based incremental data synchronization method provided in Embodiments 1 and 2 above, the present invention also provides a map reduce-based incremental data synchronization device that can be used to implement the method. Figure 6 is a schematic diagram of the device architecture of this embodiment of the present invention. The device in this embodiment includes one or more processors 21 and a memory 22; Figure 6 takes one processor 21 as an example.

[0109] The processor 21 and the memory 22 can be connected by a bus or by other means; Figure 6 takes connection via a bus as an example.

[0110] As a non-volatile computer-readable storage medium for the map reduce-based incremental data synchronization method, the memory 22 can be used to store non-volatile software programs, non-volatile computer-executable progra...


Abstract

The invention relates to the field of data warehouses, in particular to an incremental data synchronization method and device based on map reduce. The method mainly comprises the following steps: in the Mapper stage, reading the incremental source data and the incremental destination data respectively according to preset splits; cleaning and converting each item of incremental source data and/or incremental destination data; mapping the incremental source data and the incremental destination data into corresponding key/value structures and writing those structures into the map reduce context; in the Reducer stage, receiving from the context the key/values structures produced by reducing each pair of key/value structures; determining the required synchronization operation according to the type of each key/values structure; and synchronizing the incremental data according to that operation. By using the Hadoop map reduce framework to compare and synchronize large-scale data in parallel, the method improves the efficiency of incremental data comparison and synchronization.
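The Mapper and Reducer stages summarized in the abstract can be illustrated with a small in-process simulation. This is a sketch under stated assumptions, not the patented implementation and not Hadoop API code: the side tags `SOURCE` and `DEST`, the helper names `mapper`, `shuffle`, `reducer`, and `plan_sync`, the whitespace-stripping "cleaning" step, and the rule that maps each grouped key to `insert`, `delete`, `update`, or `none` are all hypothetical, since the excerpt does not spell out how the operation is derived from the key/values structure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

SOURCE, DEST = "S", "D"   # hypothetical tags marking which side a record came from
Row = Tuple[int, str]     # hypothetical record: (unique key, payload)


def mapper(side: str, split: Iterable[Row]) -> Iterator[Tuple[int, Tuple[str, str]]]:
    """Map stage: clean each record and emit key -> (side, payload)."""
    for key, payload in split:
        yield key, (side, payload.strip())  # stand-in for cleaning/conversion


def shuffle(pairs: Iterable[Tuple[int, Tuple[str, str]]]) -> Dict[int, List[Tuple[str, str]]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(values: List[Tuple[str, str]]) -> str:
    """Reduce stage: decide the operation from which sides are present (assumed rule)."""
    sides = dict(values)
    if DEST not in sides:
        return "insert"   # key exists only in the incremental source
    if SOURCE not in sides:
        return "delete"   # key exists only in the incremental destination
    return "update" if sides[SOURCE] != sides[DEST] else "none"


def plan_sync(source_splits: List[List[Row]], dest_splits: List[List[Row]]) -> Dict[int, str]:
    """Run map, shuffle, and reduce sequentially; a real job runs splits in parallel."""
    pairs: List[Tuple[int, Tuple[str, str]]] = []
    for split in source_splits:
        pairs.extend(mapper(SOURCE, split))
    for split in dest_splits:
        pairs.extend(mapper(DEST, split))
    return {key: reducer(values) for key, values in shuffle(pairs).items()}
```

Grouping by key takes the place of a global sort: each key's source and destination records meet in the same reduce call, so keys can be compared independently and, in a real cluster, in parallel.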

Description

Technical field

[0001] The invention relates to the field of data warehouses, in particular to a method and device for synchronizing incremental data based on map reduce.

Background

[0002] During data migration and synchronization, incremental data synchronization is a common data synchronization method. In the process of incremental data synchronization, effectively capturing incremental data from the data source and synchronizing the incremental destination based on that data is an essential solution in data processing production environments.

[0003] In current efficient incremental comparison synchronization methods, the incremental source and the incremental destination must be arranged in the same order according to the same unique feature identifier; the correspondence between the incremental source and the incremental destination is then obtained in the sorted order, and the difference data is derived from it. However, in th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/23, G06F16/25, G06F16/27
CPC: G06F16/2365, G06F16/215, G06F16/254, G06F16/27, Y02D10/00
Inventor: 高东升, 付铨, 梅纲
Owner: WUHAN DAMENG DATABASE