Cross-cluster data migration method and system

A cross-cluster data migration technology applied in the database field. It addresses low migration efficiency, migration time that is difficult to control, and the inability to guarantee data integrity, so as to improve migration efficiency and reduce the volume of data transferred.

Active Publication Date: 2014-12-24
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1

AI Technical Summary

Problems solved by technology

[0006] The data migration technology described above has the following disadvantages: the integrity of the migrated data cannot be guaranteed, and the migration time depends strictly on the size of the migrated data, which makes it difficult to control. It is therefore hard to ensure that migration completes within a short migration window; that is, migration efficiency is low.



Examples


Embodiment 1

[0036] Figure 1 is a flow chart of the cross-cluster data migration method provided in Embodiment 1 of the present invention. The method is applicable to a cross-cluster data migration system that includes a source cluster and a target cluster. The source cluster includes a master control node and at least one child node, and the target cluster likewise includes a master control node and at least one child node. The master control node and child nodes of the source cluster form an HDFS in which the data tables to be migrated are stored; the master control node and child nodes of the target cluster can also form an HDFS, which is used to store the data tables migrated from the source cluster.
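
The topology described in this paragraph can be pictured with a minimal Python sketch. The class names `Cluster` and `MigrationSystem` and the host names are hypothetical illustrations, not terms from the patent; the only rule taken from the text is that each cluster has one master control node and at least one child node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """One cluster: a master control node plus at least one child node.

    Hypothetical illustration type; node names are plain host strings.
    """
    master: str
    children: List[str]

    def __post_init__(self) -> None:
        # The text requires "at least one child node" per cluster.
        if not self.children:
            raise ValueError("a cluster needs at least one child node")


@dataclass
class MigrationSystem:
    """The cross-cluster migration system: a source and a target cluster."""
    source: Cluster
    target: Cluster


# Made-up host names, for illustration only.
system = MigrationSystem(
    source=Cluster(master="src-master", children=["src-node-1", "src-node-2"]),
    target=Cluster(master="dst-master", children=["dst-node-1"]),
)
print(len(system.source.children), len(system.target.children))
```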

[0037] The method includes:

[0038] Step 110, the master control node of the source cluster invokes a stop command to control each child node of the source cluster to stop data opera...

Embodiment 2

[0065] In this embodiment, building on the foregoing embodiments, before the master control node of the source cluster counts the size of the first storage space of the HDFS occupied by the data tables in the distributed database of the source cluster and the first total number of file blocks, the method further includes:

[0066] The master control node of the source cluster uses the complete file merging component of the distributed database of the source cluster to clear, from the disk storage space of that database, the data tables that meet the preset clearing strategy.

[0067] In this step, after the data tables to be migrated are compressed, the invalid data tables in the disk storage space of the distributed database of the source cluster are cleared, which further reduces the amount of data to be migrated and improves migration efficiency.

[0068] Wherein, the preset clearing strategy can be implemented in multiple ways, for example including at ...

Embodiment 3

[0082] Figure 2 is a flow chart of the cross-cluster data migration method provided by Embodiment 3 of the present invention. Building on the foregoing embodiments, this embodiment provides a preferred solution for the master control node of the target cluster to start the target cluster based on the start policy. The preferred method includes:

[0083] Step 210, the master control node of the target cluster invokes a startup command to start the target cluster;

[0084] Step 220, if there is no error log information or warning log information in the log file associated with the distributed database of the target cluster, the master control node of the target cluster calls the overall health check component in the distributed database of the target cluster to check the overall health of the target cluster.

[0085] This step specifically checks the log files associated with the distributed database on the nodes contained in the target cluster. If there is error log ...
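
Steps 210 and 220 amount to a gate: start the cluster, scan the logs, and run the overall health check only if the logs are clean. A minimal Python sketch of that control flow follows. All function names here are hypothetical, the log format (a level token such as `ERROR` or `WARN` in each line) is an assumption, and returning `False` when the logs are not clean is a placeholder, since the handling of that case is cut off in the text above.

```python
import re
from typing import Callable, Iterable

# Assumption: each log line carries a level token such as ERROR, WARN or WARNING.
_PROBLEM_LEVEL = re.compile(r"\b(?:ERROR|WARN(?:ING)?)\b")


def logs_are_clean(log_lines: Iterable[str]) -> bool:
    """Return True when no line contains an error or warning level token."""
    return not any(_PROBLEM_LEVEL.search(line) for line in log_lines)


def start_target_cluster(
    start_cluster: Callable[[], None],
    read_logs: Callable[[], Iterable[str]],
    health_check: Callable[[], bool],
) -> bool:
    """Hypothetical start policy: start, gate on clean logs, then health check."""
    start_cluster()                      # Step 210: invoke the startup command
    if not logs_are_clean(read_logs()):  # Step 220 precondition: no errors or warnings
        return False                     # placeholder; real handling is not shown in the text
    return bool(health_check())          # Step 220: overall health check


# Stub callables stand in for the real cluster operations.
healthy = start_target_cluster(
    start_cluster=lambda: None,
    read_logs=lambda: ["INFO master started", "INFO region server online"],
    health_check=lambda: True,
)
print(healthy)
```

Passing the three operations in as callables keeps the gate logic testable without a running cluster; a real deployment would bind them to whatever startup command and health check component the distributed database provides.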


Abstract

An embodiment of the invention provides a cross-cluster data migration method and system. Data in the distributed database of the source cluster is persisted before migration by having all child nodes of the source cluster stop data operations and by persisting the in-memory data of that database. The data tables in the distributed database of the source cluster are compressed, which reduces the amount of data transferred, and the compressed data tables are migrated to the target cluster, which improves migration efficiency. The storage space occupied by the data tables in the source cluster before migration, together with their total number of file blocks, is then matched against the occupied space and total number of file blocks of the data tables in the target cluster after migration, so that migration integrity can be verified from the matching result.
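
The integrity check described in the abstract can be illustrated with a short Python sketch. The `TableFootprint` type and the function name are hypothetical. Treating "matched" as exact equality of both figures is an assumption, since the exact matching criterion is not stated in this excerpt, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableFootprint:
    """HDFS footprint of the migrated data tables on one cluster (hypothetical type)."""
    storage_bytes: int  # storage space occupied by the data tables
    file_blocks: int    # total number of file blocks


def migration_is_complete(source: TableFootprint, target: TableFootprint) -> bool:
    """Assumed criterion: both occupied space and block count must be equal."""
    return (
        source.storage_bytes == target.storage_bytes
        and source.file_blocks == target.file_blocks
    )


# Made-up figures: footprint counted on the source before migration,
# and two possible footprints counted on the target afterwards.
before = TableFootprint(storage_bytes=1_073_741_824, file_blocks=8)
after_ok = TableFootprint(storage_bytes=1_073_741_824, file_blocks=8)
after_short = TableFootprint(storage_bytes=1_073_741_824, file_blocks=7)

print(migration_is_complete(before, after_ok))
print(migration_is_complete(before, after_short))
```

Because only two aggregate figures are compared, a check of this kind is cheap enough to run inside a short migration window, which matches the efficiency goal stated above.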

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of databases, and in particular to a cross-cluster data migration method and system.

Background technique

[0002] With the development of Internet applications, the number of users is growing rapidly and the volume of stored data is growing exponentially. Traditional single-database storage technology cannot meet the access requirements of massive data, and HDFS (Hadoop Distributed File System) and distributed databases emerged in response.

[0003] HBase (Hadoop Database, a distributed database) is a scalable, column-oriented distributed database that uses HDFS as its file storage system and stores data in the form of data tables. On ordinary hardware it can support large data tables with billions of rows and millions of columns, and it supports random storage and read operations on data of this scale. Because of its hig...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/24532, G06F16/252, G06F16/27
Inventor: 黄刚, 何洋
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD