Data backup method of distributed file system

A data backup technique for distributed file systems, applied in the fields of data error detection, electrical digital data processing, and special data processing applications. It addresses the long backup times, heavy bandwidth occupation, and excessive network load of existing approaches, and achieves shorter execution time and reduced data transmission.

Active Publication Date: 2014-04-30
清能艾科(深圳)能源技术有限公司

AI Technical Summary

Problems solved by technology

[0003] This copy command (distcp) assigns a single Map task to each file to be copied, so it operates at file granularity. During backup, the target file in the target file system is deleted and rewritten from the source file, even when the target file already contains some of the source file's blocks; the contents of those blocks are deleted and rewritten as well. Backing up data this way therefore takes too long and easily leads to heavy bandwidth occupation and excessive network load.

Examples


Embodiment Construction

[0027] Before the technical solutions of the present invention are described with reference to specific embodiments, the relevant concepts of the HDFS file system are briefly introduced. HDFS has a master-slave structure comprising one metadata node (NameNode, also called the name node) and several data nodes (DataNode). Users store data as files; each file is divided into several ordered file blocks, or data blocks (usually 64 MB in size), which are stored on a set of data nodes. The metadata node acts as the master server, providing metadata services and handling client access to files, while the data nodes manage the stored data. In addition, to speed up file transfer during backup, the data backup method of the present invention introduces the concept of a chunk. A chunk is a basic unit obtained by dividing a file block into a number of smaller pieces (256 by default), called ...
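The relationship between a file block and its chunks can be illustrated with a little arithmetic. The sketch below is only an illustration: it assumes that "256 by default" means 256 chunks per 64 MB block (the source text is truncated at this point), and the names `BLOCK_SIZE`, `CHUNKS_PER_BLOCK`, and `chunk_ranges` are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # typical HDFS block size cited in the text
CHUNKS_PER_BLOCK = 256         # assumption: "256 by default" = chunks per block


def chunk_ranges(block_length, chunks_per_block=CHUNKS_PER_BLOCK,
                 block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a block of block_length bytes.

    The chunk size is derived from a full block; the last block of a file
    may be shorter, so its final chunk may be partial.
    """
    chunk_size = block_size // chunks_per_block
    ranges = []
    offset = 0
    while offset < block_length:
        length = min(chunk_size, block_length - offset)
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    full = chunk_ranges(BLOCK_SIZE)
    print(len(full), full[0])
```

Under these assumptions a full block yields 256 chunks of 256 KB each, so a difference inside a block can be narrowed down to a unit far smaller than the block itself.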


Abstract

The invention provides a data backup method for a distributed file system. In the method, a synchronization control node sets up a thread pool, distributes the source files to the threads according to a copy list, and synchronizes the metadata of each source file and its corresponding target file in parallel. Each thread of the synchronization control node checks the content consistency of each file block in the source and target files, in order to analyze the differences between each assigned source file and its corresponding target file. A source data node checks the content consistency of each chunk in the source and target file blocks, in order to analyze the differences between the source file blocks and the target file blocks. A target data node then copies data from the source file blocks to the corresponding target file blocks according to the results of this difference analysis. By making effective use of the data already present in the target files of the target file system, the method reduces data transmission between data nodes in different clusters, and because file backup is carried out with the file block as the unit, it shortens the execution time of data backup.
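The two-level difference analysis described above (file blocks first, then chunks) can be sketched as follows. This is a minimal single-process illustration, not the patented implementation: the blocks are in-memory byte strings rather than data held on separate data nodes, MD5 is an assumed choice of checksum (the text does not name one), and the helper names `digest`, `split`, `differing_chunks`, and `sync_block` are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    # Assumption: MD5 stands in for whatever consistency check is really used.
    return hashlib.md5(data).hexdigest()


def split(data: bytes, size: int) -> list:
    """Cut a block into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def differing_chunks(src_block: bytes, dst_block: bytes, chunk_size: int) -> list:
    """Indices of source chunks that are missing or different in the target."""
    if digest(src_block) == digest(dst_block):
        return []  # block-level check passed: nothing needs to be transferred
    src_chunks = split(src_block, chunk_size)
    dst_chunks = split(dst_block, chunk_size)
    diff = []
    for i, chunk in enumerate(src_chunks):
        if i >= len(dst_chunks) or digest(chunk) != digest(dst_chunks[i]):
            diff.append(i)
    return diff


def sync_block(src_block: bytes, dst_block: bytes, chunk_size: int):
    """Rebuild the target block, taking only the differing chunks from the source."""
    diff = set(differing_chunks(src_block, dst_block, chunk_size))
    src_chunks = split(src_block, chunk_size)
    dst_chunks = split(dst_block, chunk_size)
    rebuilt = [src_chunks[i] if i in diff else dst_chunks[i]
               for i in range(len(src_chunks))]
    return b"".join(rebuilt), sorted(diff)


if __name__ == "__main__":
    new_block, copied = sync_block(b"aaaabbbbccccdddd", b"aaaaXXXXccccdddd", 4)
    print(copied, new_block == b"aaaabbbbccccdddd")
```

In this sketch only the chunks listed in the difference result would cross the network, while the unchanged chunks are reused from the existing target block. That reuse is the saving the abstract attributes to using the target file system's existing data.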

Description

Technical field

[0001] The present invention relates to distributed file systems, and in particular to a technique for data backup between clusters of different distributed file systems, also called file synchronization.

Background

[0002] HDFS (Hadoop Distributed File System) is an open-source distributed file system developed in Java; it is highly fault-tolerant and suited to applications with very large data sets. To avoid data loss caused by equipment failures, sudden power outages, or natural disasters (such as earthquakes or tsunamis), data in one file system (the source file system) needs to be backed up or migrated to another file system (the target file system) in a geographically distant and relatively safe cluster. HDFS provides a data backup command, distcp (distributed copy), which is used for data backup between file systems in different clus...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/14, G06F17/30
CPC: G06F16/178, G06F11/1451, G06F11/1456, G06F11/1464, G06F16/182, G06F2201/82
Inventor: 武永卫, 陈康, 郑纬民, 李贞强
Owner: 清能艾科(深圳)能源技术有限公司