Data redundancy and file operation methods based on Hadoop distributed file system (HDFS)

A distributed-file data redundancy technology, applied in the field of file system storage and management, that addresses wasted system resources and the loss of file read and write efficiency and performance, and that reduces the data failure rate and shortens the time cost.

Active Publication Date: 2012-04-18
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

HDFS uses replication as its data redundancy mechanism. By default, the system keeps three copies of the data, distributed across the system's Datanode nodes. Although this can ensure the integrity of data recovery, it...

Example Embodiment

[0049] The invention will be further described in detail below in conjunction with the drawings.

[0050] Referring to Figure 2, the specific steps of data redundancy in the present invention are as follows:

[0051] Step 1: Segment the file. The size of a file segment is defined as a number of basic blocks. By default, each file segment consists of 16 basic blocks, and each basic block is 4 MB, so the data length of each file segment is 64 MB. A file segment holding less than 64 MB of data is padded with zeros at the end. For a file segment of non-fixed length, as in Figure 3, where the file length is 74 MB, the file can only be divided into two file segments. File segment 1 is divided into 16 basic blocks of 4 MB. File segment 2 is divided into three basic blocks of 4 MB: the data in the first two basic blocks comes entirely from the file, while the third basic block holds only 2 MB of data from the original file, and the rest of it is filled with zeros.
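The arithmetic in Step 1 can be checked with a short sketch. This is only an illustration of the non-fixed-length case shown in Figure 3, not the patent's implementation: the function name plan_segments and the (data, padding) tuple layout are hypothetical, and only the 4 MB block size and the 16 blocks per segment come from the text.

```python
MB = 1024 * 1024
BASIC_BLOCK_SIZE = 4 * MB          # basic block size given in the text
BLOCKS_PER_SEGMENT = 16            # default blocks per file segment
SEGMENT_SIZE = BASIC_BLOCK_SIZE * BLOCKS_PER_SEGMENT  # 64 MB


def plan_segments(file_length):
    """Hypothetical helper: plan the segment and basic-block layout of a file.

    Returns a list of segments. Each segment is a list of
    (data_bytes, zero_padding_bytes) tuples, one per 4 MB basic block.
    Only the last basic block of the last segment can carry padding.
    """
    segments = []
    offset = 0
    while offset < file_length:
        # A segment holds at most 64 MB of file data.
        segment_data = min(SEGMENT_SIZE, file_length - offset)
        blocks = []
        remaining = segment_data
        while remaining > 0:
            data = min(BASIC_BLOCK_SIZE, remaining)
            # Pad the block with zeros up to the 4 MB block size.
            blocks.append((data, BASIC_BLOCK_SIZE - data))
            remaining -= data
        segments.append(blocks)
        offset += segment_data
    return segments


if __name__ == "__main__":
    # The 74 MB file from the example in the text.
    for index, blocks in enumerate(plan_segments(74 * MB), start=1):
        padding = sum(pad for _, pad in blocks)
        print(f"segment {index}: {len(blocks)} basic blocks, "
              f"{padding // MB} MB of zero padding")
```

For the 74 MB file, this layout yields 16 full basic blocks in the first segment and three basic blocks in the second, the last of which carries 2 MB of file data and 2 MB of zeros, matching the description above.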

[0052] Step 2: Generate basic blocks for the fi...

Abstract

The invention discloses data redundancy and file operation methods based on the Hadoop Distributed File System (HDFS). The file operation method comprises four processes: file writing, file reading, file adding, and file deletion. The file writing process comprises the following steps: the client issues a file writing request; main nodes are allocated; the file is segmented; basic blocks are generated and stored; coding blocks are generated and stored; and the result is reported to the main nodes. The file reading process comprises the following steps: the client issues a file reading request; main nodes are selected; information data is sent; the original file is restored according to the information data; and the result is reported to the main nodes. The file adding process comprises the following steps: the client issues a file adding request; file information is queried; main nodes are allocated; the added file is segmented; basic blocks are generated and stored; coding blocks are generated and stored; and the added file is reported. The file deletion process comprises the following steps: the client issues a file deletion request; the file name of the file to be deleted is processed; a hidden file is deleted; isolated block metadata is deleted; and any block is deleted. The invention offers high read and write performance and efficiency, reliable storage, and savings in storage resources, and can be applied to the storage and management of HDFS under access by a large number of clients.

Description

Technical field

[0001] The invention belongs to the technical field of digital information storage, and in particular relates to a data redundancy and file operation method based on the Hadoop Distributed File System (HDFS), which can be used for the storage and management of the file system.

Background

[0002] With the development of information technology, stored data is growing explosively, and local storage can hardly meet the growing demand for mass storage. In addition, personal mobile computing and enterprise-level large-scale computing place higher requirements on the underlying storage system. More and more people use distributed file systems because they offer higher storage capacity, reliability, security, and mobility.

[0003] At present, distributed file systems include the NFS distributed file system developed by Sun, which is the first distributed file system based on the IP protocol, the Farsite system of Microsoft Research, the TotalRecall sy...

Application Information

IPC(8): G06F17/30
Inventor: 樊凯, 李晖, 吴昊, 张大洋
Owner XIDIAN UNIV