Data redundancy and file operation methods based on Hadoop distributed file system (HDFS)

A distributed file and data redundancy technology, applied in the field of file system storage and management, that addresses the loss of file read and write efficiency and performance and the waste of system resources, with the effect of reducing the failure rate and shortening the time spent.

Active Publication Date: 2013-11-20
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

HDFS uses replication as its data redundancy mechanism. By default, the system keeps three copies of the data, distributed across the system's DataNodes. Although this ensures that data can be recovered intact, it wastes a large amount of system resources.
HDFS file operations include file writing, file reading, file appending, and file deletion. Because of the three-copy redundancy scheme, file reading and file writing lose efficiency and performance.
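
To make the resource cost concrete, the following minimal sketch compares the raw storage needed for one 64 MB file segment under three-copy replication with a block-coded layout. The 16 basic blocks of 4 MB come from the embodiment; the number of coding blocks per segment (`coding_blocks=4`) is a purely illustrative assumption, since this summary does not state it.

```python
def replication_footprint_mb(data_mb: float, replicas: int = 3) -> float:
    """Raw storage used when every byte is stored `replicas` times."""
    return data_mb * replicas


def coded_footprint_mb(basic_blocks: int = 16,
                       coding_blocks: int = 4,  # assumption, not from the source
                       block_mb: float = 4) -> float:
    """Raw storage used by one full segment stored as basic plus coding blocks."""
    return (basic_blocks + coding_blocks) * block_mb


segment_mb = 16 * 4  # one default file segment: 16 basic blocks of 4 MB
print(replication_footprint_mb(segment_mb))  # 192
print(coded_footprint_mb())                  # 80
```

Under these assumed parameters the coded layout stores 1.25 times the original data instead of 3 times; the real ratio depends on how many coding blocks the method generates per segment.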



Examples


Embodiment Construction

[0049] The invention is described in further detail below with reference to the accompanying drawings.

[0050] Referring to Figure 2, the specific steps of data redundancy in the present invention are as follows:

[0051] Step 1: Segment the file. The size of a file segment is expressed as a number of basic blocks. By default, each file segment consists of 16 basic blocks of 4 MB each, so the data length of each file segment is 64 MB; if the data at the end of a file segment is shorter than 64 MB, it is filled with 0. For the case of non-fixed-length file segments, as in Figure 3, the file length is 74 MB and the file can only be divided into two file segments. File segment 1 is divided into 16 basic blocks of 4 MB, and file segment 2 is divided into three basic blocks of 4 MB. The data in the first two of these basic blocks comes entirely from the file; in the last basic block only 2 MB of data comes from the original file, and the following data is filled with...
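
The segmentation in Step 1 can be sketched as follows. This is a minimal illustration of the non-fixed-length case only, working in whole megabytes; the function name `segment_layout` and the `(data_mb, padding_mb)` tuple representation are hypothetical and not taken from the patent.

```python
import math

BLOCK_MB = 4             # size of one basic block (from the text)
BLOCKS_PER_SEGMENT = 16  # default basic blocks per file segment (from the text)


def segment_layout(file_mb: int) -> list[list[tuple[int, int]]]:
    """Split a file of `file_mb` megabytes into segments of basic blocks.

    Each segment is a list of (data_mb, padding_mb) pairs, one per basic
    block; padding_mb is the zero fill needed to reach BLOCK_MB.
    """
    segment_mb = BLOCK_MB * BLOCKS_PER_SEGMENT
    segments = []
    remaining = file_mb
    while remaining > 0:
        seg_data = min(remaining, segment_mb)
        n_blocks = math.ceil(seg_data / BLOCK_MB)
        blocks = []
        for i in range(n_blocks):
            data = min(BLOCK_MB, seg_data - i * BLOCK_MB)
            blocks.append((data, BLOCK_MB - data))  # zero-fill the tail
        segments.append(blocks)
        remaining -= seg_data
    return segments


layout = segment_layout(74)
print(len(layout))   # 2
print(layout[1])     # [(4, 0), (4, 0), (2, 2)]
```

For the 74 MB example, the first segment holds 16 full basic blocks and the second holds three, the last of which carries 2 MB of file data and 2 MB of zero fill, matching the description above.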


Abstract

The invention discloses data redundancy and file operation methods based on the Hadoop Distributed File System (HDFS). The file operation method comprises file writing, file reading, file appending, and file deletion. The file writing process comprises the following steps: the client issues a file write request; main nodes are allocated; the file is segmented; basic blocks are generated and stored; coding blocks are generated and stored; and the result is reported to the main nodes. The file reading process comprises the following steps: the client issues a file read request; main nodes are selected; information data is sent; the original file is restored from the information data; and the result is reported to the main nodes. The file appending process comprises the following steps: the client issues a file append request; file information is queried; main nodes are allocated; the appended file is segmented; basic blocks are generated and stored; coding blocks are generated and stored; and the appended file is reported. The file deletion process comprises the following steps: the client issues a file deletion request; the name of the file to be deleted is processed; the hidden file is deleted; isolated block metadata is deleted; and any remaining blocks are deleted. The invention offers high read and write performance and efficiency together with reliable storage, saves storage resources, and can be applied to the storage and management of HDFS under access by a large number of clients.
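
The abstract does not say which code is used to generate the coding blocks. As a stand-in only, the sketch below uses a single XOR parity block to show the general idea of deriving a coding block from equal-length basic blocks and restoring one lost basic block from the survivors. The names `xor_blocks`, `encode`, and `recover` are hypothetical, and XOR parity is an assumption rather than the patented scheme.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(basic_blocks: list[bytes]) -> bytes:
    """Generate one coding (parity) block from the basic blocks."""
    return xor_blocks(basic_blocks)


def recover(basic_blocks: list[bytes], parity: bytes, lost_index: int) -> bytes:
    """Rebuild the basic block at `lost_index` from the other blocks and parity."""
    survivors = [b for i, b in enumerate(basic_blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])


blocks = [b"abcd", b"efgh", b"ijkl"]  # toy 4-byte "basic blocks"
parity = encode(blocks)
print(recover(blocks, parity, 1))     # b'efgh'
```

A single parity block tolerates the loss of only one block per group; a scheme meant to replace three-copy replication would normally use a code that tolerates more failures.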

Description

Technical field

[0001] The invention belongs to the technical field of digital information storage, and in particular relates to a data redundancy and file operation method based on the Hadoop Distributed File System (HDFS), which can be used for the storage and management of the file system.

Background

[0002] With the development of information technology, stored data is growing explosively, and local storage can hardly meet the growing demand for mass storage. In addition, personal mobile computing and enterprise-level large-scale computing place higher requirements on the underlying storage system. More and more people use distributed file systems because they offer higher storage capacity, reliability, security, and mobility.

[0003] At present, distributed file systems include the NFS distributed file system developed by Sun, which is the first distributed file system based on the IP protocol, the Farsite system of Microsoft Research, the TotalRecall sy...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventor: 樊凯, 李晖, 吴昊, 张大洋
Owner: XIDIAN UNIV