Data redundancy and file operation methods based on Hadoop distributed file system (HDFS)

A distributed file data redundancy technology, applied in the field of file system storage and management. It addresses the waste of system resources and the loss of efficiency and performance in file reading and file writing, with the stated effects of a reduced data failure rate, lower cost, and shorter time.

Active Publication Date: 2012-04-18

XIDIAN UNIV

2 Cites, 29 Cited by

Problems solved by technology

HDFS achieves data redundancy through replication. By default, the system stores three copies of the data, distributed across its DataNodes. Although this ensures that data can be recovered intact, it wastes a large amount of system resources.

HDFS file operations include file writing, file reading, file appending, and file deletion. Because of the three-copy redundancy scheme, file reading and file writing lose efficiency and performance.
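The storage cost of the three-copy scheme can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the figure of 4 redundant blocks per 16 data blocks is an assumed value chosen for comparison, not a number taken from the patent.

```python
def replication_footprint(file_mb, copies=3):
    """Raw storage consumed when every byte is stored `copies` times."""
    return file_mb * copies


def coded_footprint(file_mb, data_blocks, coding_blocks):
    """Raw storage when each group of `data_blocks` blocks gains
    `coding_blocks` redundant blocks (both counts are assumptions here)."""
    return file_mb * (data_blocks + coding_blocks) / data_blocks


# A 64MB file under default three-copy replication.
print(replication_footprint(64))    # 192

# The same file with a hypothetical 4 redundant blocks per 16 data blocks.
print(coded_footprint(64, 16, 4))   # 80.0
```

Under these assumed parameters the coded layout uses less than half the raw storage of three full copies, which is the kind of saving the invention targets.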

Embodiment Construction

[0049] The invention will be described in further detail below in conjunction with the accompanying drawings.

[0050] Referring to Figure 2, the specific steps of data redundancy in the present invention are as follows:

[0051] Step 1: Segment the file. The size of a file segment is determined by its number of basic blocks. By default, each file segment consists of 16 basic blocks of 4MB each, so each file segment holds 64MB of data; if the last file segment holds less than 64MB of data, its end is filled with 0. For the case of non-fixed-length file segments, as in Figure 3, the file length is 74MB, and it can only be divided into two file segments. File segment 1 is divided into 16 basic blocks of 4MB, and file segment 2 is divided into three basic blocks of 4MB. The data in the first two of these basic blocks comes entirely from the file, while only 2MB of the data in the last basic block comes from the original file, and the following data is filled with...
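The non-fixed-length case in Step 1 can be checked with a short calculation. This is a minimal sketch under stated assumptions: `plan_segments` and its return shape are hypothetical names rather than anything defined in the patent, and it models only block counts and padding length, not the fill value or any storage call.

```python
MB = 1024 * 1024
BLOCK_SIZE = 4 * MB                 # basic block size given in the text
BLOCKS_PER_SEGMENT = 16             # default basic blocks per file segment
SEGMENT_SIZE = BLOCK_SIZE * BLOCKS_PER_SEGMENT  # 64MB


def plan_segments(file_len):
    """Return one (basic_block_count, padding_bytes) pair per file segment.

    Models the non-fixed-length variant: the last segment keeps only as
    many basic blocks as it needs, and only its final block is padded.
    An empty file yields an empty plan (an assumption, not from the text).
    """
    segments = []
    remaining = file_len
    while remaining > 0:
        data = min(remaining, SEGMENT_SIZE)
        blocks = -(-data // BLOCK_SIZE)          # integer ceiling division
        padding = blocks * BLOCK_SIZE - data     # bytes needed to fill the last block
        segments.append((blocks, padding))
        remaining -= data
    return segments


# The 74MB example: 16 full blocks, then 3 blocks with 2MB of padding.
print(plan_segments(74 * MB))   # [(16, 0), (3, 2097152)]
```

The second segment holds 10MB of file data in three 4MB basic blocks, so the last block carries 2MB of file data and 2MB of padding, matching the description.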

Abstract

The invention discloses data redundancy and file operation methods based on the Hadoop Distributed File System (HDFS). The file operation method comprises four processes: file writing, file reading, file appending, and file deletion. The file writing process comprises the following steps: issuing a file write request from the client; allocating main nodes; segmenting the file; generating and storing basic blocks; generating and storing coding blocks; and reporting the main nodes. The file reading process comprises the following steps: issuing a file read request from the client; selecting the main nodes; sending information data; restoring the original file according to the information data; and reporting the main nodes. The file appending process comprises the following steps: issuing a file append request from the client; querying file information; allocating the main nodes; segmenting the appended file; generating and storing the basic blocks; generating and storing the coding blocks; and reporting the appended file. The file deletion process comprises the following steps: issuing a file deletion request from the client; processing the name of the file to be deleted; deleting a hidden file; deleting isolated block metadata; and deleting any block. The invention offers high read and write performance and efficiency, reliable storage, and savings in storage resources, and can be applied to the storage and management of HDFS under access by a large number of clients.
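The relationship between basic blocks and coding blocks can be illustrated with a toy example. The excerpt does not say which code the invention uses, so the sketch below substitutes a single XOR parity block purely as a stand-in; the function names are hypothetical, and a real scheme would likely tolerate more than one lost block.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_coding_block(basic_blocks):
    """Stand-in coding block: XOR parity over all basic blocks (an assumption)."""
    return xor_blocks(basic_blocks)


def recover_missing(surviving_blocks, coding_block):
    """Rebuild the single missing basic block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [coding_block])


# Three tiny "basic blocks" standing in for 4MB blocks.
basic = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = make_coding_block(basic)

# Lose the middle block, then restore it from the other two plus the parity.
restored = recover_missing([basic[0], basic[2]], parity)
assert restored == basic[1]
```

The point of the example is only that redundancy stored as coding blocks lets a reader restore the original data without keeping full extra copies.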

Description

Technical field

[0001] The invention belongs to the technical field of digital information storage, and in particular relates to a data redundancy and file operation method based on the Hadoop Distributed File System (HDFS), which can be used for the storage and management of the file system.

Background

[0002] With the development of information technology, stored data is growing explosively, and local storage struggles to meet the growing demand for mass storage. In addition, personal mobile computing and enterprise-level large-scale computing place higher requirements on the underlying storage system. More and more people use distributed file systems because they offer higher storage capacity, reliability, security, and mobility.

[0003] At present, distributed file systems include the NFS distributed file system developed by Sun, which is the first distributed file system based on the IP protocol, the Farsite system of Microsoft Research, the TotalRecall sy...

Application Information

IPC(8): G06F17/30

Inventors: 樊凯, 李晖, 吴昊, 张大洋

Owner: XIDIAN UNIV
