Mass non-independent small file associated storage method based on Hadoop

An associative storage technique for small files, classified under special data processing applications and electrical digital data processing. It addresses the low storage and read efficiency of large numbers of non-independent small files, with the aim of reducing load, improving storage efficiency, and reducing interactions.

Inactive Publication Date: 2012-01-25

XI AN JIAOTONG UNIV

Cites: 6. Cited by: 65.


Problems solved by technology

[0012] The purpose of the present invention is to solve the low efficiency of the existing Hadoop distributed file system when storing and reading large numbers of non-independent small files, by providing a storage optimization method on the Hadoop distributed file system.



Embodiment Construction

[0051] The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0052] In the Hadoop-based associated storage method for massive non-independent small files, some large files are first divided into many small files for storage and reading. Because these small files are parts of a large file, they are called non-independent small files. All non-independent small files that belong to the same large file are merged into one file, called a merged file. A local index is then established for each merged file, and on upload the local index file is stored together with the file entity on a DataNode of the Hadoop file system. Finally, when non-independent small files are read, metadata caching, local index file prefetching, and associated file prefetching are used to improve read efficiency.
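The merge and local-index steps can be illustrated with a small in-memory sketch. This is not the patent's implementation and does not use any Hadoop API; the names `IndexEntry`, `merge_small_files`, and `read_small_file` are hypothetical, and the assumption that each index entry holds a starting offset plus a length is made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """One hypothetical local-index record for a non-independent small file."""
    name: str    # name of the small file
    offset: int  # starting position inside the merged file
    length: int  # size of the small file in bytes (assumed field)


def merge_small_files(parts):
    """Concatenate (name, bytes) pairs into one merged blob and build its local index."""
    merged = bytearray()
    index = {}
    for name, data in parts:
        # The entry is recorded before appending, so offset is the current end of the blob.
        index[name] = IndexEntry(name, len(merged), len(data))
        merged.extend(data)
    return bytes(merged), index


def read_small_file(merged, index, name):
    """Slice one small file out of the merged blob using its index entry."""
    entry = index[name]
    return merged[entry.offset:entry.offset + entry.length]


# Three pieces of one large file, merged into a single blob with one index.
parts = [("part-0", b"alpha"), ("part-1", b"be"), ("part-2", b"gamma!")]
merged, index = merge_small_files(parts)
second = read_small_file(merged, index, "part-1")
```

In this sketch a read needs only the index entry and one range read of the merged blob, which is the property the local index is meant to provide.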

[0053] The DataNode-side local index management technique creates a local index file for each merged file, recording the starting posit...


Abstract

The invention discloses a Hadoop-based associated storage method for massive non-independent small files. The method mainly addresses the low efficiency of storing and reading massive non-independent small files, and targets non-independent small files, that is, the many small files obtained by cutting a large file. The method comprises the following steps: (1) merging all the small files of a large file into one file, called a merged file; (2) establishing a local index for each merged file, and storing the local index file together with the file entity on a DataNode of the Hadoop system when uploading; and (3) when non-independent small files are read, improving read efficiency by means of metadata caching, local index file prefetching, and associated file prefetching. The method improves the efficiency with which the existing Hadoop system stores and reads small files, and is suitable for the storage and management of massive non-independent small files in general scenarios.
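Step (3) can be sketched as a toy read path. The classes `ToyNameNode`, `ToyDataNode`, and `Client` below are hypothetical stand-ins, not Hadoop classes, and the counters exist only to show how a metadata cache and a prefetched local index reduce repeated interactions. Associated file prefetching is left out of this sketch.

```python
class ToyNameNode:
    """Stand-in for the NameNode; counts metadata lookups."""

    def __init__(self, locations):
        self.locations = locations  # merged file name -> DataNode id
        self.lookups = 0

    def locate(self, merged_name):
        self.lookups += 1
        return self.locations[merged_name]


class ToyDataNode:
    """Stand-in for a DataNode holding merged files and their local indexes."""

    def __init__(self, merged_files):
        # merged file name -> (blob, {small file name: (offset, length)})
        self.merged_files = merged_files
        self.requests = 0

    def fetch_index(self, merged_name):
        self.requests += 1
        return dict(self.merged_files[merged_name][1])

    def read_range(self, merged_name, offset, length):
        self.requests += 1
        return self.merged_files[merged_name][0][offset:offset + length]


class Client:
    """Caches merged-file metadata and the whole local index after the first read."""

    def __init__(self, namenode, datanodes):
        self.namenode = namenode
        self.datanodes = datanodes
        self.metadata_cache = {}  # merged file name -> DataNode id
        self.index_cache = {}     # merged file name -> local index

    def read(self, merged_name, small_name):
        if merged_name not in self.metadata_cache:
            self.metadata_cache[merged_name] = self.namenode.locate(merged_name)
        node = self.datanodes[self.metadata_cache[merged_name]]
        if merged_name not in self.index_cache:
            # Fetch the full local index once so that later reads skip this step.
            self.index_cache[merged_name] = node.fetch_index(merged_name)
        offset, length = self.index_cache[merged_name][small_name]
        return node.read_range(merged_name, offset, length)


local_index = {"part-0": (0, 5), "part-1": (5, 2), "part-2": (7, 6)}
datanode = ToyDataNode({"big.merged": (b"alphabegamma!", local_index)})
namenode = ToyNameNode({"big.merged": "dn1"})
client = Client(namenode, {"dn1": datanode})
results = [client.read("big.merged", name) for name in ("part-0", "part-1", "part-2")]
```

Under these assumptions, three reads of small files from the same merged file cost one NameNode lookup and one index fetch, plus one range read per small file.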

Description

Technical field

[0001] The invention relates to a method for optimizing the storage and reading of massive non-independent small files on Hadoop (distributed file system), addressing the problem of low file storage and reading efficiency. Hadoop is the current mainstream cloud storage platform. It consists of a NameNode and multiple DataNodes: the NameNode manages the file system namespace and controls access by external clients, and the DataNodes store the data.

Background technique

[0002] With the development of the Internet, the amount of data that needs to be stored keeps increasing, and file sizes vary widely, from small files of several kilobytes to large files of hundreds of megabytes. The Hadoop distributed file system is suitable for storing large files, but its storage performance and read performance degrade severely when storing small files. Therefore, how to effectively store and manage a large number of small fil...


Application Information


IPC(8): G06F17/30

Inventor: 郑庆华, 董博, 刘均, 马瑞, 宋凯磊

Owner: XI AN JIAOTONG UNIV
