Mass non-independent small file associated storage method based on Hadoop

A technology in the associative storage of small files, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the storage of large numbers of non-independent small files and their low read efficiency, with the effects of reducing load, improving storage efficiency, and reducing interaction.

Status: Inactive; publication date: 2012-01-25
XI AN JIAOTONG UNIV
Cites: 6; cited by: 65

AI Technical Summary

Problems solved by technology

[0012] The purpose of the present invention is to solve the problem that the existing Hadoop distributed file system stores and reads lo...




Detailed description of embodiments

[0051] The present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0052] In the Hadoop-based associative storage method for massive numbers of non-independent small files, some large files are first divided into many small files for storage and reading. These small files are parts of a large file and are therefore called non-independent small files. All the non-independent small files belonging to a given large file are merged into one file, called a merged file. A local index is then built for each merged file, and at upload time the local index file is stored together with the file entity on a DataNode of the Hadoop file system. Finally, when non-independent small files are read, metadata caching, local index file prefetching, and associated file prefetching are used to improve read efficiency.
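The merge-and-index step can be illustrated with a minimal in-memory sketch. It assumes that a local index record holds each small file's name, starting offset, and length (paragraph [0053] mentions the starting position; the rest of the record is an assumption). The names `IndexEntry`, `merge_small_files`, and `read_small_file` are hypothetical, no Hadoop API is used, and uploading the merged file and its index to a DataNode is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical local-index record: where one small file sits in the merged file."""
    name: str
    offset: int  # starting byte position inside the merged file
    length: int  # size of the small file in bytes


def merge_small_files(parts):
    """Concatenate the small files cut from one large file into a single merged file.

    `parts` is a list of (name, data) pairs in their original order, so pieces
    that are adjacent in the large file stay adjacent in the merged file.
    Returns the merged bytes and the local index for that merged file.
    """
    merged = bytearray()
    index = []
    for name, data in parts:
        index.append(IndexEntry(name, len(merged), len(data)))
        merged.extend(data)
    return bytes(merged), index


def read_small_file(merged, index, name):
    """Slice one small file back out of the merged file using the local index."""
    for entry in index:
        if entry.name == name:
            return merged[entry.offset : entry.offset + entry.length]
    raise KeyError(name)


if __name__ == "__main__":
    parts = [("movie.part0", b"AAAA"), ("movie.part1", b"BB"), ("movie.part2", b"CCC")]
    merged, index = merge_small_files(parts)
    print(read_small_file(merged, index, "movie.part1"))  # b'BB'
```

Because all pieces of one large file end up in one merged file, the file system sees a single file per large file rather than one per piece; locating an individual piece becomes a matter of the local index.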

[0053] The DataNode-side local index management technique creates a local index file for each merged file and records the starting posit...
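Paragraph [0053] is cut off here, so the actual layout of the local index file is not known. Purely for illustration, the sketch below assumes a compact binary layout per record (a 2-byte name length, the UTF-8 name, an 8-byte offset, and a 4-byte length, all big-endian); `encode_local_index` and `decode_local_index` are hypothetical names. It shows how such an index file, stored next to its merged file, could be turned back into a name-to-byte-range lookup table.

```python
import struct

# Hypothetical on-disk layout per record (big-endian, no padding):
#   uint16 name_len | name (UTF-8) | uint64 offset | uint32 length
_HEADER = struct.Struct(">H")
_BODY = struct.Struct(">QI")


def encode_local_index(entries: list[tuple[str, int, int]]) -> bytes:
    """Serialize (name, offset, length) records into the body of a local index file."""
    out = bytearray()
    for name, offset, length in entries:
        raw = name.encode("utf-8")
        out += _HEADER.pack(len(raw))
        out += raw
        out += _BODY.pack(offset, length)
    return bytes(out)


def decode_local_index(blob: bytes) -> dict[str, tuple[int, int]]:
    """Parse a local index file into a name -> (offset, length) lookup table."""
    table = {}
    pos = 0
    while pos < len(blob):
        (name_len,) = _HEADER.unpack_from(blob, pos)
        pos += _HEADER.size
        name = blob[pos : pos + name_len].decode("utf-8")
        pos += name_len
        offset, length = _BODY.unpack_from(blob, pos)
        pos += _BODY.size
        table[name] = (offset, length)
    return table


if __name__ == "__main__":
    blob = encode_local_index([("part0", 0, 4), ("part1", 4, 2)])
    print(decode_local_index(blob))
```

Each record costs 14 bytes plus the name, so the index for a merged file stays small relative to the data it describes.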


Abstract

The invention discloses a Hadoop-based associative storage method for massive numbers of non-independent small files. The method mainly addresses the low access and read efficiency of such files and targets the small files obtained by cutting a large file, namely non-independent small files. The method comprises the following steps: (1) merging all the small files of a large file into one file, called a merged file; (2) establishing a local index for each merged file, and storing the local index file together with the file entity on a DataNode of the Hadoop system during upload; and (3) when non-independent small files are read, improving read efficiency through metadata caching, local index file prefetching, and associated file prefetching. The method improves the efficiency with which the existing Hadoop system stores and reads small files and is suitable for storing and managing massive numbers of non-independent small files in general scenarios.
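Step (3) can be illustrated with a small client-side reader. This is a sketch under stated assumptions, not the patented implementation: `SmallFileReader`, `fetch_index`, `fetch_range`, `capacity`, and `prefetch_depth` are hypothetical names. The LRU cache of local indexes and the rule "read ahead the next few small files of the same merged file" are assumptions about what metadata caching and associated file prefetching might look like. The two callables stand in for whatever DataNode reads a real client would issue.

```python
from collections import OrderedDict


class SmallFileReader:
    """Hypothetical reader combining the three read-side techniques.

    - metadata cache: LRU map from merged-file id to its local index
    - local index prefetching: the whole index is fetched on the first miss
    - associated file prefetching: after serving one small file, the next
      `prefetch_depth` small files of the same merged file are read ahead
    """

    def __init__(self, fetch_index, fetch_range, capacity=128, prefetch_depth=2):
        self._fetch_index = fetch_index  # merged_id -> [(name, offset, length), ...]
        self._fetch_range = fetch_range  # (merged_id, offset, length) -> bytes
        self._capacity = capacity        # max cached local indexes; must be >= 1
        self._depth = prefetch_depth     # number of following small files to read ahead
        self._meta = OrderedDict()       # metadata cache in LRU order
        self._ahead = {}                 # (merged_id, name) -> prefetched bytes (unbounded here)

    def _local_index(self, merged_id):
        """Return the cached local index, fetching the whole index on a miss."""
        if merged_id in self._meta:
            self._meta.move_to_end(merged_id)
        else:
            self._meta[merged_id] = self._fetch_index(merged_id)
            if len(self._meta) > self._capacity:
                self._meta.popitem(last=False)  # evict the least recently used index
        return self._meta[merged_id]

    def read(self, merged_id, name):
        entries = self._local_index(merged_id)
        names = [entry[0] for entry in entries]
        if name not in names:
            raise KeyError(name)
        pos = names.index(name)
        data = self._ahead.pop((merged_id, name), None)
        if data is None:
            _, offset, length = entries[pos]
            data = self._fetch_range(merged_id, offset, length)
        # Associated file prefetch: read ahead the next pieces of the same large file.
        for next_name, offset, length in entries[pos + 1 : pos + 1 + self._depth]:
            key = (merged_id, next_name)
            if key not in self._ahead:
                self._ahead[key] = self._fetch_range(merged_id, offset, length)
        return data
```

With `prefetch_depth=2`, reading `p0` and then `p1` from the same merged file costs one index fetch and four range reads: three on the first call (`p0` plus read-ahead of `p1` and `p2`) and one on the second (only `p3` is new, and `p1` comes from the read-ahead buffer). A real client would likely prefetch asynchronously and bound the read-ahead buffer; this sketch does both synchronously and without limits for clarity.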

Description

Technical field
[0001] The invention relates to a method for optimizing the storage and reading of massive numbers of non-independent small files on the Hadoop distributed file system. Hadoop is currently the mainstream cloud storage platform; it consists of one NameNode and multiple DataNodes, where the NameNode manages the file system namespace and controls access by external clients, and the DataNodes store the data. The invention addresses the problem of low file storage and read efficiency.
Background art
[0002] With the development of the Internet, the amount of data that needs to be stored keeps growing, and file sizes vary widely, from small files of a few kilobytes to large files of hundreds of megabytes. The Hadoop distributed file system is suited to storing large files, but its storage and read performance degrade severely when it stores small files. Therefore, how to effectively store and manage a large number of small fil...


Application Information

IPC(8): G06F17/30
Inventors: 郑庆华 (Zheng Qinghua), 董博 (Dong Bo), 刘均 (Liu Jun), 马瑞 (Ma Rui), 宋凯磊 (Song Kailei)
Owner: XI AN JIAOTONG UNIV