Hadoop-based storage optimization method for hierarchical indexing of small files

A storage optimization and hierarchical indexing technology, applied in the field of cloud storage, that addresses the low read and write efficiency of storing massive numbers of small files. It improves file access efficiency, preserves versatility, and reduces the difficulty of use.

Inactive Publication Date: 2015-12-23
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0009] The main purpose of the present invention is to solve the problem that the existing Hadoop distributed file system has low efficiency for massiv

Embodiment Construction

[0043] To make the objectives, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here only explain the present invention and do not limit it. In addition, the technical features of the various embodiments described below can be combined with one another as long as they do not conflict.

[0044] As shown in Figure 1, the present invention is a Hadoop-based storage optimization method for hierarchical indexing of small files. Outside the Hadoop distributed file system (HDFS), a new network server (WebServer) is added to handle file read and write requests, and a new small file processing server is added to process small files. The steps are as follows:

[0045] (1) The web server W...

Abstract

The present invention discloses a Hadoop-based storage optimization method for hierarchical indexing of small files. With this method, a large number of small files can be uploaded to an HDFS file system, and the indexing efficiency of the small files stored on HDFS is improved. The method mainly comprises four steps. First, a WebServer monitors file read and write requests and judges whether each file is smaller than the block size set in the HDFS file system. Second, files larger than the block size are read and written as normal HDFS files, while files smaller than the block size are categorized as small files and sent to a small file processing server. Third, the small file processing server contains a judgment module that sets small files of 1-1023 KB as K-level small files and small files of 1 MB-64 MB as M-level small files. Fourth, the small file processing server is also provided with a cache: a portion of the small files can be preset in the cache during file separation according to file correlation, and cache update strategies are designed to raise file access efficiency.
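The size-based routing in the first three steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `route_file`, the returned labels, and the 64 MiB default block size are assumptions introduced here. The abstract does not say how a file exactly equal to the block size, or smaller than 1 KB, is handled; this sketch assumes the former is still M-level (since the abstract lists 64M as M-level) and the latter is K-level.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumed HDFS block size; the real value comes from the cluster configuration.
DEFAULT_BLOCK_SIZE = 64 * MIB


def route_file(size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Return a hypothetical routing label for a file of the given size.

    "hdfs" -> handled as a normal HDFS file (larger than one block)
    "K"    -> K-level small file (below 1 MiB)
    "M"    -> M-level small file (1 MiB up to the block size)
    """
    if size_bytes < 0:
        raise ValueError("file size cannot be negative")
    # Step 2: files larger than the block size take the normal HDFS path.
    if size_bytes > block_size:
        return "hdfs"
    # Step 3: the judgment module splits small files into two levels.
    # Assumption: anything under 1 MiB, including sub-1 KB files, is K-level.
    if size_bytes < MIB:
        return "K"
    return "M"


if __name__ == "__main__":
    for size in (512 * KIB, 1023 * KIB, 1 * MIB, 64 * MIB, 64 * MIB + 1):
        print(size, route_file(size))
```

Keeping the block size as a parameter lets the same check follow whatever block size the HDFS deployment actually uses, rather than hard-coding 64 MB.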

Description

Technical field

[0001] The invention belongs to the technical field of cloud storage, and more specifically relates to a Hadoop-based storage optimization method for hierarchical indexing of small files.

Background

[0002] Hadoop is a distributed system infrastructure developed by the Apache Foundation and is a current mainstream cloud storage platform. It consists of a NameNode and multiple DataNodes: the NameNode manages the file system namespace and controls access by external clients, while the DataNodes store the actual data. Users can develop distributed programs without knowing the underlying details of the distribution, making full use of the cluster for high-speed computing and storage. Hadoop implements a distributed file system, HDFS for short. HDFS is highly fault-tolerant and designed to be deployed on inexpensive hardware. It also provides a high transfer rate to access application...

Application Information

IPC(8): G06F17/30
CPC: G06F16/2272, G06F16/27
Inventors: 戴彬, 王雄, 张焜
Owner: HUAZHONG UNIV OF SCI & TECH