Small file optimization storage method based on HDFS

A technology for optimizing the storage of small files, applied in special data processing applications, instruments, and electronic digital data processing. It addresses the low utilization of storage resources and low file access efficiency, and achieves improved overall performance, an optimized file directory structure, and improved reading efficiency.

Active Publication Date: 2014-02-12
HOHAI UNIV
2 Cites, 108 Cited by

AI Technical Summary

Problems solved by technology

[0009] Purpose of the invention: To address the low utilization of storage resources and the low file access efficiency for massive small-file data in the HDFS distributed file system, the p



Examples


Embodiment Construction

[0047] The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments serve only to illustrate the invention and are not intended to limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of the present application.

[0048] Figure 1 is the overall framework diagram of the HDFS-based small file optimization storage method provided by the present invention. Its working process is described as follows:

[0049] ① When the client needs to store small files on a DataNode, the file filtering module first filters the files to determine whether the files transmitted by the client are small files. A file smaller than 1M is judged to be a small file, and subsequent operations such as merging are performed o...
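The size check performed by the file filtering module in [0049] can be sketched as follows. This is a minimal illustration only: it assumes "1M" means 1 MiB (1,048,576 bytes), and the names `SMALL_FILE_THRESHOLD`, `is_small_file`, and `partition_files` are hypothetical and do not come from the patent.

```python
# Assumption: "1M" in the text is read as 1 MiB; the patent does not spell out the unit.
SMALL_FILE_THRESHOLD = 1024 * 1024


def is_small_file(size_bytes: int) -> bool:
    """Return True when a file is strictly smaller than the threshold."""
    return size_bytes < SMALL_FILE_THRESHOLD


def partition_files(files):
    """Split (name, size_bytes) pairs into small files and all other files.

    Small files would go on to merging; the others would be stored as usual.
    """
    small, other = [], []
    for name, size_bytes in files:
        if is_small_file(size_bytes):
            small.append(name)
        else:
            other.append(name)
    return small, other
```

A file of exactly 1 MiB is treated as not small here, because the text says "smaller than".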


Abstract

The invention discloses a small file optimization storage method based on HDFS that improves the efficiency with which HDFS reads small files and improves the overall performance of the system. The method includes the following steps. First, the small files are merged and undergo storage preprocessing, which consists of filtering the files, merging the small files, generating metadata, and generating object IDs. Second, after the files are stored in HDFS in merged form, the mapping between each small file and its merged file in HDFS is stored in the small file's metadata, the directory structure of the files is stored in the file name, and the metadata are stored in a distributed cluster based on the Chord protocol. Third, the directory structure of the files is optimized: the generated key of each metadata entry is decomposed into a Directory ID and a Small File ID. The Directory ID serves as the key by which the metadata is routed to a node in the metadata cluster, so that files under the same directory are stored on the same node. The Small File ID is generated on the metadata node, so that each metadata entry has a unique ID across the whole system.
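The key decomposition described in the abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes a 32-bit identifier ring, uses SHA-1 as the hash, places each key on the first node whose ID is at or after the key (the Chord successor rule) using a global view of the ring rather than finger-table routing, and builds the Small File ID from the node name plus a per-node counter. All class, function, and field names are hypothetical.

```python
import hashlib
import posixpath
from bisect import bisect_left
from itertools import count


def ring_hash(text: str) -> int:
    """Map a string onto an assumed 32-bit identifier ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class MetadataNode:
    def __init__(self, name: str):
        self.name = name
        self.node_id = ring_hash(name)
        self._counter = count(1)  # local counter used to mint Small File IDs
        self.entries = {}         # (directory_id, small_file_id) -> record

    def put(self, directory_id, path, merged_file, offset, length):
        # Assumption: node name + local counter is unique across the system.
        small_file_id = f"{self.name}-{next(self._counter)}"
        self.entries[(directory_id, small_file_id)] = {
            "path": path,
            "merged_file": merged_file,
            "offset": offset,
            "length": length,
        }
        return small_file_id


class MetadataRing:
    def __init__(self, node_names):
        self.nodes = sorted((MetadataNode(n) for n in node_names),
                            key=lambda node: node.node_id)
        self.ids = [node.node_id for node in self.nodes]

    def successor(self, key: int) -> MetadataNode:
        """First node with ID >= key, wrapping around the ring."""
        index = bisect_left(self.ids, key)
        return self.nodes[index % len(self.nodes)]

    def store(self, path, merged_file, offset, length):
        # The Directory ID depends only on the parent directory, so all
        # files in one directory land on the same metadata node.
        directory_id = ring_hash(posixpath.dirname(path))
        node = self.successor(directory_id)
        small_file_id = node.put(directory_id, path, merged_file, offset, length)
        return node.name, directory_id, small_file_id
```

For example, storing `/logs/2014/a.log` and `/logs/2014/b.log` through the same `MetadataRing` yields the same node name and Directory ID for both, with different Small File IDs.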

Description

Technical field

[0001] The invention relates to an HDFS-based small file optimized storage method, in particular to a method for the optimized storage of distributed, massive small-file data, and belongs to the field of distributed data storage optimization.

Background technique

[0002] With the rapid development of Internet technology, the amount of data is increasing exponentially. This era of information explosion has had a profound impact on the lives of Internet users. For Internet applications, however, storing such massive data is a huge challenge. Traditional data storage and processing technologies, such as relational databases, can no longer meet the needs of technological development, and new mass data storage technologies continue to emerge, such as the parallel file system GFS used by Google; the MapReduce programming model for massive data; Amazon's Simple Storage Service (S3...


Application Information

IPC(8): G06F3/06, G06F17/30
Inventor: 毛莺池, 闵伟, 戚荣志, 陈曦, 王康, 任道宁
Owner HOHAI UNIV