Hadoop-based mass classifiable small file association storage method

An associative storage technology for small files, applied in the fields of special data processing applications, instruments, and electrical digital data processing. It addresses the low storage and read efficiency of large-scale classifiable small files, with the effect of improving memory utilization and increasing the maximum number of files that can be stored.

Status: Inactive. Publication date: 2012-01-25
XI AN JIAOTONG UNIV
Cites: 3; Cited by: 69

AI Technical Summary

Problems solved by technology

[0012] The purpose of the present invention is to solve the problem of the low efficiency with which the existing Hadoop distributed file system stores and reads large…

Detailed Description of the Embodiments

[0049] A Hadoop-based associative storage method for a large number of classifiable small files comprises a NameNode-side global index management technology, which manages the aggregated small files, and a file aggregation technology, which improves the storage efficiency of classifiable small files. Small files that belong to a certain category are called classifiable small files. The single file produced by aggregating the small files of one category is called a logical unit, and for each small file stored in the Hadoop file system a global index is established in the memory of the NameNode.
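The aggregation step can be pictured with a minimal Python sketch. It assumes each small file arrives already tagged with its category, and that a logical unit is simply the byte-wise concatenation of its member files plus a per-file (offset, length) record. The names `aggregate_by_category` and `read_small_file` are hypothetical, and a real deployment would write each logical unit into HDFS rather than keep it in memory.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input shape: small file name -> (category, file content).
SmallFiles = Dict[str, Tuple[str, bytes]]
# One logical unit: its concatenated bytes and name -> (offset, length).
LogicalUnit = Tuple[bytes, Dict[str, Tuple[int, int]]]


def aggregate_by_category(files: SmallFiles) -> Dict[str, LogicalUnit]:
    """Merge the small files of each category into one logical unit."""
    grouped: Dict[str, List[Tuple[str, bytes]]] = defaultdict(list)
    for name, (category, data) in files.items():
        grouped[category].append((name, data))

    units: Dict[str, LogicalUnit] = {}
    for category, members in grouped.items():
        parts: List[bytes] = []
        offsets: Dict[str, Tuple[int, int]] = {}
        cursor = 0
        # Order members by name so the unit layout is deterministic.
        for name, data in sorted(members, key=lambda m: m[0]):
            offsets[name] = (cursor, len(data))
            parts.append(data)
            cursor += len(data)
        units[category] = (b"".join(parts), offsets)
    return units


def read_small_file(units: Dict[str, LogicalUnit], category: str, name: str) -> bytes:
    """Recover one small file by slicing it out of its logical unit."""
    blob, offsets = units[category]
    offset, length = offsets[name]
    return blob[offset:offset + length]


if __name__ == "__main__":
    units = aggregate_by_category({
        "b.txt": ("logs", b"beta"),
        "a.txt": ("logs", b"alpha"),
        "c.png": ("images", b"\x89PNG"),
    })
    print(units["logs"][1])                         # {'a.txt': (0, 5), 'b.txt': (5, 4)}
    print(read_small_file(units, "logs", "b.txt"))  # b'beta'
```

Ordering by name is only one possible layout; the point is that one stored file per category replaces many, while the (offset, length) records keep every small file individually addressable.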

[0050] In the NameNode-side global index management technology, the global index file is loaded into the memory of the NameNode, which extends the NameNode metadata structure with a small file index set and a fragment index set;
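The excerpt names the two collections but not their exact contents, so the following is a minimal sketch under stated assumptions: the small file index maps each file name to its logical unit, offset, and length, and the fragment index records the fixed-size byte ranges into which each logical unit is split (a block-like fragment size is assumed, not taken from the source). `GlobalIndex` and `SmallFileEntry` are hypothetical names. A lookup then yields both the logical unit and the fragments that must be read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SmallFileEntry:
    """Hypothetical index item: where one small file lives."""
    unit_id: str
    offset: int
    length: int


@dataclass
class GlobalIndex:
    """Sketch of a NameNode-resident global index with two collections.

    small_files: small file name -> SmallFileEntry
    fragments:   logical unit id -> (start, end) byte range of each fragment,
                 assuming fixed-size fragments (an assumption, not the source's).
    """
    fragment_size: int
    small_files: Dict[str, SmallFileEntry] = field(default_factory=dict)
    fragments: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def add_unit(self, unit_id: str, unit_size: int) -> None:
        # Cut the logical unit into fixed-size fragments; the last may be shorter.
        self.fragments[unit_id] = [
            (start, min(start + self.fragment_size, unit_size))
            for start in range(0, unit_size, self.fragment_size)
        ]

    def add_small_file(self, name: str, unit_id: str, offset: int, length: int) -> None:
        self.small_files[name] = SmallFileEntry(unit_id, offset, length)

    def locate(self, name: str) -> Tuple[str, List[int]]:
        """Return the logical unit and the fragment numbers covering `name`."""
        entry = self.small_files[name]
        first = entry.offset // self.fragment_size
        # Position of the file's last byte; a zero-length file maps to its first fragment.
        last_byte = entry.offset + max(entry.length, 1) - 1
        last = last_byte // self.fragment_size
        return entry.unit_id, list(range(first, last + 1))


if __name__ == "__main__":
    index = GlobalIndex(fragment_size=8)
    index.add_unit("logs-0001", unit_size=20)
    index.add_small_file("b.txt", "logs-0001", offset=5, length=4)
    print(index.fragments["logs-0001"])  # [(0, 8), (8, 16), (16, 20)]
    print(index.locate("b.txt"))         # ('logs-0001', [0, 1])
```

In the example, the file occupies bytes 5 to 8 of its unit, so it straddles the boundary between fragments 0 and 1 and both must be fetched.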

[0051] (1) The small file index set uses a binary sort tree (binary search tree) structure to locate small files. The index items include file...
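A binary sort tree keyed by file name gives logarithmic expected lookup time when names arrive in no particular order. The minimal sketch below assumes the file name is the sort key and, because the source's list of index fields is cut off, stores a hypothetical (unit_id, offset, length) triple as each index item; `SmallFileIndex` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical index item: (logical unit id, offset, length).
IndexItem = Tuple[str, int, int]


@dataclass
class Node:
    key: str                        # small file name, used as the sort key
    value: IndexItem
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class SmallFileIndex:
    """Unbalanced binary sort tree mapping file names to index items."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: str, value: IndexItem) -> None:
        if self.root is None:
            self.root = Node(key, value)
            return
        node = self.root
        while True:
            if key == node.key:
                node.value = value  # re-inserting a name overwrites its entry
                return
            if key < node.key:
                if node.left is None:
                    node.left = Node(key, value)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key, value)
                    return
                node = node.right

    def search(self, key: str) -> Optional[IndexItem]:
        node = self.root
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        return None


if __name__ == "__main__":
    idx = SmallFileIndex()
    idx.insert("m.txt", ("u1", 0, 10))
    idx.insert("c.txt", ("u1", 10, 3))
    idx.insert("x.txt", ("u2", 0, 7))
    print(idx.search("c.txt"))  # ('u1', 10, 3)
    print(idx.search("z.txt"))  # None
```

The tree is left unbalanced to stay close to the plain binary sort tree named in the text; inserting names in sorted order would degrade it into a linked list, which a self-balancing variant such as a red-black tree avoids.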

Abstract

The invention discloses a Hadoop-based association storage method for massive classifiable small files, which mainly addresses the low access efficiency of classifiable small files. The method comprises a NameNode global index management technology and a file clustering technology. Independent small files belonging to the same class are clustered into files and placed under global index management, which greatly improves memory utilization and increases the maximum number of files supported per unit of memory. The method comprises the following steps: (1) clustering the small files belonging to a class into a single file, called a logical unit; and (2) establishing, in the NameNode memory, a global index for each stored small file. The file clustering technology improves the storage efficiency of classifiable small files, and the NameNode global index management technology manages the clustered small files. Together, these technologies improve the storage efficiency of massive classifiable small files. The invention is suitable for storing and managing classifiable small files in general scenarios.
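Why clustering raises the number of files per unit of NameNode memory can be shown with back-of-envelope arithmetic. The sketch assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per file or block object, plus a hypothetical 64-byte cost per global-index entry; neither figure comes from the patent, and `max_files_plain` and `max_files_aggregated` are illustrative names.

```python
# Back-of-envelope NameNode memory model. Every constant here is an assumption
# chosen for illustration; none comes from the patent.
NAMESPACE_OBJECT_BYTES = 150  # commonly quoted heap cost of one file or block object
INDEX_ENTRY_BYTES = 64        # hypothetical heap cost of one global-index entry


def max_files_plain(memory_bytes: int) -> int:
    """Each small file stored directly costs one file object plus one block object."""
    return memory_bytes // (2 * NAMESPACE_OBJECT_BYTES)


def max_files_aggregated(memory_bytes: int, files_per_unit: int, blocks_per_unit: int) -> int:
    """Largest N whose logical units plus per-file index entries fit in memory."""
    # Cost of one logical unit: its file object plus its block objects.
    unit_cost = (1 + blocks_per_unit) * NAMESPACE_OBJECT_BYTES
    # Cost of one full logical unit holding files_per_unit indexed small files.
    group_cost = unit_cost + files_per_unit * INDEX_ENTRY_BYTES
    full_groups, remaining = divmod(memory_bytes, group_cost)
    n = full_groups * files_per_unit
    # A partially filled last unit: pay for the unit once, then one entry per file.
    if remaining >= unit_cost:
        n += (remaining - unit_cost) // INDEX_ENTRY_BYTES
    return n


if __name__ == "__main__":
    GIB = 1024 ** 3
    print(max_files_plain(GIB))               # 3579139
    print(max_files_aggregated(GIB, 1000, 1))  # 16698939
```

Under these assumed costs the per-file price falls from 300 bytes to roughly 64.3 bytes, so the gain is bounded by how small an index entry is compared with the two namespace objects it replaces; an entry costing nearly as much would save little NameNode memory.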

Description

Technical field
[0001] The present invention relates to a method for optimizing the storage and reading of massive classifiable small files on Hadoop (a distributed file system), and mainly addresses the low efficiency with which large-scale classifiable small files are currently stored and read. Hadoop is the current mainstream cloud storage platform. It consists of one NameNode and multiple DataNodes: the NameNode manages the file system namespace and controls access by external clients, while the DataNodes store the data.

Background art
[0002] With the development of the Internet, the amount of data that needs to be stored keeps increasing, and file sizes vary widely, from small files of a few kilobytes to large files of hundreds of megabytes. The Hadoop distributed file system is suited to storing large files, but its storage and read performance degrade severely when it stores small files. Therefo...

Application Information

IPC(8): G06F17/30
Inventors: 郑庆华 (Zheng Qinghua), 董博 (Dong Bo), 刘均 (Liu Jun), 马瑞 (Ma Rui), 宋凯磊 (Song Kailei)
Owner: XI AN JIAOTONG UNIV