Optimized method for accessing lots of small files for Hadoop

A technology concerning massive numbers of small files and Hadoop clusters, applied in special data processing applications and electrical digital data processing, with the effect of improving access efficiency.

Active Publication Date: 2015-04-22
NANJING UNIV OF POSTS & TELECOMM
Cites: 4; cited by: 54

AI Technical Summary

Problems solved by technology

[0004] The object of the present invention is to provide a small-file merging, indexing and querying method for Hadoop, mainly to solve the problem of low access efficiency for small files in Hadoop.
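The merge, index and query idea stated in [0004] can be illustrated with a minimal in-memory sketch. It assumes the simplest possible scheme: small files are concatenated into one large blob and a dictionary records each file's (offset, length), so a query is one dictionary lookup plus one slice. The names `merge_small_files` and `read_small_file` are hypothetical, and this is not the patent's actual index format, which the excerpt does not describe.

```python
from typing import Dict, List, Tuple

# Hypothetical index format: small-file name -> (offset, length) inside the merged blob.
Index = Dict[str, Tuple[int, int]]


def merge_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one starts."""
    index: Index = {}
    parts: List[bytes] = []
    offset = 0
    for name, data in files.items():
        index[name] = (offset, len(data))  # remember position before appending
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index


def read_small_file(blob: bytes, index: Index, name: str) -> bytes:
    """Query: look the file up in the index, then slice it out of the blob."""
    offset, length = index[name]
    return blob[offset:offset + length]


if __name__ == "__main__":
    blob, idx = merge_small_files({"a.txt": b"hello", "b.txt": b"hadoop"})
    print(idx)                                # {'a.txt': (0, 5), 'b.txt': (5, 6)}
    print(read_small_file(blob, idx, "b.txt"))  # b'hadoop'
```

In a real deployment the blob would be one large HDFS file and the index would need to be persisted; both are left out here to keep the sketch self-contained.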

Examples

Embodiment Construction

[0034] The present invention will be further described in detail below in conjunction with the accompanying drawings.

[0035] Three-tier architecture for Hadoop small-file access:

[0036] The present invention divides the Hadoop small-file access process into three layers, each of which performs a different part of the processing. The three-layer structure is shown in Figure 1 of the accompanying drawings.

[0037] The present invention adopts the B/S (browser/server) mode. The user interface layer is the client machine, an ordinary PC equipped with a browser. The business logic layer is the preprocessor, which can be a single server or a server cluster running a Web server such as Tomcat to process the requests the client submits through the browser and to return the responses. The preprocessor is the middleware between the user interface layer and the Hadoop cluster at the storage layer. It is mainly responsible for preprocessing the user's ope...
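The division of labour between the layers can be sketched as follows. This is an assumption-laden illustration, not the patented preprocessor: the class `Preprocessor`, the `flush_threshold` parameter and the `merged-N` naming are hypothetical, and HDFS is replaced by an in-memory dict so the example runs without a cluster. Uploads from the browser are buffered on the preprocessor, merged into one file once enough bytes accumulate, and later reads are resolved through the preprocessor's own index rather than by asking HDFS for each small file.

```python
from typing import Dict, List, Tuple


class Preprocessor:
    """Hypothetical business-logic layer between browser clients and HDFS.

    The storage layer is simulated by an in-memory dict; in a real
    deployment it would be the HDFS cluster.
    """

    def __init__(self, flush_threshold: int) -> None:
        self.flush_threshold = flush_threshold                # bytes buffered before merging
        self.buffer: Dict[str, bytes] = {}                    # small files awaiting a merge
        self.storage: Dict[str, bytes] = {}                   # stand-in for HDFS: merged file -> bytes
        self.index: Dict[str, Tuple[str, int, int]] = {}      # small file -> (merged file, offset, length)
        self._next_id = 0

    def upload(self, name: str, data: bytes) -> None:
        """Request from the user interface layer: buffer, then merge when full."""
        self.buffer[name] = data
        if sum(len(d) for d in self.buffer.values()) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write all buffered small files to storage as one merged file."""
        if not self.buffer:
            return
        merged_name = f"merged-{self._next_id}"
        self._next_id += 1
        parts: List[bytes] = []
        offset = 0
        for name, data in self.buffer.items():
            self.index[name] = (merged_name, offset, len(data))
            parts.append(data)
            offset += len(data)
        self.storage[merged_name] = b"".join(parts)
        self.buffer.clear()

    def download(self, name: str) -> bytes:
        """Serve a read from the buffer if not yet merged, else via the index."""
        if name in self.buffer:
            return self.buffer[name]
        merged_name, offset, length = self.index[name]
        return self.storage[merged_name][offset:offset + length]


if __name__ == "__main__":
    p = Preprocessor(flush_threshold=8)
    p.upload("a", b"1234")
    p.upload("b", b"5678")       # buffer reaches 8 bytes, so it is merged
    p.upload("c", b"xy")         # stays buffered
    print(p.download("b"), p.download("c"))
```

Keeping the index on the preprocessor means the NameNode only sees the merged files; the trade-off is that the preprocessor must persist and protect that index itself, which this sketch omits.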

Abstract

The invention provides an optimized method for accessing large numbers of small files in Hadoop, namely a small-file merging, indexing and querying method applied to Hadoop. The method mainly solves the problem of low access efficiency for small files in Hadoop. The invention provides a three-layer architecture for Hadoop small-file access processing, consisting of a user interface layer, a service logic layer and a data storage layer. The method uses a small-file merging and mapping technique on the preprocessor side and a fast indexing technique for large numbers of small files.
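The abstract does not say how small files are grouped for merging. One plausible policy, shown here purely as an assumed example, is to pack files so that each merged file fits in a single HDFS block; the sketch uses first-fit decreasing bin packing, a standard heuristic swapped in for illustration. `plan_merges` is a hypothetical name, and the 128 MB (134217728-byte) capacity is only the Hadoop 2.x default of `dfs.blocksize`.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default dfs.blocksize; an assumed merge target


def plan_merges(sizes: Dict[str, int], capacity: int = BLOCK_SIZE) -> List[List[str]]:
    """Group small files into merge bins so that no bin exceeds `capacity` bytes.

    First-fit decreasing: visit files from largest to smallest (ties broken by
    name) and put each one in the first bin with enough room left, opening a
    new bin when none fits.
    """
    bins: List[List[str]] = []
    free: List[int] = []  # remaining capacity of each bin
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > capacity:
            raise ValueError(f"{name} ({size} bytes) does not fit in capacity {capacity}")
        for i, room in enumerate(free):
            if size <= room:
                bins[i].append(name)
                free[i] -= size
                break
        else:  # no existing bin had room
            bins.append([name])
            free.append(capacity - size)
    return bins


if __name__ == "__main__":
    print(plan_merges({"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}, capacity=10))
    # [['a', 'c'], ['b', 'd', 'e']]
```

Aligning merged files with block boundaries keeps each merged file to one block object on the NameNode; other policies (grouping by directory, by user, or by upload time) are equally possible and the source does not indicate which one the invention uses.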

Description

Technical field

[0001] The invention relates to the field of software development and application integration, and in particular to a mechanism and method for accessing large numbers of small files on the Internet.

Background art

[0002] Hadoop is one of the more mature cloud computing platforms of recent years. Owing to its reliability, efficiency and scalability, it has been widely used in the Internet field and has also attracted wide attention from the academic community. HDFS, the distributed file system of Hadoop, has become the mainstream file system deployed on mass storage clusters. An HDFS cluster consists of one NameNode and several DataNodes: the NameNode manages the file system namespace, and the DataNodes are the worker nodes of the file system. This master-slave architecture greatly simplifies the structure of the distributed file system, but since the NameNode places the metadata of the file system in memory, t...
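Why keeping all metadata in NameNode memory penalizes small files can be made concrete with a back-of-the-envelope estimate. It relies on the widely quoted rule of thumb that each file object and each block object occupies roughly 150 bytes of NameNode heap; that figure, the 100 KB file size and the 10 million file count are assumptions for illustration, not numbers from the patent. Directories are ignored.

```python
import math

BYTES_PER_OBJECT = 150  # commonly quoted rule of thumb for NameNode heap per object, not exact


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Approximate NameNode heap for num_files files of blocks_per_file blocks each."""
    objects = num_files * (1 + blocks_per_file)  # one file object plus its block objects
    return objects * BYTES_PER_OBJECT


small_files = 10_000_000          # assumed workload: 10 million small files
file_size = 100_000               # assumed size: 100 KB each, i.e. one block per file
block_size = 128 * 1024 * 1024    # 128 MiB HDFS block

unmerged = namenode_heap_bytes(small_files)
merged_files = math.ceil(small_files * file_size / block_size)  # ideal packing, one block each
merged = namenode_heap_bytes(merged_files)

print(unmerged)      # 3000000000  (about 3 GB)
print(merged_files)  # 7451
print(merged)        # 2235300     (about 2.2 MB)
```

Under these assumptions merging cuts the NameNode's share from about 3 GB to about 2 MB. The merged-file count is a lower bound because whole files rarely fill blocks perfectly, and the per-small-file index still has to be held somewhere else, for example on the preprocessor.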

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30; G06F9/44
CPC: G06F16/134; G06F16/182
Inventors: Hu Haifeng (胡海峰), Jia Yuchen (贾玉辰)
Owner: NANJING UNIV OF POSTS & TELECOMM