Improved method aimed at small files of HDFS

A technique for handling small files in HDFS, applied in special data processing applications, program control design, instruments, etc., that addresses problems such as the low efficiency of HDFS when processing small files

Inactive Publication Date: 2014-01-22
LANGCHAO ELECTRONIC INFORMATION IND CO LTD
12 Cites, 43 Cited by

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is to provide an improved method for processing small files in H



Examples


Embodiment 1

[0042] As shown in Figure 1, an improved HDFS method for small files operates on a cluster containing one Namenode and multiple Datanodes, which multiple clients can access. Part of the Namenode's authority is delegated to the Datanodes, allowing each Datanode to cache part of the small-file metadata and to handle most small-file read and write requests.
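
The arrangement in Embodiment 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the `resolve` helper, the file paths, and the choice to poll every Datanode in turn are all assumptions made for illustration. It only shows how a lookup served from a Datanode cache never reaches the Namenode.

```python
from dataclasses import dataclass, field


@dataclass
class Namenode:
    # Metadata for every file: path -> list of block ids.
    metadata: dict = field(default_factory=dict)
    lookups: int = 0  # number of requests that reached the Namenode

    def lookup(self, path):
        self.lookups += 1
        return self.metadata.get(path)


@dataclass
class Datanode:
    # Delegated cache holding metadata for some small files only.
    small_file_cache: dict = field(default_factory=dict)

    def lookup(self, path):
        return self.small_file_cache.get(path)


@dataclass
class Cluster:
    namenode: Namenode
    datanodes: list

    def resolve(self, path):
        """Ask the Datanodes first; fall back to the Namenode on a miss.

        Polling every Datanode is an assumption of this sketch.
        """
        for dn in self.datanodes:
            blocks = dn.lookup(path)
            if blocks is not None:
                return blocks
        return self.namenode.lookup(path)


# Hypothetical paths and block ids, used only for the demonstration.
nn = Namenode(metadata={
    "/logs/a.txt": ["blk_1"],
    "/video/big.bin": ["blk_2", "blk_3"],
})
dn1 = Datanode(small_file_cache={"/logs/a.txt": ["blk_1"]})
dn2 = Datanode()
cluster = Cluster(namenode=nn, datanodes=[dn1, dn2])

small = cluster.resolve("/logs/a.txt")     # served from dn1's cache
large = cluster.resolve("/video/big.bin")  # falls through to the Namenode
print(small, large, nn.lookups)
```

In this run only the large-file request reaches the Namenode, which is the load reduction the embodiment aims for.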

Embodiment 2

[0044] Building on Embodiment 1, in this embodiment the Namenode still manages all metadata of the file system, while the Datanodes also store part of the metadata, mainly that of small files; the metadata of large files remains on the Namenode. The Namenode is responsible for system-wide management: it periodically exchanges heartbeat messages with each Datanode, sending them instructions and collecting their status reports.
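
A rough Python sketch of this metadata split and one heartbeat round is given here. The source does not state what size counts as "small", so `SMALL_FILE_LIMIT` is an assumed cutoff, and the class layout, the `register` and `heartbeat` methods, and the `"report"` instruction are hypothetical names chosen for illustration. The periodic timer is omitted; a single round is shown.

```python
SMALL_FILE_LIMIT = 1024 * 1024  # assumed cutoff (1 MiB); the source gives no figure


class Datanode:
    def __init__(self, name):
        self.name = name
        self.small_meta = {}  # metadata for small files stored on this node
        self.pending = []     # instructions received from the Namenode

    def status(self):
        # Feedback reported in reply to a heartbeat.
        return {"node": self.name, "small_files": len(self.small_meta)}


class Namenode:
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.meta = {}     # the Namenode still manages all metadata
        self.reports = {}  # latest status report per Datanode

    def register(self, path, size, datanode):
        entry = {"size": size, "node": datanode.name}
        self.meta[path] = entry
        if size <= SMALL_FILE_LIMIT:
            # Small file: the hosting Datanode also keeps the metadata.
            datanode.small_meta[path] = entry

    def heartbeat(self, instruction="report"):
        # One heartbeat round: send an instruction, collect each status.
        for dn in self.datanodes:
            dn.pending.append(instruction)
            self.reports[dn.name] = dn.status()
        return self.reports


dn_a, dn_b = Datanode("dn-a"), Datanode("dn-b")
nn = Namenode([dn_a, dn_b])
nn.register("/img/icon.png", 4096, dn_a)                # small file
nn.register("/data/dump.bin", 512 * 1024 * 1024, dn_b)  # large file
reports = nn.heartbeat()
print(reports)
```

After the two registrations the Namenode holds both entries, while only `dn-a` holds a small-file entry; the heartbeat reports reflect that split.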

Embodiment 3

[0046] Building on Embodiment 1, in this embodiment the client directs the metadata operations of some small files to the Datanode; if no matching result is found there, it searches on the Namenode. When writing a file, the client uses its previous read and write records to query the Datanode directly for a block that is not full and that no other client is currently writing. If such a block exists, the client writes the data directly into it and updates the corresponding metadata. If not, the client sends a write request to the Namenode, which allocates a new data block to complete the write; the client's query for the data block is then completed on the local machine. When reading a file, the client queries the Datanode directly and, if the file is not found, queries the Namenode.
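
The write path described in this embodiment can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the patent's code: the tiny `BLOCK_SIZE`, the `writer` field used as a lock, the rule that a block qualifies only if the whole payload fits, and every function and class name are hypothetical. Payloads are assumed to be no larger than one block.

```python
BLOCK_SIZE = 64  # tiny capacity in bytes, chosen only to keep the demo readable


class Block:
    def __init__(self, block_id):
        self.block_id = block_id
        self.data = bytearray()
        self.writer = None  # client currently writing, if any

    def has_room(self, n):
        return len(self.data) + n <= BLOCK_SIZE


class Namenode:
    def __init__(self):
        self.next_id = 0
        self.allocations = 0  # how many write requests reached the Namenode

    def allocate_block(self, datanode):
        self.allocations += 1
        block = Block(f"blk_{self.next_id}")
        self.next_id += 1
        datanode.blocks.append(block)
        return block


class Datanode:
    def __init__(self):
        self.blocks = []
        self.meta = {}  # path -> (block_id, offset, length)

    def find_open_block(self, n):
        # A block qualifies if it has room and no other client is writing it.
        for b in self.blocks:
            if b.writer is None and b.has_room(n):
                return b
        return None


def write_small_file(client, path, payload, datanode, namenode):
    block = datanode.find_open_block(len(payload))
    if block is None:
        # No usable block on the Datanode: ask the Namenode for a new one.
        block = namenode.allocate_block(datanode)
    block.writer = client
    offset = len(block.data)
    block.data += payload
    datanode.meta[path] = (block.block_id, offset, len(payload))
    block.writer = None
    return block.block_id


def read_small_file(path, datanode):
    entry = datanode.meta.get(path)
    if entry is None:
        return None  # a real client would now fall back to the Namenode
    block_id, offset, length = entry
    block = next(b for b in datanode.blocks if b.block_id == block_id)
    return bytes(block.data[offset:offset + length])


nn, dn = Namenode(), Datanode()
first = write_small_file("c1", "/a", b"x" * 40, dn, nn)   # needs a new block
second = write_small_file("c1", "/b", b"y" * 20, dn, nn)  # fits the same block
third = write_small_file("c2", "/c", b"z" * 10, dn, nn)   # block full, new one
print(first, second, third, nn.allocations)
```

Three writes trigger only two Namenode allocations here, because the second file is packed into the block opened for the first.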


Abstract

The invention relates to the field of HDFS and discloses an improved method for small files in HDFS. Part of the Namenode's authority is delegated to the Datanodes, which cache the metadata of some small files and handle most small-file read and write requests, so that the load on the Namenode is reduced as far as possible. This provides a new approach to the low efficiency of HDFS when processing small files. The method effectively relieves the excessive load on a single node, namely the Namenode, by distributing the pressure of small files across the Datanodes. As a result, the whole big data processing cluster handles both large and small files with good efficiency and performance.

Description

Technical field

[0001] The invention relates to the field of HDFS distributed file systems, and in particular to an improved HDFS method for small files.

Technical background

[0002] The Hadoop Distributed File System, HDFS for short, is a distributed file system.

[0003] With the rapid development of the Internet, the amount of data has grown exponentially. To cope with this, many large-scale server architectures such as data centers and cloud computing have emerged. For big data processing, Google's GFS provides an effective method for handling large files, and HDFS, the file system under Hadoop, is an open source implementation of GFS that realizes most of its functions. HDFS is likewise designed around large files. It processes large files very efficiently, but it processes small files very inefficiently, because storing small files requires repeatedly requesting storage addresses and allocating storage blocks, ...


Application Information

IPC(8): G06F17/30, G06F12/08
CPC: G06F9/5044, G06F16/172, G06F16/1827, G06F2209/503
Inventor: 孟祥飞, 邓鹏飞, 吴楠, 宗栋瑞, 邓强
Owner LANGCHAO ELECTRONIC INFORMATION IND CO LTD