HDFS mass small file processing method and system

A technology for processing massive numbers of small files, applied in the field of HDFS small-file processing. It addresses problems such as high NameNode memory consumption and degraded small-file access efficiency, with the effects of avoiding long file waiting times and frequent jumps.

Status: Inactive. Publication date: 2018-01-16
ZHENGZHOU YUNHAI INFORMATION TECH CO LTD
Cites: 3; Cited by: 7

AI Technical Summary

Problems solved by the technology

Storing a large number of small files consumes a large amount of NameNode memory and reduces the access efficiency of those small files.



Examples


Specific Embodiment 1

[0035] Figure 1 shows a specific embodiment of the HDFS massive small file processing method of the present invention. In this embodiment, the method is applied to a server in the HDFS cluster that can interact with each node of HDFS, and includes the following steps:

[0036] S1. The server receives a file upload request and determines whether the file currently requested to be uploaded is a small file, that is, a file whose size does not exceed a preset first threshold. If so, execute step S3; otherwise, execute step S2;

[0037] S2. Upload the file currently requested to be uploaded to the HDFS cluster;

[0038] S3. Cache the file currently requested to be uploaded, then execute step S4;

[0039] S4. Calculate the sum of the sizes of the currently cached files uploaded b...
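To make the control flow of steps S1 to S4 concrete, here is a minimal Python sketch. It assumes an in-memory cache on the server and a hypothetical `upload_to_hdfs` callback standing in for a real HDFS client write; `SmallFileBuffer`, `handle_upload`, and `cached_total_size` are illustrative names, not taken from the patent. Step S4 is sketched only as far as the text above goes (summing the sizes of the cached files).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SmallFileBuffer:
    """Hypothetical server-side buffer illustrating steps S1 to S4."""

    first_threshold: int                          # S1: max size in bytes of a "small" file
    upload_to_hdfs: Callable[[str, bytes], None]  # hypothetical stand-in for an HDFS client write
    cache: List[Tuple[str, bytes]] = field(default_factory=list)

    def is_small(self, data: bytes) -> bool:
        # S1: a small file's size does not exceed the preset first threshold
        return len(data) <= self.first_threshold

    def handle_upload(self, name: str, data: bytes) -> None:
        if self.is_small(data):
            # S3: cache the small file instead of writing it to HDFS
            self.cache.append((name, data))
        else:
            # S2: files above the threshold are uploaded to HDFS directly
            self.upload_to_hdfs(name, data)

    def cached_total_size(self) -> int:
        # S4: sum of the sizes of all currently cached files
        return sum(len(data) for _, data in self.cache)
```

Keeping the HDFS write behind a callback keeps the sketch independent of any particular client library; a file exactly at the first threshold counts as small, matching "does not exceed".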



Abstract

The invention provides a method and system for processing massive numbers of small files in HDFS. For each received upload request, the method determines whether the file is a small file. If it is, the file is cached, the total size of all currently cached files is calculated, and the number of currently cached files is counted. The calculated total size is compared with a preset second threshold, and the file count is compared with a preset quantity threshold. When either comparison condition is met, all currently cached files are merged and a file index list of those files is created; the merged file and its index list are then uploaded to the HDFS cluster, and all cached files are removed. In this way, the access efficiency of HDFS for massive numbers of small files can be improved.
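The merge-and-flush logic described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions: the cache is a list of `(name, bytes)` pairs, the file index list maps each file name to an `(offset, length)` pair inside the merged file, a comparison condition is read as "met" once the value reaches its threshold (`>=`), and `upload` is a hypothetical callback that writes the merged file and its index to HDFS. `read_small_file` is an illustrative addition showing how such an index could locate one small file; the abstract does not describe the read path.

```python
from typing import Callable, Dict, List, Tuple

# Assumed index format: file name -> (offset, length) inside the merged file
IndexList = Dict[str, Tuple[int, int]]
Cache = List[Tuple[str, bytes]]


def should_flush(cache: Cache, second_threshold: int, quantity_threshold: int) -> bool:
    total_size = sum(len(data) for _, data in cache)  # sum of cached file sizes
    count = len(cache)                                # number of cached files
    # Assumption: a condition is met once the value reaches its threshold
    return total_size >= second_threshold or count >= quantity_threshold


def merge_small_files(cache: Cache) -> Tuple[bytes, IndexList]:
    """Concatenate cached files and record where each one starts."""
    merged = bytearray()
    index: IndexList = {}
    for name, data in cache:
        index[name] = (len(merged), len(data))
        merged.extend(data)
    return bytes(merged), index


def flush_if_needed(
    cache: Cache,
    second_threshold: int,
    quantity_threshold: int,
    upload: Callable[[bytes, IndexList], None],  # hypothetical HDFS writer
) -> bool:
    if not should_flush(cache, second_threshold, quantity_threshold):
        return False
    merged, index = merge_small_files(cache)
    upload(merged, index)  # upload the merged file together with its index list
    cache.clear()          # then remove all cached files
    return True


def read_small_file(merged: bytes, index: IndexList, name: str) -> bytes:
    # Illustrative read path (not described in the abstract): slice by offset/length
    offset, length = index[name]
    return merged[offset:offset + length]
```

Because many small files end up as one HDFS file, the NameNode tracks a single entry per merged batch rather than one per small file. A real deployment would also need to handle duplicate file names and persist the index, which this sketch leaves out.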

Description

Technical Field

[0001] The invention relates to the field of HDFS data storage, and in particular to a method and system for processing massive numbers of small files in HDFS, mainly applicable to a server in an HDFS cluster that can interact with each node of HDFS.

Background Art

[0002] HDFS (Hadoop Distributed File System), consisting of a NameNode and several DataNodes, is an important part of a Hadoop cluster. Because it is reliable, efficient, and scalable, it is widely used in large-scale computing.

[0003] With the rapid development of the Internet, e-commerce, social networks, mobile communications, and similar services generate many types of data in very large volumes, and the number of small files stored in HDFS keeps growing.

[0004] However, HDFS is mainly designed for a write-once, read-many access pattern. It is intended to store very large files, mainly files above 100 MB. The sto...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30
Inventor: Zhao Shanshan (赵闪闪)
Owner: ZHENGZHOU YUNHAI INFORMATION TECH CO LTD