HDFS-based small file processing method, apparatus and device, and storage medium

A small file processing technology, applied in the computer field, that addresses problems such as the storage space occupied by small files and the heavy memory consumption of name nodes, and achieves the effect of improving access efficiency.

Status: Inactive; Publication Date: 2020-11-13
SUZHOU EAGLE NETWORK TECH CO LTD
5 cites, 2 cited by

AI Technical Summary

Problems solved by technology

Each small file occupies its own storage space, and when small files are processed, starting and releasing tasks takes a large share, or even most, of the total processing time.
[0004] At the same time, when HDFS processes small files, the name node must consume a large amount of memory to hold the metadata of the small files, and the efficiency of uploading and downloading small files is poor.
HDFS provides methods such as HAR, SequenceFile, MapFile, and CombineFileInputFormat for dealing with small files, but HAR and CombineFileInputFormat cannot improve the upload efficiency of small files, while SequenceFile and MapFile upload efficiently but query slowly. Solving the small file problem of HDFS is therefore worthwhile.


Examples


Embodiment 1

[0059] Figure 1 is a flow chart of an HDFS-based small file processing method provided by Embodiment 1 of the present invention. This embodiment is applicable to processing small files in HDFS. The method can be executed by an HDFS-based small file processing device, which can be implemented in software and/or hardware and can generally be integrated into computer equipment. As shown in Figure 1, the method includes the following operations:

[0060] S110. Retrieve the small files in the HDFS according to a preset retrieval period.

[0061] The preset retrieval period may be set according to actual needs, for example half an hour, 1 hour, or 2 hours; the embodiments of the present invention do not limit its specific value.
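
As an illustration of step S110, the following is a minimal sketch of running a retrieval step once per preset retrieval period. It is not the patent's implementation: the function name `run_retrieval_cycles`, the `retrieve` callback, and the injectable `sleep` parameter are all hypothetical names introduced here, and the retrieval itself is left to the caller.

```python
import time
from typing import Callable, List


def run_retrieval_cycles(
    retrieve: Callable[[], List[str]],
    period_seconds: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Call retrieve() once per period for a fixed number of cycles.

    retrieve       -- hypothetical callback that returns the small files found
    period_seconds -- the preset retrieval period, e.g. 1800 for half an hour
    sleep          -- injectable so the loop can be exercised without waiting
    """
    results = []
    for i in range(cycles):
        results.append(retrieve())
        if i < cycles - 1:
            # Wait out the preset retrieval period before the next retrieval.
            sleep(period_seconds)
    return results
```

Passing a half-hour period would be `run_retrieval_cycles(retrieve, 30 * 60, cycles)`. A production service would more likely rely on a scheduler than on a blocking loop; the loop is used here only to keep the sketch short.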

[0062] In the embodiment of the present invention, the small files in the HDFS are retrieved according to the preset retrieval cycle, specifica...

Embodiment 2

[0070] Figure 2 is a flow chart of an HDFS-based small file processing method provided by Embodiment 2 of the present invention. This embodiment refines the above embodiments and gives a specific implementation of classifying the small files according to the keywords of each small file, and of merging and storing the classified small files according to the preset file merging method. As shown in Figure 2, the method of this embodiment may include:

[0071] S210. In each preset retrieval period, take the files whose size satisfies the small file retrieval condition as the small files.

[0072] The small file retrieval condition may be that the file size is smaller than a set threshold. For example, the set threshold may be 216M or 512M; it can be set according to actual requirements and is not limited in this embodiment of the present invention.
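
The retrieval condition in S210 can be sketched as a simple size filter. This is a minimal sketch under stated assumptions: the file listing is represented as `(path, size_in_bytes)` pairs rather than obtained from a real HDFS client, `select_small_files` and `MIB` are hypothetical names, and "512M" is read as 512 MiB.

```python
from typing import Iterable, List, Tuple

MIB = 1024 * 1024  # assumption: "M" in the text is read as mebibytes


def select_small_files(
    files: Iterable[Tuple[str, int]],
    threshold_bytes: int = 512 * MIB,
) -> List[str]:
    """Return the paths whose size is strictly smaller than the threshold."""
    return [path for path, size in files if size < threshold_bytes]


# Hypothetical listing; a real one would come from the HDFS namespace.
listing = [
    ("/logs/a.log", 4 * MIB),
    ("/data/big.bin", 600 * MIB),
    ("/logs/b.log", 512 * MIB),  # equal to the threshold, so not "smaller"
]
small = select_small_files(listing)
```

With this listing, only `/logs/a.log` qualifies, because the condition is "smaller than" rather than "smaller than or equal to".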

[0073] In t...

Embodiment 3

[0090] Figure 3 is a schematic diagram of an HDFS-based small file processing device provided in Embodiment 3 of the present invention. As shown in Figure 3, the device includes a small file retrieval module 310, a small file classification module 320, and a small file storage module 330, wherein:

[0091] A small file retrieval module 310, configured to retrieve small files in HDFS according to a preset retrieval cycle;

[0092] A small file classification module 320, configured to classify the small files according to the keywords of each of the small files;

[0093] The small file storage module 330 is configured to merge and store the small files according to a preset file merge method; wherein, the preset merge method includes an item method or a dictionary method.
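
The classification and storage modules above can be illustrated with a short sketch. The available text does not define how a keyword is derived or what the dictionary method stores, so both are assumptions here: the keyword is taken to be the file name prefix before the first underscore, and the dictionary method is interpreted as one merged byte string plus a dictionary mapping each file name to its `(offset, length)`. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def keyword_of(name: str) -> str:
    """Assumed keyword rule: the text before the first underscore."""
    return name.split("_", 1)[0]


def classify(files: Dict[str, bytes]) -> Dict[str, Dict[str, bytes]]:
    """Group small files by keyword (role of classification module 320)."""
    groups: Dict[str, Dict[str, bytes]] = defaultdict(dict)
    for name, data in files.items():
        groups[keyword_of(name)][name] = data
    return dict(groups)


def merge_group(group: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Merge one group into a single blob plus a name -> (offset, length) index.

    This is one possible reading of a dictionary-style merge, not the
    patent's definition (role of storage module 330).
    """
    merged = bytearray()
    index: Dict[str, Tuple[int, int]] = {}
    for name in sorted(group):
        data = group[name]
        index[name] = (len(merged), len(data))
        merged.extend(data)
    return bytes(merged), index


def read_back(merged: bytes, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Recover one original small file from the merged blob."""
    offset, length = index[name]
    return merged[offset:offset + length]


files = {"log_1.txt": b"aa", "log_2.txt": b"bbb", "img_1.png": b"c"}
groups = classify(files)
merged, index = merge_group(groups["log"])
```

Under these assumptions, each keyword group becomes one larger stored object, and a single dictionary lookup locates any original file inside it, which is the kind of access pattern the method aims to make cheaper.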

[0094] The embodiment of the present invention retrieves the small files in HDFS according to the preset retrieval cycle, classifies the retrieved small files according to the keywords of each ...

Abstract

The embodiment of the invention discloses an HDFS-based small file processing method, apparatus, device, and storage medium. The method comprises: retrieving small files in HDFS according to a preset retrieval period; classifying the small files according to their keywords; and merging and storing the small files according to a preset file merging mode, wherein the preset merging mode comprises a project mode or a dictionary mode. The technical scheme of the embodiment of the invention can improve the efficiency with which HDFS accesses small files, thereby reducing the resource consumption of HDFS and improving its overall performance.

Description

Technical field

[0001] Embodiments of the present invention relate to the field of computer technology, and in particular to an HDFS-based small file processing method, device, computer equipment, and storage medium.

Background

[0002] In HDFS (Hadoop Distributed File System), as data grows, data processing takes longer to produce results. This data also contains a large number of small files, that is, files smaller than the HDFS data block size, which causes serious problems for the performance of Hadoop (a distributed system infrastructure).

[0003] First, in HDFS every block, file, or directory is held in memory as an object, and each object occupies about 150 bytes. If there are 10,000,000 small files and each file occupies one block, the NameNode (master node) needs about 3G of memory: one file object plus one block object per file, at about 150 bytes each. Storing 100 million files requires about 30G. Such a large nu...
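
The memory figures in paragraph [0003] can be checked with a few lines of arithmetic. This is a rough estimate rather than a sizing tool: it assumes one file object plus one block object per file at about 150 bytes each, ignores directory objects, and uses the hypothetical function name `namenode_memory_bytes`.

```python
def namenode_memory_bytes(
    num_files: int,
    blocks_per_file: int = 1,
    bytes_per_object: int = 150,
) -> int:
    """Estimate NameNode memory for file and block objects only."""
    objects = num_files * (1 + blocks_per_file)  # one file object + its blocks
    return objects * bytes_per_object


ten_million = namenode_memory_bytes(10_000_000)       # 3,000,000,000 bytes, about 3G
hundred_million = namenode_memory_bytes(100_000_000)  # 30,000,000,000 bytes, about 30G
```

The estimate grows linearly with the number of files, which is why merging many small files into fewer large ones reduces NameNode memory pressure.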


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/11, G06F16/16, G06F16/182
CPC: G06F16/113, G06F16/16, G06F16/182
Inventors: 宋大伟, 丁静
Owner: SUZHOU EAGLE NETWORK TECH CO LTD