Small file storage method and device based on HADOOP

A HADOOP-based small file storage method and device, applied in the field of file systems (file system functions, file access structures, etc.), which addresses the inability to effectively store a large number of small files.

Pending Publication Date: 2021-05-28
西藏宁算科技集团有限公司

AI Technical Summary

Problems solved by technology

[0005] This application provides a small file storage method and device based on HADOOP. It addresses the technical problem that, in the existing HDFS system, each data block can store only one file and that file also occupies the namespace of the data block, so a large number of small files cannot be stored effectively.

Embodiment Construction

[0029] The present application is described in detail below in conjunction with the examples, but the present application is not limited to these examples.

[0030] Referring to Figure 1, the HADOOP-based small file storage method provided by this application comprises the following steps:

[0031] Step S100: Analyze the type and number of bytes of the file to be uploaded, and determine whether the number of bytes of the file to be uploaded is less than 10MB. If yes, pre-store it in the small file queue. If not, determine whether the number of bytes of the file to be uploaded is greater than 128MB, and if so, mark it as an oversized file;

[0032] With this step, files of different byte counts can be classified and stored separately, which improves file storage efficiency and processing efficiency.
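The classification in Step S100 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that files below 10MB go to the small file queue, that the file type is read from the file extension, and that files between the two thresholds form a "regular" class, which the text does not name. The names `classify` and `SizeClass` are hypothetical.

```python
import os
from enum import Enum

MB = 1024 * 1024
SMALL_FILE_LIMIT = 10 * MB   # threshold named in Step S100
OVERSIZED_LIMIT = 128 * MB   # threshold named in Step S100


class SizeClass(Enum):
    SMALL = "small"          # pre-stored in the small file queue
    REGULAR = "regular"      # hypothetical label; the text does not name this case
    OVERSIZED = "oversized"  # marked as an oversized file


def classify(path: str, num_bytes: int) -> tuple[str, SizeClass]:
    """Return (file type, size class) for a file to be uploaded."""
    # Assumption: the file type is derived from the extension.
    file_type = os.path.splitext(path)[1].lstrip(".").lower() or "unknown"
    if num_bytes < SMALL_FILE_LIMIT:
        return file_type, SizeClass.SMALL
    if num_bytes > OVERSIZED_LIMIT:
        return file_type, SizeClass.OVERSIZED
    return file_type, SizeClass.REGULAR
```

For example, `classify("report.log", 4 * MB)` yields the type `"log"` with `SizeClass.SMALL`, while a 200MB file is marked `SizeClass.OVERSIZED`.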

[0033] Step S200: Set up a temporary storage area on the server, and judge whether the total number of bytes of the small file queues in the temporary storage area is greater than...
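A temporary storage area of this kind can be sketched as a buffer that tracks the total size of the queued small files. This is a sketch under stated assumptions: the 128MB default is the figure given in the abstract, and handing the queued files over as a batch once the total exceeds the threshold is an assumed behavior, since the step text is cut off here. `StagingArea` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field

MB = 1024 * 1024


@dataclass
class StagingArea:
    threshold: int = 128 * MB                   # figure taken from the abstract
    queue: list = field(default_factory=list)   # (name, size) of queued small files
    total_bytes: int = 0

    def add(self, name: str, num_bytes: int):
        """Queue a small file; return the batch once the threshold is exceeded."""
        self.queue.append((name, num_bytes))
        self.total_bytes += num_bytes
        if self.total_bytes > self.threshold:
            # Assumed behavior: release everything queued so far and reset.
            batch, self.queue, self.total_bytes = self.queue, [], 0
            return batch
        return None
```

A caller would keep adding small files and upload whatever batch `add` returns; `None` means the area is still filling.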


Abstract

The invention discloses a small file storage method and device based on HADOOP. The method comprises the following steps: S100, analyzing the type and number of bytes of a file to be uploaded; S200, setting up a temporary storage area on the server and judging whether the total number of bytes of the small file queues in the temporary storage area is greater than 128 MB; and S300, naming a plurality of file directories in each data block of the Hadoop system according to file type through the NameNode, obtaining the allocated space positions in the data blocks of the Hadoop system through the NameNode, and, when the merged small file queues are uploaded, merging the small file directories into the preset classification file directories according to the types of the small files. By placing a plurality of small files in one HDFS file according to a certain rule, the method addresses the low read-write efficiency and large space consumption of small file storage.
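The idea of placing several small files in one larger file, grouped by type, can be illustrated in memory without any HDFS calls. The sketch below is an assumption about one possible rule, not the rule used by the invention: it concatenates payloads per file type and keeps an index of offset and length so each small file can be read back. `merge_by_type` and `read_back` are hypothetical helpers.

```python
from collections import defaultdict


def merge_by_type(files):
    """Merge small files per type.

    files: iterable of (name, file_type, payload_bytes).
    Returns {file_type: (merged_bytes, index)}, where index maps
    name -> (offset, length) inside merged_bytes.
    """
    blobs = defaultdict(bytearray)
    indexes = defaultdict(dict)
    for name, file_type, payload in files:
        blob = blobs[file_type]
        # Record where this file starts before appending it.
        indexes[file_type][name] = (len(blob), len(payload))
        blob.extend(payload)
    return {t: (bytes(blobs[t]), indexes[t]) for t in blobs}


def read_back(merged, index, name):
    """Recover one small file from a merged blob using its index entry."""
    offset, length = index[name]
    return merged[offset:offset + length]
```

In a real deployment each merged blob would be written as a single file under the directory for its type, and the index would have to be stored alongside it; both details are outside what the abstract specifies.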

Description

Technical field

[0001] The present application relates to a HADOOP-based small file storage method and device, belonging to the technical field of file storage.

Background technique

[0002] The Hadoop Distributed File System (HDFS) is a distributed file system that runs on commodity hardware. It has a lot in common with existing distributed file systems. HDFS is highly fault-tolerant and can provide high-throughput data access. At the same time, HDFS relaxes some POSIX constraints in order to allow streaming access to file system data.

[0003] The basic storage unit of the Hadoop distributed file system is a data block (Block). When the capacity of a data block is set to 128MB and the size of an uploaded file is smaller than this value, the existing storage mode of the HDFS system means that the file still occupies the namespace of one Block (NameNode metadata), although its physical storage does not occupy the entire 128MB.

[0004] When a large number ...
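The cost described in the background can be made concrete with a small calculation. The sketch below only counts block entries in the namespace, on the stated premise that every file occupies at least one block entry regardless of its size. The function names and the sample workload of 1000 files of 1MB each are hypothetical.

```python
import math

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # block capacity used in the background section


def block_entries_unmerged(file_sizes):
    """Each file occupies at least one block entry, however small it is."""
    return sum(max(1, math.ceil(size / BLOCK_SIZE)) for size in file_sizes)


def block_entries_merged(file_sizes):
    """Block entries needed if all files are packed into one merged file."""
    return max(1, math.ceil(sum(file_sizes) / BLOCK_SIZE))


sizes = [1 * MB] * 1000                # hypothetical workload
print(block_entries_unmerged(sizes))   # 1000
print(block_entries_merged(sizes))     # 8
```

Under these assumptions, storing the files individually needs 1000 block entries, while packing them into one merged file needs 8.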

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/172, G06F16/182, G06F16/17, G06F16/16, G06F16/13
CPC: G06F16/172, G06F16/182, G06F16/1727, G06F16/13, G06F16/16
Inventor: 洪金磊, 扈晓
Owner: 西藏宁算科技集团有限公司