HDFS-based small file merging method and device and equipment

A small-file merging technology, applied in the field of distributed file systems, that addresses the problem of memory space filling up easily, with the effects of improving storage efficiency, saving hardware resources, and increasing speed.

Active Publication Date: 2021-03-02
SUZHOU LANGCHAO INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] In view of this, the object of the present invention is to propose an HDFS-based method, device, and equipment for merging small files, in order to solve the prior-art problem that each small file occupies one block, so that memory space fills up easily.



Examples

Detailed Description of Embodiments

[0038] In order to make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0039] It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two different entities or parameters that share the same name. They are used only for convenience of presentation and should not be construed as limiting the embodiments of the present invention. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units may also include other steps or units inherent to it.

[0040]Based on the foregoing objective, the first aspect of the embodiments of the present invention proposes an embo...


Abstract

The invention provides an HDFS-based small file merging method, device, and equipment. The method comprises the following steps: in a temporary storage medium, merging the file directories of a plurality of small files whose storage sizes fall within a preset threshold range, level by level, to obtain a plurality of auxiliary directories; querying the storage sizes of the small files in the auxiliary directories at preset time intervals, collecting the small files under each auxiliary directory whose total storage size is greater than a first threshold into a small file set, screening and merging the small file sets, and transferring the resulting merged block files to HDFS; and, if the memory occupancy rate of the temporary storage medium exceeds a second threshold, performing hierarchical merging and forced storage on each small file set and on the small files remaining after screening and merging, until the memory occupancy rate of the temporary storage medium is smaller than the second threshold. The method effectively reduces the number of blocks occupied by small files in HDFS and saves storage space.
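The three steps in the abstract can be illustrated with a minimal in-memory sketch. It makes no HDFS calls and is not the patent's implementation. All names here (`SmallFile`, `group_by_directory`, `pack_into_blocks`, `merge_cycle`, `force_flush`) are hypothetical, the "preset threshold range" is simplified to a single upper bound, and the unspecified "screening and merging" step is replaced by a plain first-fit-decreasing packing, which is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SmallFile:
    path: str  # original path of the file
    size: int  # size in bytes


def group_by_directory(files, level, max_small_size):
    """Group small files by their ancestor directory at depth `level`.

    Files larger than max_small_size are not treated as small and are skipped.
    Each key plays the role of one "auxiliary directory".
    """
    groups = defaultdict(list)
    for f in files:
        if f.size > max_small_size:
            continue
        parts = PurePosixPath(f.path).parent.parts  # e.g. ('/', 'logs', 'a')
        groups["/" + "/".join(parts[1:1 + level])].append(f)
    return dict(groups)


def pack_into_blocks(files, block_size):
    """First-fit-decreasing packing (an assumed stand-in for screening/merging).

    Each returned list represents the contents of one merged block file.
    """
    blocks = []  # each entry is [remaining_capacity, files_in_block]
    for f in sorted(files, key=lambda x: x.size, reverse=True):
        for block in blocks:
            if block[0] >= f.size:
                block[0] -= f.size
                block[1].append(f)
                break
        else:
            blocks.append([block_size - f.size, [f]])
    return [block[1] for block in blocks]


def merge_cycle(groups, first_threshold, block_size):
    """One periodic pass: merge every directory whose total size exceeds first_threshold."""
    merged, remaining = [], {}
    for directory, files in groups.items():
        if sum(f.size for f in files) > first_threshold:
            merged.extend(pack_into_blocks(files, block_size))
        else:
            remaining[directory] = files
    return merged, remaining


def force_flush(remaining, capacity, second_threshold, block_size):
    """Under memory pressure, flush the largest leftover directories first
    until occupancy of the temporary store is below second_threshold."""
    remaining = dict(remaining)
    flushed = []

    def occupancy():
        return sum(f.size for fs in remaining.values() for f in fs) / capacity

    while remaining and occupancy() >= second_threshold:
        largest = max(remaining, key=lambda d: sum(f.size for f in remaining[d]))
        flushed.extend(pack_into_blocks(remaining.pop(largest), block_size))
    return flushed, remaining


if __name__ == "__main__":
    MB = 1024 * 1024
    demo = [
        SmallFile("/logs/a/1.log", 40 * MB),
        SmallFile("/logs/a/2.log", 50 * MB),
        SmallFile("/logs/b/3.log", 60 * MB),
        SmallFile("/tmp/x/4.dat", 10 * MB),
    ]
    groups = group_by_directory(demo, level=1, max_small_size=64 * MB)
    merged, remaining = merge_cycle(groups, first_threshold=100 * MB, block_size=128 * MB)
    print(len(merged), "merged blocks;", sorted(remaining), "left in temporary storage")
```

In this sketch a real system would write each list returned by `pack_into_blocks` to HDFS as one file and keep an index from original paths to offsets. That index is outside what the abstract describes and is omitted here.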

Description

Technical field

[0001] The present invention relates to the technical field of distributed file systems, and in particular to an HDFS-based method, device, and equipment for merging small files.

Background

[0002] The Hadoop Distributed File System (HDFS) is widely used in the field of large-scale computing because of its high reliability, efficiency, and scalability. A distributed file system includes a NameNode and multiple DataNodes and is an important component of the cluster structure. As the volume of cluster data continues to grow, the resident memory of the NameNode grows with it, so the heap size of the NameNode must be adjusted continually to keep up with the growing memory demand. The NameNode heap, however, cannot be enlarged without limit. In a cluster of 200 million blocks, the NameNode occupies about 113G of memory in total; each small file occupies 1 block; if the ...
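The figures quoted in the background allow a rough back-of-the-envelope estimate. The sketch below assumes that "113G" means 113 GiB and that NameNode memory scales linearly with block count; both are simplifying assumptions, and the file counts used in the example are hypothetical rather than taken from the patent.

```python
GIB = 1024 ** 3


def bytes_per_block(total_heap_bytes, block_count):
    """Average NameNode memory attributed to one block."""
    return total_heap_bytes / block_count


def estimated_heap_bytes(file_count, files_per_block, per_block_bytes):
    """Heap estimate when files_per_block small files share one block."""
    block_count = file_count / files_per_block
    return block_count * per_block_bytes


# Figure from the text: 200 million blocks occupy about 113G (taken as GiB here).
per_block = bytes_per_block(113 * GIB, 200_000_000)

# Hypothetical workload: 100 million small files.
unmerged = estimated_heap_bytes(100_000_000, 1, per_block)   # one block per file
merged = estimated_heap_bytes(100_000_000, 100, per_block)   # 100 files per block

print(f"about {per_block:.0f} bytes per block")
print(f"unmerged: {unmerged / GIB:.1f} GiB, merged: {merged / GIB:.3f} GiB")
```

Under these assumptions each block costs roughly 600 bytes of NameNode heap, so packing many small files into one block reduces the estimate in proportion to the packing factor.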


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/182, G06F16/16
CPC: G06F16/182, G06F16/16
Inventor: 李勇
Owner: SUZHOU LANGCHAO INTELLIGENT TECH CO LTD