Parallel adding method and system for merging small files on basis of distributed file system

A distributed file system and small-file technology, applied to file systems, file access structures, special data processing applications, and similar fields. It addresses problems such as the inability to append files to existing files, with the effect of reducing pressure, metadata, and I/O overhead.

Active Publication Date: 2016-08-17
INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI
5 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

Li Tie, "Research on HDFS Optimization for Massive Small File Access" (Donghua University dissertation, 2015), designed an HDFS-based middleware that establishes a task layer between the user interface and HDFS. Each function has its own buffer: files to be merged or deleted are first stored temporarily in the corresponding buffer, and once the number of files reaches a threshold or a certain period has elapsed, the whole batch is processed in a single operation. However, this method cannot append files to existing files; it can only merge them into new files.

Examples

Embodiment

[0050] The steps of the present invention are further described below with reference to Figures 1 and 2. As shown in Figures 1 and 2, appending files in the present invention consists of two steps executed in sequence: A. uploading files to Memcache; B. appending the small files to the target large file. A specific implementation is as follows:

[0051] A. The client uploads the file from local storage to Memcache, as shown in Figure 1. The implementation is:

[0052] A1. In the client interface, the user selects the required small file f_i under path p_i, named src_i, selects the destination file dest_i, and clicks upload for small file f_i;

[0053] A2. After the user clicks upload, the client sends an append request, which is placed in the request queue request_queue;

[0054] A3. A request is taken from request_queue, a thread is created to process it, and the following steps are performed:

[0055] A3-1: Determine whether the request queue is empty. There are two possibilities: 1) if it is ...
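
The queueing described in steps A1 to A3 can be sketched roughly as follows. This is a minimal illustration in Python, not the patent's implementation: a plain dict stands in for Memcache, and the names `AppendRequest`, `submit`, `handle`, and `drain` are hypothetical. Because the source text of step A3-1 is truncated, the sketch simply stops when the queue is empty, which is an assumption.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class AppendRequest:
    """Hypothetical request record: small file src_i bound for dest_i."""
    src: str        # name of the small file (src_i)
    dest: str       # destination large file (dest_i)
    content: bytes  # content of the small file


# request_queue is named in the source; the dict is a stand-in for Memcache.
request_queue = queue.Queue()
cache = {}
cache_lock = threading.Lock()


def submit(request):
    """Step A2: the client places an append request in request_queue."""
    request_queue.put(request)


def handle(request):
    """Worker body (assumed): store the small file under its name."""
    with cache_lock:
        cache[request.src] = request.content


def drain():
    """Step A3: take requests from the queue, one thread per request.

    Stopping on an empty queue is an assumption; the source elides
    what step A3-1 does in that case.
    """
    threads = []
    while True:
        try:
            request = request_queue.get_nowait()
        except queue.Empty:
            break
        worker = threading.Thread(target=handle, args=(request,))
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()
    return len(threads)
```

Calling `submit` for each selected file and then `drain` processes every queued request on its own thread; the lock only protects the stand-in dict and would not be needed with a real cache client that is itself thread-safe.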

Abstract

The invention discloses a parallel appending method and system for merging small files on the basis of a distributed file system. The method comprises the following steps. Uploading small files to Memcache: a client uploads small files destined for target files, taking the name of each small file as the key and its content as the value, uploads the keys and values to Memcache, and writes each key to a table key-list, where the key-list is stored in Memcache and records the keys of all small files stored in Memcache. Appending the small files to a target large file: the key-list is downloaded from Memcache, a hash table is constructed from the small-file names in the key-list and their target files, and small files that share the same target file are merged, where each hash value corresponds to a linear list in which the small files are stored.
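
The two steps in the abstract can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: a dict stands in for Memcache, the key-list is assumed to record each small-file name together with its target file (the abstract does not say how the target is stored), and the function names `upload_small_file`, `group_by_target`, and `merge_into_targets` are hypothetical.

```python
from collections import defaultdict

KEY_LIST = "key-list"  # cache entry that records the keys of all small files


def upload_small_file(cache, name, target, content):
    """Step 1: store name -> content and record the key in the key-list.

    Assumption: each key-list entry is a (name, target) pair.
    """
    cache[name] = content
    cache.setdefault(KEY_LIST, []).append((name, target))


def group_by_target(cache):
    """Step 2a: build a hash table keyed by target file.

    Each bucket is a list of small-file names, playing the role of the
    linear list attached to each hash value.
    """
    table = defaultdict(list)
    for name, target in cache.get(KEY_LIST, []):
        table[target].append(name)
    return dict(table)


def merge_into_targets(cache, large_files):
    """Step 2b: append every small file to the end of its target file."""
    for target, names in group_by_target(cache).items():
        buffer = large_files.setdefault(target, bytearray())
        for name in names:
            buffer.extend(cache[name])
    return large_files
```

Grouping by target first means each large file is opened and extended once per batch rather than once per small file, which is the kind of merged append the abstract describes.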

Description

Technical field

[0001] The invention relates to the field of file processing, and in particular to a method and system for parallel appending of small files based on a distributed file system.

Background

[0002] The Internet has changed the world and driven a series of industrial chains. The amount of data generated each year by electronic products and other equipment has grown sharply. The "Digital Universe Report" released by IDC and EMC in 2013 predicts that by 2020 the digital universe will reach 40 ZB. With the rise of online shopping and social media, large numbers of pictures, emails, messages, and log files are generated, and their volume is increasing rapidly. This data accounts for a very large share of the total, and these files are typically kilobytes or megabytes in size. Because they are smaller than the HDFS block size, they are all small files.

[0003] Hadoop is a big data storage and processing platform that can process large-scale distributed ...

Application Information

IPC(8): G06F17/30
CPC: G06F16/13, G06F16/182
Inventors: 张笛, 孙毓忠, 宋莹
Owner: INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI