Method and device for processing small files

A method and technique for processing small files, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problems of increased space occupied by small files and slow storage and reading of large files.

Active Publication Date: 2014-01-15
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The above method only integrates the small files into a large file and does not deduplicate the large file; it also adds a file header to record the small-file information, which increases the space occupied by...




Embodiment Construction

[0031] Figure 1 is a flow chart of the small file processing method provided by Embodiment 1 of the present invention. As shown in Figure 1, the method includes:

[0032] Step 101: Read N small files, where a small file is a file smaller than M kB.

[0033] Step 101 can be performed by a deduplication management process (Management, MGT). One possible approach is for the MGT to read N files from the same directory on the disk, which has the advantage of fast reading and convenient management. M is a value that can be defined manually as needed. N is the number of small files and can be any natural number greater than 0.
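Step 101 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `read_small_files`, the optional cap `n`, the choice of 1 kB = 1024 bytes, and the alphabetical read order are all assumptions made for illustration.

```python
import os


def read_small_files(directory, m_kb, n=None):
    """Return up to n (name, bytes) pairs for files smaller than m_kb kB.

    Hypothetical helper: the name, the 1024-byte kB, and the sorted
    read order are illustrative assumptions, not taken from the patent.
    """
    limit = m_kb * 1024
    result = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if not entry.is_file():
            continue  # skip subdirectories and other non-files
        if entry.stat().st_size >= limit:
            continue  # not a "small file" under the M kB threshold
        with open(entry.path, "rb") as fh:
            result.append((entry.name, fh.read()))
        if n is not None and len(result) >= n:
            break
    return result
```

Reading everything from a single directory keeps the sketch close to the approach described above, where the MGT reads N files under the same directory.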

[0034] Step 103: Integrate the N small files into one large file, and divide the integrated large file into blocks by using a sliding window.

[0035] Here, logically combining the N small files that were read into one large file means that the N small files are logically comb...
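As a rough illustration of Step 103, the Python sketch below concatenates the small files while remembering where each one ends, then proposes block boundaries with a sliding window. The source does not say which hash or parameters the sliding window uses, so the rolling byte sum, the `window`, `divisor`, and `max_size` values, and the function names `merge_logically` and `sliding_window_cuts` are all assumptions.

```python
def merge_logically(files):
    """Concatenate (name, payload) pairs into one byte string.

    Returns (data, ends), where ends[i] is the exclusive end offset
    of file i inside data. Hypothetical helper for illustration.
    """
    ends, total = [], 0
    for _, payload in files:
        total += len(payload)
        ends.append(total)
    return b"".join(payload for _, payload in files), ends


def sliding_window_cuts(data, window=16, divisor=64, max_size=256):
    """Return the exclusive end offset of each block, in order.

    A boundary is declared when the sum of the last `window` bytes
    satisfies a simple modulus test, or when the block reaches
    max_size. The rolling sum is a stand-in; real systems usually
    use a stronger rolling hash.
    """
    cuts, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]  # byte that left the window
        size = i - start + 1
        if (size >= window and rolling % divisor == divisor - 1) or size >= max_size:
            cuts.append(i + 1)
            start, rolling = i + 1, 0
    if start < len(data):
        cuts.append(len(data))  # trailing partial block
    return cuts
```

Because boundaries depend on the bytes inside the window rather than on fixed offsets, identical content tends to produce identical blocks even when it appears at different positions in the merged file.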


Abstract

The invention provides a method and device for processing small files. For data deduplication of massive numbers of small files, the small files are integrated into one large file, and the integrated large file is then partitioned into data blocks. During partitioning, if the data block currently being partitioned contains the tail of any small file, the tail of that small file is used, as needed, as the end position of the current data block. After the integrated large file has been partitioned, operations such as searching for duplicate data are carried out. The efficiency of data deduplication for small files is thereby improved.
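The boundary rule in the abstract can be sketched in Python as follows. This is an illustrative reading, not the claimed method: the sketch always ends a block at a file tail (the abstract says "as needed"), it uses SHA-256 fingerprints for the duplicate search, and the names `chunk_with_file_tails`, `deduplicate`, and the `propose_end` callback are hypothetical.

```python
import hashlib
from bisect import bisect_right


def chunk_with_file_tails(data, file_ends, propose_end):
    """Split data into (start, end) blocks that never span a file tail.

    file_ends: sorted exclusive end offsets of the small files.
    propose_end(data, start): any chunker's proposed block end.
    """
    chunks, start = [], 0
    while start < len(data):
        end = min(max(propose_end(data, start), start + 1), len(data))
        k = bisect_right(file_ends, start)  # first tail after start
        if k < len(file_ends) and file_ends[k] < end:
            end = file_ends[k]  # end the block at the small file's tail
        chunks.append((start, end))
        start = end
    return chunks


def deduplicate(data, chunks):
    """Keep one copy per distinct block; return (store, recipe)."""
    store, recipe = {}, []
    for start, end in chunks:
        block = data[start:end]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store only the first copy
        recipe.append(digest)
    return store, recipe


# Demo with a fixed 8-byte proposal standing in for a sliding window.
payloads = [b"hello world!", b"hello world!", b"xyz"]
data = b"".join(payloads)
file_ends = [12, 24, 27]
chunks = chunk_with_file_tails(data, file_ends, lambda d, s: s + 8)
store, recipe = deduplicate(data, chunks)
```

In this demo the two identical 12-byte files each split into the same two blocks, so `store` holds three distinct blocks for five entries in `recipe`. With plain 8-byte blocks and no tail rule, the second copy would be cut at different positions and none of its blocks would match the first.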

Description

Technical field

[0001] The invention relates to file processing technology, and in particular to a small file processing method and device.

Background

[0002] Data deduplication is a new type of application in storage systems. It deletes duplicate data in the storage system so that only one copy is kept, thereby eliminating redundant data. Deduplicating files requires operations such as segmenting the files, similarity analysis, and querying for duplicate blocks. When a small file is deduplicated, eliminating redundant data in small blocks is therefore both time-consuming and a drain on system resources.

[0003] In the prior art, a large number of small files are usually handled during deduplication in the following way: multiple small files are merged into one large file according to a certain method, and the small file information is recorded in the header of this large file for use in Storage and reading of a la...


Application Information

IPC(8): G06F17/30
CPC: G06F16/162
Inventor: 叶林睿, 张宗全, 钟延辉
Owner HUAWEI TECH CO LTD