A storage optimization method based on data compression and data deduplication

A data compression and storage optimization technology, applied in memory addressing/allocation/relocation and in input/output to record carriers. It addresses the problem that the data compression mechanism and the data de-redundancy (deduplication) mechanism do not work together, and aims to achieve good time efficiency and storage efficiency with broad applicability.

Active Publication Date: 2017-09-26
NANJING UNIV
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] Purpose of the invention: The technical problem addressed by the present invention is that current data compression and data de-redundancy mechanisms do not work together. The invention provides a storage optimization method in which data compression and data de-redundancy cooperate under coordinated scheduling: a judgment is made before de-redundancy, and data de-redundancy is performed only when spending time on it is judged worthwhile; otherwise, data compression is performed.



Examples


Embodiment 1

[0027] As shown in Figure 1, this embodiment discloses a storage optimization method based on the cooperation of data compression and data deduplication, comprising the following steps:

[0028] Step 1: Assume that, under certain circumstances, the computer needs to reclaim and release the space of N data blocks in a storage medium (memory or disk) in order to store more new data, and that the candidate set DSet of data blocks eligible for release contains M data blocks (M > N). The ratio of data storage expected to be released in this round is then S = N/M. Set the sampling ratio threshold R, 0%...
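The arithmetic of Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper name `step1_parameters`, expressing R as a whole-number percentage, and rounding the sample size M×R up to an integer are all assumptions, since the source text is cut off right after R is introduced.

```python
from __future__ import annotations


def step1_parameters(n_release: int, m_candidates: int, r_percent: int) -> tuple[float, int]:
    """Return (S, sample_size) for one reclamation round.

    Hypothetical helper: S = N / M is the fraction of the candidate set DSet
    that should be released; sample_size = ceil(M * R) is how many blocks
    will later be hash-scanned. R is taken as a whole percentage (assumption).
    """
    if not 0 < n_release < m_candidates:
        raise ValueError("the method assumes 0 < N < M")
    if not 0 < r_percent <= 100:
        raise ValueError("R must be a percentage in (0, 100]")
    s = n_release / m_candidates
    # Integer ceiling of M * R / 100, avoiding floating-point rounding.
    sample_size = (m_candidates * r_percent + 99) // 100
    return s, sample_size


# Example: release 256 of 1024 candidate blocks, sampling 5% of them.
s, k = step1_parameters(256, 1024, 5)
print(s, k)  # 0.25 52
```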

[0029] Step 2: Use the random number generator provided by the system to generate a random seed, and feed it into a pseudo-random algorithm to generate a random scanning sequence of the data blocks, Block_1, Block_2, ..., Block_M;
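Step 2 amounts to drawing a seeded random permutation of the M candidate blocks. The sketch below is one way to do that in Python under stated assumptions: `random.SystemRandom` (backed by the operating system's entropy source) stands in for "the random generator provided by the system", `random.Random` (Mersenne Twister) stands in for the pseudo-random algorithm, and the helper name `random_scan_sequence` is hypothetical.

```python
import random
from typing import Optional


def random_scan_sequence(m: int, seed: Optional[int] = None) -> list:
    """Return a pseudo-random permutation of block indices 0..m-1.

    Hypothetical helper. If no seed is given, one is drawn from the OS
    entropy source via random.SystemRandom (assumed stand-in for the
    system's random generator).
    """
    if seed is None:
        seed = random.SystemRandom().randrange(2**64)
    rng = random.Random(seed)      # seeded pseudo-random generator (Mersenne Twister)
    order = list(range(m))
    rng.shuffle(order)             # in-place Fisher-Yates shuffle
    return order


# The same seed always reproduces the same scanning order.
print(random_scan_sequence(8, seed=42) == random_scan_sequence(8, seed=42))  # True
```

Using a full permutation means every block appears exactly once, so taking the first M×R entries yields a uniform random sample without replacement; recording the seed also makes a given round reproducible.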

[0030] Step 3: Choose a fast hash algorithm H for deduplication scanning, and, for the first M×R data blocks Block_1, B...
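The source text breaks off in Step 3, so exactly what is done with the hashes is not stated. One plausible reading, consistent with the sampling-based benefit estimate described in the Abstract, is sketched below. Assumptions: BLAKE2b from Python's `hashlib` stands in for H, the helper name `estimate_dedup_ratio` is hypothetical, and the benefit is estimated as the share of sampled blocks that duplicate another sampled block.

```python
import hashlib
from collections import Counter


def estimate_dedup_ratio(blocks, scan_order, sample_size):
    """Estimate the fraction of blocks that deduplication could free.

    Hypothetical helper: hashes the first `sample_size` blocks of the random
    scanning sequence with BLAKE2b (assumed choice of H) and counts, within
    the sample, how many blocks are copies of another sampled block.
    """
    sampled = scan_order[:sample_size]
    if not sampled:
        return 0.0
    digests = Counter(hashlib.blake2b(blocks[i], digest_size=16).digest()
                      for i in sampled)
    # A group of k identical blocks can be merged into one, freeing k - 1 blocks.
    redundant = sum(count - 1 for count in digests.values())
    return redundant / len(sampled)


blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"C" * 4096]
print(estimate_dedup_ratio(blocks, [0, 1, 2, 3], 4))  # 0.25
```

Counting duplicates only within the sample tends to underestimate the true ratio, because a sampled block's twin may lie outside the sample; whether and how the method corrects for this is not visible in the truncated text.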

Embodiment 2

[0039] This embodiment discloses the application of the storage optimization method based on the cooperation of data compression and data deduplication to operating system memory management. The implementation scenario is as follows:

[0040] A computer system has physical memory (RAM) and disk swap space (SWAP) of fixed size. If programs in the system keep requesting memory beyond the capacity of RAM, the system must resort to slow disk swapping to obtain more physical memory space. Contemporary operating systems such as Linux and OS X provide memory compression (e.g., ZSWAP on Linux, compressed memory on OS X) and memory page deduplication (e.g., KSM on Linux) to relieve swapping pressure and improve memory utilization, but these mechanisms cannot work together.

[0041] In this scenario, the specific implementation steps of the method in this embodiment are:

[0042] Step 1, assuming that the system attempts to reclaim N pages due to insufficient memory space at a...

Embodiment 3

[0051] This embodiment discloses the application of the storage optimization method based on the cooperation of data compression and data deduplication to a disk storage system. The implementation scenario is as follows:

[0052] A computer system stores file data on a hard disk, organized into 4 KB disk blocks. When the disk is nearly full, storage space must be optimized to obtain more free space. Contemporary server file systems such as ZFS and Btrfs provide both deduplication and compression; however, these two functions operate independently, in either a synchronous or an asynchronous manner, and cannot work together. With this method applied, the hard disk can obtain more free storage, and the overall optimization speed is improved.

[0053] In this scenario, the specific implementation steps of the method in this embodiment are:

[0054] Step 1, assuming that there are only A free data blocks in the computer disk, bu...


Abstract

The invention proposes a storage optimization method based on the cooperation of data compression and data deduplication. Before the data compression mechanism takes effect, the method determines whether data de-redundancy should be performed. Given a batch of data blocks to be compressed, the method first uses sampling to estimate the benefit of the data de-redundancy mechanism, which has a high performance overhead, and judges whether de-redundancy is worthwhile: if it is, all of the given data blocks are de-duplicated before being compressed; otherwise, they are compressed directly. By coordinating two originally independent mechanisms for reducing the amount of stored data, the invention can maximize data storage efficiency.
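The decision flow in the Abstract can be sketched end to end as below. This is an illustrative sketch under stated assumptions, not the patent's algorithm: the function names `dedup` and `optimize` are hypothetical, BLAKE2b stands in for the hash H, zlib (DEFLATE) stands in for the compression algorithm, R is a whole-number percentage, and the "worthwhile" criterion (estimated deduplication saving at least S = N/M) is an assumption, because the source does not state the exact test. A production deduplicator would also byte-compare blocks on a hash match instead of trusting the digest alone.

```python
import hashlib
import random
import zlib


def dedup(blocks):
    """Merge identical blocks; return (unique_blocks, refs), where refs[i]
    is the index in unique_blocks that original block i now points to."""
    unique, index_of, refs = [], {}, []
    for block in blocks:
        key = hashlib.blake2b(block, digest_size=16).digest()
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(block)
        refs.append(index_of[key])            # indirect reference to the kept copy
    return unique, refs


def optimize(blocks, n_release, r_percent=5, seed=0):
    """Sample, judge whether deduplication is worthwhile, then compress."""
    m = len(blocks)
    s = n_release / m                                  # expected release ratio S = N/M
    k = (m * r_percent + 99) // 100                    # sample size, ceil(M * R)
    order = random.Random(seed).sample(range(m), m)    # random scanning sequence
    unique_sample, _ = dedup([blocks[i] for i in order[:k]])
    est_gain = 1 - len(unique_sample) / k              # estimated dedup saving
    if est_gain >= s:                                  # hypothetical "worth it" test
        unique, refs = dedup(blocks)
        return "dedup+compress", [zlib.compress(b) for b in unique], refs
    return "compress", [zlib.compress(b) for b in blocks], list(range(m))


mode, stored, refs = optimize([b"x" * 4096] * 10, n_release=2, r_percent=50)
print(mode, len(stored))  # dedup+compress 1
```

The sampling step is what keeps the expensive full deduplication pass off the critical path: only k of the M blocks are hashed before the decision is made.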

Description

Technical field

[0001] The invention relates to the field of computer data processing, and in particular to a storage optimization method based on the cooperation of data compression and data deduplication.

Background

[0002] There are currently two mainstream methods for improving the utilization of the limited capacity of physical media: data deduplication (Data Deduplication) and data compression (Data Compression). Data de-redundancy scans the data blocks, organizes them using their hash values as features, finds multiple identical data blocks, and merges them into a single block through indirect references, thereby saving storage space overall. Data compression relies on existing compression algorithms (such as LZW) to try to compress data blocks; if the compression rate is higher than a threshold, the compressed data block is saved and the space of the original data block is released, thereby saving space. Co...
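The compress-then-keep-if-worthwhile rule described in the background can be sketched as follows. Assumptions: zlib (DEFLATE) from Python's standard library is swapped in for LZW, which the standard library does not provide; the helper name `try_compress` and its default threshold are hypothetical; and "compression rate" is read as the fraction of space saved, 1 - compressed_size / original_size.

```python
import zlib


def try_compress(block: bytes, min_saving: float = 0.25):
    """Return (kept_compressed, stored_bytes) for one data block.

    Hypothetical helper: the compressed form is kept only if it saves at
    least `min_saving` of the block's size; otherwise the original is kept.
    """
    if not block:
        return False, block                  # nothing to compress
    compressed = zlib.compress(block, 6)     # DEFLATE, standing in for LZW
    saving = 1 - len(compressed) / len(block)
    if saving >= min_saving:
        return True, compressed              # store compressed data, free the original
    return False, block                      # not worth it: keep the block as is


ok, stored = try_compress(b"abc" * 1000)
print(ok, len(stored) < 3000)  # True True
```

Incompressible data such as already-compressed or encrypted blocks fails the threshold and is stored unchanged, which is why a threshold is needed at all.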


Application Information

Patent type & authority: Patents (China)
IPC(8): G06F3/06; G06F12/06
Inventors: 夏耐, 姜承祥
Owner: NANJING UNIV