
Fingerprint-based block granularity data deduplication system and method

A fingerprint-based data deduplication technology, applied in the field of data deduplication and storage, which addresses the problems of short block length and the resulting storage system overhead, and achieves high I/O throughput and low system resource overhead.

Pending Publication Date: 2022-04-29
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

In short, previous data deduplication methods faced two problems: (1) blocks containing partial data changes could not be deduplicated, and (2) the block length was small, which increased storage system overhead.
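Problem (1) can be seen with a short Python sketch (illustrative only, not from the patent): two 4 KiB blocks that differ in a single byte get different whole-block fingerprints, so exact-match deduplication stores both in full. SHA-256 as the fingerprint function and a 512-byte head region are assumptions; the excerpt specifies neither.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size
HEAD_LEN = 512     # hypothetical head-region length


def fingerprint(data: bytes) -> str:
    # Whole-block fingerprint (SHA-256 is an assumption).
    return hashlib.sha256(data).hexdigest()


def head_fingerprint(block: bytes) -> str:
    # Fingerprint of the block's head region only.
    return fingerprint(block[:HEAD_LEN])


original = bytes(range(256)) * (BLOCK_SIZE // 256)
modified = bytearray(original)
modified[-1] ^= 0xFF  # change one byte at the very end of the block
modified = bytes(modified)

print(fingerprint(original) == fingerprint(modified))            # False
print(head_fingerprint(original) == head_fingerprint(modified))  # True
```

The fingerprint over only the head region still matches, which is the kind of near-duplicate block the head and tail deduplication tables in the embodiment appear to target.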



Examples


Embodiment

[0051] The invention performs fingerprint-based, block-granularity data deduplication; the architecture of the whole system is shown in Figure 1. The system runs a different process for each of write, read and delete, as described below.

[0052] 1. Write process. First, the blocking module divides the input data either into fixed-length blocks (fixed-size chunking, FSC) or into variable-length blocks whose boundaries depend on the content. Then, for each block, the fingerprint calculation module calculates the block's fingerprint, head fingerprint and tail fingerprint. Next, the index module uses the fingerprint, head fingerprint and tail fingerprint to search for matching entries in the deduplication table, delta table, head deduplication table and tail deduplication table. There are three possible situations:

[0053] (1) A match on a head deduplication entry or a tail deduplication entry means that a suitable already-written block has been found, which is called a re...
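The write-path steps in [0052] and [0053] can be sketched in Python under stated assumptions: FSC with a 4 KiB block size; for the content-dependent variant, a toy chunker that cuts where a gear-style rolling hash has its low 12 bits all zero (a simplified stand-in, not the patent's chunking algorithm); SHA-256 fingerprints over the whole block, a 512-byte head and a 512-byte tail; and in-memory dictionaries for the four tables. The block sizes, region lengths, delta-table keying and head-before-tail lookup order are hypothetical, as are all names. Only situation (1) is modelled, since the excerpt is truncated after it.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterator, NamedTuple, Optional

BLOCK_SIZE = 4096  # assumed FSC block size
HEAD_LEN = 512     # hypothetical head-region length
TAIL_LEN = 512     # hypothetical tail-region length

# Deterministic pseudo-random 32-bit value per byte value (gear table for the toy CDC).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big") for i in range(256)]


def fsc_chunks(data: bytes, size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Fixed-size chunking: consecutive `size`-byte blocks (the last may be shorter)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def cdc_chunks(data: bytes, mask: int = 0x0FFF,
               min_size: int = 1024, max_size: int = 16384) -> Iterator[bytes]:
    """Toy content-defined chunking: cut where the masked bits of a gear-style
    rolling hash are all zero. Every block is at most max_size bytes, and every
    block except the last is at least min_size bytes."""
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF  # shift in the next byte, keep 32 bits
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


class Fingerprints(NamedTuple):
    full: str  # fingerprint of the whole block
    head: str  # fingerprint of the first HEAD_LEN bytes
    tail: str  # fingerprint of the last TAIL_LEN bytes


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def compute_fingerprints(block: bytes) -> Fingerprints:
    return Fingerprints(_sha256(block), _sha256(block[:HEAD_LEN]), _sha256(block[-TAIL_LEN:]))


@dataclass
class IndexModule:
    """Hypothetical in-memory index; every value is a physical block address (PBA)."""
    dedup: Dict[str, int] = field(default_factory=dict)  # full fingerprint -> PBA
    delta: Dict[str, int] = field(default_factory=dict)  # full fingerprint -> PBA (assumed keying)
    head: Dict[str, int] = field(default_factory=dict)   # head fingerprint -> PBA
    tail: Dict[str, int] = field(default_factory=dict)   # tail fingerprint -> PBA

    def lookup(self, fp: Fingerprints) -> Dict[str, Optional[int]]:
        """Search all four tables; None means no matching entry."""
        return {
            "dedup": self.dedup.get(fp.full),
            "delta": self.delta.get(fp.full),
            "head": self.head.get(fp.head),
            "tail": self.tail.get(fp.tail),
        }

    def similar_block(self, fp: Fingerprints) -> Optional[int]:
        """Situation (1): a head or tail hit identifies a suitable written block.
        Preferring the head hit over the tail hit is an assumption."""
        hits = self.lookup(fp)
        return hits["head"] if hits["head"] is not None else hits["tail"]


if __name__ == "__main__":
    data = bytes(range(256)) * 64          # 16 KiB sample input
    blocks = list(fsc_chunks(data))
    print(len(blocks))                     # 4
    index = IndexModule()
    index.head[compute_fingerprints(blocks[0]).head] = 0  # pretend block 0 is at PBA 0
    edited = blocks[1][:-1] + b"\x00"      # same as block 0 except its last byte
    print(index.similar_block(compute_fingerprints(edited)))  # 0
```

Keying the head and tail tables separately lets a block whose edit falls in its tail still match on its head, and vice versa.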


Abstract

The invention provides a fingerprint-based block granularity data deduplication system and method. The method comprises the following steps: performing preliminary blocking on the original input data; calculating the fingerprint of each resulting block together with the head fingerprint and tail fingerprint of its head and tail, and storing them in corresponding data structures; when a physical block address is input for reading, requesting the data from the lower-layer storage system and returning it once it has been read; when data is input for writing, outputting the data to the lower-layer storage system and returning the physical block address allocated by the storage system once the write completes; creating delta data and recovering the original data from the delta data; and adding, deleting, modifying and querying entries of the deduplication table, the head deduplication table, the tail deduplication table and the delta table in the system. The method focuses on deduplicating data with similar content, and maintains high I/O throughput and low system resource overhead through the design strategies of the I/O module and the index module.
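The abstract says delta data are created and the original data recovered from them, but does not give a delta format. As a hypothetical illustration, the Python sketch below uses a simple prefix/suffix delta: it records how many leading and trailing bytes a new block shares with a reference block and stores only the differing middle. This encoding is an assumption chosen because it pairs naturally with head and tail fingerprints; `Delta`, `create_delta` and `recover` are illustrative names.

```python
from typing import NamedTuple


class Delta(NamedTuple):
    prefix_len: int  # bytes shared with the start of the reference block
    suffix_len: int  # bytes shared with the end of the reference block
    middle: bytes    # bytes of the new block not covered by prefix or suffix


def create_delta(reference: bytes, new: bytes) -> Delta:
    """Hypothetical prefix/suffix delta of `new` against `reference`."""
    limit = min(len(reference), len(new))
    p = 0
    while p < limit and reference[p] == new[p]:
        p += 1
    s = 0
    # Cap the suffix so prefix and suffix never overlap in either block.
    while s < limit - p and reference[-1 - s] == new[-1 - s]:
        s += 1
    return Delta(p, s, new[p:len(new) - s])


def recover(reference: bytes, delta: Delta) -> bytes:
    """Rebuild the original block from the reference block and its delta."""
    suffix = reference[len(reference) - delta.suffix_len:]
    return reference[:delta.prefix_len] + delta.middle + suffix


if __name__ == "__main__":
    reference = b"A" * 4096
    new = reference[:2000] + b"B" + reference[2001:]  # one byte changed
    d = create_delta(reference, new)
    print(d.prefix_len, d.suffix_len, d.middle)  # 2000 2095 b'B'
    print(recover(reference, d) == new)          # True
```

This captures one contiguous changed region (an overwrite or an insertion) compactly; edits scattered across the block inflate the middle segment, which a real delta encoder would handle better.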

Description

Technical field

[0001] The invention relates to the technical field of data deduplication and storage, and in particular to a fingerprint-based block granularity data deduplication system and method.

Background technology

[0002] In data storage tasks, the amount of data to be processed keeps growing, which increases both the space pressure and the I/O throughput demands on the storage system. In addition, expanding storage media raises monetary cost; high-performance storage media have a short service life, and large volumes of writes shorten it further. The usual way to address these problems is to deduplicate the data in the storage system. Specifically, the input data is processed in blocks, the fingerprint of each block is calculated, and the fingerprint of the block to be written is compared with the fingerprints of blocks already written. If they are the same, the data of the two blocks is duplicated, and the block is no ...
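The conventional exact-match scheme described in the background can be sketched in Python as follows. A list stands in for the lower storage system and its indices act as physical block addresses; SHA-256 as the fingerprint and the per-address reference count are assumptions added to show what a duplicate write updates. None of these names come from the patent.

```python
import hashlib
from typing import Dict, List


class ExactDedupStore:
    """Toy exact-match deduplication: fingerprint -> physical block address (PBA)."""

    def __init__(self) -> None:
        self.blocks: List[bytes] = []          # stand-in for lower storage; index = PBA
        self.dedup_table: Dict[str, int] = {}  # fingerprint -> PBA
        self.refcount: Dict[int, int] = {}     # PBA -> logical references (assumed)

    def write_block(self, block: bytes) -> int:
        fp = hashlib.sha256(block).hexdigest()
        pba = self.dedup_table.get(fp)
        if pba is not None:
            # Duplicate: do not store the data again, just add a reference.
            self.refcount[pba] += 1
            return pba
        pba = len(self.blocks)
        self.blocks.append(block)
        self.dedup_table[fp] = pba
        self.refcount[pba] = 1
        return pba


if __name__ == "__main__":
    store = ExactDedupStore()
    a = store.write_block(b"x" * 4096)
    b = store.write_block(b"x" * 4096)
    c = store.write_block(b"y" * 4096)
    print(a, b, c, len(store.blocks))  # 0 0 1 2
```

Because the table is keyed by the whole-block fingerprint, a block that differs from a stored one by even one byte misses the table and is written in full.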


Application Information

IPC(8): G06F3/06
CPC: G06F3/061; G06F3/0616; G06F3/064; G06F3/0673
Inventors: 姚建国 (Yao Jianguo), 张子扬 (Zhang Ziyang), 管海兵 (Guan Haibing), 彭博 (Peng Bo)
Owner SHANGHAI JIAO TONG UNIV