Differential compression method based on data de-duplication

A differential compression technology based on data de-duplication, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses the large indexing overhead and low data compression efficiency that have limited the promotion and development of differential compression algorithms. It improves data storage compression efficiency, avoids extra calculation and index overhead, and maximizes the search range for similar data.

Active Publication Date: 2012-12-19
HUAZHONG UNIV OF SCI & TECH
2 Cites, 58 Cited by

AI Technical Summary

Problems solved by technology

[0006] However, existing differential compression technology has the following problems: slow calculation, large indexing overhead, low data compression efficiency, and poor scalability. To support PB-level similar data retrieval, it would generate a 10TB-level index of similar-data information. This metadata is too large to fit in memory, and managing it on disk turns index lookups into a bottleneck. Such metadata management and indexing overhead severely limit the promotion and development of differential compression algorithms.
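As a rough illustration of where a 10TB-level index could come from, the sketch below estimates index size as one fixed-size entry per data block. The 8 KiB block size and the 80-byte index entry are assumptions chosen for illustration only; the text above states just the PB and 10TB orders of magnitude.

```python
def similarity_index_bytes(data_bytes: int, chunk_bytes: int, entry_bytes: int) -> int:
    """Estimate index size assuming one fixed-size entry per data block."""
    chunks = -(-data_bytes // chunk_bytes)  # ceiling division
    return chunks * entry_bytes


PIB = 2**50
TIB = 2**40

# Assumed parameters (not from the source): 8 KiB blocks, 80 bytes of
# similarity metadata per block.
size = similarity_index_bytes(PIB, 8 * 1024, 80)
print(size / TIB)  # 10.0
```

Under these assumptions the index is four orders of magnitude larger than a typical server's spare memory, which is consistent with the disk-bound lookup bottleneck described above.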


Examples

Embodiment Construction

[0034] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0035] The differential compression method based on deduplication in the present invention divides the data stream to be backed up into blocks and groups them, performs deduplication, and then uses the deduplication information to identify similar data blocks. For cases of poor locality, the present invention also uses a low-cost super-fingerprint as a supplement. By combining locality and similarity mining, it maximizes the discovery of similar data, improves the efficiency of differential compression, and reduces the cost of the similar-data search.
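The text above does not say how the super-fingerprint is computed. As one possible reading, the sketch below uses a generic min-hash construction: several features are taken as the minimum salted hash over all sliding windows of a block, and the features are hashed together into one super-fingerprint. The names `super_fingerprint` and `_window_hash`, the 8-byte window, and the three features are all assumptions for illustration, not the invention's actual scheme.

```python
import hashlib


def _window_hash(window: bytes, salt: int) -> int:
    """Salted 64-bit hash of one sliding window (hypothetical helper)."""
    digest = hashlib.blake2b(
        window, digest_size=8, salt=salt.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def super_fingerprint(chunk: bytes, window: int = 8, features: int = 3) -> bytes:
    """Combine `features` min-hash features of a block into one 20-byte value."""
    if len(chunk) < window:
        windows = [chunk]
    else:
        windows = [chunk[i:i + window] for i in range(len(chunk) - window + 1)]
    # One feature per salt: the smallest hash value seen over all windows.
    mins = [min(_window_hash(w, salt) for w in windows) for salt in range(features)]
    return hashlib.sha1(b"".join(m.to_bytes(8, "little") for m in mins)).digest()
```

Two blocks that share the same super-fingerprint would be treated as candidates for differential compression. Blocks that differ in only a few bytes often, but not always, keep the same minimum hashes, which is why such a value can only supplement the locality-based search.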

[0036] In the present invention, multiple co...

Abstract

The invention discloses a differential compression method based on data de-duplication. The method includes the steps of: partitioning the files in a data stream to obtain multiple data blocks; computing a fingerprint for each data block, to be used in searching for duplicate data; grouping all the data blocks to establish data block groups and their doubly linked lists; looking up the fingerprint of each data block in each data block group to perform data de-duplication, that is, to determine whether the data block is a duplicate; searching locally for similar data in each de-duplicated data block group according to the duplicate-data information in the group's doubly linked list, namely, treating the non-duplicate data blocks adjacent to duplicate data blocks as potential similar data blocks; verifying the similarity of the potential similar data blocks by differential compression; and finally performing a complementary search for similar data in the data block groups according to the similarity. The method offers fast similar-data searching, low computing and indexing overhead, and high data compression efficiency.
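The core steps of the abstract (partition, fingerprint, de-duplicate, then treat non-duplicate neighbours of duplicates as potential similar blocks and verify them by differential compression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: fixed-size chunking, SHA-1 fingerprints, Python lists in place of the doubly linked lists, and zlib with a preset dictionary as a stand-in for a real delta encoder are all choices made here. Grouping and the complementary search are omitted.

```python
import hashlib
import zlib


def chunk_stream(data: bytes, size: int = 8) -> list[bytes]:
    """Fixed-size chunking (an assumption; the source does not fix a scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()


def find_candidates(old: list[bytes], new: list[bytes]) -> dict[int, int]:
    """Map each non-duplicate new block to a potential similar old block.

    If a neighbour of new[j] duplicates old[p], the block next to old[p]
    on the same side is proposed as the base for new[j].
    """
    index: dict[bytes, int] = {}
    for i, c in enumerate(old):
        index.setdefault(fingerprint(c), i)
    # match[j] is the old position that new[j] duplicates, or None.
    match = [index.get(fingerprint(c)) for c in new]
    candidates: dict[int, int] = {}
    for j, m in enumerate(match):
        if m is not None:
            continue  # exact duplicate, nothing to compress differentially
        prev = match[j - 1] if j > 0 else None
        nxt = match[j + 1] if j + 1 < len(new) else None
        if prev is not None and prev + 1 < len(old):
            candidates[j] = prev + 1
        elif nxt is not None and nxt - 1 >= 0:
            candidates[j] = nxt - 1
    return candidates


def delta_size(base: bytes, target: bytes) -> int:
    """Size of target compressed against base (zlib preset dictionary)."""
    comp = zlib.compressobj(level=9, zdict=base)
    return len(comp.compress(target) + comp.flush())


old = chunk_stream(b"AAAAAAAABBBBBBBBCCCCCCCC")
new = chunk_stream(b"AAAAAAAABBBBxBBBCCCCCCCC")
print(find_candidates(old, new))  # {1: 1}
```

A candidate pair would be accepted as similar only if `delta_size(base, target)` is clearly smaller than the size of the target compressed on its own. The candidate search itself needs no extra index beyond the fingerprints already computed for de-duplication.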

Description

Technical field

[0001] The invention belongs to the field of data compression for computer storage, and more specifically relates to a differential data compression method based on deduplication.

Background technique

[0002] In recent years, with the development and popularization of computer technology and networks, the amount of data stored worldwide has grown explosively. Although the price of storage devices has been declining, the decline lags far behind the rate of data growth. Data deduplication, a technology that effectively eliminates redundant data on a large scale, has become a hotspot in storage system research in recent years. Put simply, it reduces the cost of data storage by removing that redundant data. For example: Now a core department has 200GB of data that needs to be backed up every day, so it needs to back up 73TB a yea...
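The backup figure in the example above follows from simple arithmetic, sketched here with decimal units (1 TB = 1000 GB, an assumption about the units intended):

```python
GB_PER_TB = 1000  # decimal units, assumed


def yearly_backup_tb(daily_gb: float, days: int = 365) -> float:
    """Total volume backed up in a year when every daily backup is kept in full."""
    return daily_gb * days / GB_PER_TB


print(yearly_backup_tb(200))  # 73.0
```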

Application Information

IPC(8): G06F17/30
Inventor: 冯丹, 夏文, 江泓, 田磊, 付忞
Owner: HUAZHONG UNIV OF SCI & TECH