Method for deleting repeated data by using double-fingerprint hash check

A data deduplication technique based on hash functions, applicable to redundant-data error detection, electrical digital data processing, and special data processing applications. It addresses the growing amount of data-block fingerprint computation, the associated computational cost, and data-block collisions, and thereby reduces the amount of fingerprint computation and saves computational cost.

Status: Inactive
Publication Date: 2011-08-17
HUAZHONG UNIV OF SCI & TECH
Cites: 3; Cited by: 71

AI Technical Summary

Problems solved by technology

If a fingerprint has few bits, the probability of a collision during data-block lookup is high; if it has many bits, the collision probability is low, but computing it costs more.
With fixed-length blocks at the KB level, the number of blocks is very large, which further increases the amount of computation.
In practice, a computationally expensive hash function is therefore chosen for fingerprinting in order to keep the lookup collision probability low, which greatly increases the cost of computing data-block fingerprints.
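
The trade-off between fingerprint length and collision probability can be made concrete with the birthday bound. The following Python sketch is an illustration rather than part of the patent: it assumes an ideal, uniformly distributed hash, and the function names `block_count` and `collision_probability`, the 1 TiB data size, and the 4 KiB block size are all hypothetical choices.

```python
import math


def block_count(total_bytes: int, block_size: int) -> int:
    """Number of fixed-length blocks needed to cover total_bytes."""
    return -(-total_bytes // block_size)  # ceiling division


def collision_probability(num_blocks: int, fingerprint_bits: int) -> float:
    """Birthday-bound approximation of the chance that at least two of
    num_blocks distinct blocks share a fingerprint of the given bit length,
    assuming an ideal hash with uniformly distributed output."""
    pairs = num_blocks * (num_blocks - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** fingerprint_bits)


if __name__ == "__main__":
    # Illustrative sizes only: 1 TiB of data cut into 4 KiB blocks.
    blocks = block_count(1 << 40, 4096)
    print("blocks:", blocks)
    for bits in (32, 128, 160):
        print(bits, "bits ->", collision_probability(blocks, bits))
```

Under these assumptions a 32-bit fingerprint is almost certain to collide somewhere across hundreds of millions of blocks, while a 160-bit fingerprint makes a collision negligibly unlikely, at the price of a more expensive hash per block.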

Examples

Embodiment Construction

[0024] The present invention is a backup mechanism based on double-fingerprint verification. Building on fixed-length data blocks, it exploits the locality of duplicate data and adds double-fingerprint hash verification to delete duplicate data, thereby reducing the amount of fingerprint computation. Double-fingerprint hash verification aims to optimize fingerprint calculation by combining a weak check (low computational cost) with a strong check (high computational cost): the weak check is used first for screening, and the strong check is computed only after a weak-fingerprint collision occurs. A weak check is one in which different data blocks may yield the same check value. A strong check is intended to ensure that different data blocks do not yield the same check value. The check value computed by the weak check is called the weak fingerprint, and the check value computed by the strong check is called the strong fingerprint. The weak checksum here is...
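
As a rough illustration of the weak/strong distinction, the sketch below pairs Adler-32 (via `zlib.adler32`) as the weak check with SHA-256 (via `hashlib.sha256`) as the strong check. Both choices are assumptions made for this example; the text above is truncated before it names the weak checksum actually used, and the helper names are hypothetical.

```python
import hashlib
import zlib


def weak_fingerprint(block: bytes) -> int:
    """Cheap 32-bit checksum. Adler-32 is an assumed stand-in for the
    weak check; different blocks may share the same value."""
    return zlib.adler32(block) & 0xFFFFFFFF


def strong_fingerprint(block: bytes) -> bytes:
    """Costlier cryptographic digest. SHA-256 is an assumed stand-in for
    the strong check; collisions are not expected in practice."""
    return hashlib.sha256(block).digest()


if __name__ == "__main__":
    # Two different 3-byte blocks with the same byte sum and the same
    # running-sum total, which is all that Adler-32 looks at.
    a, b = b"\x01\x00\x01", b"\x00\x02\x00"
    print("weak fingerprints equal:  ", weak_fingerprint(a) == weak_fingerprint(b))
    print("strong fingerprints equal:", strong_fingerprint(a) == strong_fingerprint(b))
```

The two sample blocks are distinct yet indistinguishable to the weak check, which is why a weak match has to be confirmed by the strong check before a block is treated as a duplicate.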

Abstract

The invention provides a method for deleting duplicate data using double-fingerprint hash verification. The method comprises the following steps: dividing the object to be backed up into data blocks of equal length; for each block, determining whether its weak fingerprint matches the weak fingerprint of any data block on the server; if it matches none, backing up the block; if it matches, determining whether the block's strong fingerprint matches the strong fingerprint of any data block on the server; if the strong fingerprint matches none, backing up the block, and otherwise treating the block as a duplicate that need not be backed up again; and repeating this procedure for all data blocks to be backed up. The method first applies a hash function with a low computational cost as a weak check on every data block, and applies a hash function with a high computational cost as a strong check only when the weak fingerprints match. This avoids checking all data with the expensive hash function, greatly reduces the fingerprint computation during checking, improves performance, and provides good transmission performance for data backup on mass data storage.
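
The procedure in the abstract can be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: `BlockStore` is a hypothetical stand-in for the server, Adler-32 and SHA-256 are assumed choices for the weak and strong hashes, and computing the strong fingerprint of an already stored block lazily (only once a weak match occurs) is a design choice of this sketch that the abstract does not specify.

```python
import hashlib
import zlib
from typing import Dict, Iterator, List


class BlockStore:
    """Hypothetical in-memory stand-in for the backup server."""

    def __init__(self) -> None:
        self.blocks: List[bytes] = []            # unique blocks stored so far
        self.by_weak: Dict[int, List[int]] = {}  # weak fingerprint -> block ids
        self.strong_of: Dict[int, bytes] = {}    # block id -> cached strong fingerprint
        self.strong_calls = 0                    # how often the costly hash ran

    def _strong(self, data: bytes) -> bytes:
        self.strong_calls += 1
        return hashlib.sha256(data).digest()     # assumed strong hash

    def _stored_strong(self, block_id: int) -> bytes:
        # Assumption of this sketch: strong fingerprints of stored blocks
        # are computed on first use and then cached.
        if block_id not in self.strong_of:
            self.strong_of[block_id] = self._strong(self.blocks[block_id])
        return self.strong_of[block_id]

    def _store(self, block: bytes, weak: int) -> int:
        self.blocks.append(block)
        block_id = len(self.blocks) - 1
        self.by_weak.setdefault(weak, []).append(block_id)
        return block_id

    def backup_block(self, block: bytes) -> bool:
        """Return True if the block was stored, False if it was a duplicate."""
        weak = zlib.adler32(block) & 0xFFFFFFFF  # assumed weak hash
        candidates = self.by_weak.get(weak)
        if not candidates:
            self._store(block, weak)             # weak fingerprint unseen: new block
            return True
        strong = self._strong(block)             # weak match: run the strong check
        if any(self._stored_strong(i) == strong for i in candidates):
            return False                         # both fingerprints match: duplicate
        self.strong_of[self._store(block, weak)] = strong
        return True


def split_fixed(data: bytes, block_size: int) -> Iterator[bytes]:
    """Yield fixed-length blocks; the final block may be shorter."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def backup(data: bytes, store: BlockStore, block_size: int = 4096) -> int:
    """Back up data and return how many blocks were actually stored."""
    return sum(store.backup_block(b) for b in split_fixed(data, block_size))


if __name__ == "__main__":
    store = BlockStore()
    stored = backup(b"A" * 4096 + b"B" * 4096 + b"A" * 4096, store)
    print("blocks stored:", stored, "strong hash calls:", store.strong_calls)
```

In this sketch, blocks whose weak fingerprint has never been seen are stored without ever running the strong hash, which is where the saving described in the abstract comes from; only blocks that pass the weak screen pay for the strong check.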

Description

Technical field

[0001] The invention belongs to the fields of computer storage technology and data backup technology, and in particular relates to a duplicate data deletion method using double-fingerprint hash verification.

Background art

[0002] With the advance of computerization, human society has entered the information age, and computers have penetrated all walks of life. More and more applications are built on computers, and people's work and daily life depend increasingly on networks; the stronger this dependence, the more important the security of network systems and their data becomes. At the same time, the continued expansion of the Internet has led to explosive, geometric growth in the volume of data. Turing Award winner Jim Gray pointed out that the amount of new data added every 18 months in the network environment equals the total amount of data in all previous history. Almost all business activities of enterprises are...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F11/14
Inventor: 周可, 王桦, 黄志刚, 金津
Owner: HUAZHONG UNIV OF SCI & TECH