Method and system for implementing repeated data deletion

A technology for data deduplication, applied to redundancy in operations for data error detection, electrical digital data processing, special data processing applications, etc.

Inactive Publication Date: 2010-11-10
无锡北方数据计算股份有限公司

AI Technical Summary

Problems solved by technology

The downside of this method is that it requires a supported back



Embodiment Construction

[0019] In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0020] To achieve deduplication, one must find identical data, build a high-speed index, and use the index to replace the identical data. The key implementation points are therefore how to find identical data and how to build a fast index. Traditional deduplication technology marks data by computing its hash value and maintains the index through a large cache, which makes fast indexing difficult. Moreover, if the hash value is used as the data fingerprint, hash collisions are unavoidable; although the probability is very low, once a collision occurs it causes unpredictable data errors.
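The traditional hash-fingerprint scheme described above can be sketched as follows. This is a minimal illustration, not the patent's method: `HashDedupStore`, its dict-based index, and the block-store list are all hypothetical stand-ins (a real system keeps the index in a large cache), and trusting the hash alone exhibits exactly the collision risk the paragraph points out.

```python
import hashlib

def fingerprint(block: bytes) -> str:
    """Hash-based fingerprint of a data block (traditional approach)."""
    return hashlib.sha256(block).hexdigest()

class HashDedupStore:
    """Minimal sketch of hash-index deduplication."""
    def __init__(self):
        self.index = {}   # fingerprint -> storage slot
        self.store = []   # stored unique blocks

    def write(self, block: bytes) -> int:
        fp = fingerprint(block)
        if fp in self.index:          # duplicate: reuse the index entry
            return self.index[fp]
        slot = len(self.store)        # new data: store and index it
        self.store.append(block)
        self.index[fp] = slot
        return slot

s = HashDedupStore()
a = s.write(b"hello world")
b = s.write(b"hello world")   # duplicate: same slot returned
c = s.write(b"other data")
assert a == b and a != c
assert len(s.store) == 2      # only unique blocks are stored
```

Note that a duplicate write never touches the store: the fingerprint lookup alone decides, which is fast but, as the text notes, unsafe if two different blocks ever hash alike.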

[0021] The principle of the Simhash (similarity hash) algorithm is: each token in the data is mapped to ...



Abstract

The invention provides a method for implementing deduplication of data, comprising the following steps: computing a similarity value for the data with the Simhash (similarity hash) algorithm; entering the similarity value into an index library, the similarity value locating the storage position; writing the data into a data warehouse; if data with the same similarity value arrives, extracting the corresponding data from the data warehouse and performing a binary comparison; and if the data are identical, recording an index, otherwise recording the differing portions of the data. The invention also provides a system for implementing deduplication, comprising a similarity marking library (BitMap), a data offset marking library, the data warehouse (LBAMap), and a storage library (Resp) recording the initial data. Based on the Simhash algorithm, the method and system efficiently accomplish deduplication while guaranteeing data consistency through data comparison.
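The control flow of the steps in the abstract can be sketched as follows. This is a hedged illustration only: `SimhashDedup`, `ingest`, and the SHA-1-based `fingerprint` stand-in are hypothetical names (the stand-in is not the real simhash), and where the patent records differing data portions, the sketch simply stores the new block whole. The binary-comparison step is what guards against fingerprint collisions.

```python
import hashlib

def fingerprint(data: bytes) -> int:
    # Stand-in for the similarity value; any deterministic
    # fingerprint illustrates the control flow (hypothetical).
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

class SimhashDedup:
    """Sketch of the pipeline: fingerprint -> index lookup ->
    binary compare -> record index or store new data."""
    def __init__(self):
        self.index = {}      # similarity value -> warehouse position
        self.warehouse = []  # unique data blocks

    def ingest(self, data: bytes):
        fp = fingerprint(data)
        if fp in self.index:
            pos = self.index[fp]
            if self.warehouse[pos] == data:   # binary comparison step
                return ("index", pos)          # identical: record index only
            # Fingerprint matched but bytes differ: the patent records
            # the differing parts; this sketch stores the block whole.
        pos = len(self.warehouse)
        self.warehouse.append(data)
        self.index.setdefault(fp, pos)
        return ("stored", pos)

d = SimhashDedup()
assert d.ingest(b"block-A") == ("stored", 0)
assert d.ingest(b"block-A") == ("index", 0)   # duplicate: index recorded
assert d.ingest(b"block-B") == ("stored", 1)
```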

Description

technical field

[0001] The invention relates to the technical field of computer storage, in particular to a method and system for realizing deduplication of data.

Background technique

[0002] An enterprise's key business data is backed up every day. Depending on how the backup strategy is customized, incremental backups may be made daily and full backups weekly. The volume of any single backup is not large, but as data accumulates, much of it is repeated, and this repetition makes the total backup volume grow in geometric progression. For example, if the initial data volume of an ERP system is 100 TB and 10 TB is added every day, with incremental backups six days a week and a full backup on the weekend, the backup data in one week reaches 160 TB. Using Data Deduplication technology, however, the initial 100 TB does not need to be backed up repeatedly, and it is further found that the 10 TB of daily incremental data can be compressed to 1 TB. Therefore,...
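The arithmetic behind the 160 TB figure in the background example checks out directly; the variable names below are illustrative only.

```python
# Figures from the example: 100 TB initial ERP data, 10 TB added per
# day, incremental backups six days a week, one full backup on weekends.
initial_tb = 100
daily_increment_tb = 10
incremental_days = 6

# Conventional weekly backup volume: one full copy plus six incrementals
week_total_tb = initial_tb + incremental_days * daily_increment_tb
assert week_total_tb == 160  # matches the 160 TB stated in the text

# With deduplication the 100 TB base is stored once, and each daily
# 10 TB increment reduces to roughly 1 TB of genuinely new data
dedup_week_tb = incremental_days * 1
print(week_total_tb, dedup_week_tb)  # 160 6
```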

Claims


Application Information

IPC(8): G06F17/30; G06F11/14
Inventor: 张庆敏, 胡刚, 谢海威, 郭栋
Owner 无锡北方数据计算股份有限公司