Data de-duplication method

A data deduplication and data block technology, classified under electrical digital data processing, special data processing applications, instruments, etc. It addresses slow deduplication speed and the formation of single points of failure, and achieves a high deduplication rate, high reliability, and good performance.

Inactive Publication Date: 2012-01-18
上海文广互动电视有限公司

AI Technical Summary

Problems solved by technology

Byte-level deduplication achieves a higher deduplication rate, but at the cost of slower deduplication speed.
[0012] In addition, traditional data deduplication methods provide data services through a single physical device. That device becomes a single point of failure while deduplication is performed, which undermines system reliability.

Examples

Embodiment Construction

[0040] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0041] Referring to Figure 1, the data deduplication method of the present invention comprises the following steps:

[0042] First, a file is written and divided into variable-length blocks, forming multiple data blocks of different lengths, and the hash value of each data block is calculated;
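The variable-length division and hashing in this step can be illustrated with a minimal sketch. The source does not name a chunking algorithm or a hash function, so everything here is an assumption chosen for illustration: the shift-and-add rolling value, the `mask`, `min_len` and `max_len` parameters, the use of SHA-1, and the function names `chunk_file` and `block_hashes`.

```python
import hashlib


def chunk_file(data, mask=0x3F, min_len=16, max_len=256):
    """Split data into variable-length blocks at content-defined boundaries.

    Assumed scheme (not from the source): a boundary is declared when the
    low bits of a simple rolling value are all zero, subject to a minimum
    and a maximum block length.
    """
    blocks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Shift-and-add rolling value; its low bits depend only on the
        # most recent bytes, so boundaries follow the content.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= min_len and (rolling & mask) == 0
        if at_boundary or length >= max_len:
            blocks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        # Whatever remains after the last boundary forms the final block.
        blocks.append(data[start:])
    return blocks


def block_hashes(blocks):
    """Return one hex digest per block (SHA-1 is an assumed choice)."""
    return [hashlib.sha1(block).hexdigest() for block in blocks]
```

Because boundaries depend on content rather than fixed offsets, an insertion near the start of a file tends to change only nearby blocks, which is the usual motivation for variable-length division.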

[0043] The hash values are sampled to form the sample data of the file;
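The source does not say how the hash values are sampled. One possible approach, shown here purely as an assumption, is to keep the `k` smallest distinct hash values, so that files sharing many blocks tend to share sample entries. The name `sample_hashes` and the default `k=4` are hypothetical.

```python
def sample_hashes(hashes, k=4):
    """Return up to k smallest distinct hash values as the file's sample.

    Assumed sampling rule (not from the source): min-wise selection.
    """
    return sorted(set(hashes))[:k]
```

The result does not depend on block order, so two files containing the same blocks in a different sequence produce the same sample.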

[0044] A similarity group for the file is located by comparing the sample data of the file with the sample data of existing files;

[0045] If the similarity between the sample data of the file and the sample data of an existing file exceeds a certain value, the data group corresponding to that existing file's sample data is determined to be a similarity group of the file.
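The threshold comparison described above might look like the following sketch. The source specifies neither the similarity measure nor the threshold value; the Jaccard ratio of the two sample sets, the default `threshold=0.5`, and the names `similarity` and `find_similarity_group` are all assumptions.

```python
def similarity(sample_a, sample_b):
    """Jaccard similarity of two samples (an assumed measure)."""
    set_a, set_b = set(sample_a), set(sample_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def find_similarity_group(sample, group_samples, threshold=0.5):
    """Return the id of the best group whose similarity exceeds threshold.

    group_samples maps a hypothetical group id to that group's sample data.
    Returns None when no existing group is similar enough.
    """
    best_id = None
    best_score = threshold
    for group_id, group_sample in group_samples.items():
        score = similarity(sample, group_sample)
        if score > best_score:  # "exceeds a certain value": strict comparison
            best_id, best_score = group_id, score
    return best_id
```

Comparing small samples instead of full hash lists keeps this lookup cheap; the full per-block comparison is then limited to the one group that was selected.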

[0046] Identify duplicate data blocks by comparing the hash value of the file with the hash value of the s...

Abstract

The invention discloses a data deduplication method comprising the following steps: writing a file, dividing the file into variable-length blocks to form a plurality of data blocks of different lengths, and calculating the hash values of the data blocks; sampling the hash values to form the sample data of the file; locating a similarity group for the file by comparing the sample data of the file with the sample data of existing files; determining duplicated data blocks by comparing the hash values of the file with the hash values of the similarity group in a meta database; deduplicating and storing the non-duplicated data blocks; and generating a meta file and storing the hash values of the non-duplicated data blocks in the meta database. With this method, the system resources consumed by the deduplication operation can be adjusted dynamically, the performance of in-line services is guaranteed first, and the impact on in-line services is minimized. The method offers high reliability, good stability, and a higher deduplication rate.
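The last steps of the abstract (determining duplicates, storing non-duplicated blocks, and generating a meta file) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the meta database is modelled as an in-memory set of hashes for the selected similarity group, the block store as a dict, the meta file as an ordered list of hashes, and SHA-1 as the hash. The names `deduplicate` and `restore` are hypothetical.

```python
import hashlib


def deduplicate(blocks, group_hashes, block_store):
    """Store only blocks whose hash is not already in the similarity group.

    group_hashes: set of known hashes (stand-in for the meta database).
    block_store:  dict mapping hash -> block bytes (stand-in for storage).
    Returns the ordered list of hashes needed to rebuild the file, which
    plays the role of the meta file in this sketch.
    """
    meta_file = []
    for block in blocks:
        digest = hashlib.sha1(block).hexdigest()
        if digest not in group_hashes:
            # Non-duplicated block: store it and record its hash.
            block_store[digest] = block
            group_hashes.add(digest)
        meta_file.append(digest)
    return meta_file


def restore(meta_file, block_store):
    """Rebuild the original bytes from a meta file and the block store."""
    return b"".join(block_store[digest] for digest in meta_file)
```

A file written later that shares blocks with an earlier one adds only its new blocks to the store, while its meta file still lists every block in order so the file can be reassembled.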

Description

Technical field

[0001] The invention relates to a method for deleting data, and in particular to a method for deleting duplicate data.

Background

[0002] Deduplication is a data reduction technology designed to reduce the storage capacity used in a storage system. It eliminates redundant data by deleting duplicate data in the storage system and retaining only one copy. Data deduplication can greatly reduce the consumption of physical storage space.

[0003] Data deduplication can be divided, according to when the data is processed, into in-line processing (In-Line) and post-processing (Post-Process).

[0004] The in-line approach performs deduplication before the data is written to disk. In-line deduplication reduces the amount of data to a certain extent, but it has a drawback: the deduplication operation itself lowers the data throughput rate, resulting in a decrease in bu...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 安然谈川玉卢宝丰
Owner: 上海文广互动电视有限公司