Data de-duplication method

A data deduplication and data block technology, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses slow deduplication speed and single points of failure, and achieves a high deduplication rate, high reliability, and good performance.

Status: Inactive | Publication date: 2012-01-18

上海文广互动电视有限公司

Cites: 3 | Cited by: 59

Problems solved by technology

Byte-level deduplication achieves a higher deduplication rate, but at the cost of slower deduplication speed.

[0012] In addition, the traditional data deduplication method provides data services through a single physical device. When deduplication is performed, that device becomes a single point of failure, which poses a challenge to system reliability.

Embodiment Construction

[0040] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0041] Referring to Figure 1, the deduplication method of the present invention comprises the following steps:

[0042] First, write a file, divide the file into variable-length blocks to form multiple data blocks of different lengths, and calculate the hash value of each data block;

[0043] Form the sample data of the file by sampling the hash values;

[0044] Locate the similarity group of the file by comparing the sample data of the file with the sample data of existing files;

[0045] If the similarity between the sample data of the file and the sample data of an existing file exceeds a certain value, the data group corresponding to the sample data of that existing file is determined to be the similarity group of the file.

[0046] Identify duplicate data blocks by comparing the hash value of the file with the hash value of the s...
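The steps in [0042] to [0045] can be illustrated with a minimal Python sketch. The source does not specify the chunking rule, the hash function, the sampling rule, the similarity measure, or the threshold, so everything here is an assumption made for illustration: a toy rolling-hash boundary test, SHA-256, sampling by keeping the `k` smallest hashes, Jaccard similarity, and a threshold of 0.5. All function and parameter names are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Set


def chunk_variable(data: bytes, mask: int = 0x3F,
                   min_len: int = 16, max_len: int = 256) -> List[bytes]:
    """Split data into variable-length blocks at content-defined boundaries.

    Assumption: a toy rolling value decides the boundaries; the patent text
    does not say how block boundaries are chosen.
    """
    blocks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        # Cut when the low bits of the rolling value are zero, or at max_len.
        if (length >= min_len and (rolling & mask) == 0) or length >= max_len:
            blocks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        blocks.append(data[start:])  # trailing partial block
    return blocks


def hash_blocks(blocks: List[bytes]) -> List[str]:
    """Hash every block (SHA-256 is an assumption)."""
    return [hashlib.sha256(b).hexdigest() for b in blocks]


def sample_hashes(hashes: List[str], k: int = 8) -> Set[str]:
    """Form the file's sample data: keep the k smallest hashes (assumed rule)."""
    return set(sorted(set(hashes))[:k])


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two samples (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similarity_group(sample: Set[str],
                          existing: Dict[str, Set[str]],
                          threshold: float = 0.5) -> Optional[str]:
    """Return the group whose sample is most similar, if it exceeds threshold."""
    best_id, best_score = None, threshold
    for group_id, group_sample in existing.items():
        score = similarity(sample, group_sample)
        if score > best_score:
            best_id, best_score = group_id, score
    return best_id
```

Comparing only the small samples, rather than every block hash, is what keeps the similarity lookup cheap; the full hash comparison is then limited to the one group that was located.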

Abstract

The invention discloses a data de-duplication method comprising the following steps: writing a file, dividing the file into a plurality of variable-length data blocks, and calculating the hash values of the data blocks; sampling the hash values to form the sample data of the file; locating a similarity group of the file by comparing the sample data of the file with the sample data of existing files; determining duplicate data blocks by comparing the hash values of the file with the hash values of the similarity group in a meta database; de-duplicating and storing the non-duplicate data blocks; and generating a meta file and storing the hash values of the non-duplicate data blocks in the meta database. With this method, the system resources consumed by the de-duplication operation can be adjusted dynamically, the performance of in-line services is guaranteed first, and the impact on in-line services is minimized. The method offers high reliability, good stability, and a higher de-duplication rate.
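The last steps of the abstract (comparing against the meta database, storing only non-duplicate blocks, and generating a meta file) can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the in-memory dictionaries standing in for the meta database, block store, and meta files, the use of SHA-256, and the `DedupStore` class with its method names are all hypothetical.

```python
import hashlib
from typing import Dict, List, Set


class DedupStore:
    """Toy in-memory stand-in for the meta database and block storage."""

    def __init__(self) -> None:
        self.meta_db: Dict[str, Set[str]] = {}      # group id -> known block hashes
        self.block_store: Dict[str, bytes] = {}     # block hash -> block data
        self.meta_files: Dict[str, List[str]] = {}  # file name -> ordered hashes

    def write(self, name: str, blocks: List[bytes], group_id: str) -> int:
        """Store a file's blocks, skipping duplicates within its similarity group.

        Returns the number of blocks actually written to the block store.
        """
        known = self.meta_db.setdefault(group_id, set())
        recipe, stored = [], 0
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in known:
                # Non-duplicate block: store it and record its hash.
                self.block_store[digest] = block
                known.add(digest)
                stored += 1
            recipe.append(digest)
        # The meta file lists the hashes needed to rebuild the file in order.
        self.meta_files[name] = recipe
        return stored

    def read(self, name: str) -> bytes:
        """Rebuild a file from its meta file."""
        return b"".join(self.block_store[d] for d in self.meta_files[name])
```

For example, writing blocks `[b"x", b"y", b"x"]` to an empty store keeps two blocks, and a later file in the same group that reuses `b"y"` stores only its new blocks, while `read` still returns each file in full.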

Description

Technical field

[0001] The invention relates to a method for deleting data, in particular to a method for deleting duplicate data.

Background

[0002] De-duplication is a data reduction technology designed to reduce the storage capacity used in a storage system. It eliminates redundant data by deleting duplicate data in the storage system and retaining only one copy. Data deduplication technology can greatly reduce the consumption of physical storage space.

[0003] Data deduplication technology can be divided into in-line processing (In-Line) and post-processing (Post-Process), according to when the data is processed.

[0004] In-line deduplication performs deduplication before the data is written to disk. It reduces the amount of data to a certain extent, but it has a drawback: the deduplication operation itself lowers the data throughput rate, resulting in a decrease in bu...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

Inventors: 安然, 谈川玉, 卢宝丰

Owner: 上海文广互动电视有限公司
