Duplicated data deletion method in data recovery system

A data deduplication technology for data disaster recovery, applied in the fields of data error detection, electrical digital data processing, and response to error generation. It addresses the low efficiency and large memory footprint of existing methods and achieves efficient fingerprint generation and retrieval.

Status: Inactive
Publication Date: 2017-05-10
CHANGCHUN UNIV OF SCI & TECH
3 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a data deduplication method for a data disaster recovery system that addresses the low efficiency and large memory footprint of existing deduplication methods.

Examples

Specific Embodiment 1

[0031] Specific Embodiment 1. This embodiment is described with reference to Figures 1 to 6. In the data deduplication method for a data disaster recovery system, sampling is introduced into the data blocking process so that sampling and blocking are completed synchronously, and the hash value of each sample is used as its fingerprint.

[0032] The minimum-value sampling method based on variable-length blocks builds on CDC (content-defined chunking) blocking. The blocking method comprises the following steps:

[0033] Step 1: Set a fixed-size sliding window; D and r are predefined values;

[0034] Step 2: Set the starting position of the file as the left boundary of the block, and align the left boundary of the sliding window with the left boundary of the block;

[0035] Step 3: If the right boundary of the sliding window reaches or exceeds the end of the file, set the end of the file as the right boundary of the block, and end the blocking;

[0036] Step 4: Calculate the hash value hv of th...
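The blocking steps above, combined with the minimum-value sampling described in this embodiment, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: Step 4 is truncated in the text, so the boundary test `hv % D == R` used here is the conventional CDC rule and is an assumption. The names `chunk_and_sample` and `window_hash`, the polynomial hash, and the values of `WINDOW`, `D`, and `R` are also hypothetical.

```python
from typing import List, Tuple

WINDOW = 16  # sliding-window size in bytes (assumed value)
D = 64       # predefined divisor D (assumed value)
R = 7        # predefined remainder r (assumed value)


def window_hash(window: bytes) -> int:
    """Simple polynomial hash; a stand-in for a real rolling hash."""
    h = 0
    for b in window:
        h = (h * 257 + b) % 1_000_000_007
    return h


def chunk_and_sample(data: bytes) -> List[Tuple[int, int, int]]:
    """Split data into variable-length blocks and sample each block in one pass.

    Returns (start, end, fingerprint) per block, with end exclusive.
    The fingerprint is the smallest window hash seen inside the block.
    """
    blocks = []
    n = len(data)
    left = 0  # Step 2: left boundary of the current block
    while left < n:
        pos = left      # left boundary of the sliding window
        min_hv = None   # smallest hash seen so far in this block
        while True:
            right = pos + WINDOW
            at_end = right >= n
            hv = window_hash(data[pos:min(right, n)])
            if min_hv is None or hv < min_hv:
                min_hv = hv  # sampling happens while blocking
            if at_end:
                end = n      # Step 3: the end of the file closes the block
                break
            if hv % D == R:
                end = right  # assumed boundary rule (Step 4 is truncated)
                break
            pos += 1         # slide the window by one byte
        blocks.append((left, end, min_hv))
        left = end
    return blocks


if __name__ == "__main__":
    sample = bytes((i * 37 + (i >> 3)) % 251 for i in range(2000))
    for start, end, fp in chunk_and_sample(sample)[:5]:
        print(start, end, fp)
```

Recomputing the hash for every window keeps the sketch short; a production chunker would normally use a rolling hash so that each one-byte slide costs constant time.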

Abstract

The invention discloses a data deduplication method for a data disaster recovery system. It relates to the field of data storage and addresses the low efficiency and large memory footprint of existing deduplication methods. The method reduces the number of fingerprints in the fingerprint library through sampling. The hash value of each sample is used directly as its fingerprint, rather than being computed with a fingerprint generation algorithm such as MD5, so generating and searching the fingerprint library are relatively efficient. The sliding window with the minimum hash value is taken as the sample. When part of a file changes, the position of the sample may remain unchanged, that is, the hash value of the sample is still the minimum, so the method is insensitive to partial updates of a file. Compared with an enhanced position-aware sampling method, the samples obtained by this method have higher similarity. The similarity and deduplication rate were tested for different numbers of samples, and the results show that the method achieves higher similarity and a higher deduplication rate.
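The insensitivity to partial updates can be illustrated with a small sketch. It assumes a single block with fixed boundaries and uses `zlib.crc32` as a stand-in window hash; the function name `min_window_sample` and the window size are hypothetical. If a byte outside the sample window is edited, the sample window itself is untouched, so the block's fingerprint can only change when one of the few rewritten windows happens to hash lower than the old minimum.

```python
import zlib
from typing import Tuple

WINDOW = 8  # sliding-window size in bytes (assumed value)


def min_window_sample(block: bytes, window: int = WINDOW) -> Tuple[int, int]:
    """Return (fingerprint, offset) of the window with the smallest hash."""
    if len(block) <= window:
        return zlib.crc32(block), 0
    return min(
        (zlib.crc32(block[i:i + window]), i)
        for i in range(len(block) - window + 1)
    )


# Build a block and locate its sample.
block = bytes((i * i + 3 * i) % 251 for i in range(512))
fp, off = min_window_sample(block)

# Edit one byte far away from the sample window.
pos = (off + len(block) // 2) % len(block)
edited = bytearray(block)
edited[pos] ^= 0xFF
new_fp, new_off = min_window_sample(bytes(edited))

# The old sample window still exists, so the new minimum cannot be larger.
print("fingerprint unchanged:", new_fp == fp)
```

Only the windows that overlap the edited byte get new hashes, which is why a local change usually leaves the minimum, and therefore the fingerprint, in place.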

Description

Technical field

[0001] The invention relates to the field of data storage and to data deduplication, and in particular to a data deduplication method for a data disaster recovery system.

Background

[0002] Data is the oil of the information age. Especially in the era of big data, ensuring data security and availability is a very important task. A data disaster recovery system ensures that important data is not lost in the event of disaster or human error by backing up data at different locations. However, continuously backed-up data is highly redundant, which wastes storage space, reduces storage performance, and increases storage costs. Data deduplication technology can eliminate redundant data in disaster recovery systems. The general steps of data deduplication are: when a new file is added in the storage system, the file is first divided into blocks, and then the fingerprint of each bloc...
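The general deduplication flow outlined above can be sketched as follows. The source text is truncated after the fingerprinting step, so the lookup-and-store logic shown here is the conventional approach and is an assumption. Fixed-size blocks, MD5 fingerprints, and the names `DedupStore`, `add_file`, and `read_file` are hypothetical choices made for illustration of the baseline, not the method of the invention.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # fixed block size in bytes (assumed value)


class DedupStore:
    """Toy block store that keeps one copy of each distinct block."""

    def __init__(self) -> None:
        # Fingerprint library: maps fingerprint -> stored block.
        self.blocks: Dict[str, bytes] = {}

    def add_file(self, data: bytes) -> List[str]:
        """Store a file and return its recipe (ordered list of fingerprints)."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.md5(block).hexdigest()
            if fp not in self.blocks:
                self.blocks[fp] = block  # new block: store it once
            recipe.append(fp)            # duplicate block: reference only
        return recipe

    def read_file(self, recipe: List[str]) -> bytes:
        """Rebuild a file from its recipe."""
        return b"".join(self.blocks[fp] for fp in recipe)


if __name__ == "__main__":
    store = DedupStore()
    first = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
    second = first + b"C" * 100
    store.add_file(first)
    recipe = store.add_file(second)
    print(len(store.blocks), store.read_file(recipe) == second)
```

Every block gets an entry in the fingerprint library in this baseline, which is the memory cost that sampling is meant to reduce.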

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F11/14
CPC: G06F11/1453
Inventor: 祁晖, 底晓强, 李锦青, 宋小龙, 毕琳, 蒋振刚, 杨华民, 从立钢, 任维武
Owner: CHANGCHUN UNIV OF SCI & TECH