Two-stage single-instance data de-duplication backup method

A data backup and single-instance technology, applied in the fields of electric digital data processing, special data processing applications, and redundant-data error detection in computing. It addresses problems such as heavy client workload, wasted time and bandwidth, and reduced query speed.

Inactive Publication Date: 2013-09-25
XI AN JIAOTONG UNIV
3 Cites, 50 Cited by

AI Technical Summary

Problems solved by technology

[0003] The most common way to deal with this problem is to apply file-level or block-level deduplication on the server side. Both approaches have significant drawbacks.
First, file-level deduplication alone does not achieve a good deduplication ratio, especially for files with similar content and only small differences, because it cannot detect duplicate data shared between files.
Second, with block-level deduplication the client must upload a large amount of metadata before the server can detect duplicate data. Both the server and the client have to process this data in real time, which wastes time and bandwidth and places a heavy workload on the client.
Third, file-level duplicate detection queries all file information without considering the necessary conditions for two files to be identical, while block-level deduplication divides all files into blocks uniformly and then queries them. This makes the metadata very large and also slows down queries.
Fourth, traditional block-level chunking tends to scatter data blocks that were originally contiguous in the same file across storage, so restoration is very slow.
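The trade-off described above can be illustrated with a small sketch. It is not taken from the patent: the 4096-byte block size, the use of SHA-256, and the names `file_fingerprint` and `block_fingerprints` are all assumptions chosen for illustration. It shows that a whole-file fingerprint misses a near-duplicate entirely, while fixed-size block fingerprints find the shared blocks at the cost of one metadata entry per block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the source does not specify one


def file_fingerprint(data: bytes) -> str:
    """File-level dedup: one fingerprint (one metadata entry) per file."""
    return hashlib.sha256(data).hexdigest()


def block_fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Block-level dedup: one fingerprint per fixed-size block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


# Four distinct 4 KiB blocks, then a copy that differs only in its last byte.
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
edited = original[:-1] + b"\xff"

# File level: the two files look completely unrelated.
file_level_duplicate = file_fingerprint(original) == file_fingerprint(edited)

# Block level: most blocks are shared, but four fingerprints must be
# sent and queried instead of one.
orig_blocks = block_fingerprints(original)
edit_blocks = block_fingerprints(edited)
shared = sum(a == b for a, b in zip(orig_blocks, edit_blocks))

print("file-level duplicate:", file_level_duplicate)
print("blocks shared:", shared, "of", len(orig_blocks))
```

With larger files the number of block fingerprints grows linearly with file size, which is the metadata and query overhead the second and third points refer to.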



Embodiment Construction

[0068] Figure 1 shows the deployment environment of this method, which is a client/server (C/S) structure. The client keeps a local log that records information about the files the user has already saved and about backup tasks. The client interacts with the server over the network. The server side consists of a backup server and a background processing system. The backup server stores the content of backed-up files on a storage medium and stores their metadata in a metadata file. The background processing system classifies similar files and deduplicates them when the backup server is lightly loaded or idle, thereby deduplicating the files a second time.
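The second pass performed by the background processing system can be pictured as storing a file as a difference against a similar file that is already stored. The sketch below is only an illustration under stated assumptions: the source does not say which differencing algorithm is used, so Python's standard `difflib.SequenceMatcher` stands in for it, and `make_delta` and `apply_delta` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy-from-base ranges plus literal new bytes."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse bytes from base
        elif j2 > j1:
            ops.append(("data", target[j1:j2]))   # replaced or inserted bytes
        # a pure "delete" contributes nothing to the target
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target file from the base file and the stored delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Two versions of a file that differ in a single line.
v1 = b"".join(b"line %d: value\n" % i for i in range(50))
v2 = v1.replace(b"line 25: value", b"line 25: changed")

delta = make_delta(v1, v2)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "data")
restored = apply_delta(v1, delta)

print("restored correctly:", restored == v2)
print("literal bytes stored:", literal_bytes, "of", len(v2))
```

Because such a pass is comparatively expensive, running it only when the backup server is lightly loaded or idle, as the paragraph above describes, keeps it off the critical path of the backup itself.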

[0069] Figure 2 shows the overall architecture of this method, which comprises three parts: the client, the backup server, and the background processing system. The client processes local files. And...


Abstract

The invention discloses a two-stage single-instance data de-duplication backup method in which data is de-duplicated in two stages during backup. In the first stage, duplicate detection is performed at file granularity. The client first queries its local log to determine whether an identical file has already been stored; if so, it notifies the user and the backup operation is complete. If no identical file is recorded locally, the client asks the backup program on the server side to query its database for a file with identical content. If such a file is found, the server only creates a link to that file for the client and records the client's reference to it. If the file is new, it is uploaded and both ends record its information. After files are uploaded, a background program on the server processes them further: small files are spliced together to avoid wasting space, while large files are stored separately by type, similar files are compared periodically, and after grouping, difference-based de-duplication is performed as the second stage.
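The first-stage decision flow in the abstract can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the class names `Client` and `BackupServer`, the use of a SHA-256 digest as the file identity, and the returned status strings are all assumptions made for the example.

```python
import hashlib


class BackupServer:
    """Hypothetical server: stores each distinct file once and tracks references."""

    def __init__(self):
        self.store = {}  # fingerprint -> file content
        self.refs = {}   # fingerprint -> set of client ids referencing it

    def has_file(self, fp: str) -> bool:
        return fp in self.store

    def link(self, fp: str, client_id: str) -> None:
        # The file already exists: only record the client's reference.
        self.refs[fp].add(client_id)

    def upload(self, fp: str, data: bytes, client_id: str) -> None:
        self.store[fp] = data
        self.refs[fp] = {client_id}


class Client:
    """Hypothetical client with a local log of files it has backed up."""

    def __init__(self, client_id: str, server: BackupServer):
        self.client_id = client_id
        self.server = server
        self.local_log = set()

    def backup(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        if fp in self.local_log:
            return "skipped"              # local log hit: nothing is sent
        if self.server.has_file(fp):
            self.server.link(fp, self.client_id)
            outcome = "linked"            # another client stored it already
        else:
            self.server.upload(fp, data, self.client_id)
            outcome = "uploaded"          # genuinely new file
        self.local_log.add(fp)            # the client records it as well
        return outcome


server = BackupServer()
alice = Client("alice", server)
bob = Client("bob", server)

results = [
    alice.backup(b"quarterly report"),
    alice.backup(b"quarterly report"),
    bob.backup(b"quarterly report"),
]
print(results)
```

Checking the local log first means a repeated backup by the same client costs no network traffic at all, and the server-side lookup means an identical file from a second client costs one query rather than a full upload.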

Description

Technical field

[0001] The invention relates to the technical field of computer storage, and in particular aims to provide a method for eliminating redundant data and saving network bandwidth when a client backs up its own files to a server, so as to improve the availability of storage devices.

Background

[0002] In a typical environment where a client saves its own files to a server, the server simply accepts the files uploaded by the client without performing many specific checks on them, and the client does not keep any identification of the files it uploads. When multiple clients upload files to the server, it often happens that several users back up the same file, or that a single user backs up several consecutive versions of a file with similar content. In these cases a large amount of redundant data is generated.

[0003] In order to deal with this problem, the most commonly used method is to implement file-l...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/14, G06F17/30
Inventor: 张兴军, 朱跃光, 董小社, 朱国峰, 王龙翔, 姜晓夏
Owner: XI AN JIAOTONG UNIV