Repeated data deleting method targeted at backup task

A deduplication technique for backup tasks, applied in the fields of redundancy of operations in data error detection, electrical digital data processing, and response error generation. It addresses problems such as increased energy consumption and improves deduplication efficiency, with the effect of easing the fingerprint query bottleneck and narrowing the scope of duplicate checking.

Active Publication Date: 2016-07-20
BEIHANG UNIV

AI Technical Summary

Problems solved by technology

These two overheads are especially obvious when the deduplication effect is poor, and directly lead to increased energy consumption

Embodiment Construction

[0047] The present invention will be further described in detail below in conjunction with the accompanying drawings.

[0048] To account for the size and degree of change of data center workloads, the present invention proposes a backup task-oriented data deduplication method.

[0049] Important concepts and definitions involved in the algorithm:

[0050] The reference count of a fingerprint is the number of times the fingerprint has occurred, minus one, over the current run and the run history of the algorithm. The reference count of a fingerprint warehouse is the sum of the reference counts of the fingerprints in that warehouse.
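The two definitions in [0050] can be illustrated with a minimal Python sketch. The function names, the use of short strings as fingerprints, and the representation of a warehouse as a set are assumptions made for illustration; the source does not specify any of them.

```python
from collections import Counter


def fingerprint_reference_counts(observed):
    """Map each fingerprint to its reference count (occurrences minus one)."""
    occurrences = Counter(observed)
    return {fp: n - 1 for fp, n in occurrences.items()}


def warehouse_reference_count(warehouse, ref_counts):
    """Sum the reference counts of the fingerprints stored in one warehouse."""
    return sum(ref_counts.get(fp, 0) for fp in warehouse)


# Hypothetical history of fingerprints seen across runs.
history = ["a1", "b2", "a1", "c3", "a1", "b2"]
ref_counts = fingerprint_reference_counts(history)
print(ref_counts)  # {'a1': 2, 'b2': 1, 'c3': 0}
print(warehouse_reference_count({"a1", "c3"}, ref_counts))  # 2
```

A fingerprint seen only once has a reference count of zero, so a warehouse whose fingerprints were never repeated has a reference count of zero as well.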

[0051] According to the different roles played by buckets in the algorithm, buckets can be divided into C-bucket, S-bucket, N-bucket, and B-bucket. C-bucket is a warehouse for storing newly generated fingerprints. There is only one C-bucket at a time. The fingerprints in C-bucket are fingerprints that have n...

Abstract

The invention discloses a data deduplication method for backup tasks. The method includes the following steps: first, the backup task is divided; the fingerprint warehouses on the hard disk that have completed the whole duplicate-checking process are placed into a set, the B-bucket; then a local cache and a global cache are established in memory, and elements of the B-bucket are placed into the global cache; all fingerprints of the current backup task are placed sequentially into a fingerprint warehouse, the C-bucket; once the C-bucket is full it is updated, and the updated C-bucket is traversed to record its largest and smallest fingerprints; the fingerprint warehouses containing these two fingerprints are then searched for in the B-bucket and added to the local cache; each updated fingerprint is looked up in the local cache and the global cache and marked if found, after which the unmarked fingerprints are saved to a fingerprint warehouse, the N-bucket, and the marked fingerprints are all deleted; finally, once the N-bucket is full it is replaced and added to the local cache, and the global cache is updated. The method has the advantages of resolving the fingerprint duplicate-checking bottleneck, narrowing the duplicate-checking range, and improving duplicate-checking efficiency while maintaining a high throughput rate.
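The flow in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patented implementation, and it rests on several assumptions that the abstract does not confirm: fingerprints are comparable strings; a warehouse "contains" the smallest or largest fingerprint when that fingerprint falls within the warehouse's own min-max range; the global cache holds only the first `global_slots` warehouses of the B-bucket; a full N-bucket is "replaced" by storing it as a finished warehouse; and fingerprints still pending in the N-bucket also count as duplicates. The names `covers`, `deduplicate`, `bucket_size`, and `global_slots` are hypothetical, and the task-division step is not modeled.

```python
def covers(warehouse, fp):
    """True if fp lies within the warehouse's min-max range (assumed reading)."""
    return bool(warehouse) and min(warehouse) <= fp <= max(warehouse)


def deduplicate(task_fps, b_bucket, bucket_size=4, global_slots=1):
    """Return the fingerprints judged new; b_bucket is extended in place."""
    # Assumption: only a bounded part of the B-bucket fits in the global cache.
    global_cache = set().union(*b_bucket[:global_slots])
    local_cache = set()
    n_bucket, unique = [], []

    def flush_n_bucket():
        # Assumed meaning of "replaced": store the N-bucket as a finished
        # warehouse, add it to the local cache, and update the global cache.
        if n_bucket:
            b_bucket.append(sorted(n_bucket))
            local_cache.update(n_bucket)
            global_cache.update(n_bucket)
            n_bucket.clear()

    # Fill the C-bucket bucket_size fingerprints at a time. The last C-bucket
    # may be only partly full; this sketch processes it anyway.
    for start in range(0, len(task_fps), bucket_size):
        c_bucket = task_fps[start:start + bucket_size]
        lo, hi = min(c_bucket), max(c_bucket)
        # Load warehouses covering the smallest or largest fingerprint.
        for warehouse in b_bucket:
            if covers(warehouse, lo) or covers(warehouse, hi):
                local_cache.update(warehouse)
        for fp in c_bucket:
            if fp in local_cache or fp in global_cache or fp in n_bucket:
                continue  # marked as a duplicate and dropped
            n_bucket.append(fp)  # unmarked: kept as a new fingerprint
            unique.append(fp)
            if len(n_bucket) == bucket_size:
                flush_n_bucket()
    flush_n_bucket()
    return unique


# Hypothetical data: two finished warehouses and one backup task.
stored = [["a1", "a5", "a9"], ["m2", "m7"]]
task = ["a5", "b3", "m7", "m5", "b3", "z1", "q4", "a1"]
print(deduplicate(task, stored))  # ['b3', 'm5', 'z1', 'q4']
```

Under this reading, a duplicate whose warehouse is neither in the global cache nor selected by the smallest or largest fingerprint would be treated as new, so the lookup trades some accuracy for a smaller search range.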

Description

Technical field

[0001] The invention belongs to the field of data backup storage and describes a backup task-oriented data deduplication method.

Background art

[0002] As the energy consumption of data centers has attracted more and more attention from the IT industry, how to reduce energy consumption in data centers has gradually become a focus for researchers. Data backup is one of the main applications of the storage system in a data center; therefore, applying a reasonable backup strategy to reduce the energy consumption of the storage system is an important way to reduce the overall power consumption of the data center.

[0003] According to statistics, the energy consumed by data centers accounts for 1.5% of the world's energy consumption, and 40% of that energy is consumed by the storage systems of the data centers. Researchers and administrators usually use two methods to reduce the energy consumption of storage systems. One is to start from hardwar...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/14
CPC: G06F11/1453
Inventor: 吴文峻
Owner: BEIHANG UNIV