Extensible pipeline for data deduplication

A data deduplication pipeline technology, applied in the fields of electronic digital data processing, digital data information retrieval, special data processing applications, etc., that addresses the problem of some data yielding less value from deduplication

Active Publication Date: 2012-07-11
MICROSOFT TECH LICENSING LLC

AI Technical Summary

Problems solved by technology

Furthermore, not all data that can be deduplicated yields equal savings (benefits) from d


Embodiment Construction

[0024] Aspects of the techniques described herein generally relate to an extensible pipeline for data deduplication, in which the individual modules/stages of the pipeline facilitate data deduplication, including by providing module chaining, module selection, secure and efficient module hosting, asynchronous processing, and/or parallel processing. Typically, the various mechanisms required for deduplication (e.g., file selection, chunking, deduplication detection, compression, and commit of chunks) are each modularized in the pipeline, with the ability to replace, select among, and/or extend each of the individual modules.
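The modular arrangement described in [0024] can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Pipeline`, `ChunkStore`, and `fixed_size_chunker` are hypothetical, fixed-size chunking and SHA-256 hashing are assumptions chosen for brevity, and file selection is left out.

```python
import hashlib
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage signatures; the names are illustrative, not from the patent.
Chunker = Callable[[bytes], List[bytes]]
Compressor = Callable[[bytes], bytes]


def fixed_size_chunker(size: int) -> Chunker:
    """Return a chunker that splits data into fixed-size pieces (an assumption)."""
    def chunk(data: bytes) -> List[bytes]:
        return [data[i:i + size] for i in range(0, len(data), size)]
    return chunk


@dataclass
class ChunkStore:
    """Commit stage: keeps one compressed copy of each unique chunk."""
    chunks: Dict[str, bytes] = field(default_factory=dict)

    def commit(self, chunk: bytes, compress: Compressor) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.chunks:  # duplicate detection by content hash
            self.chunks[digest] = compress(chunk)
        return digest


@dataclass
class Pipeline:
    """Chains the chunking, detection, compression, and commit stages."""
    chunker: Chunker
    compressor: Compressor
    store: ChunkStore

    def process(self, data: bytes) -> List[str]:
        """Return the ordered list of chunk hashes that describes the input."""
        return [self.store.commit(c, self.compressor) for c in self.chunker(data)]


store = ChunkStore()
pipeline = Pipeline(fixed_size_chunker(4), zlib.compress, store)
recipe = pipeline.process(b"AAAABBBBAAAA")

# Replacing a module only means passing a different callable,
# for example a pass-through "compressor" for already-compressed data.
no_compress = Pipeline(fixed_size_chunker(4), lambda b: b, store)
```

Here `recipe` holds three hashes while `store.chunks` holds only two entries, because the first and last chunks are identical. Keeping each stage behind a plain callable is one simple way to make stages replaceable; a real system would likely use richer interfaces.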

[0025] In one aspect, the pipeline scans files using a two-pass log-based algorithm and selects files for optimization based on their properties, with sorting, ranking, and/or grouping driven by statistical analysis and feedback. Selected files can be processed asynchronously, in batches, and/or in parallel for data deduplication. Furthermore, t...
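A file selection stage along the lines of [0025] might look like the sketch below. The `FileInfo` fields, the size and age thresholds, and the rank-by-size policy are all assumptions made for illustration; the source does not specify which properties or statistics are used, and the two-pass log-based scan is not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical file properties used for selection."""
    path: str
    size: int       # bytes
    age_days: int   # days since last modification


def select_files(files: Iterable[FileInfo],
                 min_size: int = 32 * 1024,
                 min_age_days: int = 5) -> List[FileInfo]:
    """Filter out small or recently changed files, then rank largest first."""
    candidates = [f for f in files
                  if f.size >= min_size and f.age_days >= min_age_days]
    return sorted(candidates, key=lambda f: f.size, reverse=True)


def batches(items: List[FileInfo], batch_size: int) -> List[List[FileInfo]]:
    """Group the ranked files into batches for later processing."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


scanned = [
    FileInfo("a.log", 100_000, 10),
    FileInfo("b.tmp", 10, 30),         # too small
    FileInfo("c.vhd", 5_000_000, 7),
    FileInfo("d.doc", 40_000, 1),      # modified too recently
    FileInfo("e.iso", 700_000, 90),
]
selected = select_files(scanned)
work = batches(selected, batch_size=2)
```

With these sample inputs, `selected` is ordered `c.vhd`, `e.iso`, `a.log`, and `work` contains two batches. Each batch could then be handed to a worker for asynchronous or parallel processing.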


Abstract

The invention describes an extensible pipeline for data deduplication. The subject disclosure relates to data deduplication (optimization) performed by phases/modules of a modular data deduplication pipeline. At each phase, the pipeline allows modules to be replaced, selected or extended, e.g., different algorithms can be used for chunking or compression based upon the type of data being processed. The pipeline facilitates secure data processing, batch processing, and parallel processing. The pipeline is tunable based upon feedback, e.g., by selecting modules to increase deduplication quality, performance and/or throughput. Also described is selecting, filtering, ranking, sorting and/or grouping the files to deduplicate, e.g., based upon properties and/or statistical properties of the files and/or a file dataset and/or internal or external feedback.

Description

Technical field

[0001] The present invention relates to extensible pipelines for data deduplication.

Background

[0002] Data deduplication (sometimes also referred to as data optimization) refers to detecting, uniquely identifying, and eliminating redundant data in a storage system, thereby reducing the physical bytes of data that need to be stored on disk or transmitted over a network, without compromising the fidelity and integrity of the original data. By reducing the resources required to store and/or transmit data, data deduplication yields savings in hardware and power costs (for storage) as well as data management costs (e.g., reduced backup costs). These cost savings become important as the amount of digitally stored data grows.

[0003] Data deduplication typically uses a combination of techniques for eliminating redundancy within and between persistently stored files. One such technique is to identify the same region of data in o...
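As a rough illustration of the reduction in physical bytes described in the background, the sketch below compares the logical size of a small file set with the bytes needed once identical regions are stored only once. The function name `dedup_savings`, the fixed 4-byte chunk size, and the use of SHA-256 are assumptions for the example, not details from the source.

```python
import hashlib
from typing import Dict, Tuple


def dedup_savings(files: Dict[str, bytes], chunk_size: int = 4) -> Tuple[int, int]:
    """Return (logical_bytes, physical_bytes) for a set of files.

    Logical bytes count every file in full; physical bytes count each
    distinct chunk once, across and within files.
    """
    logical = 0
    unique: Dict[bytes, int] = {}  # chunk hash -> chunk length
    for data in files.values():
        logical += len(data)
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return logical, sum(unique.values())


logical, physical = dedup_savings({
    "a.txt": b"HEADbodyTAIL",
    "b.txt": b"HEADbodyDIFF",
})
saved = logical - physical
```

The two sample files share their first two chunks, so 24 logical bytes reduce to 16 physical bytes. Chunk metadata and hash index overhead are ignored in this estimate.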


Application Information

IPC(8): G06F17/30
CPC: G06F17/3007, G06F17/30091, G06F16/11, G06F16/13
Inventor: P. A. Oltean, R. Kalach, A. M. El-Shimi, J. R. Benton
Owner MICROSOFT TECH LICENSING LLC