Synchronized data duplication

A data deduplication technology in which a reference table generated at a storage repository is shared and synchronized among client systems, so as to reduce bandwidth requirements for data storage and retrieval.

Inactive Publication Date: 2015-06-04
COMMVAULT SYST INC
15 Cites, 67 Cited by

AI Technical Summary

Benefits of technology

[0017] According to various embodiments, systems and methods are provided for data deduplication. Particularly, in some embodiments, techniques for performing reference table distribution and synchronization are provided. Accordingly, a reference table generated as a result of the deduplication process at a storage repository can be shared among a plurality of client systems that utilize a repository for data storage. This can be implemented to allow the client systems to perform local data deduplication before their data is sent to the repository. Likewise, this can also allow the client systems to receive deduplicated data from the storage repository. Accordingly, systems and methods can be implemented to allow deduplicated data to be transferred among a plurality of computing systems, thereby reducing bandwidth requirements for data storage and retrieval operations.
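The flow in paragraph [0017] can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent text: the `Repository` and `Client` classes, the fixed 4-byte segment size, the use of SHA-256 digests as reference-table keys, and the `("ref", digest)` / `("raw", segment)` wire format. The sketch only shows the general idea that a client holding a copy of the reference table can send short references instead of segments the repository already stores.

```python
import hashlib

SEGMENT_SIZE = 4  # unrealistically small, chosen only to keep the example readable


def segments(data: bytes, size: int = SEGMENT_SIZE):
    """Split data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Repository:
    """Hypothetical storage repository that owns the master reference table."""

    def __init__(self):
        self.reference_table = {}  # digest -> segment bytes

    def store(self, items):
        """Accept ('raw', segment) or ('ref', digest) items; return a list of digests."""
        recipe = []
        for kind, payload in items:
            if kind == "raw":
                digest = hashlib.sha256(payload).hexdigest()
                self.reference_table.setdefault(digest, payload)
            else:
                digest = payload
                if digest not in self.reference_table:
                    raise KeyError(f"unknown reference {digest}")
            recipe.append(digest)
        return recipe

    def retrieve(self, recipe):
        """Rebuild the original data from a list of digests."""
        return b"".join(self.reference_table[d] for d in recipe)


class Client:
    """Hypothetical client that deduplicates locally before sending."""

    def __init__(self):
        self.known = set()  # digests copied from the repository's reference table

    def sync(self, repo: Repository):
        """Copy the repository's current digests (full-table synchronization)."""
        self.known = set(repo.reference_table)

    def encode(self, data: bytes):
        """Replace segments the repository already holds with references."""
        items = []
        for seg in segments(data):
            digest = hashlib.sha256(seg).hexdigest()
            items.append(("ref", digest) if digest in self.known else ("raw", seg))
        return items
```

In this toy setup, a second client that synchronizes after the first client has stored `b"AAAABBBBCCCC"` would send `b"AAAABBBBDDDD"` as two references and one raw segment, so only the changed segment crosses the network in full.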
[0018] In some embodiments, rather than distributing the entire reference table to each client for synchronization, a proper subset of reference table entries can be identified and shared with the client devices for synchronization. This can be implemented so as to reduce the amount of bandwidth required to synchronize the reference table among the computing systems. In further embodiments, the subset can be identified based on data utilization criteria.
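One way to picture the subset selection in paragraph [0018] is the sketch below. The patent text here says only that the subset can be identified based on data utilization criteria; using a per-entry reference count as that criterion, and the names `Entry`, `ref_count`, and `select_subset`, are assumptions made for this example.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    segment: bytes
    ref_count: int  # assumed utilization metric: how often the segment was referenced


def select_subset(table: Dict[str, Entry], max_entries: int) -> Dict[str, Entry]:
    """Return the most-referenced entries, at most max_entries of them.

    Ties are broken by digest so the result is deterministic.
    """
    ranked = sorted(table.items(), key=lambda kv: (-kv[1].ref_count, kv[0]))
    return dict(ranked[:max_entries])
```

Sending only the entries returned by `select_subset` keeps synchronization traffic bounded by `max_entries`, at the cost that a client may occasionally send a raw segment the repository already holds.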

Problems solved by technology

However, punch cards were not the only storage mechanism available in the mid-20th century.
However, these were large and costly systems and although punch cards were inconvenient, their lower cost contributed to their longevity as a viable alternative.
The IBM 3380, however, was about as large as a refrigerator, weighed ¼ ton, and cost in the range of approximately $97,000 to $142,000, depending on the features selected.
The large volumes of data often stored and shared by networked devices can cause overloading of the limited network bandwidth.
In addition, even with large capacity storage systems, computing enterprises are being overloaded by vast amounts of data.
IT administrators are struggling to keep up with the seemingly exponential increase in the volume of documents, media and other data.
This problem is severely compounded by other factors such as the large file sizes often associated with multi-media files, and file proliferation through email and other content sharing mechanisms.
However, additional storage capacity requires capital expenditures, consumes power, takes up floor space and adds administrative overhead.
Even with additional storage capacity, the sheer volume of data becomes a strain on backup and data recovery plans, leading to greater risk in data integrity.
However, deduplication at the file level can be less efficient than deduplication using smaller segment sizes, because even a small change in the file generally requires that an entire copy of the file be re-stored.
When such a false positive occurs, the system can mistake new data for already-stored data and fail to store the new segment.
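The last two points can be made concrete with a small sketch. It compares how many bytes are stored when whole files are the deduplication unit versus fixed-size segments, and it shows one possible guard against false positives: comparing the actual bytes whenever two digests match. The function names, the choice of SHA-256, the 64-byte segment size used in the usage note, and the byte-comparison guard are assumptions for illustration and are not taken from the patent text.

```python
import hashlib


def file_level_stored_bytes(versions):
    """Bytes stored when an entire file is the unit of deduplication."""
    seen, stored = set(), 0
    for data in versions:
        digest = hashlib.sha256(data).digest()
        if digest not in seen:
            seen.add(digest)
            stored += len(data)
    return stored


def segment_level_stored_bytes(versions, size):
    """Bytes stored when fixed-size segments are the unit of deduplication."""
    table, stored = {}, 0  # table maps digest -> segment bytes
    for data in versions:
        for i in range(0, len(data), size):
            seg = data[i:i + size]
            digest = hashlib.sha256(seg).digest()
            existing = table.get(digest)
            if existing is None:
                table[digest] = seg
                stored += len(seg)
            elif existing != seg:
                # Same digest, different bytes: a false positive. Without this
                # check the new segment would be silently dropped. Raising is a
                # simplification; a real system would need a recovery strategy.
                raise RuntimeError("digest collision detected")
    return stored
```

For two 1024-byte versions of a file that differ in a single byte, and whose 64-byte segments are otherwise all distinct, `file_level_stored_bytes` reports 2048 bytes while `segment_level_stored_bytes` with `size=64` reports 1088, since only the one changed segment is stored again.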

Abstract

A system and method for data deduplication is presented. Data received from one or more computing systems is deduplicated, and the results of the deduplication process are stored in a reference table. A representative subset of the reference table is shared among a plurality of systems that utilize the data deduplication repository. This representative subset of the reference table can be used by the computing systems to deduplicate data locally before it is sent to the repository for storage. Likewise, it can be used to allow deduplicated data to be returned from the repository to the computing systems. In some cases, the representative subset can be a proper subset, wherein a portion of the reference table is identified and shared among the computing systems to reduce bandwidth requirements for reference-table synchronization.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] Any and all priority claims identified in the Application Data Sheet, or any correction thereto, are hereby incorporated by reference under 37 CFR 1.57.

BACKGROUND

[0002] 1. Technical Field

[0003] The present invention generally relates to data deduplication, and more particularly, some embodiments relate to systems and methods for facilitating shared deduplication information.

[0004] 2. Description of the Related Art

[0005] The storage and retrieval of data is an age-old art that has evolved as methods for processing and using data have evolved. In the early 18th century, Basile Bouchon is purported to have used a perforated paper loop to store patterns used for printing cloth. In the mechanical arts, similar technology in the form of punch cards and punch tape were also used in the 18th century in textile mills to control mechanized looms. Two centuries later, early computers also used punch cards and paper punch tape to store data and to input ...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F17/30, H04L29/08
CPC: H04L67/1097, G06F17/30156, G06F16/1756, G06F16/1748, G06F16/178, G06F16/182, G06F16/273
Inventor: NGO, DAVID; MULLER, MARCUS S.
Owner: COMMVAULT SYST INC