Method of reducing redundancy between two or more datasets

A dataset redundancy reduction technology, applied in the field of systems and methods for storing data. It addresses the problems that backups consume a great deal of disk space, that this limits the number of backups a user maintains, and that modern disk drives have undesirable properties, so as to achieve high deduplication efficiency and high deduplication performance.

Status: Inactive
Publication Date: 2013-01-10
CHRYSALIS STORAGE
Cites: 6; Cited by: 40

AI Technical Summary

Benefits of technology

The invention is a method that reduces the amount of memory needed to remove wholesale duplicate data from large sets of files. According to this summary, it is more efficient than existing techniques and achieves high deduplication ratios even with small data blocks, so less memory is needed and large amounts of data can be deduplicated faster.

Problems solved by technology

Maintaining multiple versions of such datasets consumes a great deal of disk space, so generally only a few versions, if any, are kept.
The Method is particularly effective when used with Repositories that have the hardware property of “read many, write many”, “read many, write few”, or “read many, write once”.
Modern disk drives also have the undesirable property that random seeks take milliseconds of time.
Modern solid-state memory has the undesirable properties of being far more expensive per byte than disk-based memory as well as (with some types of solid-state memory) limiting the number of rewrites before the device fails to be rewriteable.
Because users of computers accidentally delete files or their computers become infected by computer viruses, users often wish to retain multiple backups over time.
Thus, an unsophisticated backup scheme of uncompressed 250 gigabytes (125 gigabytes compressed) would cost the user about $12 for each backup, quickly limiting the number of backups the user maintains.


Examples

XXXVI. Example Cases

XXXVI. (1) Worst Case

[0548] As those familiar with the art of compression know, no compression technique is guaranteed to produce compression. Indeed, under worst-case conditions, every compression technique is guaranteed to produce output that is larger than its inputs.

[0549] Our Method is no exception and it is useful to understand this Method's limitations.
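The worst-case behavior described above can be observed with any general-purpose lossless compressor. The following is a minimal sketch that uses Python's standard `zlib` module as a stand-in; it is not the Method described in this patent, and the helper name `compression_overhead` and the sample inputs are assumptions made for illustration.

```python
import random
import zlib


def compression_overhead(data: bytes) -> int:
    """Return compressed size minus original size (positive means the output grew)."""
    return len(zlib.compress(data)) - len(data)


# Pseudo-random bytes carry almost no redundancy for the compressor to exploit.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(4096))

# Highly repetitive input is close to the best case.
repetitive = b"backup " * 512

if __name__ == "__main__":
    print("random input overhead:", compression_overhead(noise))
    print("repetitive input overhead:", compression_overhead(repetitive))
```

For input with no exploitable redundancy, the compressor still has to emit its own framing, so the overhead is positive; for the repetitive input it is strongly negative.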

[0550] The amount of compression that one is likely to get from our Method is, on average, dependent on the size of the DDS; the larger the DDS the more likely it is that our Method will be able to detect and eliminate common data redundancy.

[0551] Consider a DDS with zero entries. The Method will proceed, roughly, as follows:

[0552] A pointless “Reference File Analysis Phase” would be done to build a non-existent DDS. As usual, the Reference File would be physically and/or logically copied to the Extended Reference File while digests are inserted into the DDS. Since the DDS has, by our example, no entries then th...

Abstract

A method for reducing redundancy between two or more datasets of potentially very large size. The method improves upon current technology by oversubscribing the data structure that represents a digest of data blocks and by using positional information about matching data, so that very large datasets can be analyzed and their redundancies removed. Having found a match on a digest, the method expands the match in both directions in order to detect large runs of duplicate data, and eliminates them by replacing duplicate runs with references to common data. The method is particularly useful for capturing the states of images of a hard disk. The method permits several files to have their redundancy removed and the files to later be reconstituted. The method is appropriate for use on a WORM device. The method can also make use of L2 cache to improve performance.
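The match-and-expand idea in the abstract can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the patented implementation: the block size, the use of SHA-1, the plain dictionary index, and the names `build_index`, `deduplicate`, and `reconstitute` are all hypothetical, and the sketch does not model the oversubscribed digest structure, WORM constraints, or L2 cache use.

```python
import hashlib

BLOCK = 8  # hypothetical block size, kept tiny for demonstration


def build_index(reference: bytes, block: int = BLOCK) -> dict:
    """Map the digest of each aligned reference block to its offset."""
    index = {}
    for off in range(0, len(reference) - block + 1, block):
        digest = hashlib.sha1(reference[off:off + block]).digest()
        index.setdefault(digest, off)  # keep the first offset seen
    return index


def deduplicate(reference: bytes, target: bytes, block: int = BLOCK) -> list:
    """Encode target as ("literal", bytes) and ("copy", ref_offset, length) ops."""
    index = build_index(reference, block)
    ops = []
    lit_start = 0  # start of target bytes not yet emitted
    pos = 0
    while pos + block <= len(target):
        chunk = target[pos:pos + block]
        ref_off = index.get(hashlib.sha1(chunk).digest())
        # Verify the bytes so a digest collision cannot corrupt the output.
        if ref_off is None or reference[ref_off:ref_off + block] != chunk:
            pos += 1
            continue
        # Expand the match backward, but not into bytes already emitted.
        t_start, r_start = pos, ref_off
        while (t_start > lit_start and r_start > 0
               and target[t_start - 1] == reference[r_start - 1]):
            t_start -= 1
            r_start -= 1
        # Expand the match forward.
        t_end, r_end = pos + block, ref_off + block
        while (t_end < len(target) and r_end < len(reference)
               and target[t_end] == reference[r_end]):
            t_end += 1
            r_end += 1
        if t_start > lit_start:
            ops.append(("literal", target[lit_start:t_start]))
        ops.append(("copy", r_start, t_end - t_start))
        pos = lit_start = t_end
    if lit_start < len(target):
        ops.append(("literal", target[lit_start:]))
    return ops


def reconstitute(reference: bytes, ops: list) -> bytes:
    """Rebuild the original target from the reference and the op list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += reference[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

Calling `deduplicate(reference, target)` returns a list of operations, and `reconstitute(reference, ops)` recovers the target from it. Hashing at every byte position is slow; it is done here only to keep the sketch short, and a practical implementation would likely use a cheaper rolling digest.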

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. patent application Ser. No. 12/455,864, filed Jun. 8, 2009, which claims the benefit of U.S. Provisional Patent Application No. 61/059,276, each of which is hereby incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

[0002] This invention relates generally to a system and method for storing data. More particularly, this invention relates to a form of data size reduction which involves finding redundancies in and between large data sets (files) and eliminating these redundancies in order to conserve repository memory (generally, disk space).

BACKGROUND OF THE INVENTION

[0003] This invention relates generally to a system and method for storing data. More particularly, this invention relates to storing data efficiently in both the time and space domains by removing redundant data between two or more data sets.

[0004] The inventors of this invention noticed that there are many times when ...

Application Information

Patent Type & Authority: Applications (United States)
IPC: IPC(8): G06F17/30
CPC: G06F17/3015; G06F16/174; G06F16/2365; G06F16/217; G06F16/2379
Inventors: HELLER, STEVE; SHNELVAR, RALPH
Owner: CHRYSALIS STORAGE