Method of reducing redundancy between two or more datasets

A dataset redundancy reduction technology, applied in the field of systems and methods for storing data. It addresses the problems that backups consume a great deal of disk space, which limits the number of backups a user maintains, and that modern disk drives have undesirable properties. Its stated aims are high deduplication efficiency and high deduplication performance.

Inactive Publication Date: 2013-01-10

CHRYSALIS STORAGE

6 Cites, 40 Cited by


Benefits of technology

The invention is a method that reduces the amount of memory needed to remove wholesale duplicate data from large sets of files. This method is much more efficient than existing techniques, allowing for high deduplication ratios with small data blocks. This results in a much more efficient use of memory, making it faster and easier to de-duplicate large amounts of data.

Problems solved by technology

Maintaining multiple versions of such datasets consumes a great deal of disk space, so generally only relatively few versions, if any, are kept.

This Method is particularly efficacious when used with Repositories that have the hardware property of “read many, write many”, “read many, write few”, or “read many, write once”.

Modern disk drives also have the undesirable property that random seeks take milliseconds of time.

Modern solid-state memory has the undesirable properties of being far more expensive per byte than disk-based memory as well as (with some types of solid-state memory) limiting the number of rewrites before the device fails to be rewriteable.

Because users of computers accidentally delete files or their computers become infected by computer viruses, users often wish to retain multiple backups over time.

Thus, under an unsophisticated backup scheme, 250 gigabytes of uncompressed data (125 gigabytes compressed) would cost the user about $12 for each backup, quickly limiting the number of backups the user maintains.


XXXVI. Example Cases

XXXVI. (1) Worst Case

[0548]As those familiar with the art of compression know, no compression technique is guaranteed to produce compression. Indeed, under worst-case conditions, every compression technique is guaranteed to produce output that is larger than its inputs.
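The worst-case statement above rests on a counting argument, which the short Python sketch below checks for small sizes. It is an illustration added for clarity and is not taken from the patent text; the function names are hypothetical.

```python
def strings_of_length(n: int) -> int:
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n


def strings_shorter_than(n: int) -> int:
    """Number of distinct bit strings of length 0 .. n-1 bits."""
    return sum(2 ** k for k in range(n))  # equals 2**n - 1


def some_input_cannot_shrink(n: int) -> bool:
    """True when there are more n-bit inputs than strictly shorter outputs.

    A lossless compressor must map distinct inputs to distinct outputs,
    so whenever this holds, at least one n-bit input is not made shorter.
    """
    return strings_of_length(n) > strings_shorter_than(n)


for n in range(1, 9):
    assert some_input_cannot_shrink(n)
```

Because there is always one fewer shorter string than there are inputs of a given length, a lossless technique that shrinks some inputs has to enlarge others.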

[0549]Our Method is no exception and it is useful to understand this Method's limitations.

[0550]The amount of compression that one is likely to get from our Method is, on average, dependent on the size of the DDS; the larger the DDS the more likely it is that our Method will be able to detect and eliminate common data redundancy.

[0551]Consider a DDS with zero entries. The Method will proceed, roughly, as follows:

[0552]A pointless “Reference File Analysis Phase” would be done to build a non-existent DDS. As usual, the Reference File would be physically and/or logically copied to the Extended Reference File while digests are inserted into the DDS. Since the DDS has, by our example, no entries then th...
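To illustrate why the size of the DDS matters, the following minimal Python sketch models the DDS as a fixed-capacity table of block digests. This is a hypothetical simplification: the names `build_dds` and `count_matches`, the 8-byte block size, the use of SHA-1, and the slot-by-modulo policy are all assumptions made for illustration, not the patent's actual data structure.

```python
import hashlib

BLOCK = 8  # hypothetical block size in bytes


def digest(block: bytes) -> bytes:
    return hashlib.sha1(block).digest()


def build_dds(reference: bytes, capacity: int) -> list:
    """Fill a fixed-capacity table with digests of aligned reference blocks.

    When there are more blocks than slots, later digests overwrite
    earlier ones that land in the same slot.
    """
    table = [None] * capacity
    if capacity == 0:
        return table
    for off in range(0, len(reference) - BLOCK + 1, BLOCK):
        d = digest(reference[off:off + BLOCK])
        table[int.from_bytes(d[:4], "big") % capacity] = (d, off)
    return table


def count_matches(table: list, current: bytes) -> int:
    """Count aligned blocks of `current` whose digest is found in the table."""
    if not table:
        return 0
    hits = 0
    for off in range(0, len(current) - BLOCK + 1, BLOCK):
        d = digest(current[off:off + BLOCK])
        entry = table[int.from_bytes(d[:4], "big") % len(table)]
        if entry is not None and entry[0] == d:
            hits += 1
    return hits


reference = bytes(range(256))  # 32 distinct 8-byte blocks
current = reference            # a fully redundant second dataset

empty = count_matches(build_dds(reference, 0), current)
small = count_matches(build_dds(reference, 4), current)
large = count_matches(build_dds(reference, 1024), current)
assert empty == 0
assert small <= large
```

In this toy model a table with zero entries can never report a match, even though the second dataset is entirely redundant, and a larger table detects at least as many duplicate blocks as a smaller one.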


Abstract

A method for reducing redundancy between two or more datasets of potentially very large size. The method improves upon current technology by oversubscribing the data structure that represents a digest of data blocks and by using positional information about matching data, so that very large datasets can be analyzed and their redundancies removed. Having found a match on a digest, the method expands the match in both directions in order to detect and eliminate large runs of duplicate data, replacing duplicate runs with references to common data. The method is particularly useful for capturing the states of images of a hard disk. The method permits several files to have their redundancy removed and the files to later be reconstituted. The method is appropriate for use on a WORM device. The method can also make use of L2 cache to improve performance.
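The core idea of the abstract (match on a block digest, then expand the match in both directions and replace the run with a reference) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `index_reference`, `encode` and `decode`, the 4-byte block size, the SHA-1 digest, and the byte-by-byte scan are all hypothetical, and the oversubscribed digest structure and positional information are not modeled.

```python
import hashlib

BLOCK = 4  # hypothetical block size in bytes


def index_reference(ref: bytes) -> dict:
    """Map the digest of each aligned reference block to its offset."""
    return {hashlib.sha1(ref[o:o + BLOCK]).digest(): o
            for o in range(0, len(ref) - BLOCK + 1, BLOCK)}


def encode(ref: bytes, cur: bytes) -> list:
    """Return a list of ("lit", bytes) and ("ref", offset, length) ops."""
    index = index_reference(ref)
    ops, lit_start, pos = [], 0, 0
    while pos + BLOCK <= len(cur):
        r = index.get(hashlib.sha1(cur[pos:pos + BLOCK]).digest())
        if r is None or ref[r:r + BLOCK] != cur[pos:pos + BLOCK]:
            pos += 1  # no match here; slide forward one byte
            continue
        # Expand the match backward, but not into data already emitted.
        start, rstart = pos, r
        while (start > lit_start and rstart > 0
               and cur[start - 1] == ref[rstart - 1]):
            start -= 1
            rstart -= 1
        # Expand the match forward.
        end, rend = pos + BLOCK, r + BLOCK
        while end < len(cur) and rend < len(ref) and cur[end] == ref[rend]:
            end += 1
            rend += 1
        if start > lit_start:
            ops.append(("lit", cur[lit_start:start]))
        ops.append(("ref", rstart, end - start))
        lit_start = pos = end
    if lit_start < len(cur):
        ops.append(("lit", cur[lit_start:]))
    return ops


def decode(ref: bytes, ops: list) -> bytes:
    """Reconstitute the original data from the reference and the ops."""
    out = bytearray()
    for op in ops:
        out += op[1] if op[0] == "lit" else ref[op[1]:op[1] + op[2]]
    return bytes(out)


ref = b"The quick brown fox jumps over the lazy dog."
cur = b"XX quick brown fox jumps over the lazy cat!"
ops = encode(ref, cur)
assert decode(ref, ops) == cur
```

In this example a single digest hit on one 4-byte block grows into one long reference covering the whole shared run, which is why expansion lets small blocks still capture large duplicate regions.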

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001]This application is a continuation of U.S. patent application Ser. No. 12/455,864, filed Jun. 8, 2009, which claims the benefit of U.S. Provisional Patent Application No. 61/059,276, each of which is hereby incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

[0002]This invention relates generally to a system and method for storing data. More particularly, this invention relates to a form of data size reduction which involves finding redundancies in and between large data sets (files) and eliminating these redundancies in order to conserve repository memory (generally, disk space).

BACKGROUND OF THE INVENTION

[0003]This invention relates generally to a system and method for storing data. More particularly, this invention relates to storing data efficiently in both the time and space domains by removing redundant data between two or more data sets.

[0004]The inventors of this invention noticed that there are many times when ...


Application Information


Patent Type & Authority: Applications (United States)

IPC: IPC(8): G06F17/30

CPC: G06F17/3015, G06F16/174, G06F16/2365, G06F16/217, G06F16/2379

Inventor: HELLER, STEVE; SHNELVAR, RALPH

Owner: CHRYSALIS STORAGE
