Content locality-based caching in a data storage system

A content locality-based caching technology for data storage systems, applied in the field of data caching. It addresses the slow write and erase times and the significantly limited number of write/erase cycles that result from the physical properties of SSDs, so as to achieve high-speed random reads, reduced cost, and increased capacity.

Inactive Publication Date: 2012-05-31
VELOBIT
3 Cites, 130 Cited by

AI Technical Summary

Benefits of technology

[0009] Recent developments of flash memory-based solid-state drives (SSDs) have been very promising, with rapid increases in capacity and decreases in cost. Because SSDs are based on semiconductor technology, they may provide great advantages, including high-speed random reads, low power consumption, compact size, shock resistance, and the like. However, limitations that result from the physical properties of SSDs include write and erase times that are very slow compared to read times, and a significantly limited number of write/erase cycles before failure of a block.
[0010] The above physical properties and operational characteristics of SSDs present unique challenges in designing an SSD-based mass storage hierarchy. Therefore, we disclose herein cache management methods and systems that can address these unique challenges by effectively managing an SSD-based storage hierarchy to provide higher I/O performance, lower cost, longer durability, and higher data reliability.
[0011] Also, while the capacity of disk drives grows rapidly, their electromechanical parts have held down the improvement of their performance. Caching plays a critical role in modern systems, bridging the gap between a disk drive and the main memory. Applying content locality techniques to cache design may provide a significant improvement in performance, particularly when combined with the techniques for SSD use and optimization described herein. Content locality refers to the characteristic that many data blocks in a data storage system may share similar or even the same content.
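
As a rough illustration of content locality, the following Python sketch estimates how many fixed-size blocks in a byte stream repeat the content of another block. It is a minimal sketch under stated assumptions, not the method disclosed here: the 4096-byte block size, the use of SHA-256, and the names split_blocks and duplicate_fraction are all hypothetical, and the sketch counts only identical blocks, not merely similar ones.

```python
from __future__ import annotations

import hashlib
from collections import Counter

BLOCK_SIZE = 4096  # assumed block size in bytes (not specified by the source)


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a byte string into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def duplicate_fraction(blocks: list[bytes]) -> float:
    """Return the fraction of blocks that repeat the content of another block.

    Blocks are compared by SHA-256 digest, so only exact duplicates count.
    """
    if not blocks:
        return 0.0
    digests = Counter(hashlib.sha256(block).digest() for block in blocks)
    unique_blocks = len(digests)
    return (len(blocks) - unique_blocks) / len(blocks)


if __name__ == "__main__":
    # Three identical blocks plus one different block.
    sample = b"A" * (3 * BLOCK_SIZE) + b"B" * BLOCK_SIZE
    print(duplicate_fraction(split_blocks(sample)))
```

A higher fraction suggests more redundancy that a content-aware cache could exploit; detecting blocks that are similar but not identical requires a separate similarity measure.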

Problems solved by technology

Limitations that result from the physical properties of SSDs include write and erase times that are very slow compared to read times, and a significantly limited number of write/erase cycles before failure of a block.
These physical properties and operational characteristics of SSDs present unique challenges in designing an SSD-based mass storage hierarchy.
While the capacity of disk drives grows rapidly, their electromechanical parts have held down the improvement of their performance.
This may result in much more data redundancy than would be present when running a single OS on a data center server.

Examples

First embodiment

[0103] The inventive methods and systems described herein may be embedded inside a disk controller. Such an embodiment may include a disk controller board that is adapted to include a NAND-gate flash SSD or similar device, a GPU/CPU, and a DRAM buffer in addition to the existing disk control hardware and interfaces such as the host bus adapter (HBA). FIG. 8 depicts a block diagram of an HDD controller-embedded embodiment. A host system 802 may be connected to a disk controller 820 using a standard interface 812. Such an interface can be SCSI, SATA, SAS, PATA, iSCSI, FC, or the like. The flash memory 804 may be an SSD used to store reference blocks, compact delta blocks, hot independent blocks, and similar data. The intelligent processing unit 810 performs logical operations such as delta derivation, similarity detection, combining delta with reference blocks, managing reference blocks, managing metadata, and other operations described herein or known for maximizing SSD-based cachi...
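
The delta derivation and delta-combining operations named above can be illustrated with a short Python sketch. This is a hedged example under stated assumptions: the source does not specify the delta encoding, so the sketch assumes a byte-wise XOR against the reference block followed by zlib compression, and the names derive_delta and combine_delta are hypothetical.

```python
import zlib


def derive_delta(reference: bytes, block: bytes) -> bytes:
    """Derive a compact delta of `block` relative to `reference`.

    Assumption: the delta is the byte-wise XOR of the two blocks,
    compressed with zlib. Similar blocks give a mostly-zero XOR,
    which compresses well.
    """
    if len(reference) != len(block):
        raise ValueError("reference and block must have the same length")
    xor = bytes(r ^ b for r, b in zip(reference, block))
    return zlib.compress(xor)


def combine_delta(reference: bytes, delta: bytes) -> bytes:
    """Reconstruct the original block from a reference block and its delta."""
    xor = zlib.decompress(delta)
    if len(xor) != len(reference):
        raise ValueError("delta does not match the reference block length")
    return bytes(r ^ x for r, x in zip(reference, xor))


if __name__ == "__main__":
    reference = bytes(range(256)) * 16        # 4096-byte reference block
    changed = bytearray(reference)
    changed[100] ^= 0xFF                      # a single-byte difference
    block = bytes(changed)

    delta = derive_delta(reference, block)
    assert combine_delta(reference, delta) == block
    print(len(block), len(delta))             # the delta is much smaller
```

In this sketch a read of a delta-encoded block needs one read of the reference block plus a decompression and XOR, while a write that changes little only needs to store a small delta.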

Third embodiment

[0105] A third embodiment is implemented at the HBA level but includes no onboard flash memory. An external SSD drive such as a PCIe SSD, SAS SSD, SATA SSD, SCSI SSD, or other SSD drive may be used similarly to the SSD in the embodiment of FIG. 9. FIG. 10 depicts a block diagram describing this implementation. The HBA 1020 has an intelligent processing unit 1008 and a DRAM buffer 1004 in addition to the existing HBA control logic and interfaces. The host system 1002 may be connected to the system bus 1014, such as PCI, PCI-Express, PCI-X, HyperTransport, or InfiniBand. The bus interface 1010 allows the HBA card 1020 to be connected to the system bus 1014. The intelligent processing unit 1008 performs processing functions such as delta derivation, similarity detection, combining delta with reference blocks, managing reference blocks, executing cache algorithms that are described herein, managing metadata, and the like. The RAM cache 1004 temporarily stores deltas for active I/O operati...
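
One simple way to picture the similarity detection function mentioned above is to compare per-sub-block signatures of two blocks. The Python sketch below is illustrative only: the 512-byte sub-block size, the CRC32 signatures, the 0.5 threshold, and the names signatures, similarity, and find_reference are all assumptions rather than details taken from this disclosure.

```python
from __future__ import annotations

import zlib

SUB_BLOCK = 512  # assumed sub-block size in bytes


def signatures(block: bytes, sub_block: int = SUB_BLOCK) -> list[int]:
    """Compute one CRC32 signature per fixed-size sub-block."""
    return [zlib.crc32(block[i:i + sub_block])
            for i in range(0, len(block), sub_block)]


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of sub-block positions whose signatures match."""
    sig_a, sig_b = signatures(a), signatures(b)
    if not sig_a or len(sig_a) != len(sig_b):
        return 0.0
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


def find_reference(block: bytes, references: dict[str, bytes],
                   threshold: float = 0.5) -> str | None:
    """Return the id of the most similar reference block, or None
    if no reference reaches the (assumed) similarity threshold."""
    best_id, best_score = None, threshold
    for ref_id, ref in references.items():
        score = similarity(block, ref)
        if score >= best_score:
            best_id, best_score = ref_id, score
    return best_id


if __name__ == "__main__":
    ref = bytes(4096)
    near = bytearray(ref)
    near[0] = 1                                # differs in one sub-block
    print(similarity(bytes(near), ref))
    print(find_reference(bytes(near), {"r1": b"\xff" * 4096, "r2": ref}))
```

A block that finds a reference could then be stored as a delta against it, while a block with no sufficiently similar reference would be treated as an independent block.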

Fourth embodiment

[0106] While the above implementations can provide great performance improvements, all require redesigns of hardware such as a disk controller or an HBA card. A fourth implementation includes a software approach using commodity off-the-shelf hardware. A software application at the device driver level controls a separate SSD drive/card, a GPU/CPU embedded controller card, and an HDD connected to a system bus. FIG. 11 depicts a block diagram describing this software implementation. This implementation leverages standard off-the-shelf hardware such as an SSD drive 1114, an HDD 1118, and an embedded controller/GPU/CPU/MCU card 1120. All these standard hardware components may be connected to a standard system bus 1122, such as PCI, PCI-Express, PCI-X, HyperTransport, InfiniBand, and the like. The software for this fourth implementation may be divided into two parts: one running on a host computer system 1102 and another running on an embedded system 1120. One possible partition of softwar...

Abstract

A data storage caching architecture supports using native local memory such as host-based RAM and, if available, Solid State Disk (SSD) memory for storing pre-cache delta-compression based delta, reference, and independent data by exploiting content locality, temporal locality, and spatial locality of data accesses to primary (e.g., disk-based) storage. The architecture makes excellent use of the physical properties of the different types of memory available (fast read/write RAM, low-cost fast-read SSD, etc.) by applying algorithms to determine what types of data to store in each type of memory. Algorithms include similarity detection, delta compression, least popularly used cache management, conservative insertion and promotion cache replacement, and the like.
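
The idea of matching data types to memory types can be pictured with a small placement rule in Python. This is a sketch under stated assumptions, not the algorithm of this disclosure: the specific mapping (deltas to RAM, reference blocks to SSD, independent blocks to SSD only when hot), the hot_threshold value, and the names Kind, Tier, and choose_tier are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    REFERENCE = "reference"      # read-mostly block that deltas refer to
    DELTA = "delta"              # small, frequently updated difference
    INDEPENDENT = "independent"  # block with no similar reference


class Tier(Enum):
    RAM = "ram"          # fast read/write
    SSD = "ssd"          # low-cost, fast read, slow and limited writes
    PRIMARY = "primary"  # e.g., disk-based primary storage


def choose_tier(kind: Kind, access_count: int, hot_threshold: int = 4) -> Tier:
    """Pick a memory tier for a block (assumed policy, for illustration).

    - Deltas change often, so they go to RAM, which tolerates writes.
    - Reference blocks are read-mostly, so they go to SSD.
    - Independent blocks go to SSD only when accessed often enough.
    """
    if kind is Kind.DELTA:
        return Tier.RAM
    if kind is Kind.REFERENCE:
        return Tier.SSD
    return Tier.SSD if access_count >= hot_threshold else Tier.PRIMARY


if __name__ == "__main__":
    print(choose_tier(Kind.DELTA, access_count=1))
    print(choose_tier(Kind.INDEPENDENT, access_count=10))
```

The design intent of such a rule is to keep write-heavy data away from the SSD, whose write/erase cycles are limited, while still serving read-mostly data from it.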

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent application Ser. No. 13/366,846 entitled Pre-Cache Similarity-Based Delta Compression for use in a Data Storage System, filed Feb. 6, 2012, which is hereby incorporated herein by reference in its entirety.
[0002] U.S. patent application Ser. No. 13/366,846 claims the benefit of the following provisional applications, each of which is hereby incorporated herein by reference in its entirety: U.S. Ser. No. 61/441,976 entitled Intelligently Coupled Array of SSD and HDD, filed Feb. 11, 2011; U.S. Ser. No. 61/447,208 entitled Effective Page Classification and Delta Encoding AND Buffer Cache—Temporal and Content Localities, filed Feb. 28, 2011; and U.S. Ser. No. 61/497,549 entitled Conservative Insertion and Promotion Cache Replacement Algorithm, filed Jun. 16, 2011.
[0003] U.S. patent application Ser. No. 13/366,846 is a continuation-in-part of U.S. patent application Ser. No. 12/762,993 e...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F12/08, G06F12/02
CPC: G06F3/0611, G06F3/0613, G06F3/0616, G06F3/0659, G06F3/068, Y02B60/1225, G06F12/0897, G06F2212/214, G06F2212/311, G06F2212/312, G06F2212/401, G06F12/0866, Y02D10/00
Inventors: YANG, QING; REN, JIN
Owner: VELOBIT