Storage system and data processing method

The storage system optimizes throughput by compressing data in a snapshot virtual device and managing address mappings efficiently, addressing the increased load on storage controllers in the Redirect on Write method.

JP7875846B2 · Active · Publication Date: 2026-06-18 · HITACHI VANTARA LTD


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: HITACHI VANTARA LTD
Filing Date: 2023-11-28
Publication Date: 2026-06-18

IPC: G06F3/06; G06F3/08; G06F11/1446

CPC: G06F3/065; G06F3/0665; G06F3/0652; G06F3/0613; G06F16/1744; G06F16/128; G06F16/188

AI Tagging

Application Domain

Input/output to record carriers; Special data processing applications

Abstract

PROBLEM TO BE SOLVED: To achieve high throughput by efficiently utilizing resources of a virtual device.

SOLUTION: A processor uses a snapshot virtual device as a storage destination for data of a primary volume and a snapshot volume, compresses the data stored in the snapshot virtual device and stores it in a compression virtual device, and stores the data stored in the compression virtual device in a storage device. On receiving a write request from a host, the processor switches, in accordance with the size of the address range of the write destination, between overwriting processing, which overwrites an already allocated area on the snapshot virtual device for data of a large size, and new allocation processing, which allocates a new area on the snapshot virtual device to the address range of the write destination for data of a small size. The processor compresses a plurality of small-size data stored in the new area and stores them collectively in the compression virtual device.

SELECTED DRAWING: Figure 30

Description

Technical Field

[0001] The present invention relates to a storage system and a data processing method.

Background Art

[0002] In recent years, the need for data utilization has grown, and with it the opportunities for data replication. As a result, the snapshot function has become increasingly important in storage systems. Redirect on Write (RoW) is a typical conventional method of realizing snapshots. Because the RoW method copies neither data nor meta information, creating a snapshot has little effect on I/O performance. The RoW method is widely adopted in AFA (All Flash Array) devices. The RoW method appends data: when data is written to the storage system, the data stored before the write is not overwritten; instead, the write target data is stored in a new area, and the meta information is rewritten to refer to the data stored in the new area.
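The append behavior described above can be illustrated with a minimal sketch. It is not the patented implementation: the class name `RowVolume`, the in-memory list used as the data area, and the use of `collections.ChainMap` to freeze the mapping without copying it are all assumptions made for illustration.

```python
from collections import ChainMap


class RowVolume:
    """Toy redirect-on-write volume (illustrative assumption, not the patent's design)."""

    def __init__(self):
        self.log = []              # append-only data area
        self.mapping = ChainMap()  # meta information: logical address -> index in log

    def write(self, lba, data):
        # Store the write target data in a new area; old data is never overwritten.
        self.log.append(data)
        # Rewrite the meta information so that it refers to the new area.
        self.mapping[lba] = len(self.log) - 1

    def read(self, lba, mapping=None):
        # Read through the live mapping, or through a frozen snapshot mapping.
        m = self.mapping if mapping is None else mapping
        return self.log[m[lba]]

    def snapshot(self):
        # Freeze the current mapping; later writes go to a new child layer,
        # so neither data nor existing mapping entries are copied.
        frozen = self.mapping
        self.mapping = frozen.new_child()
        return frozen


vol = RowVolume()
vol.write(0, b"A0")
snap = vol.snapshot()
vol.write(0, b"A1")  # redirected to a new area; the snapshot still sees A0
```

After the second write, the live volume reads the new data while the snapshot mapping still resolves to the original data, and both versions remain in the data area.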

[0003] In such data management to which deduplication technology and snapshot technology are applied, when the address in the virtual device of data is changed due to garbage collection or other reasons, if the address is the reference destination of a plurality of different addresses in a plurality of snapshot families, it is necessary to change the reference destination address for each of the plurality of addresses. For this reason, it takes a long time to change the address mapping, and as a result, the time for the entire process involving the change of the address mapping becomes long.

[0004] Patent Document 1 presents a method that reduces the number of references to duplicate data and makes garbage collection more efficient by placing data shared among a plurality of VOLs through deduplication or snapshots in a virtual space separate from the storage destination of data referenced by only one VOL.

Prior Art Documents

[0005] [Patent Document 1] U.S. Patent No. 10817209

Summary of the Invention

Problems to Be Solved by the Invention

[0006] However, adding a virtual device space as described in Patent Document 1 increases the amount of mapping information referenced and updated during read/write I/O processing, which increases the load on the storage controller and reduces throughput. Therefore, the objective of this invention is to achieve both garbage collection performance and high throughput by efficiently utilizing the resources of virtual devices.

Means for Solving the Problem

[0007] To achieve the above objective, one representative storage system of the present invention comprises a storage device and a processor that accesses the storage device. The processor manages, as a snapshot family, a primary volume that is subject to read/write operations by a host and a snapshot volume generated from the primary volume. The processor uses a snapshot virtual device, which is a logical address space associated with the snapshot family, as a storage location for the data of the primary volume and the snapshot volume, compresses the data stored in the snapshot virtual device and stores it in a compressed virtual device, and stores the data stored in the compressed virtual device in the storage device. When the processor receives a write request from the host, it switches, depending on the size of the address range of the write destination, between an overwrite process that, for large data, overwrites an already allocated area on the snapshot virtual device, and a new allocation process that, for small data, allocates a new area on the snapshot virtual device to the address range of the write destination. The processor compresses a plurality of small data stored in the new area and stores them together in the compressed virtual device.
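The size-dependent switch described in this paragraph might be sketched as follows. This is only a sketch under stated assumptions: the threshold value, the function names `choose_write_path` and `compress_small_writes`, the fallback to new allocation when no area is allocated yet, and the use of `zlib` as the compressor are all hypothetical, since the text here does not specify them.

```python
import zlib

# Hypothetical threshold: the source does not state what counts as "large".
LARGE_WRITE_THRESHOLD = 256 * 1024  # bytes


def choose_write_path(length, area_allocated, threshold=LARGE_WRITE_THRESHOLD):
    """Pick the write process from the size of the write-destination address range."""
    if length >= threshold and area_allocated:
        # Large data: overwrite the already allocated area on the snapshot virtual device.
        return "overwrite"
    # Small data (or, as an assumption, no area allocated yet): allocate a new area.
    return "new_allocation"


def compress_small_writes(chunks):
    """Compress several small writes gathered in the new area as one unit."""
    return zlib.compress(b"".join(chunks))


path_large = choose_write_path(1024 * 1024, area_allocated=True)
path_small = choose_write_path(4096, area_allocated=True)
packed = compress_small_writes([b"a" * 4096, b"b" * 4096])
```

Batching several small writes before compression means a single compressed unit, and therefore a single store operation, reaches the compressed virtual device.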
Furthermore, one representative data processing method of the present invention is a data processing method for a storage system comprising a storage device and a processor that accesses the storage device. The processor manages, as a snapshot family, a primary volume that is subject to read/write operations by a host and a snapshot volume generated from the primary volume. The processor uses a snapshot virtual device, which is a logical address space associated with the snapshot family, as a storage location for the data of the primary volume and the snapshot volume. The processor compresses the data stored in the snapshot virtual device and stores it in a compressed virtual device, and stores the data stored in the compressed virtual device in the storage device. When the processor receives a write request from the host, it switches between an overwrite process that, for large data, overwrites an already allocated area on the snapshot virtual device, and a new allocation process that, for small data, allocates a new area on the snapshot virtual device to the address range of the write destination. The processor compresses a plurality of small data stored in the new area and stores them together in the compressed virtual device.

Effects of the Invention

[0008] According to the present invention, high throughput can be achieved by efficiently utilizing the resources of virtual devices. Other issues, configurations, and effects will be clarified by the following description of embodiments.

Brief Description of the Drawings

[0009]
[Figure 1] A block diagram showing the hardware configuration of the storage system according to the present invention.
[Figure 2] A schematic diagram illustrating the memory control of the storage system in the present invention.
[Figure 3] An overview diagram of the management of the mapping between addresses in SS-Family and addresses in SS-VDEV.
[Figure 4] An overview diagram of the management of the mapping between addresses in SS-VDEV and addresses in Dedup-VDEV, between addresses in SS-VDEV and addresses in CR-VDEV, and between addresses in Dedup-VDEV and addresses in CR-VDEV.
[Figure 5] An explanatory diagram showing the memory configuration of the storage controller in the present invention.
[Figure 6] An explanatory diagram showing the configuration of the control information area in the memory of the storage controller according to the present invention.
[Figure 7] An explanatory diagram showing the configuration of the program area in the memory of the storage controller according to the present invention.
[Figure 8] A diagram showing the configuration of the owner rights management table.
[Figure 9] A diagram showing the configuration of the CR-VDEV management table.
[Figure 10] A diagram showing the configuration of the snapshot management table.
[Figure 11] A diagram showing the configuration of the VOL-Dir management table.
[Figure 12] A diagram showing the configuration of the latest generation table.
[Figure 13] A diagram showing the configuration of the recovery management table.
[Figure 14] A diagram showing the configuration of the generation management tree table.
[Figure 15] A diagram showing the configuration of the snapshot allocation management table.
[Figure 16] A diagram showing the configuration of the Dir management table.
[Figure 17] A diagram showing the configuration of the SS-Mapping management table.
[Figure 18] A diagram showing the configuration of the compression allocation management table.
[Figure 19] A diagram showing the configuration of the CR-Mapping management table.
[Figure 20] A diagram showing the configuration of the Dedup-Dir management table.
[Figure 21] A diagram showing the configuration of the Dedup allocation management table.
[Figure 22] A diagram showing the configuration of the Pool-Mapping management table.
[Figure 23] A diagram showing the configuration of the Pool allocation management table.
[Figure 24] A diagram showing the flow of the snapshot acquisition process.
[Figure 25] A diagram showing the flow of the snapshot restore process.
[Figure 26] A diagram showing the flow of the snapshot deletion process.
[Figure 27] A diagram showing the flow of the asynchronous recovery process.
[Figure 28] A diagram showing the flow of the Write process (front end).
[Figure 29] A diagram showing the flow of the Write process (back end).
[Figure 30] A diagram showing the flow of the snapshot allocation determination process.
[Figure 31] A diagram showing the flow of the snapshot append process.
[Figure 32] A diagram showing the flow of the Dedup append process.
[Figure 33] A diagram showing the flow of the compression append process.
[Figure 34] A diagram showing the flow of the destage process.
[Figure 35] A diagram showing the flow of the Read process.
[Figure 36] A diagram showing the flow of the GC (Garbage Collection) process.

Mode for Carrying Out the Invention

[0010] Hereinafter, embodiments will be described with reference to the drawings.

Embodiment

[0011] Figure 1 shows the hardware configuration of the computer system. The computer system 100 includes a storage system 201, a server system 202, and a management system 203. The storage system 201 and the server system 202 are connected via a storage network 204 using FC (Fibre Channel) or the like. The storage system 201 and the management system 203 are connected via a management network 205 using IP (Internet Protocol) or the like. Note that the storage network 204 and the management network 205 may be the same communication network.

[0012] The storage system 201 comprises multiple storage controllers 210 and multiple SSDs 220. Multiple SSDs 220 are connected to the storage controllers 210. Multiple SSDs 220 are an example of persistent storage. Pool 13 is configured based on multiple SSDs 220. Data stored in page 14 of pool 13 is stored in one or more SSDs 220.

[0013] The storage controller 210 includes a CPU 211, memory 212, a backend interface 213, a frontend interface 214, and a management interface 215.

[0014] CPU 211 executes the program stored in memory 212. Memory 212 stores programs executed by CPU 211, as well as data used by CPU 211. Memory 212 and CPU 211 may be redundant.

[0015] The backend interface 213, frontend interface 214, and management interface 215 are examples of interface devices. The backend interface 213 is a communication interface device that mediates data exchange between the SSDs 220 and the storage controller 210. Multiple SSDs 220 are connected to the backend interface 213. The frontend interface 214 is a communication interface device that mediates data exchange between the server system 202 and the storage controller 210. The server system 202 is connected to the frontend interface 214 via the storage network 204. The management interface 215 is a communication interface device that mediates data exchange between the management system 203 and the storage controller 210. The management system 203 is connected to the management interface 215 via the management network 205.

[0016] The server system 202 consists of one or more host devices. The server system 202 sends an I/O request (write request or read request) to the storage controller 210, specifying the I/O destination. The I/O destination is, for example, a logical volume number such as a LUN (Logical Unit Number), or a logical address such as an LBA (Logical Block Address). The management system 203 comprises one or more management devices. The management system 203 manages the storage system 201.

[0017] Figure 2 shows an overview of the storage system's memory control. In Figure 2, data represented by uppercase letters (data A, B, C, ...) is block data, and data represented by lowercase letters (data a, b, c, ...) is subblock data. Block data may be data in units of blocks. A block may be a fixed-length logical memory area (logical address range). Subblock data is data obtained by compressing block data, and is stored in a subblock group (one or more subblocks). A subblock may be a logical memory area smaller than a block. For example, the size of a block may be an integer multiple of the size of a subblock.

[0018] A storage system having a storage device and a processor includes an SS-Family (snapshot family) 9, an SS-VDEV (snapshot virtual device) 11S, a Dedup-VDEV (deduplication virtual device) 11D, a CR-VDEV (compressed append virtual device) 11C, and a pool 13.

[0019] SS-Family9 is a VOL group that includes PVOL10P and SVOL10S, which is a snapshot of PVOL10P. SS-VDEV11S is a virtual device serving as a logical address space, and is the storage location for data whose storage destination is any VOL10 in SS-Family9. Dedup-VDEV11D is a virtual device with a logical address space separate from SS-VDEV11S, and is the storage location for data duplicated across two or more SS-VDEV11Ss.

[0020] CR-VDEV11C is a virtual device with a logical address space separate from SS-VDEV11S and Dedup-VDEV11D, and is the storage location for compressed data. Each of the multiple CR-VDEV11Cs is associated with either SS-VDEV11S or Dedup-VDEV11D, and is never associated with both. In other words, each CR-VDEV11C is the storage location for data whose storage destination is the VDEV (virtual device) corresponding to that CR-VDEV11C, and is not the storage location for data whose storage destination is a VDEV that does not correspond to that CR-VDEV11C. Compressed data whose storage destination is CR-VDEV11C is stored in pool 13.

[0021] Pool 13 is a logical address space based on at least a portion of the storage devices (e.g., persistent storage) of the storage system. Pool 13 may be based on at least a portion of storage devices external to the storage system (e.g., persistent storage), in addition to or instead of at least a portion of the storage devices of the storage system. Pool 13 has multiple pages 14, which are multiple logical areas. Compressed data whose storage destination is CR-VDEV11C is stored in pages 14 in Pool 13. The mapping between addresses in CR-VDEV11C and addresses in Pool 13 is 1:1. Pool 13 is composed of one or more pool VOLs.

[0022] According to the example shown in Figure 2, the following memory control is performed. The processor creates SVOL10S0 as a snapshot of PVOL10P0, thereby creating SS-Family9-0 with PVOL10P0 as the root volume. The processor also creates SVOL10S1 as a snapshot of PVOL10P1, thereby creating SS-Family9-1 with PVOL10P1 as the root volume. Figure 2 shows that SS-Family9-0 and 9-1 are examples of multiple SS-Families.

[0023] The storage system has one or more SS-VDEV11S for each of the multiple SS-Family9s. For each SS-Family9, the data whose destination is any VOL10 in that SS-Family9 is stored in the SS-VDEV11S corresponding to that SS-Family9 from among the multiple SS-VDEV11S. Taking SS-Family9-0 as an example, specifically, it is as follows: The processor uses SS-VDEV11S0 as the storage location for data A, which is stored in SVOL10S0 of SS-Family9-0. The processor maps the address in SVOL10S0 corresponding to data A to the address in SS-VDEV11S0 corresponding to data A, which is associated with SS-Family9-0. If the same data B exists in multiple VOLs (PVOL10P0 and SVOL10S0) of SS-Family9-0, the processor maps multiple addresses of the same data B in those multiple VOLs (addresses in PVOL10P0 and SVOL10S0) to the address of SS-VDEV11S0 of SS-Family9-0 (addresses corresponding to data B).

[0024] For each of SS-Family9-0 and 9-1 (an example of two or more SS-Family9s), the storage location for non-duplicate data is the CR-VDEV11C corresponding to that SS-Family9, and the storage location for duplicate data is Dedup-VDEV11D. In other words, since data C is duplicated in SS-VDEV11S0 and 11S1 of SS-Family9-0 and 9-1 (an example of two or more SS-VDEV11S), the processor maps the two addresses of the duplicate data C in SS-VDEV11S0 and 11S1 to the addresses in Dedup-VDEV11D that correspond to the duplicate data C. The processor then compresses the duplicate data C and sets the storage location of the compressed data c to CR-VDEV11CC, which corresponds to Dedup-VDEV11D. In other words, the processor maps the address (block address) of the duplicate data C in Dedup-VDEV11D to the address (subblock address) of the compressed data c in CR-VDEV11CC. The processor also allocates page 14B to CR-VDEV11CC and stores the compressed data c in page 14B. The address of compressed data c in CR-VDEV11CC is mapped to the address in page 14B of pool 13.

[0025] On the other hand, in SS-VDEV11S0, data A does not duplicate any data in the other SS-VDEV11S1, so the processor compresses the non-duplicate data A and sets the storage location of the compressed data a to CR-VDEV11C0, which corresponds to SS-VDEV11S0. That is, the processor maps the address (block address) of the non-duplicate data A in SS-VDEV11S0 to the address (subblock address) of the compressed data a in CR-VDEV11C0. The processor also allocates page 14A to CR-VDEV11C0 and stores the compressed data a in page 14A. The address of the compressed data a in CR-VDEV11C0 is mapped to the address in page 14A of pool 13.
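The layered mappings of Figure 2 can be traced with a small sketch. The dictionary layout, the function name `resolve`, and all numeric addresses are hypothetical; only the device names and the routing (non-duplicate data A through CR-VDEV11C0 to page 14A, duplicate data C through Dedup-VDEV11D and CR-VDEV11CC to page 14B) follow the description.

```python
# All numeric addresses below are made up for illustration.
VOL_TO_SS = {
    ("SVOL10S0", 0): ("SS-VDEV11S0", 0),   # data A
    ("PVOL10P0", 2): ("SS-VDEV11S0", 2),   # data C in SS-Family9-0
    ("PVOL10P1", 2): ("SS-VDEV11S1", 2),   # data C in SS-Family9-1
}
SS_TO_NEXT = {
    ("SS-VDEV11S0", 0): ("CR-VDEV11C0", 0),    # non-duplicate: straight to CR-VDEV
    ("SS-VDEV11S0", 2): ("Dedup-VDEV11D", 0),  # duplicate: via Dedup-VDEV
    ("SS-VDEV11S1", 2): ("Dedup-VDEV11D", 0),
}
DEDUP_TO_CR = {("Dedup-VDEV11D", 0): ("CR-VDEV11CC", 0)}
CR_TO_POOL = {
    ("CR-VDEV11C0", 0): ("page14A", 0),
    ("CR-VDEV11CC", 0): ("page14B", 0),
}


def resolve(vol, address):
    """Follow the mappings from a VOL address down to a pool page address."""
    ss_addr = VOL_TO_SS[(vol, address)]
    target = SS_TO_NEXT[ss_addr]
    if target[0].startswith("Dedup-VDEV"):
        # Duplicate data takes one extra hop through the deduplication device.
        target = DEDUP_TO_CR[target]
    return CR_TO_POOL[target]


location_a = resolve("SVOL10S0", 0)
location_c0 = resolve("PVOL10P0", 2)
location_c1 = resolve("PVOL10P1", 2)
```

Both addresses of data C resolve to the same pool location, while data A resolves through its own CR-VDEV.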

[0026] The CR-VDEV11C is a write-once (append-type) VDEV. Therefore, the processor updates the address mapping whether the CR-VDEV11C corresponding to SS-VDEV11S or the CR-VDEV11C corresponding to Dedup-VDEV11D is used as the storage location for the update data. Specifically, the processor performs the following memory control, for example. If the data to be stored in CR-VDEV11C0 is the updated data a' of compressed data a, the processor sets the storage location for the updated data a' to an available address in CR-VDEV11C0 and invalidates the address of the compressed data a before the update. The processor maps the address in SS-VDEV11S0 that was mapped to the address of compressed data a to the storage location address of the updated data a' in CR-VDEV11C0, replacing the address of compressed data a in CR-VDEV11C0. The processor also maps the address in page 14A that was mapped to the address of compressed data a to the storage location address of the updated data a' in CR-VDEV11C0, replacing the address of compressed data a in CR-VDEV11C0. If the data to be stored in CR-VDEV11CC is the updated data c' of compressed data c, the processor sets the storage location for the updated data c' to an available address in CR-VDEV11CC and invalidates the address of the compressed data c before the update. The processor maps the address in Dedup-VDEV11D that was mapped to the address of compressed data c to the storage location address of the updated data c' in CR-VDEV11CC, replacing the address of compressed data c in CR-VDEV11CC. The processor also maps the address in page 14B that was mapped to the address of compressed data c to the storage location address of the updated data c' in CR-VDEV11CC, replacing the address of compressed data c in CR-VDEV11CC. Generally, the above mappings are managed in small units such as 4KB or 8KB to maximize the effect of reducing the amount of data through snapshots and deduplication.
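A minimal sketch of this update sequence follows. The class `CrVdev` and its fields are hypothetical names, and the pool-side mapping is omitted on the assumption that, being 1:1 with CR-VDEV addresses, it changes in lockstep.

```python
class CrVdev:
    """Toy append-only compressed device (illustrative assumption)."""

    def __init__(self):
        self.slots = []    # compressed data, appended in arrival order
        self.valid = []    # one validity flag per slot
        self.forward = {}  # upper address (SS-VDEV or Dedup-VDEV) -> slot index

    def append(self, upper_addr, compressed):
        old = self.forward.get(upper_addr)
        if old is not None:
            # Invalidate the address of the data before the update.
            self.valid[old] = False
        # Store the update data at the next available address.
        self.slots.append(compressed)
        self.valid.append(True)
        # Re-point the upper address to the new storage location.
        new_index = len(self.slots) - 1
        self.forward[upper_addr] = new_index
        return new_index


cr = CrVdev()
first = cr.append(("SS-VDEV11S0", 0), b"a")    # compressed data a
second = cr.append(("SS-VDEV11S0", 0), b"a'")  # updated data a'
```

The invalidated slot stays in place until garbage collection reclaims it.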

[0027] The CR-VDEV11C is a write-once type VDEV as described above, and garbage collection is performed. In other words, the processor performs garbage collection on the CR-VDEV11C to ensure that valid addresses (addresses of the latest data) are contiguous, and that addresses in the free memory area are also contiguous.

[0028] As shown in the example in Figure 2, in addition to SS-VDEV11S, which is the data storage location for SS-Family9, Dedup-VDEV11D is provided as the storage location for duplicate data in two or more SS-Family9s. Therefore, even if the address of the duplicate data C in Dedup-VDEV11D changes, only two address mappings (mappings for each of the two addresses in SS-VDEV11S0 and 11S1) need to be changed. On the other hand, in one comparative example, the data storage location for SS-Family9 and the storage location for duplicate data in two or more SS-Family9s are the same VDEV, and in this case, four address mappings (mappings for each of the four addresses in VOL10P0, 10S0, 10P1, and 10S1) need to be changed for the duplicate data C. In this embodiment, it is expected that the address mapping can be changed in a shorter time than in the comparative example.

[0029] In addition to Dedup-VDEV11D, CR-VDEV11CC is provided for Dedup-VDEV11D. Therefore, even if the address of compressed data c in CR-VDEV11CC changes, only one address mapping (a mapping for one address in Dedup-VDEV11D) needs to be changed. On the other hand, in one comparative example, the storage location of the compressed data in SS-Family9 and the storage location of the compressed data of duplicate data in two or more SS-Family9 are the same VDEV. In this case, the address mapping to be changed for the compressed data c of the duplicate data C is four mappings (a mapping for each of the four addresses in VOL10P0, 10S0, 10P1, and 10S1). In this embodiment, it is expected that the address mapping can be changed in a shorter time than in the comparative example.
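The mapping-update counts in the two preceding paragraphs can be checked with a short sketch that counts how many source addresses refer to each destination address. The helper `referrer_counts` and all numeric addresses are hypothetical; the device and volume names follow Figure 2.

```python
from collections import Counter


def referrer_counts(forward_map):
    """Count, per destination address, how many source addresses map to it."""
    return Counter(forward_map.values())


# Embodiment: duplicate data C is referenced through separate layers.
ss_to_dedup = {
    ("SS-VDEV11S0", 2): ("Dedup-VDEV11D", 0),
    ("SS-VDEV11S1", 2): ("Dedup-VDEV11D", 0),
}
dedup_to_cr = {("Dedup-VDEV11D", 0): ("CR-VDEV11CC", 0)}

# Comparative example: all four VOL addresses refer to one shared VDEV directly.
vol_to_shared = {
    ("VOL10P0", 2): ("shared-VDEV", 0),
    ("VOL10S0", 2): ("shared-VDEV", 0),
    ("VOL10P1", 2): ("shared-VDEV", 0),
    ("VOL10S1", 2): ("shared-VDEV", 0),
}

updates_if_dedup_addr_moves = referrer_counts(ss_to_dedup)[("Dedup-VDEV11D", 0)]
updates_if_cr_addr_moves = referrer_counts(dedup_to_cr)[("CR-VDEV11CC", 0)]
updates_in_comparative = referrer_counts(vol_to_shared)[("shared-VDEV", 0)]
```

The number of mappings to rewrite when an address moves equals the number of referrers one layer up, which is why adding layers shortens the update.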

[0030] The address change of compressed data c is performed during garbage collection in the CR-VDEV11CC, which corresponds to Dedup-VDEV11D. For example, during garbage collection in the CR-VDEV11CC, the processor changes the address of the updated data c' (updated data of compressed data c) in the CR-VDEV11CC, and maps the address in Dedup-VDEV11D that was mapped to the address before the change to the changed address in the CR-VDEV11CC. Since the address mapping for the compressed data of duplicate data is expected to be changed in a short time, garbage collection is also expected to be performed in a short time. Note that garbage collection in the CR-VDEV11C, which corresponds to SS-VDEV11S, includes the following processing, for example. The processor changes the address of the update data a' (the update data for compressed data a) in CR-VDEV11C0, and maps the address in SS-VDEV11S0 that was mapped to the address before the change to the changed address in CR-VDEV11C0.
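Garbage collection of this kind can be sketched as a compaction pass. The function `garbage_collect`, the list-based slot layout, and the reverse map from slot index to upper address are assumptions for illustration; the real process is the one shown in Figure 36.

```python
def garbage_collect(slots, valid, reverse, forward):
    """Compact valid slots to the front and re-point the forward mapping.

    slots:   compressed data in a CR-VDEV, by slot index
    valid:   validity flag per slot
    reverse: slot index -> upper address (SS-VDEV or Dedup-VDEV), valid slots only
    forward: upper address -> slot index (updated in place)
    """
    new_slots, new_reverse = [], {}
    for old_index, data in enumerate(slots):
        if not valid[old_index]:
            continue  # invalidated data is dropped, freeing its area
        new_index = len(new_slots)
        new_slots.append(data)
        upper = reverse[old_index]
        forward[upper] = new_index  # one mapping update per moved datum
        new_reverse[new_index] = upper
    return new_slots, new_reverse


slots = [b"c", b"x", b"c'"]
valid = [False, True, True]
reverse = {1: ("Dedup-VDEV11D", 5), 2: ("Dedup-VDEV11D", 0)}
forward = {("Dedup-VDEV11D", 5): 1, ("Dedup-VDEV11D", 0): 2}
slots, reverse = garbage_collect(slots, valid, reverse, forward)
```

After the pass, valid data occupies contiguous addresses at the front and each moved datum required exactly one forward-mapping update.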

[0031] For at least one CR-VDEV11C, a write-once VDEV that stores uncompressed data may be used instead of the CR-VDEV11C; however, in this embodiment, the CR-VDEV11C is used as the write-once VDEV. Therefore, the data ultimately stored in the memory is compressed data, and thus the amount of storage capacity consumed can be reduced.

[0032] Figure 3 shows an overview of the management of address mapping between SS-Family9 and SS-VDEV11S. In the figure, "GX" (where X is a non-negative integer) represents generation X. Figure 3 uses SS-Family9-0 and SS-VDEV11S0 as examples.

[0033] The processor can manage the mapping between addresses in SS-Family9-0's VOL10 and addresses in SS-VDEV11S0 using metadata. The metadata includes Dir-Info (directory information) and SS-Mapping-Info (snapshot mapping information). The processor manages the data in PVOL10P0 and SVOL10S0 by associating Dir-Info and SS-Mapping-Info. For data stored in VOL10, Dir-Info contains information representing the source address (address in VOL10), and the corresponding SS-Mapping-Info contains information representing the destination address (address in SS-VDEV11S0).

[0034] Furthermore, the processor manages the time series of PVOL10P0 and SVOL10S0 using generation information associated with Dir-Info, and for each data stored in SS-VDEV11S0, it manages generation information indicating the generation in which the data was created, associated with SS-Mapping-Info. In addition, the processor manages the latest generation information at that time as the latest generation.

[0035] Assume that, prior to the snapshot acquisition, data A0, B0, and C0 exist with PVOL10P0 as the storage location. Also, assume that the latest generation is "0". The Dir-Info associated with PVOL10P0 has "0" as its generation number (a number representing the generation) and contains reference information indicating the destination of all data A0, B0, and C0 of PVOL10P0. Hereafter, if the generation number associated with the Dir-Info is "X", it can be stated that the Dir-Info is of generation X.

[0036] SS-VDEV11S0 is designated as the storage location for data A0, B0, and C0, and an SS-Mapping-Info is associated with each of these data points. Furthermore, each SS-Mapping-Info is associated with the generation number "0". If the generation number associated with an SS-Mapping-Info represents "X", then the data corresponding to that SS-Mapping-Info is considered to be generation X data.

[0037] Before snapshot acquisition, for each of the data A0, B0, and C0, the information in Dir-Info refers to the SS-Mapping-Info corresponding to that data. By associating Dir-Info and SS-Mapping-Info in this way, PVOL10P0 and SS-VDEV11S0 can be associated, enabling data processing for PVOL10P0.

[0038] To take a snapshot, the processor copies the Dir-Info of PVOL10P0 and uses the copy as the Dir-Info of the read-only SVOL10S0. Then, the processor increments the generation of the Dir-Info in PVOL10P0, and also increments the latest generation. As a result, for each of the data A0, B0, and C0, the SS-Mapping-Info is referenced by both the generation 0 Dir-Info and the generation 1 Dir-Info.

[0039] In this way, snapshots can be created by duplicating Dir-Info, and snapshots can be created without increasing the data on SS-VDEV11S0 or the SS-Mapping-Info.

[0040] When a snapshot is taken, the snapshot (SVOL10S0), which is write-protected and whose data is fixed at the time of acquisition, becomes Generation 0, and PVOL10P0, which can still be written to after acquisition, becomes Generation 1. Generation 0 is "one generation older in the direct lineage" than Generation 1 and is referred to as the "parent" for convenience. Similarly, Generation 1 is "one generation newer in the direct lineage" than Generation 0 and is referred to as the "child" for convenience. The storage system manages the parent-child relationships of generations as the Dir-Info generation management tree 70. Also, the generation # of Dir-Info is the same as the generation # of the VOL10 corresponding to that Dir-Info. Furthermore, the generation # of SS-Mapping-Info is the oldest generation # among the generation #s of one or more Dir-Infos that reference that SS-Mapping-Info.
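Snapshot acquisition by duplicating Dir-Info might look like the sketch below. The class and field names are hypothetical, and setting the PVOL generation to the incremented latest generation is a simplification that matches the single-snapshot example in the text.

```python
from dataclasses import dataclass


@dataclass
class SsMappingInfo:
    ss_vdev_addr: int  # destination address in SS-VDEV (made-up values below)
    generation: int    # generation in which the data was created


@dataclass
class DirInfo:
    generation: int
    refs: dict         # VOL address -> SsMappingInfo (objects are shared, not copied)


class SnapshotFamily:
    def __init__(self):
        self.latest_generation = 0
        self.pvol_dir = DirInfo(generation=0, refs={})
        self.tree = {}  # child generation -> parent generation

    def take_snapshot(self):
        # Duplicate only the Dir-Info; data and SS-Mapping-Info are untouched.
        svol_dir = DirInfo(self.pvol_dir.generation, dict(self.pvol_dir.refs))
        # Increment the latest generation and the PVOL's Dir-Info generation.
        self.latest_generation += 1
        self.tree[self.latest_generation] = svol_dir.generation
        self.pvol_dir.generation = self.latest_generation
        return svol_dir


family = SnapshotFamily()
family.pvol_dir.refs = {
    0: SsMappingInfo(100, 0),  # data A0
    1: SsMappingInfo(101, 0),  # data B0
    2: SsMappingInfo(102, 0),  # data C0
}
svol_dir = family.take_snapshot()
```

The snapshot keeps generation 0, the writable PVOL becomes generation 1, and both Dir-Infos reference the same SS-Mapping-Info objects.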

[0041] Figure 4 shows an overview of the mapping management between addresses in SS-VDEV11S and Dedup-VDEV11D, between addresses in SS-VDEV11S and CR-VDEV11C, and between addresses in Dedup-VDEV11D and CR-VDEV11C. Figure 4 uses SS-VDEV11S0, CR-VDEV11C0, and 11CC as examples.

[0042] The processor can use metadata to manage the mapping between addresses in SS-VDEV11S0 and addresses in Dedup-VDEV11D, the mapping between addresses in SS-VDEV11S0 and addresses in CR-VDEV11C0, and the mapping between addresses in Dedup-VDEV11D and addresses in CR-VDEV11CC. The metadata includes Dir-Info and CR-Mapping-Info. The processor manages the data in SS-VDEV11S0 and Dedup-VDEV11D by associating Dir-Info and CR-Mapping-Info. For data stored in SS-VDEV11S0, Dir-Info contains information representing the source address (address in SS-VDEV11S0), and the corresponding CR-Mapping-Info contains information representing the destination address (address in CR-VDEV11C0 or address in Dedup-VDEV11D). For data stored in Dedup-VDEV11D, Dir-Info contains information representing the source address (address in Dedup-VDEV11D), and the corresponding CR-Mapping-Info contains information representing the destination address (address in CR-VDEV11CC). The processor can identify the address in SS-VDEV11S or Dedup-VDEV11D from the address in CR-VDEV11C by referring to the compression allocation information. Although not shown in the diagram, the storage system 201 maintains a compression allocation management table 1011 in memory 212 as reverse mapping information from addresses in CR-VDEV to addresses in SS-VDEV or Dedup-VDEV, so that valid data on CR-VDEV can be moved and contiguous free space secured.

[0043] Figure 5 shows the configuration of memory 212. The memory 212 includes a control information unit 901 where control information (which may also be called management information) is stored, a program unit 902 where programs are stored, and a cache unit 903 where data is temporarily stored.

[0044] Figure 6 shows the information stored in the control information unit 901. The control information unit 901 stores the owner rights management table 1001, the CR-VDEV management table 1002, the snapshot management table 1003, the VOL-Dir management table 1004, the latest generation table 1005, the recovery management table 1006, the generation management tree table 1007, the snapshot allocation management table 1008, the Dir management table 1009, the SS-Mapping management table 1010, the compression allocation management table 1011, the CR-Mapping management table 1012, the Dedup-Dir management table 1013, the Dedup allocation management table 1014, the Pool-Mapping management table 1015, and the Pool allocation management table 1016.

[0045] Figure 7 shows the programs stored in the program unit 902. The program unit 902 stores a snapshot acquisition program 1101, a snapshot restore program 1102, a snapshot deletion program 1103, an asynchronous recovery program 1104, a read/write program 1105, a snapshot append program 1106, a Dedup append program 1107, a compression append program 1108, a destaging program 1109, a GC (garbage collection) program 1110, a CPU determination program 1111, an owner transfer program 1112, and a snapshot allocation determination program 1113.

[0046] Figure 8 shows the configuration of the owner rights management table 1001. The owner rights management table 1001 manages the owner rights for VOL10 or VDEV11. For example, the owner rights management table 1001 has entries for each VOL10 and each VDEV11. Each entry contains information such as VOL# / VDEV#1201 and ownerCPU#1202.

[0047] VOL# / VDEV#1201 represents the identification number of VOL10 or VDEV11. OwnerCPU#1202 represents the identification number of the CPU that is the owner CPU of VOL10 or VDEV11 (the CPU that has ownership rights to VOL10 or VDEV11). Furthermore, instead of allocating the owner CPUs on a per-CPU 211 basis, they may be allocated on a per-CPU group basis, or on a per-storage controller 210 basis.

[0048] Figure 9 shows the configuration of the CR-VDEV management table 1002. The CR-VDEV management table 1002 represents a CR-VDEV11C associated with either an SS-VDEV11S or a Dedup-VDEV11D. For example, the CR-VDEV management table 1002 has an entry for each SS-VDEV11S and each Dedup-VDEV11D. The entries contain information such as VDEV#1301 and CR-VDEV#1302. VDEV#1301 represents the identification number for SS-VDEV11S or Dedup-VDEV11D. CR-VDEV#1302 represents the identification number for CR-VDEV11C.

[0049] Figure 10 shows the configuration of the snapshot management table 1003. Snapshot management table 1003 exists for each PVOL10P (each SS-Family9). Snapshot management table 1003 represents the acquisition time of each snapshot (SVOL10S). For example, snapshot management table 1003 has an entry for each SVOL10S. The entry contains information such as PVOL#1401, SVOL#1402, and acquisition time 1403. PVOL#1401 represents the identification number of PVOL10P. SVOL#1402 represents the identification number of SVOL10S. Acquisition time 1403 represents the acquisition time of SVOL10S.

[0050] Figure 11 shows the configuration of the VOL-Dir management table 1004. The VOL-Dir management table 1004 represents the correspondence between VOL and Dir-Info. For example, the VOL-Dir management table 1004 has an entry for every VOL10. The entry contains information such as VOL#1501, Root-VOL#1502, and Dir-Info#1503.

[0051] VOL#1501 represents the identification number of PVOL10P or SVOL10S. Root-VOL#1502 represents the identification number of Root-VOL. If VOL10 is PVOL10P, then Root-VOL is that PVOL10P, and if VOL10 is SVOL10S, then Root-VOL is the PVOL10P corresponding to that SVOL10S. Dir-Info#1503 represents the identification number of Dir-Info corresponding to VOL10.

[0052] Figure 12 shows the configuration of the latest generation table 1005. The latest generation table 1005 exists for every PVOL10P (every SS-Family9) and represents the generation (generation #) of that PVOL10P.

[0053] Figure 13 shows the configuration of the recovery management table 1006. The recovery management table 1006 is, for example, a bitmap, and exists for every PVOL10P (every SS-Family9), or in other words, for every Dir-Info generation management tree 70. The recovery management table 1006 has an entry for each Dir-Info. The entry contains information such as Dir-Info#1701 and recovery request 1702. Dir-Info#1701 represents the identification number of the Dir-Info. Recovery request 1702 indicates whether recovery of the Dir-Info is requested: "1" means recovery is requested, and "0" means it is not.

[0054] Figure 14 shows the structure of the generation management tree table 1007. The generation management tree table 1007 exists for every PVOL10P (every SS-Family9), or in other words, for every Dir-Info generation management tree 70. The generation management tree table 1007 has an entry for each Dir-Info. The entry contains information such as Dir-Info#1801, generation#1802, Prev1803, and Next1804. Dir-Info#1801 represents the identification number of the Dir-Info. Generation#1802 represents the generation of VOL10 corresponding to the Dir-Info. Prev1803 represents the parent (one level above) Dir-Info of the Dir-Info. Next1804 represents the child (one level below) Dir-Info of the Dir-Info. The number of Next1804 fields can be the same as the number of child Dir-Infos. In Figure 14, there are two child Dir-Infos, so there are two Next1804 fields (Next-A1804A and Next-B1804B).

[0055] Figure 15 shows the configuration of the snapshot allocation management table 1008. The snapshot allocation management table 1008 exists for each SS-VDEV11S and represents the mapping from addresses in SS-VDEV11S to addresses in VOL10. The snapshot allocation management table 1008 has an entry for each address in SS-VDEV11S. The entry contains information such as block address 1901, status 1902, allocated VOL#1903, and allocated address 1904.

[0056] Block address 1901 represents the address of the block in SS-VDEV11S. Status 1902 indicates whether the block is allocated to an address of any VOL ("1" means allocated, "0" means free). Allocated VOL#1903 represents the identification number of the VOL10 (PVOL10P or SVOL10S) that has the address to which the block is allocated ("n / a" means unallocated). Allocated address 1904 represents the address (block address) to which the block is allocated ("n / a" means unallocated).

[0057] Figure 16 shows the structure of the Dir management table 1009. The Dir management table 1009 exists for each Dir-Info and represents the Mapping-Info referenced for each data (each block of data). For example, the Dir management table 1009 has an entry for each address (block address). The entry contains information such as the VOL / VDEV address 2001 and the referenced Mapping-Info#2002. VOL / VDEV address 2001 represents the address (block address) in VOL10 (PVOL10P or SVOL10S) or the address in VDEV11 (SS-VDEV11S or Dedup-VDEV11D). Reference Mapping-Info#2002 represents the identification number of the reference Mapping-Info.

[0058] Figure 17 shows the configuration of the SS-Mapping management table 1010. SS-Mapping management table 1010 exists for each Dir-Info in VOL10. SS-Mapping management table 1010 has an entry for each SS-Mapping-Info corresponding to the Dir-Info in VOL10. Each entry contains information such as Mapping-Info#2101, reference address 2102, reference SS-VDEV#2103, and generation#2104.

[0059] Mapping-Info#2101 represents the identification number of SS-Mapping-Info. Reference address 2102 represents the address referenced by SS-Mapping-Info (the address in SS-VDEV11S). Reference SS-VDEV#2103 represents the identification number of SS-VDEV11S that has the address referenced by SS-Mapping-Info. Generation#2104 represents the generation of data corresponding to SS-Mapping-Info.

[0060] Figure 18 shows the configuration of the compression allocation management table 1011. The compression allocation management table 1011 exists for each CR-VDEV11C and contains compression allocation information for each subblock in the CR-VDEV11C. The compression allocation management table 1011 has an entry corresponding to the compression allocation information for each subblock in the CR-VDEV11C. The entry contains information such as the subblock address 2201, data length 2202, status 2203, starting subblock address 2204, assigned VDEV# 2205, and assigned address 2206.

[0061] Subblock address 2201 represents the address of the subblock. Data length 2202 represents the number of subblocks that make up the subblock group (one or more subblocks) in which the compressed data is stored (for example, "2" means that the compressed data occupies two subblocks). Status 2203 represents the status of the subblock ("0" means free, "1" means allocated, and "2" means subject to GC (garbage collection)). Starting subblock address 2204 represents the address of the first subblock of the subblock group that contains the subblock (the one or more subblocks in which the compressed data is stored). Assigned VDEV#2205 represents the identification number of the VDEV11 (SS-VDEV11S or Dedup-VDEV11D) that has the block to which the subblock is allocated. Assigned address 2206 represents the address of the block to which the subblock is allocated (the block address in SS-VDEV11S or Dedup-VDEV11D).

[0062] Figure 19 shows the configuration of the CR-Mapping management table 1012. The CR-Mapping management table 1012 exists for each Dir-Info of Dedup-VDEV11D and for each Dir-Info of SS-VDEV11S. The CR-Mapping management table 1012 has an entry for each CR-Mapping-Info corresponding to the Dir-Info of Dedup-VDEV11D and for each CR-Mapping-Info corresponding to the Dir-Info of SS-VDEV11S. The entry contains information such as Mapping-Info#2301, reference address 2302, reference CR-VDEV#2303, and data length 2304.

[0063] Mapping-Info#2301 represents the identification number of CR-Mapping-Info. Reference address 2302 represents the address referenced by CR-Mapping-Info (the address of the first subblock in the subblock group). Reference CR-VDEV#2303 represents the identification number of CR-VDEV11C that has the subblock address referenced by CR-Mapping-Info. Data length 2304 represents the number of blocks referenced by CR-Mapping-Info (blocks in Dedup-VDEV11D), or the number of subblocks that make up the subblock group referenced by CR-Mapping-Info.

[0064] Figure 20 shows the configuration of the Dedup-Dir management table 1013. The Dedup-Dir management table 1013 exists for each Dedup-VDEV11D and corresponds to Dedup-Dir-Info. The Dedup-Dir management table 1013 has an entry for each address in Dedup-VDEV11D. The entry contains information such as the Dedup-VDEV address 2401 and reference destination allocation information #2402. Dedup-VDEV address 2401 represents the address (block address) in Dedup-VDEV11D. Reference destination allocation information #2402 represents the identification number of the referenced Dedup allocation information.

[0065] Figure 21 shows the configuration of the Dedup allocation management table 1014. The Dedup allocation management table 1014 exists for each Dedup-VDEV11D (for each Dedup-Dir-Info) and represents the reverse mapping from the Dedup allocation information corresponding to an address in Dedup-VDEV11D to an address in SS-VDEV11S. The Dedup allocation management table 1014 has an entry for each piece of Dedup allocation information. The entry contains information such as allocation information #2501, allocated SS-VDEV#2502, allocated address 2503, and linked allocation information #2504. Allocation information #2501 represents the identification number of the Dedup allocation information. Allocated SS-VDEV#2502 represents the identification number of the SS-VDEV11S that has the address referenced by the Dedup allocation information. Allocated address 2503 represents the address referenced by the Dedup allocation information (the block address in the SS-VDEV11S). Linked allocation information #2504 represents the identification number of the Dedup allocation information linked to this Dedup allocation information.

[0066] According to Figure 21, Dedup allocation information #1 is linked to Dedup allocation information #3, and no Dedup allocation information is linked to Dedup allocation information #3. It can therefore be seen that the data at the Dedup-VDEV address corresponding to Dedup allocation information #1 duplicates both the data in the SS-VDEV11S referenced by Dedup allocation information #1 and the data in the SS-VDEV11S referenced by Dedup allocation information #3. Since the number of duplicates is not fixed, pieces of Dedup allocation information are linked according to the number of duplicates. If duplicate data exists in N SS-VDEV11S, N linked pieces of Dedup allocation information are prepared.
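
The linked structure in [0065] and [0066] behaves like a singly linked list keyed by allocation information #. The sketch below walks such a chain. The field names, the SS-VDEV numbers, and the addresses are invented for illustration; only the link from #1 to #3 is taken from the description of Figure 21.

```python
# Hypothetical model of the Dedup allocation management table 1014.
# Keys are allocation information #; "linked" plays the role of field 2504.
dedup_alloc = {
    1: {"ss_vdev": 0, "address": 0x100, "linked": 3},
    3: {"ss_vdev": 1, "address": 0x2A0, "linked": None},
}


def referrers(table, first_no):
    """Return every (SS-VDEV#, block address) that shares one deduplicated block."""
    result = []
    no = first_no
    while no is not None:
        entry = table[no]
        result.append((entry["ss_vdev"], entry["address"]))
        no = entry["linked"]  # follow the link until the chain ends
    return result


sharers = referrers(dedup_alloc, 1)
```

With N duplicates the chain holds N entries, so the number of referrers does not need to be known in advance.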

[0067] Figure 22 shows the configuration of the Pool-Mapping management table 1015. The Pool-Mapping management table 1015 exists for each CR-VDEV11C. The Pool-Mapping management table 1015 has an entry for each area in the CR-VDEV11C that corresponds to a page size. The entry contains information such as VDEV address 2601 and page #2602. VDEV address 2601 represents the starting address of a region (e.g., multiple blocks) in page size units. Page #2602 represents the identification number of the assigned page 14 (e.g., the address of page 14 in pool 13). If there are multiple pools 13, page #2602 may include the identification number of the pool 13 containing page 14.

[0068] Figure 23 shows the configuration of the Pool allocation management table 1016. The Pool allocation management table 1016 exists for each pool 13, for example, if there are multiple pools 13. The Pool allocation management table 1016 represents the correspondence between page 14 and the area in CR-VDEV11C. The Pool allocation management table 1016 has an entry for each page 14. The entry contains information such as page #2701, RG #2702, starting address 2703, status 2704, assigned VDEV #2705, and assigned address 2706.

[0069] Page #2701 represents the identification number of page 14. RG#2702 represents the identification number of the RAID group on which page 14 is based (in this embodiment, a RAID group consisting of two or more SSDs 220). Starting address 2703 represents the starting address of page 14. Status 2704 represents the status of page 14 ("1" means allocated, and "0" means free). Assigned VDEV#2705 represents the identification number of the CR-VDEV11C to which page 14 is allocated ("n / a" means unassigned). Assigned address 2706 represents the assigned address of page 14 (the address in the CR-VDEV11C) ("n / a" means unassigned).

[0070] Figure 24 shows the flow of the snapshot acquisition process. The snapshot acquisition process is executed by the snapshot acquisition program 1101 in response to a snapshot acquisition instruction from the management system 203 (or another system such as the server system 202). In the snapshot acquisition instruction, for example, the target PVOL10P is specified.

[0071] First, the snapshot acquisition program 1101 allocates the destination Dir management table 1009 and updates the VOL-Dir management table 1004 (S2401). The snapshot acquisition program 1101 increments the latest generation number (S2402) and updates the generation management tree table 1007 (Dir-Info generation management tree 70) (S2403). At this time, the snapshot acquisition program 1101 sets the latest generation number as the source and the previous generation number as the destination.

[0072] The snapshot acquisition program 1101 determines whether or not there is cache dirty data for the target PVOL10P (S2404). "Cache dirty data" refers to data stored in the cache unit 903 that has not yet been written to pool 13.

[0073] If the result of the determination in S2404 is true (S2404:Yes), the snapshot acquisition program 1101 causes the snapshot append program 1106 to execute the snapshot append process (S2405). If the result of S2404 is false (S2404: No), or after S2405, the snapshot acquisition program 1101 copies the Dir management table 1009 of the target PVOL10P to the destination Dir management table 1009 (S2406).

[0074] Subsequently, the snapshot acquisition program 1101 updates the snapshot management table 1003 (S2407) and terminates the process. In S2407, an entry is added that has PVOL#1401 representing the identification number of the target PVOL10P, SVOL#1402 representing the identification number of the acquired snapshot (SVOL10S), and acquisition time 1403 representing the acquisition time.
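
The acquisition flow of Figure 24 can be sketched as follows. This is a simplified model under stated assumptions: the state dictionary, the flush_dirty callback (standing in for the snapshot append process), and the field names are hypothetical, and the generation management tree update of S2403 is not modeled.

```python
import time


def acquire_snapshot(state, pvol_no, svol_no, flush_dirty):
    """Simplified sketch of S2401 to S2407 for one PVOL."""
    state["dirs"][svol_no] = {}                      # S2401: allocate destination Dir table
    state["latest_generation"] += 1                  # S2402: increment latest generation #
    # S2403 (generation management tree update) is omitted in this sketch.
    if state["dirty"]:                               # S2404: cache dirty data present?
        flush_dirty(state, pvol_no)                  # S2405: snapshot append process
    # S2406: copy the PVOL's Dir table, so the snapshot shares the same mappings.
    state["dirs"][svol_no] = dict(state["dirs"][pvol_no])
    state["snapshots"].append(                       # S2407: record the new snapshot
        {"pvol": pvol_no, "svol": svol_no, "time": time.time()}
    )


def example_flush(state, pvol_no):
    """Hypothetical stand-in: give each dirty block a placeholder Mapping-Info#."""
    for addr in list(state["dirty"]):
        state["dirs"][pvol_no][addr] = 99
        del state["dirty"][addr]


state = {
    "dirs": {0: {10: 1}},        # VOL# -> {block address: Mapping-Info#}
    "latest_generation": 0,
    "dirty": {11: b"new"},
    "snapshots": [],
}
acquire_snapshot(state, pvol_no=0, svol_no=1, flush_dirty=example_flush)
```

Because only the Dir table is copied, no user data moves at acquisition time, and later writes to the PVOL change its own table without touching the snapshot's copy.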

[0075] Figure 25 shows the flow of the snapshot restore process. The snapshot restore process is executed by the snapshot restore program 1102 in response to a restore instruction from the management system 203 (or another system such as the server system 202). The restore instruction specifies, for example, the source SVOL and the destination PVOL.

[0076] First, the snapshot restore program 1102 assigns the destination Dir management table 1009 and updates the VOL-Dir management table 1004 (S2501). The snapshot restore program 1102 increments the latest generation number (S2502) and updates the generation management tree table 1007 (Dir-Info generation management tree 70) (S2503). At this time, the snapshot restore program 1102 sets the source to the generation number before the increment and the destination to the latest generation number.

[0077] The snapshot restore program 1102 purges the cache area (the area in the cache unit 903) of the restore destination PVOL (S2504). The snapshot restore program 1102 copies the Dir management table 1009 of the source SVOL to the Dir management table 1009 of the destination PVOL (S2505).

[0078] Subsequently, the snapshot restore program 1102 registers the Dir-Info# of the old Dir-Info of the restore destination in the recovery management table 1006 (S2506), and then terminates the process. In S2506, the recovery request 1702 corresponding to the said Dir-Info# is set to "1".

[0079] Figure 26 shows the flow of the snapshot deletion process. The snapshot deletion process is executed by the snapshot deletion program 1103 in response to a snapshot deletion instruction from the management system 203 (or another system such as the server system 202). In the snapshot deletion instruction, for example, the target SVOL is specified.

[0080] First, the snapshot deletion program 1103 refers to the VOL-Dir management table 1004 and disables the Dir-Info (Dir-Info#1503) of the target SVOL (S2601). Then, the snapshot deletion program 1103 updates the snapshot management table 1003 (S2602), registers the old Dir-Info# of the target SVOL in the recovery management table 1006 (S2603), and terminates the process. In S2603, the recovery request 1702 corresponding to the Dir-Info# is set to "1".

[0081] Figure 27 shows the flow of the asynchronous recovery process. The asynchronous recovery process is executed, for example, periodically by the asynchronous recovery program 1104. First, the asynchronous recovery program 1104 identifies the Dir-Info# to be recovered from the recovery management table 1006 (S2701). The "Dir-Info# to be recovered" is a Dir-Info# for which the recovery request 1702 is "1". The asynchronous recovery program 1104 refers to the generation management tree table 1007, checks the entry for the Dir-Info# to be recovered, and does not select a Dir-Info that has two or more children.

[0082] Subsequently, the asynchronous recovery program 1104 determines whether or not there are any unprocessed entries (S2702). Here, "unprocessed entries" refers to entries in the recovery management table 1006 for which the recovery request 1702 is "1" but which the asynchronous recovery process has not yet processed.

[0083] If the result of the determination in S2702 is true (S2702:Yes), the asynchronous recovery program 1104 selects the entry to be processed (an entry whose recovery request 1702 is "1") from the one or more unprocessed entries (S2703), and identifies the referenced Mapping-Info#2002 from the Dir management table 1009 corresponding to the target Dir-Info (the Dir-Info identified from Dir-Info#1701 in the entry to be processed) (S2704).

[0084] The asynchronous recovery program 1104 refers to the generation management tree table 1007 and determines whether or not a child generation Dir-Info of the target Dir-Info exists (S2705). If the result of the determination in S2705 is true (S2705:Yes), the asynchronous recovery program 1104 identifies the referenced Mapping-Info#2002 from the Dir management table 1009 corresponding to the child generation Dir-Info, and determines whether the referenced Mapping-Info#2002 of the target Dir-Info matches the referenced Mapping-Info#2002 of the child generation Dir-Info (S2706). If the result of the determination in S2706 is true (S2706:Yes), the process returns to S2702.

[0085] If the result of the determination in S2706 is false (S2706:No), or if the result of the determination in S2705 is false (S2705:No), the asynchronous recovery program 1104 determines whether the generation # of the parent generation Dir-Info of the target Dir-Info is older than the generation #2104 of the Mapping-Info referenced by the target Dir-Info (see Figure 17) (S2707). If the result of the determination in S2707 is false (S2707:No), the process returns to S2702.

[0086] If the result of the determination in S2707 is true (S2707:Yes), the asynchronous recovery program 1104 initializes the target entry in the SS-Mapping management table 1010 and releases the target entry in the snapshot allocation management table 1008 (S2708). Then, the process returns to S2702. The release in S2708 corresponds to the release of a block in SS-VDEV.

[0087] If the result of the determination in S2702 is false (S2702: No), the asynchronous recovery program 1104 updates the recovery management table 1006 (S2709) and also updates the generation management tree table 1007 (Dir-Info generation management tree 70) (S2710), and then terminates the process.
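
The release decision made in S2704 to S2707 of Figure 27 by program 1104 can be condensed into a single predicate. The sketch below is one possible reading: it assumes that an "older" generation means a smaller generation #, that Dir tables are dictionaries from block address to Mapping-Info#, and that the function and parameter names are hypothetical.

```python
def can_release(addr, target_dir, child_dir, parent_generation, mapping_generation):
    """Return True if the SS-VDEV block mapped at addr may be released (S2708).

    target_dir / child_dir: {block address: Mapping-Info#}; child_dir is None
    when no child generation Dir-Info exists.
    mapping_generation: {Mapping-Info#: generation #2104}.
    """
    mapping_no = target_dir[addr]                      # S2704
    if child_dir is not None:                          # S2705
        if child_dir.get(addr) == mapping_no:          # S2706: child still shares the data
            return False
    # S2707: the parent generation must predate the data, otherwise the parent shares it.
    return parent_generation < mapping_generation[mapping_no]


target_dir = {0: 7, 1: 8}
child_dir = {0: 7, 1: 9}
mapping_generation = {7: 2, 8: 2}

shared_with_child = can_release(0, target_dir, child_dir, 1, mapping_generation)
exclusive = can_release(1, target_dir, child_dir, 1, mapping_generation)
shared_with_parent = can_release(1, target_dir, child_dir, 2, mapping_generation)
```

A block is released only when neither the child generation nor the parent generation can still reference it, which is why both checks must pass.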

[0088] Figure 28 shows the flow of the Write process (frontend). The Write process (frontend) is executed by the read / write program 1105 when a write request is received from the server system 202.

[0089] First, the read / write program 1105 determines whether the data targeted by the write request is a cache hit (S2801). A "cache hit" means that a cache area corresponding to the write destination VOL address of the target data (the VOL address specified in the write request) has already been allocated. If the result of the determination in S2801 is false (S2801: No), the read / write program 1105 allocates a cache area from the cache unit 903 corresponding to the write destination VOL address of the target data (S2802). After that, the process proceeds to S2806.

[0090] If the result of S2801 is true (S2801:Yes), the read / write program 1105 determines whether the cache-hit data (data in the allocated cache area) is dirty data (data not yet reflected (not written) to pool 13) (S2803). If the result of S2803 is false (S2803:No), the process proceeds to S2806.

[0091] If the result of S2803 is true (S2803:Yes), the read / write program 1105 determines whether the WR (Write) generation # of the dirty data matches the generation # of the data targeted by the write request (S2804). The "WR generation #" is stored, for example, in the cache data management information (not shown). The generation # of the data targeted by the write request is obtained from the latest generation #403. S2804 prevents snapshot data from being overwritten: without this check, the dirty data belonging to a snapshot acquired immediately beforehand could be updated with the data targeted by the write request before the append process for that dirty data has completed.

[0092] If the result of the S2804 check is false (S2804: No), the read / write program 1105 instructs the snapshot append program 1106 to perform the snapshot append process (S2805).

[0093] After S2802, or if the result of S2804 is true (S2804: Yes), the read / write program 1105 writes the data subject to the write request to the cache area allocated in S2802, or to the cache area obtained through S2805 (S2806). Then, the read / write program 1105 sets the WR generation # of the data written in S2806 to the latest generation # compared in S2804 (S2807), and returns a good response to the server system 202 (S2808).
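
The frontend flow of Figure 28 can be sketched as below. The CacheSlot class, the dictionary cache, and the snapshot_append callback are assumptions for illustration, as is marking the slot dirty after the write, which the text implies but does not state as a step.

```python
class CacheSlot:
    """Hypothetical cache area for one VOL address."""

    def __init__(self):
        self.data = None
        self.dirty = False
        self.wr_generation = None


def frontend_write(cache, addr, data, latest_generation, snapshot_append):
    slot = cache.get(addr)                                   # S2801: cache hit?
    if slot is None:
        slot = cache[addr] = CacheSlot()                     # S2802: allocate cache area
    elif slot.dirty and slot.wr_generation != latest_generation:  # S2803, S2804
        # The dirty data belongs to an older generation (a snapshot was taken
        # since it was written), so it must be appended before being replaced.
        snapshot_append(addr, slot.data)                     # S2805
    slot.data = data                                         # S2806
    slot.dirty = True                                        # assumption: now unreflected
    slot.wr_generation = latest_generation                   # S2807
    return "Good"                                            # S2808


appended = []
cache = {}
record = lambda addr, data: appended.append((addr, data))
frontend_write(cache, 0x10, b"a", 1, record)   # miss: allocate and write
frontend_write(cache, 0x10, b"b", 1, record)   # same generation: overwrite in cache
frontend_write(cache, 0x10, b"c", 2, record)   # generation changed: append b"b" first
```

In this run only the third write triggers the append callback, because it is the only one that would otherwise destroy data captured by a snapshot.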

[0094] Figure 29 shows the flow of the Write process (backend). The Write process (backend) is the process of writing unreflected data (dirty data) to the pool 13 if such data exists in the cache unit 903. The Write process (backend) is performed synchronously or asynchronously with the Write process (frontend). The Write process (backend) is executed by the read / write program 1105.

[0095] The read / write program 1105 determines whether or not there is dirty data on the cache unit 903 (S2901). If the result of the determination in S2901 is true (S2901:Yes), the read / write program 1105 causes the snapshot allocation determination program 1113 to execute the snapshot allocation determination process (S2902).

[0096] Figure 30 shows the flow of the Snapshot allocation determination process. The Snapshot allocation determination process determines whether to allocate a new area on SS-VDEV11S for the area of PVOL10P that is the target of the write request from the host, or to overwrite an area already allocated on SS-VDEV11S. The Snapshot allocation determination process is executed by the Snapshot allocation determination program 1113, which is called from the read / write program 1105.

[0097] The Snapshot allocation determination program 1113 first determines whether the generation # of the Dir-Info of the target VOL (the VOL to which the data is written) matches the generation # of the SS-Mapping-Info before the appending (S3001). If the result of the determination in S3001 is false (S3001: No), the Snapshot allocation determination program 1113 executes the Snapshot appending process (S3007).

[0098] If the result of S3001 is true (S3001:Yes), the process proceeds to S3002. In this case, the data in the range for which the write request has been received from the host is not referenced by any other snapshot (hereinafter referred to as the standalone state), so overwriting the same address on SS-VDEV makes it possible to avoid updating the snapshot allocation management table and the SS-Mapping management table.

[0099] Next, the Snapshot allocation determination program 1113 obtains the transfer length of the Write data requested by the host computer and determines whether that length is less than or equal to a threshold (S3002). If the transfer length is less than or equal to the threshold (S3002: Yes), the program proceeds to S3003. If the transfer length is greater than the threshold (S3002: No), the program proceeds to S3006. The threshold marks the point at which one of two approaches becomes better for performance than the other: overwriting the data without allocating a new area in the Snapshot space and proceeding to the Dedup append process, or allocating a new area in the Snapshot space and updating the mapping information all at once (Snapshot append process). The specific threshold is set according to the actual program implementation and performance characteristics.

[0100] Next, the Snapshot allocation determination program 1113 determines whether it is possible to allocate a new contiguous area on SS-VDEV for the area to be written (S3003). Specifically, it refers to the snapshot allocation management table 1008 to determine whether it is possible to secure an unallocated contiguous area. If it determines that allocation is possible (S3003: Yes), the Snapshot allocation determination program 1113 executes the Snapshot append process (S3007). If it determines that allocation is not possible (S3003: No), it proceeds to S3004.

[0101] Next, the Snapshot allocation determination program 1113 determines whether a new VDEV can be allocated on SS-VDEV (S3004). If it determines that allocation is possible (S3004: Yes), it proceeds to S3005 and allocates the new VDEV to SS-VDEV. Next, the Snapshot allocation determination program 1113 executes the Snapshot append process (S3007) and allocates a new area on SS-VDEV. If it determines that a new VDEV cannot be allocated (S3004: No), it proceeds to S3006 and performs an overwrite on SS-VDEV.

[0102] If the process proceeds to S3006, the Snapshot allocation determination program 1113 executes the Dedup append process. In this case, unlike when the process proceeds to the snapshot append process in S3007, the snapshot append process is skipped, and the same address on SS-VDEV is overwritten. This eliminates the need to update the snapshot allocation management table and the SS-Mapping management table.
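
The branching of Figure 30 (S3001 to S3007) reduces to a small decision function. The following sketch is a hedged illustration: the Action enum, the function name, and the boolean inputs that summarize the table lookups of S3003 and S3004 are assumptions, and the threshold used in the example is an arbitrary value, since the specification leaves it implementation-dependent.

```python
from enum import Enum


class Action(Enum):
    SNAPSHOT_APPEND = "allocate a new area on SS-VDEV (S3007)"
    OVERWRITE = "overwrite the same SS-VDEV address via Dedup append (S3006)"


def decide(dir_generation, mapping_generation, transfer_length, threshold,
           contiguous_area_available, new_vdev_available):
    if dir_generation != mapping_generation:   # S3001: No, data is shared with a snapshot
        return Action.SNAPSHOT_APPEND
    if transfer_length > threshold:            # S3002: No, large write
        return Action.OVERWRITE
    if contiguous_area_available:              # S3003: Yes
        return Action.SNAPSHOT_APPEND
    if new_vdev_available:                     # S3004: Yes, then S3005 extends SS-VDEV
        return Action.SNAPSHOT_APPEND
    return Action.OVERWRITE                    # S3004: No


THRESHOLD = 64 * 1024  # hypothetical value for the example only

small_write = decide(1, 1, 4 * 1024, THRESHOLD, True, False)
large_write = decide(1, 1, 128 * 1024, THRESHOLD, True, True)
shared_write = decide(2, 1, 128 * 1024, THRESHOLD, True, True)
```

Small standalone writes take the new-allocation path so that their mapping updates can be batched, while large standalone writes overwrite in place and skip the snapshot mapping updates entirely.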

[0103] Figure 31 shows the flow of the snapshot append process. The snapshot append process is the process of allocating a new area on SS-VDEV11S to the area of PVOL10P that is the target of a write request from the host. The snapshot append process is executed by the snapshot append program 1106, which is called from the snapshot acquisition program 1101 or the read / write program 1105.

[0104] The snapshot append program 1106 updates the snapshot allocation management table 1008 to secure a new area (a block address with status 1902 of "0") in the target SS-VDEV11S (an SS-VDEV11S corresponding to SS-Family9 that includes the target VOL (for example, the SVOL to be acquired, or the VOL to which the data will be written)) (S3101). Then, the snapshot append program 1106 instructs the Dedup append program 1107 to execute the Dedup append process (S3102).

[0105] Subsequently, the snapshot append program 1106 updates the SS-Mapping management table 1010 (S3103). In S3103, for example, the snapshot append program 1106 sets the latest generation # (the generation # represented by the latest generation table 1005) to generation #2104, which corresponds to the Mapping-Info# of the target SS-Mapping-Info. Here, "target SS-Mapping-Info" refers to the SS-Mapping-Info corresponding to the data in the target VOL.

[0106] The snapshot append program 1106 updates the Dir management table 1009 corresponding to the Dir-Info of the target SS-VDEV11S (S3104). In S3104, the SS-Mapping-Info (information representing the reference address in SS-VDEV) for the data to be written is associated with the address in VOL10 of that data.

[0107] The snapshot append program 1106 refers to the generation management tree table 1007 (Dir-Info generation management tree 70) (S3105) and determines whether the generation # of the Dir-Info of the target VOL (the VOL to which the data is written) matches the generation # of the SS-Mapping-Info before the append (S3106).

[0108] If the result of S3106 is true (S3106:Yes), it means that no snapshot sharing the area on SS-VDEV pointed to by the SS-Mapping-Info before the append has been created since the data was stored in that area. In other words, it can be determined that the area on SS-VDEV pointed to by the SS-Mapping-Info before the append has become garbage. Therefore, the snapshot append program 1106 releases the target entry in the snapshot allocation management table 1008 (S3107) and terminates processing. At this time, the mapping information to the CR-VDEV or Dedup-VDEV space corresponding to the area on SS-VDEV pointed to by the SS-Mapping-Info before the append remains valid. In other words, the target entry in the SS-Mapping management table 1010 is not initialized.

[0109] If the result of S3106 is false (S3106: No), it means that after data was stored in the area on SS-VDEV pointed to by the SS-Mapping-Info before the append, a snapshot sharing that area was created and the generation # of Dir-Info was incremented. In this case, the area on SS-VDEV before the append does not become garbage, so the SS-Mapping management table 1010 before the append remains as is, and the process is terminated.
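
The garbage determination of Figure 31 can be illustrated with a reduced model. In this sketch the snapshot allocation table is a dictionary from block address to status, SS-Mapping-Info entries carry only an address and a generation, and the Dedup append call of S3102 is omitted. All names are hypothetical, and the choice of the lowest free block is an arbitrary policy for the example.

```python
def snapshot_append(alloc, ss_mapping, vol_dir, vol_addr,
                    dir_generation, latest_generation):
    """Allocate a new SS-VDEV block for vol_addr and release the old one if unshared."""
    old_no = vol_dir.get(vol_addr)
    # S3101: secure a free block (status 0); raises ValueError if none is free.
    new_block = min(b for b, status in alloc.items() if status == 0)
    alloc[new_block] = 1
    # S3102 (Dedup append process) is not modeled in this sketch.
    new_no = max(ss_mapping, default=-1) + 1
    ss_mapping[new_no] = {"address": new_block,
                          "generation": latest_generation}    # S3103
    vol_dir[vol_addr] = new_no                                # S3104
    # S3106: was a snapshot taken since the old data was stored?
    if old_no is not None and ss_mapping[old_no]["generation"] == dir_generation:
        # S3107: no snapshot shares the old block, so it is garbage.
        # The old SS-Mapping entry itself is intentionally left in place.
        alloc[ss_mapping[old_no]["address"]] = 0
    return new_block


alloc = {0: 0, 1: 0, 2: 0}
ss_mapping, vol_dir = {}, {}
first = snapshot_append(alloc, ss_mapping, vol_dir, 5, 0, 0)
second = snapshot_append(alloc, ss_mapping, vol_dir, 5, 0, 0)  # no snapshot in between
after_rewrite = dict(alloc)
third = snapshot_append(alloc, ss_mapping, vol_dir, 5, 1, 1)   # generation incremented
after_snapshot = dict(alloc)
```

The second append frees the block written by the first, whereas the third leaves the previous block allocated because a snapshot taken in between may still reference it.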

[0110] Figure 32 shows the flow of the Dedup append process. The Dedup append process is executed by the Dedup append program 1107, which is called from the snapshot append program 1106.

[0111] The Dedup append program 1107 determines whether or not duplicate data exists in the data stored in the Pool (S3201). Although not explained in detail in the diagram, checking for identical data for all data would be computationally intensive. Therefore, a method is used in which a representative value of the data, such as a hash value, is calculated by performing calculations using a hash function for each data, and comparison processing is performed only between data whose representative values match. If the result of the determination in S3201 is false (S3201: No), the Dedup append program 1107 causes the compression append program 1108 to execute the compression append process (S3207). As a result, the compressed data of the data stored in SS-VDEV11S is stored in CR-VDEV11C without going through Dedup-VDEV11D and without the processing main CPU 211 being changed.

[0112] If the result of S3201 is true (S3201: Yes), the Dedup append program 1107 updates the Dedup allocation management table 1014 (S3202). In S3202, an entry for the Dedup allocation information corresponding to the data stored in the target Dedup-VDEV11D is added to the Dedup allocation management table 1014.

[0113] The Dedup append program 1107 updates the CR-Mapping management table 1012 (S3203). In S3203, an entry for the CR-Mapping-Info corresponding to the data stored in Dedup-VDEV11D is added to the CR-Mapping management table 1012.

[0114] The Dedup append program 1107 updates the Dir management table 1009 corresponding to the Dir-Info of the target Dedup-VDEV11D (S3204). In S3204, the CR-Mapping-Info (information representing the reference address in CR-VDEV11C) for the duplicate data is associated with the address of that data in the target Dedup-VDEV11D. In this way, the capacity used by the pool is reduced by associating the duplicate data with data that has already been stored.

[0115] The Dedup append program 1107 invalidates the pre-update allocation information (S3205). In S3205, the Dedup append program 1107 updates the Dedup allocation management table 1014 and the compressed allocation management table 1011. Also in S3205, for areas in the Dedup allocation management table 1014 where the number of allocated destinations has become zero, the program garbage collects the entries in the compressed allocation information.

[0116] Figure 33 shows the flow of the compression append process. The compression append process is executed by the compression append program 1108, which is called from the Dedup append program 1107. The compression append program 1108 determines whether the length of the data that can be compressed and appended is greater than or equal to a threshold (S3301). This data length may be the data length requested by the Dedup append program 1107 mentioned above, or it may also include uncompressed data that is already stored in the cache memory and has not yet been reflected on the drive. The threshold is a predetermined data length, such as 256KB or 512KB, and is assumed to be larger than the mapping unit used to manage snapshots and deduplication. When data on the cache memory is compressed and appended, processing contiguous data together wherever possible allows the compression allocation management table 1011 (S3303) and the CR-Mapping management table 1012 (S3305), described later, to be updated all at once, thereby reducing processing overhead. If the result of the determination in S3301 is true (S3301: Yes), the compression append program 1108 proceeds to S3302. If the result of the determination in S3301 is false (S3301: No), the process terminates. In this case, the data is temporarily held in an uncompressed state in the cache memory. Because data subsequently written by the host computer is placed contiguously in the cache memory by the snapshot append process, this reduces the overhead of updating mapping information. In S3302, the compression append program 1108 compresses the data to be written. The compression append program 1108 then updates the compression allocation management table 1011 (S3303). In S3303, the entry corresponding to each of the one or more subblocks that will store the data compressed in S3302 is updated.
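The threshold check in S3301 and the batched compression in S3302 can be sketched as follows. This is an illustrative sketch under assumptions: the `CompressionAppender` class, its list-based stand-ins for the cache memory and CR-VDEV, and the use of zlib as the compressor are hypothetical and are not taken from the disclosure. The 256KB threshold is one of the example values given in the text.

```python
import zlib

THRESHOLD = 256 * 1024  # example threshold from the text (256KB)


class CompressionAppender:
    """Hypothetical stand-in for the S3301/S3302 decision."""

    def __init__(self, threshold: int = THRESHOLD):
        self.threshold = threshold
        self.pending = []   # uncompressed data held in cache memory
        self.appended = []  # compressed batches (stand-in for CR-VDEV)

    def pending_length(self) -> int:
        return sum(len(chunk) for chunk in self.pending)

    def append(self, data: bytes) -> bool:
        """Return True if a compressed batch was appended by this call."""
        self.pending.append(data)
        if self.pending_length() < self.threshold:   # S3301: No
            return False                              # data stays uncompressed in cache
        batch = b"".join(self.pending)                # S3301: Yes
        self.appended.append(zlib.compress(batch))    # S3302 (zlib is an assumption)
        self.pending.clear()
        return True


appender = CompressionAppender()
# Thirty-two 8KB writes: only the last one reaches the 256KB threshold.
results = [appender.append(bytes(8 * 1024)) for _ in range(32)]
```

Because the 32 small writes are compressed as one batch, the mapping tables in such a design would need one update per batch rather than one per write.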

[0117] The compression append program 1108 instructs the destaging program 1109 to perform the destaging process (S3304). The compression append program 1108 updates the CR-Mapping management table 1012 after the append (S3305). The compression append program 1108 updates the Dir management table 1009 (S3306).

[0118] The compression append program 1108 invalidates the pre-update allocation information (S3307). In S3307, the Dedup append program 1107 updates the Dedup allocation management table 1014 and the compression allocation management table 1011. Also in S3307, for areas in the Dedup allocation management table 1014 where the number of allocated destinations has become zero, the program garbage collects the entries targeted by the compression allocation information.

[0119] Figure 34 shows the flow of the destaging process. The destaging process is performed by the destaging program 1109, which is called from the compression appending program 1108. The destaging program 1109 determines whether or not there is additional data (one or more compressed data) for the RAID stripe in the cache unit 903 (S3401). A "RAID stripe" is a stripe in a RAID group (a storage area spanning multiple SSDs 220 that make up the RAID group). If the RAID level of the RAID group requires parity, the size of the "additional data for the RAID stripe" can be the size of the stripe minus the size of the parity. If the result of the determination in S3401 is false (S3401: No), the process ends.

[0120] If the result of S3401 is true (S3401:Yes), the destaging program 1109 refers to the Pool-Mapping management table 1015 and determines whether page 14 has been allocated to the storage location (address in CR-VDEV11C) for the additional data for the RAID stripe (S3402). If the result of S3402 is false (S3402:No), the process proceeds to S3405.

[0121] If the result of S3402 is true (S3402:Yes), the destaging program 1109 updates the Pool allocation management table 1016 (S3403). Specifically, the destaging program 1109 allocates page 14. In S3403, the entries in the Pool allocation management table 1016 corresponding to the allocated page 14 (for example, status 2704, assigned VDEV#2705, and assigned address 2706) are updated.

[0122] The destaging program 1109 registers the page #2602 of the allocated page in the entry of the Pool-Mapping management table 1015 corresponding to the storage location of the additional data for the RAID stripe (S3404).

[0123] The destaging program 1109 writes the additional data for the RAID stripe to the stripe on which the page is based (S3405). If the RAID level is one that requires parity, the destaging program 1109 generates parity based on the additional data for the RAID stripe and also writes the parity to the stripe.
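The stripe sizing and parity generation mentioned for the destaging process can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and single XOR parity (as in RAID 5) is an assumption, since the text does not specify the RAID level or the parity calculation.

```python
from functools import reduce


def stripe_data_size(strip_size: int, drives: int, parity_drives: int) -> int:
    """Size of the additional data for one stripe: stripe size minus parity size."""
    return strip_size * (drives - parity_drives)


def xor_parity(strips):
    """Single XOR parity strip over equally sized strips (assumed RAID 5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def build_stripe(data: bytes, strip_size: int, drives: int):
    """Split one stripe's worth of data into strips and append the parity strip."""
    if len(data) != stripe_data_size(strip_size, drives, 1):
        raise ValueError("data does not fill exactly one stripe")
    strips = [data[i:i + strip_size] for i in range(0, len(data), strip_size)]
    return strips + [xor_parity(strips)]


# Four drives with 4-byte strips: 12 bytes of data plus one parity strip.
stripe = build_stripe(bytes(range(12)), strip_size=4, drives=4)
# A lost data strip can be rebuilt from the remaining strips and the parity.
rebuilt = xor_parity([stripe[0], stripe[2], stripe[3]])
```

Waiting until a full stripe of additional data is available, as in S3401, lets parity be computed from the new data alone without reading old data back from the drives.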

[0124] Figure 35 is a flowchart showing the processing procedure for read operations. Read operations are performed by the Read / Write program 1105 in response to a read request from the host device. First, in S3500, the Read / Write program 1105 obtains the address within the PVOL or Snapshot of the data targeted by the read request from the server system 202. Next, in S3501, the Read / Write program 1105 determines whether the data targeted by the read request is a cache hit. If the data targeted by the read request is a cache hit (S3501: Yes), the Read / Write program 1105 moves processing to S3508; if it is not a cache hit (S3501: No), it moves processing to S3502.

[0125] In S3502, the Read / Write program 1105 refers to the Dir management table 1009 and the SS-Mapping management table 1010 to obtain the address on the referenced SS-VDEV based on the PVOL / Snapshot address obtained in S3500. If the size of the data targeted by the read request is larger than the management unit in the SS-Mapping management table 1010, all entries are referred to to obtain the address on the SS-VDEV.

[0126] Next, in S3503, the Read / Write program 1105 refers to the Dir management table 1009 and the CR-Mapping management table 1012 to obtain an address within the CR-VDEV or Dedup-VDEV based on the SS-VDEV address obtained in S3502.

[0127] Next, in S3504, the Read / Write program 1105 determines whether the reference address obtained in S3503 is on Dedup-VDEV. Specifically, it obtains the reference CR-VDEV#2303 from the CR-Mapping management table 1012, and then refers to the CR-VDEV management table 1002 to identify the VDEV#1301 of the entry whose CR-VDEV#1302 matches the CR-VDEV#2303. If the result of the determination in S3504 is false (S3504: No), the Read / Write program 1105 proceeds to S3506. On the other hand, if the result of the determination in S3504 is true (S3504: Yes), the Read / Write program 1105 uses the address in Dedup-VDEV identified in S3503 to refer to the Dir management table 1009 and the CR-Mapping management table 1012 to obtain the address in CR-VDEV (S3505).

[0128] Next, in S3506, the Read / Write program 1105 stages the data stored at the CR-VDEV address identified in S3503 or S3505 into the cache memory while decompressing it.

[0129] Next, the Read / Write program 1105 determines whether it has read all the data within the range requested by the host device into the cache (S3507). If the result of the determination is true, the Read / Write program 1105 proceeds to S3508, transfers the data that was hit in the cache in S3501 or the data that was staged in S3506 to the host device, and terminates processing.

[0130] On the other hand, if the result of S3507 is false (S3507: No), the Read / Write program 1105 returns to S3502 and re-stages the missing data on the cache. In this way, if the data requested by the host is stored in a contiguous area on the CR-VDEV, the data can be aligned on the cache with just one staging from the drive. However, if the data is stored in a distributed manner on the SS-VDEV, Dedup-VDEV, or CR-VDEV, multiple metadata lookups or staging operations from the drive occur, resulting in reduced throughput performance.
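The address resolution in S3502 to S3505 can be sketched as a chain of table lookups. This is a simplified illustration: the dictionaries stand in for the SS-Mapping management table 1010 and the CR-Mapping management table 1012, and the key layouts, the VDEV names, and the `resolve_read` function are all hypothetical.

```python
def resolve_read(volume_addr, ss_mapping, cr_mapping, dedup_vdevs):
    """Return (CR-VDEV name, address, number of mapping lookups)."""
    ss_addr = ss_mapping[volume_addr]              # S3502: volume -> SS-VDEV
    vdev, addr = cr_mapping[("SS", ss_addr)]       # S3503: SS-VDEV -> CR/Dedup-VDEV
    lookups = 2
    if vdev in dedup_vdevs:                        # S3504: Yes
        vdev, addr = cr_mapping[(vdev, addr)]      # S3505: Dedup-VDEV -> CR-VDEV
        lookups += 1
    return vdev, addr, lookups


# Hypothetical table contents for two PVOL addresses.
ss_mapping = {("PVOL", 0): 100, ("PVOL", 1): 101}
cr_mapping = {
    ("SS", 100): ("CR0", 7),        # stored directly on a CR-VDEV
    ("SS", 101): ("DEDUP0", 3),     # deduplicated data
    ("DEDUP0", 3): ("CR1", 42),
}
dedup_vdevs = {"DEDUP0"}

direct = resolve_read(("PVOL", 0), ss_mapping, cr_mapping, dedup_vdevs)
via_dedup = resolve_read(("PVOL", 1), ss_mapping, cr_mapping, dedup_vdevs)
```

The extra lookup on the deduplicated path shows why data scattered across SS-VDEV, Dedup-VDEV, and CR-VDEV costs more metadata accesses per read than data stored contiguously.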

[0131] Figure 36 shows the flow of the GC (Garbage Collection) process. The GC process is performed by the GC program 1110, for example, periodically (or in response to instructions from the management system 203). The GC program 1110 refers to the Pool-Mapping management table 1015 and the Compression Allocation management table 1011 to identify pages with subblocks in a garbage state (status 2203 "2") (S3601). If there are no pages with subblocks in a garbage state, the GC process may be terminated. In S3601, the GC program 1110 may also preferentially select the CR-VDEV11C with the least amount of free space among multiple CR-VDEV11Cs. In S3601, the GC program 1110 may also preferentially identify the page with the most subblocks in a garbage state among the CR-VDEV11Cs. Furthermore, GC may be performed on a different area unit than page 14. The GC program 1110 determines whether there are any unprocessed subblocks (that have not yet been determined in S3603) in the page identified in S3601 (S3602). If the result of the S3602 determination is true (S3602:Yes), the GC program 1110 refers to the compression allocation management table 1011 and determines the subblock to be processed (S3603). The GC program 1110 determines whether the status 2203 corresponding to the subblock to be processed is "1" (allocated) or not (S3604). If the result of the S3604 determination is false (S3604:No), the process returns to S3602. If the result of the S3604 determination is true (S3604:Yes), the GC program 1110 determines whether the allocation of the subblock to be processed is also valid on SS-VDEV (S3605). Specifically, the GC program 1110 refers to the compression allocation management table 1011 and identifies the VDEV 2205 and Address 2206 of the entry corresponding to the subblock to be processed, thereby identifying the address on SS-VDEV to which the subblock to be processed is allocated. 
Then, the snapshot allocation management table 1008 is referenced to determine whether the Status 1902 of the entry corresponding to the address on SS-VDEV is 1 (allocated). If Status 1902 is 1 (allocated), it is determined that the subblock to be processed is also allocated on SS-VDEV. If the result of the S3605 check is false (S3605: No), the process returns to S3602. If the result of the S3605 check is true (S3605: Yes), the GC program 1110 appends the subblock to be processed to a separate area (S3606). This "separate area" may be a free subblock (a subblock with status 2203 of "0") in a CR-VDEV11C other than the CR-VDEV11C that is the target of the GC processing (the CR-VDEV11C that has the subblock to which the page identified in S3601 is allocated). Also, this "separate CR-VDEV11C" may be a CR-VDEV11C in which all subblocks are free subblocks. Also, page 14 may be allocated to this "separate area," and the compressed data in the subblock to be processed may be written to page 14 (in other words, the compressed data may be moved from the page allocated to the subblock to be processed to the page allocated to the separate area). If the result of the S3602 check is false (S3602: No), the GC program 1110 updates all entries in the compression allocation management table 1011 corresponding to the CR-VDEV11C that is the target of the GC process (S3607). In S3607, for example, the status 2203 of all entries becomes "0". Furthermore, the GC program 1110 updates the Pool-Mapping management table 1015 and the Pool allocation management table 1016 (S3608). In S3608, for example, page #2602 of all entries in the Pool-Mapping management table 1015 corresponding to the CR-VDEV11C subject to GC processing may be initialized, and the status 2704 corresponding to all pages that were assigned to the CR-VDEV11C subject to GC processing may be set to "0" (free).
Thus, the GC process according to this embodiment may be performed by transferring effective compressed data (compressed data in the allocated subblocks) between CR-VDEV11C units in order to bring multiple allocated subblocks in a discontinuous state into a continuous state. Note that bringing multiple allocated subblocks in a discontinuous state into a continuous state may be performed without data transfer between CR-VDEV11C units.
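The subblock loop of the GC process (S3602 to S3607) can be sketched as follows. This is a simplified illustration under assumptions: the dictionaries stand in for the compression allocation management table 1011 and the snapshot allocation management table 1008, the `collect` function and its callback are hypothetical, and S3607 is narrowed here to the subblocks of a single page rather than a whole CR-VDEV11C.

```python
FREE, ALLOCATED, GARBAGE = 0, 1, 2  # status values as used in the text


def collect(page_subblocks, cr_alloc, ss_alloc, append_to_separate_area):
    """Move still-valid subblocks elsewhere, then free the collected subblocks.

    cr_alloc: subblock id -> {"status": int, "ss_addr": int}
    ss_alloc: SS-VDEV address -> status
    """
    moved = []
    for sb in page_subblocks:                             # S3602 / S3603
        entry = cr_alloc[sb]
        if entry["status"] != ALLOCATED:                  # S3604: No
            continue
        if ss_alloc.get(entry["ss_addr"]) != ALLOCATED:   # S3605: No
            continue
        append_to_separate_area(sb)                       # S3606
        moved.append(sb)
    for sb in page_subblocks:                             # S3607 (simplified scope)
        cr_alloc[sb]["status"] = FREE
    return moved


cr_alloc = {
    0: {"status": ALLOCATED, "ss_addr": 10},  # valid on both levels: moved
    1: {"status": GARBAGE, "ss_addr": 11},    # already garbage: skipped
    2: {"status": ALLOCATED, "ss_addr": 12},  # released on SS-VDEV: skipped
}
ss_alloc = {10: ALLOCATED, 12: FREE}
separate_area = []
moved = collect([0, 1, 2], cr_alloc, ss_alloc, separate_area.append)
```

Subblock 2 illustrates the second check: its entry still reads "allocated" in the compression allocation table, but the snapshot allocation table shows that the SS-VDEV area was released, so it is reclaimed rather than moved.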

[0132] As described above, the disclosed storage system 201 comprises an SSD 220 as a storage device and a processor 211 that accesses the storage device. The processor 211 manages a primary volume 10P that is subject to read / write operations by the host and a snapshot volume 10S generated from the primary volume 10P as a snapshot family 9. The processor 211 uses a snapshot virtual device 11S, which is a logical address space associated with the snapshot family 9, as the storage location for the data of the primary volume 10P and the snapshot volume 10S. The data stored in the snapshot virtual device is compressed and stored in the compressed virtual device, and the data stored in the compressed virtual device is stored in the storage device. When the processor 211 receives a write request from the host, it switches, depending on the size of the address range of the write destination, between an overwrite process, which for large-sized data overwrites the area already allocated on the snapshot virtual device 11S, and a new allocation process, which for small-sized data allocates a new area on the snapshot virtual device 11S to the address range of the write destination. The multiple pieces of small-sized data stored in the new area are then compressed and stored together in the compressed virtual device. Therefore, in a storage system that provides snapshots, high throughput performance can be achieved by optimizing the amount of mapping information updates and data transfers according to the data length written from the host and the mapping status, and by switching the allocation of virtual device space. In other words, in the case of random writes, throughput performance is improved because fragmented data can be moved in batches from the snapshot virtual device to the compressed virtual device.

[0133] Furthermore, a new, contiguous area on the snapshot virtual device is allocated to the multiple pieces of small-sized data. Therefore, the storage of data from the snapshot virtual device to the compressed virtual device can be made more efficient.

[0134] Furthermore, the processor 211 uses a snapshot allocation management table 1008 that shows a mapping from the address in the snapshot virtual device 11S to the address in the primary volume and / or the snapshot volume to manage whether the address in the snapshot virtual device 11S is assigned to the address of any volume. When the processor 211 receives a write request and performs the new allocation process, it updates the snapshot allocation management table 1008 to show the area of the snapshot virtual device 11S that was assigned to the address range of the write destination before the new allocation process as an area not assigned to the address of any volume. When the processor 211 performs garbage collection, it identifies the address of the snapshot virtual device 11S that the storage area of the compressed virtual device that is a candidate for reclamation refers to, and refers to the snapshot allocation management table 1008 for the identified address, and sets the condition for reclamation as the area not assigned to the address of any volume. This process eliminates the need to update the SS-Mapping management table 1010, etc., during write operations, thus speeding up the write process. This is because the snapshot allocation management table 1008 is small in size, and updating it takes less time than updating the SS-Mapping management table 1010, etc. The SS-Mapping management table 1010 is updated when the appended write data is reflected in the storage device in a batch. Furthermore, during garbage collection, in addition to the condition that status 2203 of the compressed allocation management table 1011 indicates "garbage collection required," the snapshot allocation management table 1008 is used to check whether the allocation on the snapshot virtual device is valid, so the target for garbage collection can be accurately determined even if the CR-Mapping management table 1012 is not updated during write operations.

[0135] Furthermore, the processor 211 performs the overwrite process only if there are no references from the snapshot volume 10S to the address range to be written, and the address range to be written has a transfer length equal to or greater than the threshold on the snapshot virtual device 11S. If the transfer length is less than the threshold, the new allocation process is performed.

[0136] Furthermore, if multiple volumes belonging to the same snapshot family 9 have the same data, the disclosed storage system allocates a predetermined area on the snapshot virtual device 11S to the same data, and the multiple volumes refer to that predetermined area. Therefore, snapshots allow for efficient management of the data being referenced.

[0137] Furthermore, if multiple volumes belonging to different snapshot families 9 have the same data, the disclosed storage system allocates a predetermined area on the deduplication virtual device 11D referenced by the snapshot virtual device 11S to the same data. Therefore, duplicate data can be managed efficiently.

[0138] Furthermore, the processor 211 includes information about the snapshot generation in the mapping information that manages the correspondence between addresses in volumes belonging to the snapshot family 9 and areas on the snapshot virtual device 11S, and performs the new allocation process if the generation of the address range to be written to does not match the latest generation. Therefore, it is possible to efficiently manage whether the data being written to is independent data or not.
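The switching conditions summarized in the paragraphs above can be sketched as a single decision function. This is a hedged illustration: the function name, parameter names, and the 64KB threshold used in the examples are hypothetical, and treating every case not explicitly described as overwrite as a new allocation is an assumption.

```python
OVERWRITE = "overwrite"
NEW_ALLOCATION = "new_allocation"


def choose_write_path(snapshot_refs: int, entry_generation: int,
                      latest_generation: int, transfer_length: int,
                      threshold: int) -> str:
    """Pick the write process for one address range (illustrative only)."""
    # Generation mismatch: the range is not at the latest generation,
    # so the new allocation process is performed.
    if entry_generation != latest_generation:
        return NEW_ALLOCATION
    # Overwrite only with no snapshot references and a long enough transfer.
    if snapshot_refs == 0 and transfer_length >= threshold:
        return OVERWRITE
    return NEW_ALLOCATION


THRESHOLD = 64 * 1024  # hypothetical threshold value for the examples

large_unshared = choose_write_path(0, 3, 3, 128 * 1024, THRESHOLD)
small_unshared = choose_write_path(0, 3, 3, 8 * 1024, THRESHOLD)
large_shared = choose_write_path(1, 3, 3, 128 * 1024, THRESHOLD)
stale_generation = choose_write_path(0, 2, 3, 128 * 1024, THRESHOLD)
```

Only the first case overwrites in place; the other three allocate a new area, which is what allows small or shared writes to be gathered contiguously before compression.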

[0139] It should be noted that the present invention is not limited to the embodiments described above, and various modifications are included. For example, the embodiments described above are explained in detail to make the present invention easier to understand, and are not necessarily limited to those having all the configurations described. Furthermore, it is possible to replace or add configurations, not just delete them.

[Explanation of symbols]

[0140] 9: Snapshot Family, 10P: Primary Volume, 10S: Snapshot Volume, 11C: Compressed Append Virtual Device, 11D: Deduplication Virtual Device, 11S: Snapshot Virtual Device, 13: Pool, 70: Dir-Info Generation Management Tree, 100: Computer System, 201: Storage System, 202: Server System, 203: Management System, 204: Storage Network, 205: Management Network, 210: Storage Controller, 211: CPU, 212: Memory, 213: Backend Interface, 214: Frontend Interface, 215: Management Interface, 901: Control Information Unit, 902: Program Unit, 903: Cache Unit

Claims

1. A storage system comprising: a storage device; and a processor that accesses the storage device, wherein the processor manages a primary volume that is read and written by a host and a snapshot volume created from the primary volume as a snapshot family, the processor uses a snapshot virtual device, which is a logical address space associated with the snapshot family, as the storage location for the data of the primary volume and the snapshot volume, compresses the data stored in the snapshot virtual device and stores it in a compressed virtual device, and stores the data stored in the compressed virtual device in the storage device, and when the processor receives a write request from the host, the processor switches, depending on the size of the address range of the write destination, between an overwrite process that, for large-sized data, overwrites the area already allocated on the snapshot virtual device, and a new allocation process that, for small-sized data, allocates a new area on the snapshot virtual device to the address range of the write destination, and compresses the multiple pieces of small-sized data stored in the new area and stores them together in the compressed virtual device.

2. The storage system according to claim 1, wherein a new, contiguous area on the snapshot virtual device is allocated to the multiple pieces of small-sized data.

3. A storage system according to claim 1, The processor uses a snapshot allocation management table that shows a mapping from the address in the snapshot virtual device to the address in the primary volume and / or the snapshot volume, and manages whether the address in the snapshot virtual device is assigned to the address of any volume. When the processor receives the write request and performs the new allocation process, it updates the snapshot allocation management table to show the area of the snapshot virtual device that was allocated to the address range of the write destination before the new allocation process as an area that is not allocated to any volume address. The storage system is characterized in that, when the processor performs garbage collection processing, it identifies the address of the snapshot virtual device referenced by the storage area of the compressed virtual device that is a candidate for recovery, refers to the snapshot allocation management table for the identified address, and sets the recovery condition that the area is not allocated to any volume address.

4. A storage system according to claim 1, The storage system is characterized in that the processor performs the overwrite process when there is no reference from the snapshot volume to the address range of the write destination and the data length of the write request is greater than or equal to a threshold, and performs the new allocation process when it is less than the threshold.

5. A storage system according to claim 1, A storage system characterized in that, when multiple volumes belonging to the same snapshot family have the same data, a predetermined area on the snapshot virtual device is allocated to the same data, and the multiple volumes refer to the predetermined area.

6. A storage system according to claim 1, A storage system characterized in that, when multiple volumes belonging to different snapshot families have the same data, a predetermined area on a deduplication virtual device referenced by the snapshot virtual device is allocated to the same data.

7. The storage system according to claim 1, wherein the processor includes information about the snapshot generation in the mapping information that manages the correspondence between addresses in volumes belonging to the snapshot family and areas on the snapshot virtual device, and performs the new allocation process if the generation of the address range of the write destination does not match the latest generation.

8. The storage system according to claim 3, wherein, if there are no references from the snapshot volume to the address range of the write destination and the data length of the write request is less than or equal to the threshold, the processor determines whether the area required for the new allocation process exists on the snapshot virtual device, performs the allocation if the required area exists, and, if the required area does not exist, expands the snapshot virtual device and then performs the new allocation process.

9. A data processing method for a storage system comprising a storage device and a processor that accesses the storage device, The processor manages the primary volume that the host reads and writes, and the snapshot volume created from the primary volume, as a snapshot family. The processor uses the snapshot virtual device, which is a logical address space associated with the snapshot family, as the storage location for the primary volume and the data of the snapshot volume. The processor compresses the data stored in the snapshot virtual device and stores it in the compressed virtual device. The processor stores the data stored in the compressed virtual device in the storage device. The data processing method is characterized in that, when the processor receives a write request from the host, it switches between an overwrite process, which overwrites an area on the snapshot virtual device already allocated for large data, according to the size of the address range to be written, and a new allocation process, which allocates a new area on the snapshot virtual device to the address range to be written for small data, and compresses a plurality of small data stored in the new area and stores them together in the compressed virtual device.

10. The storage system according to claim 1, wherein, when the processor receives a write request from the host, the data written to the primary volume and the snapshot volume included in the snapshot family is stored in the snapshot virtual device, which is a logical address space associated with the snapshot family, the data stored in the snapshot virtual device is compressed and stored in the compressed virtual device, and the data stored in the compressed virtual device is stored in the storage device, and wherein the processor, in storing data in the snapshot virtual device and the compressed virtual device, switches between the following processes depending on the size of the address range of the write destination of the write request that updates the data: an overwrite process in which the updated large-sized data is overwritten in the area on the snapshot virtual device allocated for the pre-update data, and the overwritten large-sized data is compressed and stored in the compressed virtual device; and a new allocation process in which a new area on the snapshot virtual device is allocated to the address range of the write destination for the updated small-sized data, the updated small-sized data is stored there, and the multiple pieces of small-sized data stored in the new area are compressed and stored together in the compressed virtual device.