Controllers, memory devices

The controller addresses data loss during RAID storage media replacement by transferring data to spare areas, ensuring data integrity through error detection and management by RAID firmware, preventing data loss during rebuilds.

JP2026100864A · Pending · Publication Date: 2026-06-22 · KK TOSHIBA

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
KK TOSHIBA
Filing Date
2024-12-10
Publication Date
2026-06-22

AI Technical Summary

Technical Problem

Conventional RAID-configured storage media replacement and reconfiguration can lead to data loss due to media error sectors, especially during rebuild processes.

Method used

A controller with a monitoring unit that detects errors on physical disks and transfers data to spare areas on other disks, ensuring data integrity by writing data from error-prone areas to spare areas before replacement and rebuild, using RAID firmware to manage these operations.

Benefits of technology

Prevents data loss by maintaining data integrity during disk replacements and rebuilds by utilizing spare areas to store data before physical disk replacement, even in the presence of media error sectors.

✦ Generated by Eureka AI based on patent content.

Figure 2026100864000001_ABST

Abstract

This invention provides a controller and storage device that can suppress data loss even when reconfiguring storage media in a RAID configuration. [Solution] The controller of this embodiment is a controller that reads and writes common data to the first and second physical disks to configure a logical disk, and is capable of executing disk input/output requests to the first and second physical disks in response to disk input/output requests to the logical disk received from a host computer. This controller includes, for each of the first and second physical disks, an area setting unit that sets a data area to be the target of disk input/output requests and a spare area on which data recorded in the data area can be recorded; a monitoring unit that monitors the status of each of the first and second physical disks; and a maintenance processing unit that, if an error in the first physical disk is detected as a result of monitoring by the monitoring unit, writes the data in the data area of the first physical disk to the spare area of the second physical disk.

Description

Technical Field

[0001] Embodiments of the present invention relate to a controller and a storage device.

Background Art

[0002] In host computers that require reliability, RAID controllers are used. A RAID controller realizes performance improvement and redundancy assurance by bundling a plurality of storage media (physical disks) by RAID (Redundant Arrays of Inexpensive Disks) technology. A RAID controller has a storage medium interface for connecting a plurality of storage media and a RAID control program (RAID firmware) for realizing a RAID function.

[0003] The RAID control program bundles storage media by RAID technology to configure a logical drive. The host computer recognizes the logical drive through a host interface. The host computer can access the logical drive of the RAID controller from a user program (application) through a file system by incorporating a device driver for the RAID controller into an OS (Operating System).

[0004] When a storage medium constituting a RAID configuration is replaced due to a failure or the like, rebuilding (Rebuild) of the replaced new storage medium is performed. Rebuilding is to reconstruct the content of the storage medium before replacement onto the storage medium after replacement.

[0005] When a storage medium in a RAID configuration is operated for a long period of time, repairable media error sectors may occur in each storage medium. If the storage medium is replaced and rebuilt in this state, the data in the media error sectors is lost and can no longer be accessed. Depending on the content of the lost data, a system failure may occur.

Prior Art Documents

Patent Documents

[0006] [Patent Document 1] Japanese Patent Publication No. 2022-142632
[Patent Document 2] Japanese Patent Application Publication No. 11-345095

Summary of the Invention

Problems to Be Solved by the Invention

[0007] Thus, conventional controllers and storage devices may experience data loss when RAID-configured storage media are replaced and reconfigured. Embodiments of the present invention have been made to solve this problem, and aim to provide a controller and storage device that can suppress the occurrence of data loss even when RAID-configured storage media are reconfigured.

Means for Solving the Problem

[0008] The controller of this embodiment is a controller that reads and writes common data to the first and second physical disks to configure a logical disk, and is capable of executing disk input/output requests to the first and second physical disks in response to disk input/output requests to the logical disk received from a host computer. This controller includes, for each of the first and second physical disks, an area setting unit that sets a data area to be the target of disk input/output requests and a spare area on which data recorded in the data area can be recorded; a monitoring unit that monitors the status of each of the first and second physical disks; and a maintenance processing unit that, if an error in the first physical disk is detected as a result of monitoring by the monitoring unit, writes the data in the data area of the first physical disk to the spare area of the second physical disk.

Brief Description of the Drawings

[0009]
[Figure 1] This is a block diagram showing the configuration of the computer system according to the first embodiment.
[Figure 2] This is a block diagram showing the functional configuration of the computer system according to the first embodiment.
[Figure 3] This diagram shows how RAID1 repairs media error sectors in a storage device.
[Figure 4] This diagram shows how RAID1 repairs media error sectors in a storage device.
[Figure 5] This diagram shows how RAID1 repairs media error sectors in a storage device.
[Figure 6] This diagram shows how RAID1 repairs media error sectors in a storage device.
[Figure 7] This is a flowchart illustrating the operation of the storage device according to the first embodiment.
[Figure 8] This figure shows the data writing process of a physical disk in a storage device according to the first embodiment.
[Figure 9] This figure shows the data writing process of a physical disk in a storage device according to the first embodiment.
[Figure 10] This figure shows the data writing process of a physical disk in a storage device according to the first embodiment.
[Figure 11] This figure shows the data writing process of a physical disk in a storage device according to the first embodiment.
[Figure 12] This figure shows the data writing process of a physical disk in a storage device according to the first embodiment.
[Figure 13] This figure shows the data writing process of a physical disk in a storage device according to the first embodiment.
[Figure 14] This is a flowchart illustrating the operation of the storage device according to the second embodiment.
[Figure 15] This is a block diagram showing the configuration of the computer system according to the third embodiment.
[Figure 16] This is a flowchart illustrating the operation of the storage device according to the third embodiment.

Modes for Carrying Out the Invention

[0010] (Errors in a storage medium forming a RAID configuration) The occurrence of errors in a storage device having a RAID configuration will be described. To simplify the explanation, the case where the RAID configuration is RAID1 will be described as an example. Figures 3 to 6 are diagrams showing how a media error sector is repaired in a RAID1 storage device.

[0011] The computer system 9 shown in Figure 3 has a host computer 40 and a storage device 19. The storage device 19 has a RAID controller 29 to which two physical disks 39a and 39b are connected. That is, the RAID controller 29 configures RAID1 using the two physical disks 39a and 39b. The RAID controller 29 records data of the same content on the physical disks 39a and 39b.

[0012] Due to scratches, dust, or other factors, the physical disks 39a and 39b may have a media error that prevents data from being read. The media error can be detected at the timing of reading from the media error sector.

[0013] Upon receiving a disk input/output request indicating a read command from the host computer 40, the RAID controller 29 issues a read command to a logical drive composed of the physical disks 39a and 39b. In the example shown in Figure 3, the RAID controller 29 is executing a read process on the physical disk 39a (Figure 3(1)). The physical disk 39a detects a media error e1 at this timing and returns an error response to the RAID controller 29 (Figure 3(2)).

[0014] Upon receiving an error response from the physical disk 39a, the RAID controller 29 issues a Reassign command to the physical disk 39a (Figure 4(3)). Subsequently, the RAID controller 29 issues a read command to the physical disk 39b (Figure 4(4)). Using the data obtained from the physical disk 39b (Figure 4(5)), the RAID controller 29 restores the data for the media error e1 of the physical disk 39a and issues a write command for the restored data to the physical disk 39a (Figure 4(6)). Through this series of repair processes, the media error e1 is repaired.
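
The repair sequence in [0014] can be sketched roughly as follows. This is only an illustrative model, not the controller's actual firmware: a disk is assumed to be a dictionary from sector number to data, `None` stands for a media error sector, and the function name `read_with_repair` is hypothetical. The Reassign command is represented only by a comment.

```python
from typing import Dict, Optional

# Hypothetical model: a disk maps sector number -> data; None marks a media error sector.
Disk = Dict[int, Optional[bytes]]


def read_with_repair(primary: Disk, mirror: Disk, sector: int) -> bytes:
    """Read a sector from the primary disk, repairing it from the mirror on a media error."""
    data = primary[sector]
    if data is not None:
        return data
    # Media error on the primary: a real controller would issue a Reassign command here.
    data = mirror[sector]
    if data is None:
        raise IOError(f"sector {sector} is unreadable on both disks")
    primary[sector] = data  # write the recovered data back to the primary disk
    return data


disk_a: Disk = {0: b"A", 1: None, 2: b"C"}  # sector 1 plays the role of media error e1
disk_b: Disk = {0: b"A", 1: b"B", 2: b"C"}
recovered = read_with_repair(disk_a, disk_b, 1)
```

After the call, sector 1 of `disk_a` holds the data read from `disk_b`, mirroring steps (3) to (6) of Figure 4.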

[0015] Even when performing such repair processes, due to the long-term operation of the storage device 19, both of the two physical disks 39a and 39b may malfunction, and there may be a number of repairable media error sectors. For example, as shown in Figure 5, consider the case where there are media error sectors e2 to e5 on the physical disk 39a and a media error sector e6 on the physical disk 39b, and the malfunctioning physical disk 39b is replaced. The RAID controller 29 executes a rebuild on the newly replaced physical disk 39b.

[0016] In the rebuild process, the content of the physical disk 39a is copied to the newly replaced physical disk 39b. However, as shown in Figure 6, the physical disk 39a contains media error sectors e2 to e5. Since the data in the media error sectors e2 to e5 is lost, it cannot be copied by the rebuild process. As a result, on the rebuilt physical disk 39b, the data in the sectors e7 to e10 corresponding to the media error sectors e2 to e5 is missing. That is, the data corresponding to the media error sectors e2 to e5 is lost from both physical disks 39a and 39b. In other words, replacing and rebuilding a physical disk can itself cause data loss. The computer system of the embodiment can prevent such data loss.
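
The failure mode described in [0016] can be illustrated with a small model. It is a sketch under assumptions: disks are dictionaries from sector number to data, `None` marks a media error sector, and the helper names `naive_rebuild` and `lost_sectors` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical model: a disk maps sector number -> data; None marks a media error sector.
Disk = Dict[int, Optional[bytes]]


def naive_rebuild(source: Disk) -> Disk:
    """Copy every sector of the surviving disk to a new disk; unreadable sectors stay missing."""
    return {sector: data for sector, data in source.items()}


def lost_sectors(*disks: Disk) -> List[int]:
    """Return the sectors that are unreadable on every given disk."""
    return sorted(s for s in disks[0] if all(d[s] is None for d in disks))


# Surviving disk with four media error sectors (standing in for e2 to e5).
disk_a: Disk = {0: b"d0", 1: None, 2: None, 3: b"d3", 4: None, 5: None, 6: b"d6"}
new_disk_b = naive_rebuild(disk_a)
lost = lost_sectors(disk_a, new_disk_b)
```

In this model the four unreadable sectors end up missing on both disks, which is the situation the embodiments are meant to avoid.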

[0017] (Configuration of the First Embodiment) The computer system of the embodiment will be described in detail below with reference to the drawings. Figure 1 is a block diagram showing the configuration of the computer system according to the first embodiment. Figure 2 is a block diagram showing the functional configuration of the computer system according to the first embodiment.

[0018] As shown in Figure 1, the computer system 1 of this embodiment includes a storage device 10, a host bus interface 15, and a host computer 40. The storage device 10 is connected to the host computer 40 via the host bus interface 15. The storage device 10 includes a RAID controller 20 and a plurality of physical disks 30 (30a, ..., 30n).

[0019] The storage device 10 is an auxiliary storage device for the host computer 40. The storage device 10 includes a RAID controller 20 that receives disk I/O requests from the host computer 40, a physical disk 30 on which the RAID controller 20 performs disk I/O, and a host bus interface 15.

[0020] The RAID controller 20 is a functional element capable of configuring a logical disk (virtual drive) using multiple physical disks 30. In this embodiment, the RAID controller 20 can present the multiple physical disks 30 to the host computer 40 as a single logical disk. The RAID controller 20 can, for example, provide a mirroring function compliant with RAID 1, thereby improving the performance and fault tolerance of the storage device 10.

[0021] The physical disk 30 is the storage medium of the storage device 10. The physical disk 30 can be implemented as, for example, a hard disk drive (HDD) or a solid-state drive (SSD). The physical disk 30 may consist of multiple physical disks 30a to 30n. The physical disks 30a to 30n have data areas 31a to 31n where data is stored, and spare areas 32a to 32n with a capacity equal to or greater than that of the data areas 31a to 31n. The data areas 31a to 31n are the areas that are subject to disk input/output requests from the host computer 40. The spare areas 32a to 32n are areas on which data recorded in the data areas 31a to 31n can be recorded.

[0022] The host bus interface 15 is an interface that connects the host computer 40 and the storage device 10. The host bus interface 15 can use a bus interface that conforms to standards such as PCI Express®.

[0023] The host computer 40 is a computer that issues disk input/output requests (disk I/O requests). The host computer 40 has an arithmetic unit (not shown) including a CPU (Central Processing Unit) and a main memory (not shown) for storing data and programs. The host computer 40 is connected to the storage device 10 and can send commands to the storage device 10 for reading and writing data. The host computer 40 is equipped with a host bus interface 15.

[0024] Next, with reference to Figure 2, the functional configuration of the computer system in this embodiment will be described in detail.

[0025] As shown in Figure 2, the RAID controller 20 has a host interface 210, a SoC (System on a Chip) 220, and a storage medium interface 240.

[0026] The host interface 210 is a bus interface that connects to the host computer 40. The host interface 210 constitutes the host bus interface 15. The host interface 210 can be implemented using a standard such as PCI Express.

[0027] The SoC 220 is a processor that controls the RAID controller 20. The SoC 220 has RAID firmware 222. The SoC 220 constructs logical disks 224.

[0028] RAID firmware 222 is a program element that enables the SoC 220 to function. RAID firmware 222 is a RAID control program that implements RAID functionality. RAID firmware 222 may be stored in a rewritable, non-volatile memory (not shown).

[0029] The storage medium interface 240 is an interface for connecting multiple physical disks 30. The storage medium interface 240 can use interface standards such as SATA (Serial Advanced Technology Attachment), SCSI (Small Computer System Interface; registered trademark), and SAS (Serial Attached SCSI). The storage medium interface 240 can accommodate multiple physical disks 30. The RAID firmware 222 configures a logical RAID drive (logical disk) using the multiple physical disks 30 with known RAID technology.

[0030] As shown in Figure 2, the host computer 40 has an operating system (OS) 120 and applications 140.

[0031] OS 120 is a program element that controls the host computer 40. OS 120 is deployed in the main memory of the host computer 40 (not shown) and is operated by a CPU (not shown). OS 120 includes a file system 121 and a device driver 122.

[0032] The device driver 122 is a program element (device driver) that manages the connection path (channel) between the storage device 10 and the host computer 40. In the example shown in Figure 2, the device driver 122 recognizes the logical disk 224 as a RAID drive. Generally, the device driver 122 is built into OS 120 as standard.

[0033] The file system 121 is a functional element that manages data stored on the logical disk 224 provided by the RAID controller 20. The file system 121 is built into OS 120 as a standard feature.

[0034] Application 140 is a program element that runs on OS 120. Application 140 can access the logical disk 224, which is configured as a RAID drive by the storage device 10.

[0035] (Operation of the first embodiment) Next, the operation of the storage device of the embodiment will be described in detail with reference to Figures 1-2 and 7-13. Figure 7 is a flowchart illustrating the operation of the storage device according to the first embodiment. Figures 8-13 are diagrams showing the data writing process of the physical disk in the storage device according to the first embodiment. In the following description, the storage device 10 will be described as having two physical disks 30a and 30b (n = "b").

[0036] The RAID firmware 222 divides the storage area of each physical disk 30a and 30b into two parts to form data areas 31a and 31b and spare areas 32a and 32b (S100). It is desirable that the data areas 31a and 31b have a capacity equal to or smaller than the spare areas 32a and 32b.
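
As a rough illustration of the area setting in S100, the split can be modeled as two sector ranges. This is a sketch under assumptions: the `Area` type, the `split_disk` function, and the choice to place the data area first are all hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Area:
    start: int  # first sector (inclusive)
    end: int    # last sector (exclusive)

    @property
    def size(self) -> int:
        return self.end - self.start


def split_disk(total_sectors: int) -> Tuple[Area, Area]:
    """Split a disk into a data area and a spare area that is at least as large."""
    data_size = total_sectors // 2  # rounding down keeps the spare area >= the data area
    return Area(0, data_size), Area(data_size, total_sectors)


data_area, spare_area = split_disk(1001)
```

With an odd sector count the extra sector goes to the spare area, so the data area never exceeds the spare area.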

[0037] Figure 8 shows how data areas 31a and 31b, and spare areas 32a and 32b are formed on physical disks 30a and 30b, respectively. As shown in Figure 8, the physical disks 30a and 30b forming the RAID configuration have media error sectors generated due to long-term use. Specifically, data area 31a has media error sectors e11 to e13, and spare area 32a has media error sector e14. In addition, data area 31b has media error sector e15, and spare area 32b has media error sector e16.

[0038] RAID firmware 222 monitors the status of physical disks 30a and 30b (S110). Examples of data monitored by RAID firmware 222 include the number of media errors on the physical disks and SMART (Self-Monitoring, Analysis, and Reporting Technology) data.

[0039] If the physical disks 30a and 30b are in a normal state and are not targets for preventive maintenance, i.e., no preventive maintenance of the physical disks is needed (No in S120), the RAID firmware 222 continues to monitor the physical disks 30a and 30b (S110).

[0040] For example, if the RAID firmware 222 detects that the gradient (rate of increase) of the number of media errors occurring on physical disk 30a or the value of the SMART data exceeds a threshold (Yes in S120), the RAID firmware 222 determines that physical disk 30a is a target for preventive maintenance. A target for preventive maintenance is a physical disk whose data must be preserved before the disk is replaced. The RAID firmware 222 issues a command to read the data stored in the data area 31a of physical disk 30a and write the read data to the spare area 32b of physical disk 30b (S130). Figure 9 shows how the data stored in the data area 31a is copied to the spare area 32b. As shown in Figure 9, the data of the physical disk that has been designated for preventive maintenance is preserved in the spare area of another physical disk.
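
The decision in S120 could look something like the following sketch. The patent does not specify how the gradient or the thresholds are computed, so everything here is an assumption: the gradient is taken as the average increase of a cumulative error count per monitoring interval, the SMART data is reduced to a single number, and the function names and default thresholds are hypothetical.

```python
from typing import Sequence


def error_gradient(error_counts: Sequence[int]) -> float:
    """Average increase of the cumulative media error count per monitoring interval."""
    if len(error_counts) < 2:
        return 0.0
    return (error_counts[-1] - error_counts[0]) / (len(error_counts) - 1)


def needs_preventive_maintenance(
    error_counts: Sequence[int],
    smart_value: float,
    gradient_threshold: float = 2.0,   # hypothetical threshold
    smart_threshold: float = 100.0,    # hypothetical threshold
) -> bool:
    """Return True when either monitored quantity exceeds its threshold (Yes in S120)."""
    return (
        error_gradient(error_counts) > gradient_threshold
        or smart_value > smart_threshold
    )


healthy = needs_preventive_maintenance([0, 1, 1, 2], smart_value=10)
degrading = needs_preventive_maintenance([0, 3, 7, 12], smart_value=10)
```

A disk for which the function returns True would then have its data area copied to the spare area of another disk, as described for S130.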

[0041] If the RAID firmware 222 fails to read data stored in the data area 31a of physical disk 30a in S130, that is, if it detects a media error sector, it issues a command to read the corresponding data from the data area 31b of physical disk 30b for the sector related to the data that failed to be read, and to write the read data to the spare area 32b of physical disk 30b (S140). In other words, under RAID1, the data is copied from the data area of the other physical disk, which holds the same content, to that disk's own spare area. Figure 10 shows how the data that failed to be read is copied from the data area 31b to the spare area 32b. This prevents the data loss that would otherwise occur because media error sectors in the data area 31a of physical disk 30a cannot be copied to the spare area 32b of physical disk 30b.

[0042] RAID firmware 222 disconnects physical disk 30a after it has finished copying all data from data area 31a of physical disk 30a. At this stage, the user can replace physical disk 30a (S150).

[0043] When the new physical disk 30a is connected, the RAID firmware 222 issues a command to read the data copied in S130 and S140 from the spare area 32b of physical disk 30b and write it to the data area 31a of the replaced physical disk 30a (S160). This process is a rebuild process and is called copyback. Figure 11 shows the copyback process from the spare area 32b to the data area 31a.

[0044] If a media error sector is detected in the spare area 32b of physical disk 30b during copyback, the RAID firmware 222 issues a command to write from the data area 31b of physical disk 30b to the data area 31a of physical disk 30a (S170). In other words, instead of the media error sector in the spare area 32b of physical disk 30b, the data from the normal sector in the corresponding data area 31b is used to write to the data area 31a of physical disk 30a.

[0045] Figure 12 shows how, when the spare area 32b has a media error sector e16, the data from the corresponding sector e16b is copied from the data area 31b to the data area 31a in place of the media error sector e16. Through this operation, the media error sector e16 that was present in the spare area 32b shown in Figure 13 is restored, and the correct data is written to the data area 31a.
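
The save step (S130, S140) and the copyback step (S160, S170) follow the same pattern: copy from one area, and fall back to the mirrored data area 31b for sectors that cannot be read. The sketch below models that pattern under assumptions: areas are dictionaries from sector number to data, `None` marks a media error sector, the error positions are invented for illustration, and `copy_with_fallback` is a hypothetical helper, not a function of the RAID firmware.

```python
from typing import Dict, List, Optional

# Hypothetical model: an area maps sector -> data; None marks a media error sector.
Area = Dict[int, Optional[bytes]]


def copy_with_fallback(primary: Area, fallback: Area, dest: Area) -> List[int]:
    """Copy every sector of `primary` to `dest`, reading `fallback` where `primary` fails.

    Returns the sectors that had to be taken from `fallback`.
    """
    substituted = []
    for sector in sorted(primary):
        data = primary[sector]
        if data is None:
            data = fallback[sector]
            if data is None:
                raise IOError(f"sector {sector} is unreadable in both areas")
            substituted.append(sector)
        dest[sector] = data
    return substituted


original = {s: f"blk{s}".encode() for s in range(6)}
data_31a: Area = {**original, 1: None, 2: None, 4: None}  # stands in for e11 to e13
data_31b: Area = {**original, 5: None}                    # stands in for e15
spare_32b: Area = {}

# S130/S140: save data area 31a into spare area 32b, using 31b for unreadable sectors.
from_mirror = copy_with_fallback(data_31a, data_31b, spare_32b)

# Assume one spare sector turns out to be unreadable at copyback time (like e16).
spare_32b[3] = None

# S160/S170: copy back to the replaced disk, using 31b where the spare area fails.
new_data_31a: Area = {}
from_data_31b = copy_with_fallback(spare_32b, data_31b, new_data_31a)
```

In this model the replaced disk ends up with every sector intact even though both the old data area and the spare area contained media errors, as long as no sector is unreadable in both sources at once.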

[0046] If a disk I/O request indicating a write command separate from the rebuild process is received from the host computer 40 during the copyback process (Yes in S180), the RAID firmware 222 executes the received write command on the data area 31a of physical disk 30a, the data area 31b of physical disk 30b, and the spare area 32b of physical disk 30b (S190). This ensures data integrity in each area even when a write command is received during the copyback process.

[0047] Furthermore, the execution of the data integrity maintenance process in S190 is not limited to when a disk I/O request is received during copyback. It is also performed during the write operation from the data area 31a of physical disk 30a to the spare area 32b of physical disk 30b in S130 and S140. In other words, it is performed not only during copyback after replacement of a physical disk subject to preventive maintenance, but also during data copying of the physical disk subject to preventive maintenance.
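
The integrity step S190 amounts to applying a host write to all three areas at once. A minimal sketch, assuming areas are modeled as dictionaries from sector number to data and using the hypothetical function name `host_write`:

```python
from typing import Dict

# Hypothetical model: an area maps sector number -> data.
Area = Dict[int, bytes]


def host_write(sector: int, data: bytes, *areas: Area) -> None:
    """Apply one host write to every area that must stay consistent (S190)."""
    for area in areas:
        area[sector] = data


data_31a: Area = {0: b"old0", 1: b"old1"}
data_31b: Area = dict(data_31a)
spare_32b: Area = dict(data_31a)  # assume both sectors were already copied to the spare area

# A host write arrives while the copy or copyback is still in progress.
host_write(1, b"new1", data_31a, data_31b, spare_32b)
```

If the spare area were skipped here, a later copyback from it would overwrite sector 1 with stale data, which is the inconsistency the step is meant to prevent.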

[0048] In the example described above, RAID firmware 222 uses write commands to perform data copying between areas, but it is not limited to this. Instead of write commands, Write and Verify commands may be issued. This allows physical disk 30b to perform Auto Reassign to repair the media error sectors in the copy destination, even if there are media errors in the spare area 32b of physical disk 30b in S130 or S140.

[0049] Thus, according to the computer system 1 of the first embodiment, a spare area is provided on the physical disk, and data is saved to the spare area before the physical disk is replaced. Therefore, even when the physical disk is replaced and a rebuild process is performed, data loss can be prevented.

[0050] (Operation of the second embodiment) Next, the operation of the storage device of the second embodiment will be described in detail with reference to Figures 1-2 and 14. Figure 14 is a flowchart illustrating the operation of the storage device according to the second embodiment. The storage device of the first embodiment is an example in which two physical disks 30 constitute RAID1, but it can be similarly applied to other RAID levels except RAID0. The storage device of the second embodiment is an example in which any n physical disks shown in Figure 2 constitute RAID. In the following description, the storage device 10 will be described as having n physical disks 30a to 30n.

[0051] The RAID firmware 222 divides the storage area of each of the physical disks 30a to 30n into two parts to form data areas 31a to 31n and spare areas 32a to 32n (S200). It is desirable that the data areas 31a to 31n have a capacity equal to or smaller than that of the spare areas 32a to 32n.

[0052] The RAID firmware 222 monitors the states of the physical disks 30a to 30n (S210). Examples of the monitoring content by the RAID firmware 222 include the number of media errors occurring in the physical disks and S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) data.

[0053] When the states of the physical disks 30a to 30n are normal and no preventive maintenance of the physical disks is needed (No in S220), the RAID firmware 222 continues to monitor the physical disks 30a to 30n (S210).

[0054] For example, when the RAID firmware 222 detects that the gradient of the number of media errors in the physical disk 30n or the value of the S.M.A.R.T. data exceeds the threshold (Yes in S220), the RAID firmware 222 determines that the physical disk 30n is a preventive maintenance target. The RAID firmware 222 issues a command to read the data stored in the data area 31n of the physical disk 30n and write the read data to the spare area 32m of the physical disk 30m (1 ≤ m < n) (S230).

[0055] In S230, if the RAID firmware 222 fails to read data stored in the data area 31n of physical disk 30n, i.e., if it detects a media error sector, it issues a command (S240) to read the corresponding data from the data area of another physical disk other than physical disk 30n for the sector related to the data that failed to be read, and to write the read data to the spare area 32m of physical disk 30m. In other words, under RAID, the data is copied from the data area of another physical disk that has the same content to the spare area 32m of physical disk 30m. This prevents data loss that would occur if a media error sector on the data area 31n of physical disk 30n could not be copied to the spare area 32m of physical disk 30m.
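
For the n-disk case, the fallback read in S240 can be sketched as trying the other disks in turn. This assumes a configuration in which every disk holds the same data, models data areas as dictionaries with `None` marking a media error sector, and uses the hypothetical function name `read_from_other_disks`; the patent does not state the order in which other disks are tried.

```python
from typing import Dict, List, Optional

# Hypothetical model: a data area maps sector -> data; None marks a media error sector.
Area = Dict[int, Optional[bytes]]


def read_from_other_disks(data_areas: List[Area], sector: int, exclude: int) -> bytes:
    """Return the first readable copy of `sector` from any disk except `exclude`."""
    for index, area in enumerate(data_areas):
        if index == exclude:
            continue
        data = area[sector]
        if data is not None:
            return data
    raise IOError(f"sector {sector} is unreadable on every other disk")


data_areas = [
    {0: b"x0", 1: None},   # disk 0: media error at sector 1
    {0: b"x0", 1: b"x1"},  # disk 1: readable
    {0: b"x0", 1: None},   # disk 2: the preventive maintenance target
]
value = read_from_other_disks(data_areas, sector=1, exclude=2)
```

The data found this way would then be written to the spare area of the disk chosen to hold the saved copy.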

[0056] RAID firmware 222 disconnects physical disk 30n once it has finished copying all data from data area 31n of physical disk 30n. At this stage, the user can replace physical disk 30n (S250).

[0057] When the new physical disk 30n is connected, the RAID firmware 222 issues a command to read the data copied in S230 and S240 from the spare area 32m of physical disk 30m and write it to the data area 31n of the replaced physical disk 30n (S260).

[0058] If a media error sector is detected in the spare area 32m of physical disk 30m during copyback, the RAID firmware 222 issues a command to write from the data area of another physical disk (other than physical disk 30n) to the data area 31n of physical disk 30n (S270). In other words, instead of the media error sector in the spare area 32m of physical disk 30m, the data is written to the data area 31n of physical disk 30n using data from a normal sector in the corresponding data area.

[0059] Similar to the first embodiment, if a disk I/O request is received during copyback processing or while data is being copied from a physical disk subject to preventive maintenance, processing is performed to maintain data integrity.

[0060] In the example described above, physical disks 30a to 30n each store common data, but the configuration is not limited to this. It may also be applied to configurations such as RAID5 or RAID6, where, for example, physical disks 30a and 30b form a pair of storage media, and physical disk 30c is a storage medium that stores their parity data. In this case, even if a media error sector is detected in the spare area 32b during the copyback process from the spare area 32b of physical disk 30b to the data area 31a of physical disk 30a after replacing physical disk 30a, the data of the media error sector can be restored using the data in the data area 31b of physical disk 30b and the parity data in the data area 31c of physical disk 30c.
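
The parity-based restoration mentioned in [0060] relies on the XOR relationship between data and parity. A minimal sketch, assuming simple single-parity striping over two data blocks; the function name `xor_blocks` and the sample byte values are illustrative only.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Return the bytewise exclusive OR of equally sized blocks."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


block_a = b"\x12\x34\x56\x78"            # data belonging to physical disk 30a
block_b = b"\xab\xcd\xef\x01"            # data in data area 31b
parity_c = xor_blocks(block_a, block_b)  # parity held in data area 31c

# If the saved copy of block_a is unreadable, it can be rebuilt from block_b and the parity.
restored_a = xor_blocks(block_b, parity_c)
```

Because XOR is its own inverse, combining the surviving data block with the parity block yields the missing block.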

[0061] In the example described above, RAID firmware 222 uses write commands to perform data copying between areas, but it is not limited to this. Instead of write commands, a Write and Verify command may be issued. For example, if there is a media error in the spare area 32m of physical disk 30m in S230 or S240, this allows physical disk 30m to perform Auto Reassign to repair the media error sectors in the copy destination.

[0062] Thus, according to the computer system of the second embodiment, a spare area is provided on the physical disk, and data is saved to the spare area before the physical disk is replaced. Therefore, even when the physical disk is replaced and a rebuild process is performed, data loss can be prevented.

[0063] (Configuration of the third embodiment) Next, the storage device of the third embodiment will be described with reference to Figures 15 and 16. Figure 15 is a block diagram showing the configuration of the computer system according to the third embodiment. Figure 16 is a flowchart illustrating the operation of the storage device of the third embodiment. The storage devices of the first and second embodiments have, on each physical disk, a spare area with a capacity equal to or greater than that of the data area. The storage device of the third embodiment aims to improve capacity efficiency compared to the first and second embodiments. In the following description, components and operations common to the first and second embodiments will be denoted by the same reference numerals, and redundant explanations will be omitted.

[0064] As shown in Figure 15, the computer system 3 of this embodiment includes a storage device 13, a host bus interface 15, and a host computer 40. The storage device 13 is connected to the host computer 40 via the host bus interface 15. The storage device 13 comprises a RAID controller 23 and a plurality of physical disks 33 (33a, ..., 33n).

[0065] The storage device 13 is an auxiliary storage device for the host computer 40. The storage device 13 includes a RAID controller 23 that receives disk I/O requests from the host computer 40, a physical disk 33 on which the RAID controller 23 performs disk I/O, and a host bus interface 15.

[0066] The RAID controller 23 is a functional element capable of configuring a logical disk (virtual drive) using multiple physical disks 33. In this embodiment, the RAID controller 23 can present the multiple physical disks 33 to the host computer 40 as a single logical disk.

[0067] The physical disk 33 is the storage medium of the storage device 13. The physical disk 33 can be implemented as, for example, a hard disk drive (HDD) or a solid-state drive (SSD). The physical disk 33 can be composed of multiple physical disks 33a to 33n. The physical disks 33a to 33n have data areas 34a to 34n where data is stored, and parity areas 35a to 35n.

[0068] As shown in Figure 15, data areas 34a to 34n have m data areas (1) to (m). Data areas (1) to (m) are the areas that are the target of disk input/output requests from the host computer 40. Parity areas 35a to 35n are areas that store the exclusive OR of data areas (1) to (m). In other words, the physical disks 33a to 33n in this embodiment store the contents that would normally be stored on m physical disks constituting RAID4 (fixed parity) by dividing each physical disk 33 into m data areas (1) to (m). Parity areas 35a to 35n store the exclusive OR of each data area (1) to (m) of physical disks 33a to 33n.

[0069] In this embodiment, when data writing is performed, in addition to the normal RAID processing, the data obtained by taking the exclusive OR of each data area of each physical disk is written to the parity area of each physical disk. When a physical disk is replaced, the normal rebuild process is performed, but if a media error is found in the read physical disk, the data is restored from the data area and parity area of other physical disks.

[0070] (Operation of the third embodiment) Next, the operation of the storage device of the third embodiment will be described in detail. The RAID firmware 222 divides the storage area of each physical disk 33a to 33n into m+1 parts, forming m data areas 34a to 34n and parity areas 35a to 35n (S300). The storage device 13 stores data in each data area (1) to (m) of each physical disk 33a to 33n, for example using RAID 4, and stores the exclusive OR of data areas (1) to (m) in the parity area.
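
The layout formed in S300 can be sketched as m data areas plus one parity area per disk. The sketch assumes one reading of the text, namely that the parity area of a disk holds the XOR of that disk's own m data areas; the helper names `parity_of` and `recover_area` and the sample values are hypothetical.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive OR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(areas: List[bytes]) -> bytes:
    """Parity area contents: the XOR of data areas (1) to (m)."""
    return reduce(xor_bytes, areas)


def recover_area(areas: List[bytes], parity: bytes, missing_index: int) -> bytes:
    """Rebuild one unreadable data area from the remaining data areas and the parity area."""
    survivors = [a for i, a in enumerate(areas) if i != missing_index]
    return reduce(xor_bytes, survivors, parity)


data_areas = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]  # m = 3 data areas on one disk
parity_area = parity_of(data_areas)
recovered = recover_area(data_areas, parity_area, missing_index=1)
```

Under this reading, a disk split into m+1 equal parts spends 1/(m+1) of its capacity on redundancy (a quarter for m = 3), compared with at least half for the spare areas of the first and second embodiments.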

[0071] RAID firmware 222 monitors the status of physical disks 33a to 33n (S310). Examples of data monitored by RAID firmware 222 include the number of media errors on the physical disks and SMART (Self-Monitoring, Analysis, and Reporting Technology) data.

[0072] If the physical disks 33a to 33n are in a normal state and are not subject to preventive maintenance, i.e., there is no need to preventively maintain the physical disks (No in S320), the RAID firmware 222 continues to monitor the physical disks 33a to 33n (S310).

[0073] For example, if the RAID firmware 222 detects that the gradient (rate of increase) of media errors on physical disk 33a or the value of its SMART data exceeds a threshold (Yes in S320), or if it detects a problem with physical disk 33a, the RAID firmware 222 determines that physical disk 33a is subject to preventive maintenance and disconnects it. At this stage, the user can replace physical disk 33a (S330).
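
The decision in S320 can be sketched as follows. This is a minimal Python sketch, assuming the media error gradient is the average increase in the error count per monitoring interval and that the SMART data is reduced to a single number compared against a threshold. The function names, the sample histories, and both threshold values are hypothetical and are not taken from the embodiment.

```python
def error_gradient(error_counts: list[int]) -> float:
    """Average increase in media errors per monitoring interval."""
    if len(error_counts) < 2:
        return 0.0
    return (error_counts[-1] - error_counts[0]) / (len(error_counts) - 1)


def needs_preventive_maintenance(
    error_counts: list[int],
    smart_value: float,
    gradient_threshold: float,
    smart_threshold: float,
) -> bool:
    """Return True when either monitored quantity exceeds its threshold (Yes in S320)."""
    return (
        error_gradient(error_counts) > gradient_threshold
        or smart_value > smart_threshold
    )


# Hypothetical monitoring history: cumulative media error counts and one SMART value.
history = {
    "33a": ([0, 1, 4, 9], 3.0),
    "33b": ([0, 0, 1, 1], 2.0),
}
flagged = [
    name
    for name, (counts, smart) in history.items()
    if needs_preventive_maintenance(
        counts, smart, gradient_threshold=2.0, smart_threshold=10.0
    )
]
```

With these sample values only disk 33a is flagged, because its error count grows by 3 per interval on average while 33b grows by about 0.33.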

[0074] The RAID firmware 222 reads the data stored in the data areas 34b to 34n and parity areas 35b to 35n of physical disks 33b to 33n (S340), and issues a command to write the data obtained based on the read data to the data area 34a and parity area 35a of physical disk 33a (S350).

[0075] If a read fails in S340 (Yes in S360), for example because a media error sector is found in the data area 34m of physical disk 33m, the RAID firmware 222 issues a command to read, for the sector that failed to be read, the corresponding data from the data area and parity area of one of the physical disks 33b to 33n other than physical disk 33m, and then a command to write the data restored from the read data to the corresponding sector on physical disk 33m (S370, S380). In other words, the data of the media error sector on physical disk 33m is restored from the data areas and parity areas of the other physical disks, which hold the same content due to RAID, and copied to physical disk 33m.
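
The recovery in S360 to S380 can be sketched as follows. This is a minimal Python sketch under one possible reading of the embodiment: the disks hold the same content, each sector is modelled as a single integer, `None` stands for a sector that returns a media error, and the lost sector is recomputed from the parity area and the other data areas of a second disk. The names `Disk`, `restore_sector`, and `rebuild` are hypothetical.

```python
from typing import Optional

Sector = Optional[int]  # None models a sector that returns a media error


class Disk:
    def __init__(self, areas: list[list[int]]) -> None:
        self.areas: list[list[Sector]] = [list(a) for a in areas]
        # Parity area: XOR of data areas (1)..(m) at each sector offset.
        self.parity = [0] * len(areas[0])
        for area in areas:
            for offset, value in enumerate(area):
                self.parity[offset] ^= value


def restore_sector(helper: Disk, area_idx: int, offset: int) -> int:
    """Rebuild one sector from the helper's parity area and other data areas."""
    value = helper.parity[offset]
    for k, area in enumerate(helper.areas):
        if k == area_idx:
            continue
        if area[offset] is None:
            raise IOError("helper disk cannot restore this sector")
        value ^= area[offset]
    return value


def rebuild(source: Disk, helper: Disk) -> Disk:
    """Copy source to a replacement disk, repairing unreadable sectors on the way."""
    copied = []
    for k, area in enumerate(source.areas):
        row = []
        for offset, value in enumerate(area):
            if value is None:  # read failed (Yes in S360)
                value = restore_sector(helper, k, offset)
                source.areas[k][offset] = value  # write back to the source disk
            row.append(value)
        copied.append(row)
    return Disk(copied)


content = [[1, 2], [3, 4], [5, 6]]  # m = 3 data areas, 2 sectors each
source, helper = Disk(content), Disk(content)
source.areas[1][0] = None  # media error sector on the source disk
replacement = rebuild(source, helper)
assert replacement.areas == content and source.areas == content
```

Because `restore_sector` never reads the helper's own copy of the failed sector, this sketch still succeeds when that same sector is also unreadable on the helper disk, as long as the helper's parity area and remaining data areas are readable at that offset.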

[0076] In the example described above, the RAID firmware 222 uses write commands to copy data between areas, but the invention is not limited to this. A Write and Verify command may be issued instead of a write command. Then, for example, if there is a media error in the spare area 32m of physical disk 30m in S230 or S240, physical disk 30m can perform Auto Reassign to repair the media error sector at the copy destination.

[0077] While several embodiments of the present invention have been described, these embodiments are presented as examples only and are not intended to limit the scope of the invention. These novel embodiments can be carried out in a variety of other forms, and various omissions, substitutions, and modifications can be made without departing from the spirit of the invention. These embodiments and their variations are included in the scope and spirit of the invention, as well as in the invention described in the claims and its equivalents.

[Explanation of Symbols]

[0078]
1, 3, 9 … Computer systems
10, 13, 19 … Storage devices
15 … Host bus interface
20, 23, 29 … RAID controllers
30, 30a to 30n, 33, 33a to 33n, 39, 39a to 39n … Physical disks
31a to 31n, 34a to 34n … Data areas
32a to 32n … Spare areas
35a to 35n … Parity areas
40 … Host computer
121 … File system
122 … Device driver
140 … Application
210 … Host interface
220 … SoC
222 … RAID firmware
224 … Logical disk
240 … Storage medium interface
e1 … Media error
e2 to e6 … Media error sectors
e7 to e10 … Sectors
e11 to e16 … Media error sectors
e16b … Sector

Claims

1. A controller that reads and writes common data to first and second physical disks to configure a logical disk, and is capable of executing disk input / output requests to the first and second physical disks in response to disk input / output requests to the logical disk received from a host computer, the controller comprising: an area setting unit that sets, for each of the first and second physical disks, a data area to be the target of the disk input / output requests and a spare area on which the data recorded in the data area can be recorded; a monitoring unit that monitors the status of each of the first and second physical disks; and a maintenance processing unit that, if an error in the first physical disk is detected as a result of monitoring by the monitoring unit, writes the data in the data area of the first physical disk to the spare area of the second physical disk.

2. The controller according to claim 1, characterized in that when the first physical disk is replaced, the maintenance processing unit writes the data from the spare area of the second physical disk to the data area of the first physical disk.

3. The controller according to claim 1, characterized in that the maintenance processing unit writes the data in the data area of the first physical disk to the spare area of the second physical disk and performs an inspection process.

4. The controller according to claim 1, characterized in that when the maintenance processing unit receives a disk input / output request indicating a write to the logical disk while it is writing data from the data area of the first physical disk to the spare area of the second physical disk, the maintenance processing unit performs the write operation related to the disk input / output request to the data area of the first physical disk and the spare area of the second physical disk.

5. The controller according to claim 2, characterized in that when the maintenance processing unit receives a disk input / output request indicating a write to the logical disk while it is writing data from the spare area of the second physical disk to the data area of the first physical disk, the maintenance processing unit performs the write operation related to the disk input / output request to the spare area of the second physical disk and the data area of the first physical disk.

6. A controller that configures a logical disk using first, second, and third physical disks, and is capable of executing disk input / output requests to the first, second, and third physical disks in response to disk input / output requests to the logical disk received from a host computer, the controller comprising: an area setting unit that sets, for each of the first, second, and third physical disks, a data area to be the target of the disk input / output requests and a spare area on which the data recorded in the data area can be recorded; a monitoring unit that monitors the status of the first, second, and third physical disks; and a maintenance processing unit that, when the monitoring unit detects an error in the first physical disk, writes the data from the data area of the first physical disk to the spare area of the second physical disk, and, when the first physical disk is replaced, writes the data from the spare area of the second physical disk to the data area of the first physical disk, wherein, when the maintenance processing unit receives a disk input / output request indicating a write to the logical disk while it is writing data from the data area of the first physical disk to the spare area of the second physical disk, or while it is writing data from the spare area of the second physical disk to the data area of the first physical disk, the maintenance processing unit performs the write operation related to the disk input / output request on the data area of the second physical disk and the data area of the third physical disk.

7. A controller that configures a logical disk using first, second, and third physical disks, and is capable of executing disk input / output requests to the first, second, and third physical disks in response to disk input / output requests to the logical disk received from a host computer, the controller comprising: an area setting unit that sets, for each of the first, second, and third physical disks, a plurality of data areas that are the target of the disk input / output requests and a parity area that can record the exclusive OR of the data recorded in the plurality of data areas; a rebuild execution unit that, when the first physical disk is replaced, writes the data from the plurality of data areas of the second physical disk to the data areas of the first physical disk; and a recovery unit that, when an error is detected in a data area of the second physical disk, recovers data based on the data stored in the plurality of data areas and the parity area of the third physical disk and writes the recovered data to the data area of the first physical disk.

8. A storage device comprising the controller according to any one of claims 1 to 7.