Test method and device for disk failure, electronic equipment and storage medium

CN116312737B · Active · Publication Date: 2026-06-16 · CHINA UNITED NETWORK COMM GRP CO LTD +2

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA UNITED NETWORK COMM GRP CO LTD
Filing Date
2023-03-22
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Storage system testers from non-storage server vendors often find it difficult to directly obtain damaged disks to construct disk failure scenarios, and they also lack the ability to develop Linux kernel storage driver layers, making it difficult to detect storage hardware failures.

Method used

By utilizing Linux kernel hot patching technology, target faults are constructed in test cases. By modifying the response results of I/O operation callback handling functions and compiling kernel modules, target patches are generated to detect disk faults.

Benefits of technology

It enables the dynamic construction of storage hardware failure scenarios without relying on actual disk failures, and tests the handling process of the processing software to detect the type of storage hardware failure.


Abstract

The application provides a disk fault test method and device, electronic equipment and a storage medium. The method is applied to a Linux system. During the execution of a test case, a target patch is run to construct a target fault of a disk. The target patch is determined based on a preset type of small computer system interface (SCSI) fault. Then, it is determined whether the disk has the target fault. In this technical solution, a certain fault of a disk is constructed so that the processing software's handling flow for that fault is executed and tested, thereby detecting the fault in the storage hardware.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, apparatus, electronic device and storage medium for testing disk failure.

Background Technology

[0002] In the field of computer technology, kernel failures often affect the authenticity and reliability of business input and output. Therefore, it is necessary to continuously investigate the causes of kernel failures and then take corresponding remedial measures to reduce the corresponding losses.

[0003] For testers of (centralized / distributed) storage systems who are not from storage server vendors, it is generally difficult to directly obtain various damaged disks to construct the above disk failure scenarios, and there are also no Linux kernel storage driver layer developers to perform professional disk failure injection development.

[0004] Therefore, how to detect what kind of faults exist in storage hardware has become an urgent technical problem to be solved.

Summary of the Invention

[0005] This application provides a method, apparatus, electronic device, and storage medium for testing disk failures, in order to solve the technical problem of how to detect what kind of failure exists in storage hardware, which is an urgent issue to be addressed in the prior art.

[0006] In a first aspect, this application provides a method for testing disk failures, applied to a Linux system, the method comprising:

[0007] During the execution of the test cases, a target patch is run to construct a target fault present on the disk. The target patch is determined based on a preset type of Small Computer System Interface (SCSI) fault.

[0008] Determine whether the target fault exists on the disk.

[0009] In one possible design of the first aspect, before running the target patch to construct the target fault present on the disk, the method further includes:

[0010] In the callback handling function that responds to an input/output (I/O) operation, modify the response result corresponding to the I/O operation to the target fault;

[0011] Compile the Linux kernel module to obtain the target patch.

[0012] In this possible design, before performing the compilation of the Linux kernel module to obtain the target patch, the method further includes:

[0013] After modifying the response result to the target fault, a mechanism for exiting the modification is added.

[0014] In another possible design of the first aspect, the SCSI fault includes: read/write media errors, other disk errors, and return errors specified in the SCSI protocol.

[0015] In another possible design of the first aspect, the running target patch includes:

[0016] Get the user-defined preset values;

[0017] Based on the random number and the preset value, determine the ratio of the random number to the preset value;

[0018] If the ratio is an integer, the target patch is executed.

[0019] In this possible design, the method further includes:

[0020] If the ratio is not an integer, the target patch will not be run.

[0021] In a second aspect, this application provides a disk failure testing device applied to a Linux system, the device comprising:

[0022] The construction module is used to run a target patch during the execution of test cases to construct a target fault existing on the disk. The target patch is determined based on a preset type of Small Computer System Interface (SCSI) fault.

[0023] A determination module is used to determine whether the target fault exists on the disk.

[0024] In one possible design of the second aspect, before running the target patch to construct the target fault present on the disk, the apparatus further includes:

[0025] The modification module is used to modify the response result corresponding to the input/output (I/O) operation to the target fault in the callback handling function that responds to the I/O operation;

[0026] The determining module is also used to perform the compilation of the Linux kernel module to obtain the target patch.

[0027] In this possible design, before performing the compilation of the Linux kernel module to obtain the target patch, the apparatus further includes:

[0028] The processing module is used to add a mechanism to exit the modification after the response result is modified to the target fault.

[0029] In another possible design of the second aspect, the SCSI fault includes: read/write media errors, other disk errors, and return errors specified in the SCSI protocol.

[0030] In another possible design, the construction module runs the target patch, specifically for:

[0031] Get the user-defined preset values;

[0032] Based on the random number and the preset value, determine the ratio of the random number to the preset value;

[0033] If the ratio is an integer, the target patch is executed.

[0034] In this possible design, the construction module is also specifically used for:

[0035] If the ratio is not an integer, the target patch will not be run.

[0036] In a third aspect, this application provides an electronic device, including: a processor, and a memory and a transceiver communicatively connected to the processor;

[0037] The memory stores computer-executed instructions; the transceiver is used for sending and receiving data.

[0038] The processor executes computer execution instructions stored in the memory to implement the disk failure testing method as described in the first aspect or any of the above embodiments.

[0039] In a fourth aspect, this application provides a computer-readable storage medium storing computer-executable instructions, which, when executed by a processor, are used to implement the disk failure testing method described in the first aspect or any of the above embodiments.

[0040] This application provides a disk failure testing method, apparatus, electronic device, and storage medium. The method, applied to a Linux system, constructs a target fault on the disk by running a target patch during test case execution. This target patch is determined based on a preset type of Small Computer System Interface (SCSI) fault. The existence of the target fault on the disk is then determined. In this technical solution, by constructing a disk failure, the processing software's handling procedure for that fault is executed and tested, thereby detecting the type of fault in the storage hardware.

Description of the Drawings

[0041] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0042] Figure 1 is an execution flowchart of a function before patching, provided in an embodiment of this application;

[0043] Figure 2 is an execution flowchart of a function after patching, provided in an embodiment of this application;

[0044] Figure 3 is a schematic diagram of the kernel-mode input/output call stack provided in an embodiment of this application;

[0045] Figure 4 is a first flowchart of the disk failure testing method provided in an embodiment of this application;

[0046] Figure 5 is a second flowchart of the disk failure testing method provided in an embodiment of this application;

[0047] Figure 6 is a third flowchart of the disk failure testing method provided in an embodiment of this application;

[0048] Figure 7 is a fourth flowchart of the disk failure testing method provided in an embodiment of this application;

[0049] Figure 8 is a schematic structural diagram of an embodiment of the disk failure testing device provided in this application;

[0050] Figure 9 is a schematic structural diagram of an electronic device provided in an embodiment of this application.

[0051] The accompanying drawings have illustrated specific embodiments of this disclosure, which will be described in more detail below. These drawings and descriptions are not intended to limit the scope of the concept in any way, but rather to illustrate the concepts of this disclosure to those skilled in the art through reference to particular embodiments.

Detailed Description of the Embodiments

[0052] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0053] Before introducing the embodiments of this application, the technical terms and application background of the embodiments of this application will be explained first:

[0054] Linux kernel hot patching technology:

[0055] Linux kernel hot patching is a technology that dynamically patches the kernel without restarting the operating system. Based on this technology, kernel program bugs or security vulnerabilities can be fixed without restarting the operating system, minimizing operating system downtime and increasing operating system availability.

[0056] Currently, Linux kernel hot patching technology is widely used in large-scale cloud computing. Its principle is mainly to replace the reserved bytes at the entry point of the "function being replaced" with a jump instruction during patching, causing a jump to the execution flow of the "new function" provided by the patch, thus achieving online replacement of the execution flow at the function level. There are several different implementation schemes for hot patching, but the principle is the same. The function execution flow before and after patching is shown in Figure 1 and Figure 2:

[0057] Figure 1 is an execution flowchart of a function before patching, provided in an embodiment of this application. Figure 2 is an execution flowchart of a function after patching, provided in an embodiment of this application.

[0058] Specifically, in Figure 1 (before patching), upon receiving an operation request (call), the system responds by executing the original function and returning its execution result.

[0059] In Figure 2 (after patching), upon receiving an operation request, the system responds to that request by skipping the original function and executing the replacement function to obtain the corresponding execution result. Specifically, kpatch calls `stop_machine()` (step B in Figure 2) to ensure that no thread is currently calling the original function, thus guaranteeing the safety of the operating system, and uses ftrace (step A in Figure 2) to implement kernel function replacement.
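The redirection described above can be pictured with a user-space analogy. The following sketch is not kernel code and uses neither ftrace nor `stop_machine()`; it only models the idea that callers keep using the same entry point while the target behind it is swapped. All names (`handler_fn`, `apply_patch`, `handle_request`) and the return values are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type: takes a request id, returns a status code. */
typedef int (*handler_fn)(int request);

/* Stand-in for the original function (Figure 1 path). */
static int original_handler(int request)
{
    (void)request;
    return 0;   /* illustrative "success" status */
}

/* Stand-in for the replacement function supplied by a patch (Figure 2 path). */
static int patched_handler(int request)
{
    (void)request;
    return -5;  /* illustrative "error" status */
}

/* Dispatch slot: plays the role of the redirected function entry point. */
static handler_fn active_handler = original_handler;

/* Redirect all later calls to the replacement and return the previous target,
 * so that the caller can restore it when the patch is removed. */
static handler_fn apply_patch(handler_fn replacement)
{
    assert(replacement != NULL);
    handler_fn previous = active_handler;
    active_handler = replacement;
    return previous;
}

/* Callers always go through the slot, so they never need to change. */
static int handle_request(int request)
{
    return active_handler(request);
}
```

Calling `apply_patch(patched_handler)` makes `handle_request()` return the replacement's result, and passing the returned pointer back to `apply_patch()` restores the original flow. A real hot patch has to solve the additional problem, handled by kpatch as described above, of making the switch safely while other threads may be executing the function.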

[0060] Linux kernel interrupt handling bottom half mechanism:

[0061] Interrupt service routines execute with interrupt requests disabled, and during this time the CPU cannot respond to other interrupt requests. The Linux kernel is therefore designed to handle an interrupt request as quickly as possible and to defer the remaining processing. The kernel divides interrupt handling into two parts: the top half and the bottom half. The top half is the interrupt service routine, which the kernel executes immediately with interrupts disabled. The bottom half consists of kernel functions that run with interrupt requests allowed.

[0062] Linux kernel Small Computer System Interface (SCSI) subsystem (Figure 3 is a schematic diagram of the kernel-mode input/output call stack provided in an embodiment of this application):

[0063] As shown in Figure 3, in a Linux system, user-mode file read and write operations are performed through system calls, which invoke the kernel-mode Virtual File System (VFS) and the specific file system processing. The file system calls the block input/output (I/O) subsystem for processing. After optimization processing such as scheduling and sorting, the block I/O subsystem calls the SCSI subsystem. The SCSI subsystem converts the block device's input/output (BIO) into SCSI protocol standard I/O, and then calls the underlying hardware driver interface to submit the read/write request to the disk.

[0064] The file systems include: second extended filesystem (ext2), third extended filesystem (ext3), and a high-performance journaling file system, XFS.

[0065] In addition, the SCSI subsystem checks the read/write response results returned by the underlying driver and performs corresponding error recovery processing for erroneous commands. The result returned by the underlying command is parsed and categorized into error codes, where NO_SENSE (0x00) represents success. Examples include NOT_READY (the hardware is not yet ready) and MEDIUM_ERROR (a read/write media error, which may be a physical media error or a recoverable logical media error).
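As a rough illustration of this categorization, the sketch below maps a few sense key values to short descriptions. The numeric values follow the SCSI standard's sense key assignments; the `SK_` prefix, the function name `sense_key_name`, and the description strings are assumptions made for this example, and only a subset of the keys is listed.

```c
#include <assert.h>
#include <string.h>

/* Subset of SCSI sense keys; the SK_ prefix is used here only to keep
 * the example self-contained. */
enum sense_key {
    SK_NO_SENSE        = 0x00,  /* no error to report (success) */
    SK_RECOVERED_ERROR = 0x01,  /* command completed after recovery */
    SK_NOT_READY       = 0x02,  /* device is not yet ready */
    SK_MEDIUM_ERROR    = 0x03,  /* read/write media error */
    SK_HARDWARE_ERROR  = 0x04   /* non-recoverable hardware failure */
};

/* Map a sense key to a short description; keys outside the subset
 * above fall through to a generic label. */
static const char *sense_key_name(unsigned char key)
{
    switch (key) {
    case SK_NO_SENSE:        return "no sense";
    case SK_RECOVERED_ERROR: return "recovered error";
    case SK_NOT_READY:       return "not ready";
    case SK_MEDIUM_ERROR:    return "medium error";
    case SK_HARDWARE_ERROR:  return "hardware error";
    default:                 return "other";
    }
}
```

A tester could use such a table to label the fault that a given patch is expected to produce when reading logs from the system under test.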

[0066] Stability and chaos testing of the storage system data path:

[0067] For testers of (centralized / distributed) storage systems who are not from storage server vendors, it is difficult to directly obtain various bad disks to construct the above disk failure scenarios, and there are no Linux kernel storage driver layer developers to perform professional fault injection development.

[0068] Therefore, the following technical problems exist: During long-term stability and chaos testing of the storage system data path, it is difficult to cover various underlying storage hardware failure scenarios and their impact on the entire storage system. In real production environments, after long-term hardware use, the probability of various hardware-related problems increases significantly. With different combinations of hardware and software, various issues may be exposed, severely impacting business I/O, often requiring lengthy fault localization and presenting significant troubleshooting challenges.

[0069] In view of the technical problems existing in the prior art, the inventors of this application conceived the following: Linux kernel hot patching technology can be used to dynamically construct various storage hardware failure scenarios. During testing, once a certain type of disk failure is constructed, the processing software's handling flow for that type of failure is executed and thereby tested. Without a real bad disk exhibiting the problem, and without such construction, these exception-handling code branches cannot be tested. Constructing the failure therefore makes it possible to determine the fault.

[0070] The technical solution of this application will now be described in detail through specific embodiments. It should be noted that the following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.

[0071] It is worth noting that the application areas of the disk failure testing methods, devices, electronic equipment, and storage media disclosed herein are not limited.

[0072] In addition to the application scenarios mentioned in the above technical background, the solution of this application can also be applied to other storage subsystems, such as NVMe devices that use the Peripheral Component Interconnect (PCI) local bus standard. In the same way, hot patching can be applied at the corresponding kernel I/O completion ("done") handling interface to dynamically construct various storage hardware failure scenarios.

[0073] In this application, the executing entity is an electronic device; specifically, the method is applied to a Linux operating system installed on the electronic device, which may be a server, computer, or other such device.

[0074] Figure 4 is a first flowchart of the disk failure testing method provided in an embodiment of this application. As shown in Figure 4, the disk failure testing method may include the following steps:

[0075] Step 41: During the execution of the test cases, run the target patch to construct the target fault that exists on the disk.

[0076] The target patch is determined based on a preset type of Small Computer System Interface (SCSI) fault.

[0077] In this step, within a specific test case, a target patch constructed from a type of SCSI fault can be injected and applied to the electronic device to construct a fault present in the disk of the electronic device.

[0078] The electronic device in question can be a storage server under test, etc.

[0079] Optionally, the SCSI faults include: read/write media errors, other disk errors, and return errors specified in the SCSI protocol.

[0080] In other words, the target fault can be one of the following: read / write media error, other disk error, return error specified in the SCSI protocol, etc.

[0081] It should be understood that not all possible failures are listed here.

[0082] Optionally, this test case can be a test case in the long-term stability test or the chaos test of the storage system.

[0083] Step 42: Determine if the target fault exists on the disk.

[0084] In this step, once the target fault has been constructed on the disk, the processing software's handling flow for the target fault is executed and thereby tested; that is, it can be detected whether the target fault exists on the disk.

[0085] In other words, without a real damaged disk exhibiting the target fault, and without constructing a scenario with the target fault, these exception-handling code branches cannot be tested.

[0086] The disk failure testing method provided in this application is applied to a Linux system. During the execution of test cases, a target patch is run to construct a target failure present on the disk. This target patch is determined based on a preset type of Small Computer System Interface (SCSI) failure. The method then determines whether the disk exhibits the target failure. In this technical solution, by constructing a disk failure, the processing software's handling procedure for that failure is executed and tested, thereby detecting the type of failure in the storage hardware.

[0087] Figure 5 is a second flowchart of the disk failure testing method provided in an embodiment of this application. As shown in Figure 5, prior to step 41, the disk failure testing method may include the following steps:

[0088] Step 51: When responding to the IO operation callback function, modify the response result corresponding to the IO operation to the target fault.

[0089] In this step, when responding to the IO operation callback function, the response result corresponding to the IO operation, i.e. the underlying response result, is modified to the target fault.

[0090] The target fault can be one of the following: read / write media error, other disk error, or return error specified in the SCSI protocol.

[0091] Optionally, the processing function can be the sd_done function of the processing interface.

[0092] That is, at the entry point of the processing function, the underlying response results are constructed as different error types.

[0093] Furthermore, after modifying the response result to the target fault, a mechanism for exiting the modification is added.

[0094] The mechanism for exiting modifications is designed to avoid affecting the normal processing flow of software on the operating system.

[0095] That is, at the exit of the processing function, the module_exit function of the patch module is executed (for example, the exit mechanism of the modification can be: using the import count method, setting the import count to 0, and the module exits automatically), so that the patch module exits and the software processing flow of the operating system is restored.

[0096] Step 52: Compile the Linux kernel module to obtain the target patch.

[0097] In this step, the Linux kernel module contains multiple functions, including functions for module initialization and module exit. After modifying the response result to the target fault, the Linux kernel module is compiled to obtain a static target patch.

[0098] In one possible implementation, the target patch could be:

[0099] Patch 1: sshdr.sense_key = MEDIUM_ERROR (Read / write media error);

[0100] Patch 2: sshdr.sense_key = HARDWARE_ERROR (Other disk errors);

[0101] Patch 3: sshdr.sense_key = RECOVERED_ERROR (the return error specified in the SCSI protocol).

[0102] It should be understood that this does not exhaustively list all possible patches for all possible faults.
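The three example patches share one pattern: at the entry of the completion handler, the sense key reported by the lower layer is overwritten before the normal handling logic sees it. The user-space sketch below models that pattern only. `struct sense_hdr` is a simplified stand-in for the kernel's parsed sense header, and `struct inject_conf`, its fields, and `complete_io` are hypothetical names that do not come from the patent or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sense key values used by the three example patches. */
#define RECOVERED_ERROR 0x01
#define MEDIUM_ERROR    0x03
#define HARDWARE_ERROR  0x04

/* Simplified stand-in for the parsed sense header (sshdr). */
struct sense_hdr {
    unsigned char sense_key;
};

/* Hypothetical injection settings chosen by the tester. */
struct inject_conf {
    bool          enabled;     /* whether the patch is active */
    unsigned char forced_key;  /* sense key to report instead of the real one */
};

/* Model of a completion handler: returns the sense key that the
 * upper-layer error handling would act on. */
static unsigned char complete_io(struct sense_hdr sshdr,
                                 const struct inject_conf *conf)
{
    assert(conf != NULL);

    /* Patch body: overwrite the real result at function entry. */
    if (conf->enabled)
        sshdr.sense_key = conf->forced_key;

    /* ...the unmodified handling logic would continue from here... */
    return sshdr.sense_key;
}
```

With `enabled` set and `forced_key = MEDIUM_ERROR`, a request that actually succeeded is reported upward as a media error, which is what drives the exception-handling branches under test. With `enabled` cleared, the real result passes through unchanged.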

[0103] The disk failure testing method provided in this application modifies the response result of the IO operation to the target failure when responding to the IO operation callback handling function, and then compiles the Linux kernel module to obtain the target patch. In this technical solution, by modifying the code in a suitable Linux data channel IO callback handling function to modify the IO response result to the target failure and adding code related to the automatic exit handling mechanism, compilation is performed to generate the corresponding patch, without affecting subsequent normal disk operations.

[0104] Figure 6 is a third flowchart of the disk failure testing method provided in an embodiment of this application. As shown in Figure 6, running the target patch may include the following steps:

[0105] Step 61: Obtain the preset value set by the user.

[0106] In this step, in order to construct a random time interval to trigger a random patch call, a combination of a random number and a preset value can be used, i.e., the user-defined preset value can be obtained first.

[0107] In one possible implementation, the preset value could be 500, 5000, 50000, etc.

[0108] Step 62: Determine the ratio of the random number to the preset value based on the random number and the preset value.

[0109] In this step, the decision to run the target patch is made based on the ratio between the random number and the preset value.

[0110] In one possible implementation, the condition `if (rand() % conf->inject_failure == 0)` can be used;

[0111] Here, rand() yields the random number and conf->inject_failure is the preset value. Whether the random number is an integer multiple of the preset value (i.e., whether the remainder is 0) controls whether the target patch is injected.

[0112] Example 1: The random number is 1000, and the preset value is 500.

[0113] Step 63: If the ratio is an integer, run the target patch.

[0114] Following the example above, the random number is 1000, and the preset value is 500. The ratio of 1000 to 500 is 2, that is, 1000 divided by 500 has a remainder of 0. This triggers the random patch call, that is, the target patch is run.

[0115] In addition, if the ratio is not an integer, the target patch will not be run.

[0116] If the ratio is not an integer, it means the remainder is not 0, and the random patch call will not be triggered, meaning the target patch will not be run.

[0117] The disk failure testing method provided in this application obtains a user-defined preset value, determines the ratio of the random number to the preset value based on the random number and the preset value, and runs the target patch if the ratio is an integer. This technical solution controls the frequency of triggering the target patch by using preset values, thereby controlling the specific scenarios in which disk failures are constructed.
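The divisibility test in steps 61 to 63 can be sketched in user-space C as follows. The helper names `should_inject` and `count_injections` are hypothetical. The sketch assumes a positive preset value, and it relies on the assumption that `rand()` is close to uniform, in which case a preset of N triggers roughly once in every N draws.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Return true when the random value is an integer multiple of the
 * preset value, i.e. when the remainder of the division is 0. */
static bool should_inject(unsigned int random_value, unsigned int preset)
{
    assert(preset > 0);  /* a preset of 0 would divide by zero */
    return random_value % preset == 0;
}

/* Count how many of `trials` draws from rand() would trigger injection.
 * The seed is passed in so that a run can be repeated. */
static unsigned int count_injections(unsigned int seed,
                                     unsigned int trials,
                                     unsigned int preset)
{
    unsigned int hits = 0;

    srand(seed);
    for (unsigned int i = 0; i < trials; i++) {
        if (should_inject((unsigned int)rand(), preset))
            hits++;
    }
    return hits;
}
```

For the worked example, `should_inject(1000, 500)` is true because 1000 divided by 500 leaves no remainder, while `should_inject(1001, 500)` is false. Raising the preset from 500 to 5000 or 50000 lowers the expected injection frequency accordingly.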

[0118] Figure 7 is a fourth flowchart of the disk failure testing method provided in an embodiment of this application. As shown in Figure 7, the disk failure testing method may include the following steps:

[0119] Step 1: Implement hot patches that construct various SCSI faults in the sd_done function, the bottom-half request response handling interface of the Linux SCSI subsystem.

[0120] Specifically:

[0121] Firstly, at the function entry point, construct different error types for the underlying response. For example:

[0122] Patch 1: sshdr.sense_key = MEDIUM_ERROR (i.e., read / write media error);

[0123] Patch 2: sshdr.sense_key = HARDWARE_ERROR (Other disk errors);

[0124] Patch 3: sshdr.sense_key = RECOVERED_ERROR (the return error specified in the SCSI protocol).

[0125] Secondly, at the function exit point, the module_exit function of the patch module is executed (for example, using the import counting method, the import count is set to 0, and the module exits automatically (i.e., the exit modification processing mechanism)), causing the patch module to exit and restoring the system software processing flow.

[0126] Step 2: During the long-term stability and chaos testing of the storage system: construct random time intervals and trigger random patch calls to construct various software and hardware combination tests.

[0127] Specifically:

[0128] Firstly, the following logic is used: based on the test configuration parameter inject_failure (which testers can set to 500, 5000, 50000, etc. to vary the probability of fault injection), the fault injection condition is randomly triggered to be true.

[0129] if (rand() % conf->inject_failure == 0)

[0130] Secondly: If the above conditions are met, in the same way, randomly select a SCSI fault-constructing patch (e.g., construct MEDIUM_ERROR), and apply it to one or more storage servers under test according to the specific test case to construct the corresponding hardware error of the disk.
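Step 2 combines two random decisions: whether to inject at all, and which fault-constructing patch to use. A minimal sketch of that combination is shown below. The function `pick_patch`, the `patch_keys` table, and the use of a second random draw taken modulo the number of patches are assumptions for illustration; the patent does not specify how the patch is selected. The two draws are passed in as parameters so that the logic can be exercised deterministically.

```c
#include <assert.h>
#include <stddef.h>

/* Sense keys constructed by the three example patches. */
#define RECOVERED_ERROR 0x01
#define MEDIUM_ERROR    0x03
#define HARDWARE_ERROR  0x04

/* Hypothetical table of available fault-constructing patches. */
static const unsigned char patch_keys[] = {
    MEDIUM_ERROR, HARDWARE_ERROR, RECOVERED_ERROR
};
#define PATCH_COUNT (sizeof patch_keys / sizeof patch_keys[0])

/* Decide whether to inject and, if so, which sense key to construct.
 * gate_draw and pick_draw would normally both come from rand().
 * Returns the chosen sense key, or -1 when no fault is injected. */
static int pick_patch(unsigned int gate_draw,
                      unsigned int pick_draw,
                      unsigned int inject_failure)
{
    assert(inject_failure > 0);

    /* Same gate as the condition above: inject only on remainder 0. */
    if (gate_draw % inject_failure != 0)
        return -1;

    /* Choose one of the available patches. */
    return patch_keys[pick_draw % PATCH_COUNT];
}
```

In a test harness, the returned key would determine which precompiled patch is applied to the storage server or servers under test for the current test case.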

[0131] The disk failure testing methods provided in the above embodiments are similar in principle and technical effect, and will not be described again here.

[0132] The following are embodiments of the apparatus described in this application, which can be used to execute the embodiments of the method described in this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the method described in this application.

[0133] Figure 8 is a schematic structural diagram of an embodiment of the disk failure testing device provided in this application.

[0134] As shown in Figure 8, the disk failure testing device includes:

[0135] The construction module 81 is used to run a target patch during the execution of test cases to construct a target fault existing on the disk. The target patch is determined based on a preset type of Small Computer System Interface (SCSI) fault.

[0136] The determination module 82 is used to determine whether the target fault exists on the disk.

[0137] In one possible design of this application embodiment, before running the target patch to construct the target fault present on the disk, the apparatus further includes:

[0138] The modification module 83 is used to modify the response result corresponding to the input/output (I/O) operation to the target fault in the callback handling function that responds to the I/O operation;

[0139] The determination module 82 is also used to perform the compilation of the Linux kernel module to obtain the target patch.

[0140] In this possible design, the device also includes the following before performing the compilation of the Linux kernel module to obtain the target patch:

[0141] The processing module is used to add a mechanism to exit the modification after the response result is modified to the target fault.

[0142] In another possible design of the embodiments of this application, the SCSI faults include: read/write media errors, other disk errors, and return errors specified in the SCSI protocol.

[0143] In another possible design of this application embodiment, the construction module runs the target patch, specifically for:

[0144] Get the user-defined preset values;

[0145] Determine the ratio of the random number to the preset value based on the random number and the preset value;

[0146] If the ratio is an integer, run the target patch.

[0147] In this possible design, the construction module is also specifically used for:

[0148] If the ratio is not an integer, the target patch will not be run.

[0149] The disk failure testing apparatus provided in this application embodiment can be used to execute the disk failure testing method in any of the above embodiments. Its implementation principle and technical effect are similar, and will not be described again here.

[0150] It should be noted that the division of the various modules in the above device is merely a logical functional division. In actual implementation, they can be fully or partially integrated into a single physical entity, or they can be physically separated. Furthermore, these modules can be implemented entirely in software via processing element calls; they can be fully implemented in hardware; or some modules can be implemented by processing element calls to software, while others are implemented in hardware. Additionally, these modules can be fully or partially integrated together, or implemented independently. The processing element mentioned here can be an integrated circuit with signal processing capabilities. In the implementation process, each step of the above method or each of the above modules can be completed through the integrated logic circuits in the hardware of the processor element or through software instructions.

[0151] Figure 9 is a schematic structural diagram of the electronic device provided in an embodiment of this application. As shown in Figure 9, the electronic device may include: a processor 91, a memory 92, and computer program instructions stored in the memory 92 and executable on the processor 91. When the processor 91 executes the computer program instructions, it implements the disk failure testing method provided in any of the foregoing embodiments.

[0152] Optionally, the various components of the electronic device can be connected via a system bus.

[0153] The memory 92 can be a separate memory unit or a memory unit integrated into the processor 91. The number of processors 91 can be one or more.

[0154] It should be understood that the processor 91 can be a Central Processing Unit (CPU) or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this application can be executed directly by a hardware processor, or by a combination of hardware and software modules within the processor 91.

[0155] The system bus can be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc. The system bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in the figure, but this does not indicate that there is only one bus or one type of bus. The memory 92 may include random access memory (RAM), and may also include non-volatile memory (NVM), such as at least one disk storage device.

[0156] All or part of the steps of the above method embodiments can be implemented by hardware controlled by program instructions. The program can be stored in a readable memory. When the program is executed, it performs the steps of the above method embodiments. The memory (storage medium) includes: read-only memory (ROM), RAM, flash memory, hard disk, solid-state drive, magnetic tape, floppy disk, optical disk, and any combination thereof.

[0157] The electronic device provided in the embodiments of this application can be used to execute the disk failure testing method provided in any of the above method embodiments. Its implementation principle and technical effect are similar and are not described again here.

[0158] This application provides a computer-readable storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the aforementioned disk failure testing method.

[0159] The aforementioned computer-readable storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory, electrically erasable programmable read-only memory, erasable programmable read-only memory, programmable read-only memory, read-only memory, magnetic storage, flash memory, magnetic disk, or optical disk. The readable storage medium can be any available medium accessible to a general-purpose or special-purpose computer.

[0160] Optionally, a readable storage medium can be coupled to a processor, enabling the processor to read information from and write information to the readable storage medium. Alternatively, the readable storage medium can be an integral part of the processor. Both the processor and the readable storage medium can reside in an Application Specific Integrated Circuit (ASIC). Alternatively, the processor and the readable storage medium can exist as discrete components within the device.

[0161] This application also provides a computer program product, which includes a computer program stored in a computer-readable storage medium. At least one processor can read the computer program from the computer-readable storage medium, and when the at least one processor executes the computer program, it can implement the above-mentioned disk failure testing method.

[0162] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.

Claims

1. A method for testing disk failure, characterized in that the method is applied to a Linux system and comprises: implementing a hot patch for each type of SCSI fault in the sd_done function, which is the bottom-half request-response processing interface of the SCSI subsystem in the Linux system; compiling a Linux kernel module to obtain a target patch; during execution of a test case, injecting and running the target patch of a selected type of SCSI fault to construct a target fault on the disk, wherein the target patch is determined based on a preset type of Small Computer System Interface (SCSI) fault; and determining whether the target fault exists on the disk.

2. The method according to claim 1, characterized in that, before the target patch is injected and run to construct the target fault on the disk, the method further comprises: when the input/output (I/O) operation callback handling function responds, modifying the response result corresponding to the I/O operation to the target fault.

3. The method according to claim 2, characterized in that, before the Linux kernel module is compiled to obtain the target patch, the method further comprises: after the response result is modified to the target fault, adding a mechanism for exiting the modification.

4. The method according to any one of claims 1-3, characterized in that the SCSI faults include: read/write media errors, other disk errors, and return errors specified in the SCSI protocol.

5. The method according to any one of claims 1-3, characterized in that injecting and running the target patch of a selected type of SCSI fault comprises: obtaining a user-defined preset value; determining, based on a random number and the preset value, the ratio of the random number to the preset value; and, if the ratio is an integer, running the target patch.

6. The method according to claim 5, characterized in that the method further comprises: if the ratio is not an integer, not running the target patch.

7. A disk failure testing device, characterized in that the device is applied to a Linux system and comprises: a module configured to implement a hot patch for each type of SCSI fault in the sd_done function, which is the bottom-half request-response processing interface of the SCSI subsystem in the Linux system, to compile a Linux kernel module to obtain a target patch, and, during execution of a test case, to inject and run the target patch of a selected type of SCSI fault to construct a target fault on the disk, wherein the target patch is determined based on a preset type of Small Computer System Interface (SCSI) fault; and a determination module configured to determine whether the target fault exists on the disk.

8. The device according to claim 7, characterized in that, for use before the target patch of the selected type of SCSI fault is injected and run to construct the target fault on the disk, the device further comprises: a modification module configured to modify the response result corresponding to an input/output (I/O) operation to the target fault when the I/O operation callback handling function responds.

9. An electronic device, characterized in that it comprises: a processor, and a memory communicatively connected to the processor; wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the disk failure testing method as described in any one of claims 1 to 6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the disk failure testing method as described in any one of claims 1 to 6.
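
As a rough illustration of claims 1 to 3, the following Python sketch models, in user space, a completion handler whose response result is overwritten with a target fault and which has an exit mechanism that restores normal behavior. It is not kernel code and does not reproduce `sd_done` or any kernel data structure. The names `Completion`, `FaultPatch`, and `max_injections` are hypothetical, and the count-based exit is only one assumed form that the exit mechanism could take. The sense-key values are the standard SCSI codes for MEDIUM ERROR (0x3) and HARDWARE ERROR (0x4).

```python
from dataclasses import dataclass

# Standard SCSI sense keys, used here as the modelled "response result".
NO_SENSE = 0x0
MEDIUM_ERROR = 0x3
HARDWARE_ERROR = 0x4


@dataclass
class Completion:
    """Simplified stand-in for the result of one completed I/O request."""
    sense_key: int
    good_bytes: int


def original_done(requested_bytes: int) -> Completion:
    """Unpatched handler in this model: every request succeeds."""
    return Completion(sense_key=NO_SENSE, good_bytes=requested_bytes)


class FaultPatch:
    """Wraps the handler and rewrites its result to a chosen fault."""

    def __init__(self, fault: int, max_injections: int) -> None:
        self.fault = fault
        self.remaining = max_injections  # assumed exit condition
        self.enabled = True

    def disable(self) -> None:
        """Explicit exit: stop modifying results immediately."""
        self.enabled = False

    def done(self, requested_bytes: int) -> Completion:
        result = original_done(requested_bytes)
        if self.enabled and self.remaining > 0:
            self.remaining -= 1
            # Modify the response result to the target fault.
            return Completion(sense_key=self.fault, good_bytes=0)
        return result


patch = FaultPatch(fault=MEDIUM_ERROR, max_injections=2)
results = [patch.done(4096).sense_key for _ in range(3)]
```

In this model the third request completes normally because the injection budget is exhausted. A test case would then check whether the software under test reacted to the two injected medium errors.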