An input/output timeout monitoring method, apparatus, device, and readable storage medium
By monitoring the lifecycle of input/output control pages within the controller, the problem of untimely and inaccurate IO timeout monitoring is solved, enabling earlier and more accurate IO timeout detection and improving system stability and fault handling efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- SHANDONG YUNHAI GUOCHUANG CLOUD COMPUTING EQUIP IND INNOVATION CENT CO LTD
- Filing Date
- 2026-03-20
- Publication Date
- 2026-06-19

Figure CN122240375A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer technology, and in particular to an input/output timeout monitoring method, apparatus, device, and readable storage medium.
Background Art
[0002] In recent years, the rapid development of artificial intelligence technology has placed higher demands on data storage systems. In storage systems, as controllers are deployed at ever larger scale and the throughput of RAID (Redundant Array of Independent Disks) cards keeps increasing, operation and maintenance become increasingly challenging. One of the most typical problems is IO timeout during hardware-level IO (input/output) execution. If IO timeouts are not detected promptly, they can directly cause business IO blocking, increased system response latency, upper-layer service anomalies, and even reduced storage-link stability.
[0003] Therefore, achieving timely and accurate monitoring of IO timeouts is a technical problem that urgently needs to be solved.
Summary of the Invention
[0004] In view of this, the purpose of the present invention is to provide an input/output timeout monitoring method, apparatus, device, and readable storage medium that solve the problems of inaccurate and untimely IO timeout monitoring in the prior art.
[0005] To solve the above technical problems, the present invention provides an input/output timeout monitoring method, comprising: obtaining the existence duration of a target control page, where the duration is the time between the request and the release of the control page and the target control page is the control page corresponding to the input/output; determining the maximum latency of a single input/output based on the read/write operation frequency; and comparing the duration with the maximum latency to determine the input/output timeout status.
[0006] In one aspect, obtaining the existence duration of the target control page includes: when the input/output is received, requesting the target control page for the input/output; when the input/output has been processed and the processing result has been returned to the upper layer, releasing the target control page; and taking the time between the request and the release as the existence duration of the target control page.
[0007] In one aspect, after the target control page is requested for the input/output upon its receipt, the method further includes: starting monitoring and monitoring the target control page at a preset monitoring frequency, where the preset monitoring frequency is determined based on the maximum latency and the monitoring period. Accordingly, taking the time between the request and the release as the existence duration includes: once the target control page is detected to have been released, determining its existence duration based on the number of monitoring rounds and the monitoring period.
[0008] In one aspect, after the target control page is requested for the input/output upon its receipt, the method further includes: setting the resource usage status field in the target control page status table to 1, where the status table includes the resource usage status field, the timeout flag field, and the duration field of the target control page. Accordingly, after the input/output has been processed, the processing result has been returned to the upper layer, and the target control page has been released, the method further includes: setting the resource usage status field to 0. Accordingly, taking the time between the request and the release as the existence duration includes: reading the target control page status table, determining the existence duration from the changes in the resource usage status field, and writing it to the duration field. Accordingly, after comparing the duration with the maximum latency to determine the input/output timeout status, the method further includes: writing the input/output timeout status to the timeout flag field of the target control page status table.
[0009] In one aspect, comparing the duration with the maximum latency to determine the input/output timeout status includes: if the duration is less than the maximum latency, determining that the input/output has not timed out; and if the duration is greater than the maximum latency, determining that the input/output has timed out.
[0010] In one aspect, after determining that the input/output has timed out because the duration exceeds the maximum latency, the method further includes: closing the message receiving channel, saving the target control page of the input/output, and sending it to the serial port.
[0011] In one aspect, determining that the input/output has timed out if the duration is greater than the maximum latency includes: calculating the ratio of the duration to the maximum latency; if the ratio is greater than a first threshold, determining a Level 1 input/output timeout; if the ratio is greater than a second threshold, determining a Level 2 timeout; and if the ratio is greater than a third threshold, determining a Level 3 timeout. The first threshold is less than the second threshold, which is less than the third threshold; correspondingly, a Level 1 timeout is less severe than a Level 2 timeout, which is less severe than a Level 3 timeout.
[0012] The present invention also provides an input/output timeout monitoring apparatus, comprising: a duration acquisition module, configured to acquire the existence duration of the target control page, where the duration is the time between the request and the release of the control page and the target control page is the control page corresponding to the input/output; a maximum latency determination module, configured to determine the maximum latency of a single input/output based on the read/write operation frequency; and a comparison module, configured to compare the duration with the maximum latency to determine the input/output timeout status.
[0013] The present invention also provides an input/output timeout monitoring device, comprising: a memory, configured to store a computer program; and a processor, configured to implement the input/output timeout monitoring method described above when executing the computer program.
[0014] The present invention also provides a computer-readable storage medium storing computer-executable instructions, which, when loaded and executed by a processor, implement the input / output timeout monitoring method described above.
[0015] The present invention also provides a computer program product, comprising computer programs/instructions that, when executed by a processor, implement the steps of the input/output timeout monitoring method described above.
[0016] As can be seen from the above technical solution, the present invention, applied to a controller, obtains the existence duration of the target control page (the time between the request and the release of the control page corresponding to the input/output), determines the maximum latency of a single input/output based on the read/write operation frequency, and compares the duration with the maximum latency to determine the input/output timeout status. The beneficial effect is that, by monitoring within the controller the entire lifecycle of the control page corresponding to the input/output, from request to release, the method perceives the execution status and timeout trend of the input/output earlier and more directly than traditional host-side timeout detection. Because timing starts as soon as the I/O enters the controller and its control page is allocated, the delays introduced by communication and driver interaction between the host and the controller are excluded. Abnormal conditions inside the controller, such as delayed or slow I/O execution, can be identified in real time, so timeout risks are detected in advance and on-site information is captured promptly. This improves the timeliness and accuracy of I/O timeout detection, reduces false positives and false negatives, and improves the operational stability and fault-handling efficiency of the disk array system. It also avoids the situation in which the host, on sensing an I/O timeout, resets the controller, destroying the fault scene and making the timeout difficult to locate.
[0017] In addition, the present invention also provides an input/output timeout monitoring apparatus, device, and readable storage medium, which have the same beneficial effects.
Brief Description of the Drawings
[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0019] Figure 1 is a flowchart of an input/output timeout monitoring method provided in an embodiment of the present invention; Figure 2 is an example diagram of a CP status table provided in an embodiment of the present invention; Figure 3 is a flowchart of a CP (control page) request process provided in an embodiment of the present invention; Figure 4 is a flowchart of a CP release process provided in an embodiment of the present invention; Figure 5 is an example diagram of a timeout determination method provided in an embodiment of the present invention; Figure 6 is an architecture diagram provided in an embodiment of the present invention; Figure 7 is a schematic diagram of a CP monitoring control register provided in an embodiment of the present invention; Figure 8 is a schematic diagram of a maximum latency register provided in an embodiment of the present invention; Figure 9 is a schematic diagram of a CP occupancy status update frequency register provided in an embodiment of the present invention; Figure 10 is a flowchart of a timeout monitoring process provided in an embodiment of the present invention; Figure 11 is a schematic diagram of an input/output timeout monitoring apparatus provided in an embodiment of the present invention; Figure 12 is a schematic diagram of an input/output timeout monitoring device provided in an embodiment of the present invention.
Detailed Description of the Embodiments
[0020] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0021] First, some terms used in this invention are explained. RAID (Redundant Array of Independent Disks), often simply called a disk array, is a disk subsystem composed of multiple independent high-performance disk drives that provides higher storage performance and data redundancy than a single disk.
[0022] RAID card: short for Redundant Array of Independent Disks card, a controller card that combines multiple hard drives into a single logical drive to improve storage capacity, performance, reliability, and manageability. It combines the drives according to different RAID levels, providing higher storage performance and data redundancy than a single hard drive.
[0023] NVMe RAID card: refers to a card that combines multiple NVMe solid-state drives (SSDs) into a RAID array.
[0024] CP (Control Page): A data block used to describe IO attributes and operations.
[0025] CP Index (IO Control Page Index): Used to index CP.
[0026] CP Pool (CP Memory Pool): holds a fixed number of fixed-size CPs.
[0027] HWE (Hardware Event): an event that hardware reports to software when an anomaly occurs while it is performing I/O.
[0028] IO timeout: a complete IO request path involves multiple hardware processing stages, such as host driver queue → RAID card DMA (Direct Memory Access) engine → IO scheduler → XOR (Exclusive OR) accelerator → physical disk dispatch. Because the IO processing chain is long, the scenarios are diverse, and logical processing issues may arise, IO is prone to timing out.
[0029] In recent years, the rapid development of artificial intelligence technology has placed higher demands on data storage systems, especially in data security, storage management, and IO performance. As controller deployments grow and RAID card throughput keeps increasing, operation and maintenance become increasingly challenging. One of the most typical problems is IO timeout during hardware-level IO execution. If IO timeouts are not detected promptly, they can directly cause business IO blocking, increased system response latency, upper-layer service anomalies, and even reduced storage-link stability. In particular, the RAID controller (Redundant Array of Independent Disks Controller), as the core hardware for disk array management, plays an increasingly crucial role by providing data redundancy and fault tolerance, improving IO throughput, simplifying storage management, and expanding storage capacity. Achieving timely and accurate monitoring of IO timeouts is therefore a pressing technical problem.
[0030] To address these issues, the present invention provides an input/output timeout monitoring method that detects IO timeouts or timeout trends as early as possible, thereby preventing the host from detecting an IO timeout and resetting the NVMe RAID card, which would destroy the fault scene and make the timeout difficult to locate.
[0031] Please refer to Figure 1, which is a flowchart of an input/output timeout monitoring method provided in an embodiment of the present invention. The method is applied to a controller and may specifically include:
S101: Obtain the existence duration of the target control page, where the duration is the time between the request and the release of the control page and the target control page is the control page corresponding to the input/output.
[0032] In this embodiment, the execution entity is the controller, whose type is not limited by the present invention. For example, it may be a RAID card/RAID controller, an NVMe RAID card, or another controller. That is, the method can also be used for IO timeout monitoring in non-NVMe RAID cards and, of course, for packet timeout monitoring in network devices.
[0033] Further, obtaining the existence duration of the target control page may specifically include: when an input/output is received, requesting a target control page for it; when the input/output has been processed and the processing result has been returned to the upper layer, releasing the target control page; and taking the time between the request and the release as the existence duration of the target control page.
[0034] It should be noted that when an I/O enters the controller, a control page is requested and allocated to that I/O. When the controller finishes processing the I/O and returns the result to the upper layer (driver, host, or operating system), the control page of that I/O is released. The time between the request and the release is the existence duration of the control page, and it accurately reflects the actual processing time of the input/output within the controller. Because it excludes the interaction latency between the host and the controller, transmission latency, and upper-layer processing latency, it avoids timing errors introduced by external factors. Whether the input/output has timed out or is trending toward a timeout can therefore be determined more realistically and accurately, improving the reliability of timeout detection.
[0035] Further, after the target control page is requested for a received input/output, the method may further include: starting monitoring and monitoring the target control page at a preset monitoring frequency, where the preset monitoring frequency is determined based on the maximum latency and the monitoring period. Correspondingly, taking the time between the request and the release as the existence duration may specifically include: after detecting that the target control page has been released, determining its existence duration based on the number of monitoring rounds and the monitoring period.
[0036] It should be noted that in this implementation the duration is not obtained from a hardware timer; instead, the existence duration of the target control page is calculated from the number of monitoring rounds and the monitoring period. This removes the need to allocate a separate timing resource for each input/output and reduces system overhead. It also avoids having to handle timestamp overflow, clock drift, and timing conflicts among many concurrent I/Os. Because the timing logic is folded into the timeout monitoring logic, timing and monitoring stay synchronized, which improves the stability and reliability of I/O timeout judgments and makes a lightweight implementation in the controller easy.
[0037] Further, after a target control page is requested for a received input/output, the method may further include: setting the resource usage status field in the target control page status table to 1, where the status table includes the resource usage status field, the timeout flag field, and the duration field of the target control page. Correspondingly, after the input/output has been processed, the processing result has been returned to the upper layer, and the target control page has been released, the method may further include: setting the resource usage status field to 0. Correspondingly, taking the time between the request and the release as the existence duration may specifically include: reading the target control page status table, determining the existence duration from the changes in the resource usage status field, and writing it to the duration field. Correspondingly, after comparing the duration with the maximum latency to determine the input/output timeout status, the method may further include: writing the input/output timeout status to the timeout flag field of the target control page status table.
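How precise is a duration built from monitoring rounds? As a rough check (our own argument, not taken from the patent), assume the monitor polls at evenly spaced instants $t_j = j\,\Delta t$, where $\Delta t$ is the monitoring period (time_slice), and that the CP is seen in use at $k \ge 1$ consecutive polls, the first at $t_f$ and the last at $t_\ell$, between its request time $t_{\mathrm{req}}$ and release time $t_{\mathrm{rel}}$. The neighbouring polls just before $t_f$ and just after $t_\ell$ fall outside the CP's lifetime, which gives:

```latex
\begin{gathered}
t_{\mathrm{req}} \le t_f, \qquad t_\ell < t_{\mathrm{rel}}, \qquad t_\ell - t_f = (k-1)\,\Delta t, \\
t_f - \Delta t < t_{\mathrm{req}}, \qquad t_{\mathrm{rel}} \le t_\ell + \Delta t, \\
(k-1)\,\Delta t \;<\; d = t_{\mathrm{rel}} - t_{\mathrm{req}} \;<\; (k+1)\,\Delta t
\quad\Longrightarrow\quad \lvert d - k\,\Delta t \rvert < \Delta t .
\end{gathered}
```

So under these assumptions the accumulated value $k\,\Delta t$ stays within one monitoring period of the true lifetime, and a CP released before any poll ($k = 0$) lived less than $\Delta t$. A higher monitoring frequency shrinks $\Delta t$ and tightens the bound, at the cost of more frequent polling.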
[0038] For a better understanding, refer to Figure 2, Figure 3, and Figure 4. Figure 2 is an example diagram of a CP status table provided in an embodiment of the present invention. The table contains CP_NUM entries (CP_NUM being the maximum number of CPs the system can allocate), one per CP in the CP Pool, and mainly records the status of allocated CPs. Its fields are defined as follows: (1) S is the CP resource usage status, i.e., the resource usage status field: 0 means idle and 1 means in use; (2) T is the timeout flag: 00 means no timeout, 01 suspected timeout, 10 highly suspected timeout, and 11 timeout; (3) io_latency is the existence duration (from the request to the release of the CP), which increases in increments of time_slice (the time slice, i.e., the monitoring period). Figure 3 is a flowchart of a CP (control page) request process provided in an embodiment of the present invention: when a CP is requested, the CP Index of the requested CP is assigned to its IO, the CP_STATUS_TABLE entry is located by that CP Index, and the S field (resource usage status field) of the entry is set to 1. Figure 4 is a flowchart of a CP release process provided in an embodiment of the present invention: the CP Index is used to locate the corresponding CP_STATUS_TABLE entry, the entry is cleared, and its S field (resource usage status field) is set to 0.
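A minimal Python sketch of the status table and the request/release flows of Figures 3 and 4 follows. The class and method names (CpStatusEntry, CpManager, request_cp, release_cp) and the FIFO free-list allocation policy are hypothetical; in the patent these steps are performed by RAID card hardware, not by software like this.

```python
from dataclasses import dataclass, replace

# Timeout flag T encodings (Figure 2).
T_NONE, T_SUSPECTED, T_HIGHLY_SUSPECTED, T_TIMEOUT = 0b00, 0b01, 0b10, 0b11


@dataclass
class CpStatusEntry:
    s: int = 0           # resource usage status: 0 = idle, 1 = in use
    t: int = T_NONE      # timeout flag
    io_latency: int = 0  # existence duration, grows by time_slice per monitoring round


class CpManager:
    """Hypothetical model of CP request/release over a pool of cp_num CPs."""

    def __init__(self, cp_num: int) -> None:
        # One CP_STATUS_TABLE entry per CP, addressed by CP Index.
        self.cp_status_table = [CpStatusEntry() for _ in range(cp_num)]
        # Free CP Indexes; a FIFO free list is an assumed allocation policy.
        self._free = list(range(cp_num))

    def request_cp(self) -> int:
        """Figure 3: hand a free CP Index to the IO and set S = 1."""
        if not self._free:
            raise RuntimeError("CP Pool exhausted")
        cp_index = self._free.pop(0)
        self.cp_status_table[cp_index].s = 1
        return cp_index

    def release_cp(self, cp_index: int) -> CpStatusEntry:
        """Figure 4: clear the entry (so S = 0) and return the CP Index to the pool.

        Returns a copy of the entry taken just before clearing, so the caller
        can still read the final io_latency and T.
        """
        snapshot = replace(self.cp_status_table[cp_index])
        self.cp_status_table[cp_index] = CpStatusEntry()
        self._free.append(cp_index)
        return snapshot
```

Returning a snapshot from release_cp is an illustrative choice so the caller can still inspect the final io_latency; the patent only states that the entry is cleared and S returns to 0.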
S102: Determine the maximum latency of a single input/output based on the read/write operation frequency.
[0040] It should be noted that IOPS (Input/Output Operations Per Second) is a real-time business metric, so a maximum latency calculated from it adapts to the system load. This embodiment estimates the maximum latency of a single input/output from real-time IOPS, so that the maximum latency changes dynamically with load. Compared with a fixed maximum latency, this better reflects the controller's actual processing capacity at different load levels, effectively avoiding false or missed timeout reports caused by load changes and improving the accuracy and adaptability of timeout detection.
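The patent leaves the exact IOPS-to-latency mapping to the software developer. Purely as an illustration, the sketch below swaps in Little's law, under which the mean time an IO spends in a system equals the number of outstanding IOs divided by throughput (W = L / λ), and scales the result by a safety margin. The function name, queue_depth, margin, and the microsecond unit are all hypothetical assumptions, not details from the source.

```python
def estimate_max_io_latency_us(iops: int, queue_depth: int, margin: int = 4) -> int:
    """Hypothetical estimate of max_io_latency (microseconds) from real-time IOPS.

    Little's law gives mean latency W = L / lambda, where L is the number of
    outstanding IOs (queue_depth) and lambda is the throughput (iops).
    The margin widens that mean into an upper bound for a *normal* IO.
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    # Integer arithmetic in microseconds, as a register value would be stored.
    return margin * queue_depth * 1_000_000 // iops
```

Software would recompute this value as IOPS changes and write it to MAX_IO_LATENCY; at a fixed queue depth, falling IOPS yields a larger allowance per IO.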
S103: Compare the duration with the maximum latency to determine the input/output timeout status.
[0042] In this embodiment, when the duration is less than the maximum latency, the input/output can be directly determined to be in a non-timeout state; when the duration exceeds the maximum latency, it can be directly determined to be in a timeout state. Alternatively, the range above the maximum latency can be divided more finely, which this embodiment does not specifically limit; for example, a range threshold mode, a dynamic multiplier grading mode, a trend determination mode, or a multi-level threshold mode may be used.
[0043] Further, after determining that the input/output has timed out because the duration exceeds the maximum latency, the method may further include: closing the message receiving channel, saving the target control page of the input/output, and sending it to the serial port.
[0044] When an I/O is determined to be in a timeout state (i.e., the software receives an HWE event confirming the I/O timeout), closing the message receiving channel prevents new input/output requests from flowing in, so the system load does not keep rising and the fault does not spread. Saving the target control page of the timed-out I/O fully preserves the controller's state at the time of the timeout, providing reliable data for subsequent problem localization and fault analysis. Outputting the control page through the serial port still reports the fault reliably even when the system is abnormal and cannot report through normal business channels, helping debugging personnel quickly obtain key information and locate the root cause. Overall, this improves the fault safety and maintainability of the system.
[0045] Further, determining that the input/output has timed out if the duration exceeds the maximum latency may specifically include: calculating the ratio of the duration to the maximum latency; if the ratio is greater than a first threshold, determining a Level 1 input/output timeout; if the ratio is greater than a second threshold, determining a Level 2 timeout; and if the ratio is greater than a third threshold, determining a Level 3 timeout; where the first threshold is less than the second threshold, the second threshold is less than the third threshold, a Level 1 timeout is less severe than a Level 2 timeout, and a Level 2 timeout is less severe than a Level 3 timeout.
[0046] This embodiment does not specifically limit the first, second, and third thresholds. Refer to Figure 5, which is an example diagram of a timeout determination method provided in an embodiment of the present invention. For instance, if the ratio is between 1 and 2, the timeout flag is set to T=01 (suspected timeout) and an HWE is reported; if the ratio is between 2 and 4, T=10 (highly suspected timeout) and an HWE is reported; if the ratio is greater than 4, T=11 (timeout) and an HWE is reported.
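The Figure 5 grading can be sketched as below, using the example thresholds 1, 2 and 4. The text says "between 1 and 2" without fixing the boundaries, so treating each threshold as strictly exceeded (a ratio of exactly 2 stays at T = 01) and letting the highest exceeded threshold win is an assumption; the function name is hypothetical. Cross-multiplying avoids a floating-point division.

```python
# Timeout flag T values (Figure 2 / Figure 5).
T_NONE, T_SUSPECTED, T_HIGHLY_SUSPECTED, T_TIMEOUT = 0b00, 0b01, 0b10, 0b11


def classify_timeout(io_latency: int, max_io_latency: int,
                     thresholds: tuple[int, int, int] = (1, 2, 4)) -> int:
    """Map the ratio io_latency / max_io_latency to a timeout flag T.

    The highest threshold that the ratio strictly exceeds wins. Comparing
    io_latency against threshold * max_io_latency keeps everything in integers.
    """
    first, second, third = thresholds
    if io_latency > third * max_io_latency:
        return T_TIMEOUT            # Level 3
    if io_latency > second * max_io_latency:
        return T_HIGHLY_SUSPECTED   # Level 2
    if io_latency > first * max_io_latency:
        return T_SUSPECTED          # Level 1
    return T_NONE
```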
[0047] Further, for a Level 1 input/output timeout, status marking and logging are performed; for a Level 2 timeout, on-site information capture and pre-checks are performed; and for a Level 3 timeout, the message receiving channel is closed and the target control page of the input/output is sent to the serial port and saved.
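The graded handling can be modelled as below. TimeoutHandler and on_timeout_event are hypothetical names, the serial port and the disk reserved area are stood in for by in-memory containers, and treating Level 2 "on-site information capture" as saving a copy of the CP is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutHandler:
    """Hypothetical software-side reaction to a timeout hardware event (HWE)."""
    channel_open: bool = True                                  # message receiving channel
    log: list[str] = field(default_factory=list)
    serial_out: list[str] = field(default_factory=list)        # stands in for the serial port
    saved_cps: dict[int, bytes] = field(default_factory=dict)  # stands in for the disk reserved area

    def on_timeout_event(self, cp_index: int, level: int, cp_pool: dict[int, bytes]) -> None:
        if level == 1:
            # Level 1: mark and log only; normal service continues.
            self.log.append(f"CP {cp_index}: level-1 timeout (suspected)")
        elif level == 2:
            # Level 2: capture on-site information early as a pre-check.
            self.log.append(f"CP {cp_index}: level-2 timeout, capturing state")
            self.saved_cps[cp_index] = cp_pool[cp_index]
        elif level == 3:
            # Level 3: stop accepting new IO, then print and preserve the CP.
            self.channel_open = False
            self.serial_out.append(f"CP {cp_index}: {cp_pool[cp_index].hex()}")
            self.saved_cps[cp_index] = cp_pool[cp_index]
        else:
            raise ValueError(f"unknown timeout level {level}")
```

Only Level 3 closes the channel, so a suspected timeout never interrupts service on its own.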
[0048] It should be noted that for a Level 1 timeout, only status marking and logging are performed, so the anomaly is recorded without affecting normal system operation. For a Level 2 timeout, on-site information capture and pre-checks are performed so that fault clues are obtained early and a basis is laid for subsequent handling. For a Level 3 timeout, the message receiving channel is closed to prevent the fault from spreading, and the corresponding target control page is saved and output through the serial port so that the fault scene remains fully traceable. This graded handling improves the reliability of fault detection and the efficiency of anomaly localization while preserving business continuity, thereby enhancing system stability and maintainability.
[0049] The input/output timeout monitoring method provided in this embodiment includes: S101, obtaining the existence duration of the target control page, i.e., the time between the request and the release of the control page corresponding to the input/output; S102, determining the maximum latency of a single input/output based on the read/write operation frequency; and S103, comparing the duration with the maximum latency to determine the input/output timeout status. By monitoring, within the controller, the entire lifecycle of the control page corresponding to the input/output from request to release, the method perceives the execution status and timeout trend of the input/output earlier and more directly than traditional host-side timeout detection. Because timing starts as soon as the I/O enters the controller and its control page is allocated, the delays introduced by communication and driver interaction between the host and the controller are excluded. Abnormal conditions inside the controller, such as delayed or slow I/O execution, can be identified in real time, so timeout risks are detected in advance and on-site information is captured promptly. This improves the timeliness and accuracy of I/O timeout detection, reduces false positives and false negatives, and improves the operational stability and fault-handling efficiency of the disk array system. It also avoids the situation in which the host, on sensing an I/O timeout, resets the controller, destroying the fault scene and making the timeout difficult to locate.
[0050] For a clearer understanding of the present invention, refer to Figure 6, which is an architecture diagram provided in an embodiment of the present invention. The scheme detects IO timeouts or timeout trends as early as possible; prevents the host from detecting an IO timeout and resetting the NVMe RAID card, which would destroy the fault scene and make the timeout difficult to locate; and implements IO timeout monitoring through hardware-software collaboration, which is simple, efficient, and easy to implement. Specifically, the architecture may include: CP Pool (CP Memory Pool): mainly stores information describing an I/O and its related context after the I/O enters the RAID card.
[0051] CP Manager: Primarily used for managing the allocation and release of IO control pages.
[0052] Monitor: a submodule of the CP Manager that monitors for IO timeouts and reports a hardware event when a timeout occurs.
[0053] HWE Manager (Event Reporting Manager): handles event reporting; in this embodiment it mainly reports hardware events such as IO timeouts. Each such hardware event carries a CP Index, which is used to locate the target CP in the CP Pool.
[0054] Serial port: outputs the CPs of timed-out I/Os to a terminal for developers to analyze and troubleshoot.
[0055] Disk: a space in the reserved area of the disk is allocated to save the CPs of timed-out IOs, which can later be read back from the disk for analysis and localization.
[0056] SW: the software of the RAID card. On receiving the hardware event of a timed-out IO, the software closes the RAID card's channel for receiving NVMe messages and prints or saves the CP.
[0057] It should be noted that the CP Pool, CP Manager, HWE Manager, serial port, and disk belong to the hardware components of the RAID card. In other words, the entire monitoring of timeout I / O is handled by the RAID card.
[0058] RAID cards also include various interface definitions: refer to Figure 7 , Figure 7 This is a schematic diagram of a CP monitoring control register provided in an embodiment of the present invention. This register (CP_MONITOR_CTLCP, CP monitoring control register) is used to control whether the CP request release monitoring function is enabled or disabled. E represents IO timeout monitoring enabled: 0 represents disabled, 1 represents enabled.
[0059] refer to Figure 8 , Figure 8 This is a schematic diagram of a maximum latency register provided in an embodiment of the present invention. This register (MAX_IO_LATENCY register, i.e., the maximum latency register) is used to configure the maximum I / O latency and is configured by software. This value needs to be determined by the software developer based on IOPS. `max_io_latency` represents the maximum latency for normal I / O, determined based on IOPS. If the latency exceeds the maximum, an I / O timeout is suspected; if it exceeds N times the maximum latency, an I / O timeout is highly suspected; if it exceeds M times the maximum latency, an I / O timeout is indicated. M is greater than N and greater than 1.
[0060] Referring to Figure 9, which is a schematic diagram of a CP occupancy status update frequency register provided in an embodiment of the present invention, this register (UPDATE_CP_OCCUPY_FREQ, the CP occupancy status update frequency register) configures the monitoring frequency. Its field N is the monitoring frequency, i.e., the number of status updates performed within one max_io_latency interval.
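Taken together, the three registers determine whether the CP status table is polled and how often. The following is a minimal sketch, assuming microsecond units and a plain Python object in place of memory-mapped registers (`MonitorRegisters` and its methods are hypothetical). It computes the polling period with the formula given for Figure 10, time_slice = max_io_latency / N. A larger N shortens the slice and refines the measurement at the cost of more frequent polling. For example, with N = 10 and max_io_latency = 1000 microseconds, the table is polled every 100 microseconds.

```python
from dataclasses import dataclass


@dataclass
class MonitorRegisters:
    """Hypothetical software view of the three monitoring registers."""
    e: int               # CP_MONITOR_CTL.E: 1 = monitoring enabled, 0 = disabled
    max_io_latency: int  # MAX_IO_LATENCY.max_io_latency (unit assumed: microseconds)
    n: int               # UPDATE_CP_OCCUPY_FREQ.N: monitoring frequency

    @property
    def enabled(self) -> bool:
        return self.e == 1

    def time_slice(self) -> float:
        """Polling period: time_slice = max_io_latency / N."""
        if self.n <= 0:
            raise ValueError("UPDATE_CP_OCCUPY_FREQ.N must be positive")
        return self.max_io_latency / self.n

    def resolution(self) -> float:
        # Latency is accumulated in whole slices, so a measured value can
        # differ from the true lifetime of a CP by up to one slice.
        return self.time_slice()


regs = MonitorRegisters(e=1, max_io_latency=1000, n=10)
print(regs.enabled, regs.time_slice())  # True 100.0
```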
[0061] Referring to Figure 10, which is a flowchart of a timeout monitoring process provided in an embodiment of the present invention: after CP_MONITOR_CTL.E is enabled, the polling period is first calculated as time_slice = MAX_IO_LATENCY.max_io_latency / UPDATE_CP_OCCUPY_FREQ.N, i.e., period = maximum latency / monitoring frequency. Each time a period elapses, a monitoring pass is started: (1) the CPs are polled in order, starting from the first CP, until every CP in the CP Pool has been visited; (2) for each CP, the corresponding status table entry CP_STATUS_TABLE[CP_Index] is read; if CP_STATUS_TABLE.S == 1, the CP is in use, and its duration is updated as CP_STATUS_TABLE.io_latency = CP_STATUS_TABLE.io_latency + time_slice (i.e., the current duration plus one period); (3) the timeout status is determined from the ratio CP_STATUS_TABLE.io_latency / MAX_IO_LATENCY.max_io_latency; if this status has changed, the timeout flag CP_STATUS_TABLE.T is updated and the change is reported to the HWE Manager.
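The monitoring flow of Figure 10 can be sketched as one polling pass per time slice. This is a minimal sketch under several assumptions:

- The CP status table is a list of entries whose fields S, T and io_latency follow the names in the text.
- The timeout status is the number of thresholds 1, N and M (from the MAX_IO_LATENCY description) that the ratio exceeds, with hypothetical example multipliers of 2 and 4.
- Reporting to the HWE Manager is modelled by a callback.
- Resetting io_latency when a CP is allocated is not shown, since the text does not say when it happens.

The names `CpStatus`, `timeout_level`, `poll_once` and `report` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CpStatus:
    """One CP_STATUS_TABLE entry (field names follow the text)."""
    S: int = 0             # 1 = CP in use, 0 = free
    T: int = 0             # timeout status last reported for this CP
    io_latency: float = 0  # accumulated lifetime of the CP


def timeout_level(ratio: float, n_mult: float, m_mult: float) -> int:
    # 0 = normal, 1 = suspected, 2 = highly suspected, 3 = timeout
    return sum(ratio > k for k in (1.0, n_mult, m_mult))


def poll_once(table: List[CpStatus], max_io_latency: float, freq_n: int,
              report: Callable[[int, int], None],
              n_mult: float = 2.0, m_mult: float = 4.0) -> None:
    """One monitoring pass, run every time_slice = max_io_latency / freq_n."""
    time_slice = max_io_latency / freq_n
    for cp_index, entry in enumerate(table):  # (1) poll every CP in the pool
        if entry.S != 1:                      # (2) skip CPs that are not in use
            continue
        entry.io_latency += time_slice
        level = timeout_level(entry.io_latency / max_io_latency, n_mult, m_mult)
        if level != entry.T:                  # (3) update T and report on change
            entry.T = level
            report(cp_index, level)


events = []
table = [CpStatus(S=1), CpStatus(S=0)]
for _ in range(25):  # 25 slices of 100 us each, max_io_latency = 1000 us
    poll_once(table, 1000, 10, lambda idx, lvl: events.append((idx, lvl)))
print(events)  # [(0, 1), (0, 2)]
```

Reporting only on a change of status keeps the event traffic to the HWE Manager proportional to the number of level transitions, not to the number of polling passes.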
[0062] The input/output timeout monitoring apparatus provided in the embodiments of the present invention is described below. The apparatus described below and the input/output timeout monitoring method described above may be cross-referenced with each other.
[0063] Referring to Figure 11, which is a schematic diagram of an input/output timeout monitoring apparatus provided in an embodiment of the present invention, the apparatus may include:
- a duration acquisition module 100, configured to acquire the duration of existence of the target control page, where the duration is the time between when the control page is requested and when it is released, and the target control page is the control page corresponding to the input/output;
- a maximum latency determination module 200, configured to determine the maximum latency of a single input/output based on the frequency of read/write operations; and
- a comparison module 300, configured to compare the duration with the maximum latency to determine the input/output timeout status.
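The text does not specify how the maximum latency of a single I/O is derived from the read/write frequency. One plausible approach, which is an assumption and not taken from the patent, uses Little's law: with Q I/Os outstanding on average and a throughput of IOPS I/Os per second, the mean latency is Q / IOPS. A safety margin (here a hypothetical factor of 2) turns that mean into a ceiling for normal I/O. `estimate_max_io_latency_us` is a hypothetical helper.

```python
def estimate_max_io_latency_us(iops: float, queue_depth: float,
                               margin: float = 2.0) -> float:
    """Hypothetical helper: estimate max_io_latency from IOPS via Little's law.

    Little's law: mean latency = mean outstanding I/Os / throughput.
    The mean is scaled by a safety margin to obtain a ceiling for normal I/O.
    """
    if iops <= 0 or queue_depth <= 0 or margin < 1:
        raise ValueError("iops and queue_depth must be positive, margin >= 1")
    mean_latency_us = queue_depth * 1_000_000 / iops
    return mean_latency_us * margin


# 200,000 IOPS with 32 I/Os outstanding on average -> 160 us mean latency
print(estimate_max_io_latency_us(200_000, 32))  # 320.0
```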
[0064] Furthermore, based on the above embodiments, the duration acquisition module 100 may include:
- a request unit, configured to request the target control page for the input/output when the input/output is received;
- a release unit, configured to release the target control page after the input/output has been processed and the processing result has been returned to the upper layer; and
- a duration determination unit, configured to take the time between the request and the release as the duration of existence of the target control page.
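The request/release lifetime measured by the duration acquisition module can also be illustrated with timestamps taken at the two lifecycle points. This is a minimal sketch, assuming a monotonic clock is readable in the controller. The embodiment of Figure 10 instead accumulates the lifetime by polling a status table, so this is an alternative rendering. `ControlPageTracker` and its methods are hypothetical. Injecting the clock makes the example deterministic.

```python
import time
from typing import Callable, Dict


class ControlPageTracker:
    """Hypothetical tracker of control-page lifetimes (request -> release)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._requested_at: Dict[int, float] = {}

    def request(self, cp_index: int) -> None:
        # The I/O has been received and its control page allocated: start timing.
        self._requested_at[cp_index] = self._clock()

    def release(self, cp_index: int) -> float:
        # The result has been returned to the upper layer: stop timing and
        # return how long the control page existed.
        start = self._requested_at.pop(cp_index)
        return self._clock() - start

    def outstanding(self) -> int:
        # Number of control pages currently requested but not yet released.
        return len(self._requested_at)


ticks = iter([10.0, 10.75])  # fake clock readings in seconds
tracker = ControlPageTracker(clock=lambda: next(ticks))
tracker.request(3)
print(tracker.release(3))  # 0.75
```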
[0065] Furthermore, based on the above embodiments, the input/output timeout monitoring apparatus may further include a monitoring module. After the target control page is requested for a received input/output, this module starts monitoring and monitors the target control page at a preset monitoring frequency. The preset monitoring frequency is determined based on the maximum latency and the monitoring period. Correspondingly, the duration determination unit may be specifically configured to determine the duration of existence of the target control page from the number of monitoring passes and the monitoring period, once the target control page is detected to have been released.
[0066] Furthermore, based on the above embodiments, the input/output timeout monitoring apparatus may further include a setting module. After the target control page is requested for a received input/output, this module sets the resource usage status field in the target control page status table to 1. The target control page status table includes the resource usage status field, the timeout flag field, and the duration field of the target control page.
Correspondingly:
- after the input/output has been processed, the processing result has been returned to the upper layer, and the target control page has been released, the resource usage status field in the target control page status table may further be set to 0;
- taking the time between the request and the release as the duration of existence of the target control page may include reading the target control page status table, determining the duration of existence from the changes in the resource usage status field, and writing it into the duration field of the status table;
- after the duration is compared with the maximum latency to determine the input/output timeout status, the resulting timeout status may further be written into the timeout flag field of the target control page status table.
[0067] Furthermore, based on the above embodiments, the comparison module 300 may include:
- a first determining unit, configured to determine that the input/output has not timed out if the duration is less than the maximum latency; and
- a second determining unit, configured to determine that the input/output has timed out if the duration is greater than the maximum latency.
[0068] Furthermore, based on the above embodiments, the input/output timeout monitoring apparatus may further include a storage and transmission module. After it is determined that the input/output has timed out because the duration exceeds the maximum latency, this module closes the message receiving channel, stores the target control page of the input/output, and sends it to the serial port.
[0069] Furthermore, based on the above embodiments, the second determining unit may include:
- a ratio calculation subunit, configured to calculate the ratio of the duration to the maximum latency;
- a first determining subunit, configured to determine a level-one input/output timeout if the ratio is greater than a first threshold;
- a second determining subunit, configured to determine a level-two input/output timeout if the ratio is greater than a second threshold; and
- a third determining subunit, configured to determine a level-three input/output timeout if the ratio is greater than a third threshold.

The first threshold is less than the second threshold, and the second threshold is less than the third threshold. Correspondingly, a level-one timeout is less severe than a level-two timeout, which in turn is less severe than a level-three timeout. When the ratio exceeds several thresholds, the highest applicable level applies.
[0070] It should be noted that the order of the modules and units in the above input/output timeout monitoring apparatus may be changed as long as the processing logic is not affected.
[0071] In the input/output timeout monitoring apparatus provided in this embodiment of the invention, the duration acquisition module 100 acquires the duration of existence of the target control page, that is, the time between when the control page is requested and when it is released, where the target control page is the control page corresponding to the input/output. The maximum latency determination module 200 determines the maximum latency of a single input/output based on the frequency of read/write operations, and the comparison module 300 compares the duration with the maximum latency to determine the input/output timeout status.

The apparatus monitors the entire lifecycle of the input/output's control page within the controller, from request to release. It therefore perceives the execution status and timeout trend of the input/output earlier and more directly than traditional host-side timeout detection. Timing starts as soon as the I/O enters the controller and its control page is allocated, so the delays introduced by communication and driver interaction between the host and the controller are excluded. As a result, abnormal conditions inside the controller, such as delayed or slow I/O execution, can be identified in real time, timeout risks can be detected in advance, and on-site information can be captured promptly.

This improves the timeliness and accuracy of I/O timeout detection, reduces false positives and false negatives, and improves the operational stability and fault handling efficiency of the disk array system. It also avoids the situation in which the host, on detecting an I/O timeout, triggers a controller reset that destroys the on-site information and makes the timeout difficult to locate.
[0072] Figure 12 is a schematic structural diagram of an input/output timeout monitoring device provided in an embodiment of the present invention. As shown in Figure 12, the input/output timeout monitoring device includes:
- a memory 60, configured to store a computer program; and
- a processor 61, configured to implement the steps of the input/output timeout monitoring method described in the above embodiments when executing the computer program.
[0073] The input/output timeout monitoring device provided in this embodiment may include, but is not limited to, a smartphone, tablet, laptop, or desktop computer.
[0074] The processor 61 may include one or more processing cores, such as a quad-core or octa-core processor. The processor 61 may be implemented in at least one hardware form selected from Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 61 may also include a main processor and a coprocessor. The main processor, also known as the Central Processing Unit (CPU), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 61 may integrate a Graphics Processing Unit (GPU), which renders the content to be displayed on the screen. In some embodiments, the processor 61 may also include an Artificial Intelligence (AI) processor, which handles computing operations related to machine learning.
[0075] The memory 60 may include one or more computer-readable storage media, which may be non-transitory. The memory 60 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In this embodiment, the memory 60 stores at least the following computer program 601, which, after being loaded and executed by the processor 61, implements the relevant steps of the input/output timeout monitoring method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 60 may include an operating system 602, data 603, and the like, stored either temporarily or permanently. The operating system 602 may include Windows, Unix, Linux, etc. The data 603 may include, but is not limited to, data related to the input/output timeout monitoring method.
[0076] In some embodiments, the input/output timeout monitoring device may further include a display screen 62, an input/output interface 63, a communication interface 64, a power supply 65, and a communication bus 66.
[0077] Those skilled in the art will understand that the structure shown in Figure 12 does not constitute a limitation on the input/output timeout monitoring device, which may include more or fewer components than shown.
[0078] It is understood that if the input/output timeout monitoring method in the above embodiments is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the following may be embodied in the form of a software product: the technical solution of the present invention in essence, the part that contributes to the prior art, or all or part of the technical solution. This computer software product is stored in a storage medium and is used to execute all or part of the steps of the methods in the various embodiments of the present invention. The aforementioned storage medium includes: USB flash drive, mobile hard drive, read-only memory (ROM), random access memory (RAM), electrically erasable programmable ROM, register, hard disk, removable disk, CD-ROM, magnetic disk, optical disk, and other media capable of storing program code.
[0079] Based on this, embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the input/output timeout monitoring method described above.
[0080] The following describes a computer program product provided by an embodiment of this application. The computer program product described below and the other embodiments described herein may be cross-referenced.
[0081] A computer program product includes a computer program/instructions that, when executed by a processor, implement the steps of the input/output timeout monitoring method disclosed above.
[0082] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.
[0083] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0084] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0085] The input/output timeout monitoring method, apparatus, device, and computer-readable storage medium provided by the present invention have been described in detail above. Specific examples have been used to illustrate the principles and implementation of the present invention; the description of the above embodiments is only intended to help understand the method and core ideas of the present invention. Meanwhile, those skilled in the art may make changes to the specific implementation and scope of application based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. An input/output timeout monitoring method, characterized in that it is applied to a controller and comprises: acquiring the duration of existence of a target control page, wherein the duration is the time between when the control page is requested and when the control page is released, and the target control page is the control page corresponding to an input/output; determining the maximum latency of a single input/output based on the frequency of read/write operations; and comparing the duration with the maximum latency to determine the input/output timeout status.
2. The input/output timeout monitoring method according to claim 1, wherein acquiring the duration of existence of the target control page comprises: when the input/output is received, requesting the target control page for the input/output; when the input/output has been processed and the processing result has been returned to the upper layer, releasing the target control page; and taking the time between the request and the release as the duration of existence of the target control page.
3. The input/output timeout monitoring method according to claim 2, characterized in that, after requesting the target control page for the input/output upon receipt of the input/output, the method further comprises: starting monitoring and monitoring the target control page at a preset monitoring frequency, wherein the preset monitoring frequency is determined based on the maximum latency and a monitoring period; correspondingly, taking the time between the request and the release as the duration of existence of the target control page comprises: once the target control page is detected to have been released, determining the duration of existence of the target control page from the number of monitoring passes and the monitoring period.
4. The input/output timeout monitoring method according to claim 2, characterized in that, after requesting the target control page for the input/output upon receipt of the input/output, the method further comprises: setting a resource usage status field in a target control page status table to 1, wherein the target control page status table includes the resource usage status field, a timeout flag field, and a duration field of the target control page; correspondingly, after the input/output has been processed, the processing result has been returned to the upper layer, and the target control page has been released, the method further comprises: setting the resource usage status field in the target control page status table to 0; correspondingly, taking the time between the request and the release as the duration of existence of the target control page comprises: reading the target control page status table, determining the duration of existence of the target control page from the changes in the resource usage status field, and writing it into the duration field of the target control page status table; correspondingly, after comparing the duration with the maximum latency to determine the input/output timeout status, the method further comprises: writing the input/output timeout information into the timeout flag field of the target control page status table.
5. The input/output timeout monitoring method according to claim 1, characterized in that comparing the duration with the maximum latency to determine the input/output timeout status comprises: if the duration is less than the maximum latency, determining that the input/output has not timed out; and if the duration is greater than the maximum latency, determining that the input/output has timed out.
6. The input/output timeout monitoring method according to claim 5, characterized in that, after determining that the input/output has timed out because the duration is greater than the maximum latency, the method further comprises: closing the message receiving channel, storing the target control page of the input/output, and sending it to the serial port.
7. The input/output timeout monitoring method according to claim 5, characterized in that determining that the input/output has timed out if the duration is greater than the maximum latency comprises: calculating the ratio of the duration to the maximum latency; if the ratio is greater than a first threshold, determining that the input/output timeout is level one; if the ratio is greater than a second threshold, determining that the input/output timeout is level two; and if the ratio is greater than a third threshold, determining that the input/output timeout is level three; wherein the first threshold is less than the second threshold, the second threshold is less than the third threshold, the severity of a level-one timeout is less than that of a level-two timeout, and the severity of a level-two timeout is less than that of a level-three timeout.
8. An input/output timeout monitoring apparatus, characterized in that it comprises: a duration acquisition module, configured to acquire the duration of existence of a target control page, wherein the duration is the time between when the control page is requested and when the control page is released, and the target control page is the control page corresponding to an input/output; a maximum latency determination module, configured to determine the maximum latency of a single input/output based on the frequency of read/write operations; and a comparison module, configured to compare the duration with the maximum latency to determine the input/output timeout status.
9. An input/output timeout monitoring device, characterized in that it comprises: a memory, configured to store a computer program; and a processor, configured to implement the input/output timeout monitoring method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when loaded and executed by a processor, implement the input/output timeout monitoring method according to any one of claims 1 to 7.