Method, apparatus, and computer program product for delay processing
By monitoring and analyzing system status data, predictors are used to identify high-latency factors in data persistence operations, providing targeted solutions to address the high latency issues caused by data persistence operations and improve system performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- DELL PROD LP
- Filing Date
- 2022-06-09
- Publication Date
- 2026-06-12
AI Technical Summary
High latency issues caused by data persistence operations are complex and impact system performance. Existing technologies struggle to quickly and accurately identify and resolve the main factors contributing to high latency.
The system's data persistence operations are monitored, relevant state data is recorded, and a trained predictor generates an estimated latency. For high-latency events, the states with the greatest impact on latency are identified and targeted action recommendations are provided.
Quickly identify and resolve high latency issues in data persistence operations, reduce negative impacts on user business, and improve system performance.
Description
Technical Field
[0001] Embodiments of this disclosure relate to system fault diagnosis, and more specifically, to methods, apparatus, and computer program products for delay processing.
Background Technology
[0002] To improve system efficiency, during file processing, data is typically first written to memory and then later written to a more persistent storage device (e.g., disk) at an appropriate time. Processes can invoke the system's data persistence operations (e.g., using the fsync function) to flush memory and synchronize updated content to disk.
Summary of the Invention
[0003] The embodiments of this disclosure provide a scheme for delay processing.
[0004] In a first aspect of this disclosure, a method for delay processing is provided, comprising: in response to a data persistence operation occurring in the system, acquiring a record for the operation, wherein the record includes an actual delay of the operation and a set of metrics of a set of states of the system during a predetermined period in which the operation occurs; in response to the actual delay being greater than a first threshold, generating an estimated delay of the operation based on the set of metrics and using a trained predictor; determining a difference between the actual delay and the estimated delay; and in response to the difference being less than a second threshold, identifying one or more states from the set of states based on the record and the estimated delay.
[0005] In a second aspect of this disclosure, an electronic device is provided, including a processor and a memory coupled to the processor, the memory having instructions stored therein, the instructions causing the device to perform an action when executed by the processor, the action including: in response to a data persistence operation occurring in the system, acquiring a record for the operation, wherein the record includes an actual delay of the operation and a set of metrics of a set of states of the system during a predetermined period in which the operation occurs; in response to the actual delay being greater than a first threshold, generating an estimated delay of the operation based on the set of metrics and using a trained predictor; determining a difference between the actual delay and the estimated delay; and in response to the difference being less than a second threshold, identifying one or more states from the set of states based on the record and the estimated delay.
[0006] In a third aspect of this disclosure, a computer program product is provided, which is tangibly stored on a computer-readable medium and includes machine-executable instructions that, when executed, cause a machine to perform the method described in accordance with a first aspect of this disclosure.
[0007] This summary is provided to introduce a selection of concepts in simplified form; these concepts are further described in the detailed embodiments below. The summary is not intended to identify key or principal features of this disclosure, nor is it intended to limit the scope of this disclosure.
Attached Figure Description
[0008] The above and other objects, features, and advantages of this disclosure will become more apparent from a more detailed description of exemplary embodiments thereof, taken in conjunction with the accompanying drawings, wherein:
[0009] Figure 1 shows a schematic diagram of an example environment in which several embodiments of the present disclosure can be implemented;
[0010] Figure 2 shows an example method for delay processing according to some embodiments of this disclosure;
[0011] Figure 3 shows an example architecture for delay processing according to some embodiments of this disclosure;
[0012] Figure 4 shows example simulation results illustrating factors that cause high latency in data persistence operations, according to some embodiments of this disclosure; and
[0013] Figure 5 shows a schematic block diagram of a device that can be used to implement embodiments of the present disclosure.
[0014] In all the accompanying drawings, the same or similar reference numerals denote the same or similar elements.
Detailed Implementation
[0015] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure can be understood more thoroughly and completely. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.
[0016] The term "comprising" and its variations as used herein are open-ended inclusion, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment". Definitions of other terms will be given in the description below.
[0017] During file processing, data is typically first written to memory and then later written to persistent storage at an appropriate time. For example, a process can call the system's data persistence operation (e.g., the `fsync` function) to flush the buffer and synchronize the updated file content to persistent storage. This data persistence operation involves different components across multiple layers of the system, such as the I/O stack, file system write-back/logging, and runtime workloads. In some cases, data persistence operation calls can result in high latency that impacts system performance. Such high latency can block I/O operations for extended periods and, in some cases, cause application operations to panic due to timeouts. For example, latency can exceed 30 seconds, sometimes even 50 seconds. Therefore, engineering teams need to identify the factors causing high latency when it occurs and take timely, targeted actions to resolve the problem. However, the latency of data persistence operations is related to multiple factors, including the various states of multiple components involved in the operation and/or other system configurations at the time, making the investigation and mitigation of such high latency issues very complex.
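As a minimal sketch of the kind of call discussed above, the following Python snippet times a single `fsync` on a temporary file. The helper names `timed_fsync` and `write_and_persist` are illustrative assumptions and do not come from this disclosure; only `os.fsync`, `time.perf_counter`, and `tempfile` are standard-library calls.

```python
import os
import tempfile
import time


def timed_fsync(fd: int) -> float:
    """Persist the file's buffered data and return the elapsed time in seconds."""
    start = time.perf_counter()
    os.fsync(fd)  # blocks until the OS reports the data as written to the device
    return time.perf_counter() - start


def write_and_persist(path: str, payload: bytes) -> float:
    """Write payload to path, then persist it; returns the fsync latency in seconds."""
    with open(path, "wb") as f:
        f.write(payload)  # lands in the user-space buffer
        f.flush()         # hands the data to the OS page cache
        return timed_fsync(f.fileno())


with tempfile.TemporaryDirectory() as tmp:
    latency = write_and_persist(os.path.join(tmp, "demo.bin"), b"x" * 4096)
    print(f"fsync latency: {latency:.6f} s")
```

On a healthy, idle machine this latency is typically small; the situations described above are those in which the same call blocks for tens of seconds.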
[0018] To at least partially address the aforementioned problems and other potential issues, embodiments of this disclosure propose a scheme for latency handling. This scheme monitors the system's data persistence operations (e.g., fsync) and, when a high-latency event (e.g., latency exceeding a threshold) occurs, records various system state data associated with it. The scheme then uses a trained predictor to generate an estimated latency for the data persistence operation based on these system states. When the estimate is of sufficient quality, some embodiments of the scheme analyze the role each system state played when the predictor generated the estimate, and thereby determine one or more states that have the greatest impact on the high-latency event (e.g., cause the largest increase in total latency). Some embodiments of the scheme also consider how strongly changing these one or more system states affects user services, and prioritize actions with less negative impact on user services when mitigating the high latency problem. The scheme of this disclosure can identify the most significant factors causing high latency in the system's data persistence operations, thereby enabling timely and targeted remedial measures to be provided to users.
[0019] Figure 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented. Environment 100 may include computing device 110 and system 120. Although shown as separate single entities, computing device 110 and system 120 may exist and be distributed in any suitable form, and may have other relationships between them. For example, system 120 or a portion of system 120 may reside on computing device 110.
[0020] Processes running in system 120 (such as system daemons or client application processes) can temporarily place data in memory 130 (e.g., RAM) during data processing and, when appropriate, use data persistence operations to update and synchronize the data in memory 130 to persistent storage device 140. This data persistence operation involves multiple components (not shown) across multiple layers of system 120 and has a certain delay.
[0021] The computing device 110 can monitor and record various data from the system 120, such as the various states of the system before and after a data persistence operation (e.g., within a threshold time period after the operation occurs) and the latency of the operation. When a data persistence operation exhibits high latency (e.g., latency exceeding a threshold), the computing device 110 can also use the methods of this disclosure to estimate the latency, identify the factors causing the high latency based on estimation results of sufficient quality, and provide targeted recommendations for reducing the latency.
[0022] The architecture and functionality in example environment 100 are described for illustrative purposes only and do not imply any limitation on the scope of this disclosure. Other devices, systems, or components not shown may also be present in example environment 100. Furthermore, embodiments of this disclosure can be applied to other environments with different structures and/or functionalities.
[0023] Figure 2 shows a flowchart of an example method 200 for delay processing according to some embodiments of the present disclosure. Example method 200 can be executed, for example, by the computing device 110 shown in Figure 1. It should be understood that method 200 may also include additional actions not shown, and the scope of this disclosure is not limited in this respect. Method 200 is described in detail below in conjunction with example environment 100 of Figure 1.
[0024] At box 210, in response to a data persistence operation occurring in the system, a record for that operation is obtained, the record including the actual delay of the operation and a set of metrics of a set of system states during a predetermined period of time in which the operation occurs. For example, computing device 110 may obtain a record for a data persistence operation occurring in system 120, wherein the record includes the actual delay of the operation and a set of metrics of a set of system states during a predetermined period of time in which the operation occurs.
[0025] In some embodiments, computing device 110 can monitor and record different types of states of system 120, indicating the status of different domains of the system. In some embodiments, these types may include hardware status, I/O stack configuration, and workload patterns. Hardware status may include, but is not limited to, states reflecting hardware health such as hard disk SMART information and I/O errors. I/O stack configuration may include, but is not limited to, parameters such as I/O scheduler settings, file system write-back policies, and file system logging policies. Workload patterns may include, but are not limited to, process read/write throughput (e.g., in bytes) and system call (e.g., fsync) counts for different applications in the system.
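One way to picture such a record is as a small data structure that pairs the measured latency with typed state metrics. The sketch below is only an illustration: the class names, field names, metric names, and values are all hypothetical and are not taken from this disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum


class StateType(Enum):
    """The three state types named in the text."""
    HARDWARE = "hardware_status"
    IO_STACK = "io_stack_configuration"
    WORKLOAD = "workload_pattern"


@dataclass(frozen=True)
class StateMetric:
    name: str              # hypothetical metric name
    state_type: StateType
    value: float


@dataclass
class PersistenceRecord:
    operation: str             # e.g. "fsync"
    actual_latency_s: float    # measured latency of the operation
    metrics: list[StateMetric] = field(default_factory=list)

    def feature_vector(self) -> dict[str, float]:
        """Flatten the metrics into the name -> value form a predictor would consume."""
        return {m.name: m.value for m in self.metrics}


# Illustrative values only.
record = PersistenceRecord(
    operation="fsync",
    actual_latency_s=38.2,
    metrics=[
        StateMetric("reallocated_sectors", StateType.HARDWARE, 120.0),
        StateMetric("writeback_interval_s", StateType.IO_STACK, 30.0),
        StateMetric("fsync_calls_per_min", StateType.WORKLOAD, 450.0),
    ],
)
print(record.feature_vector())
```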
[0026] At box 220, in response to an actual latency exceeding a first threshold, an estimated latency for the operation is generated using a trained predictor based on a set of metrics. For example, computing device 110 may generate an estimated latency for a data persistence operation using a trained predictor based on a set of metrics for that operation, in response to an actual latency exceeding the first threshold. Latency within the first threshold is considered reasonably expected, while latency exceeding the first threshold can be considered high latency that adversely affects system operation, and therefore computing device 110 needs to perform the subsequent steps of method 200 to investigate the latency. Computing device 110 may set the threshold based on the performance requirements of the specific system and/or for a specific type of operation (e.g., a system function call performing a data persistence operation).
[0027] In some embodiments, the computing device 110 may use a history of data persistence operations of a particular type to train a predictor using appropriate machine learning methods, and use the trained predictor to estimate the latency of an operation once the predictor has the desired quality. Thus, the predictions made by the predictor can be considered to reflect reality well.
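The disclosure does not name a specific learning method. Purely as a stand-in, the sketch below builds a k-nearest-neighbour predictor from a history of (metrics, latency) pairs and checks its quality on held-out samples before it would be used. The function names, the sample data, and the 5-second quality bound are assumptions made for illustration, and no feature scaling is applied.

```python
import math

# Each history row: (metric values, actual latency in seconds).
History = list[tuple[list[float], float]]


def make_knn_predictor(history: History, k: int = 3):
    """Return a function that averages the latency of the k closest past operations."""
    def predict(metrics: list[float]) -> float:
        nearest = sorted(history, key=lambda row: math.dist(row[0], metrics))[:k]
        return sum(latency for _, latency in nearest) / len(nearest)
    return predict


def mean_absolute_error(predict, samples: History) -> float:
    """Average absolute gap between predicted and actual latency."""
    return sum(abs(predict(x) - y) for x, y in samples) / len(samples)


# Illustrative history: [reallocated sectors, fsync calls per second] -> latency.
train = [([0.0, 1.0], 1.0), ([1.0, 1.0], 2.0), ([50.0, 4.0], 30.0), ([60.0, 5.0], 40.0)]
holdout = [([55.0, 4.0], 35.0)]

predict = make_knn_predictor(train, k=2)
mae = mean_absolute_error(predict, holdout)
has_desired_quality = mae <= 5.0  # hypothetical quality bound
print(predict([55.0, 4.0]), mae, has_desired_quality)
```

Any regression model could take the place of the nearest-neighbour average; the point is only that the predictor is fitted on past operations and validated before its estimates are trusted.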
[0028] At box 230, the difference between the actual delay and the estimated delay is determined. For example, computing device 110 can determine (e.g., for operations with high delays as described above) the difference between the actual delay and the estimated delay. Computing device 110 can thereby determine whether the estimate made by the predictor is of sufficient quality to reflect the actual delay and can therefore be used in the subsequent identification step.
[0029] At box 240, in response to the difference being less than a second threshold, one or more states are identified from the set of states based on the record and the estimated latency. For example, in response to the difference between the actual latency and the estimated latency of an operation (determined at box 230) being less than the second threshold, computing device 110 may identify one or more states from the set of states in the record, based on the record and the estimated latency of the operation. In this way, computing device 110 can identify the one or more states that have the greatest impact on the high latency of the operation.
[0030] Using method 200, computing device 110 can detect the occurrence of high-latency events and identify the main factors causing the high latency, thereby providing guidance for timely and targeted solutions to the high-latency problem.
[0031] Figure 3 shows an example architecture 300 for latency processing according to some embodiments of the present disclosure. Architecture 300 may be an example implementation of a logic module in computing device 110 for processing latency (e.g., high-latency events of data persistence operations in system 120) in the manner described in the present disclosure (e.g., method 200). Figure 3 is described below with reference to Figure 2 and example environment 100. It should be understood that the computing device 110 may also include other modules not shown. Furthermore, architecture 300 is merely an example, and other suitable architectures capable of implementing the schemes described in this disclosure may also be used.
[0032] Architecture 300 includes a data persistence operation monitoring module 310, a high latency analysis module 320, and a reporting module 330.
[0033] The computing device 110 can use the data persistence operation monitoring module 310 to monitor the occurrence of data persistence operations in the system 120 and record relevant data for the operations, including the actual latency of the operations and a set of metrics of a set of system states within a predetermined time period during which the operations occur. In some instances, the computing device 110 takes statistics of a set of states over a threshold time period (e.g., the most recent n minutes) relative to the occurrence of a high-latency data persistence operation as the system state associated with that operation.
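The "most recent n minutes" statistics can be sketched with a sliding window over timestamped state samples. The `StateWindow` class below, its use of the mean as the statistic, and the sample values are assumptions for illustration only.

```python
from collections import deque
from statistics import mean


class StateWindow:
    """Keeps recent state samples and summarizes them over a fixed time window."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.samples = deque()  # entries are (timestamp_s, {state name: value})

    def add(self, timestamp_s: float, states: dict) -> None:
        self.samples.append((timestamp_s, dict(states)))
        # Drop samples that are older than the window relative to the newest sample.
        while self.samples and self.samples[0][0] < timestamp_s - self.window_s:
            self.samples.popleft()

    def summarize(self, now_s: float) -> dict:
        """Mean of each state over the window that ends at now_s."""
        recent = [s for t, s in self.samples if now_s - self.window_s <= t <= now_s]
        names = sorted({name for s in recent for name in s})
        return {name: mean(s[name] for s in recent if name in s) for name in names}


window = StateWindow(window_s=300.0)  # a five-minute window
window.add(0.0, {"fsync_calls": 10.0})
window.add(200.0, {"fsync_calls": 20.0})
window.add(400.0, {"fsync_calls": 40.0})  # evicts the sample taken at t=0
summary = window.summarize(now_s=400.0)
print(summary)
```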
[0034] The data persistence operation monitoring module 310 can also identify data persistence operations with actual latency greater than a first threshold. In response to such an operation, the high latency analysis module 320 can generate an estimated latency for the operation using a trained predictor based on a set of metrics in the records for that operation, and determine the difference between the actual latency and the estimated latency. If the difference is less than a second threshold, the high latency analysis module 320 can consider that the predictor's estimate of the operation's latency has sufficient quality to reflect the actual latency situation. In this case, the high latency analysis module 320 can analyze the operation's records and estimated latency to identify one or more states from the recorded set of states, thereby identifying the main factors causing the high latency of the operation. In some embodiments, if the difference is greater than the second threshold, the high latency analysis module 320 can determine that the current predictor cannot accurately estimate the latency of the operation, and the computing device 110 can use the record to adjust the predictor. For example, the computing device 110 can add the record to a historical database of delayed operations for subsequent retraining of the predictor.
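The decision flow of the two thresholds can be sketched as a single function. This is an illustration under stated assumptions: the `analyze` function, the `Analysis` type, the toy predictor, the toy identification step, the use of an absolute difference, and the 5-second second threshold are all hypothetical; the 30-second first threshold mirrors the non-limiting example given elsewhere in this description.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping, Optional

Metrics = Mapping[str, float]


@dataclass
class Analysis:
    status: str  # "normal", "explained", or "needs_retraining"
    estimated_latency_s: Optional[float] = None
    states: list = field(default_factory=list)


def analyze(
    actual_latency_s: float,
    metrics: Metrics,
    predictor: Callable[[Metrics], float],
    identify: Callable[[Metrics, float], list],
    first_threshold_s: float,
    second_threshold_s: float,
    retraining_set: list,
) -> Analysis:
    # Latency within the first threshold is treated as expected; nothing to do.
    if actual_latency_s <= first_threshold_s:
        return Analysis("normal")
    estimated = predictor(metrics)
    difference = abs(actual_latency_s - estimated)  # assumption: absolute gap
    if difference < second_threshold_s:
        # The estimate is close enough to reality, so attribute it to states.
        return Analysis("explained", estimated, identify(metrics, estimated))
    # Otherwise keep the record so the predictor can be retrained later.
    # (The text leaves the equal case open; it is treated as a miss here.)
    retraining_set.append((dict(metrics), actual_latency_s))
    return Analysis("needs_retraining", estimated)


def toy_predictor(m: Metrics) -> float:
    return 2.0 + 0.25 * m["reallocated_sectors"]


def toy_identify(m: Metrics, estimated: float) -> list:
    return [max(m, key=m.get)]


retraining_set = []
metrics = {"reallocated_sectors": 144.0, "fsync_calls_per_s": 4.0}
explained = analyze(39.5, metrics, toy_predictor, toy_identify, 30.0, 5.0, retraining_set)
missed = analyze(60.0, metrics, toy_predictor, toy_identify, 30.0, 5.0, retraining_set)
normal = analyze(0.2, metrics, toy_predictor, toy_identify, 30.0, 5.0, retraining_set)
print(explained.status, missed.status, normal.status)
```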
[0035] For an estimate of sufficient quality, the high-latency analysis module 320 can determine the contribution of a set of states to the estimated latency generated by the predictor, and determine one or more states based on that contribution. In some embodiments, the computing device 110 can use, for example, an additive model based on game theory to calculate the contribution of each system state to the estimated latency generated by the predictor. Since the estimated latency is considered to reflect the actual situation well, the contribution reflects the contribution of these states to the actual latency of the operation. Therefore, the high-latency analysis module 320 can then determine one or more states that have the greatest impact on the latency of the operation based on that contribution.
[0036] For illustrative purposes, Figure 4 shows an example visualization 400 of high-latency analysis results according to an embodiment of the present disclosure. This exemplary high-latency analysis uses SHapley Additive exPlanations (SHAP) to calculate the contribution of each state and can be performed, for example, by the high-latency analysis module 320. Visualization 400 shows the contribution analysis for a data persistence operation whose latency exceeded a threshold (30 seconds in this non-limiting example), listing the contribution values of the features with the highest contributions (i.e., a set of system states in this example) and the total contribution of the other features. As shown by reference numeral 410, in this non-limiting example, the latency is close to 40 seconds, close to its estimated latency f(x). As shown by reference numeral 420, for this operation, computing device 110 can infer that the state named "slow_drive" made the largest contribution (+24.946.54 seconds) to the latency of this sample. In this example, the value of this feature represents an excessive number of remapped sectors (RAS), which indicates media wear.
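To make the idea of a per-state contribution concrete, the sketch below computes exact Shapley values by brute force for a toy linear predictor. It is not the SHAP library and not the analysis behind visualization 400: the predictor, the state names, the baseline values, and the function name `shapley_values` are assumptions, and the enumeration is exponential in the number of states, so it only suits very small examples.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance: dict, baseline: dict) -> dict:
    """Exact Shapley contribution of each state to predict(instance) - predict(baseline)."""
    names = list(instance)
    n = len(names)

    def value(coalition) -> float:
        # States in the coalition take their observed value; the rest take the baseline.
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    contributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = frozenset(subset)
                total += weight * (value(members | {name}) - value(members))
        contributions[name] = total
    return contributions


def toy_latency_model(s: dict) -> float:
    # Hypothetical linear predictor of latency in seconds.
    return 1.0 + 0.25 * s["reallocated_sectors"] + 0.125 * s["writeback_interval_s"] + 0.5 * s["fsync_calls_per_s"]


observed = {"reallocated_sectors": 100.0, "writeback_interval_s": 30.0, "fsync_calls_per_s": 8.0}
typical = {"reallocated_sectors": 0.0, "writeback_interval_s": 2.0, "fsync_calls_per_s": 2.0}

contrib = shapley_values(toy_latency_model, observed, typical)
top_state = max(contrib, key=contrib.get)
print(contrib, top_state)
```

The contributions are additive: they sum to the gap between the estimate for the observed states and the estimate for the baseline, which is what allows the largest term to be read as the main factor.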
[0037] Returning now to Figure 3, in some embodiments, the high latency analysis module 320 may also determine the order of the one or more states based on the degree to which they affect the functionality of the system 120. In some embodiments, the states in a set of states may belong to one of the following types: hardware status, I/O stack configuration, or workload pattern. These state types indicate different domains of the system 120, and changing states that belong to different domains affects the functionality of the system 120 (e.g., the performance of user workloads) to different degrees.
[0038] In some embodiments, the high latency analysis module 320 can sort the identified one or more system states based on this impact. For example, after identifying one or more states that increase the total latency of a data persistence operation the most, the high latency analysis module 320 can place hardware states, which have the least impact on workload performance, first, followed by I/O stack configuration states, then workload patterns, and so on. In some embodiments, the high latency analysis module 320 can also determine the ordering based on both the degree of influence of the state on the latency (e.g., the contribution described above) and the degree of influence of the state type on functionality. For example, the high latency analysis module 320 can group the identified states by type, order the groups by the magnitude of each type's influence on functionality, and then sort the states within each group by contribution to determine the final ordering. As another example, the high latency analysis module 320 can weight the contribution of each state based on the degree of influence of its type on functionality, and sort the one or more states by weighted contribution.
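The two ordering strategies described above can be sketched with ordinary sort keys. The type priorities, the weights, the state names, and the contribution values below are hypothetical; only the ordering of types (hardware first, then I/O stack configuration, then workload) follows the text.

```python
from dataclasses import dataclass

# Lower number = less impact on user workloads = handled first (order taken from the text).
TYPE_ORDER = {"hardware": 0, "io_stack": 1, "workload": 2}
# Hypothetical weights: a type whose change disturbs workloads more gets a smaller weight.
TYPE_WEIGHT = {"hardware": 1.0, "io_stack": 0.75, "workload": 0.25}


@dataclass(frozen=True)
class IdentifiedState:
    name: str
    state_type: str
    contribution_s: float  # contribution to the estimated latency, in seconds


def rank_by_group(states):
    """Group by type in TYPE_ORDER, then sort by contribution inside each group."""
    return sorted(states, key=lambda s: (TYPE_ORDER[s.state_type], -s.contribution_s))


def rank_by_weighted_contribution(states):
    """Sort by contribution scaled by the weight of the state's type."""
    return sorted(states, key=lambda s: TYPE_WEIGHT[s.state_type] * s.contribution_s, reverse=True)


states = [
    IdentifiedState("app_write_throughput", "workload", 9.0),
    IdentifiedState("writeback_policy", "io_stack", 6.0),
    IdentifiedState("io_error_rate", "hardware", 3.0),
    IdentifiedState("slow_drive", "hardware", 24.0),
]

grouped = [s.name for s in rank_by_group(states)]
weighted = [s.name for s in rank_by_weighted_contribution(states)]
print(grouped)
print(weighted)
```

The grouped ordering never lets a workload state outrank a hardware state, while the weighted ordering allows a large enough contribution to override the type preference.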
[0039] The computing device 110 can also use the reporting module 330 to generate reports on high-latency data persistence operations. In some embodiments, the reporting module 330 can generate a first report that includes indications of one or more states identified by the high-latency analysis module 320 to report to the user the main factors causing the high latency. Thus, the user can understand the cause of the latency in a timely manner when it occurs.
[0040] In some embodiments, the reporting module 330 may also generate a second report based on a set of metrics of the one or more states at the time the high-latency operation was analyzed. This second report indicates recommended actions to the user for reducing latency. For example, based on the analysis results shown in visualization 400, the reporting module 330 may further generate a recommended action to repair or replace the drive.
[0041] In some embodiments, in addition to a set of metrics, the reporting module 330 may also generate a third report based on the aforementioned functional impact ranking to indicate recommended actions to the user for reducing latency. In some such embodiments, the reporting module 330 may prioritize generating recommended actions for higher-ranked states (e.g., states with the least impact on workload performance). For example, if the high latency analysis module 320 identifies the disk's I/O error rate and a particular user application as the main factors causing a certain high latency, the reporting module 330 may prioritize recommending repairing or replacing the drive.
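A report generator along these lines can be sketched as a lookup from ranked states to actions. The action table, the state names, and the function names are hypothetical and serve only to illustrate how a ranking could drive the order of recommendations.

```python
# Hypothetical mapping from a state name to a remedial action; not from this disclosure.
RECOMMENDED_ACTIONS = {
    "slow_drive": "Repair or replace the drive.",
    "io_error_rate": "Repair or replace the drive.",
    "writeback_policy": "Review the file system write-back policy.",
    "app_write_throughput": "Throttle or reschedule the application's writes.",
}


def first_report(ranked_states):
    """List the identified states, highest-ranked first."""
    header = ["Main factors behind the high latency:"]
    return header + [f"{i}. {name}" for i, name in enumerate(ranked_states, start=1)]


def action_report(ranked_states):
    """Recommend one action per state in rank order, skipping repeated actions."""
    actions = []
    for name in ranked_states:
        action = RECOMMENDED_ACTIONS.get(name, f"Investigate state '{name}'.")
        if action not in actions:
            actions.append(action)
    return actions


# The hardware state is ranked ahead of the workload state, as in the example above.
ranked = ["io_error_rate", "app_write_throughput"]
print("\n".join(first_report(ranked)))
print("\n".join(action_report(ranked)))
```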
[0042] As described above, when used in conjunction with method 200, architecture 300 can be used to identify the main factors causing high latency in data persistence operations in the system and provide instructions and suggestions to the user accordingly, so that the problem causing the high latency can be handled in a timely and accurate manner, and in some embodiments, in a manner that has a minimal impact on the user's workload.
[0043] Figure 5 shows a schematic block diagram of a device 500 that can be used to implement embodiments of the present disclosure. Device 500 may be the device or apparatus described in the embodiments of the present disclosure. As shown in Figure 5, device 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to computer program instructions stored in read-only memory (ROM) 502 or loaded from storage unit 508 into random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of device 500. The CPU 501, ROM 502, and RAM 503 are interconnected via bus 504. Input/output (I/O) interface 505 is also connected to bus 504. Although not shown in Figure 5, device 500 may also include a coprocessor.
[0044] Multiple components in device 500 are connected to I/O interface 505, including: input unit 506, such as a keyboard or mouse; output unit 507, such as various types of monitors and speakers; storage unit 508, such as a magnetic disk or optical disk; and communication unit 509, such as a network card, modem, or wireless transceiver. Communication unit 509 allows device 500 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
[0045] The various methods or processes described above can be executed by CPU 501. For example, in some embodiments, the methods can be implemented as computer software programs tangibly contained in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program can be loaded and/or installed on device 500 via ROM 502 and/or communication unit 509. When the computer program is loaded into RAM 503 and executed by CPU 501, one or more steps or actions in the methods or processes described above can be performed.
[0046] In some embodiments, the methods and processes described above can be implemented as a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for performing various aspects of this disclosure.
[0047] Computer-readable storage media can be tangible devices capable of holding and storing instructions for use by an instruction execution device. Computer-readable storage media can be, for example, but not limited to, electrical storage devices, magnetic storage devices, optical storage devices, electromagnetic storage devices, semiconductor storage devices, or any suitable combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination thereof. The computer-readable storage media used herein are not to be construed as transient signals themselves, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.
[0048] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing/processing devices, or downloaded via a network, such as the Internet, a local area network (LAN), a wide area network (WAN), and/or a wireless network, to an external computer or external storage device. The network may include copper cables, fiber optic cables, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing/processing device.
[0049] Computer program instructions used to perform the operations of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages and conventional procedural programming languages. The computer-readable program instructions may execute entirely on a user's computer, partially on a user's computer, as a standalone software package, partially on a user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing the state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement various aspects of this disclosure.
[0050] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0051] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to perform the functions / actions specified in one or more boxes of a flowchart and / or block diagram.
[0052] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of devices, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0053] Various embodiments of this disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and the disclosure is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies found in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.
Claims
1. A method for delay processing executed by a processing device, comprising: in response to a data persistence operation occurring in a system, obtaining a record for the data persistence operation, wherein the record includes an actual delay of the data persistence operation and a set of metrics of a set of states of the system during a predetermined period in which the data persistence operation occurs; in response to the actual delay being greater than a first threshold, generating an estimated delay for the data persistence operation based on the set of metrics and using a trained predictor; determining a difference between the actual delay and the estimated delay; in response to the difference being less than a second threshold, identifying one or more states from the set of states based on the record and the estimated delay; and determining an order of the one or more states based on a degree of influence of the one or more states on the functionality of the system.
2. The method according to claim 1, wherein each state in the set of states belongs to one of the following types: hardware status, I/O stack configuration, or workload mode.
3. The method according to claim 1, further comprising: generating a first report, wherein the first report includes an indication of the one or more states.
4. The method according to claim 1, further comprising: generating a second report based on a set of metrics of the one or more states at the time the data persistence operation occurs, wherein the second report includes indications of recommended actions to reduce latency.
5. The method according to claim 1, further comprising: generating a second report based on the order and on a set of metrics of the one or more states at the time the data persistence operation occurs, wherein the second report includes indications of recommended actions to reduce latency.
6. The method according to claim 1, wherein identifying the one or more states comprises: determining contributions of the set of states to the estimated delay when the predictor is used to generate the estimated delay; and determining the one or more states based on the contributions.
7. The method according to claim 1, further comprising: in response to the difference being greater than the second threshold, adjusting the predictor using the record.
8. An electronic device, comprising: a processor; and a memory coupled to the processor, the memory having instructions stored therein, the instructions, when executed by the processor, causing the device to perform actions including: in response to a data persistence operation occurring in a system, obtaining a record for the data persistence operation, wherein the record includes an actual delay of the data persistence operation and a set of metrics of a set of states of the system during a predetermined period in which the data persistence operation occurs; in response to the actual delay being greater than a first threshold, generating an estimated delay for the data persistence operation based on the set of metrics and using a trained predictor; determining a difference between the actual delay and the estimated delay; in response to the difference being less than a second threshold, identifying, from the set of states, one or more states contributing to the estimated delay based on the record and the estimated delay; and generating a first report based on a set of metrics of the one or more states at the time the data persistence operation occurs, wherein the first report includes indications of suggested actions for reducing latency associated with the data persistence operation.
9. The device according to claim 8, wherein the actions further include: determining an order of the one or more states based on a degree of influence of the one or more states on the functionality of the system.
10. The device according to claim 9, wherein each state in the set of states belongs to one of the following types: hardware status, I/O stack configuration, or workload mode.
11. The device according to claim 8, wherein the actions further include: generating a second report, wherein the second report includes indications of the one or more states.
12. The device according to claim 9, wherein the actions further include: generating a second report based on the order and on a set of metrics of the one or more states at the time the data persistence operation occurs, wherein the second report includes indications of recommended actions to reduce latency.
13. The device according to claim 8, wherein identifying the one or more states comprises: determining contributions of the set of states to the estimated delay when the predictor is used to generate the estimated delay; and determining the one or more states based on the contributions.
14. The device according to claim 8, wherein the actions further include: in response to the difference being greater than the second threshold, adjusting the predictor using the record.
15. A computer program product tangibly stored on a computer-readable medium and comprising machine-executable instructions that, when executed, cause a machine to perform the method according to any one of claims 1 to 7.
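By way of illustration only, the flow recited in claims 1, 6, and 7 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not specify the predictor, so a linear model stands in for the trained predictor, with each state's contribution taken as its weight times its metric and the adjustment performed by a normalized least-mean-squares step. The names `Record`, `LinearPredictor`, `process_record`, `impact_rank`, and `top_k`, and all metric names, are hypothetical. The case where the difference exactly equals the second threshold is not addressed by the claims and is treated here as an adjustment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    actual_delay: float         # measured delay of the persistence operation
    metrics: Dict[str, float]   # one metric per system state in the period


class LinearPredictor:
    """Assumed stand-in for the trained predictor: bias + sum(w_i * x_i)."""

    def __init__(self, weights: Dict[str, float], bias: float = 0.0):
        self.weights = dict(weights)
        self.bias = bias

    def contributions(self, metrics: Dict[str, float]) -> Dict[str, float]:
        # Contribution of each state to the estimate (claim 6), assumed
        # here to be weight * metric.
        return {k: self.weights.get(k, 0.0) * v for k, v in metrics.items()}

    def predict(self, metrics: Dict[str, float]) -> float:
        return self.bias + sum(self.contributions(metrics).values())

    def adjust(self, record: Record, mu: float = 0.5) -> None:
        # Normalized LMS step (an assumption): shrinks the error on this
        # record by a factor of (1 - mu).
        error = self.predict(record.metrics) - record.actual_delay
        norm = 1.0 + sum(v * v for v in record.metrics.values())
        for k, v in record.metrics.items():
            self.weights[k] = self.weights.get(k, 0.0) - mu * error * v / norm
        self.bias -= mu * error / norm


def process_record(record: Record, predictor: LinearPredictor,
                   first_threshold: float, second_threshold: float,
                   impact_rank: Dict[str, float],
                   top_k: int = 2) -> Optional[List[str]]:
    """Return the ordered states, [] if the predictor was adjusted instead,
    or None if the operation was not a high-delay event."""
    if record.actual_delay <= first_threshold:
        return None
    estimated = predictor.predict(record.metrics)
    difference = abs(record.actual_delay - estimated)
    if difference < second_threshold:
        contrib = predictor.contributions(record.metrics)
        top = sorted(contrib, key=contrib.get, reverse=True)[:top_k]
        # Order by degree of influence on system functionality (claim 1);
        # impact_rank is a hypothetical, externally supplied score.
        return sorted(top, key=lambda s: impact_rank.get(s, 0.0), reverse=True)
    predictor.adjust(record)  # claim 7: estimate was too far off
    return []
```

In this sketch, a record whose estimate lies close to the measured delay yields the highest-contributing states, reordered by the supplied influence scores, while a record whose estimate is far off only updates the predictor.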