Process state analysis method and device, electronic equipment, storage medium and product

By using eBPF technology to collect kernel monitoring data in the Linux system, constructing process event sequences and analyzing the causes, the problem of traditional tools being unable to automatically locate the D state of processes is solved, and fast and automated blocking cause analysis is achieved.

CN122240384A · Pending · Publication Date: 2026-06-19 · UNIONTECH SOFTWARE TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: UNIONTECH SOFTWARE TECH CO LTD
Filing Date: 2026-05-20
Publication Date: 2026-06-19

Application Information

IPC: G06F11/07; G06F11/30; G06F11/32; G06F18/22; G06F123/02

Similar Technology Patents

A method, apparatus, device, and storage medium for detecting abnormal states of a process.
CN110825593B
Process abnormal state detection method and device, equipment and storage medium
CN110825593A
Jitter detection method, device and equipment for key task process in operating system
CN118132316A
Method and device for exiting an uninterruptible sleep state process
CN116991559B
Uninterruptible sleep state process exit method and device
CN116991559A

AI Technical Summary

⚠Technical Problem

In modern Linux systems, traditional tools lack automation capabilities and struggle to quickly pinpoint the cause of blocking when a process enters an uninterruptible waiting state (D state), leading to service response delays and abnormal system load.

⚗Method used

By installing probe points in the kernel using extended Berkeley Packet Filter (eBPF) technology, system call, I/O request, and kernel function information are collected to construct process event sequences and analyze the reasons why processes are in an uninterruptible waiting state for a long time based on rules.

🎯Benefits of technology

It achieves automated location of process D state, reduces the cost of manual investigation, and can quickly build blocking chains and locate the cause of blocking.

✦ Generated by Eureka AI based on patent content.

Abstract

This disclosure provides a process state analysis method, apparatus, electronic device, storage medium, and product. The method includes: when the cumulative waiting time of a target process indicated by a first process identifier in a first storage table exceeds a first time threshold, or the duration of the target process in an uninterruptible waiting state exceeds a second time threshold, obtaining summary information of the target process from the first storage table and input / output request information of the target process from a second storage table; constructing an event sequence of the target process based on the obtained summary information and input / output request information; matching multiple pre-set rules based on the constructed event sequence; and analyzing the reasons why the target process is in an uninterruptible waiting state for a long time based on the matching results of the multiple rules. According to this method, the location of uninterruptible waiting states can be automated, reducing the cost of manual investigation.

Description

Technical Field

[0001] This disclosure relates to the field of system operation and maintenance and kernel diagnostics, specifically to a process state analysis method, device, electronic device, storage medium, and product.

Background Technology

[0002] In modern Linux systems, many services rely on disk, network storage, or distributed file systems to read and write data. When underlying resources malfunction (e.g., disk failure, network storage congestion, or driver failure), processes may enter an uninterruptible waiting state (also called the uninterruptible sleep state or D state). Processes in this state cannot be interrupted by signals, and even `kill -9` cannot terminate them. This type of problem typically leads to service response delays, request backlogs, and abnormal system load. Therefore, quickly locating the cause of blocking in D-state processes is a crucial issue in system maintenance and kernel diagnostics. However, traditional troubleshooting tools (e.g., ps, top, iostat, and dmesg) provide scattered information that requires manual analysis, and they lack automated diagnostic capabilities.

Summary of the Invention

[0003] Embodiments of this disclosure relate to a process state analysis method, apparatus, electronic device, storage medium, and product that can solve some or all of the problems described above.

[0004] According to a first aspect of this disclosure, a process state analysis method is provided, comprising: determining that the target process has been in an uninterruptible waiting state for a long time when the sum of the waiting times of a target process indicated by a first process identifier in a first storage table exceeds a first time threshold, or the duration of the target process in an uninterruptible waiting state exceeds a second time threshold; obtaining summary information of the target process from the first storage table and input / output request information of the target process from a second storage table; constructing an event sequence of the target process according to the obtained summary information and the input / output request information in the chronological order of events occurring in the process; matching multiple pre-set rules based on the constructed event sequence; and analyzing the reasons why the target process has been in an uninterruptible waiting state for a long time based on the matching results of the multiple rules.

[0005] Optionally, the first storage table and the second storage table are extended Berkeley Packet Filter (eBPF) key-value tables, and the information in the first storage table and the second storage table is exported through a ring buffer.

[0006] Optionally, the first storage table stores summary information of the process indicated by the process identifier, wherein the summary information of the process indicated by the process identifier includes at least one of the time of the most recent system call, the type identifier of the most recent system call, the identifier of the most recently executed kernel function, the hash value of the target resource of the most recent operation, and the sum of wait times. The second storage table stores records of incomplete input / output requests including process identifiers and timestamps, wherein the timestamp records the time when the process indicated by the process identifier initiated the input / output request.

[0007] Optionally, constructing the event sequence of the target process based on the acquired summary information and the input/output request information, in the chronological order of events occurring in the process, includes: constructing the event sequence of the target process in the order of a system call initiation event, an input/output initiation event, a scheduling wait point entry event, and a current time event.

[0008] Optionally, the time and type identifier of the most recent system call included in the summary information are used, respectively, as the time and type identifier of the system call included in the system call initiation event; the identifier of the most recently executed kernel function included in the summary information is used as the identifier of the kernel function included in the scheduling wait point entry event; the time point at which acquisition of the summary information and the input/output request information begins is used as the current time included in the current time event; and when the input/output request information includes a record of an input/output request of the target process, the input/output initiation event includes that record, and the current time event additionally includes an indication that the completion of the input/output request is missing.
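As an illustration only, the mapping described in [0007] and [0008] from table contents to an ordered event sequence might be sketched as follows. The names `Summary`, `IoRecord`, `Event`, and `build_event_sequence`, all field names, and the choice to omit the input/output initiation event when no record exists are assumptions made for this sketch and are not taken from this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Summary:
    """Per-process summary from the first storage table (hypothetical fields)."""
    last_syscall_time: float
    last_syscall_type: str
    last_kernel_func: str


@dataclass
class IoRecord:
    """Incomplete I/O request from the second storage table (hypothetical fields)."""
    pid: int
    timestamp: float
    kind: str  # "block" or "nfs"


@dataclass
class Event:
    name: str
    data: dict = field(default_factory=dict)


def build_event_sequence(summary: Summary,
                         io_record: Optional[IoRecord],
                         now: float) -> List[Event]:
    """Order events as: syscall -> I/O initiation (if any) -> wait point -> current time."""
    events = [Event("syscall_initiation", {
        "time": summary.last_syscall_time,
        "type": summary.last_syscall_type,
    })]
    if io_record is not None:
        events.append(Event("io_initiation", {"record": io_record}))
    events.append(Event("sched_wait_entry",
                        {"kernel_func": summary.last_kernel_func}))
    current = {"time": now}
    if io_record is not None:
        # A record still present in the table means no completion was seen.
        current["io_completion_missing"] = True
    events.append(Event("current_time", current))
    return events
```

Here `now` stands for the time point at which acquisition of the table contents begins, as stated in [0008].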

[0009] Optionally, based on the matching results of the multiple rules, the reasons why the target process is in an uninterruptible waiting state for a long time are analyzed, including: in response to the matching result matching one of the multiple rules, determining that the reason is the reason pointed to by the one rule; and in response to the matching result matching at least two of the multiple rules, determining that the reason is the reason pointed to by the rule whose confidence level meets the first condition, wherein each of the multiple rules is set with a confidence level.
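The selection logic in [0009] can be sketched as below. The disclosure does not state what the "first condition" is; this sketch assumes it means "highest confidence", and the function name `select_cause` and the `(cause, confidence)` tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple


def select_cause(matched: List[Tuple[str, float]]) -> Optional[str]:
    """Pick a cause from the rules that matched.

    `matched` holds one (cause, confidence) pair per matched rule.
    """
    if not matched:
        return None  # no rule matched; a fallback analysis would be needed
    if len(matched) == 1:
        return matched[0][0]
    # Assumption: the "first condition" is "highest confidence".
    return max(matched, key=lambda pair: pair[1])[0]
```

Returning `None` when nothing matches leaves room for a separate fallback step instead of forcing a guess.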

[0010] Optionally, the multiple rules include at least one of the following: Rule 1: if the system call type identifier included in the system call initiation event indicates one of read, write, and file synchronization, the input/output initiation event includes a record of a block input/output request, and the time elapsed from the timestamp in that record to the current time included in the current time event exceeds a third time threshold, the rule points to the cause that the block device is not responding; Rule 2: if the system call type identifier included in the system call initiation event indicates read or write, the input/output initiation event includes a record of a network file system input/output request, and the current time event includes a missing network file system input/output request completion, the rule points to the cause that the network file system server is not responding; and Rule 3: if the system call type identifier included in the system call initiation event indicates a fast user space mutex (futex), and the kernel function identifier included in the scheduling wait point entry event indicates entering the fast user space mutex wait queue and being suspended, the rule points to the cause that a lock has not been released for too long. Here, when the record of the target process's input/output requests includes a record of a block input/output request and/or a record of a network file system input/output request of the target process, the input/output initiation event includes the record of the block input/output request and/or the record of the network file system input/output request, and the current time event includes the missing block input/output request completion and/or the missing network file system input/output request completion.
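A minimal sketch of how the three rules might be expressed as predicates over a flattened event sequence follows. The dictionary keys, the syscall labels, and the kernel function label in `FUTEX_WAIT_FUNCS` are illustrative assumptions; the disclosure does not give the concrete identifiers it uses.

```python
from typing import List

# Illustrative labels only; the identifiers used by the disclosure are not given.
RULE1_SYSCALLS = {"read", "write", "fsync"}
RULE2_SYSCALLS = {"read", "write"}
FUTEX_WAIT_FUNCS = {"futex_wait_queue"}  # assumed label for "enter futex wait queue and suspend"


def match_rules(ev: dict, third_threshold: float) -> List[str]:
    """Return the causes pointed to by every rule that matches `ev`.

    `ev` uses hypothetical keys: syscall_type, wait_func, now, and optionally
    io_kind ("block" or "nfs"), io_timestamp, io_completion_missing.
    """
    causes = []
    io_kind = ev.get("io_kind")
    # Rule 1: file I/O syscall + pending block request older than the threshold.
    if (ev["syscall_type"] in RULE1_SYSCALLS and io_kind == "block"
            and ev["now"] - ev["io_timestamp"] > third_threshold):
        causes.append("block device not responding")
    # Rule 2: read/write syscall + pending NFS request with no completion seen.
    if (ev["syscall_type"] in RULE2_SYSCALLS and io_kind == "nfs"
            and ev.get("io_completion_missing", False)):
        causes.append("network file system server not responding")
    # Rule 3: futex syscall + process parked in the futex wait queue.
    if ev["syscall_type"] == "futex" and ev.get("wait_func") in FUTEX_WAIT_FUNCS:
        causes.append("lock not released for too long")
    return causes
```

Returning a list rather than a single cause keeps the case where several rules match visible to the caller, which is where the per-rule confidence levels come into play.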

[0011] Optionally, analyzing the reasons why the target process is in an uninterruptible waiting state for a long time based on the matching results of the multiple rules further includes: in response to the matching result being that no rule is matched, obtaining, for each candidate reason among the candidate reasons, at least one score with respect to at least one set criterion; obtaining a weighted score for each candidate reason based on at least one weight set for the at least one criterion and the at least one score of that candidate reason; and, based on the obtained weighted score of each candidate reason, determining the reason to be the candidate reason whose weighted score satisfies a second condition.

[0012] Optionally, the candidate reasons include at least one of the following: block device not responding, network file system server not responding, and lock not being released for too long. The at least one criterion includes at least one of the following: Criterion 1: the historical association frequency between the system call type indicated by the system call type identifier included in the system call initiation event and the candidate reasons; Criterion 2: the call stack distance between the kernel function indicated by the kernel function identifier included in the scheduling wait point entry event and multiple blocking kernel functions related to the candidate reasons, wherein the multiple blocking kernel functions include at least one of an input/output call, a network file system read initiation, a network file system write initiation, and entering a fast user space mutex wait queue and being suspended; and Criterion 3: the time elapsed from the timestamp in the input/output request record included in the input/output initiation event to the current time included in the current time event.
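The fallback in [0011] and [0012] amounts to a weighted sum per candidate reason. The sketch below assumes that every per-criterion score has already been normalized so that a higher value means "more likely" (for example, a short call stack distance maps to a high score), and that the "second condition" means "highest weighted score". The criterion keys and function names are hypothetical.

```python
from typing import Dict


def weighted_scores(scores: Dict[str, Dict[str, float]],
                    weights: Dict[str, float]) -> Dict[str, float]:
    """Combine per-criterion scores into one weighted score per candidate reason."""
    return {
        cause: sum(weights[criterion] * value
                   for criterion, value in per_criterion.items())
        for cause, per_criterion in scores.items()
    }


def pick_cause(scores: Dict[str, Dict[str, float]],
               weights: Dict[str, float]) -> str:
    """Assumption: the "second condition" is "highest weighted score"."""
    totals = weighted_scores(scores, weights)
    return max(totals, key=totals.get)


# Example inputs with made-up numbers, for illustration only.
example_weights = {"history": 0.5, "stack_distance": 0.3, "wait_time": 0.2}
example_scores = {
    "block device not responding":
        {"history": 0.6, "stack_distance": 0.9, "wait_time": 0.8},
    "network file system server not responding":
        {"history": 0.3, "stack_distance": 0.2, "wait_time": 0.8},
    "lock not released for too long":
        {"history": 0.1, "stack_distance": 0.1, "wait_time": 0.0},
}
```

With these made-up inputs, `pick_cause(example_scores, example_weights)` selects the block device candidate, because it scores highest on all three criteria.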

[0013] Optionally, the process state analysis method further includes: periodically or in an event-driven manner checking the sum of waiting times of the process indicated by the process identifier in the first storage table, and the duration of the process in an uninterruptible waiting state.
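One pass of the check in [0013], using the two thresholds from the first aspect, could look like the sketch below. The table layout is an assumption: `wait_total` stands for the sum of waiting times, and `d_since` is a hypothetical field holding the time at which the process entered the uninterruptible waiting state (or `None` if it is not in that state).

```python
from typing import Dict, List, Optional


def find_long_d_state(table: Dict[int, Dict[str, Optional[float]]],
                      now: float,
                      first_threshold: float,
                      second_threshold: float) -> List[int]:
    """Return PIDs whose total wait or current D-state duration exceeds a threshold."""
    hits = []
    for pid, entry in table.items():
        d_since = entry.get("d_since")
        d_duration = (now - d_since) if d_since is not None else 0.0
        if entry["wait_total"] > first_threshold or d_duration > second_threshold:
            hits.append(pid)
    return sorted(hits)
```

A periodic variant would call this function on a timer, while an event-driven variant would call it whenever a table entry changes.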

[0014] Optionally, the process state analysis method further includes: using an extended Berkeley packet filter acquisition module to collect kernel monitoring data by placing multiple probe points in the kernel; and using the collected kernel monitoring data to update a first storage table and a second storage table, wherein the collected kernel monitoring data is recorded according to an event-driven and / or predetermined cycle.

[0015] Optionally, updating the first and second storage tables using collected kernel monitoring data includes: when the process indicated by the process identifier initiates a system call, updating the summary information of the process indicated by the process identifier in the first storage table, including the time and type identifier of the most recent system call, using the acquired system call time and system call type identifier; when the process indicated by the process identifier initiates an input / output request, recording the input / output request information, including resource information, process identifier, and timestamp, in the second storage table as an input / output request record, wherein the input / output request includes a block input / output request or a network file system input / output request; and deleting the input / output request record from the second storage table when the input / output is completed.
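The three update paths in [0015] can be modeled in user-space Python as follows. This is only a model of the bookkeeping, not eBPF code; the class and method names are hypothetical, and keying the second table by a resource identifier follows the detailed description of issue_map rather than this paragraph.

```python
from typing import Dict, Hashable, List


class Tables:
    """In-memory stand-in for the two storage tables (hypothetical layout)."""

    def __init__(self) -> None:
        self.summary: Dict[int, dict] = {}          # first table: pid -> summary
        self.pending_io: Dict[Hashable, dict] = {}  # second table: resource -> record

    def on_syscall(self, pid: int, syscall_type: str, t: float) -> None:
        """A process initiated a system call: refresh its summary entry."""
        entry = self.summary.setdefault(pid, {})
        entry["last_syscall_time"] = t
        entry["last_syscall_type"] = syscall_type

    def on_io_issue(self, pid: int, resource: Hashable, kind: str, t: float) -> None:
        """A block or NFS I/O request was issued: record it as incomplete."""
        self.pending_io[resource] = {
            "pid": pid, "resource": resource, "kind": kind, "timestamp": t,
        }

    def on_io_complete(self, resource: Hashable) -> None:
        """The I/O finished: drop its record (no error if it is already gone)."""
        self.pending_io.pop(resource, None)

    def pending_for(self, pid: int) -> List[dict]:
        """All incomplete I/O records that belong to one process."""
        return [r for r in self.pending_io.values() if r["pid"] == pid]
```

Because completed requests are deleted, any record that is still present when analysis starts is by construction a request whose completion is missing.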

[0016] According to a second aspect of this disclosure, a process state analysis apparatus is provided, comprising: an acquisition module configured to: determine that the target process has been in an uninterruptible waiting state for a long time when the sum of the waiting times of the target process indicated by the first process identifier in a first storage table exceeds a first time threshold, or the duration of the target process in an uninterruptible waiting state exceeds a second time threshold; and acquire summary information of the target process from the first storage table and input/output request information of the target process from a second storage table; a construction module configured to construct an event sequence of the target process according to the acquired summary information and the input/output request information, in the chronological order of events occurring in the process; a matching module configured to match multiple pre-set rules based on the constructed event sequence; and an analysis module configured to analyze the reasons why the target process has been in an uninterruptible waiting state for a long time based on the matching results of the multiple rules.

[0017] Optionally, the first storage table and the second storage table are extended Berkeley Packet Filter (eBPF) key-value tables, and the information in the first storage table and the second storage table is exported through a ring buffer.

[0018] Optionally, the first storage table stores summary information of the process indicated by the process identifier, wherein the summary information of the process indicated by the process identifier includes at least one of the time of the most recent system call, the type identifier of the most recent system call, the identifier of the most recently executed kernel function, the hash value of the target resource of the most recent operation, and the sum of wait times. The second storage table stores records of incomplete input / output requests including process identifiers and timestamps, wherein the timestamp records the time when the process indicated by the process identifier initiated the input / output request.

[0019] Optionally, the construction module is configured to: construct the event sequence of the target process according to the acquired summary information and input/output request information, in the order of a system call initiation event, an input/output initiation event, a scheduling wait point entry event, and a current time event.

[0020] Optionally, the time and type identifier of the most recent system call included in the summary information are used, respectively, as the time and type identifier of the system call included in the system call initiation event; the identifier of the most recently executed kernel function included in the summary information is used as the identifier of the kernel function included in the scheduling wait point entry event; the time point at which acquisition of the summary information and the input/output request information begins is used as the current time included in the current time event; and when the input/output request information includes a record of an input/output request of the target process, the input/output initiation event includes that record, and the current time event additionally includes an indication that the completion of the input/output request is missing.

[0021] Optionally, the analysis module is configured to: determine the cause as the cause pointed to by the rule in response to the matching result being a match of one of the multiple rules; and determine the cause as the cause pointed to by the rule whose confidence level satisfies the first condition in response to the matching result being a match of at least two of the multiple rules, wherein each of the multiple rules is set with a confidence level.

[0022] Optionally, the multiple rules include at least one of the following: Rule 1: if the system call type identifier included in the system call initiation event indicates one of read, write, and file synchronization, the input/output initiation event includes a record of a block input/output request, and the time elapsed from the timestamp in that record to the current time included in the current time event exceeds a third time threshold, the rule points to the cause that the block device is not responding; Rule 2: if the system call type identifier included in the system call initiation event indicates read or write, the input/output initiation event includes a record of a network file system input/output request, and the current time event includes a missing network file system input/output request completion, the rule points to the cause that the network file system server is not responding; and Rule 3: if the system call type identifier included in the system call initiation event indicates a fast user space mutex (futex), and the kernel function identifier included in the scheduling wait point entry event indicates entering the fast user space mutex wait queue and being suspended, the rule points to the cause that a lock has not been released for too long. Here, when the record of the target process's input/output requests includes a record of a block input/output request and/or a record of a network file system input/output request of the target process, the input/output initiation event includes the record of the block input/output request and/or the record of the network file system input/output request, and the current time event includes the missing block input/output request completion and/or the missing network file system input/output request completion.

[0023] Optionally, the analysis module is further configured to: in response to the matching result being that no rule is matched, obtain, for each candidate reason among the candidate reasons, at least one score with respect to at least one set criterion; obtain a weighted score for each candidate reason based on at least one weight set for the at least one criterion and the at least one score of that candidate reason; and, based on the obtained weighted score of each candidate reason, determine the reason to be the candidate reason whose weighted score satisfies a second condition.

[0024] Optionally, the candidate reasons include at least one of the following: block device not responding, network file system server not responding, and lock not being released for too long. The at least one criterion includes at least one of the following: Criterion 1: the historical association frequency between the system call type indicated by the system call type identifier included in the system call initiation event and the candidate reasons; Criterion 2: the call stack distance between the kernel function indicated by the kernel function identifier included in the scheduling wait point entry event and multiple blocking kernel functions related to the candidate reasons, wherein the multiple blocking kernel functions include at least one of an input/output call, a network file system read initiation, a network file system write initiation, and entering a fast user space mutex wait queue and being suspended; and Criterion 3: the time elapsed from the timestamp in the input/output request record included in the input/output initiation event to the current time included in the current time event.

[0025] Optionally, the process state analysis apparatus further includes: an inspection module configured to periodically or in an event-driven manner inspect the sum of waiting times of the process indicated by the process identifier in the first storage table, and the duration of the process in an uninterruptible waiting state.

[0026] Optionally, the process state analysis apparatus further includes: an update module configured to collect kernel monitoring data by placing multiple probe points in the kernel using an extended Berkeley packet filter collection module; and to update a first storage table and a second storage table using the collected kernel monitoring data, wherein the collected kernel monitoring data is recorded according to an event-driven and / or predetermined period.

[0027] Optionally, the update module is configured to: when the process indicated by the process identifier initiates a system call, update the summary information of the process indicated by the process identifier in the first storage table, including the time and type identifier of the most recent system call, using the obtained system call time and system call type identifier; when the process indicated by the process identifier initiates an input / output request, record the input / output request information, including resource information, process identifier, and timestamp, in the second storage table as an input / output request record, wherein the input / output request includes a block input / output request or a network file system input / output request; and when the input / output is completed, delete the input / output request record from the second storage table.

[0028] According to a third aspect of this disclosure, an electronic device is provided, comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein, when executed by the at least one processor, the computer-executable instructions cause the at least one processor to perform the process state analysis method as described above.

[0029] According to a fourth aspect of this disclosure, a computer-readable storage medium is provided having a computer program stored thereon that, when executed, implements the process state analysis method as described above.

[0030] According to a fifth aspect of this disclosure, a computer program product is provided, wherein instructions in the computer program product are executed by at least one processor in an electronic device to perform the process state analysis method as described above.

[0031] The technical solution provided by the embodiments of this disclosure produces at least the following beneficial effects: by acquiring the process summary information and I / O request information, the event sequence of the process is constructed, and the reason why the process is in the D state for a long time is analyzed based on the process event sequence matching rules. The blocking chain can be automatically constructed and the blocking reason can be automatically located when the process is in the D state for a long time, thereby realizing the automation of D state location and reducing the cost of manual investigation.

[0032] The technical solutions provided by the embodiments of this disclosure can also produce the following beneficial effects: by associating system calls, initiating I / O requests, entering scheduling wait points and current events to rebuild the blocking chain, it is convenient to analyze the cause of blocking; process-level summary information is continuously maintained in kernel mode with low overhead to facilitate rapid location; and by adopting a strategy of deriving summary information and events (or problems), configurable probe points are supported to ensure controllability and event-driven and / or predetermined periodic recording is supported to ensure low overhead.

[0033] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Brief Description of the Drawings

[0034] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure, and are not intended to unduly limit this disclosure.

[0035] Figure 1 shows a system architecture diagram of a process state analysis method according to an embodiment of this disclosure.

[0036] Figure 2 shows a schematic diagram of the kernel acquisition process according to an embodiment of this disclosure.

[0037] Figure 3 shows a flowchart of a process state analysis method according to an embodiment of this disclosure.

[0038] Figure 4 shows a schematic diagram of a user-mode analysis process according to an embodiment of this disclosure.

[0039] Figure 5 shows a schematic diagram of a process state analysis apparatus according to an embodiment of this disclosure.

[0040] Figure 6 shows a block diagram of an electronic device according to an embodiment of this disclosure.

Detailed Implementation

[0041] To enable those skilled in the art to better understand the technical solutions of this disclosure, the technical solutions in the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings.

[0042] It should be noted that the terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this disclosure described herein can be implemented in orders other than those illustrated or described herein. The embodiments described in the following examples do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.

[0043] It should be noted that the phrase "at least one of several items" in this disclosure refers to three parallel cases: "any one of the several items", "a combination of any number of the several items", and "all of the several items". For example, "including at least one of A and B" includes the following three parallel cases: (1) including A; (2) including B; (3) including A and B. Another example is "performing at least one of step one and step two", which means the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing both step one and step two.

[0044] The uninterruptible sleep (D) state is a Linux process state indicating that the process is in an uninterruptible sleep, typically while waiting for input/output (I/O) resources. Processes in this state cannot be interrupted by signals, and even `kill -9` cannot terminate them. This type of problem usually leads to delayed service response, request backlog, and abnormal system load. In existing Linux systems, locating processes in the D state typically involves the following steps: 1. using `ps` to check the process state (`ps -eo pid,state,cmd | grep "D"`); 2. using `/proc/<pid>/stack` to view the process's kernel stack (`cat /proc/<pid>/stack`); 3. using `iostat` to check disk I/O (`iostat -x 1`); 4. using `dmesg` to check kernel logs (`dmesg | grep error`). With this information, engineers can infer that a process may be blocked in disk I/O, a network file system, kernel locks, and/or the driver layer. In addition, some tools (such as bcc and bpftrace) can use eBPF to trace system calls or I/O events, for example: `bpftrace -e 'tracepoint:block:block_rq_issue { printf("pid=%d\n", pid); }'`. This approach can collect system runtime events.
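For comparison, the first manual step above (finding processes in the D state) can be done programmatically by reading `/proc/<pid>/stat`, whose third field is the state letter. The sketch below is not part of the disclosed method; the function names are hypothetical. It parses the command name between the first `(` and the last `)`, because the name itself may contain spaces or parentheses, and it tests the state field exactly, whereas `grep "D"` also matches any command line that merely contains the letter D.

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    lpar = line.index("(")
    rpar = line.rindex(")")  # comm may itself contain ')'
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state


def list_d_state(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Scan numeric entries under proc_root and return those in state 'D'."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the entry is unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Such a scan only gives a snapshot of the current state, which is the limitation that motivates the continuous, kernel-side collection described in this disclosure.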

[0045] However, the above methods have the following problems: 1. The diagnostic process relies on human experience: engineers need to manually collect various types of information and perform comprehensive analysis, making the diagnostic process time-consuming and experience-dependent. 2. Lack of call chain context: traditional tools can only see the current call stack and cannot construct a complete call-wait relationship chain. 3. Inability to automatically identify blocked resources: even if an engineer can see that a process is waiting for I/O, it is difficult to determine the specific device, the specific request, and the blocking time. 4. Limited monitoring granularity: existing tools usually only provide instantaneous information and lack long-term runtime monitoring capabilities.

[0046] With the development of Linux's extended Berkeley Packet Filter (eBPF) technology, system calls, I/O requests, and kernel function execution can be dynamically collected during kernel runtime, providing a foundation for automated diagnostics. eBPF is a secure virtual machine mechanism running within the Linux kernel, capable of dynamically loading programs for event tracing and data collection without modifying kernel code. An eBPF key-value table (Map) is a key-value storage structure for sharing data between eBPF programs and user-space programs. Block I/O is the mechanism by which the Linux kernel's block device layer handles disk read/write requests. The process state analysis method, apparatus, electronic device, storage medium, and product provided by this disclosure are described below with reference to Figures 1 to 5.

[0047] Figure 1 shows a system architecture diagram of a process state analysis method according to an embodiment of the present disclosure.

[0048] Referring to Figure 1, the system architecture of the process state analysis method disclosed herein includes the following components (or modules): eBPF acquisition module (kernel mode) 110, kernel-mode key-value table (Map) / ring buffer (Ringbuffer) 120, user-mode analyzer 130, and output module / alarm / automatic processing 140.

[0049] eBPF Acquisition Module (Kernel Mode) 110: Installs several probes (or tracepoints) in the kernel. The locations of these probes are configurable. Probes can include syscall tracepoints, block I/O tracepoints, and kernel probes (kprobes). This module is responsible for collecting process system calls, kernel wait events, and I/O operation information. The collected information (e.g., kernel monitoring data) can be used to populate or update the HotMap and issue_map.

[0050] The kernel-mode Map / Ringbuffer 120 can be divided into two sub-parts: the kernel-mode Map and the RingBuffer. The kernel-mode Map uses eBPF Maps to maintain the following two types of data structures: - HotMap: Keyed by the process identifier (PID), stores summary information of a process, including its most recent system call, kernel wait point (e.g., the identifier of the most recently executed kernel function), target resource signature (e.g., the hash of the target resource of the most recent operation), and timestamp (e.g., the time of the most recent system call).

[0051] - issue_map: Records information about incomplete I / O requests (also known as I / O request information) with resource identifiers (e.g., device number + sector number) as keys.

[0052] The RingBuffer provides an efficient event transmission mechanism for sending event data (i.e., user-space event streams) from the kernel to user space. For example, information in HotMap and issue_map is exported to user space via RingBuffer for viewing by the user.

[0053] User-mode analyzer 130: Reads data from kernel-mode Map and / or RingBuffer, and when it detects that a process has been in the D state for a long time or the cumulative waiting time exceeds the threshold, it rebuilds the call-wait relationship chain (or blocking chain) and generates a location report.

[0054] Output Module / Alarm / Automated Processing 140: This module can output a location report, which may include the PID of the problematic process, the call-wait chain, and the reasons for prolonged periods in the D state. Additionally, this module can output alarms as needed, such as those based on Prometheus and / or Alertmanager, and perform automated processing (e.g., triggering scripts, logistics). Prometheus alarms are an automated monitoring and alarm system based on time-series data, triggering anomaly notifications through real-time analysis of indicator data. Alertmanager is the alarm event processing center within Prometheus, responsible for secondary processing and intelligent distribution of alarms triggered by Prometheus.

[0055] It should be understood that the system architecture and modules of the above process status analysis method are merely examples. Other modules may be added or removed from this system architecture (e.g., output module / alarm / automatic processing 140), and this disclosure is not limited thereto.

[0056] In the embodiments of this disclosure, kernel monitoring data is collected during kernel runtime, and then user-mode analysis is performed based on the kernel monitoring data, so that the cause can be automatically analyzed in the case of an abnormal situation in which a process is in the D state for a long time.

[0057] According to embodiments of this disclosure, an extended Berkeley packet filter acquisition module is used to collect kernel monitoring data by placing multiple probe points in the kernel; the collected kernel monitoring data is then used to update a first storage table and a second storage table.

[0058] In the embodiments disclosed herein, the eBPF acquisition module (kernel mode) 110 may install multiple eBPF probes (or tracepoints) in the kernel. These probes may include syscall tracepoints, block I/O tracepoints, and kprobes. Syscall tracepoints are a predefined set of probes in the Linux kernel, specifically designed to trace the entry (sys_enter_...) and exit (sys_exit_...) of system calls. Block I/O tracepoints are a predefined set of probe points in the Linux kernel's block I/O layer, used to trace the entire lifecycle of data during disk reads and writes. kprobes, on the other hand, are a dynamic tracing technique provided by the Linux kernel, allowing probe code to be dynamically inserted at the entry or exit point of almost any kernel function, or almost anywhere in the kernel instruction stream, for debugging, performance analysis, and tracing. Probe points are configurable. The events collected can include at least one of the following: system calls, I/O requests, and kernel wait points.

[0059] According to embodiments of this disclosure, the first storage table and the second storage table are extended Berkeley Packet Filter key-value tables, and the information in the first storage table and the second storage table is exported through a ring buffer.

[0060] According to embodiments of this disclosure, a first storage table stores summary information of a process indicated by a process identifier, wherein the summary information of the process indicated by the process identifier includes at least one of the time of the most recent system call, the type identifier of the most recent system call, the identifier of the most recently executed kernel function, the hash value of the target resource of the most recent operation, and the sum of wait times. A second storage table stores records of incomplete input / output requests including process identifiers and timestamps, wherein the timestamp records the time when the process indicated by the process identifier initiated the input / output request.

[0061] In embodiments of this disclosure, the first storage table may be, for example, but not limited to, a HotMap. The HotMap is used to record summary information (or recent behavior) of processes. The HotMap uses the PID as the key, and the stored information (i.e., summary information) may include at least one of the following: the time of the process's most recent system call indicated by the PID (last_syscall_ts), the type identifier of the most recent system call (last_syscall_id), the identifier of the most recently executed kernel function (last_kernel_fn_id), the hash value of the target resource of the most recent operation (last_target_hash), and the cumulative wait time (cumulative_wait_ms). For example, the summary information may include: last_syscall_ts, last_syscall_id, last_kernel_fn_id, last_target_hash, and cumulative_wait_ms. A Least Recently Used (LRU) / Time to Live (TTL) cleanup strategy can be used on the HotMap to periodically clean up PID entries that have not been updated for a long time, preventing the Map from bloating.
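As a rough illustration of the HotMap entry and the TTL part of the cleanup strategy described above, the following user-space Python sketch models the table as a plain dictionary keyed by PID. It is not eBPF code. The `updated_ns` field, the `evict_stale` helper, and the use of strings for identifiers are assumptions introduced only for this example; the other field names come from the text.

```python
from dataclasses import dataclass


@dataclass
class HotMapEntry:
    # Field names below follow the summary information listed in the text.
    last_syscall_ts: int = 0      # time of the most recent system call (ns)
    last_syscall_id: str = ""     # type identifier of the most recent system call
    last_kernel_fn_id: str = ""   # identifier of the most recently executed kernel function
    last_target_hash: int = 0     # hash of the target resource of the most recent operation
    cumulative_wait_ms: int = 0   # sum of blocked wait times
    updated_ns: int = 0           # hypothetical: last time any field was written


def evict_stale(hot_map, now_ns, ttl_ns):
    """TTL cleanup: remove PIDs whose entry was not updated within ttl_ns."""
    stale = [pid for pid, entry in hot_map.items() if now_ns - entry.updated_ns > ttl_ns]
    for pid in stale:
        del hot_map[pid]
    return stale


# Simulated HotMap keyed by PID.
hot_map = {
    100: HotMapEntry(last_syscall_id="read", updated_ns=1_000),
    200: HotMapEntry(last_syscall_id="write", updated_ns=9_000),
}
evicted = evict_stale(hot_map, now_ns=10_000, ttl_ns=5_000)
```

A real deployment would presumably need to keep refreshing the entry of a blocked process (for example whenever its cumulative wait time is updated) so that the cleanup does not discard the very processes under investigation.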

[0062] In embodiments of this disclosure, the second storage table may be, for example, but not limited to, an issue_map. The issue_map records incomplete I/O requests (hereinafter also referred to as I/O requests) using a resource identifier (resource_key) as the key. These I/O requests may include block I/O requests or Network File System (NFS) I/O requests. A record of an incomplete I/O request (hereinafter also referred to as an I/O request record or I/O request information) may include at least one of resource information, a PID, and a timestamp (ts_ns). The resource information of a record of an incomplete block I/O request (hereinafter also referred to as a block I/O request record or block I/O request information) may include a device number (dev) and a sector number (sector), and the resource information of a record of an incomplete NFS I/O request (hereinafter also referred to as an NFS I/O request record or NFS I/O request information) may include a network device file handle (fhandle) and an offset value (offset). For example, a data structure recorded in an issue_map may include: pid, ts_ns, dev, sector, fhandle, and offset. This data structure can record either a block I/O request or an NFS I/O request. The resource_key can be dev:sector or fhandle:offset, and additional fields can be added to distinguish between block I/O requests and NFS I/O requests, facilitating lookups based on the resource_key. Furthermore, if a process indicated by a PID has both block I/O and NFS I/O requests, two such data structures in the issue_map may be needed to record the block I/O requests and the NFS I/O requests respectively. Therefore, a process indicated by a PID may require one or more issue_maps (or one or more data structures of the issue_map). As another example, an issue_map can include both data structures recording block I/O requests and data structures recording NFS I/O requests. The data structure for recording block I/O requests includes pid, ts_ns, dev, and sector, where the resource_key can be dev:sector. The data structure for recording NFS I/O requests includes pid, ts_ns, fhandle, and offset, where the resource_key can be fhandle:offset.
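The single-structure variant described above might be modelled as follows. This is a minimal Python sketch, not the disclosed implementation: the `IssueRecord` class, the `kind` method, and the `block:` / `nfs:` key prefixes (standing in for the "additional fields" that distinguish the two request types) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IssueRecord:
    pid: int                        # process that initiated the request
    ts_ns: int                      # time the request was initiated
    dev: Optional[int] = None       # block I/O: device number
    sector: Optional[int] = None    # block I/O: sector number
    fhandle: Optional[str] = None   # NFS I/O: file handle
    offset: Optional[int] = None    # NFS I/O: offset

    def kind(self):
        # Hypothetical discriminator: a record with a device number is block I/O.
        return "block" if self.dev is not None else "nfs"

    def resource_key(self):
        # dev:sector or fhandle:offset, prefixed so the two key spaces cannot collide.
        if self.kind() == "block":
            return f"block:{self.dev}:{self.sector}"
        return f"nfs:{self.fhandle}:{self.offset}"


# One process with both a block request and an NFS request needs two records.
issue_map = {}
for rec in (
    IssueRecord(pid=42, ts_ns=1, dev=8, sector=2048),
    IssueRecord(pid=42, ts_ns=2, fhandle="fh01", offset=4096),
):
    issue_map[rec.resource_key()] = rec
```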

[0063] This disclosure uses eBPF Maps (e.g., HotMap and issue_map) as examples for the first and second storage tables; however, this disclosure is not limited thereto. The first storage table can be any storage table storing summary information of the process (recent actions and / or recent status), and the second storage table can be any storage table storing records of incomplete I / O requests. For example, an LRU hash key-value table (LRU Hash Map) can also be used instead of an eBPF Map.

[0064] According to embodiments of this disclosure, updating the first and second storage tables using collected kernel monitoring data may include: when a process indicated by a process identifier initiates a system call, updating the summary information of the process indicated by the process identifier in the first storage table, including the time and type identifier of the most recent system call, using the acquired system call time and system call type identifier; when a process indicated by a process identifier initiates an input / output request, recording the input / output request information, including resource information, process identifier, and timestamp, in the second storage table as an input / output request record, wherein the input / output request includes a block input / output request or a network file system input / output request; and deleting the input / output request record from the second storage table when the input / output is completed.

[0065] Figure 2 shows a schematic diagram of the kernel acquisition process according to an embodiment of this disclosure.

[0066] Referring to Figure 2, the kernel data collection process is as follows. Install probes for data collection 210: The eBPF acquisition module (kernel mode) can install multiple eBPF probes in the kernel to collect kernel monitoring data. The probes may include syscall tracepoints, block I/O tracepoints, and/or kprobes.

[0067] Update eBPF Map 220: Update HotMap and issue_map using collected kernel monitoring data.

[0068] Exporting information via Ringbuffer 230: Exporting information from HotMap and issue_map via Ringbuffer into a user-space event stream for analysis in user space.

[0069] An example of kernel event collection may include the following steps. Step 1: At the system call entry point, update the type identifier (last_syscall_id) and time (last_syscall_ts) of the most recent system call for the corresponding PID in the HotMap. For example, at the write system call entry point (sys_enter_write), obtain the timestamp and the identifier of the system call type "write", and update the summary information of the process indicated by the PID in the HotMap; specifically, update the last_syscall_ts and last_syscall_id of the process indicated by the PID accordingly.

[0070] Step 2: At the block I / O initiation point (block_rq_issue), record the request information (PID, resource information, and timestamp) in issue_map. The PID is the identifier of the process that initiated the block I / O request, the resource information includes the device number (dev) and sector number (sector), and the timestamp (ts_ns) records the time when the process initiated the block I / O request.

[0071] Step 3: At the block I / O completion point (block_rq_complete), delete the corresponding request record (or request information) from the issue_map.

[0072] Step 4: For requests that have not been completed for a long time, the events can be exported to user space through RingBuffer for in-depth analysis.

[0073] Steps 2 and 3 use block I/O requests as an example. The I/O requests can also be NFS I/O requests. At the point where an NFS I/O request is initiated, the resource information recorded in the issue_map in step 2 includes the network device file handle (fhandle) and the offset. Furthermore, only incomplete I/O requests (including incomplete block I/O requests and/or incomplete NFS I/O requests) are recorded in the issue_map, while records of completed I/O requests are deleted from the issue_map. Therefore, the presence of an I/O request record in the issue_map means that the completion of the corresponding I/O request is missing.
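Steps 1 to 4 can be sketched as a small user-space simulation. The handlers below stand in for the eBPF programs attached at `sys_enter_write`, `block_rq_issue`, and `block_rq_complete`; the handler names, the dictionary layout, the list used as a ring buffer, and the `max_age_ns` parameter are all assumptions for illustration, not a kernel implementation.

```python
hot_map = {}      # pid -> summary information
issue_map = {}    # (dev, sector) -> incomplete block I/O request record
ring_buffer = []  # events exported to user space


def on_sys_enter(pid, syscall_id, ts_ns):
    # Step 1: refresh the most recent system call for this PID.
    entry = hot_map.setdefault(pid, {})
    entry["last_syscall_id"] = syscall_id
    entry["last_syscall_ts"] = ts_ns


def on_block_rq_issue(pid, dev, sector, ts_ns):
    # Step 2: record the request when it is issued.
    issue_map[(dev, sector)] = {"pid": pid, "ts_ns": ts_ns, "dev": dev, "sector": sector}


def on_block_rq_complete(dev, sector):
    # Step 3: a completed request is no longer of interest.
    issue_map.pop((dev, sector), None)


def export_stale(now_ns, max_age_ns):
    # Step 4: export requests that have been pending for too long.
    for rec in issue_map.values():
        if now_ns - rec["ts_ns"] > max_age_ns:
            ring_buffer.append(dict(rec))


on_sys_enter(42, "write", ts_ns=100)
on_block_rq_issue(42, dev=8, sector=2048, ts_ns=110)
on_block_rq_issue(42, dev=8, sector=4096, ts_ns=120)
on_block_rq_complete(dev=8, sector=4096)
export_stale(now_ns=10_000, max_age_ns=5_000)
```

After this run, only the request for sector 2048 remains in `issue_map`, which is exactly the "completion missing" condition the analyzer later relies on.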

[0074] In embodiments of this disclosure, the collected kernel monitoring data may further include the kernel functions executed by the process and the resources operated on by the process. This data is used to update, in the HotMap, the identifier of the most recently executed kernel function (last_kernel_fn_id) and the hash value of the most recently operated target resource (last_target_hash) of the process indicated by the PID. Additionally, the HotMap holds the cumulative wait time (cumulative_wait_ms) of the process indicated by the PID, which is the sum of the blocked wait times of the process collected over multiple collections.

[0075] According to embodiments of this disclosure, the collected kernel monitoring data is recorded according to event-driven and / or predetermined cycles.

[0076] In embodiments of this disclosure, the collected kernel monitoring data is recorded event-driven (e.g., recorded to HotMap and / or issue_map). For example, events can include a process initiating a system call, a process initiating an I / O request, etc., and data is recorded whenever the event occurs. The data can also be recorded periodically (e.g., recorded to HotMap and / or issue_map). Here, periodicity can also be characterized by a sampling rate. The collected kernel monitoring data can also be recorded both event-driven and at predetermined periods. For example, detailed information can be recorded when a process initiates an I / O request, while simultaneously recording at a sampling rate (e.g., 1%, meaning only 1 record out of 100 data collections) to reduce runtime overhead.
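The combination of event-driven recording and a 1% sampling rate might look like the sketch below. The `Recorder` class and its counter-based (deterministic) sampling are assumptions chosen to keep the example reproducible; the text does not say how the sampling decision is made.

```python
class Recorder:
    """Records every I/O-initiation event, plus 1 out of every sample_every other events."""

    def __init__(self, sample_every=100):
        self.sample_every = sample_every
        self.seen = 0
        self.records = []

    def on_event(self, event, is_io_issue=False):
        self.seen += 1
        # Event-driven path: I/O initiation is always recorded in detail.
        # Sampled path: every sample_every-th event is recorded.
        if is_io_issue or self.seen % self.sample_every == 0:
            self.records.append(event)
            return True
        return False


rec = Recorder(sample_every=100)
for n in range(1, 301):
    rec.on_event({"n": n})                    # ordinary events, sampled at 1%
rec.on_event({"n": 301}, is_io_issue=True)    # always recorded
recorded = [r["n"] for r in rec.records]
```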

[0077] In the embodiments disclosed herein, tracefs can be used to collect kernel monitoring data. Tracefs is a virtual file system provided by the kernel that exposes various interfaces for kernel tracing. By reading and writing these files, tracing can be controlled or data can be acquired.

[0078] Figure 3 shows a schematic diagram of a user-mode analysis process according to an embodiment of the present disclosure.

[0079] Referring to Figure 3, the user-mode analysis process is as follows. Monitoring 310: A monitoring module (e.g., the monitoring module in user-mode analyzer 130) can periodically scan (e.g., poll) the HotMap exported via the Ringbuffer to check whether the cumulative wait time of any process indicated by a PID in the HotMap exceeds a first time threshold. The monitoring module can also periodically check whether the duration for which a process has been in the D state exceeds a second time threshold. Alternatively, the check can be event-driven, for example prompted by a "user click" event: after the event occurs, the module checks whether the cumulative wait time of a process indicated by a PID in the HotMap exceeds the first time threshold and whether the duration for which the process has been in the D state exceeds the second time threshold. The first and second time thresholds are configurable and may be the same or different; for example, the first time threshold can be, but is not limited to, 60 seconds.

[0080] Triggering condition 320: The analysis process is triggered when the cumulative wait time (cumulative_wait_ms) of the process indicated by a certain PID (the target PID or first PID), i.e., the target process, in the HotMap exceeds a first time threshold, or when the duration for which the process indicated by a certain PID has been in the D state exceeds a second time threshold. In either case, it can be determined that the process has been in the D state for an extended period of time.
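The monitoring scan and the trigger condition could be expressed roughly as follows. The function name `find_target_pids`, the dictionary shapes, and the second threshold's default value are assumptions; only the 60-second figure for the first threshold is taken from the text.

```python
def find_target_pids(hot_map, d_state_ms, first_threshold_ms=60_000, second_threshold_ms=60_000):
    """Return PIDs whose cumulative wait time or D-state duration exceeds its threshold."""
    targets = []
    for pid in sorted(set(hot_map) | set(d_state_ms)):
        waited = hot_map.get(pid, {}).get("cumulative_wait_ms", 0)
        in_d = d_state_ms.get(pid, 0)
        # Either condition alone is enough to trigger the analysis.
        if waited > first_threshold_ms or in_d > second_threshold_ms:
            targets.append(pid)
    return targets


hot_map = {
    10: {"cumulative_wait_ms": 75_000},   # exceeds the first threshold
    11: {"cumulative_wait_ms": 500},
    12: {"cumulative_wait_ms": 1_000},
}
d_state_ms = {12: 90_000}                 # exceeds the second threshold
targets = find_target_pids(hot_map, d_state_ms)
```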

[0081] Get summary information and I / O request information 330: Get the summary information of the target process in HotMap and the I / O request information of the target process in issue_map.

[0082] Construct event sequence 340: Construct the event sequence of the target process in the order of initiating system call event, initiating I / O event, entering scheduling wait point event, and current time event, and based on the constructed event sequence, analyze the reasons why the target process is in state D for a long time using multiple pre-set rules.

[0083] Output report 350: Output a location report, which may include the PID of the blocked process, the blocking chain (i.e., the constructed sequence of events), and the reason for being in the D state for a long time.

[0084] Based on the above user-state analysis process, by monitoring (or checking) the cumulative waiting time of processes and the duration of processes in the D state, processes that have been in the D state for a long time can be detected; and by obtaining the process's summary information and I / O request information to construct an event sequence, a blocking chain can be automatically constructed and the cause of the blocking can be automatically located, thereby automating the location of the D state and reducing the cost of manual investigation.

[0085] Figure 4 shows a flowchart of a process state analysis method according to an embodiment of the present disclosure.

[0086] Referring to Figure 4, in step S410, if it is determined that the cumulative waiting time of the target process indicated by the first process identifier in the first storage table exceeds the first time threshold, or the duration of the target process in the uninterruptible waiting state exceeds the second time threshold, it is determined that the target process has been in the uninterruptible waiting state for a long time, and the summary information of the target process is obtained from the first storage table and the input/output request information of the target process is obtained from the second storage table.

[0087] According to embodiments of this disclosure, the cumulative wait time of the process indicated by the process identifier in the first storage table and the duration of the process in an uninterruptible wait state are checked periodically or in an event-driven manner.

[0088] In embodiments of this disclosure, the monitoring module (refer to monitoring 310 in Figure 3) can check, periodically or in an event-driven manner, the cumulative waiting time of the process indicated by the PID in the HotMap, as well as the duration of the process in the D state.

[0089] In embodiments of this disclosure, if it is determined (or detected) that the cumulative waiting time of the process indicated by a certain PID (the target PID or first PID), i.e., the target process, in the HotMap exceeds a first time threshold, or that the duration for which the process indicated by a certain PID has been in the D state exceeds a second time threshold, it is determined that the process indicated by that PID (i.e., the target process) has been in an uninterruptible waiting state for a long time, and the analysis process is triggered. While a process is in the D state, the blocking has not completed, so the cumulative waiting time of the process keeps increasing. When the cumulative waiting time of a process exceeds the first time threshold, the process can be considered to have been in the D state for a long time. Additionally, it is also possible to detect whether the duration of a process in the D state exceeds the second time threshold. For example, by repeatedly viewing `/proc/<pid>/stack` and combining the results with the timestamp of each check, it is possible to determine whether a process has been in the D state for an extended period. Another example is the hung task detection mechanism, a Linux kernel mechanism used to detect processes that have been in the D state for an extended period. It periodically iterates through all processes in the system to check for processes that have been in the D state for longer than a set threshold.

[0090] In embodiments of this disclosure, a user-space daemon (Agent) can periodically scan / proc (process file system) data to discover processes that have been in the D state for a long time and / or detect whether a process has been in the D state for a long time.
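One way such an agent might read the process state is to parse the contents of `/proc/<pid>/stat`, whose third field is the state character. The sketch below only parses text that has already been read, so it runs without a live `/proc`; the function names are hypothetical, and the sample lines are shortened, made-up inputs.

```python
def parse_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain spaces
    or parentheses, so the last ')' is used as the delimiter.
    """
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def d_state_pids(stat_lines):
    """Return the PIDs whose state field is 'D' (uninterruptible sleep)."""
    return [pid for pid, _comm, state in map(parse_state, stat_lines) if state == "D"]


# Shortened, made-up sample lines (real stat files carry many more fields).
samples = [
    "1234 (dd) D 1 1234 1234 0",
    "77 (my (odd) name) S 1 77 77 0",
]
blocked = d_state_pids(samples)
```

Repeating this scan at intervals and remembering when each PID was first seen in state D would give the duration compared against the second time threshold.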

[0091] It should be understood that the above methods for detecting whether a process is in state D for an extended period are merely examples, and this disclosure is not limited thereto.

[0092] In the embodiments of this disclosure, the summary information of the target process in the HotMap and the I / O request information of the target process in the issue_map can be obtained. Specifically, based on the trigger time, the time of the most recent system call (last_syscall_ts), the type identifier of the most recent system call (last_syscall_id), and the identifier of the most recently executed kernel function (last_kernel_fn_id) of the target PID can be extracted from the HotMap, and the I / O request information (or events) of the target process can be filtered out from the issue_map (or Ringbuffer). Here, since the issue_map records incomplete I / O requests, if the target process has incomplete I / O requests, the issue_map includes records of the incomplete I / O requests of the target process, and the I / O request information of the target process obtained from the issue_map includes records of the target process's I / O requests.

[0093] In step S420, based on the acquired summary information and input / output request information, the event sequence of the target process is constructed according to the chronological order in which the events occur in the process.

[0094] According to embodiments of this disclosure, constructing an event sequence of a target process based on the acquired summary information of the target process and the input / output request information of the target process, in accordance with the chronological order of events occurring in the process, may include: constructing an event sequence of the target process in the order of initiating a system call event, initiating an input / output event, entering a scheduling wait point event, and the current moment event, based on the acquired summary information of the target process and the input / output request information of the target process.

[0095] According to embodiments of this disclosure, the time of the most recent system call and the type identifier of the most recent system call included in the summary information of the target process are used, respectively, as the time of the system call and the type identifier of the system call included in the initiate-system-call event. The identifier of the most recently executed kernel function included in the summary information of the target process is used as the identifier of the kernel function included in the enter-scheduling-wait-point event. The time point at which acquisition of the summary information and the input/output request information of the target process begins is used as the current time included in the current-time event. When the input/output request information includes a record of an input/output request of the target process, that record is used as the record of the input/output request included in the initiate-input/output event, and the current-time event additionally indicates that the completion of the input/output request is missing.

[0096] In the embodiments of this disclosure, an event sequence of the target process (the process indicated by the first PID or target PID) can be constructed based on the summary information of the target process obtained from the HotMap and the I/O request information of the target process obtained from the issue_map. The constructed event sequence is as follows: [Initiate system call] → [Initiate I/O] → [Enter scheduling wait point] → [Current time]. The constructed initiate-system-call event may include the time of the system call and the type identifier of the system call; the constructed enter-scheduling-wait-point event may include the identifier of the kernel function; and the constructed current-time event may include the current time. Here, the time of the most recent system call (last_syscall_ts) and the type identifier of the most recent system call (last_syscall_id) included in the target process's summary information can be used as the time and type identifier of the system call included in the initiate-system-call event, respectively. The identifier of the most recently executed kernel function (last_kernel_fn_id) included in the target process's summary information can be used as the identifier of the kernel function included in the enter-scheduling-wait-point event. Furthermore, the time point that triggers the analysis process (e.g., the time point when the acquisition of summary information and I/O request information begins) can be used as the current time included in the current-time event. An event is constructed here by placing the parameters obtained from the HotMap and the issue_map under the corresponding event as that event's parameters.

[0097] If the acquired I / O request information includes records of I / O requests from the target process, the constructed I / O initiation event can include these records, and correspondingly, the constructed current-time event can also include instances where I / O request completion is missing. Here, the records of I / O requests from the target process can be included in the I / O request initiation event. These records can include at least a timestamp (ts_ns). If the acquired I / O request information does not include records of I / O requests from the target process, or if the acquired I / O request information is empty (this may be because the issue_map does not contain records of I / O requests from the target process), the constructed I / O initiation event can be empty.

[0098] Specifically, the record of I / O requests of the target process may include the record of block I / O requests and / or the record of NFS I / O requests of the target process. Correspondingly, the constructed I / O initiation event may include the record of block I / O requests and / or the record of NFS I / O requests, and the constructed current-moment event may also include the block I / O request completion missing and / or the NFS I / O request completion missing. If the record of I / O requests of the target process includes the record of block I / O requests of the target process, the record of block I / O requests of the target process can be used as the record of block I / O requests included in the initiation I / O event, and the constructed current-moment event may also include the block I / O request completion missing. If the record of I / O requests of the target process includes the record of NFS I / O requests of the target process, the record of NFS I / O requests of the target process can be used as the record of NFS I / O requests included in the initiation I / O event, and the constructed current-moment event may also include the NFS I / O request completion missing.

[0099] In the embodiments of this disclosure, if the type identifier of the most recent system call included in the target process's summary information points to a fast userspace mutex (futex) (i.e., last_syscall_id is futex), construction of the initiate-I/O event can be skipped when constructing the event sequence of the process, or the constructed initiate-I/O event can be empty. When a process has been in the D state for a long time and last_syscall_id is futex, the likely cause is a lock that has not been released for a long time; the process then does not initiate an I/O request (the obtained I/O request information does not include an I/O request record of the target process, or is empty), so the constructed initiate-I/O event is empty. In this case, the identifier of the most recently executed kernel function (last_kernel_fn_id) included in the target process's summary information can be used as the identifier of the kernel function included in the enter-scheduling-wait-point event.
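The construction described in step S420 might be sketched as below, using the "empty initiate-I/O event" variant for the futex case. The event and key names, the `kind` field on each record, and the sample kernel function identifiers are assumptions made for illustration.

```python
def build_event_sequence(summary, io_records, now_ns):
    """Order: initiate system call -> initiate I/O -> enter scheduling wait point -> current time."""
    return [
        {"event": "initiate_syscall",
         "ts": summary["last_syscall_ts"],
         "syscall_id": summary["last_syscall_id"]},
        # Empty when the issue_map holds no record for the target process.
        {"event": "initiate_io", "records": list(io_records)},
        {"event": "enter_sched_wait",
         "kernel_fn_id": summary["last_kernel_fn_id"]},
        # Every record still present in the issue_map is a missing completion.
        {"event": "current_time",
         "now": now_ns,
         "completion_missing": sorted({r["kind"] for r in io_records})},
    ]


io_summary = {"last_syscall_ts": 100, "last_syscall_id": "write",
              "last_kernel_fn_id": "io_schedule"}
io_records = [{"kind": "block", "pid": 42, "ts_ns": 110, "dev": 8, "sector": 2048}]
io_seq = build_event_sequence(io_summary, io_records, now_ns=5_000)

futex_summary = {"last_syscall_ts": 200, "last_syscall_id": "futex",
                 "last_kernel_fn_id": "futex_wait_queue_me"}
futex_seq = build_event_sequence(futex_summary, [], now_ns=5_000)
```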

[0100] In step S430, multiple pre-set rules are matched based on the constructed event sequence.

[0101] In step S440, the reasons why the target process is in an uninterruptible waiting state for a long time are analyzed based on the matching results of multiple rules.

[0102] According to embodiments of this disclosure, analyzing the reasons why a target process is in an uninterruptible waiting state for a long time based on the matching results of multiple rules may include: determining the reason as the reason pointed to by the one rule in response to the matching result matching one of the multiple rules; and determining the reason as the reason pointed to by the rule whose confidence level meets the first condition in response to the matching result matching at least two of the multiple rules, wherein each of the multiple rules is set with a confidence level.

[0103] According to embodiments of this disclosure, the multiple rules include at least one of the following. Rule 1: if the type identifier of the system call included in the initiate-system-call event points to one of read, write, and file synchronization, the initiate-input/output event includes a record of a block input/output request, and the duration from the timestamp in that record to the current time included in the current-time event exceeds a third time threshold, the rule points to the reason that the block device did not respond. Rule 2: if the type identifier of the system call included in the initiate-system-call event points to read or write, the initiate-input/output event includes a record of a network file system input/output request, and the current-time event includes a missing network file system input/output request completion, the rule points to the reason that the network file system server did not respond. Rule 3: if the type identifier of the system call included in the initiate-system-call event points to the fast user space mutex, and the identifier of the kernel function included in the enter-scheduling-wait-point event points to entering the fast user space mutex wait queue and being suspended, the rule points to the reason that the lock has not been released for a long time. Here, when the record of the input/output request of the target process includes a record of a block input/output request and/or a record of a network file system input/output request of the target process, the initiate-input/output event includes the record of the block input/output request and/or the record of the network file system input/output request, and the current-time event includes the missing block input/output request completion and/or the missing network file system input/output request completion.

[0104] In the embodiments of this disclosure, the following rules (e.g., ordered rules) are applied to the constructed event sequence and matched one by one. If exactly one rule matches, the corresponding conclusion (i.e., the reason for being in the D state for a long time) is output. If multiple rules match simultaneously, they are sorted in descending order of confidence and the conclusion of the rule with the highest confidence is output. Rule R1 (Block Device I/O Timeout): the type identifier of the system call included in the initiating system call event points to one of read, write, and file synchronization fsync (i.e., last_syscall_id ∈ {write, read, fsync}); the initiating I/O event includes a record of a block I/O request (i.e., there is a record of an incomplete block I/O request keyed by resource_key(dev:sector)); and the time elapsed from the timestamp in that record to the current time included in the current time event exceeds a third time threshold (i.e., the time elapsed since ts_ns in the record of the block I/O request exceeds the configurable third time threshold). The conclusion pointed to by Rule R1 is that the block device is not responding, and Rule R1 can be assigned the confidence level C1.

[0105] Rule R2 (NFS Timeout): the type identifier of the system call included in the initiating system call event points to read or write (i.e., last_syscall_id ∈ {write, read}); the initiating I/O event includes a record of an NFS I/O request (i.e., there is a record of an incomplete NFS I/O request keyed by resource_key(fhandle:offset)); and the current time event includes a missing NFS I/O request completion. The conclusion pointed to by Rule R2 is that the NFS server is not responding, and Rule R2 can be assigned the confidence level C2.

[0106] Rule R3 (Lock Contention): the type identifier of the system call included in the initiating system call event points to the fast user-space mutex futex (i.e., last_syscall_id is futex), and the identifier of the kernel function included in the entering scheduling wait point event points to entering the futex wait queue and being suspended (i.e., last_kernel_fn_id is, or corresponds to, the futex_wait_queue_me kernel function). The conclusion pointed to by Rule R3 is that the lock has not been released for a long time, and Rule R3 can be assigned the confidence level C3.

[0107] In embodiments of this disclosure, the values of confidence levels C1, C2, and C3 can be set empirically, for example, confidence levels C1, C2, and C3 are 0.95, 0.90, and 0.85, respectively. Here, the first condition can be, for example, but not limited to, the maximum confidence value; that is, the rule whose confidence level satisfies the first condition can be the rule with the highest confidence level. If at least two rules match, the conclusion corresponding to the rule with the highest confidence level can be output. Here, the confidence level values corresponding to the rules are merely examples, and this disclosure is not limited thereto.
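The matching of Rules R1 to R3 and the selection by confidence can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the disclosed implementation: the `EventSequence` layout, the function names, and the 30-second value for the third time threshold are hypothetical, and only the rule conditions and the example confidences 0.95, 0.90, and 0.85 are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

NS_PER_S = 1_000_000_000

@dataclass
class EventSequence:
    """Hypothetical flattened view of the four-event sequence."""
    last_syscall_id: str                   # from the initiating system call event
    last_kernel_fn_id: str                 # from the entering scheduling wait point event
    now_ns: int                            # from the current time event
    block_io_ts_ns: Optional[int] = None   # ts_ns of an incomplete block I/O record, if any
    nfs_io_ts_ns: Optional[int] = None     # ts_ns of an incomplete NFS I/O record, if any

# Example confidences from the text; the threshold value is an assumption.
C1, C2, C3 = 0.95, 0.90, 0.85
THIRD_THRESHOLD_NS = 30 * NS_PER_S

def match_rules(seq: EventSequence) -> list[tuple[float, str]]:
    """Return (confidence, conclusion) for every rule that matches."""
    matched = []
    # R1: read/write/fsync, plus a block I/O record older than the third threshold.
    if (seq.last_syscall_id in {"read", "write", "fsync"}
            and seq.block_io_ts_ns is not None
            and seq.now_ns - seq.block_io_ts_ns > THIRD_THRESHOLD_NS):
        matched.append((C1, "block device not responding"))
    # R2: read/write, plus an NFS I/O record whose completion is missing.
    if seq.last_syscall_id in {"read", "write"} and seq.nfs_io_ts_ns is not None:
        matched.append((C2, "NFS server not responding"))
    # R3: futex system call, blocked in futex_wait_queue_me.
    if (seq.last_syscall_id == "futex"
            and seq.last_kernel_fn_id == "futex_wait_queue_me"):
        matched.append((C3, "lock not released for a long time"))
    return matched

def conclude(seq: EventSequence) -> Optional[tuple[float, str]]:
    """Pick the matched rule with the highest confidence, or None if no rule matches."""
    matched = match_rules(seq)
    return max(matched, key=lambda m: m[0]) if matched else None
```

When `conclude` returns `None`, no rule matched, which is the case the heuristic described in the following paragraphs is meant to handle.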

[0108] According to embodiments of this disclosure, analyzing the reasons why a target process is in an uninterruptible waiting state for a long time based on the matching results of the multiple rules may further include: in response to the matching result being that no rule matches, obtaining, for each candidate reason, at least one score against at least one set criterion; obtaining a weighted score for each candidate reason based on at least one weight set for the at least one criterion and the at least one score of that candidate reason; and determining, based on the obtained weighted scores, the candidate reason whose weighted score satisfies a second condition.

[0109] According to embodiments of this disclosure, candidate causes include at least one of the following: block device not responding, network file system server not responding, and lock not being released for too long. The at least one criterion includes at least one of the following: Criterion 1: the historical association frequency between the type identifier of the system call included in the initiating system call event and the candidate cause; Criterion 2: the call stack distance between the kernel function indicated by the identifier of the kernel function included in the entering scheduling wait point event and multiple blocking kernel functions related to the candidate cause, wherein the multiple blocking kernel functions include at least one of input/output scheduling, network file system read initiation, network file system write initiation, and entering the fast user space mutex wait queue and being suspended; and Criterion 3: the duration from the timestamp in the record of the input/output request included in the initiating input/output event to the current time included in the current time event.

[0110] In the embodiments of this disclosure, if none of the above rules match, the evidence may be incomplete, and a heuristic algorithm H1 can be used to output a conclusion. At least one score can be obtained for each candidate reason against at least one set criterion. Candidate reasons may include at least one of the following: block device not responding, NFS server not responding, and lock not released for too long. The following explanation uses these three candidate reasons as an example, referred to as reason 1, reason 2, and reason 3, respectively. The at least one criterion may include: ① The historical association frequency between the type pointed to by the type identifier of the system call included in the initiating system call event and the candidate reason (i.e., the historical association frequency between the last_syscall_id type and the blocking category).

[0111] In embodiments of this disclosure, the number or proportion of times each system call type has been attributed to each of the multiple candidate reasons can be collected from historical observations. After the type pointed to by the type identifier of the system call included in the initiating system call event is determined, the number or proportion of times that type has been attributed to each candidate reason can be obtained. For example, if the type identifier of the system call included in the initiating system call event points to the type "read", and the proportions of the "read" type attributed to the reasons "block device not responding", "NFS server not responding", and "lock not released for too long" are F1, F2, and F3 respectively, then the scores $S_{11}$, $S_{12}$, and $S_{13}$ of the three reasons can be F1, F2, and F3 in sequence. Here, the value range of $S_{11}$, $S_{12}$, and $S_{13}$ can be [0.0, 1.0]. For example, if F1, F2, and F3 are 0.6, 0.3, and 0.2, then $S_{11}$, $S_{12}$, and $S_{13}$ can be 0.6, 0.3, and 0.2 respectively.
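One way to derive the criterion ① scores from historical observations is sketched below. It assumes, purely for illustration, that history is kept as counts per (system call type, reason) pair; the `criterion1_scores` helper and the sample counts are hypothetical and do not come from the disclosure.

```python
from collections import Counter

REASONS = (
    "block device not responding",
    "NFS server not responding",
    "lock not released for too long",
)

def criterion1_scores(history: Counter, syscall: str) -> dict[str, float]:
    """Proportion of past cases of `syscall` attributed to each reason, in [0.0, 1.0]."""
    total = sum(history[(syscall, r)] for r in REASONS)
    if total == 0:
        # No history for this system call type: no evidence for any reason.
        return {r: 0.0 for r in REASONS}
    return {r: history[(syscall, r)] / total for r in REASONS}

# Made-up history: ten past "read" cases, split 6 / 3 / 1 across the reasons.
history = Counter({
    ("read", REASONS[0]): 6,
    ("read", REASONS[1]): 3,
    ("read", REASONS[2]): 1,
})
scores = criterion1_scores(history, "read")
```

Because the scores are proportions of a single total, they stay within [0.0, 1.0] without further normalization.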

[0112] ② The call stack distance between the kernel function indicated by the identifier of the kernel function included in the entering scheduling wait point event and multiple blocking kernel functions related to the candidate reasons (i.e., the call stack distance between last_kernel_fn_id and the multiple blocking kernel functions related to the candidate reasons). These blocking kernel functions may include at least one of I/O schedule (io_schedule), nfs_initiate_read, nfs_initiate_write, and futex_wait_queue_me. Here, I/O schedule is the blocking kernel function related to the reason that the block device is not responding, nfs_initiate_read and nfs_initiate_write are the blocking kernel functions related to the reason that the NFS server is not responding, and futex_wait_queue_me is the blocking kernel function related to the reason that a lock has not been released for too long. In embodiments of this disclosure, the call stack distance between the indicated kernel function and each blocking kernel function can be analyzed; the closer the distance, the higher the score. If the distance between the indicated kernel function and one of the blocking kernel functions is 0 (i.e., the indicated kernel function is that blocking kernel function), the distance between the indicated kernel function and the other blocking kernel functions can be considered large. Therefore, the reason corresponding to the blocking kernel function with a distance of 0 has the highest score, while the other reasons have the lowest scores. For example, if the call stack distance between last_kernel_fn_id and I/O schedule is 0 (i.e., last_kernel_fn_id is I/O schedule), the call stack distance between last_kernel_fn_id and nfs_initiate_read, nfs_initiate_write, and futex_wait_queue_me can be considered large. Thus, the reason that the block device is not responding can have a high score, while the reasons that the NFS server is not responding and that a lock has not been released for too long can have very low scores. Here, the value range of $S_{21}$, $S_{22}$, and $S_{23}$ can be [0.0, 1.0], and the scores $S_{21}$, $S_{22}$, and $S_{23}$ of the three reasons could be, for example, 1.0, 0.01, and 0.01. Additionally, since two blocking kernel functions (nfs_initiate_read and nfs_initiate_write) are related to the reason that the NFS server is not responding, if the distance between the indicated kernel function and either of these two functions is 0, the score $S_{22}$ for the reason that the NFS server is not responding is the highest, while the scores for the other reasons are the lowest. If the distance between the indicated kernel function and every one of the blocking kernel functions is nonzero, the scores $S_{21}$, $S_{22}$, and $S_{23}$ of the three reasons can be inversely proportional to the distance, where the value range of $S_{21}$, $S_{22}$, and $S_{23}$ can be [0.0, 1.0]. In that case, since two blocking kernel functions are related to the reason that the NFS server is not responding, the score $S_{22}$ can be inversely proportional to the average distance to the two functions.
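The two cases of criterion ② (a distance of 0 versus all distances nonzero) can be sketched as follows. The sketch assumes the call stack distances have already been computed as positive or zero integers, since the text does not specify how they are measured; the `criterion2_scores` helper, the `io_schedule` spelling of the I/O schedule function, and the use of 1/distance as the inverse-proportional form are assumptions for illustration.

```python
# Blocking kernel functions per reason, as listed in the text.
BLOCKING_FNS = {
    "block device not responding": ["io_schedule"],
    "NFS server not responding": ["nfs_initiate_read", "nfs_initiate_write"],
    "lock not released for too long": ["futex_wait_queue_me"],
}
HIGH, LOW = 1.0, 0.01  # example highest and lowest scores from the text

def criterion2_scores(distance: dict[str, int]) -> dict[str, float]:
    """Score each reason from the call stack distance to its blocking functions."""
    # Case 1: the indicated kernel function is itself a blocking function.
    hits = [reason for reason, fns in BLOCKING_FNS.items()
            if any(distance[f] == 0 for f in fns)]
    if hits:
        return {reason: (HIGH if reason in hits else LOW) for reason in BLOCKING_FNS}
    # Case 2: all distances are nonzero, so score by 1 / (average) distance.
    scores = {}
    for reason, fns in BLOCKING_FNS.items():
        avg = sum(distance[f] for f in fns) / len(fns)
        scores[reason] = max(LOW, min(HIGH, 1.0 / avg))  # clamp into [LOW, HIGH]
    return scores

exact = criterion2_scores({"io_schedule": 0, "nfs_initiate_read": 5,
                           "nfs_initiate_write": 6, "futex_wait_queue_me": 7})
near = criterion2_scores({"io_schedule": 2, "nfs_initiate_read": 4,
                          "nfs_initiate_write": 6, "futex_wait_queue_me": 10})
```

For the NFS reason, the average over both functions is used only in the nonzero case, matching the description above.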

[0113] ③ The duration from the timestamp in the record of the I/O request included in the initiating I/O event to the current time included in the current time event, i.e., the retention time of the resource_key in the issue_map. If the initiating I/O event includes a record of a block I/O request, the retention time includes the retention time t1 of the block I/O request record (i.e., the retention time of dev:sector in the issue_map). If the initiating I/O event includes a record of an NFS I/O request, the retention time includes the retention time t2 of the NFS I/O request record (i.e., the retention time of fhandle:offset in the issue_map).

[0114] In embodiments of this disclosure, when both t1 and t2 exist, the score $S_{33}$ for the reason that the lock has not been released for too long can be the lowest, and the longer of t1 and t2 gives the higher score for its corresponding reason. For example, if t1 > t2, the score $S_{31}$ for the reason that the block device is not responding (corresponding to t1) is higher than the score $S_{32}$ for the reason that the NFS server is not responding (corresponding to t2). If only one of t1 and t2 exists (e.g., t1), the score of the reason corresponding to that duration (e.g., $S_{31}$) is the highest, the score $S_{33}$ for the reason that the lock has not been released for too long is second, and the score of the reason corresponding to the other duration (e.g., $S_{32}$ for t2) is the lowest. When neither t1 nor t2 exists, the score $S_{33}$ for the reason that the lock has not been released for too long is the highest, while the scores $S_{31}$ and $S_{32}$ of the reasons corresponding to t1 and t2 are both low. Here, the value range of $S_{31}$, $S_{32}$, and $S_{33}$ can be [0.0, 1.0]. For example, the highest score can be 1.0 and the lowest score can be 0.01. The score can also be proportional to the duration.
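The case analysis for criterion ③ can be sketched as a small function. The sketch is illustrative only: the middle score of 0.5, the tie-breaking toward the block device when t1 equals t2, and the choice to rank the lock reason lowest when both durations exist are assumptions; only the example highest and lowest scores 1.0 and 0.01 come from the text.

```python
from typing import Optional

HIGH, MID, LOW = 1.0, 0.5, 0.01  # MID is an assumed intermediate score

def criterion3_scores(t1: Optional[float],
                      t2: Optional[float]) -> tuple[float, float, float]:
    """Return (S31, S32, S33) for the block device, NFS server, and lock reasons.

    t1 / t2 are the retention times of the block / NFS I/O request records,
    or None when no such record exists.
    """
    if t1 is not None and t2 is not None:
        # Both records exist: the longer retention time scores higher, and the
        # lock reason is assumed to rank lowest. Ties go to the block device.
        s31, s32 = (HIGH, MID) if t1 >= t2 else (MID, HIGH)
        return s31, s32, LOW
    if t1 is not None:
        return HIGH, LOW, MID   # only a block I/O record exists
    if t2 is not None:
        return LOW, HIGH, MID   # only an NFS I/O record exists
    return LOW, LOW, HIGH       # no I/O record: the lock reason ranks highest
```

A variant that makes the scores proportional to the durations, as the text also allows, would replace the fixed `HIGH` and `MID` values with normalized durations.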

[0115] In embodiments of this disclosure, a weighted score for each candidate cause can be obtained based on at least one weight set for the at least one criterion and the at least one score of each candidate cause against the at least one criterion. Here, the second condition can be, for example, but not limited to, the highest weighted score; that is, the candidate cause whose weighted score satisfies the second condition can be the candidate cause with the highest weighted score. A weight can be set for each criterion; for example, in the case of three criteria, the weights of criterion 1, criterion 2, and criterion 3 can be W1, W2, and W3 respectively. For example, W1, W2, and W3 can be set to 0.4, 0.4, and 0.2. The weighted score S1 of the cause that the block device is not responding (cause 1) can be calculated as $S_1 = W_1 S_{11} + W_2 S_{21} + W_3 S_{31}$. Similarly, the weighted score S2 of the cause that the NFS server is not responding (cause 2) can be calculated as $S_2 = W_1 S_{12} + W_2 S_{22} + W_3 S_{32}$. Furthermore, the weighted score S3 of the cause that the lock has not been released for too long (cause 3) can be calculated as $S_3 = W_1 S_{13} + W_2 S_{23} + W_3 S_{33}$. Based on the weighted scores of the candidate causes (e.g., S1, S2, and S3), the candidate cause with the highest weighted score is determined. For example, if the weighted score S1 of the cause that the block device is not responding (cause 1) is the highest, then the cause of the target process being in the D state for an extended period is determined to be that the block device is not responding (cause 1), and this cause can be output. Alternatively, in the case of only one criterion, the weighted score can be the score under that criterion itself.
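As a worked example of the weighted score, the sketch below combines the example weights 0.4, 0.4, and 0.2 with the example scores given for criteria ① and ②. The criterion ③ scores (1.0, 0.01, 0.5) are an assumption standing in for the case where only t1 exists, and the function names are hypothetical.

```python
WEIGHTS = (0.4, 0.4, 0.2)  # example W1, W2, W3 from the text

def weighted_scores(scores_by_criterion, weights=WEIGHTS):
    """scores_by_criterion[i][j] is the score of cause j+1 under criterion i+1.

    Returns [S1, S2, S3], where Sj = W1*S1j + W2*S2j + W3*S3j.
    """
    n_causes = len(scores_by_criterion[0])
    return [sum(w * row[j] for w, row in zip(weights, scores_by_criterion))
            for j in range(n_causes)]

def best_cause(scores_by_criterion, weights=WEIGHTS) -> int:
    """Index (0-based) of the cause with the highest weighted score."""
    totals = weighted_scores(scores_by_criterion, weights)
    return max(range(len(totals)), key=totals.__getitem__)

scores = [
    [0.6, 0.3, 0.2],     # criterion 1: S11, S12, S13 (example from the text)
    [1.0, 0.01, 0.01],   # criterion 2: S21, S22, S23 (example from the text)
    [1.0, 0.01, 0.5],    # criterion 3: S31, S32, S33 (assumed values)
]
totals = weighted_scores(scores)   # S1 ≈ 0.84, S2 ≈ 0.126, S3 ≈ 0.184
winner = best_cause(scores)        # index 0, i.e. cause 1 (block device)
```

With these inputs, cause 1 wins by a wide margin because both criterion ② and criterion ③ point at the block device.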

[0116] In embodiments of this disclosure, when none of the multiple rules match, the conclusion output by heuristic H1 may have a low confidence level C4, and the value of confidence level C4 may be less than the values of confidence levels C1, C2, and C3.

[0117] It should be understood that the range of values here (e.g., [0.0, 1.0]) is set to unify the order of magnitude of the scores under each criterion, and this disclosure is not limited thereto. Furthermore, the specific score values under each criterion are merely examples, and this disclosure is not limited thereto.

[0118] In embodiments of this disclosure, a location report can be generated, which includes the PID (target PID or first PID) of the blocked process, the call-wait chain (i.e., the constructed event sequence), and the reason for being in the D state for an extended period. Additionally, the conclusions derived from the heuristic H1 output (i.e., the reason for being in the D state for an extended period) can be marked with "low confidence" in the location report, for example, giving a low confidence value of C4.
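A report of this kind could be assembled as a plain dictionary and serialized, as in the sketch below. The field names, the `build_report` helper, the sample event strings, and the value 0.5 for C4 are all hypothetical; the text only requires that the report carry the PID, the call-wait chain, and the reason, and that C4 be lower than C1, C2, and C3.

```python
import json

C4 = 0.5  # assumed value; the text only states that C4 < C1, C2, C3

def build_report(pid, event_sequence, cause, confidence, from_heuristic=False):
    """Assemble a report for one blocked process."""
    report = {
        "pid": pid,
        "call_wait_chain": event_sequence,  # the constructed event sequence
        "cause": cause,
        "confidence": confidence,
    }
    if from_heuristic:
        # Conclusions produced by heuristic H1 are flagged explicitly.
        report["note"] = "low confidence"
    return report

report = build_report(
    pid=1234,
    event_sequence=["syscall:read", "io_initiated:sda:2048", "wait:io_schedule", "now"],
    cause="block device not responding",
    confidence=C4,
    from_heuristic=True,
)
print(json.dumps(report, indent=2))
```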

[0119] According to the process state analysis method disclosed herein, a process event sequence is constructed by acquiring process summary information and I / O request information. Based on the process event sequence matching rules, the reasons for a process remaining in the D state for an extended period are analyzed. This method can automatically construct blocking chains and automatically locate the cause of blocking when a process remains in the D state for an extended period, thus automating D state location and reducing manual troubleshooting costs. Furthermore, by associating system calls, initiating I / O requests, entering scheduling wait points, and current events to reconstruct the blocking chain, the analysis of blocking causes is facilitated. Process-level summary information is continuously maintained in kernel mode with low overhead for rapid location. Moreover, by employing a strategy derived from summary information and events (or problems), configurable probe points are supported to ensure controllability, and event-driven and / or pre-defined periodic recording is supported to ensure low overhead.

[0120] Figure 5 shows a schematic diagram of a process state analysis apparatus according to an embodiment of the present disclosure.

[0121] Referring to Figure 5, the process state analysis apparatus 500 includes an acquisition module 510, a construction module 520, a matching module 530, and an analysis module 540. The acquisition module 510 can determine that the target process has been in an uninterruptible waiting state for a long time if the sum of the waiting times of the target process indicated by the first process identifier in the first storage table exceeds a first time threshold, or if the duration of the target process in an uninterruptible waiting state exceeds a second time threshold. The acquisition module 510 then acquires summary information of the target process from the first storage table and input/output request information of the target process from the second storage table. The construction module 520 constructs an event sequence of the target process according to the chronological order of events occurring in the process, based on the acquired summary information and input/output request information. The matching module 530 matches the constructed event sequence against multiple pre-set rules. The analysis module 540 analyzes the reasons why the target process has been in an uninterruptible waiting state for a long time based on the matching results of the multiple rules.

[0122] According to embodiments of this disclosure, the first storage table and the second storage table are extended Berkeley Packet Filter (eBPF) key-value tables, and the information in the first storage table and the second storage table is exported through a ring buffer.

[0123] According to embodiments of this disclosure, a first storage table stores summary information of a process indicated by a process identifier, wherein the summary information of the process indicated by the process identifier includes at least one of the time of the most recent system call, the type identifier of the most recent system call, the identifier of the most recently executed kernel function, the hash value of the target resource of the most recent operation, and the sum of wait times. A second storage table stores records of incomplete input / output requests including process identifiers and timestamps, wherein the timestamp records the time when the process indicated by the process identifier initiated the input / output request.

[0124] According to embodiments of this disclosure, the construction module 520 can construct an event sequence of the target process based on the acquired summary information and input / output request information, in the order of initiating system call events, initiating input / output events, entering scheduling wait point events, and current time events.

[0125] According to embodiments of this disclosure, the time of the most recent system call and the type identifier of the most recent system call included in the summary information are respectively used as the time of the system call and the type identifier of the system call included in the system call initiation event. The identifier of the most recently executed kernel function included in the summary information is used as the identifier of the kernel function included in the entry scheduling wait point event. The time point at which the acquisition of summary information and input / output request information begins is used as the current time included in the current time event. Furthermore, when the input / output request information includes a record of the input / output request of the target process, the input / output event initiation includes the record of the input / output request of the target process as the record of the input / output request, and the current time event also includes the input / output request completion missing.
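The mapping from summary information and I/O request information to the four-event sequence can be sketched as follows. The `Summary` and `IoRecord` layouts, the event names, and the `build_event_sequence` helper are hypothetical stand-ins for the storage table rows; the ordering and the field assignments follow the description above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Summary:
    """One row of the first storage table (hypothetical layout)."""
    last_syscall_ts_ns: int
    last_syscall_id: str
    last_kernel_fn_id: str

@dataclass
class IoRecord:
    """One row of the second storage table (hypothetical layout)."""
    resource_key: str   # e.g. "dev:sector" or "fhandle:offset"
    pid: int
    ts_ns: int

def build_event_sequence(summary: Summary, io_record: Optional[IoRecord],
                         now_ns: int) -> list[dict]:
    """Build the four events in their fixed order.

    now_ns is the time at which acquisition of the summary and I/O request
    information began.
    """
    current = {"event": "current_time", "ts_ns": now_ns}
    if io_record is not None:
        # The record is still in the table, so its completion has not been seen.
        current["io_completion_missing"] = True
    return [
        {"event": "initiating_syscall", "ts_ns": summary.last_syscall_ts_ns,
         "syscall_id": summary.last_syscall_id},
        {"event": "initiating_io", "record": io_record},
        {"event": "entering_sched_wait_point",
         "kernel_fn_id": summary.last_kernel_fn_id},
        current,
    ]
```

For instance, `build_event_sequence(Summary(100, "read", "io_schedule"), IoRecord("sda:2048", 42, 120), now_ns=500)` yields a sequence whose last event carries `io_completion_missing`, which is the evidence that Rules 1 and 2 rely on.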

[0126] According to an embodiment of this disclosure, the analysis module 540 may determine the cause as the cause pointed to by a rule in response to the matching result being a match of one of a plurality of rules; and determine the cause as the cause pointed to by a rule whose confidence level satisfies a first condition in response to the matching result being a match of at least two of a plurality of rules, wherein each of the plurality of rules is set with a confidence level.

[0127] According to embodiments of this disclosure, the multiple rules include at least one of the following. Rule 1: if the type identifier of the system call included in the initiating system call event points to one of read, write, and file synchronization, the initiating input/output event includes a record of a block input/output request, and the duration from the timestamp in that record to the current time included in the current time event exceeds a third time threshold, the rule points to the reason that the block device is not responding. Rule 2: if the type identifier of the system call included in the initiating system call event points to read or write, the initiating input/output event includes a record of a network file system input/output request, and the current time event includes a missing network file system input/output request completion, the rule points to the reason that the network file system server is not responding. Rule 3: if the type identifier of the system call included in the initiating system call event points to a fast user space mutex, and the identifier of the kernel function included in the entering scheduling wait point event points to entering the fast user space mutex wait queue and being suspended, the rule points to the reason that a lock has not been released for too long. Here, when the record of the input/output request of the target process includes a record of a block input/output request and/or a record of a network file system input/output request of the target process, the initiating input/output event includes the record of the block input/output request and/or the record of the network file system input/output request, and the current time event includes the missing block input/output request completion and/or the missing network file system input/output request completion.

[0128] According to an embodiment of this disclosure, the analysis module 540 may further, in response to the matching result being that no rule matches, obtain, for each candidate reason, at least one score against at least one set criterion; obtain a weighted score for each candidate reason based on at least one weight set for the at least one criterion and the at least one score of that candidate reason; and determine, based on the obtained weighted scores, the candidate reason whose weighted score satisfies the second condition.

[0129] According to embodiments of this disclosure, candidate causes include at least one of the following: block device not responding, network file system server not responding, and lock not being released for too long. The at least one criterion includes at least one of the following: Criterion 1: the historical association frequency between the type identifier of the system call included in the initiating system call event and the candidate cause; Criterion 2: the call stack distance between the kernel function indicated by the identifier of the kernel function included in the entering scheduling wait point event and multiple blocking kernel functions related to the candidate cause, wherein the multiple blocking kernel functions include at least one of input/output scheduling, network file system read initiation, network file system write initiation, and entering the fast user space mutex wait queue and being suspended; and Criterion 3: the duration from the timestamp in the record of the input/output request included in the initiating input/output event to the current time included in the current time event.

[0130] According to embodiments of this disclosure, the process state analysis apparatus 500 may further include: an inspection module (not shown), which may periodically or in an event-driven manner inspect the sum of waiting times of the process indicated by the process identifier in the first storage table, and the duration of the process in an uninterruptible waiting state.
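The inspection step can be sketched as a scan over the first storage table. The `SummaryRow` layout is hypothetical, in particular the `d_state_since_ns` field used to derive the duration in the uninterruptible waiting state, which the text does not describe; the two threshold checks follow the conditions stated for determining a long-term D state.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SummaryRow:
    wait_sum_ns: int                  # sum of wait times kept in the first storage table
    d_state_since_ns: Optional[int]   # hypothetical: entry time into D state, None if not in D

def find_long_d_state(first_table: dict[int, SummaryRow], now_ns: int,
                      first_threshold_ns: int,
                      second_threshold_ns: int) -> list[int]:
    """Return the PIDs that satisfy either long-term D state condition."""
    hits = []
    for pid, row in first_table.items():
        over_sum = row.wait_sum_ns > first_threshold_ns
        over_duration = (row.d_state_since_ns is not None
                         and now_ns - row.d_state_since_ns > second_threshold_ns)
        if over_sum or over_duration:
            hits.append(pid)
    return hits
```

The same function can be called from a periodic timer or from an event handler, which corresponds to the two inspection modes mentioned above.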

[0131] According to embodiments of this disclosure, the process state analysis apparatus 500 may further include: an update module (not shown), which can utilize an extended Berkeley packet filter acquisition module to acquire kernel monitoring data by placing multiple probe points in the kernel; and use the acquired kernel monitoring data to update a first storage table and a second storage table, wherein the acquired kernel monitoring data is recorded according to an event-driven and / or predetermined period.

[0132] According to embodiments of this disclosure, the update module can update the summary information of the process indicated by the process identifier in the first storage table, including the time and type identifier of the most recent system call, using the acquired system call time and system call type identifier when the process indicated by the process identifier initiates a system call; when the process indicated by the process identifier initiates an input / output request, the update module records the input / output request information, including resource information, process identifier, and timestamp, in the second storage table as an input / output request record, wherein the input / output request includes a block input / output request or a network file system input / output request; and when the input / output is completed, the update module deletes the input / output request record from the second storage table.
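The three update paths (system call, I/O request issue, I/O completion) can be simulated in user space as below. In the described system these tables are eBPF key-value tables updated from kernel probe points; the `Tables` class, its method names, and the dictionary layout here are hypothetical and only mirror the update logic.

```python
class Tables:
    """User-space stand-in for the first and second storage tables."""

    def __init__(self):
        self.first = {}    # pid -> summary information
        self.second = {}   # resource_key -> incomplete I/O request record

    def on_syscall(self, pid, ts_ns, syscall_id):
        """A process initiated a system call: refresh its summary."""
        row = self.first.setdefault(pid, {})
        row["last_syscall_ts_ns"] = ts_ns
        row["last_syscall_id"] = syscall_id

    def on_io_issue(self, resource_key, pid, ts_ns):
        """A block or NFS I/O request was issued: record it as incomplete."""
        self.second[resource_key] = {"pid": pid, "ts_ns": ts_ns}

    def on_io_complete(self, resource_key):
        """The I/O completed: delete its record (no-op if it is unknown)."""
        self.second.pop(resource_key, None)

    def pending_for(self, pid):
        """Resource keys of the I/O requests of `pid` that have not completed."""
        return [key for key, rec in self.second.items() if rec["pid"] == pid]

tables = Tables()
tables.on_syscall(7, ts_ns=100, syscall_id="read")
tables.on_io_issue("sda:2048", pid=7, ts_ns=110)
```

Because completed requests are deleted, any record still present for a process is by construction an incomplete request, which is what the current time event reports as a missing completion.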

[0133] Figure 6 shows a block diagram of an electronic device 600 according to an embodiment of the present disclosure.

[0134] Referring to Figure 6, the electronic device 600 includes at least one memory 601 and at least one processor 602. The at least one memory 601 stores computer-executable instructions. When the computer-executable instructions are executed by the at least one processor 602, the at least one processor 602 performs the process state analysis method as described above.

[0135] As an example, electronic device 600 may be a PC, tablet, personal digital assistant, smartphone, or other device capable of executing the aforementioned instructions. Here, electronic device 600 is not necessarily a single electronic device, but may be a collection of any devices or circuits capable of executing the aforementioned instructions (or instruction sets) individually or in combination. Electronic device 600 may also be part of an integrated control system or system manager, or may be configured to interconnect with a portable electronic device locally or remotely (e.g., via wireless transmission) through an interface.

[0136] In electronic device 600, processor 602 may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, processor may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, etc.

[0137] The processor 602 can execute instructions or code stored in the memory 601, which can also store data. Instructions and data can also be sent and received via a network through a network interface device, which can employ any known transmission protocol.

[0138] The memory 601 may be integrated with the processor 602, for example, by placing RAM or flash memory within an integrated circuit microprocessor. Alternatively, the memory 601 may include a separate device, such as an external disk drive, a storage array, or other storage device that can be used by any database system. The memory 601 and the processor 602 may be operatively coupled, or may communicate with each other, for example, via I / O ports, network connections, etc., enabling the processor 602 to read files stored in the memory.

[0139] In addition, the electronic device 600 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 600 can be interconnected via a bus and / or network.

[0140] According to exemplary embodiments of the present disclosure, a computer-readable storage medium may also be provided, which stores a computer program that, when executed, implements the process state analysis method according to the present disclosure. Examples of computer-readable storage media include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), card storage (such as multimedia cards, secure digital (SD) cards, or extreme digital (xD) cards), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide the computer program and any associated data, data files, and data structures to a processor or computer so that the processor or computer can execute the computer program. The computer program in the aforementioned computer-readable storage medium can run in an environment deployed in computer devices such as clients, hosts, agent devices, servers, etc. Furthermore, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system, such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed manner through one or more processors or computers.

[0141] According to exemplary embodiments of the present disclosure, a computer program product may also be provided, wherein instructions in the computer program product are executed by at least one processor in an electronic device to perform the process state analysis method as described above.

[0142] The process state analysis methods, devices, electronic devices, storage media, and products described above construct process event sequences by acquiring process summary information and I / O request information. Based on process event sequence matching rules, they analyze the reasons why a process remains in the D state for an extended period. They can automatically construct blocking chains and automatically locate the cause of blocking when a process remains in the D state for a long time, thus automating D state location and reducing manual troubleshooting costs. Furthermore, by associating system calls, initiating I / O requests, entering scheduling wait points, and current events to reconstruct the blocking chain, they facilitate analysis of blocking causes. They continuously maintain process-level summary information in kernel mode with low overhead for rapid location. Moreover, by employing strategies derived from summary information and events (or problems), they support configurable probe points to ensure controllability and support event-driven and / or pre-defined periodic recording to ensure low overhead.

[0143] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the following claims.

[0144] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.

Claims

1. A process state analysis method, characterized in that the process state analysis method includes: if the cumulative waiting time of a target process indicated by a first process identifier in a first storage table exceeds a first time threshold, or the duration for which the target process has been in an uninterruptible waiting state exceeds a second time threshold, determining that the target process has been in the uninterruptible waiting state for a long time, and obtaining summary information of the target process from the first storage table and input/output request information of the target process from a second storage table; based on the acquired summary information and input/output request information, constructing an event sequence of the target process according to the chronological order in which events occur within the process; matching the constructed event sequence against multiple preset rules; and based on the matching results of the multiple rules, analyzing the reason why the target process has been in the uninterruptible waiting state for a long time.

2. The process state analysis method as described in claim 1, characterized in that the first and second storage tables are extended Berkeley Packet Filter (eBPF) key-value tables, and the information in the first and second storage tables is obtained by export through a ring buffer.

3. The process state analysis method as described in claim 1, characterized in that, The first storage table stores summary information of the process indicated by the process identifier, wherein the summary information of the process indicated by the process identifier includes at least one of the time of the most recent system call, the type identifier of the most recent system call, the identifier of the most recently executed kernel function, the hash value of the target resource of the most recent operation, and the sum of wait times. The second storage table stores records of incomplete input / output requests including process identifiers and timestamps, wherein the timestamp records the time when the process indicated by the process identifier initiated the input / output request.
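As an illustration of the two storage tables in claim 3, the following is a minimal user-space sketch in Python. It is not the eBPF map definition itself; the class names, field names, and the `lookup` helper are hypothetical and chosen only to mirror the fields listed in the claim.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ProcessSummary:
    """One row of the first storage table, keyed by process identifier."""
    last_syscall_time_ns: int   # time of the most recent system call
    last_syscall_id: int        # type identifier of the most recent system call
    last_kernel_func_id: int    # identifier of the most recently executed kernel function
    last_resource_hash: int     # hash of the target resource of the most recent operation
    total_wait_ns: int          # sum of wait times


@dataclass
class IoRequestRecord:
    """One row of the second storage table: an incomplete I/O request."""
    pid: int                    # process identifier
    issue_time_ns: int          # when the process initiated the request


def lookup(summaries: Dict[int, ProcessSummary],
           pending: List[IoRequestRecord],
           pid: int) -> Tuple[Optional[ProcessSummary], List[IoRequestRecord]]:
    """Return the summary row and the incomplete I/O records of one process."""
    summary = summaries.get(pid)
    requests = [r for r in pending if r.pid == pid]
    return summary, requests
```

In this sketch a process with no summary row and no pending request simply yields `(None, [])`.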

4. The process state analysis method as described in claim 1, characterized in that, Based on the acquired summary information and the input / output request information, an event sequence of the target process is constructed according to the chronological order of events occurring within the process, including: Based on the acquired summary information and the input / output request information, the event sequence of the target process is constructed in the order of initiating a system call event, initiating an input / output event, entering a scheduling wait point event, and the current moment event.

5. The process state analysis method as described in claim 4, characterized in that the time of the most recent system call and the type identifier of the most recent system call included in the summary information are used, respectively, as the system call time and the system call type identifier included in the initiating system call event; the identifier of the most recently executed kernel function included in the summary information is used as the kernel function identifier included in the entering scheduling wait point event; and the time point at which acquisition of the summary information and the input/output request information begins is used as the current time included in the current moment event. When the input/output request information includes a record of an input/output request of the target process, the initiating input/output event includes that record as its input/output request record, and the current moment event further includes an indication that completion of the input/output request is missing.
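The event sequence of claims 4 and 5 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the dictionary keys, the event type labels, and the function name `build_event_sequence` are hypothetical, and the sketch always emits the initiating input/output event (with an empty record list when no request is pending), which the claims do not prescribe.

```python
from typing import Any, Dict, List


def build_event_sequence(summary: Dict[str, int],
                         io_records: List[Dict[str, Any]],
                         snapshot_time_ns: int) -> List[Dict[str, Any]]:
    """Order the four events as: system call, I/O issue, scheduling wait, now."""
    return [
        {   # initiating system call event, taken from the summary information
            "type": "syscall_enter",
            "time_ns": summary["last_syscall_time_ns"],
            "syscall_id": summary["last_syscall_id"],
        },
        {   # initiating input/output event, carrying any incomplete requests
            "type": "io_issue",
            "records": list(io_records),
        },
        {   # entering scheduling wait point event
            "type": "sched_wait_enter",
            "kernel_func_id": summary["last_kernel_func_id"],
        },
        {   # current moment event: time at which acquisition began
            "type": "now",
            "time_ns": snapshot_time_ns,
            # a pending record means the completion of that request is missing
            "io_completion_missing": bool(io_records),
        },
    ]
```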

6. The process state analysis method as described in claim 5, characterized in that, Based on the matching results of the multiple rules, the reasons why the target process remains in an uninterruptible waiting state for a long time are analyzed, including: In response to a matching result that matches one of the multiple rules, the reason is determined to be the reason pointed to by the rule; and In response to a matching result that matches at least two of the multiple rules, the reason is determined to be the reason pointed to by the rule whose confidence level satisfies the first condition, wherein each of the multiple rules is set with a confidence level.
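The selection logic of claim 6 can be sketched as below. The claim leaves the "first condition" open; this Python sketch assumes it means "highest confidence level", and the function name `select_reason` is hypothetical.

```python
from typing import List, Optional, Tuple


def select_reason(matches: List[Tuple[str, float]]) -> Optional[str]:
    """Pick a reason from matched rules given as (reason, confidence) pairs."""
    if not matches:
        return None                 # no rule matched; handled elsewhere
    if len(matches) == 1:
        return matches[0][0]        # exactly one rule matched: use its reason
    # Two or more rules matched: assumed first condition is highest confidence.
    return max(matches, key=lambda m: m[1])[0]
```

For example, `select_reason([("block device not responding", 0.9), ("lock not released for too long", 0.6)])` returns the block device reason under this assumption.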

7. The process state analysis method as described in claim 6, characterized in that the multiple rules include at least one of the following: Rule 1: if the system call type identifier in the initiating system call event points to one of read, write, or file synchronization, the initiating input/output event includes a record of a block input/output request, and the time elapsed between the current time in the current moment event and the timestamp in that record exceeds a third time threshold, the rule points to the reason that the block device is not responding. Rule 2: if the system call type identifier in the initiating system call event points to read or write, the initiating input/output event includes a record of a network file system input/output request, and the current moment event includes the missing completion of the network file system input/output request, the rule points to the reason that the network file system server is not responding. Rule 3: if the system call type identifier in the initiating system call event points to a fast user-space mutex (futex), and the kernel function identifier in the entering scheduling wait point event points to entering a fast user-space mutex wait queue and being suspended, the rule points to the reason that the lock has not been released for too long. Here, the input/output request record of the target process includes a record of a block input/output request of the target process and/or a record of a network file system input/output request of the target process; the initiating input/output event includes the record of the block input/output request and/or the record of the network file system input/output request; and the current moment event includes the missing completion of the block input/output request and/or the missing completion of the network file system input/output request.
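The three rules of claim 7 can be expressed as predicates over an event sequence. The Python sketch below is illustrative only: the flattened dictionary layout, the string labels such as `"fsync"` and `"futex_wait_queue"`, and the 30-second value used for the third time threshold are all assumptions, not values taken from the disclosure.

```python
from typing import Any, Callable, Dict, List, Tuple

THIRD_THRESHOLD_NS = 30 * 1_000_000_000  # assumed value of the third time threshold

Seq = Dict[str, Any]


def rule_block_device(seq: Seq) -> bool:
    """Rule 1: read/write/fsync with a block I/O request pending too long."""
    overdue = any(
        r["kind"] == "block" and seq["now_ns"] - r["issue_time_ns"] > THIRD_THRESHOLD_NS
        for r in seq["io_records"]
    )
    return seq["syscall"] in {"read", "write", "fsync"} and overdue


def rule_nfs_server(seq: Seq) -> bool:
    """Rule 2: read/write with an NFS request whose completion is missing."""
    has_nfs = any(r["kind"] == "nfs" for r in seq["io_records"])
    return seq["syscall"] in {"read", "write"} and has_nfs and seq["nfs_completion_missing"]


def rule_lock_held(seq: Seq) -> bool:
    """Rule 3: futex system call suspended in the futex wait queue."""
    return seq["syscall"] == "futex" and seq["wait_func"] == "futex_wait_queue"


RULES: List[Tuple[str, Callable[[Seq], bool]]] = [
    ("block device not responding", rule_block_device),
    ("network file system server not responding", rule_nfs_server),
    ("lock not released for too long", rule_lock_held),
]


def match_rules(seq: Seq) -> List[str]:
    """Return the reasons pointed to by every rule the sequence matches."""
    return [reason for reason, predicate in RULES if predicate(seq)]
```

Keeping each rule as a separate predicate makes it straightforward to attach a confidence level per rule, as claim 6 requires.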

8. The process state analysis method as described in claim 6, characterized in that analyzing, based on the matching results of the multiple rules, the reason why the target process has been in the uninterruptible waiting state for a long time further includes: in response to a matching result in which none of the multiple rules is matched, obtaining, for each candidate reason among the candidate reasons, at least one score with respect to at least one set criterion; based on at least one weight set for the at least one criterion and the at least one score of each candidate reason, obtaining a weighted score for each candidate reason; and determining the reason to be the candidate reason whose weighted score satisfies a second condition.

9. The process state analysis method as described in claim 8, characterized in that the candidate reasons include at least one of the following: block device not responding, network file system server not responding, and lock not being released for too long. The at least one criterion includes at least one of the following: Criterion 1: the historical association frequency between the system call type indicated by the system call type identifier in the initiating system call event and the candidate reason. Criterion 2: the call stack distance between the kernel function indicated by the kernel function identifier in the entering scheduling wait point event and a plurality of blocking kernel functions corresponding to the candidate reason, wherein the plurality of blocking kernel functions includes at least one of an input/output call, a network file system read, a network file system write, and entering a fast user-space mutex wait queue and being suspended. Criterion 3: the duration between the current time in the current moment event and the timestamp in the record of the initiating input/output event.
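The fallback scoring of claims 8 and 9 amounts to a weighted sum per candidate reason. The Python sketch below assumes that the "second condition" means "highest weighted score" and that per-criterion scores are already normalized to a common scale; the function name, weights, and scores are hypothetical.

```python
from typing import Dict, List, Tuple


def pick_by_weighted_score(scores: Dict[str, List[float]],
                           weights: List[float]) -> Tuple[str, Dict[str, float]]:
    """Return the candidate reason with the highest weighted score.

    scores maps each candidate reason to one score per criterion, in the
    same order as weights (criterion 1, criterion 2, criterion 3).
    """
    totals = {
        reason: sum(w * s for w, s in zip(weights, per_criterion))
        for reason, per_criterion in scores.items()
    }
    best = max(totals, key=lambda reason: totals[reason])
    return best, totals
```

With weights `[0.5, 0.3, 0.2]` and scores `[0.8, 0.5, 0.9]` for the block device candidate, the weighted score is 0.5·0.8 + 0.3·0.5 + 0.2·0.9 = 0.73.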

10. The process state analysis method as described in claim 1, characterized in that it further includes: periodically or in an event-driven manner, checking the sum of the waiting times of the process indicated by the process identifier in the first storage table, as well as the duration for which the process has been in the uninterruptible waiting state.
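The check described in claim 10, combined with the two thresholds of claim 1, can be sketched as a scan over the first storage table. In this Python sketch the field `d_state_since_ns` (the time at which the process entered the uninterruptible waiting state, or `None` if it is not in that state) is an assumption, as are the function name and the threshold values in the usage note.

```python
from typing import Dict, List, Optional


def find_long_d_state(summaries: Dict[int, Dict[str, Optional[int]]],
                      now_ns: int,
                      first_threshold_ns: int,
                      second_threshold_ns: int) -> List[int]:
    """Return PIDs whose waiting time or D-state duration exceeds a threshold."""
    hits = []
    for pid, row in summaries.items():
        since = row.get("d_state_since_ns")
        d_state_ns = now_ns - since if since is not None else 0
        # Either condition alone is sufficient, as in claim 1.
        if row["total_wait_ns"] > first_threshold_ns or d_state_ns > second_threshold_ns:
            hits.append(pid)
    return sorted(hits)
```

A caller could invoke this from a timer for periodic checking, or from an event handler for event-driven checking; both schedules use the same scan.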

11. The process state analysis method as described in claim 3, characterized in that it further includes: collecting kernel monitoring data with an extended Berkeley Packet Filter (eBPF) acquisition module through multiple probe points placed in the kernel; and updating the first and second storage tables using the collected kernel monitoring data, wherein the collected kernel monitoring data is recorded in an event-driven manner and/or at a predetermined period.

12. The process state analysis method as described in claim 11, characterized in that updating the first and second storage tables using the collected kernel monitoring data includes: when the process indicated by the process identifier initiates a system call, updating the time and type identifier of the most recent system call, which are included in the summary information of that process in the first storage table, with the time and type identifier of that system call; when the process indicated by the process identifier initiates an input/output request, recording information including resource information, the process identifier, and a timestamp in the second storage table as an input/output request record, wherein the input/output request may include a block input/output request or a network file system input/output request; and when the input/output request completes, deleting the record of the input/output request from the second storage table.
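The table updates of claim 12 can be modeled in user space as three handlers. This Python sketch is not eBPF code and does not attach to any kernel probe; the class `Tables`, its method names, and the choice of `(pid, resource)` as the key of the second table are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple


class Tables:
    """User-space model of the first and second storage tables."""

    def __init__(self) -> None:
        self.summaries: Dict[int, Dict[str, int]] = {}
        self.pending_io: Dict[Tuple[int, str], Dict[str, Any]] = {}

    def on_syscall_enter(self, pid: int, syscall_id: int, now_ns: int) -> None:
        """Refresh the most recent system call time and type for this process."""
        row = self.summaries.setdefault(pid, {})
        row["last_syscall_time_ns"] = now_ns
        row["last_syscall_id"] = syscall_id

    def on_io_issue(self, pid: int, resource: str, kind: str, now_ns: int) -> None:
        """Record an incomplete request; kind is 'block' or 'nfs' in this sketch."""
        self.pending_io[(pid, resource)] = {
            "pid": pid,
            "resource": resource,
            "kind": kind,
            "issue_time_ns": now_ns,
        }

    def on_io_complete(self, pid: int, resource: str) -> None:
        """Delete the record once the request completes; ignore unknown keys."""
        self.pending_io.pop((pid, resource), None)
```

Because completed requests are deleted, any record still present in the second table when a process is analyzed corresponds to a request whose completion is missing.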

13. A process state analysis device, characterized in that, The process state analysis device includes: The acquisition module is configured to: determine that the target process has been in an uninterruptible waiting state for a long time when the sum of the waiting times of the target process indicated by the first process identifier in the first storage table exceeds a first time threshold, or the duration of the target process in an uninterruptible waiting state exceeds a second time threshold; and acquire the summary information of the target process from the first storage table and the input / output request information of the target process from the second storage table. The construction module is configured to construct an event sequence of the target process according to the time order in which the events occur in the process, based on the acquired summary information and the input / output request information. The matching module is configured to match multiple pre-defined rules based on the constructed event sequence; and The analysis module is configured to analyze the reasons why the target process is in an uninterruptible waiting state for a long time based on the matching results of the multiple rules.

14. An electronic device, characterized in that it includes: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the process state analysis method as described in any one of claims 1 to 12.

15. A computer-readable storage medium, characterized in that, It stores a computer program that, when executed, implements the process state analysis method according to any one of claims 1 to 12.

16. A computer program product, characterized in that, The instructions in the computer program product are executed by at least one processor in an electronic device to perform the process state analysis method as described in any one of claims 1 to 12.