Proactive troubleshooting method for provisioning workflow for efficient cloud operations
By using the Local Outlier Factor (LOF) to identify outliers and by classifying provisioning steps into response categories, stuck states in cloud provisioning are detected automatically, enabling efficient automated handling and improved customer satisfaction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INTERNATIONAL BUSINESS MACHINES CORPORATION
- Filing Date
- 2021-06-21
- Publication Date
- 2026-06-19
AI Technical Summary
In the cloud provisioning of virtual machines or bare metal servers, existing technologies struggle to detect stalled provisioning steps and to automate their handling, so the detection and repair of overdue provisions becomes a performance bottleneck. Furthermore, existing anomaly detection methods provide no guidance on when to intervene.
Systems, methods, and computer program products are configured to identify outliers in non-intervention data using the Local Outlier Factor (LOF), perform iterative grouping and response mapping, and automatically provide an intervention response strategy that categorizes provisions as needing immediate attention, delayed attention, or no attention.
This improves the efficiency of automated detection in the provisioning process, reduces labor costs and average detection time, and improves average resolution time and customer satisfaction.
Figure CN116249979B_ABST
Description
Background Technology
[0001] Provisioning virtual machines or bare-metal servers in the cloud involves multiple provisioning steps. During these steps, unstructured, timestamped logs are generated as the workflow progresses through the provisioning steps, and these logs reflect the provisioning status. Of the tens of thousands of Virtual Server Instance (VSI) provisions that occur daily, hundreds may end up stuck in a given provisioning step. Proactive detection and remediation of overdue provisions typically involves manually identifying the errors that cause a provision to be stuck, which can be a significant performance bottleneck. Furthermore, not all overdue provisions require manual attention, as some are capable of self-resolution. Current anomaly detection methods do not provide guidance on when to intervene.
Summary of the Invention
[0002] The disclosed embodiments include systems, methods, and computer program products configured to determine intervention response categories for a provisioning workflow. In embodiments, the systems, methods, and computer program products are configured to: determine provisioning features of a provisioning step; perform outlier detection to identify and remove outliers from non-intervention data, generating a non-intervention normal dataset; perform iterative grouping on the non-intervention normal dataset to determine significant variables among the provisioning features; and perform a provisioning response mapping that uses the results of the iterative grouping, including the significance of errors and their presence in the non-intervention normal data and the partial intervention data, to classify the provision into a response category.
[0003] Optionally, in some embodiments, the response category is one of immediate attention, delayed attention, and no attention.
[0004] Optionally, in some embodiments, the systems, methods, and computer program products use the Local Outlier Factor (LOF) to identify outliers in the non-intervention data.
[0005] Optionally, in some embodiments, the system, method, and computer program product are configured to: calculate the step duration of the provisioning step for all provisions using an event log associated with the provisioning step; identify provisioning parameters of the provisioning step; and identify error messages in the event log that occur as part of a provision in the provisioning step.
[0006] Optionally, in some embodiments, the system, method, and computer program product are configured to calculate an F-statistic based on the provisioning parameters, the error message, and the step duration; determine significant variables based on the F-statistic; and cluster the non-intervention normal dataset into multiple clusters based on combinations of the values of the significant variables.
[0007] Optionally, in some embodiments, the system, method, and computer program product are configured to compare the percentage of data points exceeding n*interquartile range with a threshold for each cluster to determine whether the cluster is a candidate for further segmentation, where n is variable.
[0008] Optionally, in some embodiments, the system, method, and computer program product are configured to calculate a duration threshold for each cluster by computing an n*interquartile range for each cluster, where n is variable.
[0009] Optionally, in some embodiments, the system, method, and computer program product are configured to classify the response category as delayed attention when, as a result of the iterative grouping, an error exists in both the partial intervention data and the non-intervention normal dataset and the error is a significant variable.
[0010] Optionally, in some embodiments, the system, method, and computer program product are configured to calculate an alarm time for errors, wherein the alarm time for an error is the maximum of all duration thresholds across all clusters having non-zero and non-empty duration thresholds.
[0011] Optionally, in some embodiments, the system, method, and computer program product are configured to classify the response category as immediate attention when an error exists in the partial intervention data and does not exist in the non-intervention normal dataset.
[0012] Optionally, in some embodiments, the system, method, and computer program product are configured to classify the response category as no attention when an error exists in the non-intervention normal dataset and does not exist in the partial intervention data.
[0013] Other embodiments and advantages of the disclosed embodiments are further described in the detailed description.
Attached Figure Description
[0014] To gain a more complete understanding of this disclosure, reference is now made to the following brief description taken in conjunction with the accompanying drawings and specific embodiments, wherein similar reference numerals denote similar parts.
[0015] Figure 1 is a schematic diagram illustrating a system for determining the intervention response category of a provisioning workflow according to an embodiment of the present disclosure.
[0016] Figure 2 is a flowchart illustrating a method for determining the intervention response category of a provisioning workflow according to an embodiment of the present disclosure.
[0017] Figure 3 is a diagram illustrating the provisioning features according to embodiments of the present disclosure.
[0018] Figure 4 is a graph depicting skewness relative to the number of nearest neighbors (k) according to embodiments of the present disclosure.
[0019] Figure 5 is a diagram illustrating a subset of the skewness set according to embodiments of the present disclosure.
[0020] Figure 6 is a diagram illustrating a first local maximum / minimum according to an embodiment of the present disclosure.
[0021] Figure 7 is a flowchart illustrating a method for performing iterative grouping according to an embodiment of the present disclosure.
[0022] Figure 8 is a graph illustrating alarm time thresholds for significant variables according to embodiments of the present disclosure.
[0023] Figure 9 is a graph illustrating the results of training and testing on data from disjoint time periods according to embodiments of the present disclosure.
[0024] Figure 10 is a block diagram illustrating the hardware architecture of a system that can implement aspects of the illustrative embodiments according to embodiments of the present disclosure.
[0025] The accompanying drawings are merely illustrative and are not intended to assert or imply any limitation regarding the environment, architecture, design, or process in which different embodiments may be implemented.
Detailed Implementation
[0026] First, it should be understood that although illustrative embodiments of one or more examples are provided below, the disclosed systems, computer program products, and / or methods can be implemented using any number of techniques (whether currently known or existing). This disclosure should in no way be limited to the illustrative embodiments, drawings, and techniques shown below, but includes the exemplary designs and implementations illustrated and described herein, and can be modified within the full scope of the appended claims and their equivalents.
[0027] As used in the written disclosure and claims, the terms “comprising” and “including” are used in an open-ended manner and should therefore be construed as meaning “including, but not limited to”. Unless otherwise indicated, as used throughout this document, “or” need not be mutually exclusive, and the singular forms “a,” “an,” and “the” are intended to also include the plural forms unless the context clearly indicates otherwise.
[0028] As referenced herein, a module or unit may include one or more hardware or electronic components, such as circuitry, processors, and memory, which may be specifically configured to perform a particular function. Memory may be volatile or non-volatile memory storing data such as, but not limited to, computer-executable instructions, machine code, and other different forms of data. A module or unit may be configured to use data to execute one or more instructions to perform one or more tasks. In some cases, a module may also refer to a specific set of functions, software instructions, or circuitry configured to perform a particular task. For example, a module may include software components such as, but not limited to, data access objects, service components, user interface components, and application programming interface (API) components; hardware components such as circuitry, processors, and memory; and / or combinations thereof. As referenced herein, computer-executable instructions may be of any form, including but not limited to machine code, assembly code, and high-level programming code written in any programming language.
[0029] Figure 1 is a schematic diagram illustrating a system 100 for determining the intervention response category of a provisioning workflow according to embodiments of the present disclosure. System 100 includes a provisioning system 110 that communicates with one or more clients 102 to provision one or more Virtual Server Instances (VSIs) 104. Alternatively, the disclosed embodiments may provision any of a variety of computing resources, including but not limited to bare metal servers or VSIs. For example, client 102 may request provisioning system 110 to provision a VSI or a bare metal server in the cloud. In embodiments, provisioning system 110 may provision VSIs or bare metal servers in the cloud thousands of times per day. Each provision may involve hundreds of steps, during which unstructured, timestamped logs are generated as the workflow progresses through the provisioning steps; these logs reflect the provisioning status. During the provisioning process, a provision may get stuck (i.e., be unable to proceed or complete) in one or more provisioning steps. Log data 112 may include different types of error messages indicating the potential cause of the stuck provision, some of which may be fatal errors.
[0030] Normal provisioning behavior occurs when a provision completes without stalling at any stage and therefore requires no intervention of any kind. In practice, provisions lack labels that characterize normal behavior. Given the sheer volume of provisions (e.g., 50,000 per day), labeling them is not an easy task. On the other hand, existing anomaly detection methods, such as those based on event log frequency counting, do not provide any guidance on when or whether to intervene in the event of an error. A provision may take longer than normal to complete; however, not all overdue provisions require manual attention, as some are capable of self-resolution.
[0031] To improve upon existing methods, this disclosure uses historical knowledge of provisions in which intervention did and did not occur to classify provisions into response categories based on provisioning errors; the categories indicate whether and when to intervene in a provision. The disclosed embodiments deliver high business value by reducing costs, manpower, and average detection time, which in turn improves average resolution time and customer satisfaction.
[0032] Referring specifically to Figure 1, the provisioning monitoring system 120 obtains log data 112. The term log data can include data provided in various ways (e.g., from a database or from an event log). Using historical knowledge of non-intervention data and intervention data, the provisioning monitoring system 120 provides response categories that indicate whether and when to intervene in a provision. In an embodiment, the provisioning monitoring system 120 includes an outlier detection module 130, an iterative grouping module 132, and a response mapping module 134. The outlier detection module 130 receives non-intervention data 122 as input. Non-intervention data 122 is training data for which no human or system intervention was recorded. A provision may have no recorded intervention for several reasons: it was not stuck in any provisioning step (considered normal behavior); it was stuck, but resources for timely attention were lacking (considered an outlier); or the underlying problem was being addressed in a related but different provision (also considered an outlier). As further described below, the outlier detection module 130 divides the non-intervention data 122 into two groups: non-intervention normal-behavior provisions and outlier provisions. In this embodiment, the outlier detection module 130 uses a Local Outlier Factor (LOF) conditioned on skewness to generate a non-intervention outlier dataset 124 and a non-intervention normal dataset 126. LOF is based on the concept of local density, where locality is given by the nearest neighbors. The non-intervention normal dataset 126 is the non-intervention data 122 with outliers removed. The iterative grouping module 132 takes the non-intervention normal dataset 126 as input and groups it to determine error significance and to compute duration thresholds for different combinations of values of the significant variables of the provision.
As used herein, the terms 'cluster' and 'group', and 'sub-cluster' and 'sub-group', are used interchangeably. The response mapping module 134 uses the error significance determined by the iterative grouping module 132, optionally the duration thresholds resulting from the iterative grouping, and the presence of errors in the non-intervention normal data 126 and the partial intervention data 128 to classify the provision into a response category 136. Partial intervention data 128 is training data for which a human or system intervention is known. The term "partial" indicates that partial intervention data 128 may not include all provisions that would benefit from intervention. For example, because of the large volume of provisions, a limited number of cloud operators cannot resolve all stuck provisions. If full intervention data is available, the method works equally well with it. In one embodiment, response category 136 is one of immediate attention, delayed attention (i.e., giving the stuck provision time to resolve itself), or no attention.
[0033] Figure 2 is a flowchart illustrating a method 200 for determining an intervention response category for a provisioning workflow according to an embodiment of the present disclosure. Method 200 may be performed by a system such as, but not limited to, the provisioning monitoring system 120 of Figure 1. A provision is typically broken down into individual steps, such as first locating available hosts, then configuring the network, and then configuring the storage. The described method can be applied one step at a time or to a group of steps at a time. At most, this group of steps can be the entire provision.
[0034] For each provision occurring within a given time window, method 200 begins in step 202 by extracting provision-specific features from a set of event logs, such as the log data 112 of Figure 1. The term "log data" refers to data from various sources, potentially including data stored in a database. As shown in Figure 3, in an embodiment the provisioning features include provisioning parameters 306. Additional provisioning features specific to the provisioning step include the calculated step duration (st_i) 304 and the templated errors 308 associated with the provisioning step. In an embodiment, method 200 identifies the timestamp of the message or event in the event log that indicates the start of the provisioning step and the timestamp of the message or event that indicates its end, in order to determine the step duration (st_i) 304 of each provision having unique identifier PID_i in the provisioning step. Method 200 calculates the step duration 304 as the end timestamp minus the start timestamp. As part of step 202, method 200 determines values for the different provisioning parameters 306 associated with the provision. In an embodiment, the provisioning parameters 306 are user-specified. Non-limiting examples of provisioning parameters 306 include, as shown in Figure 3, the operating system (OS) type, disk capacity, number of central processing units (CPUs), and random access memory (RAM) size. Furthermore, as part of step 202, method 200 identifies, for the provision with unique identifier PID_i, the error messages in the event log that occur during the provisioning step. The event log may contain numerous error messages for the provision in the provisioning step. Moreover, a normal provision that requires no intervention may also have associated error messages in the event log. For example, an error message might indicate that a connection failed to open in the first few attempts, without causing the provision to stop.
Error messages may contain provision-specific details, such as Internet Protocol (IP) addresses, device ID information, or other information specific to a particular provision. To make errors comparable for analysis, method 200 aggregates errors by removing provision-specific information from error messages to create templated errors 308. In an embodiment, method 200 characterizes each provisioning data point (d_i) as d_i: {PID_i, provisionParameters_i, templatedError_i, encodedError_i, st_i}. Templated errors are further mapped to unique numbers. This process is called encoding and results in encoded error values. Encoded error values can be numbers such as 0, 1, 2, etc., where each number maps to a unique templated error. The case where a provision has no associated error in a given step is also given a unique encoded error value corresponding to "no error". In the embodiments, each provision (in any given step) is associated with either no error or one error. Note that an "error" can be defined as a single error message or a set of error messages; each "error" corresponds to a unique combination of (templated) error messages. Other non-numeric provisioning parameters are converted to encoded values in a similar manner. For example, if the provisioning parameter 'OS_type' can take values from the set {'ubuntu', 'windows', 'centos'}, then encoded_os_type takes values from {0, 1, 2}, where 0 maps to 'ubuntu', 1 maps to 'windows', and so on. Note that such a mapping does not impose any ordering on the variables.
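The feature extraction of step 202 can be illustrated with a minimal sketch. The regular expressions, the timestamp format, and the names `templatize`, `step_duration`, and `Encoder` are assumptions made for illustration only; the disclosure does not specify how provision-specific details are recognized or how timestamps are formatted.

```python
import re
from datetime import datetime

# Hypothetical patterns for provision-specific details (assumptions, not from
# the disclosure); real logs would likely need a different set.
_IP = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
_HEX_ID = re.compile(r"\b[0-9a-f]{8,}\b")
_NUM = re.compile(r"\b\d+\b")


def templatize(message):
    """Replace provision-specific details with placeholders (templated error)."""
    message = _IP.sub("<IP>", message)
    message = _HEX_ID.sub("<ID>", message)
    return _NUM.sub("<NUM>", message)


def step_duration(start, end, fmt="%Y-%m-%dT%H:%M:%S"):
    """Step duration in seconds: end timestamp minus start timestamp."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


class Encoder:
    """Map each distinct value to a unique integer code; no ordering is implied."""

    def __init__(self):
        self.codes = {}

    def encode(self, value):
        # A new value receives the next unused code; a known value keeps its code.
        return self.codes.setdefault(value, len(self.codes))


errors = Encoder()
errors.encode("no error")  # reserve a code for provisions without an error
template = templatize("connection to 10.0.0.5 failed on device 3fa85f6457174562")
data_point = {
    "PID": "provision-1",  # hypothetical identifier
    "templatedError": template,
    "encodedError": errors.encode(template),
    "st": step_duration("2021-06-21T10:00:00", "2021-06-21T10:02:03"),
}
```

Two messages that differ only in their IP address or device ID collapse to the same templated error and therefore receive the same encoded value.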
[0035] In step 204, method 200 performs outlier detection to identify and remove outliers from the non-intervention data, resulting in two disjoint sets: a non-intervention normal dataset and a non-intervention outlier dataset. Method 200 improves upon the Local Outlier Factor (LOF) technique used for outlier detection. LOF is based on the concept of local density. By comparing the local density of a point with the local densities of its k nearest neighbors, method 200 can identify regions of similar, higher, or lower density compared to their neighbors. A distance metric such as the Manhattan distance can be used to measure the distance between two points. In an embodiment, for a given provisioning step, LOF is calculated on the step durations of the provisions in the non-intervention set, conditioned on skewness. LOF splits the data into two sets for a predetermined value of k, where k is the number of nearest neighbors. In method 200, different values of k are considered. In one embodiment, k can be chosen as a percentage of the dataset size. The non-intervention dataset can be represented as g = {d_i}, where d_i is a provisioning data point whose step duration is given by st_i 304. Method 200 can enumerate the values of k as K = {(j / 100)*N}, where j is a percentage of the dataset size and N is the number of data points in g. For each k_j ∈ K, method 200 iterates over the following steps: (1) for each data point d_i, calculate the LOF given g and k_j; (2) calculate the outlier dataset, using the Manhattan distance metric, where t2 is a preset threshold on step_duration and t is a threshold representing, for example, a near-Gaussian distribution among data points; (3) calculate the trimmed normal dataset; and (4) calculate the skewness of the trimmed normal data. Figure 4 is an example graph 400 showing skewness plotted for various values of k (from 0 to 2500) according to embodiments of the present disclosure.
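A minimal sketch of one iteration of this loop, using only the Python standard library, is given here. It computes a textbook LOF on one-dimensional step durations, where the Manhattan distance reduces to an absolute difference. Two points are assumptions rather than statements of the disclosed method: ties at the k-th neighbor are broken by index instead of enlarging the neighborhood, and a point is treated as an outlier when its LOF exceeds `lof_threshold` and its duration exceeds `duration_threshold`, since the exact rule combining t and t2 is not recoverable from the text. All function and parameter names are hypothetical.

```python
def candidate_ks(n_points, percents):
    """K = {(j / 100) * N}: neighbour counts as percentages j of dataset size N."""
    return [j * n_points // 100 for j in percents if j * n_points // 100 >= 1]


def lof_scores(durations, k):
    """Local Outlier Factor of each duration (requires k < len(durations))."""
    n = len(durations)
    neighbors = []  # per point: the k nearest (distance, index) pairs
    for i in range(n):
        others = sorted(
            (abs(durations[i] - durations[j]), j) for j in range(n) if j != i
        )
        neighbors.append(others[:k])
    k_dist = [nb[-1][0] for nb in neighbors]  # distance to the k-th neighbour

    def local_reachability_density(i):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        return len(reach) / max(sum(reach), 1e-12)  # guard against duplicates

    lrd = [local_reachability_density(i) for i in range(n)]
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


def skewness(values):
    """Population skewness: third central moment over variance to the 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def split_for_k(durations, k, lof_threshold, duration_threshold):
    """Return (trimmed normal data, outliers, skewness of the trimmed data)."""
    normal, outliers = [], []
    for d, score in zip(durations, lof_scores(durations, k)):
        # Assumed rule: outlier only when both thresholds are exceeded.
        is_outlier = score > lof_threshold and d > duration_threshold
        (outliers if is_outlier else normal).append(d)
    return normal, outliers, skewness(normal)
```

Calling `split_for_k` once per value in `candidate_ks(...)` yields the skewness values that are plotted against k in a graph such as Figure 4.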
The calculated skewness values s_j are used to create a set S. Given the set of skewness values S = {s_j} and the nearest-neighbor values K = {k_j}, method 200 determines the subset {s_m} of S such that S1 > s_m > S2, as shown in Figure 5. S1 and S2 are skewness thresholds used to capture the self-resolving capability of the provisions, i.e., the ability of a provision to leave a stuck state without any human intervention. As shown in Figure 6, method 200 calculates the first local maximum / minimum by solving the second-order difference on the skewness subset {s_m}. In an embodiment, method 200 uses an approximation algorithm to determine the optimal k. The approximation of the optimal k is given by k_ml, where l is the index of the first local maximum / minimum, i.e., the maximum / minimum of the second-order differences of the skewness values in the set {s_m}, as shown in Figure 6. By running LOF with k = k_ml, method 200 divides the non-intervention data into a non-intervention normal dataset and a non-intervention outlier dataset.
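The selection of k can be sketched as follows. This is one possible reading of the text, not a definitive implementation: the sketch keeps the skewness values inside the band S2 < s < S1, computes their second-order differences, and returns the k belonging to the first strict local maximum or minimum of those differences. The function name `select_k` and the handling of plateaus (ignored) are assumptions.

```python
def select_k(ks, skews, s1, s2):
    """Pick k at the first local extremum of the second-order differences
    of the skewness values that lie in the band s2 < s < s1."""
    band = [(k, s) for k, s in zip(ks, skews) if s2 < s < s1]
    values = [s for _, s in band]
    # d2[j] is the second-order difference centred on band[j + 1].
    d2 = [
        values[i - 1] - 2 * values[i] + values[i + 1]
        for i in range(1, len(values) - 1)
    ]
    for j in range(1, len(d2) - 1):
        is_max = d2[j] > d2[j - 1] and d2[j] > d2[j + 1]
        is_min = d2[j] < d2[j - 1] and d2[j] < d2[j + 1]
        if is_max or is_min:
            return band[j + 1][0]
    return None  # no extremum found; the caller needs a fallback


chosen = select_k(
    ks=[10, 20, 30, 40, 50, 60, 70, 80],
    skews=[4.0, 2.5, 1.5, 1.0, 1.0, 0.75, 1.0, -0.5],
    s1=3.0,
    s2=0.0,
)
```

In this made-up example the first and last skewness values fall outside the band, and the first strict extremum of the second-order differences corresponds to k = 50.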
[0036] In step 206, method 200 performs unsupervised iterative grouping on the non-intervention normal dataset. Unsupervised learning methods search for patterns in the data and cluster the data using features. These methods do not require labels and therefore do not require user assistance for clustering. In an embodiment, method 200 performs the iterative grouping of step 206 according to method 700 shown in the flowchart of Figure 7, using the non-intervention normal dataset.
[0037] Following the iterative grouping, method 200 performs a provisioning response mapping in step 208 to classify the provision into a response category, using the significance of errors determined by the iterative grouping, optionally the duration thresholds resulting from the iterative grouping, and the presence of errors in the non-intervention normal data and the partial intervention data. The main idea behind the response mapping is to find a mapping function that maps input features to a set of output labels. Here, the relevant input feature is the error message, and the set of output labels is the set of response categories, i.e., no attention, immediate attention, and delayed attention.
[0038] For each error_i that exists in the non-intervention normal dataset, is identified as a significant variable in any iteration of step 704 (described below), and also exists in the partial intervention data, the alarm time is the maximum duration threshold: TimeToAlarm(error_i) = maximum(durationThreshold_j), where maximum(durationThreshold_j) is the maximum of all duration thresholds across all clusters identified by the iterative grouping process. Note that, as a result of the iterative grouping, each cluster has an associated error code, which can represent "no error".
[0039] In this embodiment, if an error present in the non-intervention normal data is not identified as a significant variable, it receives a null (NULL) alarm time and a "no attention" label regardless of its presence in the partial intervention data. This is particularly suitable when minimizing false positives to reduce operator workload is a priority. For each error_k found in the partial intervention data but not in the non-intervention normal dataset, the alarm time is 0 (TimeToAlarm(error_k) = 0), because an error that exists in the intervention data but not in the non-intervention normal dataset requires immediate attention. For each significant error, whose significance is calculated based on the F-statistic, that is found in the non-intervention normal dataset but not in the partial intervention data, the alarm time is null (TimeToAlarm(error_k) = NULL), which means that this error needs no attention. Note that the dataset is assumed to be large enough that each error is seen in at least one of the non-intervention normal data and the partial intervention data.
[0040] In this embodiment, the response mapping label is determined from the alarm time as follows: for each error assigned an alarm time of 0, the label is "immediate attention"; for each error assigned a null alarm time, the label is "no attention"; and for each error assigned a non-null alarm time greater than zero, the label is "delayed attention". An alternative embodiment assigns labels directly based on the characteristics of any errors present in the provisioning step, without calculating the alarm time. Specifically, for each error determined to be a significant variable in any iteration of step 704 and present in both the non-intervention normal data and the partial intervention data, the label is "delayed attention". In this embodiment, if an error present in the non-intervention normal data is not determined to be a significant variable, it receives a "no attention" label regardless of its presence in the partial intervention data. Any error present in the partial intervention data but not in the non-intervention normal data receives an "immediate attention" label. Any error that is found in the non-intervention normal dataset, is identified as a significant variable in any iteration of step 704 (described below with reference to Figure 7), and is not found in the partial intervention data receives a "no attention" label. Thus, as shown in Figure 8 and described below, a label is assigned to an individual provision (in a given provisioning step) based on the error of that provision.
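The mapping rules of the preceding paragraphs can be condensed into a short sketch. The function name, the argument layout, and the choice to take the maximum only over clusters whose error qualifies for delayed attention (following the Figure 8 walk-through) are assumptions; the text also describes the maximum as being taken over all clusters with non-zero, non-empty thresholds.

```python
def map_responses(normal_errors, intervened_errors, significant_errors,
                  cluster_thresholds):
    """Return {error: (alarm_time, label)}.

    normal_errors: encoded errors seen in the non-intervention normal data.
    intervened_errors: encoded errors seen in the partial intervention data.
    significant_errors: errors found significant during iterative grouping.
    cluster_thresholds: list of (encoded_error, duration_threshold or None).
    """
    delayed = normal_errors & intervened_errors & significant_errors
    # Ignore empty (None) and zero thresholds when taking the maximum.
    usable = [t for e, t in cluster_thresholds if e in delayed and t]
    alarm_time = max(usable) if usable else None

    result = {}
    for error in normal_errors | intervened_errors:
        if error in delayed:
            result[error] = (alarm_time, "delayed attention")
        elif error not in normal_errors:
            # Seen only where someone intervened: alarm immediately.
            result[error] = (0, "immediate attention")
        else:
            # In the normal data but not significant, or never intervened on.
            result[error] = (None, "no attention")
    return result


# Values taken from the Figure 8 discussion in this description.
labels = map_responses(
    normal_errors={0, 1, 3},
    intervened_errors={0, 2, 3},
    significant_errors={0, 1, 3},
    cluster_thresholds=[(0, 123.53), (0, 363.5), (3, 200.0), (3, 230.0), (3, 185.0)],
)
```

With these inputs, errors 0 and 3 receive the delayed attention label with an alarm time of 363.5, error 1 receives no attention, and error 2 receives immediate attention.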
[0041] Figure 7 is a flowchart illustrating a method 700 for performing iterative grouping using a non-intervention normal dataset according to an embodiment of the present disclosure. The inputs to method 700 are (a) the non-intervention normal data and (b) a temporary list of unprocessed clusters, tmpList, which is initialized to an empty list. The temporary list of unprocessed clusters includes a list of significant variables associated with each cluster. The list of significant variables associated with each cluster is also initialized to an empty list.
[0042] Method 700 begins in step 701 by aggregating a list of unprocessed clusters. At the start of the method, tmpList is populated with a single group corresponding to the entire non-intervention normal data, and this group is marked 'unprocessed'. At this point, no significant variables are associated with any cluster. In step 702, method 700 determines whether all clusters in tmpList have been marked as processed. If so, method 700 ends in step 713. If not, then in step 703 method 700 selects an unprocessed cluster and calculates the F-statistic of an analysis of variance for the current cluster, ignoring interaction effects, using the provisioning features of the provisioning step (other than the step duration) as factor variables and the step duration (st_i) 304 as the response variable. Non-limiting examples of provisioning features used as factor variables are the provisioning parameters 306 (which can be encoded) and the encoded error message 310, as shown in Figure 3. At the start of processing, the current cluster is the entire non-intervention normal dataset. In step 704, method 700 uses the F-statistic, for example by comparing it with a threshold, to determine the significant variables that have an impact on the step duration. In step 705, method 700 determines whether any new significant variables exist for the cluster currently being processed. At the start of processing, the answer will be 'yes' unless no variable was determined to be significant in step 704.
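As an illustration of steps 703 and 704, the sketch below computes a one-way analysis-of-variance F-statistic for each factor variable separately and keeps the factors whose F-statistic exceeds a threshold. Treating each factor on its own is a simplification of a multi-factor analysis without interaction effects, and the threshold value, the function names, and the row layout (dictionaries with a `duration` key) are assumptions.

```python
from collections import defaultdict


def f_statistic(rows, factor, response="duration"):
    """One-way ANOVA F: between-group over within-group mean square."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[response])
    n, g = len(rows), len(groups)
    if g < 2 or n <= g:
        return 0.0  # not enough groups or data to test this factor
    grand_mean = sum(row[response] for row in rows) / n
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    ss_within = sum(
        (x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v
    )
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def significant_variables(rows, factors, threshold):
    """Factors whose F-statistic exceeds the (assumed) threshold."""
    return [f for f in factors if f_statistic(rows, f) > threshold]


# Made-up data: duration depends on the encoded error, not on the encoded OS.
rows = [
    {"error": 0, "os": 0, "duration": 1},
    {"error": 0, "os": 1, "duration": 2},
    {"error": 0, "os": 2, "duration": 3},
    {"error": 1, "os": 2, "duration": 4},
    {"error": 1, "os": 1, "duration": 5},
    {"error": 1, "os": 0, "duration": 6},
]
found = significant_variables(rows, ["error", "os"], threshold=4.0)
```

In practice the threshold would normally be derived from the F-distribution at a chosen significance level rather than fixed by hand.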
[0043] If, in step 705, no additional factor variables remain or no new significant variables are identified for the cluster, method 700 in step 714 marks the cluster as processed, assigns n*IQR as the duration threshold for the cluster, where IQR is the interquartile range and n is a number such as 1, 1.5, 2, or 3, and returns to step 701.
[0044] If new significant variables exist in step 705, method 700 splits the current cluster into multiple clusters in step 706, each corresponding to a unique combination of values of the significant variables. Step 706 also determines the total number (e.g., M) of unique combinations. For each of the M clusters, method 700 calculates in step 707 the n*interquartile range (n*IQR), where n is variable (e.g., 6*IQR), and the percentage of data points whose step duration is greater than n*IQR, denoted P_o. The IQR is calculated as the difference between the 75th and 25th percentiles of the step duration values in the identified cluster. Furthermore, since each of the clusters must be checked to determine whether its P_o is less than a preset threshold, a count of the sub-clusters processed so far (subclusterProcessedCnt) is maintained and initialized to 0. In step 707A, method 700 compares P_o for a sub-cluster obtained in step 706 with the preset threshold. If P_o is less than the preset threshold, then in step 708 method 700 assigns the n*IQR calculated in step 707 as the duration threshold for the specific combination of significant-variable values associated with the cluster, and marks the cluster as processed. Note that, based on the result of step 704, the set of significant variables associated with the cluster and their values are known. A list of those significant variables is maintained for each cluster. In step 711, subclusterProcessedCnt is incremented by 1 to indicate that the cluster has been processed.
[0045] Method 700 repeats steps 707A and 708, or steps 710, 711, and 712, until all M sub-clusters generated in step 706 have been processed. Step 712 checks whether all sub-clusters generated in step 706 have been processed. If, for any sub-cluster generated in step 706, P_o is determined in step 707A to be greater than the preset threshold, then in step 710 that cluster, together with the significant variables used to obtain it, is added to the temporary list tmpList of unprocessed clusters. Note that an aggregate set of all such clusters is maintained in step 701. The result of this iterative grouping method is a set of clusters, each with its set of significant variables and their values, and its calculated duration threshold.
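The control flow of method 700 can be sketched as a worklist loop. This is a simplified illustration under stated assumptions: the significance test of steps 703 and 704 is passed in as a hypothetical callback `find_significant`, percentiles use linear interpolation, the processed-count bookkeeping is replaced by a plain loop, and the parameter names `n` and `max_outlier_pct` stand in for the variable multiplier and the preset threshold on P_o.

```python
from collections import defaultdict


def iqr(values):
    """Interquartile range with linearly interpolated percentiles."""
    s = sorted(values)

    def percentile(q):
        pos = q * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    return percentile(0.75) - percentile(0.25)


def iterative_grouping(rows, find_significant, n=1.5, max_outlier_pct=10.0):
    """Return a list of (variable values, duration threshold), one per cluster.

    rows: dicts holding encoded factor values plus a 'duration' entry.
    find_significant(cluster, used): hypothetical callback returning the names
    of newly significant variables for this cluster (steps 703-704).
    """
    work = [(rows, {})]  # tmpList: (cluster rows, variable values fixed so far)
    results = []
    while work:  # step 702: stop when nothing is left unprocessed
        cluster, fixed = work.pop()
        new_vars = [v for v in find_significant(cluster, set(fixed))
                    if v not in fixed]
        if not new_vars:  # step 714: no further split, assign n*IQR
            results.append((fixed, n * iqr([r["duration"] for r in cluster])))
            continue
        groups = defaultdict(list)  # step 706: one cluster per value combination
        for row in cluster:
            groups[tuple(row[v] for v in new_vars)].append(row)
        for combo, sub in groups.items():
            durations = [r["duration"] for r in sub]
            limit = n * iqr(durations)  # step 707
            p_o = 100.0 * sum(d > limit for d in durations) / len(durations)
            values = {**fixed, **dict(zip(new_vars, combo))}
            if p_o < max_outlier_pct:  # steps 707A and 708
                results.append((values, limit))
            else:  # step 710: queue the cluster for further splitting
                work.append((sub, values))
    return results
```

Because each re-queued cluster can only be split on variables it has not used yet, the loop terminates once every cluster either passes the P_o check or runs out of new significant variables.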
[0046] Figure 8 is a diagram 800 illustrating the identification of "error" and "image template type" as valid variables according to an embodiment of the present disclosure. In this embodiment, "coded error" and "coded image template type" are identified as valid variables based on the F-statistic determined during the first pass through step 704 of Figure 7. The F-statistic is used to identify variables that affect the step duration. In diagram 800, the first seven of the eight rows correspond to the clusters determined in iterative grouping step 706. For each cluster, the disclosed embodiment calculates n*IQR and assigns this value as the duration threshold for the cluster. For each of the four coded error values (0, 1, 2, and 3), the text at the top gives the status of the error encoded by that value. Specifically, the errors coded 0 and 3 exist in both the partial intervention data and the non-intervention normal data and are identified as valid variables during iterative grouping. Provisionings with these errors are assigned the "delayed attention" label. The error coded 1 exists in the non-intervention normal data but not in the partial intervention data. Provisionings with this error are assigned the "no attention" label. The error coded 2 exists only in the partial intervention data. Provisionings with this error are assigned the "immediate attention" label. Diagram 800 also shows the duration threshold for each cluster, calculated as n*IQR. In diagram 800, during the first pass through step 704, the valid variables are identified as "coded error" and "coded image template type", resulting in the parameter value combinations (0,0), (0,1), (1,0), (1,1), (3,0), and (3,1), where the first value in each combination is the coded error and the second is the coded image template type.
For the (coded error, coded image template type) combination (3,0), step 707A determines that the percentage of outliers is greater than the preset threshold; therefore, a new valid variable (i.e., coded OS type) is identified for this cluster using steps 703 and 704. Based on the unique combinations of the three valid variables (i.e., coded error, coded image template type, and coded OS type), the cluster is further subdivided, resulting in the clusters shown in rows 3 and 4. In diagram 800, the clusters for which the coded OS type was not identified as a valid variable have NA in the corresponding cells; these are rows 1, 2, 5, 6, and 7. The first cluster corresponds to coded error and coded image template type values of (0, 0); its duration threshold is 123.53. The second cluster corresponds to coded error and coded image template type values of (0, 1); its duration threshold is 363.5. The third cluster corresponds to coded error, coded image template type, and coded OS type values of (3, 0, 0); its duration threshold is 200.0. The fourth cluster corresponds to coded error, coded image template type, and coded OS type values of (3, 0, 1); its duration threshold is 230.0. The fifth cluster corresponds to coded error and coded image template type values of (3, 1); its duration threshold is 185.0. For these five clusters generated by the iterative grouping, the alert time is set to 363.5, which is the maximum duration threshold among all clusters associated with errors that appear in both the non-intervention normal data and the partial intervention data and that are identified as valid variables during the iterative grouping.
These clusters correspond to coded errors 0 and 3, which exist in both the partial intervention data and the non-intervention normal data. This maximum threshold is set as the alert time for the delayed attention category, that is, for errors that are identified as valid variables and appear in both the partial intervention data and the non-intervention normal data. Therefore, when a stalled provisioning with coded error 0 or 3 lasts longer than this maximum threshold, the user or system should intervene to resolve it. The last row in diagram 800 corresponds to coded error 2, which exists only in the partial intervention data. Therefore, a provisioning with coded error 2 is labeled "immediate attention". The last row in diagram 800 has empty cells for coded error, coded image template type, and coded OS type because error 2 does not exist in the non-intervention data and is therefore not part of the iterative grouping. As described above, in an embodiment, a response label may be assigned based on whether an error occurs in one or both of the partial intervention data and the non-intervention normal data, without calculating an alert time. In that case, the user may be warned that the provisioning may need delayed attention, but no specific time is given.
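As a non-limiting illustration, the response mapping and alert time described for diagram 800 can be sketched in Python as follows. The function names, the set-based representation of the two data sources, and the tuple keys are assumptions for this sketch. The threshold values and error codes are those recited above. Rows 6 and 7 are omitted because their thresholds are not recited.

```python
def response_label(error, partial_errors, normal_errors, valid_errors):
    """Map a coded error to a response label from where it has been observed."""
    in_partial = error in partial_errors
    in_normal = error in normal_errors
    if in_partial and not in_normal:
        return "immediate attention"
    if in_normal and not in_partial:
        return "no attention"
    if in_partial and in_normal and error in valid_errors:
        return "delayed attention"
    return None  # combination not covered by the description above


def alert_time(thresholds, partial_errors, normal_errors, valid_errors):
    """Maximum non-zero, non-empty threshold over delayed-attention clusters."""
    candidates = [
        t for key, t in thresholds.items()
        if t and response_label(
            key[0], partial_errors, normal_errors, valid_errors
        ) == "delayed attention"
    ]
    return max(candidates) if candidates else None


# Coded errors per data source, as described for diagram 800.
PARTIAL = {0, 2, 3}
NORMAL = {0, 1, 3}
VALID = {0, 3}

# Keys are (coded error, coded image template type, coded OS type or None for NA).
THRESHOLDS = {
    (0, 0, None): 123.53,
    (0, 1, None): 363.5,
    (3, 0, 0): 200.0,
    (3, 0, 1): 230.0,
    (3, 1, None): 185.0,
}
```

With these inputs, alert_time(THRESHOLDS, PARTIAL, NORMAL, VALID) evaluates to 363.5, matching the alert time stated for coded errors 0 and 3.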
[0047] Figure 9 is a table 900 depicting results of training and testing on data from disjoint time periods according to an embodiment of the present disclosure. Column 902 of table 900 lists different provisioning steps, such as STEP1, STEP2, STEP3, and STEP4. Column 904 gives the number of provisionings that include the corresponding provisioning step in column 902. Column 906 gives the true positive rate (i.e., recall) as a percentage. Column 908 gives the positive predictive value (i.e., precision) as a percentage. The true positive rate is computed from the number of true positives and the number of false negatives, and the positive predictive value from the number of true positives and the number of false positives. A provisioning is a true positive when there is evidence of human intervention and the response label is "immediate attention" or "delayed attention". A provisioning is a false positive when there is no evidence of human intervention but the response label is "immediate attention" or "delayed attention". Similarly, a provisioning is a true negative when there is no evidence of human intervention and the response label is "no attention". A false negative occurs when a human intervened because the provisioning was stuck, but the assigned response label is "no attention". As shown in table 900, the disclosed embodiments achieve high precision and high recall with very few false positives. Therefore, the disclosed embodiments provide an efficient method for predicting intervention responses for provisioning workflows using data logs guided by knowledge-based intervention data.
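As a non-limiting illustration, the four outcome definitions above and the two rates in columns 906 and 908 can be sketched in Python as follows. The record format (a pair of an intervention flag and a response label) and the function names are assumptions for this sketch, and no values from table 900 are reproduced.

```python
FLAGGED = {"immediate attention", "delayed attention"}


def confusion_counts(records):
    """Count (tp, fp, tn, fn) from (intervened, label) pairs."""
    tp = fp = tn = fn = 0
    for intervened, label in records:
        predicted = label in FLAGGED
        if intervened and predicted:
            tp += 1      # human intervened and the label asked for attention
        elif predicted:
            fp += 1      # label asked for attention, nobody intervened
        elif intervened:
            fn += 1      # human intervened but the label was "no attention"
        else:
            tn += 1      # no intervention and "no attention"
    return tp, fp, tn, fn


def recall(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0
```

Multiplying either rate by 100 gives the percentage form used in the table.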
[0048] Figure 10 is a block diagram illustrating the hardware architecture of a system 1000 in which aspects of the exemplary embodiments may be implemented according to an embodiment of the present disclosure. For example, the data processing system 1000 may be configured to store and execute instructions for performing the processes described in Figure 1, Figure 2, and Figure 7. In the depicted example, the data processing system 1000 employs a hub architecture including a north bridge and memory controller hub (NB/MCH) 1006 and a south bridge and input/output (I/O) controller hub (SB/ICH) 1010. A processor 1002, main memory 1004, and graphics processor 1008 are connected to the NB/MCH 1006. The graphics processor 1008 can be connected to the NB/MCH 1006 via an Accelerated Graphics Port (AGP). A computer bus, such as bus 1032 or bus 1034, can be implemented using any type of communication structure or architecture that provides data transfer between different components or devices attached to that structure or architecture.
[0049] In the depicted example, network adapter 1016 is connected to SB/ICH 1010. Audio adapter 1030, keyboard and mouse adapter 1022, modem 1024, read-only memory (ROM) 1026, hard disk drive (HDD) 1012, compact disc read-only memory (CD-ROM) drive 1014, universal serial bus (USB) ports and other communication ports 1018, and peripheral component interconnect/peripheral component interconnect express (PCI/PCIe) devices 1020 are connected to SB/ICH 1010 via buses 1032 and 1034. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 1026 may be, for example, a flash basic input/output system (BIOS). Modem 1024 or network adapter 1016 can be used to send and receive data over a network.
[0050] HDD 1012 and CD-ROM drive 1014 are connected to SB/ICH 1010 via bus 1034. HDD 1012 and CD-ROM drive 1014 may use, for example, an Integrated Drive Electronics (IDE) or Serial Advanced Technology Attachment (SATA) interface. In some embodiments, HDD 1012 may be replaced by other forms of data storage devices, including but not limited to solid-state drives (SSDs). Super I/O (SIO) device 1028 may be connected to SB/ICH 1010. SIO device 1028 may be an on-board chip configured to assist SB/ICH 1010 by performing less demanding controller functions, such as controlling a printer port, controlling a fan, and/or controlling small light-emitting diodes (LEDs) of the data processing system 1000.
[0051] The data processing system 1000 may include a single processor 1002 or may include multiple processors 1002. Furthermore, the processor 1002 may have multiple cores. For example, in one embodiment, the data processing system 1000 may employ a large number of processors 1002 including hundreds or thousands of processor cores. In some embodiments, the processors 1002 may be configured to perform a set of coordinated computations in parallel.
[0052] An operating system is executed on the data processing system 1000 using the processor 1002. The operating system coordinates and provides control of various components within the data processing system 1000 of Figure 10. Different applications and services can run in conjunction with the operating system. Instructions for the operating system, applications, and other data reside on storage devices, such as one or more HDDs 1012, and can be loaded into main memory 1004 for execution by processor 1002. In some embodiments, additional instructions or data may be stored on one or more external devices. The processes described herein with respect to the illustrative embodiments can be executed by processor 1002 using computer-usable program code, which may reside in memory (e.g., main memory 1004, ROM 1026) or in one or more peripheral devices.
[0053] The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer-readable storage medium having computer-readable program instructions thereon for causing a processor to carry out aspects of the invention.
[0054] A computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. As used herein, a computer-readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
[0055] The computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network (e.g., the Internet, a local area network, a wide area network, and/or a wireless network). The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
[0056] Computer-readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages (such as Smalltalk, C++, or the like) and procedural programming languages (such as the "C" programming language or similar programming languages). The computer-readable program instructions may execute entirely on a user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (e.g., through the Internet using an Internet service provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
[0057] Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
[0058] These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of a flowchart and/or block diagram. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable storage medium having the instructions stored therein comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of a flowchart and/or block diagram.
[0059] The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions that execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of a flowchart and/or block diagram.
[0060] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, depending on the functionality involved, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
[0061] Various embodiments of the invention have been described for illustrative purposes, but are not intended to be exhaustive or limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope of the described embodiments. Furthermore, the steps of the methods described herein can be performed in any suitable order, or simultaneously as appropriate. The terminology used herein has been chosen to best explain the principles of the embodiments, their practical application, or technical improvements to technologies found in the market, or to enable those skilled in the art to understand the embodiments disclosed herein.
[0062] Figure 10 is a block diagram illustrating the hardware architecture of a system 1000 in which aspects of the exemplary embodiments may be implemented according to an embodiment of the present disclosure. For example, the data processing system 1000 may be configured to store and execute instructions for performing the processes described in Figure 2 and Figure 3. In the depicted example, the data processing system 1000 employs a hub architecture including a north bridge and memory controller hub (NB/MCH) 1006 and a south bridge and input/output (I/O) controller hub (SB/ICH) 1010. A processor 1002, main memory 1004, and graphics processor 1008 are connected to the NB/MCH 1006. The graphics processor 1008 can be connected to the NB/MCH 1006 via an Accelerated Graphics Port (AGP). A computer bus, such as bus 1032 or bus 1034, can be implemented using any type of communication structure or architecture that provides data transfer between different components or devices attached to that structure or architecture.
Claims
1. A method for determining an intervention response category of a provisioning workflow, the method comprising: determining provisioning characteristics of each provisioning step; performing outlier detection to identify and remove outliers from non-intervention data to produce a non-intervention normal dataset; performing iterative grouping on the non-intervention normal dataset to determine valid variables among the provisioning characteristics; and performing provisioning response mapping using results of the iterative grouping, including the significance of errors and the presence of the errors in the non-intervention normal dataset and in partial intervention data, to classify a provisioning into a response category.
2. The method according to claim 1, wherein the response category is one of immediate attention, delayed attention, and no attention.
3. The method according to claim 1, wherein the outlier detection uses the Local Outlier Factor (LOF) to identify outliers in the non-intervention data.
4. The method according to claim 1, wherein determining the provisioning characteristics of a provisioning step includes: calculating a step duration of the provisioning step for all provisionings using the event log associated with that provisioning step; identifying provisioning parameters for the provisioning step; and identifying error messages in the event log that occur as part of the provisioning step.
5. The method according to claim 4, wherein performing the iterative grouping using the non-intervention normal dataset includes: calculating an F-statistic based on the provisioning parameters, the error messages, and the step duration; determining the valid variables based on the F-statistic; and clustering the non-intervention normal dataset into multiple clusters based on combinations of values of the valid variables.
6. The method according to claim 5, wherein, for each cluster, the percentage of data points exceeding n × interquartile range is compared with a threshold to determine whether the cluster is a candidate for further splitting, where n is variable.
7. The method according to claim 5, wherein the duration threshold for each cluster is calculated as the n × interquartile range of that cluster, where n is variable.
8. The method according to claim 5, wherein the response category is delayed attention when an error exists in both the partial intervention data and the non-intervention normal dataset and the error is a valid variable as a result of the iterative grouping.
9. The method according to claim 8, wherein the alert time for the error is the maximum of the duration thresholds across all clusters having non-zero and non-empty duration thresholds.
10. The method according to claim 1, wherein the response category is immediate attention when an error exists in the partial intervention data but not in the non-intervention normal dataset.
11. The method according to claim 1, wherein the response category is no attention when an error exists in the non-intervention normal dataset but not in the partial intervention data.
12. A system for determining an intervention response category of a provisioning workflow, comprising a memory for storing instructions and a processor configured to execute the instructions to: determine provisioning characteristics of each provisioning step; perform outlier detection to identify and remove outliers from non-intervention data to produce a non-intervention normal dataset; perform iterative grouping on the non-intervention normal dataset to determine valid variables among the provisioning characteristics; and perform provisioning response mapping using results of the iterative grouping, including the significance of errors and the presence of the errors in the non-intervention normal dataset and in partial intervention data, to classify a provisioning into a response category.
13. The system according to claim 12, wherein the response category is one of immediate attention, delayed attention, and no attention.
14. The system according to claim 12, wherein the processor is further configured to execute the instructions to identify the outliers in the non-intervention data using the Local Outlier Factor (LOF).
15. The system according to claim 12, wherein the processor is further configured to execute the instructions to: calculate a step duration of each provisioning step for all provisionings using the event log associated with that provisioning step; identify provisioning parameters for the provisioning step; and identify error messages in the event log that occur as part of the provisioning step.
16. The system according to claim 15, wherein the processor is further configured to execute the instructions to: calculate an F-statistic based on the provisioning parameters, the error messages, and the step duration; determine the valid variables based on the F-statistic; and cluster the non-intervention normal dataset into multiple clusters based on combinations of values of the valid variables.
17. The system according to claim 15, wherein, for each cluster, the percentage of data points exceeding n × interquartile range is compared with a threshold to determine whether the cluster is a candidate for further splitting, where n is variable.
18. The system according to claim 15, wherein the duration threshold for each cluster is calculated as the n × interquartile range of that cluster, where n is variable.
19. The system according to claim 15, wherein the response category is delayed attention when the error, as a result of iterative grouping, is a valid variable and exists in both the partial intervention data and the non-intervention normal dataset, and wherein the alert time for the delayed attention is the maximum of all duration thresholds across all clusters having non-zero and non-empty duration thresholds; the response category is immediate attention when the error exists in the partial intervention data but not in the non-intervention normal dataset; and the response category is no attention when the error exists in the non-intervention normal dataset but not in the partial intervention data.
20. A computer program product for determining an intervention response category of a provisioning workflow, the computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions being executable by a processor of a system to cause the system to: determine provisioning characteristics of each provisioning step; perform outlier detection to identify and remove outliers from non-intervention data to produce a non-intervention normal dataset; perform iterative grouping on the non-intervention normal dataset to determine valid variables among the provisioning characteristics; and perform provisioning response mapping using results of the iterative grouping, including the significance of errors and the presence of the errors in the non-intervention normal dataset and in partial intervention data, to classify a provisioning into a response category.