System and method for finding historically similar fault events

By receiving and comparing fault event metadata, and using natural language processing and machine learning models to generate similarity scores, the problem of not being able to quickly find similar fault events in existing systems is solved, thereby improving the efficiency of fault event resolution and reducing company costs.

CN122249798A · Pending · Publication Date: 2026-06-19 · FIDELITY INFORMATION SERVICES LLC

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
FIDELITY INFORMATION SERVICES LLC
Filing Date
2024-09-30
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing computing systems are unable to quickly and accurately identify and locate similar historical failure events, leading to longer failure event resolution times and increased company costs.

Method used

By receiving metadata from current and historical failure events and applying natural language processing algorithms and machine learning models, the system compares the configurable item IDs, configurable item names, descriptions, and knowledge base articles of the failure events to generate similarity scores and outputs a list of similar historical failure events.

Benefits of technology

It improves the efficiency of fault event identification and resolution, helps users quickly find similar historical fault events, reduces fault event resolution time, and lowers company costs.

✦ Generated by Eureka AI based on patent content.

Figure CN122249798A_ABST (abstract drawing)

Abstract

A method for finding similar historical failure events is disclosed. The method includes: receiving a data object indicating the occurrence of a current failure event associated with a configurable item, the data object including current failure event metadata, the current failure event metadata including a configurable item identifier (ID), a configurable item name, and a description of the current failure event; receiving multiple historical data objects corresponding to multiple previous failure events; determining one or more historical data objects similar to the data object from among the multiple historical data objects based on a comparison of the current failure event metadata and the metadata of previous failure events; generating a score for each of the one or more historical data objects based on the comparison of the current failure event metadata and the metadata of previous failure events; and outputting the one or more historical data objects similar to the data object to a user.

Description

[0001] Cross-reference to related applications

[0002] This patent application claims the benefit of priority to U.S. non-provisional application No. 18/492,172, filed October 23, 2023, the entire contents of which are incorporated herein by reference.

Technical Field

[0003] This disclosure generally relates to information technology (IT) management systems, and more specifically to systems and methods for identifying historically similar failure events that have occurred in such systems.

Background

[0004] In computing systems, such as those executing financial services and electronic payment transactions, programming changes can occur. For example, software can be updated. Changes in a system can lead to failure events, defects, problems, errors, or malfunctions (collectively referred to as failure events). These failure events may occur at the time of the software change or at a later time. These failure events can incur significant costs for a company because users may be unable to access services, and because the company expends resources to resolve the failure events.

[0005] These failure events in the system may need to be investigated and resolved for the software service to function correctly. For example, the incident resolution team may spend time identifying what problems have occurred in the software service. The faster a failure event can be resolved, the lower the potential costs for the company. Therefore, it can be important for the company to identify and fix such failure events in a timely manner (e.g., by writing new code or updating deployed code).

[0006] Failure events within a system may be related and may recur from time to time. Identifying previous failure events similar to the current failure event may lead to faster resolution of the current failure event (e.g., a fix applied to a previous problem can be reused to solve a new problem). Many existing computing systems lack the ability to find historically similar failure events for analyzing new failure events. This disclosure addresses this and other shortcomings of failure event analysis in existing computing systems.

[0007] The background description provided herein is for the purpose of generally presenting the context of this disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims of this application and are not, by virtue of their inclusion in this section, admitted to being prior art or suggestions of prior art.

Summary of the Invention

[0008] In some aspects, the techniques described herein relate to a computer-implemented method for finding historically similar failure events in a system, the method comprising: receiving a data object indicating the occurrence of a current failure event associated with a configurable item, the data object including current failure event metadata including a configurable item identifier (ID), a configurable item name, and a description of the current failure event; receiving a plurality of historical data objects corresponding to a plurality of previous failure events, each of the plurality of historical data objects indicating the occurrence of a previous failure event and including previous failure event metadata including a configurable item ID, a configurable item name, and a description of the previous failure event; determining one or more historical data objects similar to the data object from among the plurality of historical data objects based on a comparison of the current failure event metadata with the previous failure event metadata; generating a score for each of the one or more historical data objects based on the comparison of the current failure event metadata with the previous failure event metadata; and outputting the one or more historical data objects similar to the data object to a user via a graphical user interface (GUI).

[0009] In some aspects, the techniques described herein relate to a method in which determining the one or more historical data objects further includes applying natural language processing algorithms to the data object and the plurality of historical data objects.

[0010] In some respects, the techniques described herein relate to a method in which multiple historical data objects are received during a predetermined time period.
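As an illustrative sketch only, the data objects and the predetermined time period of [0010] might be modeled as follows. The class name `FailureEventObject`, its fields, and the 90-day default window are hypothetical assumptions, not details given in the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FailureEventObject:
    """Hypothetical data object for one failure event (field names are illustrative)."""
    event_id: str
    ci_id: str          # configurable item identifier (ID)
    ci_name: str        # configurable item name
    description: str    # free-text description of the failure event
    opened_at: datetime


def historical_window(history, now, days=90):
    """Keep previous failure events opened within `days` days before `now`.

    The 90-day default is an assumed value for the predetermined time period.
    """
    start = now - timedelta(days=days)
    return [event for event in history if start <= event.opened_at < now]
```

Filtering before comparison keeps the candidate set bounded; the window length would be a tuning choice rather than a fixed rule.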

[0011] In some aspects, the techniques described herein relate to a method that further includes using a natural language processing module to extract knowledge base (KB) articles and topics from each of the description of the current failure event and the descriptions of the previous failure events.
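The disclosure attributes KB article extraction to a natural language processing module without giving details. The sketch below is a deliberately simpler stand-in that uses a regular expression, assuming (hypothetically) that KB articles are cited in descriptions as "KB" followed by seven digits; it is not the NLP module described here.

```python
import re

# Assumption: KB references look like "KB" + 7 digits (e.g. "KB0012345").
# The disclosure does not specify a KB numbering format.
KB_PATTERN = re.compile(r"\bKB\d{7}\b", re.IGNORECASE)


def extract_kb_articles(description: str) -> set:
    """Return the set of KB article IDs mentioned in a failure event description."""
    # Normalize case so "kb0012345" and "KB0012345" count as the same article.
    return {match.upper() for match in KB_PATTERN.findall(description)}
```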

[0012] In some aspects, the techniques described herein relate to a method in which the natural language processing module utilizes either a latent Dirichlet allocation (LDA) algorithm or a Gibbs sampling Dirichlet multinomial mixture (GSDMM) algorithm to extract the topics.
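For illustration, the following is a minimal, unoptimized sketch of the Gibbs-sampling Dirichlet mixture option of [0012]: a collapsed Gibbs sampler for a Dirichlet multinomial mixture over short texts, in which each description is assigned to exactly one topic cluster. The function name, hyperparameter values, and tokenization are assumptions; the other option of [0012] is not shown.

```python
import math
import random
from collections import Counter


def gsdmm(docs, k=8, alpha=0.1, beta=0.1, n_iters=20, seed=0):
    """Assign one topic cluster label per tokenized document (list of tokens).

    k is an upper bound on the number of clusters; some may end up empty.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_counts = [Counter(doc) for doc in docs]
    m = [0] * k                          # documents per cluster
    n = [0] * k                          # words per cluster
    nw = [Counter() for _ in range(k)]   # word frequencies per cluster

    def move(i, c, sign):
        # Add (sign=+1) or remove (sign=-1) document i to/from cluster c.
        m[c] += sign
        n[c] += sign * len(docs[i])
        for w, cnt in doc_counts[i].items():
            nw[c][w] += sign * cnt

    labels = [rng.randrange(k) for _ in docs]
    for i, c in enumerate(labels):
        move(i, c, +1)

    for _ in range(n_iters):
        for i, doc in enumerate(docs):
            move(i, labels[i], -1)
            log_p = []
            for c in range(k):
                # Cluster-size term; the shared denominator (D - 1 + k*alpha) cancels.
                lp = math.log(m[c] + alpha)
                # Word-likelihood terms of the collapsed conditional.
                for w, cnt in doc_counts[i].items():
                    for j in range(cnt):
                        lp += math.log(nw[c][w] + beta + j)
                for j in range(len(doc)):
                    lp -= math.log(n[c] + vocab_size * beta + j)
                log_p.append(lp)
            top = max(log_p)
            weights = [math.exp(x - top) for x in log_p]  # stable softmax weights
            labels[i] = rng.choices(range(k), weights=weights)[0]
            move(i, labels[i], +1)
    return labels
```

Descriptions would need to be tokenized (for example, lowercased and stop-word filtered) before being passed in; the resulting cluster label could then serve as the topic compared for the third list.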

[0013] In some aspects, the techniques described herein relate to a method in which generating a score for each of one or more historical data objects includes: determining a first list of historical data objects based on the similarity between the configurable item ID of the current failure event and the configurable item ID of each of the previous failure events; determining a second list of historical data objects based on the similarity between the configurable item name of the current failure event and the configurable item name of each of the previous failure events; determining a third list of historical data objects based on the similarity between the topic of the current failure event and the topic of each of the previous failure events; and determining a fourth list of historical data objects based on the similarity between KB articles of the current failure event and KB articles of each of the previous failure events.
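A minimal sketch of how the four lists of [0013] might be built. The `Event` record, the use of `difflib.SequenceMatcher` for configurable item name similarity, the 0.8 threshold, and exact matching for IDs and topics are assumptions made for illustration; the disclosure does not specify the similarity measures.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Event:
    """Hypothetical record holding the four attributes compared in [0013]."""
    event_id: str
    ci_id: str
    ci_name: str
    topic: int                                   # topic label from topic extraction
    kb_articles: set = field(default_factory=set)


def candidate_lists(current, history, name_threshold=0.8):
    """Return the first to fourth lists of similar previous failure event IDs."""
    # First list: same configurable item ID.
    first = [h.event_id for h in history if h.ci_id == current.ci_id]
    # Second list: configurable item names whose similarity ratio meets the threshold.
    second = [
        h.event_id for h in history
        if SequenceMatcher(None, current.ci_name.lower(), h.ci_name.lower()).ratio() >= name_threshold
    ]
    # Third list: same extracted topic.
    third = [h.event_id for h in history if h.topic == current.topic]
    # Fourth list: at least one KB article in common.
    fourth = [h.event_id for h in history if current.kb_articles & h.kb_articles]
    return first, second, third, fourth
```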

[0014] In some aspects, the techniques described herein relate to a method in which generating a score for each of the one or more historical data objects further includes: assigning one or more initial scores to each of the one or more historical data objects based on whether the historical data object is identified as being in the first list, the second list, the third list, and/or the fourth list.

[0015] In some aspects, the techniques described herein relate to a method in which generating a score for each of the one or more historical data objects further comprises: assigning a weighted average score to each of the one or more historical data objects, wherein, when the historical data object is in only one of the first list, the second list, the third list, and the fourth list, the weighted average score is its initial score, and, when the historical data object is in two or more of the first, second, third, and fourth lists, the weighted average score is a combination of its initial scores, the weighted average score being the score generated for each of the one or more historical data objects.
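The initial and weighted average scores of [0014] and [0015] could be sketched as below. The initial score per list and the per-list weights are hypothetical values chosen for illustration. The weighted average is taken only over the lists in which an event appears, so an event found in a single list keeps its initial score, consistent with [0015].

```python
# Hypothetical values: the disclosure does not specify initial scores or weights.
INITIAL_SCORES = {"ci_id": 1.0, "ci_name": 0.8, "topic": 0.6, "kb": 0.9}
LIST_WEIGHTS = {"ci_id": 0.4, "ci_name": 0.2, "topic": 0.2, "kb": 0.2}


def assign_initial_scores(lists, initial=INITIAL_SCORES):
    """Map each event ID to {list_name: initial_score} for every list it appears in."""
    scores = {}
    for list_name, event_ids in lists.items():
        for event_id in event_ids:
            scores.setdefault(event_id, {})[list_name] = initial[list_name]
    return scores


def weighted_average_scores(initial_scores, weights=LIST_WEIGHTS):
    """Weighted average of each event's initial scores over the lists it appears in."""
    combined = {}
    for event_id, per_list in initial_scores.items():
        total_weight = sum(weights[name] for name in per_list)
        weighted_sum = sum(weights[name] * score for name, score in per_list.items())
        # With a single list, weighted_sum / total_weight equals that list's initial score.
        combined[event_id] = weighted_sum / total_weight
    return combined
```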

[0016] In some aspects, the techniques described herein relate to a method in which the one or more historical data objects are included in a ranked list that combines the first list, the second list, the third list, and the fourth list and ranks the one or more historical data objects based on their corresponding weighted average scores.
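A possible sketch of the ranked list of [0016], assuming (hypothetically) that the four lists are given as lists of event IDs and the weighted average scores as a dictionary keyed by event ID. Ties are broken by event ID, a choice made here only to keep the output deterministic.

```python
def ranked_similar_events(lists, scores, top_n=None):
    """Merge the lists into one de-duplicated list, ranked by weighted average score."""
    merged = {event_id for event_ids in lists.values() for event_id in event_ids}
    # Highest score first; equal scores are ordered by event ID for determinism.
    ranked = sorted(merged, key=lambda event_id: (-scores[event_id], event_id))
    return ranked if top_n is None else ranked[:top_n]
```

Limiting the output with `top_n` reflects a GUI that shows only the most similar events; whether such a cutoff is used is not stated in the disclosure.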

[0017] In some aspects, the technology described herein relates to a system for finding historically similar failure events in a system, the system comprising: a memory storing processor-readable instructions therein; and at least one processor configured to access the memory and execute the processor-readable instructions to perform an operation comprising: receiving a data object indicating the occurrence of a current failure event associated with a configurable item, the data object including current failure event metadata including a configurable item identifier (ID), a configurable item name, and a description of the current failure event; receiving a plurality of historical data objects corresponding to a plurality of previous failure events, each of the plurality of historical data objects indicating the occurrence of a previous failure event and including previous failure event metadata including a configurable item ID, a configurable item name, and a description of the previous failure event; determining one or more historical data objects similar to the data object from among the plurality of historical data objects based on a comparison of the current failure event metadata with the previous failure event metadata; generating a score for each of the one or more historical data objects based on a comparison of the current failure event metadata with the previous failure event metadata; and outputting one or more historical data objects similar to the data object to a user via a graphical user interface (GUI).

[0018] In some aspects, the techniques described herein relate to a system in which determining the one or more historical data objects further includes applying natural language processing algorithms to the data object and the plurality of historical data objects.

[0019] In some respects, the technology described herein relates to a system in which multiple historical data objects are received over a predetermined time period.

[0020] In some respects, the technique described herein relates to a system in which the operation further includes: extracting knowledge base (KB) articles and topics from each of the description of the current failure event and the description of the previous failure event using a natural language processing module.

[0021] In some aspects, the techniques described herein relate to a system in which the natural language processing module utilizes either a latent Dirichlet allocation (LDA) algorithm or a Gibbs sampling Dirichlet multinomial mixture (GSDMM) algorithm to extract the topics.

[0022] In some aspects, the techniques described herein relate to a system in which generating a score for each of one or more historical data objects includes: determining a first list of historical data objects based on the similarity between the configurable item ID of the current failure event and the configurable item ID of each of the previous failure events; determining a second list of historical data objects based on the similarity between the configurable item name of the current failure event and the configurable item name of each of the previous failure events; determining a third list of historical data objects based on the similarity between the topic of the current failure event and the topic of each of the previous failure events; and determining a fourth list of historical data objects based on the similarity between KB articles of the current failure event and KB articles of each of the previous failure events.

[0023] In some aspects, the techniques described herein relate to a system in which generating a score for each of the one or more historical data objects further includes: assigning one or more initial scores to each of the one or more historical data objects based on whether the historical data object is identified as being in the first list, the second list, the third list, and/or the fourth list.

[0024] In some aspects, the techniques described herein relate to a system in which generating a score for each of the one or more historical data objects further comprises: assigning a weighted average score to each of the one or more historical data objects, wherein, when the historical data object is in only one of the first list, the second list, the third list, and the fourth list, the weighted average score is its initial score, and, when the historical data object is in two or more of the first, second, third, and fourth lists, the weighted average score is a combination of its initial scores, the weighted average score being the score generated for each of the one or more historical data objects.

[0025] In some respects, the technique described herein relates to a system in which one or more historical data objects are included in a sorted list, which combines a first list, a second list, a third list, and a fourth list, and sorts the one or more historical data objects based on a corresponding one or more weighted average scores.

[0026] In some aspects, the technology described herein relates to a non-transitory computer-readable medium storing processor-readable instructions that, when executed by at least one processor, cause the at least one processor to perform an operation comprising: receiving a data object indicating the occurrence of a current failure event associated with a configurable item, the data object including current failure event metadata including a configurable item identifier (ID), a configurable item name, and a description of the current failure event; receiving a plurality of historical data objects corresponding to a plurality of previous failure events, each of the plurality of historical data objects indicating the occurrence of a previous failure event and including previous failure event metadata including a configurable item ID, a configurable item name, and a description of the previous failure event; determining one or more historical data objects similar to the data object from among the plurality of historical data objects based on a comparison of the current failure event metadata with the previous failure event metadata; generating a score for each of the one or more historical data objects based on a comparison of the current failure event metadata with the previous failure event metadata; and outputting one or more historical data objects similar to the data object to a user via a graphical user interface (GUI).

[0027] In some aspects, the techniques described herein relate to a non-transitory computer-readable medium in which determining the one or more historical data objects further includes applying natural language processing algorithms to the data object and the plurality of historical data objects.

Brief Description of the Drawings

[0028] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate several embodiments and, together with the description, serve to explain the principles of this disclosure.

[0029] Figure 1 provides an exemplary system overview of a data pipeline for an artificial intelligence model used to predict and eliminate failure events in the system, according to one or more implementations.

[0030] Figure 2 depicts an exemplary block diagram of a system for determining historically similar failure events, according to one or more embodiments.

[0031] Figure 3 depicts an exemplary block diagram of a similar topic module, according to one or more implementations.

[0032] Figure 4 depicts an exemplary flowchart illustrating how fault event data can be received and processed to determine historically similar fault events in the system, according to one or more implementations.

[0033] Figure 5A depicts an exemplary table output from a same configurable item (CI) module, according to one or more implementations.

[0034] Figure 5B depicts an exemplary table output from a similar configurable item (CI) module, according to one or more implementations.

[0035] Figure 5C depicts an exemplary table output from a similar topic module, according to one or more implementations.

[0036] Figure 5D depicts another exemplary table output from a similar topic module, according to one or more implementations.

[0037] Figure 5E depicts an exemplary table output from a knowledge base (KB) article module, according to one or more implementations.

[0038] Figure 6 depicts an exemplary table of weighted scores output from a full similarity type module, according to one or more implementations.

[0039] Figure 7 depicts an exemplary table output from a full similarity type module, according to one or more implementations.

[0040] Figure 8 depicts a flowchart for identifying historically similar failure events, according to one or more implementations.

[0041] Figure 9 illustrates a computer system for performing the techniques described herein, according to one or more embodiments of the present disclosure.

Detailed Description

[0042] This disclosure generally relates to the field of software testing, and more specifically to systems and methods for finding similar failure events in history.

[0043] The subject matter of this disclosure will now be described more fully with reference to the accompanying drawings, which illustrate specific exemplary embodiments by way of illustration. Embodiments or implementations described herein as “exemplary” should not be construed as being preferred or advantageous, for example, relative to other embodiments or implementations; rather, the term is intended to reflect or indicate that an embodiment is an “exemplary” embodiment. The subject matter can be embodied in a variety of different forms, and therefore, the covered or claimed subject matter is intended to be construed as not being limited to any of the exemplary embodiments set forth herein; exemplary embodiments are provided only for illustration. Similarly, a reasonably broad scope of the claimed or covered subject matter is intended. For example, the subject matter can be embodied as a method, apparatus, component, or system, among other things. Thus, embodiments can take the form of, for example, hardware, software, firmware, or any combination thereof (other than software per se). Therefore, the following detailed description is not intended to be construed in a limiting sense.

[0044] Throughout the specification and claims, terms may have nuanced meanings beyond those explicitly stated, implied or suggested in the context. Similarly, the phrase "in one embodiment" as used herein does not necessarily refer to the same embodiment, and the phrase "in another embodiment" as used herein does not necessarily refer to different embodiments. For example, the claimed subject matter is intended to include, in whole or in part, combinations of exemplary embodiments.

[0045] The terms used below may be interpreted in their broadest and most reasonable manner, even when used in conjunction with a detailed description of certain specific examples of this disclosure. In fact, some terms may even be emphasized below; however, any term intended to be interpreted in any limiting manner will be explicitly and specifically defined in this Detailed Description section.

[0046] This disclosure generally relates to information technology (IT) management systems, and more specifically to systems and methods for identifying similar failure events in history.

[0047] Software companies strive to avoid outages, which can be caused by changes such as software or hardware component upgrades or team member changes. The system described herein can be configured to analyze and/or process event data from IT systems. The system described herein can, for example, receive a stream of event data over a period of time. This event data can also be referred to as information technology (IT) event data. Event data can include, but is not limited to: (1) incidents, (2) alarms, (3) change data, (4) problems, and/or (5) anomalies.

[0048] A failure event can be an event that may disrupt or cause loss of system operation, service, or functionality. Failure events can be manually reported by customers or personnel, automatically logged by internal systems, or otherwise captured. Failure events may be caused by factors such as hardware failure, software failure, software defects, human error, and/or cyberattacks. Deploying, refactoring, or releasing software code may, for example, lead to failure events. Failure events may be detected, for example, during downtime or through performance changes. Failure events can have characteristics, which refer to qualities or attributes associated with the failure event. For example, failure event characteristics may include, but are not limited to, the severity of the failure event, the urgency of the failure event, the complexity of the failure event, the scope of the failure event, the cause of the failure event, which configurable items correspond to the failure event (e.g., which systems/platforms/products are affected by the failure event), how the failure event is described in free-form text, which business unit is affected, which category/subcategory is affected, and/or which assignment group the failure event is assigned to.

[0049] For example, an information technology (IT) management system may receive failure events (e.g., data objects indicating the occurrence of failure events) at a constant rate throughout the day. When a failure event is received, it may not be clear how a particular failure event relates to previous failure events. A better understanding of the relationships between received failure events compared to similar past failure events can help users or the system identify and potentially resolve system failure events.

[0051] Processing massive amounts of information, such as failure events, to generate meaningful and actionable insights in information technology (IT) operations can be valuable to organizations. Because IT management systems utilize sophisticated tools and sensors, they can receive billions of data points, and information overload can become a problem. The systems and methods described herein enable the identification of historically similar failure events to provide additional insights. Historically similar failure events can help users better understand the relationships between various failure events and can provide insights into potential solutions.

[0052] As discussed above, identifying and resolving current failure events in a system can be critical for repairing and/or operating the system efficiently. Identifying and analyzing solutions to similar failure events can help users and/or the system determine solutions for the current failure event. Existing systems may not be able to accurately and efficiently locate similar historical failure events.

[0053] To address these and other problems, this disclosure describes systems and methods that can utilize natural language processing modeling to determine historically similar failure events. One or more embodiments include a system that can identify and record the following attributes from previous failure events and their corresponding configurable items: configurable item ID, configurable item name, summary topic, and knowledge base (KB) article. In some examples, additional attributes such as issue type and cluster type can be extracted from the failure event. These attributes can be determined by applying a fuzzy keyword algorithm to the corresponding description of the failure event. The system can then compare the configurable item ID, configurable item name, summary topic, and KB article of a newly received failure event with the corresponding data of all previous failure events to find historically similar failure events. The system and method can, for example, apply a weighted average to the received attributes to prepare a ranked list of the historical failure events determined to be most similar to the received failure event. The system and method can utilize a natural language processing model to determine the list. The system can be further configured to determine a ranked similarity list for each type of received metadata (e.g., for ID, name, summary topic, and KB article). The system can further determine a combined list of historically similar failure events based on the received metadata.
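The fuzzy keyword algorithm mentioned above is not specified further. One plausible stand-in, sketched here under stated assumptions, uses `difflib.get_close_matches` to tolerate typos when matching description words against a keyword dictionary for issue types. The dictionary, its issue types, and the 0.8 cutoff are hypothetical.

```python
import re
from difflib import get_close_matches

# Hypothetical keyword dictionary mapping issue types to indicative keywords.
ISSUE_KEYWORDS = {
    "database": ["database", "deadlock", "query", "sql"],
    "network": ["network", "latency", "dns", "timeout"],
    "authentication": ["login", "password", "sso", "token"],
}


def fuzzy_issue_types(description, keywords=ISSUE_KEYWORDS, cutoff=0.8):
    """Return issue types whose keywords fuzzily match any word in the description."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    found = set()
    for issue_type, terms in keywords.items():
        for word in words:
            # get_close_matches returns terms whose similarity ratio is >= cutoff.
            if get_close_matches(word, terms, n=1, cutoff=cutoff):
                found.add(issue_type)
                break
    return found
```

A misspelling such as "databse" still maps to the "database" issue type, which is the kind of robustness a fuzzy match offers over exact keyword lookup.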

[0054] Figure 1 provides an exemplary system overview of a data pipeline for an artificial intelligence model used to predict and resolve failure events in the system, according to one or more embodiments. The data pipeline system 100 may be a platform with multiple interconnected components. The data pipeline system 100 may include one or more servers, intelligent network devices, computing devices, components, and corresponding software for aggregating and processing data.

[0055] As shown in Figure 1, the data pipeline system 100 may include a data source 101, a collection point 120, an auxiliary collection point 110, a front-end gateway processor 140, a data storage device 150, a processing platform 160, a data sink layer 170, a data sink layer 171, and an artificial intelligence module 180.

[0056] Data source 101 may include internal data 103 and third-party data 199. Internal data 103 may be a data source directly linked to data pipeline system 100. Third-party data 199 may be a data source externally connected to data pipeline system 100, as will be described in more detail below.

[0057] Both internal data 103 and third-party data 199 of data source 101 may include fault event data 102. Fault event data 102 may include fault event reports, where information for each fault event is provided including one or more of the following: fault event number, closure date/time, category, closure code, closure notes, detailed description, brief description, root cause, or assignment group. Fault event data 102 may include fault event reports, where information for each fault event is provided including one or more of the following: problem keywords, description, summary, tags, problem type, fix version, environment, author, or comment. Fault event data 102 may include fault event reports, where information for each fault event is provided including one or more of the following: filename, script name, script type, script description, display identifier, message, committer type, committer link, attributes, file change, or branch information. Fault event data 102 may include one or more of the following: real-time data, market data, performance data, historical data, utilization data, infrastructure data, or security data. These are merely examples of information that can be used as data, and this disclosure is not limited to these examples.

[0058] Fault event data 102 can be automatically generated by monitoring tools that generate alerts and fault event data to provide notifications of high-risk actions and faults in the IT environment, and can be generated as tickets. Fault event data may include metadata such as, for example, text fields, identification codes, and timestamps.

[0059] Internal data 103 can be stored in a relational database including a fault event table. The fault event table can be provided as one or more tables and can include, for example, one or more of problems, tasks, risk conditions, fault events, or changes. The relational database can be stored in the cloud. The relational database can be encrypted and connected to a gateway. The relational database can send periodic updates to and receive periodic updates from the cloud. The cloud can be a remote cloud service, a local service, or any combination thereof. The cloud may include a gateway connected to a processing API configured to transmit data to collection point 120 or auxiliary collection point 110. The fault event table can include fault event data 102.
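To illustrate retrieving rows from a fault event table, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `fault_event`, its columns, and the ISO-8601 text timestamps are hypothetical; the disclosure retrieves this data through a processing API rather than by direct SQL.

```python
import sqlite3

# Hypothetical schema for a fault event table; the disclosure does not define columns.
SCHEMA = """
CREATE TABLE fault_event (
    number TEXT PRIMARY KEY,
    ci_id TEXT NOT NULL,
    short_description TEXT,
    opened_at TEXT NOT NULL  -- ISO-8601 timestamp stored as text
)
"""


def recent_fault_events(conn, since_iso):
    """Return (number, ci_id, short_description, opened_at) rows opened at or after since_iso.

    ISO-8601 strings in one consistent format sort chronologically as text,
    so a plain string comparison works for the date filter.
    """
    cursor = conn.execute(
        "SELECT number, ci_id, short_description, opened_at FROM fault_event "
        "WHERE opened_at >= ? ORDER BY opened_at",
        (since_iso,),
    )
    return cursor.fetchall()
```

For example, after `conn = sqlite3.connect(":memory:")` and `conn.execute(SCHEMA)`, rows inserted with `executemany` can be queried with `recent_fault_events(conn, "2024-01-01")`.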

[0060] The data pipeline system 100 may include third-party data 199 generated and maintained by third-party data producers. Third-party data producers may generate fault event data 102 from Internet of Things (IoT) devices, desktop devices, and sensors. Third-party data producers may include, but are not limited to, Tryambak, Appneta, Oracle, Prognosis, ThousandEyes, Zabbix, ServiceNow, Density, Dynatrace, etc. Fault event data 102 may include metadata indicating that the data belongs to a specific client or associated system.

[0061] Data pipeline system 100 may include an auxiliary collection point 110 to collect and preprocess fault event data 102 from data source 101. The auxiliary collection point 110 may be utilized before data is transmitted to collection point 120. The auxiliary collection point 110 may be, for example, Apache MiNiFi software. In one example, the auxiliary collection point 110 may run on a microprocessor used by a third-party data producer. Each third-party data producer may have an instance of the auxiliary collection point 110 running on the microprocessor. The auxiliary collection point 110 may support data formats including, but not limited to, JSON, CSV, Avro, ORC, HTML, XML, and Parquet. The auxiliary collection point 110 may encrypt the fault event data 102 collected from the third-party data producer using, for example and without limitation, mutually authenticated transport layer security (mTLS), HTTPS, SSH, PGP, IPsec, or SSL. The auxiliary collection point 110 may perform initial transformations or processing on the fault event data 102. The auxiliary collection point 110 can be configured to collect data from various protocols, immediately generate data provenance (traceability), apply transformation and encryption to the data, and prioritize the data.
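As a rough sketch of the preprocessing an auxiliary collection point might perform, the function below converts CSV rows to JSON, attaches simple provenance fields (source name, ingestion time, content hash), and orders records by a priority derived from a hypothetical `severity` column. It does not use MiNiFi and omits encryption, which in practice would be handled by the transport (e.g., mTLS or HTTPS); the field names and priority rule are assumptions.

```python
import csv
import hashlib
import io
import json
from datetime import datetime, timezone


def csv_to_json_records(csv_text, source_name, high_priority=("critical", "high")):
    """Convert CSV fault event rows to JSON strings with provenance and priority."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        payload = json.dumps(row, sort_keys=True)
        records.append({
            "data": row,
            "provenance": {
                "source": source_name,
                "ingested_at": ingested_at,
                # Content hash lets later stages trace and de-duplicate records.
                "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
            },
            # 0 = send first; the "severity" column is a hypothetical field.
            "priority": 0 if row.get("severity", "").lower() in high_priority else 1,
        })
    # list.sort is stable, so input order is kept within each priority level.
    records.sort(key=lambda record: record["priority"])
    return [json.dumps(record) for record in records]
```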

[0062] Data pipeline system 100 may include collection point 120. Collection point 120 may be a system configured to provide a secure framework for routing, transforming, and delivering data from data source 101 to downstream processing devices (e.g., front-end gateway processor 140). Collection point 120 may be, for example, software such as Apache NiFi. Collection point 120 may receive raw data and corresponding fields of the data, such as source name and ingestion time. Collection point 120 may run on a Linux virtual machine (VM) on a remote server and may include one or more nodes. For example, collection point 120 may receive fault event data 102 directly from data source 101. In another example, collection point 120 may receive fault event data 102 from auxiliary collection point 110, which may use, for example, a site-to-site protocol to transmit fault event data 102 to collection point 120. Collection point 120 may include a flow algorithm. As described herein, the flow algorithm may connect different processors to transfer and modify data from one source to another. Collection point 120 can have a separate flow for each third-party data producer. Each flow can include a process group, and each process group can include one or more processors. The one or more processors can, for example, retrieve fault event data 102 from a relational database by using the processing API of internal data 103 to make API calls that retrieve fault event data 102 from the fault event table. The one or more processors can further transmit the fault event data 102 to a target system, such as front-end gateway processor 140. Collection point 120 can encrypt the data via HTTPS, mutually authenticated Transport Layer Security (mTLS), SSH, PGP, IPsec, and / or SSL. Collection point 120 can support data formats including, but not limited to, JSON, CSV, Avro, ORC, HTML, XML, and Parquet. Collection point 120 can be configured to communicate with, and write messages to, the cluster of front-end gateway processors 140.

[0063] The data pipeline system 100 may include a distributed event streaming platform, such as a front-end gateway processor 140. The front-end gateway processor 140 may connect to, and be configured to receive data from, the collection point 120. The front-end gateway processor 140 may be implemented as an Apache Kafka cluster. The front-end gateway processor 140 may include one or more message brokers and corresponding nodes. A message broker may, for example, be an intermediary computer program module that translates messages from the sender's formal messaging protocol to the receiver's formal messaging protocol. Each message broker may reside on a single node within the front-end gateway processor 140 and may run on a virtual machine (VM) on a remote server. The collection point 120 may send fault event data 102 to one or more of the message brokers of the front-end gateway processor 140. Each message broker may include topics, each storing fault event data 102 of a similar category. A topic may be an ordered event log. Each topic may include one or more subtopics. For example, one subtopic may store fault event data 102 related to network problems, and another subtopic may store fault event data 102 related to security vulnerabilities reported by third-party data producers. Each topic may further include one or more partitions. Partitioning can be a systematic way of breaking a topic's log into many logs, each of which may be hosted on a separate server. Each partition may be configured to store up to a configured maximum amount of fault event data 102. Each topic may be evenly partitioned across one or more message brokers for load balancing and scalability. The front-end gateway processor 140 may be configured to classify received data into multiple client categories, thereby forming multiple datasets associated with the corresponding client categories. These datasets may be stored separately in storage devices, as described in more detail below. The front-end gateway processor 140 may further transfer data to storage devices and processors for further processing.

[0064] For example, the front-end gateway processor 140 can be configured to assign specific data to corresponding topics: alarm data can be assigned to an alarm topic, fault event data to a fault event topic, change data to a change topic, and problem data to a problem topic.
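The topic assignment of [0064], together with the keyed partitioning described in [0063], can be sketched as follows. This is a minimal illustration, not Kafka's API: the record fields `type` and `client_id`, the topic names, and the CRC32 hash (a stand-in for Kafka's own default partitioner) are all assumptions. Keying by client keeps each client's events in one partition, which preserves their order.

```python
import zlib
from collections import defaultdict

# Hypothetical mapping from record type to topic name (names are illustrative).
TOPIC_BY_TYPE = {
    "alarm": "alarm",
    "fault_event": "fault_event",
    "change": "change",
    "problem": "problem",
}


def choose_topic(record: dict) -> str:
    """Return the topic for a record based on its (assumed) 'type' field."""
    try:
        return TOPIC_BY_TYPE[record["type"]]
    except KeyError as exc:
        raise ValueError(f"no topic for record type {record.get('type')!r}") from exc


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a stable hash (CRC32, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records, num_partitions=3):
    """Group records into (topic, partition) buckets, preserving arrival order per bucket."""
    buckets = defaultdict(list)
    for rec in records:
        topic = choose_topic(rec)
        partition = choose_partition(rec["client_id"], num_partitions)
        buckets[(topic, partition)].append(rec)
    return dict(buckets)
```

For example, two fault events from client `c1` land in the same `("fault_event", p)` bucket in arrival order, while an alarm from the same client goes to the alarm topic.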

[0065] The data pipeline system 100 may include a data storage device 150 built on a software framework for long-term storage and distributed processing. The data storage device 150 may be implemented using, for example, Apache Hadoop. The data storage device 150 may store fault event data 102 transferred from the front-end gateway processor 140. Specifically, the data storage device 150 can be used for distributed processing of the fault event data 102, and a Hadoop Distributed File System (HDFS) within the data storage device 150 can be used to organize the communication and storage of the fault event data 102. For example, HDFS can replicate data received from any node of the front-end gateway processor 140. This replication can protect against data loss caused by hardware or software failures of the front-end gateway processor 140. Processing can be executed concurrently on multiple servers.

[0066] Data storage device 150 may include HDFS configured to receive metadata (e.g., fault event data). Data storage device 150 may further use the MapReduce algorithm, which allows parallel processing of large datasets, to process the data. Data storage device 150 may further use Yet Another Resource Negotiator (YARN) to aggregate and store data; YARN can be used for cluster resource management and for scheduling tasks on the stored data. For example, a cluster computing framework such as processing platform 160 may be deployed to further use the HDFS of data storage device 150. For example, if data source 101 stops providing data, processing platform 160 may be configured to retrieve data directly from data storage device 150 or via front-end gateway processor 140. Data storage device 150 may use a programming model that allows distributed processing of large datasets across a cluster of computers. Data storage device 150 may include a master node and HDFS for distributed processing across multiple data nodes. The master node may store metadata, such as the number of blocks and their locations. The master node may maintain a file system namespace, which can include files and directories, regulate client access to those files, and perform file system operations such as naming, opening, and closing files. Data storage device 150 can scale from a single server to thousands of machines, each providing local computing and storage. Data storage device 150 can be configured to store fault event data in unstructured, semi-structured, or structured form. In one example, multiple datasets associated with corresponding client categories can be stored separately, and the master node can store metadata such as the location of each dataset.
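The MapReduce model mentioned above can be illustrated in plain Python. The sketch below is not Hadoop's Java API; it only mirrors the three phases (map each input split independently, shuffle values by key, reduce each key) using a hypothetical job that counts fault events per configurable item. The `ci_id` field name is an assumption.

```python
from collections import defaultdict
from itertools import chain


def map_split(split):
    """Map phase: emit a (configurable item ID, 1) pair for every fault event in one input split."""
    for record in split:
        yield record["ci_id"], 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_key(key, values):
    """Reduce phase: collapse the values for one key into a single count."""
    return key, sum(values)


def count_events_per_ci(splits):
    """Run map over each split (as separate data nodes would), then shuffle and reduce."""
    mapped = chain.from_iterable(map_split(s) for s in splits)
    return dict(reduce_key(k, v) for k, v in shuffle(mapped).items())
```

In a real cluster each split would be mapped on the data node holding it, and reducers would run in parallel per key range; here the phases run sequentially to keep the example self-contained.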

[0067] Data pipeline system 100 may include a real-time processing framework, such as processing platform 160. In one example, processing platform 160 may be a distributed data stream engine without its own storage layer. For example, this could be the software platform Apache Flink. In another example, the software platform Apache Spark may be utilized. Processing platform 160 may support both stream processing and batch processing. Stream processing can be a type of data processing that performs continuous real-time analysis on received data. Batch processing may involve receiving discrete datasets to process in batches. Processing platform 160 may include one or more nodes. Processing platform 160 may aggregate fault event data 102 received from front-end gateway processor 140 (e.g., fault event data 102 that has already been processed by front-end gateway processor 140). Processing platform 160 may include one or more operators to transform and process the received data. For example, a single operator may filter fault event data 102 and then connect to another operator to perform additional data transformations. Processing platform 160 may process fault event data 102 in parallel. A single operator may reside on a single node within processing platform 160. Processing platform 160 can be configured to filter specific processed data and send only that specific processed data to a specific data receiving layer. For example, depending on the data source of fault event data 102 (e.g., whether the data is internal data 103 or third-party data 199), the data can be transmitted to a separate data receiving layer (e.g., data receiving layer 170 or data receiving layer 171). Furthermore, additional data that is not needed at downstream modules (e.g., at artificial intelligence module 180) can be filtered and excluded before being transmitted to the data receiving layer.

[0068] Processing platform 160 can perform three functions. First, processing platform 160 can perform data validation: the values, structure, and / or format of the data can be matched against the schema of the destination (e.g., data receiving layer 170). Second, processing platform 160 can perform data transformation. For example, source fields, target fields, functions, and parameters can be extracted from the data, and specific transformations can be applied based on the extracted functions. The transformations can reformat the data for specific downstream uses, and users may be able to select specific formats for those uses. Third, processing platform 160 can perform data routing. For example, processing platform 160 can select the shortest and / or most reliable path to send data to the appropriate data receiving layer (e.g., data receiving layer 170 and / or data receiving layer 171).
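The validate, transform, and route steps can be sketched as a single pass over incoming records. This is a simplified illustration rather than Flink or Spark code. The destination schema, field names, and the `source == "internal"` test are assumptions. Routing here follows the internal versus third-party split described in [0067] and does not model path selection.

```python
# Hypothetical destination schema: field name -> required Python type.
SCHEMA = {"number": str, "ci_id": str, "short_description": str, "source": str}


def validate(record: dict, schema=SCHEMA) -> list:
    """Return a list of validation errors (empty if the record matches the schema)."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
    return errors


def transform(record: dict) -> dict:
    """Keep only schema fields and collapse repeated whitespace in the description."""
    out = {k: record[k] for k in SCHEMA}
    out["short_description"] = " ".join(out["short_description"].split())
    return out


def route(record: dict) -> str:
    """Internal data goes to layer 170; everything else is treated as third-party (171)."""
    return "data_receiving_layer_170" if record["source"] == "internal" else "data_receiving_layer_171"


def process(records):
    """Validate, transform, and route each record; return routed records and rejects."""
    routed = {"data_receiving_layer_170": [], "data_receiving_layer_171": []}
    rejected = []
    for rec in records:
        if validate(rec):
            rejected.append(rec)
            continue
        clean = transform(rec)
        routed[route(clean)].append(clean)
    return routed, rejected
```

Dropping fields outside the schema in `transform` also covers the filtering step in [0067], where data not needed downstream is excluded before it reaches a data receiving layer.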

[0069] In one example, processing platform 160 can be configured to transmit a specific dataset to the data receiving layer. For instance, processing platform 160 can receive input variables from a specific artificial intelligence module 180. Processing platform 160 can then filter the data received from front-end gateway processor 140 and transmit only the data relevant to the input variables of the artificial intelligence module 180 to the data receiving layer.

[0070] Data pipeline system 100 may include one or more data receiving layers (e.g., data receiving layer 170 and data receiving layer 171). Fault event data 102 processed by processing platform 160 may be transferred to and stored in data receiving layer 170. In one example, data receiving layer 171 may be stored externally on a server of a specific client. Data receiving layers 170 and 171 may be implemented using software such as, but not limited to, PostgreSQL, Hive, Kafka, OpenSearch, and Neo4j. Data receiving layer 170 may receive internal data 103 that has been processed by processing platform 160, and data receiving layer 171 may receive third-party data 199 that has been processed by processing platform 160. The data receiving layers may be configured to transfer fault event data 102 to artificial intelligence module 180. The data receiving layers may be data lakes, data warehouses, or cloud storage systems. Each data receiving layer may be configured to store fault event data 102 in either a structured or unstructured format. Data receiving layer 170 can store fault event data 102 in several different formats, such as JavaScript Object Notation (JSON), comma-separated values (CSV), Avro, Optimized Row Columnar (ORC), Hypertext Markup Language (HTML), Extensible Markup Language (XML), or Parquet. A data receiving layer (e.g., data receiving layer 170 or data receiving layer 171) can be accessed by one or more individual components, for example, a non-relational ("NoSQL") database management system (e.g., a Cassandra cluster), a graph database management system (e.g., a Neo4j cluster), additional processing programs (e.g., a Kafka + Flink program), and a relational database management system (e.g., a Postgres cluster). Therefore, additional processing can be performed on the processed data before it is received by the artificial intelligence module 180.

[0071] The data pipeline system 100 may include an artificial intelligence module 180. The artificial intelligence module 180 may include machine learning components and may use the received data to train and / or apply machine learning models. The machine learning model may, for example, be a neural network. However, other machine learning techniques and frameworks may be used by the artificial intelligence module 180 to perform the methods contemplated by this disclosure. For example, the system and methods may be implemented using other supervised and unsupervised machine learning techniques, such as regression, random forests, clustering algorithms, principal component analysis (PCA), reinforcement learning, or combinations thereof. The artificial intelligence module 180 may be configured to extract and receive data from the data receiving layer 170.

[0072] Figure 2 depicts an exemplary block diagram of a system 200 for determining historically similar failure events, according to one or more embodiments. System 200 may include a data ingestion tool 202, a natural language processing-based platform 204, and an output interface 206. As described in more detail below, system 200 may receive failure events (e.g., data objects indicating the occurrence of failure events) and corresponding metadata as input, and output a list of historically similar failure events (e.g., a list of historical data objects indicating the occurrence of previous failure events). System 200 may be implemented by various aspects of the data pipeline system 100 of Figure 1 or by any computer processing system, such as device 900.

[0073] System 200 may include a data ingestion tool 202. According to an exemplary embodiment, data ingestion tool 202 may refer to a process and system for facilitating the transfer of fault events and fault event data (e.g., data objects containing data related to the fault event) to the various tools, modules, components, and devices used to identify historically similar fault events. Data ingestion tool 202 may be configured to receive metadata of historical (e.g., previous) fault events and current fault events. For example, data ingestion tool 202 may include an application programming interface (API) configured to receive fault event data (both historical and current). The fault event data may include fault event reports that provide, for each fault event, one or more of: fault event number, closure date / time, category, closure code, closure notes, detailed description, brief description, root cause, or assignment group. Fault event data may include fault event reports that provide, for each fault event, one or more of: problem keywords, description, summary, tags, problem type, fix version, environment, author, or comment. Fault event data may include fault event reports that provide, for each fault event, one or more of: filename, script name, script type, script description, display identifier, message, committer type, committer link, attributes, file change, or branch information. Fault event data may include one or more of: real-time data, market data, performance data, historical data, utilization data, infrastructure data, or security data. Fault event data may also include information identifying the configurable item on which a specific fault event occurred. For example, the following attributes may be included in the fault event report and configurable item metadata: the Configuration Management Database Configurable Item (CMDB_CI) ID (also known as sys_ID), the CMDB_CI name, the description (from which the topic can be identified), and the Knowledge Base (KB) articles associated with each fault event. In some examples, a single fault event may have one or more configurable items attached to it. These are merely examples of information that can be used as data, and this disclosure is not limited to these examples.
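A data object carrying the metadata listed above might look like the sketch below. The class and field names, and the report keys read by `from_report` (`number`, `short_description`, `opened_at`, `cmdb_ci`, `kb_articles`), are hypothetical choices for illustration; a real ingestion API would define its own. A tuple of configurable items reflects that a single fault event may have more than one configurable item attached.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConfigurableItem:
    """A configurable item attached to a fault event (CMDB_CI ID, i.e. sys_ID, and name)."""
    sys_id: str
    name: str


@dataclass(frozen=True)
class FaultEvent:
    """Hypothetical container for the fault event metadata described in [0073]."""
    number: str
    description: str
    opened_at: datetime
    configurable_items: tuple = ()  # tuple of ConfigurableItem
    kb_articles: tuple = ()         # tuple of KB article identifiers

    @classmethod
    def from_report(cls, report: dict) -> "FaultEvent":
        """Build a FaultEvent from a report dict; the key names are assumptions."""
        cis = tuple(
            ConfigurableItem(ci["sys_id"], ci["name"]) for ci in report.get("cmdb_ci", [])
        )
        return cls(
            number=report["number"],
            description=report.get("short_description", ""),
            opened_at=datetime.fromisoformat(report["opened_at"]),
            configurable_items=cis,
            kb_articles=tuple(report.get("kb_articles", [])),
        )
```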

[0074] A CMDB_CI ID can be an identifier for a specific configurable item in the database. A CMDB_CI name can be a name used to describe a specific configurable item in the database. A topic can be a term describing a fault event; the topic of a fault event can be generated by an algorithm, as discussed further below. A knowledge base article can be a document that provides information, FAQs, guides, or troubleshooting suggestions for a specific fault event. Knowledge base articles can be manually entered for a specific fault event.

[0075] The fault event data received at data ingestion tool 202 may, for example, come from data receiving layer 170 of Figure 1. Data ingestion tool 202 can, for example, extract attributes from a configuration management database (CMDB) that can be located within data receiving layer 170. Data ingestion tool 202 can further extract data from the configuration master table. The data received by data ingestion tool 202 may have been previously processed and filtered to provide only fault event data, and can be further filtered to provide fault events from specific external systems. In another example, fault event data can be received over an electronic network, such as the Internet, via one or more computers, servers, and / or handheld mobile devices. The electronic network can be connected to provide fault event data that is automatically generated by external systems through monitoring tools; these tools generate alerts and fault event data that provide notifications of high-risk actions or faults in the IT environment, and the fault event data can be generated as work orders.

[0076] System 200 may further include a natural language processing (NLP)-based platform 204. In one example, the NLP-based platform 204 or components thereof may be implemented by artificial intelligence module 180. The NLP-based platform 204 may be configured to receive new failure event data (e.g., via data ingestion tool 202) and use each of its components (e.g., the same configurable item (CI) module 208, the similar CI module 210, the similar topic module 212, the similar knowledge base (KB) article module 214, and the full similarity type module 216) to determine a list of historically similar failure events, which may be saved and then output (e.g., via output interface 206). The NLP-based platform 204 may further include one or more storage devices 218 configured to store historical failure event data, new failure event data, and the determined list of historically similar failure events.

[0077] The natural language processing-based platform 204 may further include a server system. The server system may also include processing devices for processing data stored in one or more storage devices 218. The server system may further include one or more machine learning tools or capabilities (e.g., implemented by the same CI module 208, similar CI module 210, similar topic module 212, similar KB article module 214, and full similarity type module 216).

[0078] In one example, one or more of the same CI module 208, similar CI module 210, similar topic module 212, similar KB article module 214, and full similarity type module 216 can be located on one or more separate computing devices accessible by system 200.

[0079] For example, the same CI module 208, similar CI module 210, similar topic module 212, similar KB article module 214, and full similarity type module 216 can each determine a list of historically similar fault events, and these individual lists can then be filtered and combined by the natural language processing-based platform 204. The modules can restrict their search to historical data within a specific time frame. For example, the modules can search all historical fault event data that occurred within the year preceding the received fault event. In some examples, the date range of the historical data can be predefined (e.g., the past month, year, two years, five years, ten years, etc.). In some examples, the user may be able to input the date range of the historical data to be examined; for example, the user may specify that the currently received fault event data should only be compared with historical data received within the past six months.
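The look-back window can be expressed as a simple filter applied before any of the modules run. In this minimal sketch the `opened_at` field name and the 365-day default are assumptions; a user-supplied range such as six months would be passed as a different `timedelta`.

```python
from datetime import datetime, timedelta


def filter_history(history, current_opened_at, lookback=timedelta(days=365)):
    """Keep historical events opened within `lookback` before the current event.

    Events opened at or after the current event's time are excluded, so the
    current event is never compared with itself.
    """
    window_start = current_opened_at - lookback
    return [e for e in history if window_start <= e["opened_at"] < current_opened_at]
```

For a fault event opened on 30 September 2024, the default window starts on 1 October 2023; passing `timedelta(days=120)` narrows it to events opened on or after 2 June 2024.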

[0080] For example, the same CI module 208 and the similar CI module 210 can identify similar historical fault events based on the configurable item (more specifically, its ID and name) associated with each fault event. For both modules, the natural language processing-based platform 204 can search for similar fault events by identifying relevant configurable items; the relevant configurable items and the fault events that occurred on them can be output and saved. In contrast, the similar topic module 212, the similar KB article module 214, and the full similarity type module 216 can identify similar fault events by extracting and analyzing the summary text associated with each received fault event.

[0081] The natural language processing-based platform 204 may include the same CI module 208. Fault events may occur for a specific configurable item. A configurable item may refer to (1) a product; (2) an allocated component of a product; or (3) a system that fulfills end-use functions, has distinct requirements, has functional and / or product relationships, and is designated for distinct control in the configuration management system. When a new fault event occurs, the same CI module 208 may be configured to examine the configurable item ID (also known as sys_ID) of the new fault event and compare it with the configurable item IDs of all historical fault events (e.g., the CI_ID metadata associated with each historical fault event). A matching algorithm may be applied to identify all past configurable items with the same ID. The list of matching IDs, the corresponding configurable items, and their corresponding fault events may be saved (e.g., to one or more storage devices 218), and the matching ID may be recorded as saved.
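Exact ID matching is a natural fit for a lookup index built once over the historical events. The sketch below assumes each event is a dict with a `ci_id` field; it is one possible matching algorithm for the same CI module, not necessarily the one used in the disclosure.

```python
from collections import defaultdict


def build_ci_index(history):
    """Index historical fault events by their configurable item ID (sys_ID)."""
    index = defaultdict(list)
    for event in history:
        index[event["ci_id"]].append(event)
    return index


def same_ci_matches(current, index):
    """Return every historical event whose CI ID exactly equals the current event's."""
    # dict.get avoids inserting an empty list into the defaultdict for unseen IDs.
    return list(index.get(current["ci_id"], []))
```

Building the index costs one pass over the history; each new fault event is then matched in constant time rather than by scanning every historical record.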

[0082] Figure 5A depicts an exemplary table 502 output from the same CI module 208, according to one or more embodiments. As shown, the exemplary ID number (depicted as CMDB_CI) can be the same for the three identified configurable items. These three configurable items can be stored for future output.

[0083] The natural language processing-based platform 204 may include a similar CI module 210. The similar CI module 210 can extract the “CI name” of the configurable item for the received fault event and compare it with the CI names of all historical fault events and their corresponding configurable items. For example, the similar CI module 210 can utilize natural language processing (NLP) techniques to determine similar CI names. For instance, a text matching algorithm can extract the configurable item names of all past fault events and compare the past names with the configurable item names of the received fault event. Alternatively, regular expression (Regex), word segmentation, and / or fuzzy keyword algorithms can be applied to determine similar “CI names” of configurable items from the received fault event compared to historical fault events. This list can represent names that are slightly different from the searched names. For example, if the configurable item name is “e-banking,” these algorithms can search and determine configurable item names such as “E-banking,” “e-banking,” and / or “electronic banking.” The determined list can then be saved (e.g., to one or more storage devices 218).
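One way to combine the normalization, word segmentation, and fuzzy matching described above uses Python's standard `difflib`. In this sketch the abbreviation table (which expands "e" to "electronic") and the 0.8 threshold are hypothetical tuning choices. `SequenceMatcher.ratio()` is a swapped-in similarity measure; the disclosure does not name a specific one.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table used to expand tokens before comparison.
ABBREVIATIONS = {"e": "electronic"}


def normalize(name: str) -> str:
    """Lowercase, split on whitespace / hyphens / underscores, and expand abbreviations."""
    tokens = re.split(r"[\s\-_]+", name.lower().strip())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens if t)


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized CI names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def similar_ci_names(query, candidates, threshold=0.8):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(c, name_similarity(query, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda cs: cs[1], reverse=True)
```

With these assumptions, "E-banking", "e banking", and "electronic banking" all normalize to "electronic banking" and match "e-banking" with a score of 1.0, while an unrelated name such as "payroll" falls below the threshold.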

[0084] Figure 5B depicts an exemplary table 504 output from the similar CI module 210, according to one or more embodiments. As shown, if an exemplary fault event occurs for configurable item A, which has the name "APP ACBPRD ACB ACBS – CBS", the similar CI module 210 may (e.g., by using a fuzzy algorithm) identify two configurable items with the names "ACBS API" and "App ACB PRD ACB ACBS – CIBU". These configurable items and their corresponding fault events can then be extracted and stored for later output.

[0085] The natural language processing-based platform 204 may include a similar topic module 212. The similar topic module 212 can compare the summary / description associated with a failure event with the summaries / descriptions of historical failure events. For example, the similar topic module 212 may use a natural language processing (NLP) model that classifies the text summary of each failure event into one of a predefined number of topics. The NLP model may use latent Dirichlet allocation (LDA) 302 (as depicted in Figure 3) and / or a Gibbs sampling Dirichlet multinomial mixture model (GSDMM) 304 (as depicted in Figure 3). LDA 302 and GSDMM 304 can both be unsupervised models trained to extract topics from the description of the fault event (e.g., from a brief description). The similar topic module 212 can, for example, identify natural topic groupings that have not been defined in advance.

[0086] LDA 302 may be preferred, for example, for longer text descriptions. LDA 302 can identify the distribution of topics that make up each text description, and the distribution of words that each topic covers. LDA 302 can return, for each document, a probability distribution giving the percentage contribution of each topic (e.g., 0.3*topic_1 + 0.7*topic_2).

[0087] GSDMM 304 can be regarded as an extension of the LDA algorithm. GSDMM 304 may be preferred for fault events with short text descriptions (e.g., text of less than 100 characters). GSDMM 304 may, for example, assign each description to only a single topic (as opposed to LDA 302, which is used to determine multiple topics).
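The single-topic-per-document behaviour of GSDMM follows from its collapsed Gibbs sampling update, in which each short description is reassigned to one cluster at a time. The sketch below is a compact, unoptimized implementation of that update as published for GSDMM (Yin and Wang, 2014). It is not the disclosure's code, and the hyperparameter defaults (`k`, `alpha`, `beta`, `iterations`) are assumptions. Documents are lists of tokens, for example `desc.lower().split()`.

```python
import math
import random
from collections import Counter


def gsdmm(docs, k=8, alpha=0.1, beta=0.1, iterations=15, seed=0):
    """Cluster tokenized short texts; return one cluster label per document."""
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})   # vocabulary size
    m_z = [0] * k                               # documents per cluster
    n_z = [0] * k                               # words per cluster
    n_zw = [Counter() for _ in range(k)]        # word counts per cluster
    counts = [Counter(doc) for doc in docs]
    labels = [0] * len(docs)

    def update(i, z, sign):
        # Add (sign=+1) or remove (sign=-1) document i from cluster z.
        m_z[z] += sign
        n_z[z] += sign * len(docs[i])
        for w, c in counts[i].items():
            n_zw[z][w] += sign * c

    for i in range(len(docs)):                  # random initial assignment
        labels[i] = rng.randrange(k)
        update(i, labels[i], +1)

    for _ in range(iterations):
        for i in range(len(docs)):
            update(i, labels[i], -1)
            log_p = []
            for z in range(k):
                # Cluster popularity term (its shared denominator is constant in z).
                lp = math.log(m_z[z] + alpha)
                # Word-fit numerator: product over repeated occurrences of each word.
                for w, c in counts[i].items():
                    for j in range(c):
                        lp += math.log(n_zw[z][w] + beta + j)
                # Normalizing denominator over the document's length.
                for j in range(len(docs[i])):
                    lp -= math.log(n_z[z] + v * beta + j)
                log_p.append(lp)
            top = max(log_p)                    # subtract max for numerical stability
            weights = [math.exp(lp - top) for lp in log_p]
            labels[i] = rng.choices(range(k), weights=weights)[0]
            update(i, labels[i], +1)
    return labels
```

Because `k` is only an upper bound, some clusters typically end up empty; the number of non-empty clusters is what the model infers as the topic count.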

[0088] All newly received fault events (e.g., fault events received at data ingestion tool 202) can then be clustered using LDA 302 or GSDMM 304 of the similar topic module 212. These algorithms can be executed, for example, by a background scheduler (e.g., a workflow management platform such as Apache Airflow), which can be implemented by the system 200 described herein. The identified topics can be referred to as topic clusters, and each received fault event can be grouped into one or more identified topic clusters. Topic clusters can be created once a set amount of fault event data has been received by the system, and the model can be retrained to create new topic clusters after a set time period has elapsed or a set number of fault events has been received. The historical fault events in the identified cluster can be ranked based on a similarity score computed for each of their text descriptions. Using a cutoff value, the records with the highest similarity scores can be returned via an API. Thus, the similar topic module 212 can output and save all historical fault events that share a topic cluster with the received fault event.
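The rank-and-cutoff step can be sketched as follows, assuming cluster labels have already been assigned. The disclosure does not specify the similarity score. This example swaps in bag-of-words cosine similarity, and the `cluster`, `description`, and `number` fields, the 0.3 cutoff, and the result limit are hypothetical.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two descriptions."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(x * x for x in ca.values())) * math.sqrt(sum(x * x for x in cb.values()))
    return dot / norm if norm else 0.0


def rank_same_cluster(current, history, cutoff=0.3, limit=10):
    """Rank historical events in the current event's topic cluster; drop scores below cutoff."""
    candidates = [h for h in history if h["cluster"] == current["cluster"]]
    scored = [(h["number"], cosine(current["description"], h["description"])) for h in candidates]
    kept = [s for s in scored if s[1] >= cutoff]
    kept.sort(key=lambda s: s[1], reverse=True)
    return kept[:limit]
```

For a current description "batch job stuck missing kpi", a historical "batch job stuck" in the same cluster scores 3/sqrt(15) (about 0.77), "missing kpi report" scores about 0.52, and an unrelated description in the same cluster scores 0 and is cut off.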

[0089] Figures 5C and 5D depict, for example, tables 506 and 508 output from the similar topic module 212 according to one or more embodiments. In the example of Figure 5C, the description of the fault event could be "1254: 'GP4 / CP2 – EMEA – SBK – SBK_FRCSBIM missing KPI because L-OVNT – SBK – DA – FFFPFL is stuck'". This description can be analyzed by the similar topic module 212 and assigned to a specific topic. Figure 5C shows a list of the identified failure events assigned to the same topic (e.g., the seven historically similar failure events shown in table 506).

[0090] In the example of Figure 5D, the description of the fault event may include the phrase "Observe some waiting time between the EPS box and the Lune EFTS number:". This description is analyzed by the similar topic module 212 and assigned to a specific topic. Figure 5D shows the alternative descriptions and the assignment list of the failure events assigned to the same topic.

[0091] The natural language processing-based platform 204 may include a similar KB article module 214. When a new major failure event is detected, a knowledge base (KB) article corresponding to the failure event can be created, containing a report related to the failure event. This KB article may be automatically generated by an internal system related to the failure event and may include suggestions based on the specific failure event. When a KB article is provided in response to a failure event, the user may further have the option to rate the usefulness of the suggested KB article; this may be referred to as a KB article rating. The KB article rating and the KB article itself may be received as input into the similar KB article module 214.

[0092] The similar KB article module 214 can use machine learning techniques to extract any KB articles listed in a specific failure event. For example, NLP can extract KB articles from the description of a failure event. The similar KB article module 214 may have already extracted the KB articles from all historical failure events. If a KB article of the new failure event is determined, using a fuzzy or exact matching algorithm, to match a KB article of a historical failure event, that historical failure event can be saved / output.
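A rule-based stand-in for KB extraction and exact matching is shown below. The disclosure mentions machine learning and fuzzy matching; this sketch substitutes a regular expression and set intersection to show the data flow. The identifier format `KB` followed by 5 to 8 digits (for example `KB0012345`) is an assumption about how articles are referenced in descriptions.

```python
import re

# Assumed KB identifier format, e.g. "KB0012345"; adjust to the real numbering scheme.
KB_PATTERN = re.compile(r"\bKB\d{5,8}\b", re.IGNORECASE)


def extract_kb_articles(text):
    """Return the set of KB identifiers mentioned in a description (upper-cased)."""
    return {m.upper() for m in KB_PATTERN.findall(text or "")}


def similar_kb_events(current, history):
    """Return numbers of historical events that share at least one KB article with `current`."""
    current_kbs = extract_kb_articles(current["description"])
    if not current_kbs:
        return []
    return [h["number"] for h in history if current_kbs & extract_kb_articles(h["description"])]
```

In practice the historical extraction would be computed once and stored, as the paragraph above describes, rather than re-parsed for every new fault event.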

[0093] Figure 5E depicts, for example, a table output from the similar KB article module 214 according to one or more embodiments. A brief description of an exemplary input fault event "A" may be displayed above the table. This brief description of fault event A may include a KB article, which the similar KB article module 214 may have already extracted from a brief description such as "GP4 / CP2". Figure 5E may display a list of fault events extracted by the similar KB article module 214 whose respective brief descriptions include the same KB article as fault event A.

[0094] The natural language processing-based platform 204 may include a full similarity type module 216. Full similarity type module 216 can be configured to combine all fault events extracted by the same CI module 208, similar CI module 210, similar topic module 212, and similar KB article module 214, and to rank and output the final list of related historical fault events (e.g., via output interface 206). Full similarity type module 216 may first receive and store the lists from all of these modules, together with the fault events in each list. In some examples, a module may output a list containing no historical fault events (e.g., similar KB article module 214 may determine that no other historical fault events have the same / similar KB articles).

[0095] For example, the full similarity type module 216 can assign weighted scores to the fault events received from each module, so that historical fault events identified by some modules (e.g., the same CI module 208, the similar CI module 210, the similar topic module 212, and/or the similar KB article module 214) receive higher weighted scores than those identified by others. Specifically, a weighted sum of scores can be applied to each fault event identified by these modules: a specific weight, referred to as the initial weight, is associated with each module and applied to each fault event that module identifies. The full similarity type module 216 can compile a list of all fault events identified by the same CI module 208, the similar CI module 210, the similar topic module 212, and/or the similar KB article module 214, and can identify the fault events that were identified by more than one module. An initial weighted score calculation can then be performed on this list, assigning each fault event an initial weighted score that depends on which module identified it.

[0096] Next, a weighted average score can be determined for each identified fault event. If only a single module identifies the fault event, the initial weighted score can be the weighted average score. If more than one module identifies a particular fault event, the weighted average score can be a combination of the initial weighted scores assigned to that fault event. For example, each initial weighted score can be multiplied by an assigned multiplier and the products summed (e.g., weighted average score = SUM(inc_score * initial weighted score)).

[0097] For example, if multiple modules identify a single fault event, multiple initial weights can be applied to that fault event. For instance, if the same CI module 208 and the similar topic module 212 both identify fault event A, fault event A can be assigned two initial weighted scores, one per identifying module: an initial weighted score of 10 based on the same CI module 208 and an initial weighted score of 40 based on the similar topic module 212. These scores can then be combined to determine the weighted average score of fault event A, which in this example would be 50 (10 + 40).
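One way to write the combination of [0096] and [0097] as a formula is shown below. It assumes, as an interpretation rather than something the source states, that inc_score acts as a per-module multiplier s_m(e) on the initial weight w_m of each module m in the set M(e) of modules that identified fault event e; the second line reproduces the fault event A example with both multipliers equal to 1.

```latex
\begin{aligned}
S(e) &= \sum_{m \in M(e)} s_m(e)\, w_m \\
S(A) &= s_{208}(A)\, w_{208} + s_{212}(A)\, w_{212} = 1 \cdot 10 + 1 \cdot 40 = 50
\end{aligned}
```

When only one module identifies an event, the sum has a single term, so with a multiplier of 1 the score reduces to that module's initial weighted score, as [0096] states.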

[0098] Figure 6 depicts an exemplary table 600 of the weights used by the full similarity type module 216 according to one or more implementations. Table 600 illustrates exemplary weight assignments for each module: historical failure events identified by the same CI module 208 can be assigned a weight of 10, those identified by the similar CI module 210 a weight of 5, those identified by the similar topic module 212 a weight of 45, and those identified by the similar KB article module 214 a weight of 40. Users may be able to access and adjust the weights of the different modules as needed.
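A user-adjustable weight table like table 600 could be held as a simple mapping. The sketch below is hypothetical: the module keys and the validation rules (rejecting unknown modules and negative weights) are assumptions, and only the default values come from Figure 6.

```python
# Example weights from table 600 (Figure 6); the module keys are hypothetical.
DEFAULT_WEIGHTS: dict[str, float] = {
    "same_ci": 10,        # same CI module 208
    "similar_ci": 5,      # similar CI module 210
    "similar_topic": 45,  # similar topic module 212
    "similar_kb": 40,     # similar KB article module 214
}


def adjust_weights(overrides: dict[str, float],
                   defaults: dict[str, float] = DEFAULT_WEIGHTS) -> dict[str, float]:
    """Return a new weight table with user overrides applied to the defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown modules: {sorted(unknown)}")
    if any(weight < 0 for weight in overrides.values()):
        raise ValueError("weights must be non-negative")
    # Build a fresh dict so the shared defaults are never mutated.
    return {**defaults, **overrides}
```

Calling `adjust_weights({"similar_ci": 15})` yields a table in which only the similar CI weight changes, while `DEFAULT_WEIGHTS` itself stays at the Figure 6 values.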

[0099] Figure 7 depicts an exemplary table 700 showing the output of the full similarity type module 216 according to one or more implementations. For example, the full similarity type module 216 may first extract and compile all lists determined by the same CI module 208, the similar CI module 210, the similar topic module 212, and the similar KB article module 214. Next, the full similarity type module 216 may combine the lists and record in a database, for each fault event, the module(s) that determined it (as depicted in column 702). In some examples, multiple modules may determine the same historical fault event; in these cases, multiple modules may be listed in column 702. Finally, the full similarity type module 216 may assign scores to the received historical fault events based on the assigned weight of each module. If an item in the list was determined by multiple modules, multiple weights may apply to that historical fault event. For example, if both the similar CI module 210 and the similar KB article module 214 identify the same historical failure event, the assigned weights of the two modules are added together for that failure event. This combined weight can also be referred to as the final score, as shown in column 704 of Figure 7. The full similarity type module 216 can then determine the final list, in which the historical failure events are sorted by their combined weights. This list can then be output to output interface 206.
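The compile, score, and sort sequence of [0099] can be sketched as below. The module keys, the input shape (one list of event IDs per module), and the tie-breaking by event ID are assumptions made for illustration; the weights are the example values of table 600, and, as in [0099], the weights of the identifying modules are simply added.

```python
from collections import defaultdict

# Example weights of table 600 (Figure 6); the module keys are hypothetical.
WEIGHTS = {"same_ci": 10, "similar_ci": 5, "similar_topic": 45, "similar_kb": 40}


def rank_events(module_outputs: dict[str, list[str]],
                weights: dict[str, float] = WEIGHTS) -> list[tuple[str, list[str], float]]:
    """Combine per-module lists into one list sorted by final score.

    Each row is (event_id, identifying_modules, final_score), loosely
    mirroring column 702 (modules) and column 704 (combined weight).
    """
    modules_by_event: dict[str, list[str]] = defaultdict(list)
    for module, event_ids in module_outputs.items():
        for event_id in event_ids:
            if module not in modules_by_event[event_id]:
                modules_by_event[event_id].append(module)

    # Final score = sum of the weights of every module that identified the event.
    rows = [
        (event_id, mods, sum(weights[m] for m in mods))
        for event_id, mods in modules_by_event.items()
    ]
    # Highest score first; ties broken by event ID for a repeatable order.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows
```

With module outputs in which INC3 is found by the same CI, similar CI, and similar KB article modules, INC7 by the same CI and similar KB article modules, and INC9 by the similar topic module only, the ranking is INC3 (55), INC7 (50), INC9 (45). A module that returns an empty list simply contributes nothing.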

[0100] Output interface 206 may include an application programming interface (API) configured to export the determined list of historically similar failure events (e.g., as determined by the full similarity type module 216). The list may be output over an electronic network such as the Internet via one or more computers, servers, and/or handheld mobile devices, and can then be accessed by a user via a computing device (e.g., computer system 900). Output interface 206 may also be configured to output graphs (e.g., exemplary graphs 502, 504, 506, 508, and 510) of the historical lists determined by the various modules of the natural language processing-based platform 204.
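As one hypothetical illustration of what the API of output interface 206 might return, the ranked list could be serialized as JSON. The payload shape and field names below are assumptions; the source does not specify a format.

```python
import json


def export_similar_events(current_event_id: str,
                          ranked: list[tuple[str, list[str], float]]) -> str:
    """Serialize ranked (event_id, modules, score) rows as a JSON string."""
    payload = {
        "current_event": current_event_id,
        "similar_events": [
            {"rank": rank, "event_id": event_id, "modules": modules, "score": score}
            for rank, (event_id, modules, score) in enumerate(ranked, start=1)
        ],
    }
    return json.dumps(payload, indent=2)
```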

[0101] Figure 4 depicts an exemplary flowchart, according to one or more implementations, of how data can be processed by a natural language processing model to determine historically similar failure events in a system. The process of Figure 4 may be implemented by the data pipeline system 100 of Figure 1 and/or the system 200 of Figure 2.

[0102] At step 402, historical fault event data (e.g., historical data objects indicating the occurrence of previous fault events) may be received from one or more systems. Fault event data (both historical and new) may include fault event reports in which the information for each fault event is provided as one or more of the following: fault event number, closure date/time, category, closure code, closure remarks, detailed description, brief description, root cause, or assignment group. Alternatively or additionally, the information for each fault event may be provided as one or more of: problem keywords, description, summary, tags, problem type, fix version, environment, author, or comment; or as one or more of: filename, script name, script type, script description, display identifier, message, submitter type, submitter link, attributes, file change, or branch information. Fault event data may also include one or more of the following: real-time data, market data, performance data, historical data, utilization data, infrastructure data, or security data. For example, a failure event report may include the Configuration Management Database configurable item (cmdb_ci) ID, the cmdb_ci name, and the knowledge base (KB) articles associated with each failure event. In some examples, multiple CIs may be attached to a single failure event. These are merely examples of information that can be used as data, and this disclosure is not limited to these examples.
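A fault event record carrying the fields that the later steps rely on might look like the following. The class and field names are hypothetical, and only a subset of the fields listed in [0102] is modeled; the CI field is a list because a single failure event may have several CIs attached.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ConfigurableItem:
    ci_id: str  # cmdb_ci ID
    name: str   # cmdb_ci name


@dataclass
class FaultEvent:
    """Hypothetical fault event record covering a subset of the fields in [0102]."""
    number: str                           # fault event number
    brief_description: str
    detailed_description: str = ""
    closed_at: Optional[datetime] = None  # closure date/time, if closed
    cis: list[ConfigurableItem] = field(default_factory=list)  # may hold several CIs
    kb_articles: list[str] = field(default_factory=list)

    def ci_ids(self) -> set[str]:
        """Return the IDs of all CIs attached to this fault event."""
        return {ci.ci_id for ci in self.cis}
```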

[0103] Historical fault event data covering a set time period can be received. For example, historical fault event data can be uploaded in batches (e.g., the fault event data stored by a system for a set time period, such as the past month, past year, or past ten years, can be uploaded). Furthermore, as new fault events occur, their information can be saved and added to the historical fault event data for future use by the system.
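The time-windowed history and the accumulation of new events described in [0103] can be sketched with a small in-memory store. The `FaultHistory` class is hypothetical; a production system would presumably keep this data in a database rather than a Python list.

```python
from datetime import datetime, timedelta


class FaultHistory:
    """Hypothetical in-memory store of historical fault events."""

    def __init__(self) -> None:
        self._events: list[tuple[datetime, str]] = []

    def add(self, occurred_at: datetime, event_id: str) -> None:
        # New fault events are appended so that later searches can use them.
        self._events.append((occurred_at, event_id))

    def window(self, now: datetime, span: timedelta) -> list[str]:
        # Return IDs of events that occurred within [now - span, now].
        cutoff = now - span
        return [event_id for ts, event_id in self._events if cutoff <= ts <= now]
```

A window of `timedelta(days=30)` approximates "the past month"; longer spans such as ten years work the same way.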

[0104] At step 404, new fault event data (e.g., a data object indicating the occurrence of a current fault event) may be received. This fault event data may be analyzed by the systems and methods described herein (e.g., system 200) to identify historically similar fault event data. For example, new fault event data may be generated automatically by external monitoring tools that generate alerts and fault event data to provide notification of high-risk actions and faults in the IT environment, and may be generated as work orders (e.g., work orders that can be received by system 200). This data can then be transmitted to the systems described herein (e.g., system 300). The fault event received at step 404 and its corresponding data may be compared with all of the historical fault events and corresponding data received at step 402.

[0105] At step 406, the system (e.g., system 200) may apply one or more natural language processing modules (e.g., the same CI module 208, the similar CI module 210, the similar topic module 212, and/or the similar KB article module 214) to determine lists of historically similar failure events. The system may further compile these initially determined lists into a single list (e.g., via the full similarity type module 216). The compiled list may then be weighted and sorted based on which initial module(s) determined each historical failure event.

[0106] At step 408, the sorted list can be output to one or more users. This list can be used for further analysis of the initially received failure event. For example, the list can be fed to external systems for further processing or given to individuals (e.g., IT experts analyzing the initially received failure event).

[0107] Figure 8 depicts a flowchart 800 for determining historically similar failure events, according to one or more implementations.

[0108] At step 802, a data object indicating the occurrence of a current failure event associated with a configurable item may be received (e.g., by a natural language processing-based platform 204). The data object includes current failure event metadata, which includes a configurable item identifier (ID), a configurable item name, and a description of the current failure event.

[0109] At step 804, multiple historical data objects corresponding to multiple previous failure events can be received (e.g., by the natural language processing-based platform 204). Each historical data object indicates the occurrence of a previous failure event and includes previous failure event metadata, including a configurable item ID, a configurable item name, and a description of the previous failure event. The historical data objects can be received over a predetermined time period. A natural language processing module can extract knowledge base (KB) articles and topics from the description of the current failure event and from each description of a previous failure event. To extract topics, the natural language processing module can use either a latent Dirichlet allocation (LDA) algorithm or a Gibbs sampling Dirichlet mixture model (GSDMM) algorithm.
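Of the two topic algorithms named in [0109], GSDMM is compact enough to sketch with the standard library alone. The function below is an unoptimized illustration of the collapsed Gibbs sampler for a Dirichlet multinomial mixture, which assigns each short, tokenized description to exactly one topic cluster. The parameter defaults are hypothetical, tokenization is assumed to happen beforehand, and probabilities are computed directly rather than in log space, so it only suits short texts such as fault event descriptions.

```python
import random
from collections import Counter


def gsdmm(docs: list[list[str]], k: int = 8, alpha: float = 0.1,
          beta: float = 0.1, iters: int = 15, seed: int = 0) -> list[int]:
    """Assign each tokenized short text to one of k topic clusters."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_docs = len(docs)
    m_z = [0] * k                          # documents per cluster
    n_z = [0] * k                          # words per cluster
    n_zw = [Counter() for _ in range(k)]   # per-cluster word counts
    labels = [rng.randrange(k) for _ in docs]

    def update(d: int, z: int, sign: int) -> None:
        # Add (+1) or remove (-1) document d from cluster z's counts.
        m_z[z] += sign
        n_z[z] += sign * len(docs[d])
        for w in docs[d]:
            n_zw[z][w] += sign

    for d, z in enumerate(labels):
        update(d, z, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            update(d, labels[d], -1)
            counts = Counter(doc)
            weights = []
            for z in range(k):
                # Cluster popularity term.
                p = (m_z[z] + alpha) / (n_docs - 1 + k * alpha)
                # Word-fit term: how well the document's words match cluster z.
                for w, c in counts.items():
                    for j in range(c):
                        p *= n_zw[z][w] + beta + j
                for i in range(len(doc)):
                    p /= n_z[z] + vocab_size * beta + i
                weights.append(p)
            labels[d] = rng.choices(range(k), weights=weights)[0]
            update(d, labels[d], +1)
    return labels
```

If the current event's description is clustered together with the historical descriptions, the historical events that land in its cluster are natural candidates for the similar topic list.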

[0110] At step 806, one or more historical data objects similar to the data object can be identified from among multiple historical data objects based on a comparison between the current fault event metadata and the metadata of previous fault events. Identifying one or more historical data objects may further include applying a natural language processing algorithm to the data object and the multiple historical data objects.

[0111] At step 808, a score may be generated for each of the one or more historical data objects based on a comparison of the current failure event metadata with the previous failure event metadata. Generating the score may include determining four lists of historical data objects: a first list based on the similarity between the configurable item ID of the current failure event and the configurable item ID of each previous failure event; a second list based on the similarity between the configurable item name of the current failure event and the configurable item name of each previous failure event; a third list based on the similarity between the topic of the current failure event and the topic of each previous failure event; and a fourth list based on the similarity between the KB articles of the current failure event and the KB articles of each previous failure event. Generating the score may further include assigning one or more initial scores to each historical data object depending on whether it is in the first, second, third, and/or fourth list. Generating the score may further include assigning a weighted average score to each historical data object: if the historical data object is in only one of the first, second, third, and fourth lists, the weighted average score is its initial score; if it is in two or more of these lists, the weighted average score is a combination of its initial scores. This weighted average score is the score generated for each of the one or more historical data objects.
The one or more historical data objects may be included in a sorted list that combines the first, second, third, and fourth lists, in which the historical data objects are sorted by their corresponding weighted average scores.
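The four lists of step 808 might be built as follows. Every concrete choice here is an assumption: events are plain dicts with hypothetical keys, CI-name similarity uses `difflib`'s ratio with a hypothetical threshold of 0.8, topic similarity is the Jaccard overlap of precomputed topic sets with a hypothetical threshold of 0.5, and the KB list requires at least one shared KB article.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_lists(current: dict, history: list[dict],
                name_threshold: float = 0.8,
                topic_threshold: float = 0.5) -> dict[str, list[str]]:
    """Return the first to fourth lists of step 808 as lists of event IDs.

    Each event is a dict with hypothetical keys: "id", "ci_id", "ci_name",
    "topics" (set of topic labels) and "kb" (set of KB article IDs).
    """
    lists: dict[str, list[str]] = {
        "same_ci": [], "similar_ci": [], "similar_topic": [], "similar_kb": []
    }
    for past in history:
        if past["ci_id"] == current["ci_id"]:                        # first list
            lists["same_ci"].append(past["id"])
        if name_similarity(past["ci_name"], current["ci_name"]) >= name_threshold:
            lists["similar_ci"].append(past["id"])                   # second list
        if jaccard(past["topics"], current["topics"]) >= topic_threshold:
            lists["similar_topic"].append(past["id"])                # third list
        if past["kb"] & current["kb"]:                               # fourth list
            lists["similar_kb"].append(past["id"])
    return lists
```

An event can appear in several lists; that membership is what the weighting described in [0111] turns into a single score.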

[0112] At step 810, one or more historical data objects similar to the data object may be output to the user via a graphical user interface (GUI) (e.g., via output interface 206).

[0113] Figure 9 illustrates a computer system 900 for performing the techniques described herein, according to one or more embodiments of the present disclosure.

[0114] In addition to standard desktop computers or servers, any computer system capable of meeting the required storage and processing needs is suitable for implementing embodiments of this disclosure, and such systems are entirely within the scope of this disclosure. This can include tablet devices, smartphones, keypad devices, and any other computing devices, whether mobile or distributed over a network (i.e., cloud-based).

[0115] Unless otherwise specifically stated, it should be understood from the following discussion that throughout the specification, the use of terms such as “processing,” “calculation,” “operation,” “determine,” “analysis,” etc., refers to the actions and / or processes of a computer or computing system or similar electronic computing device that manipulate data represented as physical quantities such as electronic quantities and / or convert that data into other data similarly represented as physical quantities.

[0116] In a similar manner, the term "processor" can refer to any device or part of a device that processes electronic data, for example, from registers and / or memory, to convert that electronic data into other electronic data, for example, that can be stored in registers and / or memory. "Computer," "computing machine," "computing platform," "computing device," or "server" can include one or more processors.

[0117] Figure 9 illustrates a computer system, denoted 900. Computer system 900 may include a set of instructions that can be executed to cause computer system 900 to perform any one or more of the methods or computer-based functions disclosed herein. Computer system 900 may operate as a standalone device or may be connected, for example using a network, to other computer systems or peripheral devices.

[0118] In a networked deployment, computer system 900 can operate as a server or as a client user computer in a server-client user network environment, or as a peer-to-peer computer system in a point-to-point (or distributed) network environment. Computer system 900 can also be implemented as or incorporated into various devices, such as personal computers (PCs), tablet PCs, set-top boxes (STBs), personal digital assistants (PDAs), mobile devices, handheld computers, laptop computers, desktop computers, communication equipment, wireless telephones, landline telephones, control systems, cameras, scanners, fax machines, printers, pagers, personal trusted devices, web appliances, network routers, switches, or bridges, or any other machine capable of (sequentially or otherwise) executing a set of instructions specifying the actions to be taken by that machine. In certain implementations, computer system 900 can be implemented using electronic devices that provide voice, video, or data communications. Furthermore, while a single computer system 900 is illustrated, the term "system" should also be understood to include any collection of systems or subsystems that individually or jointly execute one or more sets of instructions to perform one or more computer functions.

[0119] As illustrated in Figure 9, computer system 900 may include a processor 902, such as a central processing unit (CPU), a graphics processing unit (GPU), or both. Processor 902 can be a component in a variety of systems. For example, processor 902 may be part of a standard personal computer or workstation. Processor 902 may be one or more general-purpose processors, digital signal processors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), servers, networks, digital circuits, analog circuits, combinations thereof, or other devices now known or later developed for analyzing and processing data. Processor 902 may implement a software program, such as manually generated (i.e., programmed) code.

[0120] Computer system 900 may include a memory 904 that can communicate via bus 908. Memory 904 may be main memory, static memory, or dynamic memory. Memory 904 may include, but is not limited to, computer-readable storage media, such as various types of volatile and non-volatile storage media, including but not limited to random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media, and the like. In one embodiment, memory 904 includes a cache or random access memory for processor 902. In alternative embodiments, memory 904 is separate from processor 902, such as a cache memory of a processor, system memory, or other memory. Memory 904 may be an external storage device or database for storing data. Examples include a hard disk drive, compact disc (“CD”), digital video disc (“DVD”), memory card, memory stick, floppy disk, universal serial bus (“USB”) storage device, or any other device operable to store data. Memory 904 is operable to store instructions executable by processor 902. The functions, acts, or tasks illustrated in the figures or described herein can be performed by the programmed processor 902 executing the instructions stored in memory 904. The functions, acts, or tasks are independent of the particular type of instruction set, storage medium, processor, or processing strategy, and can be performed by software, hardware, integrated circuits, firmware, microcode, and the like, operating alone or in combination. Likewise, processing strategies can include multiprocessing, multitasking, parallel processing, and the like.

[0121] As shown, the computer system 900 may further include a display unit 910, such as a liquid crystal display (LCD), an organic light-emitting diode (OLED), a flat panel display, a solid-state display, a cathode ray tube (CRT), a projector, a printer, or other display device now known or later developed for outputting the determined information. The display 910 may serve as an interface for a user to view the functions of the processor 902, or specifically as an interface to software stored in memory 904 or in drive unit 906.

[0122] Alternatively or additionally, the computer system 900 may include an input device 912 configured to allow a user to interact with any component of the system 900. The input device 912 may be a numeric keypad, keyboard, or cursor control device such as a mouse or joystick, touchscreen display, remote control, or any other device operable to interact with the computer system 900.

[0123] Computer system 900 may also or alternatively include a disk or optical disk drive unit 906. Disk drive unit 906 may include a computer-readable medium 922 in which one or more sets of instructions 924, such as software, may be embedded. Furthermore, the instructions 924 may embody one or more methods or logic as described herein. The instructions 924 may reside wholly or partially within memory 904 and / or processor 902 during execution by computer system 900. Memory 904 and processor 902 may also include computer-readable media as discussed above.

[0124] In some systems, computer-readable medium 922 includes instructions 924, or receives and executes instructions 924 in response to a propagated signal, enabling devices connected to network 970 to transmit voice, video, audio, images, or any other data through network 970. Furthermore, instructions 924 may be transmitted or received on network 970 via communication port or interface 920 and / or using bus 908. Communication port or interface 920 may be part of processor 902 or may be a separate component. Communication port 920 may be created in software or may be a physical connection in hardware. Communication port 920 may be configured to connect to network 970, external media, display 910, or any other component or combination thereof in system 900. Connection to network 970 may be a physical connection, such as a wired Ethernet connection, or may be established wirelessly as discussed below. Similarly, additional connections to other components of system 900 may be physical connections or may be established wirelessly. Network 970 may alternatively be directly connected to bus 908.

[0125] Although computer-readable medium 922 is shown as a single medium, the term "computer-readable medium" can include single or multiple media, such as centralized or distributed databases, and / or associated caches and servers storing one or more sets of instructions. The term "computer-readable medium" can also include any medium capable of storing, encoding, or carrying a set of instructions for execution by a processor or causing a computer system to perform any one or more methods or operations disclosed herein. Computer-readable medium 922 can be non-transitory and can be tangible.

[0126] Computer-readable medium 922 may include solid-state memory, such as a memory card or other package housing one or more non-volatile read-only memories. Computer-readable medium 922 may be random access memory or other volatile rewritable memory. Additionally or alternatively, computer-readable medium 922 may include magneto-optical or optical media, such as a disk or magnetic tape or other storage device, to capture carrier signals, such as signals communicated via a transmission medium. A digital file attachment to an email or other self-contained information archive or set of archives may be considered a distribution medium that is a tangible storage medium. Therefore, this disclosure is considered to include any one or more of a computer-readable medium or a distribution medium, and other equivalents and successor media, in which data or instructions may be stored.

[0127] In alternative implementations, dedicated hardware implementations, such as application-specific integrated circuits (ASICs), programmable logic arrays (PLA), and other hardware devices, can be configured to implement one or more methods described herein. Applications that can include various implementations of the apparatus and systems can broadly encompass a wide range of electronic and computer systems. One or more implementations described herein can be implemented using two or more specific interconnected hardware modules or devices with associated control and data signals that can communicate between and through modules, or as parts of an ASIC. Therefore, the systems of the present invention encompass software, firmware, and hardware implementations.

[0128] Computer system 900 can be connected to one or more networks 970. Network 970 can define one or more networks, including wired or wireless networks. Wireless networks can be cellular telephone networks, 802.11, 802.16, 802.20, or WiMAX networks. Furthermore, such networks can include public networks such as the Internet, private networks such as intranets, or combinations thereof, and can utilize various network protocols now available or developed in the future, including but not limited to TCP / IP-based network protocols. Network 970 can include wide area networks (WANs), local area networks (LANs), campus networks, metropolitan area networks, such as the Internet, direct connections via universal serial bus (USB) ports, or any other network that allows data communication. Network 970 can be configured to couple one computing device to another to enable data communication between the devices. Network 970 can generally be enabled to use any form of machine-readable medium for transferring information from one device to another. Network 970 can include communication methods through which information can be transferred between computing devices. Network 970 can be divided into subnets. A subnet can allow access to all other components connected to it, or a subnet can restrict access between components. Network 970 can be considered a public network connection or a private network connection, and can include, for example, a virtual private network or encryption or other security mechanisms employed over the public Internet.

[0129] According to various embodiments of this disclosure, the methods described herein can be implemented by a computer system executable software program. Furthermore, in exemplary non-limiting embodiments, implementations may include distributed processing, component / object distributed processing, and parallel processing. Alternatively, virtual computer system processing may be configured to implement one or more methods or functions as described herein.

[0130] Although this specification describes components and functions that can be implemented in specific embodiments with reference to particular standards and protocols, this disclosure is not limited to such standards and protocols. For example, standards for transmission on the Internet and other packet-switched networks (e.g., TCP / IP, UDP / IP, HTML, HTTP, etc.) represent examples of the prior art. Such standards are periodically superseded by faster or more efficient equivalents with substantially the same functionality. Therefore, alternative standards and protocols with the same or similar functionality as those disclosed herein are considered their equivalents.

[0131] It will be understood that, in one embodiment, the steps of the method described are performed by a suitable processor (or processors) of a processing (i.e., computer) system that executes instructions (computer-readable code) stored in a storage device. It will also be understood that the disclosed embodiments are not limited to any particular implementation method or programming technique, and that the disclosed embodiments can be implemented using any suitable technique for implementing the functions described herein. The disclosed embodiments are not limited to any particular programming language or operating system.

[0132] It should be understood that in the above description of exemplary embodiments, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining this disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the appended claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are expressly incorporated herein by reference, with each claim standing on its own as a separate embodiment.

[0133] Furthermore, while some embodiments described herein include some, but not other, features included in other embodiments, combinations of features of different embodiments are intended to be within the scope of this disclosure and to form different embodiments, as would be understood by those skilled in the art. For example, in the appended claims, any of the claimed embodiments can be used in any combination.

[0134] Furthermore, some embodiments are described herein as a method, or a combination of elements of a method, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Moreover, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element.

[0135] Many specific details are set forth in the description provided herein. However, it should be understood that embodiments of this disclosure can be practiced without these specific details. In other instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

[0136] Similarly, it should be noted that the term "coupled," when used in the claims, should not be construed as limited to direct connections only. The terms "coupled" and "connected," along with their derivatives, may be used; it should be understood that these terms are not intended as synonyms for each other. Therefore, the scope of the expression "a device A coupled to a device B" should not be limited to devices or systems in which an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B, which may include other devices or components. "Coupled" can mean that two or more elements are in direct physical or electrical contact, or that two or more elements are not in direct contact with each other but still cooperate or interact with each other.

[0137] Therefore, while what are believed to be the preferred embodiments of this disclosure have been described, those skilled in the art will recognize that other and further modifications may be made without departing from the spirit of the disclosure, and it is intended to claim all such changes and modifications as falling within the scope of this disclosure. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added to or deleted from the block diagrams, and operations may be interchanged among functional blocks. Steps may be added to or deleted from the methods described within the scope of this disclosure.

[0138] The subject matter disclosed above should be considered illustrative rather than restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other implementations falling within the true spirit and scope of this disclosure. Therefore, to the fullest extent permitted by law, the scope of this disclosure is determined by the broadest permissible interpretation of the appended claims and their equivalents, and should not be restricted or limited by the foregoing detailed description. While various embodiments of this disclosure have been described, it will be apparent to those skilled in the art that further embodiments and implementations are possible within the scope of this disclosure. Therefore, this disclosure is not limited except as provided in the appended claims and their equivalents.

Claims

1. A computer-implemented method for searching for historically similar failure events in a system, the method comprising: receiving a data object indicating the occurrence of a current failure event associated with a configurable item, the data object including current failure event metadata, the current failure event metadata including a configurable item identifier (ID), a configurable item name, and a description of the current failure event; receiving a plurality of historical data objects corresponding to a plurality of previous failure events, each of the plurality of historical data objects indicating the occurrence of a previous failure event and including previous failure event metadata, the previous failure event metadata including a configurable item ID, a configurable item name, and a description of the previous failure event; determining, based on a comparison between the current failure event metadata and the previous failure event metadata, one or more historical data objects similar to the data object from among the plurality of historical data objects; generating, based on the comparison between the current failure event metadata and the previous failure event metadata, a score for each of the one or more historical data objects; and outputting, via a graphical user interface (GUI), the one or more historical data objects similar to the data object to a user.
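The overall flow of claim 1 can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `FailureEvent` class, the `find_similar_events` function, the word-overlap (Jaccard) comparison of descriptions, the equal-thirds scoring rule, and the 0.3 threshold are illustrative assumptions rather than the claimed implementation, and returning a ranked list stands in for the GUI output step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureEvent:
    """Hypothetical data object carrying the metadata named in claim 1."""
    event_id: str
    ci_id: str          # configurable item identifier (ID)
    ci_name: str        # configurable item name
    description: str    # free-text description of the failure event


def _jaccard(a: str, b: str) -> float:
    """Word-overlap similarity of two descriptions (a simple stand-in for NLP)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def find_similar_events(current, history, threshold=0.3):
    """Determine similar historical events and score each one.

    Returns (event, score) pairs, highest score first. The scoring rule
    (equal thirds for CI ID match, CI name match, description overlap)
    is an assumption made for this sketch.
    """
    results = []
    for past in history:
        id_match = 1.0 if past.ci_id == current.ci_id else 0.0
        name_match = 1.0 if past.ci_name.lower() == current.ci_name.lower() else 0.0
        desc_sim = _jaccard(current.description, past.description)
        score = (id_match + name_match + desc_sim) / 3
        if score >= threshold:
            results.append((past, round(score, 3)))
    # Stand-in for the GUI output step: return the ranked candidates.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


current = FailureEvent("INC100", "CI-42", "payments-db", "database connection timeout on checkout")
history = [
    FailureEvent("INC001", "CI-42", "payments-db", "connection timeout to database"),
    FailureEvent("INC002", "CI-77", "auth-service", "login page returns 500"),
]
for event, score in find_similar_events(current, history):
    print(event.event_id, score)  # prints: INC001 0.833
```

Here only INC001 passes the threshold, because it shares the CI ID, the CI name, and half of its description words with the current event.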

2. The computer-implemented method of claim 1, wherein determining the one or more historical data objects further comprises applying a natural language processing algorithm to the data object and the plurality of historical data objects.
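Claim 2 does not specify which natural language processing algorithm is applied. As one hypothetical choice, descriptions can be compared by TF-IDF weighting and cosine similarity; the sketch below does this with the standard library only. The stop-word list, the smoothed IDF formula, and the function names `tokenize`, `tfidf_vectors`, `cosine`, and `rank_by_description` are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"a", "an", "the", "on", "to", "of", "in", "is", "for", "and"}


def tokenize(text):
    """Lower-case, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF so that terms present in every document keep a small weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_by_description(current_desc, past_descs):
    """Return (index, cosine similarity) for each past description, best first."""
    vecs = tfidf_vectors([current_desc] + past_descs)
    sims = [(i, cosine(vecs[0], v)) for i, v in enumerate(vecs[1:])]
    return sorted(sims, key=lambda p: p[1], reverse=True)
```

TF-IDF is chosen here only because it is simple and dependency-free; an embedding model or another NLP technique could fill the same role.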

3. The computer-implemented method according to claim 1, wherein the plurality of historical data objects are received during a predetermined time period.
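One possible reading of claim 3 is that only historical failure events falling inside a look-back window are retrieved for comparison. The sketch below assumes a hypothetical `opened_at` timestamp on each record and a hypothetical 90-day window; neither is specified in the claim.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class HistoricalRecord:
    event_id: str
    opened_at: datetime  # hypothetical timestamp field


def within_period(records, now, lookback_days=90):
    """Keep only records opened inside the predetermined look-back window."""
    start = now - timedelta(days=lookback_days)
    return [r for r in records if start <= r.opened_at <= now]
```

Narrowing the candidate set this way keeps later comparisons cheap and favors recent, more relevant incidents.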

4. The computer-implemented method of claim 1, further comprising: extracting, using a natural language processing module, knowledge base (KB) articles and topics from each of the description of the current failure event and the descriptions of the previous failure events.
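For the KB-article part of claim 4, the simplest reading is that article references are pulled out of the description text. The sketch assumes, hypothetically, that references are written as "KB" followed by 4 to 10 digits; the actual format, and whether the module does more than pattern matching, is not stated in the claim.

```python
import re

# Hypothetical pattern: KB article references written as "KB" plus 4-10 digits.
KB_PATTERN = re.compile(r"\bKB\d{4,10}\b", re.IGNORECASE)


def extract_kb_articles(description):
    """Return the set of KB article IDs mentioned in a failure event description."""
    return {match.upper() for match in KB_PATTERN.findall(description)}
```

Returning a set makes the later "shared KB article" comparison a simple set intersection.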

5. The computer-implemented method of claim 4, wherein the natural language processing module extracts the topics using a latent Dirichlet allocation (LDA) algorithm or a Gibbs sampling Dirichlet multinomial mixture (GSDMM) algorithm.
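Claim 5 names two topic-extraction options. In topic modeling, "LDA" conventionally refers to latent Dirichlet allocation, and the Gibbs-sampling Dirichlet mixture option corresponds to GSDMM (Gibbs sampling for the Dirichlet multinomial mixture model), which assigns each short text to exactly one topic cluster. That makes it a natural fit for terse failure descriptions. The following is a minimal, dependency-free sketch of GSDMM. The parameter values (`k`, `alpha`, `beta`, `iterations`, `seed`) are illustrative assumptions, and the input is assumed to be already-tokenized descriptions.

```python
import math
import random
from collections import Counter


def gsdmm(docs, k=4, alpha=0.1, beta=0.1, iterations=20, seed=0):
    """Cluster tokenized short texts with GSDMM; return one topic label per doc.

    docs: list of token lists (e.g. tokenized failure event descriptions).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    labels = [rng.randrange(k) for _ in docs]
    docs_in = [0] * k                              # m_z: documents per cluster
    words_in = [0] * k                             # n_z: word tokens per cluster
    word_counts = [Counter() for _ in range(k)]    # n_z^w: per-word counts

    def move(d, z, sign):
        """Add (sign=+1) or remove (sign=-1) document d to/from cluster z."""
        docs_in[z] += sign
        words_in[z] += sign * len(docs[d])
        for w in docs[d]:
            word_counts[z][w] += sign

    for d, z in enumerate(labels):
        move(d, z, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            move(d, labels[d], -1)                 # take the document out of its cluster
            doc_counts = Counter(doc)
            log_p = []
            for z in range(k):
                # Log of the collapsed conditional p(z_d = z | rest), up to a constant.
                lp = math.log(docs_in[z] + alpha)
                for w, c in doc_counts.items():
                    for j in range(c):
                        lp += math.log(word_counts[z][w] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(words_in[z] + vocab_size * beta + i)
                log_p.append(lp)
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]  # stable normalization
            labels[d] = rng.choices(range(k), weights=weights)[0]
            move(d, labels[d], +1)
    return labels
```

Working in log space avoids underflow for longer descriptions. An LDA-based variant would instead produce a topic mixture per description, from which the dominant topic could be taken.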

6. The computer-implemented method of claim 5, wherein generating the score for each of the one or more historical data objects comprises: determining a first list of historical data objects based on a similarity between the configurable item ID of the current failure event and the configurable item ID of each of the previous failure events; determining a second list of historical data objects based on a similarity between the configurable item name of the current failure event and the configurable item name of each of the previous failure events; determining a third list of historical data objects based on a similarity between the topic of the current failure event and the topic of each of the previous failure events; and determining a fourth list of historical data objects based on a similarity between the KB article of the current failure event and the KB article of each of the previous failure events.
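The four lists of claim 6 can be sketched as four independent membership tests over the historical events. The similarity tests are hypothetical. They are an exact CI ID match, a CI name similarity of at least 0.8 according to `difflib.SequenceMatcher.ratio()`, an equal topic label, and at least one shared KB article. The `Event` class and `build_candidate_lists` function are illustrative names.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Event:
    event_id: str
    ci_id: str
    ci_name: str
    topic: int                                   # e.g. a label from a topic model
    kb_articles: frozenset = field(default_factory=frozenset)


def build_candidate_lists(current, history, name_threshold=0.8):
    """Return the first to fourth lists of historical event IDs.

    Similarity tests are assumptions: exact CI ID match, CI name similarity
    via difflib ratio, equal topic label, and at least one shared KB article.
    """
    by_ci_id, by_ci_name, by_topic, by_kb = [], [], [], []
    for past in history:
        if past.ci_id == current.ci_id:
            by_ci_id.append(past.event_id)
        name_sim = SequenceMatcher(None, past.ci_name.lower(), current.ci_name.lower()).ratio()
        if name_sim >= name_threshold:
            by_ci_name.append(past.event_id)
        if past.topic == current.topic:
            by_topic.append(past.event_id)
        if current.kb_articles & past.kb_articles:
            by_kb.append(past.event_id)
    return by_ci_id, by_ci_name, by_topic, by_kb
```

Because the lists are built independently, one historical event can appear in several of them, which is what the later scoring steps rely on.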

7. The computer-implemented method of claim 6, wherein generating the score for each of the one or more historical data objects further comprises: assigning one or more initial scores to each of the one or more historical data objects based on whether the historical data object is determined to be in the first list, the second list, the third list, and/or the fourth list.
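Claim 7's initial scores can be sketched as one score per list that a historical event belongs to. The fixed per-list values in `INITIAL_SCORES` and the list names are hypothetical. An implementation could equally use the raw similarity value computed for each list.

```python
# Hypothetical per-list initial scores; the claim does not fix these values.
INITIAL_SCORES = {"ci_id": 0.9, "ci_name": 0.7, "topic": 0.5, "kb": 0.8}


def assign_initial_scores(lists):
    """Map each event ID to {list_name: initial score} for every list it appears in.

    lists: dict of list name -> iterable of historical event IDs.
    """
    scores = {}
    for list_name, event_ids in lists.items():
        for event_id in event_ids:
            scores.setdefault(event_id, {})[list_name] = INITIAL_SCORES[list_name]
    return scores
```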

8. The computer-implemented method of claim 7, wherein generating the score for each of the one or more historical data objects further comprises: assigning a weighted average score to each of the one or more historical data objects, wherein, when the historical data object is in only one of the first, second, third, and fourth lists, the weighted average score is the initial score, and when the historical data object is in two or more of the first, second, third, and fourth lists, the weighted average score is a combination of the initial scores, the weighted average score being the score generated for each of the one or more historical data objects.
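A minimal sketch of claim 8's weighted average follows, assuming hypothetical per-list weights in `WEIGHTS`. A weight-normalized average automatically reduces to the initial score when an event appears in only one list, which matches the claim's single-list case.

```python
# Hypothetical list weights expressing how much each kind of match is trusted.
WEIGHTS = {"ci_id": 3.0, "ci_name": 2.0, "topic": 1.0, "kb": 2.0}


def weighted_average_score(initial_scores):
    """Combine one event's initial scores ({list_name: score}) into one score.

    With a single entry the result equals that initial score; with several
    entries it is their weight-normalized average.
    """
    if not initial_scores:
        raise ValueError("event must appear in at least one list")
    total_weight = sum(WEIGHTS[name] for name in initial_scores)
    return sum(WEIGHTS[name] * s for name, s in initial_scores.items()) / total_weight
```

A pure weighted average does not, by itself, reward an event for appearing in more lists. If that behavior were wanted, an extra term would be needed, which goes beyond what the claim states.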

9. The computer-implemented method of claim 8, wherein the one or more historical data objects are included in a sorted list, the sorted list combining the first list, the second list, the third list, and the fourth list, and the one or more historical data objects are sorted based on the corresponding one or more weighted average scores.
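Claim 9's sorted list can be sketched as a deduplicated union of the four lists, ordered by each event's weighted average score. Breaking ties by event ID is an assumption added so the order is deterministic, and `merged_sorted_list` is an illustrative name.

```python
def merged_sorted_list(first, second, third, fourth, final_scores):
    """Combine the four lists (deduplicated) and sort by weighted average score.

    final_scores: dict event ID -> weighted average score. Ties are broken by
    event ID so the ordering is deterministic (an assumption of this sketch).
    """
    combined = set(first) | set(second) | set(third) | set(fourth)
    return sorted(combined, key=lambda eid: (-final_scores[eid], eid))
```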

10. A system for searching for historically similar failure events in a system, the system comprising: a memory storing processor-readable instructions; and at least one processor configured to access the memory and execute the processor-readable instructions to perform operations, the operations including: receiving a data object indicating the occurrence of a current failure event associated with a configurable item, the data object including current failure event metadata, the current failure event metadata including a configurable item identifier (ID), a configurable item name, and a description of the current failure event; receiving a plurality of historical data objects corresponding to a plurality of previous failure events, each of the plurality of historical data objects indicating the occurrence of a previous failure event and including previous failure event metadata, the previous failure event metadata including a configurable item ID, a configurable item name, and a description of the previous failure event; determining, based on a comparison between the current failure event metadata and the previous failure event metadata, one or more historical data objects similar to the data object from among the plurality of historical data objects; generating, based on the comparison between the current failure event metadata and the previous failure event metadata, a score for each of the one or more historical data objects; and outputting, via a graphical user interface (GUI), the one or more historical data objects similar to the data object to a user.

11. The system of claim 10, wherein determining the one or more historical data objects further comprises applying a natural language processing algorithm to the data object and the plurality of historical data objects.

12. The system of claim 10, wherein the plurality of historical data objects are received during a predetermined time period.

13. The system of claim 10, further comprising a natural language processing module configured to extract knowledge base (KB) articles and topics from each of the description of the current failure event and the descriptions of the previous failure events.

14. The system of claim 13, wherein the natural language processing module extracts the topics using a latent Dirichlet allocation (LDA) algorithm or a Gibbs sampling Dirichlet multinomial mixture (GSDMM) algorithm.

15. The system of claim 14, wherein generating the score for each of the one or more historical data objects comprises: determining a first list of historical data objects based on a similarity between the configurable item ID of the current failure event and the configurable item ID of each of the previous failure events; determining a second list of historical data objects based on a similarity between the configurable item name of the current failure event and the configurable item name of each of the previous failure events; determining a third list of historical data objects based on a similarity between the topic of the current failure event and the topic of each of the previous failure events; and determining a fourth list of historical data objects based on a similarity between the KB article of the current failure event and the KB article of each of the previous failure events.

16. The system of claim 15, wherein generating the score for each of the one or more historical data objects further comprises: assigning one or more initial scores to each of the one or more historical data objects based on whether the historical data object is determined to be in the first list, the second list, the third list, and/or the fourth list.

17. The system of claim 16, wherein generating the score for each of the one or more historical data objects further comprises: assigning a weighted average score to each of the one or more historical data objects, wherein, when the historical data object is in only one of the first, second, third, and fourth lists, the weighted average score is the initial score, and when the historical data object is in two or more of the first, second, third, and fourth lists, the weighted average score is a combination of the initial scores, the weighted average score being the score generated for each of the one or more historical data objects.

18. The system of claim 17, wherein the one or more historical data objects are included in a sorted list, the sorted list combining the first list, the second list, the third list, and the fourth list, and the one or more historical data objects are sorted based on a corresponding one or more weighted average scores.

19. A non-transitory computer-readable medium storing processor-readable instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising: receiving a data object indicating the occurrence of a current failure event associated with a configurable item, the data object including current failure event metadata, the current failure event metadata including a configurable item identifier (ID), a configurable item name, and a description of the current failure event; receiving a plurality of historical data objects corresponding to a plurality of previous failure events, each of the plurality of historical data objects indicating the occurrence of a previous failure event and including previous failure event metadata, the previous failure event metadata including a configurable item ID, a configurable item name, and a description of the previous failure event; determining, based on a comparison between the current failure event metadata and the previous failure event metadata, one or more historical data objects similar to the data object from among the plurality of historical data objects; generating, based on the comparison between the current failure event metadata and the previous failure event metadata, a score for each of the one or more historical data objects; and outputting, via a graphical user interface (GUI), the one or more historical data objects similar to the data object to a user.

20. The non-transitory computer-readable medium of claim 19, wherein determining the one or more historical data objects further comprises applying a natural language processing algorithm to the data object and the plurality of historical data objects.