Method and apparatus for determining anomalous application in microservice cluster, device, and medium

Log data from the microservice cluster is collected and analyzed, log templates are generated and compared, and call relationships are used to automatically locate abnormal applications. This avoids the high cost of building a traditional tracing system and enables rapid fault location and efficient operation and maintenance.

WO2026118950A1 · PCT designated stage · Publication Date: 2026-06-11 · CHINA TELECOM CLOUD TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: CHINA TELECOM CLOUD TECH CO LTD
Filing Date: 2025-11-25
Publication Date: 2026-06-11

AI Technical Summary

Technical Problem

In the operation and maintenance of medium and large-scale microservice application clusters, it is difficult for operation and maintenance personnel to quickly locate the root cause of failures in complex business scenarios. Traditional link tracing systems are costly to build and rely on front-line R&D personnel, resulting in low efficiency in fault handling.

Method used

Application log data is collected from the microservice cluster, historical and current log templates are generated and their differences analyzed, and abnormal applications are detected automatically based on call relationships. The result is a probability list of abnormal root causes, which reduces reliance on front-line R&D personnel.

Benefits of technology

It enables rapid location of the root cause of microservice cluster anomalies, reduces fault handling time and manpower investigation costs, and improves operation and maintenance efficiency.

✦ Generated by Eureka AI based on patent content.

Abstract

Embodiments of the present application provide a method and apparatus for determining an anomalous application in a microservice cluster, a device, and a medium. The method comprises: acquiring historical log data and current anomalous log data of applications, and an anomalous request of accessing a microservice cluster via an ingress gateway; generating at least one historical log template on the basis of the historical log data of the applications, and generating at least one current log template on the basis of the current anomalous log data of the applications; comparing the historical log template with the current log template to obtain differences of the applications; acquiring a target application called by the anomalous request and the difference of the target application; and determining an anomalous application in the microservice cluster on the basis of the difference of the target application. By means of the embodiments of the present application, log data is collected and undergoes centralized analysis, and by incorporating the calling relationship of the microservice cluster, possible root causes of anomalies within a microservice cluster can be obtained, thereby greatly reducing the dependence of operation and maintenance personnel on front-line research and development personnel, and reducing the loss and manual troubleshooting costs caused by a failure.

Description

An abnormal application determination method, device and equipment of a micro-service cluster and a medium

[0001] CROSS-REFERENCE TO RELATED APPLICATIONS

[0002] The present application claims priority to the Chinese patent application No. 202411754501.9, filed on December 02, 2024, and entitled "An abnormal application determination method, device and equipment of a micro-service cluster and a medium", the content of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

[0003] The present application relates to the technical field of micro-service, and in particular to an abnormal application determination method, device and equipment of a micro-service cluster and a medium.

BACKGROUND

[0004] In the operation and maintenance of a medium or large micro-service application cluster, the operation and maintenance personnel usually do not have a deep understanding of the business provided by the services or of the service call relationships involved in each business function. When a business fault occurs, the traditional processing method is for the operation and maintenance personnel to notify the front-line R&D personnel to handle it. However, a complex business scenario usually involves multiple services and multiple teams, and the services call each other, which makes it difficult for the R&D personnel to locate the fault. How to quickly locate the abnormal root cause in a complex application call network and thereby shorten the fault duration has become an operation and maintenance pain point for enterprises.

[0005] The traditional method for handling faults is to build a link tracking (Tracing) system, collect Tracing data on each service, and find the abnormal call chain through the entry gateway when a fault occurs. However, this method depends on the construction and adaptation of a relatively complex link tracking system and on modification of the underlying framework, so the construction cost and difficulty are relatively high.

SUMMARY

[0006] In view of the above problems, the present application aims to provide an abnormal application determination method, device and equipment of a micro-service cluster and a medium, which automatically detects the abnormal running state of the application by collecting and analyzing the log data of each application in the micro-service cluster, obtains a probability list of the abnormal root cause in cooperation with the call relationship of the micro-service cluster, reduces the dependence of the operation and maintenance personnel on the front-line R&D personnel, and reduces the loss and manpower investigation cost caused by the fault.

[0007] According to a first aspect of the present application, an abnormal application determination method of a micro-service cluster is provided, characterized in that the micro-service cluster includes at least one application and an entry gateway; the method includes:

[0008] obtaining historical log data, current abnormal log data of the application, and an abnormal request accessing the micro-service cluster via the entry gateway;

[0009] generating at least one historical log template according to the historical log data of the application, and generating at least one current log template according to the current abnormal log data of the application;

[0010] comparing the historical log template with the current log template to obtain a difference degree of the application;

[0011] obtaining a target application of an abnormal request call and the difference degree of the target application;

[0012] determining an abnormal application of the micro-service cluster based on the difference degree of the target application.

[0013] Optionally, the historical log data and the current abnormal log data of the application are obtained, including:

[0014] collecting log data of different applications in the micro-service cluster;

[0015] aggregating the log data of different applications according to the application dimension to obtain log data corresponding to the same application;

[0016] obtaining the historical log data and the current abnormal log data from the log data corresponding to the same application.

[0017] Optionally, the at least one historical log template is generated according to the historical log data of the application, and the at least one current log template is generated according to the current abnormal log data of the application, including:

[0018] classifying the historical log data and the current abnormal log data of the application;

[0019] respectively extracting the same logs of the historical log data to generate the historical log template for different categories of the historical log data;

[0020] respectively extracting the same logs of the current abnormal log data to generate the current log template for different categories of the current abnormal log data.

[0021] Optionally, the difference degree of the application is obtained by comparing the historical log template with the current log template, including:

[0022] determining the historical log data amount corresponding to any historical log template;

[0023] obtaining the contribution degree of the historical log template according to the proportion of the historical log data amount corresponding to the historical log template in the historical log data amount of the application;

[0024] determining the current abnormal log data amount corresponding to any current log template;

[0025] According to the proportion of the current abnormal log data amount corresponding to the current log template in the current abnormal log data amount of the application, a contribution degree of the current log template is obtained.

[0026] According to the contribution degree of the historical log template and the contribution degree of the current log template, a difference degree of the application is obtained.

[0027] Optionally, according to the contribution degree of the historical log template and the contribution degree of the current log template, the difference degree of the application is obtained, including:

[0028] The sum of the absolute values of the difference between the contribution degree of the historical log template and the contribution degree of the current log template is calculated to obtain the difference degree of the application.

[0029] Optionally, the target application of the abnormal request call includes:

[0030] The call link graph of the abnormal request is obtained.

[0031] According to the call link graph, the target application of the abnormal request call is obtained.

[0032] Optionally, the abnormal application of the micro-service cluster is determined based on the difference degree of the target application, including:

[0033] According to the call link graph of the abnormal request, the call depth of the target application is obtained.

[0034] According to the call depth of the target application and the difference degree of the target application, the abnormal probability of the target application is obtained.

[0035] According to the abnormal probability of the target application, the abnormal application of the micro-service cluster is determined.

[0036] According to a second aspect of the present application, a micro-service cluster abnormal application determination device is also provided, characterized in that the device includes:

[0037] A data acquisition module is configured to acquire historical log data of an application, current abnormal log data, and an abnormal request accessing a micro-service cluster via an entry gateway.

[0038] A log template generation module is configured to generate at least one historical log template according to the historical log data of the application, and generate at least one current log template according to the current abnormal log data of the application.

[0039] A difference degree acquisition module is configured to compare the historical log template with the current log template to obtain a difference degree of the application.

[0040] A target application acquisition module is configured to acquire a target application of an abnormal request call and a difference degree of the target application.

[0041] The abnormal application determination module is configured to determine the abnormal application of the micro-service cluster based on the difference degree of the target application.

[0042] According to a third aspect of the present application, an electronic device is provided, which comprises a processor, a memory, and a computer program stored in the memory and capable of running on the processor, and when the computer program is executed by the processor, the micro-service cluster abnormal application determination method described above is implemented.

[0043] According to a fourth aspect of the present application, a computer readable storage medium is provided, which stores a computer program, and when the computer program is executed by a processor, the micro-service cluster abnormal application determination method described above is implemented.

[0044] The micro-service cluster abnormal application determination method provided by the embodiments of the present application obtains the historical log data of the application, the current abnormal log data, and the abnormal request accessing the micro-service cluster via the entry gateway, generates at least one historical log template according to the historical log data of the application, generates at least one current log template according to the current abnormal log data of the application, compares the historical log template with the current log template to obtain the difference degree of the application, obtains the target application called by the abnormal request and the difference degree of the target application, and determines the abnormal application of the micro-service cluster based on the difference degree of the target application. The embodiments of the present application collect and centrally analyze log data, automatically detect the abnormal running state of the application at regular time intervals, cooperate with the calling relationship of the micro-service cluster to obtain a probability list of abnormal root causes, and then obtain the possible abnormal root causes of the micro-service cluster. Compared with the traditional ELK scheme in which each application is configured with a log template, the embodiments of the present application have greater flexibility and practicability. Compared with the traditional link tracking system, the embodiments of the present application do not need to make system modifications, and the application log is the most basic troubleshooting means in most systems and has greater practicality. The embodiments of the present application can greatly reduce the dependence of operation and maintenance personnel on front-line R&D personnel and reduce the loss and manpower caused by faults.

[0045] The above description is only a summary of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other purposes, characteristics and advantages of the present application are more obvious and easier to understand, specific embodiments of the present application are described below.

BRIEF DESCRIPTION OF DRAWINGS

[0046] In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some but not all of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of the present application. In the drawings:

[0047] FIG. 1 is a step flowchart of a method for determining an abnormal application of a micro-service cluster according to an embodiment of the present application;

[0048] FIG. 2 is an abnormal analysis diagram of a micro-service cluster according to an embodiment of the present application;

[0049] FIG. 3 is a statistical flowchart of a log template according to an embodiment of the present application;

[0050] FIG. 4 is a log template generation flowchart according to an embodiment of the present application;

[0051] FIG. 5 is a request call diagram of a micro-service cluster according to an embodiment of the present application;

[0052] FIG. 6 is a structural schematic diagram of an abnormal application determination apparatus of a micro-service cluster according to an embodiment of the present application.

DETAILED DESCRIPTION

[0053] In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some but not all of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of the present application. The terms used in the embodiments are explained first:

[0054] ELK: a combination of three open-source components for log collection and analysis, namely Elasticsearch (a search and analytics engine), Logstash (a log collection and processing pipeline), and Kibana (a data analysis and visualization platform).

[0055] Tracing: a link tracking system, which is used to track the service details passed through by each request, including time consumption / state, etc.

[0056] In traditional troubleshooting scenarios, developers usually rely on logs, the most basic troubleshooting means. Abnormal situations usually produce a large amount of abnormal output in the logs. If a log clustering model is built for each application in the microservice cluster, the probability of an anomaly can be estimated by comparing the models. Logs are existing basic data that only need to be collected and analyzed, so the abnormal running state of an application can be detected automatically at regular intervals. Combined with the calling relationship of the microservice cluster, a probability list of abnormal root causes is obtained. The dependence on front-line R&D personnel can be greatly reduced, and the loss and manpower cost caused by troubleshooting can be reduced.

[0057] Referring to FIG. 1, a step flowchart of a method for determining an abnormal application of a microservice cluster is shown, which can specifically include the following steps:

[0058] In step 101, historical log data of an application, current abnormal log data of the application, and an abnormal request accessing the microservice cluster via an entry gateway are obtained.

[0059] The historical log data usually refers to all log data within a certain time range, and the current abnormal log data usually refers to error or exception logs produced within the time period in which the microservice cluster is currently experiencing an anomaly.

[0060] In the microservice cluster, all types of log data reported by all applications in the microservice cluster are collected to a log storage center. The collected log data is aggregated according to the application dimension, the log data of different applications is isolated, and the log data of the same application is summarized. Then, the historical log data and the current abnormal log data of each application can be obtained, for example, by using a query language, a log analysis tool, etc.

[0061] The log data of different applications is isolated for separate analysis and processing, and the log data of the same application is summarized for overall analysis and monitoring.

[0062] The log storage center is a place for centralized storage and management of log data. It usually includes the following components: log storage, which uses a distributed storage system to store log data; a message queue, which buffers log data to ensure high throughput and reliability; and log processing, which uses a log processing framework to clean, convert and aggregate log data.

[0063] In step 102, at least one historical log template is generated according to the historical log data of the application, and at least one current log template is generated according to the current abnormal log data of the application.

[0064] A log clustering model is constructed from the historical log data of the application, and the log templates of the various categories, together with their indexes, are obtained as a reference system. The index is equivalent to the ID (identification) of a log template and uniquely identifies it. Specifically, the log clustering model is generated by analyzing log data with a clustering algorithm: similar log records are aggregated into clusters, that is, the log data is classified, and a representative log record is then extracted from each cluster to form a log template. For example, clustering the historical log data yields two clusters: cluster 1, with 200 log records indicating user login success, and cluster 2, with 100 log records indicating user login failure. Two historical log templates can be extracted: log template 1 (user login success) and log template 2 (user login failure). Here 1 and 2 are the indexes of the log templates; the log data amount corresponding to log template 1 is 200, and the log data amount corresponding to log template 2 is 100. The current abnormal log data is clustered in the same way to obtain two clusters, and a representative log record is extracted from each cluster to form two current log templates. In this case, the log data amount corresponding to log template 1 may be 120, and the log data amount corresponding to log template 2 may be 230.

[0065] In this example, the log templates obtained by clustering the historical log data are the same as those obtained by clustering the current abnormal log data, but the log data amount corresponding to each template differs between the two. Therefore, possible abnormal changes of the application can be evaluated from the change in the log data amount of each log template.

[0066] In step 103, the difference degree of the application is obtained by comparing the historical log templates with the current log templates.

[0067] After the log data is clustered to obtain the log template of each cluster and the log data amount corresponding to each log template, the proportion of each log template can be obtained. For each log template, the absolute value of the difference between its proportion in the historical log data and its proportion in the current abnormal log data gives the difference degree of that log template, and the difference degrees of all log templates are added to obtain the difference degree of the application.

[0068] A simple example is used for illustration. Assume that there are two historical log templates: log template 1 (user login success) and log template 2 (user login failure), where the log data amount corresponding to log template 1 is 200 and the log data amount corresponding to log template 2 is 100. In the current abnormal log data, the log data amount corresponding to log template 1 is 120, and the log data amount corresponding to log template 2 is 230. The historical log template proportion is:

[0069] Log template 1: 200 / (200+100) = 66.7%

[0070] Log template 2: 100 / (200+100) = 33.3%

[0071] The current log template proportion is:

[0072] Log template 1: 120 / (120+230) = 34.3%

[0073] Log template 2: 230 / (120+230) = 65.7%

[0074] Then the difference of each log template can be obtained:

[0075] Log template 1 difference = |66.7%-34.3%| = 32.4%

[0076] Log template 2 difference = |33.3%-65.7%| = 32.4%

[0077] Adding the difference of each log template gives the total difference: 32.4% + 32.4% = 64.8%. The total difference is the difference degree of the application.
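
The calculation in the example above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment: the function names are hypothetical, and treating a template that appears on only one side as having a contribution of 0 on the other side is an assumption that the text does not state.

```python
def contribution_degrees(template_counts):
    """Map each template index to its share of the application's log volume."""
    total = sum(template_counts.values())
    if total == 0:
        return {}
    return {tid: n / total for tid, n in template_counts.items()}


def difference_degree(historical_counts, current_counts):
    """Sum of absolute differences between historical and current shares."""
    hist = contribution_degrees(historical_counts)
    cur = contribution_degrees(current_counts)
    # Assumption: a template missing on one side contributes 0 on that side.
    template_ids = set(hist) | set(cur)
    return sum(abs(hist.get(t, 0.0) - cur.get(t, 0.0)) for t in template_ids)


# Log data amounts per template index, taken from the worked example.
historical = {1: 200, 2: 100}
current = {1: 120, 2: 230}

score = difference_degree(historical, current)
print(f"{score:.1%}")  # 64.8%
```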

[0078] In step 104, the target application called by the abnormal request and the difference degree of the target application are obtained.

[0079] The entry gateway of the microservice cluster can determine the abnormal request. The gateway is the entrance of the microservice cluster and is responsible for processing all requests entering the cluster, so it can capture and record detailed information about every request, including the request path, request parameters, response status code, etc. By analyzing this information, abnormal requests can be identified.
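
One simple way to identify abnormal requests from gateway records is to filter on the response status code. The sketch below assumes a hypothetical `GatewayRecord` structure and treats any status code of 500 or above as abnormal; the text does not specify the rule the gateway applies, so both are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayRecord:
    """Hypothetical shape of one request record captured at the entry gateway."""
    request_id: str
    path: str
    status_code: int


def find_abnormal_requests(records, min_error_status=500):
    """Return records whose status code signals a server-side failure.

    Assumption: status codes >= min_error_status count as abnormal.
    """
    return [r for r in records if r.status_code >= min_error_status]


records = [
    GatewayRecord("req-1", "/api/login", 200),
    GatewayRecord("req-2", "/api/order", 502),
    GatewayRecord("req-3", "/api/order", 404),
]
abnormal = find_abnormal_requests(records)
print([r.request_id for r in abnormal])
```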

[0080] The call link graph of the abnormal request can then be obtained through the link tracking system. The call link graph shows which applications are called by the abnormal request, and the difference degree of each of these applications has already been calculated in step 103.

[0081] In step 105, the abnormal application of the microservice cluster is determined based on the difference degree of the target application.

[0082] The call link graph of the abnormal request obtained through the link tracking system also gives the call depth of each application. From the call depth and the difference degree of each application, a root cause probability ranking is obtained, that is, the probability that each application is the root cause of the microservice cluster anomaly. According to this ranking, the applications with higher probability can be investigated first.
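
The text does not give the formula that combines call depth and difference degree, so the following Python sketch uses an assumed one purely for illustration: each application's score is its difference degree multiplied by its call depth (deeper applications are weighted more heavily), and the scores are normalized so that they sum to 1. The graph layout, the application names and the weighting are all hypothetical.

```python
from collections import deque


def call_depths(call_graph, entry):
    """Breadth-first search from the entry gateway; depth = hops from the entry."""
    depths = {entry: 0}
    queue = deque([entry])
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, []):
            if callee not in depths:
                depths[callee] = depths[caller] + 1
                queue.append(callee)
    return depths


def rank_root_causes(call_graph, entry, difference_by_app):
    """Rank called applications by an assumed score: difference degree * depth."""
    depths = call_depths(call_graph, entry)
    raw = {
        app: difference_by_app.get(app, 0.0) * depth
        for app, depth in depths.items()
        if app != entry
    }
    total = sum(raw.values())
    if total == 0:
        return [(app, 0.0) for app in raw]
    ranked = [(app, score / total) for app, score in raw.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical call link graph of one abnormal request.
call_graph = {
    "gateway": ["order"],
    "order": ["payment", "inventory"],
    "payment": ["bank"],
}
# Difference degrees computed beforehand for each application (made-up values).
differences = {"order": 0.2, "payment": 0.648, "inventory": 0.05, "bank": 0.1}

ranking = rank_root_causes(call_graph, "gateway", differences)
for app, probability in ranking:
    print(f"{app}: {probability:.3f}")
```

Other weightings (for example, a plain ranking by difference degree with depth as a tie-breaker) fit the description equally well; the choice depends on how faults tend to propagate in the cluster being monitored.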

[0083] With the above method, log data is collected and centrally analyzed and combined with the calling relationship of the micro-service cluster to obtain a probability list of abnormal root causes. The abnormal root cause in a complex application call network can thus be located quickly, the dependence of operation and maintenance personnel on front-line research and development personnel is greatly reduced, and the loss and manual troubleshooting cost caused by faults are reduced.

[0084] In the embodiment of the application, the historical log data of the application, the current abnormal log data, and the abnormal request accessing the micro-service cluster via the entry gateway are obtained, at least one historical log template is generated according to the historical log data of the application, at least one current log template is generated according to the current abnormal log data of the application, the difference degree of the application is obtained by comparing the historical log template with the current log template, the target application called by the abnormal request and the difference degree of the target application are obtained, and the abnormal application of the micro-service cluster is determined based on the difference degree of the target application. In the embodiment of the application, log data is collected and centrally analyzed, the abnormal running state of the application is automatically detected at regular time intervals, the probability list of the abnormal root cause is obtained by cooperating with the calling relationship of the micro-service cluster, and then the possible abnormal root cause of the micro-service cluster is obtained. Compared with the traditional ELK scheme in which each application is configured with a log template, the flexibility and implementability are relatively large. Compared with the traditional link tracking system, no system modification is needed, and the application log is the most basic troubleshooting means in most systems, which has relatively large practicality. The dependence of the operation and maintenance personnel on the front-line research and development personnel can be greatly reduced, and the loss and manpower investigation cost caused by the fault can be reduced.

[0085] In an optional embodiment of the application, step 101 further includes the following steps:

[0086] S1011, collecting log data of different applications in the micro-service cluster;

[0087] S1012, aggregating the log data of different applications according to the application dimension to obtain log data corresponding to the same application;

[0088] S1013, obtaining historical log data and current abnormal log data from the log data corresponding to the same application.

[0089] All types of log data reported by all applications in the micro-service cluster are collected to the log storage center. The log collection methods include a log agent, an in-application log library, and application reporting. Log agent: a log agent (such as Fluentd, Logstash or Filebeat) is deployed on the application server; the agent collects local log files and sends the log data to the log storage center. In-application log library: a log library (such as Log4j, Logback or Serilog) is integrated into the application code and configured to send log data directly to the log storage center. Application reporting: the application sends log data directly to the log storage center through an HTTP (HyperText Transfer Protocol) or HTTPS (HyperText Transfer Protocol Secure) interface.

[0090] While log data is transmitted from the application to the log storage center, the reliability and security of the data need to be ensured. For example, a reliable transmission protocol is used, the log data is encrypted in transit to prevent data leakage, and the log data is compressed to reduce bandwidth usage.
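
As an illustration of the compression step only, a batch of log records could be serialized and compressed before being sent. The sketch below uses JSON and gzip from the Python standard library; the record fields and function names are hypothetical, and encryption is assumed to be handled by the transport (for example HTTPS) rather than shown here.

```python
import gzip
import json


def pack_log_batch(records):
    """Serialize a batch of log records to JSON and gzip-compress it."""
    payload = json.dumps(records, ensure_ascii=False).encode("utf-8")
    return gzip.compress(payload)


def unpack_log_batch(blob):
    """Inverse of pack_log_batch, as the log storage center would apply it."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical batch: 100 similar log lines from one application.
batch = [
    {"app": "order-service", "level": "ERROR", "message": f"User {i} login failure"}
    for i in range(100)
]
blob = pack_log_batch(batch)
raw_size = len(json.dumps(batch, ensure_ascii=False).encode("utf-8"))
print(raw_size, len(blob))
```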

[0091] In the log storage center, log data can be aggregated according to the application dimension. During log collection, an application identifier is added to each log to facilitate subsequent aggregation by application. During log storage, different indexes or partitions are created according to the application identifier to facilitate query and aggregation by application. A log processing framework (such as Logstash, Fluentd or Spark Streaming) is used to aggregate the log data and generate per-application statistics (such as log volume, error rate and response time). In this way, the log data of different applications is isolated, and the log data of the same application is summarized.

[0092] After the log data is aggregated, the historical log data and the current abnormal log data of each application can be obtained through a query language and log analysis tools.
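
Steps S1011 to S1013 can be sketched in memory as follows. The record fields (`app`, `time`, `level`, `message`), the anomaly time window, and the rule that only `ERROR` and `FATAL` records inside the window count as current abnormal log data are assumptions made for this illustration; in practice the same selection would typically be a query against the log storage center.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_application(records):
    """Group log records by their application identifier."""
    by_app = defaultdict(list)
    for record in records:
        by_app[record["app"]].append(record)
    return dict(by_app)


def split_logs(app_records, anomaly_start, anomaly_end,
               abnormal_levels=("ERROR", "FATAL")):
    """Split one application's logs into historical and current abnormal data.

    Assumption: everything before the anomaly window is historical; inside
    the window, only records with an abnormal level are kept.
    """
    historical, current_abnormal = [], []
    for record in app_records:
        if record["time"] < anomaly_start:
            historical.append(record)
        elif record["time"] <= anomaly_end and record["level"] in abnormal_levels:
            current_abnormal.append(record)
    return historical, current_abnormal


records = [
    {"app": "order", "time": datetime(2025, 1, 1, 9, 0), "level": "INFO", "message": "User 1 login success"},
    {"app": "order", "time": datetime(2025, 1, 1, 12, 5), "level": "ERROR", "message": "User 2 login failure"},
    {"app": "order", "time": datetime(2025, 1, 1, 12, 6), "level": "INFO", "message": "User 3 login success"},
    {"app": "payment", "time": datetime(2025, 1, 1, 12, 7), "level": "ERROR", "message": "Payment 9 timed out"},
]

by_app = aggregate_by_application(records)
historical, current_abnormal = split_logs(
    by_app["order"], datetime(2025, 1, 1, 12, 0), datetime(2025, 1, 1, 13, 0)
)
print(len(historical), len(current_abnormal))
```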

[0093] In an optional embodiment of the present application, step 102 further includes the following steps:

[0094] S1021, classifying the historical log data and the current abnormal log data of the application;

[0095] S1022, for each category of the historical log data, extracting the content shared by the logs in that category to generate a historical log template;

[0096] S1023, for each category of the current abnormal log data, extracting the content shared by the logs in that category to generate a current log template.

[0097] The log data from different nodes of the same application is classified according to the application dimension, the top N log templates of each application are calculated with a similarity algorithm so that the log data is converged, and the log templates of each application are stored for subsequent comparison. The similarity algorithm can be the edit distance algorithm. The edit distance, also called the Levenshtein distance, measures the difference between two strings: it is the minimum number of single-character editing operations (insertion, deletion, replacement) required to convert one string into the other. The smaller the edit distance, the more similar the two strings.
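
For reference, the Levenshtein distance described above can be computed with the standard dynamic-programming recurrence. The following is a generic textbook sketch in Python, not code taken from the embodiment.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    replacements needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # replacement (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("User 1 login success", "User 2 login success"))  # 1
print(levenshtein("kitten", "sitting"))  # 3
```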

[0098] In the embodiments of the application, the Jaro-Winkler distance, an improved edit distance algorithm, can be used to calculate the similarity of each line of log text. It increases the weight of prefix matches on top of the Jaro distance, making the similarity calculation more accurate. The log text is then grouped and classified according to the calculated similarity and a preset similarity threshold (such as 60%): if the similarity of two texts is greater than or equal to the threshold, they are classified into the same group.
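The similarity-based grouping can be sketched in Python as follows. The Jaro and Jaro-Winkler functions follow the commonly published definitions (prefix scale 0.1, prefix length capped at 4). The grouping strategy is an assumption made for illustration, since the text does not specify one: each line is compared with the first line of every existing group and joins the first group that reaches the threshold.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched, equal character within the match window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def group_logs(lines, threshold=0.6):
    """Greedy grouping: join the first group whose first line is similar enough."""
    groups = []
    for line in lines:
        for group in groups:
            if jaro_winkler(line, group[0]) >= threshold:
                group.append(line)
                break
        else:
            groups.append([line])
    return groups

print(group_logs(["User 1 login success", "User 2 login success"]))
```

With this greedy strategy the result can depend on the order of the input lines; a production system might instead compare against every member of a group or use a clustering library.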

[0099] For the content of each similar group (the logs in the same group), the HanLP (Han Language Processing) word segmentation tool is used to segment every log in the group, and the resulting word sequences are intersected to extract the common keywords. One log is then taken as a sample and segmented; its words are compared with the extracted common keywords, and the intersection of the two is kept to obtain the log template. For example, the log content in a group is:

[0100] User 1 login success

[0101] User 2 login success

[0102] Use HanLP to perform word segmentation processing on the content of the group, and the word segmentation result is:

[0103] User / 1 / login / success

[0104] User / 2 / login / success

[0105] The segmented sequences are intersected to extract the common keywords, and the intersection result is:

[0106] User / login / success

[0107] Then, one piece of log content is extracted as a sample and segmented to obtain: User / 1 / login / success. The result is compared with User / login / success, and the log template User<&>login success is obtained, wherein <&> is a placeholder. The sample is extracted and compared in order to determine the position of the placeholder.
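The template extraction in the example above can be sketched in Python. This sketch makes two assumptions that differ from the text: it splits on whitespace with `str.split` instead of using HanLP, and it joins the template tokens with spaces, so the result is written as "User <&> login success". The function name `extract_template` is hypothetical.

```python
PLACEHOLDER = "<&>"

def extract_template(lines, tokenize=str.split):
    """Intersect the token sets of a group, then mark the variable positions."""
    token_lists = [tokenize(line) for line in lines]
    # Common keywords: tokens that occur in every log of the group.
    common = set(token_lists[0])
    for tokens in token_lists[1:]:
        common &= set(tokens)
    template = []
    for token in token_lists[0]:  # the first log serves as the sample
        if token in common:
            template.append(token)
        elif not template or template[-1] != PLACEHOLDER:
            # A run of non-common tokens collapses into one placeholder.
            template.append(PLACEHOLDER)
    return " ".join(template)

print(extract_template(["User 1 login success", "User 2 login success"]))
# User <&> login success
```

Because the intersection is taken over sets, token order and repetition are ignored when the common keywords are computed; only the walk over the sample restores the order and fixes the placeholder position.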

[0108] In the embodiment of the application, the historical log data is taken to be the log data of the previous day and the log data of the previous hour. At least one log template is generated from the log data of the previous day; assume that three log templates (log template 1, log template 2 and log template 3) are generated and form the benchmark template set. Three log templates are likewise generated from the log data of the previous hour and compared with the benchmark template set. If the similarity between two templates is more than 80%, they are considered to be the same template. The templates of the previous hour serve as the health reference system.

[0109] The current abnormal log data can be taken to be the log data of the current hour, meaning that the micro-service cluster has had an exception within the current 1-hour period. Three log templates are generated from the log data of the current hour and compared with the benchmark template set. If the similarity between two templates is more than 80%, they are considered to be the same template. The templates of the current hour serve as the detected object.
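A possible way to align newly generated templates with the benchmark template set is sketched below in Python. The text does not name the similarity measure used for comparing templates, so the sketch substitutes `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in. The function name `match_templates` and the sample templates are hypothetical.

```python
from difflib import SequenceMatcher

def match_templates(benchmark, candidates, threshold=0.8):
    """Map each candidate template to its most similar benchmark template.

    A candidate whose best similarity is not more than the threshold maps to None.
    """
    mapping = {}
    for cand in candidates:
        best, best_score = None, 0.0
        for ref in benchmark:
            score = SequenceMatcher(None, cand, ref).ratio()
            if score > best_score:
                best, best_score = ref, score
        mapping[cand] = best if best_score > threshold else None
    return mapping

benchmark = ["User <&> login success", "ERROR dial tcp <&> connectex"]
candidates = ["User <&> login success", "HikariPool-1 Error thrown"]
print(match_templates(benchmark, candidates))
```

Templates that map to `None` have no counterpart in the benchmark template set; how such new templates are treated is not specified in the text.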

[0110] In an optional embodiment of the application, step 103 further includes the following steps:

[0111] S1031, for any historical log template, determining the historical log data amount corresponding to the historical log template;

[0112] S1032, obtaining the contribution degree of the historical log template according to the proportion of the historical log data amount corresponding to the historical log template in the historical log data amount of the application;

[0113] S1033, for any current log template, determining the current abnormal log data amount corresponding to the current log template;

[0114] S1034, obtaining the contribution degree of the current log template according to the proportion of the current abnormal log data amount corresponding to the current log template in the current abnormal log data amount of the application;

[0115] S1035, obtaining the difference degree of the application according to the contribution degree of the historical log template and the contribution degree of the current log template.

[0116] For the log data of each application, the corresponding historical log data and current abnormal log data are obtained. As above, the historical log data is the log data of the previous day and of the previous hour, and the current abnormal log data is the log data of the current hour.

[0117] The log template generated by the log data of the previous day is taken as the benchmark template set, and specifically as shown in Table 1:

[0118] Table 1 Benchmark template set

Specifically, the log templates generated from the log data of the previous day include log template 1, log template 2, log template 3, etc.

[0119] The log template generated by the log data of the previous hour is taken as the health reference system, and specifically as shown in Table 2:

[0120] Table 2 Health reference system

[0121] Specifically, among the log templates generated from the log data of the previous hour, the log data amount corresponding to log template 3 is 4390 and the log data amount corresponding to log template 2 is 2381. The proportion of log template 3 is C3, that is, the contribution degree of log template 3 is C3; in general, the proportion (contribution degree) of log template n is Cn. The contribution degree is calculated as: Cn = log data amount (number of logs) corresponding to log template n / sum of the log data amounts corresponding to all log templates.
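The contribution degree calculation can be sketched in Python as follows. The counts 4390 and 2381 come from the text above; the count for log template 1 is a hypothetical value added only to complete the example, and the function name `contribution_degrees` is likewise chosen for illustration.

```python
def contribution_degrees(counts):
    """Cn = log count of template n / sum of the log counts of all templates."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}

previous_hour_counts = {
    "log template 3": 4390,
    "log template 2": 2381,
    "log template 1": 1229,  # hypothetical value, not taken from the source
}
C = contribution_degrees(previous_hour_counts)
print(C)
```

The same function applies unchanged to the current abnormal log data, where it yields the contribution degrees Sn of the detected object.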

[0122] The log template generated by the current abnormal log data is taken as the detected object, and specifically as shown in Table 3:

[0123] Table 3 Detected object

[0124] Specifically, the log data amount corresponding to the log template 2 generated by the log data of the current hour is 3051, and the log data amount corresponding to the log template 1 is 2521. Among them, the proportion of the log template 2 is S2, that is, the contribution degree of the log template 2 is S2, and the proportion of the log template n is Sn, that is, the contribution degree of the log template n is Sn.

[0125] The clustering reference template of the previous hour is taken as the benchmark, and the difference of the proportion (contribution degree) of each log template in the template of the current hour is counted, and the calculation method is as follows:

[0126] Difference degree of log template n = abs(Sn - Cn), n = 1, 2, ..., m

[0127] Wherein, n represents the log template index, m represents the number of log templates, and abs represents the absolute value.

[0128] Then, the total difference degree is obtained by the difference degree of each log template, that is, the difference degree of the application.

[0129] In an optional embodiment of the present application, S1035 further includes the following sub-steps:

[0130] S1035-1, calculating the sum of the absolute values of the difference between the contribution degree of the historical log template and the contribution degree of the current log template, to obtain the difference degree of the application.

[0131] After obtaining the difference degree of each log template by the difference degree formula, the difference degrees of each log template are added to obtain the total difference degree, that is, the difference degree of the application.

[0132] The difference degree of the application is compared with a threshold value. If the difference degree of the application is greater than the threshold value, it is determined that the application may have an abnormality risk in the current 1-hour period. The threshold value mainly serves to filter out applications with very small changes, which do not need to be analyzed.
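The difference degree of an application, as the sum of the absolute differences between the contribution degrees Cn (health reference system) and Sn (detected object), can be sketched in Python. Two points are assumptions of this sketch: a template that appears on only one side is given a contribution degree of 0 on the other side, and the threshold value 0.2 is hypothetical, since the text gives no concrete value.

```python
def difference_degree(historical, current):
    """Sum over all templates of abs(Sn - Cn)."""
    templates = set(historical) | set(current)
    return sum(abs(current.get(t, 0.0) - historical.get(t, 0.0)) for t in templates)

# Hypothetical contribution degrees for one application.
C = {"log template 1": 0.5, "log template 2": 0.3, "log template 3": 0.2}
S = {"log template 1": 0.2, "log template 2": 0.3, "log template 4": 0.5}

THRESHOLD = 0.2  # hypothetical threshold value
diff = difference_degree(C, S)
at_risk = diff > THRESHOLD
print(diff, at_risk)
```

In this example the differences are 0.3, 0, 0.2 and 0.5, so the difference degree is about 1.0 and the application would be kept for further analysis.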

[0133] In an optional embodiment of the present application, step 104 further includes the following sub-steps:

[0134] S1041, obtaining the call link graph of the abnormal request;

[0135] S1042, obtaining the target application of the abnormal request according to the call link graph.

[0136] A link tracking system, such as Zipkin, Jaeger, or SkyWalking, can be deployed in the micro-service cluster, with a link tracking client integrated into each micro-service (each micro-service is an independent application). During each micro-service call, the link tracking client records the call information, including the call time, the calling service, the called service, and the call result. This call information is sent to the link tracking system for storage and analysis, and the link tracking system generates a call link graph from the stored call information to show the complete call path of the request in the micro-service cluster.

[0137] Through the interface or API (Application Programming Interface, application programming interface) provided by the link tracking system, the call link graph of the abnormal request is obtained, and the call link graph is analyzed to identify all the applications called by the abnormal request.

[0138] Referring to FIG. 5, a request calling diagram of a microservice cluster is shown according to an embodiment of the present application, and the diagram is as follows:

[0139] A client accesses the microservice cluster through a gateway, and the gateway captures and records detailed information of all requests, including request path, request parameter, response status code, etc. By analyzing the information, the abnormal request can be identified as request 1.

[0140] Request 1 involves applications A, C and D, and all applications called by the abnormal request (request 1) are obtained.
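Identifying the abnormal request and the applications it calls can be sketched in Python. The record layouts, the rule that a response status code of 500 or above marks a request as abnormal, the second request and application B are all assumptions made for illustration. The text only states that the gateway records the request path, request parameters and response status code, and that request 1 involves applications A, C and D.

```python
def find_abnormal_requests(gateway_records, min_error_status=500):
    """Return the ids of requests whose response status marks them as abnormal."""
    return [r["request_id"] for r in gateway_records if r["status"] >= min_error_status]

def target_applications(call_edges, request_id, entry="gateway"):
    """Collect, in first-seen order, every application on the request's call edges."""
    apps = []
    for edge in call_edges:
        if edge["request_id"] != request_id:
            continue
        for name in (edge["caller"], edge["callee"]):
            if name != entry and name not in apps:
                apps.append(name)
    return apps

# Hypothetical gateway records and call information.
gateway_records = [
    {"request_id": "request-1", "path": "/order", "status": 500},
    {"request_id": "request-2", "path": "/health", "status": 200},
]
call_edges = [
    {"request_id": "request-1", "caller": "gateway", "callee": "A"},
    {"request_id": "request-1", "caller": "A", "callee": "C"},
    {"request_id": "request-1", "caller": "A", "callee": "D"},
    {"request_id": "request-2", "caller": "gateway", "callee": "B"},
]
abnormal = find_abnormal_requests(gateway_records)
print(abnormal, target_applications(call_edges, abnormal[0]))
```

In practice the call edges would be read from the link tracking system through its interface or API rather than from an in-memory list.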

[0141] In an optional embodiment of the present application, step 105 further includes the following sub-steps:

[0142] S1051, obtaining the calling depth of the target application according to the calling link diagram of the abnormal request;

[0143] S1052, obtaining the abnormal probability of the target application according to the calling depth of the target application and the difference degree of the target application;

[0144] S1053, determining the abnormal application of the microservice cluster according to the abnormal probability of the target application.

[0145] The calling depth of all applications called by the abnormal request can also be obtained according to the calling link diagram of the abnormal request. As shown in FIG. 5, request 1 first calls application A, and then application A calls applications C and D, so the calling depth of application A is 1, and the calling depth of applications C and D is 2.
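The calling depth can be derived from the call link graph with a breadth-first traversal, as in the following Python sketch. The adjacency-list representation and the node name `gateway` are assumptions; the edges reproduce the FIG. 5 example, in which request 1 calls application A and application A calls applications C and D.

```python
from collections import deque

def call_depths(graph, entry="gateway"):
    """Breadth-first traversal: depth = number of calls from the entry gateway."""
    depths = {entry: 0}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, []):
            if callee not in depths:  # keep the shallowest depth found first
                depths[callee] = depths[node] + 1
                queue.append(callee)
    depths.pop(entry)  # the gateway itself is not an application
    return depths

graph = {"gateway": ["A"], "A": ["C", "D"]}
print(call_depths(graph))  # {'A': 1, 'C': 2, 'D': 2}
```

If an application is reachable along several paths, this sketch keeps the shallowest depth; the text does not say which depth should be used in that case.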

[0146] The difference degree of each application has been obtained through the foregoing steps, so a root cause list containing the calling depth and the difference degree of all applications called by the abnormal request can be obtained, as shown in Table 4:

[0147] Table 4 Root Cause List

[0148] Specifically, the calling depth of application A is 1 and its difference degree is diff(A); the calling depth of application C is 2 and its difference degree is diff(C); and the calling depth of application D is 2 and its difference degree is diff(D).

[0149] The abnormal root cause probability of each application can be obtained according to the calling depth and the difference degree of the application, and the specific process is as follows:

[0150] Application Abnormal Root Cause Probability=(depth(app) / max_depth+diff(app) / sum_diff) / 2

[0151] Wherein, max_depth represents the maximum call depth, sum_diff represents the sum of the difference degrees, and app represents the application name. The application abnormal root cause probability is the probability that the application is the root cause of the microservice cluster exception.

[0152] After that, according to the exception root cause probability of each application, the application with the maximum exception root cause probability can be located for troubleshooting.
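The root cause probability formula and the subsequent ranking can be sketched in Python. The calling depths follow the FIG. 5 example; the difference degrees are hypothetical values, because the text leaves diff(A), diff(C) and diff(D) symbolic. The function name `root_cause_probabilities` is chosen for illustration.

```python
def root_cause_probabilities(depths, diffs):
    """(depth(app) / max_depth + diff(app) / sum_diff) / 2, sorted in descending order."""
    max_depth = max(depths.values())
    sum_diff = sum(diffs.values())
    scores = {}
    for app, depth in depths.items():
        depth_term = depth / max_depth
        # Guard against division by zero when no application changed at all.
        diff_term = diffs[app] / sum_diff if sum_diff else 0.0
        scores[app] = (depth_term + diff_term) / 2
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

depths = {"A": 1, "C": 2, "D": 2}
diffs = {"A": 0.2, "C": 0.6, "D": 0.2}  # hypothetical difference degrees
ranking = root_cause_probabilities(depths, diffs)
print(ranking)
```

With these values application C scores about 0.8, D about 0.6 and A about 0.35, so C would be examined first. Note that the values produced by this formula do not necessarily sum to 1 across applications, so they are best read as a relative ranking.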

[0153] The application extracts the current log template and the historical log template by collecting and centrally analyzing log data, compares the current log template with the historical log template, and evaluates the application exception root cause probability in combination with the microservice cluster call depth and the call network. Compared with the traditional ELK scheme in which each application is configured with a log template, this scheme has greater flexibility and implementability. Compared with the traditional link tracking system, this scheme does not need to modify the system, and the application log is the most basic troubleshooting means in most systems, which has greater practicality.

[0154] Referring to FIG. 2, an exception analysis diagram of a microservice cluster is shown, according to an embodiment of the application, as follows:

[0155] Collect all kinds of log data reported by all applications (business application 1, business application 2, and business application 3) in the microservice cluster to the log storage center. The log storage center aggregates the collected log data according to the application dimension, isolates the log data of different applications, and aggregates the log data of the same application. The log data is analyzed by a clustering algorithm to obtain a clustering model of the latest log (1-hour log and 1-hour exception log). According to the model change, the possible abnormal changes of the application are obtained. The clustering model is a model generated by analyzing the log data by a clustering algorithm. Its function is to aggregate similar log records together to form different clusters (groups).

[0156] The application log templates generated from the application's historical log text data (here, the previous day and the previous hour) are used as the benchmark reference. The log templates generated from the application log data of the last N hours (N is determined by the actual time period of the cluster exception; here N = 1) are compared with the log templates generated from the historical log text data to obtain the difference degree.

[0157] Through the call relationship of the microservice (application) and in combination with the gateway entry abnormal request, the call topology network (relationship network) is found out. The related application exception log model analysis in the topology network (relationship network) is performed to obtain the abnormal link path that may have a problem, thereby obtaining an exception root cause analysis report.

[0158] Referring to FIG. 3, a statistical flowchart of a log template is shown, according to an embodiment of the application, as follows:

[0159] The log data is classified according to the application dimension, the log data of different applications is isolated, and the log data of the same application is summarized;

[0160] According to the classified log data, for different categories of log data, extract n log templates, count the number of log records (log data volume) of each category model, and obtain the distribution statistics (proportion, contribution) of each log template.

[0161] Referring to FIG. 4, a log template generation flowchart is shown according to an embodiment of the present application, as follows:

[0162] Suppose the log original text set in the figure is:

[0163] 2023-11-03 10:29:37 WARN Watchdog-Reconnecting, last destination was /171.16.30.2:31046 java.util.concurrent.TimeoutException: Waited 3000 milliseconds (plus 4 milliseconds, 330400 nanoseconds delay) for ClientCalls$GrpcFuture
2023-11-03 15:09:01 ERROR HikariPool-1-Error thrown while acquiring connection from data source
2023-11-03 15:58:40 INFO query storage nodes success, http code: 200
2023-11-03 15:59:40 ERROR dial tcp 172.16.10.1:33295connectex: No connection could be made because the target machine actively refused it.
2023-11-03 15:59:42 ERROR dial tcp 172.16.10.2:33295connectex: No connection could be made because the target machine actively refused it.

[0164] For each line of text in the log original text set, the similarity is calculated using the Jaro-Winkler distance, and the log text is grouped and classified according to a preset similarity threshold (such as 60%). If the similarity of two texts is greater than or equal to the threshold, they are classified into the same group.

[0165] Grouping and classification by this similarity yields a set of log text data such as the following:

[0166] ERROR dial tcp 172.16.10.1:33295connectex
ERROR dial tcp 172.16.10.2:33295connectex

[0167] The HanLP is used for word segmentation for each line of content in the grouping content. For example, word segmentation is performed on the second line of text data to obtain:

[0168] ERROR / nx

[0169] dial / nx

[0170] tcp / nx

[0171] 172.16.10.2 / m

[0172] : / w

[0173] 33295 / m

[0174] connectex / nx

[0175] The intersection processing is performed on the segmented objects, and the word groups after the intersection are retained. Then, one log content is extracted as a sample and subjected to word segmentation processing. The segmented words are compared with the intersection of the previous step, and only the segmented words including the intersection are retained. Thus, the generated log template can be obtained as follows:

[0176] ERROR dial tcp<&#&>:33295connectex

[0177] Where <&#&> is a placeholder.

[0178] Referring to FIG. 6, a structural schematic diagram of an abnormal application determination device of a micro-service cluster is shown according to an embodiment of the present application. The device includes:

[0179] The data acquisition module 201 is configured to acquire historical log data of an application, current abnormal log data, and an abnormal request accessing the micro-service cluster via an entry gateway.

[0180] The log template generation module 202 is configured to generate at least one historical log template according to historical log data of the application, and generate at least one current log template according to current abnormal log data of the application;

[0181] The difference degree acquisition module 203 is configured to compare the historical log template with the current log template to obtain a difference degree of the application;

[0182] The target application acquisition module 204 is configured to acquire a target application called by the abnormal request and the difference degree of the target application;

[0183] The abnormal application determination module 205 is configured to determine an abnormal application of the micro-service cluster based on the difference degree of the target application.

[0184] In an optional embodiment of the present application, the data acquisition module 201 comprises:

[0185] The collection module is configured to collect log data of different applications in the micro-service cluster;

[0186] The aggregation module is configured to aggregate the log data of different applications according to the application dimension to obtain log data corresponding to the same application;

[0187] The log data acquisition module is configured to acquire historical log data and current abnormal log data from the log data corresponding to the same application.

[0188] In an optional embodiment of the present application, the log template generation module 202 comprises:

[0189] The classification module is configured to classify the historical log data and the current abnormal log data of the application;

[0190] The historical log template generation module is configured to extract the same log of the historical log data for different categories of historical log data, and generate a historical log template;

[0191] The current log template generation module is configured to extract the same log of the current abnormal log data for different categories of current abnormal log data, and generate a current log template.

[0192] In an optional embodiment of the present application, the difference degree acquisition module 203 comprises:

[0193] The first acquisition module is configured to determine the historical log data amount corresponding to the historical log template for any historical log template;

[0194] The first contribution degree acquisition module is configured to obtain the contribution degree of the historical log template according to the proportion of the historical log data amount corresponding to the historical log template in the historical log data amount of the application;

[0195] The second obtaining module is configured to determine, for any current log template, a current abnormal log data volume corresponding to the current log template.

[0196] The second contribution degree obtaining module is configured to obtain the contribution degree of the current log template according to a proportion of the current abnormal log data volume corresponding to the current log template in the current abnormal log data volume of the application.

[0197] The application difference degree obtaining module is configured to obtain the difference degree of the application according to the contribution degree of the historical log template and the contribution degree of the current log template.

[0198] In an optional embodiment of the present application, the application difference degree obtaining module comprises:

[0199] The calculation module is configured to calculate a sum of absolute values of differences between the contribution degrees of the historical log template and the current log template, and obtain the difference degree of the application.

[0200] In an optional embodiment of the present application, the target application obtaining module 204 comprises:

[0201] The call link diagram obtaining module is configured to obtain a call link diagram of the abnormal request.

[0202] The third obtaining module is configured to obtain a target application called by the abnormal request according to the call link diagram.

[0203] In an optional embodiment of the present application, the abnormal application determining module 205 comprises:

[0204] The call depth obtaining module is configured to obtain a call depth of the target application according to the call link diagram of the abnormal request.

[0205] The abnormal probability determining module is configured to obtain an abnormal probability of the target application according to the call depth of the target application and the difference degree of the target application.

[0206] The determining module is configured to determine an abnormal application of the micro-service cluster according to the abnormal probability of the target application.

[0207] In this embodiment, by acquiring historical log data and current abnormal log data of the application, as well as abnormal requests accessing the microservice cluster via the ingress gateway, at least one historical log template is generated based on the historical log data of the application, and at least one current log template is generated based on the current abnormal log data of the application. The difference between the historical log template and the current log template is compared to obtain the application's difference degree. The target application called by the abnormal request and the difference degree of the target application are obtained, and the abnormal application of the microservice cluster is determined based on the difference degree of the target application. This embodiment collects and centrally analyzes log data, automatically and periodically detects abnormal operating states of applications, and obtains a probability list of abnormal root causes by combining the call relationship of the microservice cluster, thereby obtaining the possible abnormal root causes of the microservice cluster. Compared with the traditional ELK solution that configures log templates for each application, it has greater flexibility and implementability. Compared with the traditional tracing system, no system modification is required. Application logs, as the most basic troubleshooting method, are already available in most systems and have great practicality. It can significantly reduce the dependence of operations and maintenance personnel on front-line R&D personnel and reduce the losses caused by failures and the cost of manpower troubleshooting.

[0208] An embodiment of this application also provides an electronic device, including a processor, a memory, and a computer program stored in the memory and capable of running on the processor. When the computer program is executed by the processor, it implements the above-described method for determining abnormal applications in a microservice cluster.

[0209] The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

[0210] The processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.

[0211] An embodiment of the present application further provides a computer readable storage medium storing a computer program. When executed by a processor, the computer program implements the above-described method for determining an abnormal application of a micro-service cluster.

[0212] Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for the related parts, refer to the description of the method embodiment.

[0213] In the above embodiments, all or part of the embodiments can be implemented by software, hardware, firmware or any combination thereof. When implemented by software, all or part of the embodiments can be implemented in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions according to the embodiments of the present application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network or other programmable device. The computer instructions can be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions can be transmitted from one website, computer, server or data center to another website, computer, server or data center through wired (such as coaxial cable, optical fiber, digital subscriber line (Digital Subscriber Line, DSL)) or wireless (such as infrared, wireless, microwave, etc.) mode. The computer readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. integrated with one or more available media. The available media can be magnetic media (such as floppy disk, hard disk, magnetic tape), optical media (such as DVD (Digital Versatile Disc)) or semiconductor media (such as solid state disk (Solid State Disk, SSD)) and the like.

[0214] It should be noted that in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between the entities or operations. Moreover, the terms "include", "contain" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements, but also other elements not explicitly listed or inherent to such process, method, article or device. Without further limitation, an element defined by the statement "including a" does not exclude the presence of another identical element in the process, method, article or device including the element.

[0215] Each of the embodiments in the specification is described in a relevant manner, and the same or similar parts between the embodiments can be referred to each other. Each of the embodiments focuses on the difference from other embodiments. In particular, for the system embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and the relevant parts can be referred to the part of the method embodiments.

[0216] The above merely describes the preferred embodiments of the present application and is not intended to limit the protection scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for determining an abnormal application of a micro-service cluster, characterized in that, The micro-service cluster comprises at least one application and an entrance gateway; the method comprises: obtaining historical log data and current abnormal log data of the application, and an abnormal request accessing the micro-service cluster via the entrance gateway; generating at least one historical log template according to the historical log data of the application, and at least one current log template according to the current abnormal log data of the application; comparing the historical log template with the current log template to obtain the difference degree of the application; obtaining a target application called by the abnormal request and the difference degree of the target application; determining an abnormal application of the micro-service cluster based on the difference degree of the target application.

2. The method of claim 1, wherein, The method comprises: collecting log data of different applications in the micro-service cluster; aggregating the log data of the different applications according to the application dimension to obtain log data corresponding to the same application; obtaining historical log data and current abnormal log data from the log data corresponding to the same application.

3. The method of claim 1, wherein, The method comprises: classifying the historical log data and the current abnormal log data of the application; extracting the same log of the historical log data for different categories of the historical log data to generate a historical log template; extracting the same log of the current abnormal log data for different categories of the current abnormal log data to generate a current log template.

4. The method according to claim 1 or 3, characterized in that, The method comprises: determining the historical log data amount corresponding to any historical log template; obtaining the contribution degree of the historical log template according to the proportion of the historical log data amount corresponding to the historical log template in the historical log data amount of the application; determining the current abnormal log data amount corresponding to any current log template; obtaining the contribution degree of the current log template according to the proportion of the current abnormal log data amount corresponding to the current log template in the current abnormal log data amount of the application; obtaining the difference degree of the application according to the contribution degree of the historical log template and the contribution degree of the current log template.

5. The method of claim 4, wherein, The method comprises: calculating the sum of the absolute values of the difference between the contribution degree of the historical log template and the contribution degree of the current log template to obtain the difference degree of the application.

6. The method of claim 1, wherein the method comprises: obtaining a calling link graph of the abnormal request; and obtaining the target application called by the abnormal request according to the calling link graph.

7. The method according to claim 1 or 6, wherein the method comprises: obtaining the calling depth of the target application according to the calling link graph of the abnormal request; obtaining an abnormal probability of the target application according to the calling depth of the target application and the difference degree of the target application; and determining an abnormal application of the micro-service cluster according to the abnormal probability of the target application.

8. A device for determining an abnormal application of a micro-service cluster, wherein the micro-service cluster comprises at least one application and an entrance gateway, and the device comprises: a data acquisition module, configured to acquire historical log data and current abnormal log data of the application, and an abnormal request accessing the micro-service cluster via the entrance gateway; a log template generation module, configured to generate at least one historical log template according to the historical log data of the application, and at least one current log template according to the current abnormal log data of the application; a difference degree acquisition module, configured to compare the historical log template with the current log template to obtain a difference degree of the application; a target application acquisition module, configured to acquire a target application called by the abnormal request and the difference degree of the target application; and an abnormal application determination module, configured to determine an abnormal application of the micro-service cluster based on the difference degree of the target application.

9. An electronic device, comprising: a processor, a computer-readable storage medium, and a computer program that is stored on the computer-readable storage medium and is executable on the processor to implement the abnormal application determination method of the micro-service cluster according to any one of claims 1 to 7.

10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium and is executable on a processor to implement the abnormal application determination method of the micro-service cluster according to any one of claims 1 to 7.
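
By way of illustration only, the contribution degree of claim 4 and the difference degree of claim 5 may be sketched as follows. The sketch assumes that each log line has already been assigned a template identifier. The function names, the sample identifiers T1 to T3, and the treatment of a template that is absent from one window as having a contribution degree of 0 are assumptions made for this example and are not taken from the application.

```python
from collections import Counter


def contribution_degrees(template_ids):
    """Share of an application's log lines that falls under each template (claim 4)."""
    counts = Counter(template_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {template: n / total for template, n in counts.items()}


def difference_degree(historical_ids, current_ids):
    """Sum of absolute contribution-degree differences over all templates (claim 5).

    Assumption: a template missing from one window contributes 0 in that window.
    """
    hist = contribution_degrees(historical_ids)
    curr = contribution_degrees(current_ids)
    templates = set(hist) | set(curr)
    return sum(abs(hist.get(t, 0.0) - curr.get(t, 0.0)) for t in templates)


# Hypothetical template labels for the log lines of one application.
historical = ["T1"] * 8 + ["T2"] * 2                 # normal operation
current = ["T1"] * 2 + ["T2"] * 2 + ["T3"] * 6       # abnormal window
score = difference_degree(historical, current)
print(f"difference degree: {score:.2f}")
```

Under these assumptions the difference degree is 0 when the two windows have identical template proportions and reaches 2 when they share no template, so a larger value indicates a stronger shift in the log behavior of the application.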