Alarm log pushing method and device, computer device and storage medium

By calculating the fusion similarity and topological relationship of alarm logs, and using clustering algorithms to classify alarm logs, the problem of low alarm log processing efficiency is solved, and efficient and accurate alarm information processing is achieved, reducing the operational burden and storage requirements.

CN117290745B · Active · Publication Date: 2026-06-12 · JINAN INSPUR DATA TECH CO LTD

Cites: 3 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: JINAN INSPUR DATA TECH CO LTD
Filing Date: 2023-09-15
Publication Date: 2026-06-12

Application Information

Patent Timeline

15 Sep 2023: Application

12 Jun 2026: Publication (CN117290745B)

IPC: G06F18/23213; G06F18/22; G06F18/25; G06F18/24; G06N3/08; G06F11/30; G06F11/32

CPC: G06F18/23213; G06F18/22; G06F18/25; G06F18/24; G06N3/08; G06F11/3065; G06F11/327; Y02D10/00

AI Tagging

Application Domain

Hardware monitoring · Energy efficient computing



AI Technical Summary

Technical Problem

In existing technologies, alarm log processing relies on manual analysis, resulting in low processing efficiency, difficulty in covering all correlations between alarm logs, burden on storage space and network bandwidth, and impact on system performance.

Method used

By acquiring multiple alarm logs, calculating their fusion similarity, and using clustering algorithms to cluster similar or related alarm logs, representative alarm logs are identified, reducing the amount of analysis required by operations and maintenance personnel. Semantic features are analyzed using the continuous bag-of-words model and the term frequency-inverse document frequency model, and clustering is performed in conjunction with the topological relationships of storage devices.

Benefits of technology

It improves the efficiency and accuracy of alarm log processing, reduces the workload of operation and maintenance personnel, reduces storage space usage, has a strong ability to adapt to new scenarios, and does not require additional rule configuration.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN117290745B_ABST


Abstract

The application relates to the technical field of alarm information processing, and discloses an alarm log pushing method and device, a computer device, and a storage medium. The method comprises: acquiring N alarm logs, wherein N is an integer greater than or equal to 2; determining the fusion similarity between any two of the N alarm logs, wherein the fusion similarity represents the degree of similarity between the two alarm logs; clustering the N alarm logs according to the fusion similarity to obtain K clusters, wherein K is an integer, 1<=K<N, and the fusion similarity between any two alarm logs in a cluster is greater than a preset threshold; determining at least one representative alarm log in each of the K clusters, wherein the representative alarm log is the alarm log with the maximum average fusion similarity to the other alarm logs in its cluster; and sending the at least one representative alarm log of each of the K clusters to a client. The application can improve the efficiency of alarm information processing.


Description

Technical Field

[0001] This invention relates to the field of alarm information processing technology, specifically to alarm log push methods, devices, computer equipment, and storage media.

Background Technology

[0002] To ensure smooth operation and support troubleshooting of systems such as servers, the large number of alarm logs recorded during system operation is usually stored in a database. If these alarm logs are not processed in a timely manner, they accumulate in the database over long periods, occupying a large amount of storage space, placing a heavy burden on storage, circuits, and network transmission bandwidth, and degrading system performance.

[0003] Currently, alarm log processing still relies on manual analysis and judgment by operations engineers. When processing alarm logs, engineers typically define rules to select alarms with the same fault from massive amounts of logs for batch processing. However, manually defining correlation rules makes it difficult to cover all relationships between alarm logs, resulting in low processing efficiency.

Summary of the Invention

[0004] In view of this, the present invention provides an alarm log push method, apparatus, computer device and storage medium to solve the problem of low processing efficiency of alarm logs.

[0005] In a first aspect, the present invention provides an alarm log push method, the method being applied to a storage device, the method comprising: acquiring N alarm logs, where N is an integer greater than or equal to 2; determining the fusion similarity between any two alarm logs among the N alarm logs, the fusion similarity being used to characterize the degree of similarity between the two alarm logs; performing clustering processing on the N alarm logs according to the fusion similarity to obtain K clusters, where K is an integer and 1≤K<N, and the fusion similarity between any two alarm logs in the clusters is greater than a preset threshold; determining at least one representative alarm log in each of the K clusters, the representative alarm log being the alarm log in the cluster with the highest average fusion similarity to other alarm logs; and sending the at least one representative alarm log in each of the K clusters to a client.

[0006] The alarm log push method provided in this embodiment, after acquiring multiple alarm logs, first determines the fusion similarity between any two alarm logs. Secondly, it clusters the multiple alarm logs based on the fusion similarity, accurately classifying alarm logs reflecting different aspects of the fault. Then, it identifies at least one representative alarm log in each of the multiple clusters and sends this representative alarm log to the client, reducing the number of alarm logs that maintenance personnel need to analyze. In other words, through the above steps, the storage device can identify similar or related alarm logs, classify the alarm logs, and reduce the number of alarm logs that maintenance personnel need to analyze, alleviating their workload and improving the efficiency and accuracy of alarm information processing.

[0007] In an optional implementation, before determining the fusion similarity between any two alarm logs among the N alarm logs, the method further includes: preprocessing the N alarm logs to obtain a set of effective lexical units for each alarm log among the N alarm logs, wherein the set of effective lexical units includes at least one effective lexical unit, and the preprocessing includes at least one of log sorting, log deduplication, log cleaning, log word segmentation, stop word removal, and lexical standardization; determining the fusion similarity between any two alarm logs among the N alarm logs includes: determining the fusion similarity between any two alarm logs among the N alarm logs based on the set of effective lexical units.

[0008] The alarm log push method provided in this embodiment preprocesses the N alarm logs before determining the fusion similarity of any two alarm logs. This reduces noise in the original alarm logs, lowers the difficulty of determining the fusion similarity, and improves the processing efficiency of the alarm logs.

[0009] In one optional implementation, determining the fusion similarity between any two alarm logs among the N alarm logs based on the set of effective lexical units includes: inputting each effective lexical unit in the set of effective lexical units of the i-th alarm log into a continuous bag-of-words model to obtain a lexical vector for each effective lexical unit, i = 1, 2, ..., N; determining the term frequency feature of each effective lexical unit in the set of effective lexical units of the i-th alarm log based on a term frequency-inverse document frequency model; and determining the semantic features of the i-th alarm log based on the lexical vector and the term frequency feature of each effective lexical unit in the set of effective lexical units. The text similarity between the first alarm log and the second alarm log is determined based on the semantic features of the first alarm log and the second alarm log, wherein the first alarm log is one of the two alarm logs and the second alarm log is the other of the two alarm logs; the topological relevance between the first alarm log and the second alarm log is determined based on the software topology diagram and the hardware topology diagram used to describe the architecture of the storage device; and the fusion similarity between the first alarm log and the second alarm log is determined based on the text similarity and the topological relevance.

[0010] This embodiment mines the semantic features of effective words using a continuous bag-of-words model, analyzes the frequency features of effective words based on a term frequency-inverse document frequency model, and then fuses the semantic and frequency features to more accurately analyze the semantics of alarm logs. Furthermore, by using the alarm object field to link the alarm logs with the software and hardware topology of the storage device, the correlation between alarm logs is analyzed in a spatial dimension, which can more accurately determine the fusion similarity and thus improve the accuracy of clustering. In addition, this embodiment determines the fusion similarity based on the text similarity obtained from the continuous bag-of-words model and the term frequency-inverse document frequency model, and performs clustering processing on N alarm logs using a clustering algorithm. This eliminates the need for operations and maintenance experts to configure different rules for different scenarios and conditions, reduces the requirement for knowledge in the operations and maintenance domain, reduces the number of aggregation rules, and has a strong ability to adapt to new scenarios; adding new fault logs does not require additional configuration of new aggregation rules.

[0011] In an optional implementation, determining the fusion similarity between the first alarm log and the second alarm log based on the text similarity and the topological relevance includes: determining the fusion similarity using the following formula based on the weighting coefficient, the text similarity, and the topological relevance:

[0012] similarity(a,b)=α×texual(a,b)+(1-α)×correlation(a,b)

[0013] Wherein, a represents the first alarm log, b represents the second alarm log, similarity(a,b) represents the fusion similarity between the first alarm log and the second alarm log, α represents the weight coefficient, texual(a,b) represents the text similarity between the first alarm log and the second alarm log, and correlation(a,b) represents the topological correlation between the first alarm log and the second alarm log.
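A minimal Python sketch of the weighted fusion in [0012]. The function name `fusion_similarity` and the default α = 0.5 are hypothetical: the text does not give a value for the weight coefficient, and restricting α to [0, 1] is an assumption that keeps the result a convex combination of the two scores.

```python
def fusion_similarity(textual: float, correlation: float, alpha: float = 0.5) -> float:
    """Weighted fusion of text similarity and topological relevance.

    Implements similarity(a, b) = alpha * texual(a, b) + (1 - alpha) * correlation(a, b).
    The default alpha = 0.5 is a hypothetical choice; the patent does not fix a value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * textual + (1.0 - alpha) * correlation


if __name__ == "__main__":
    # Illustrative inputs: text similarity 0.8, topological relevance 0.4, alpha 0.75.
    print(round(fusion_similarity(0.8, 0.4, alpha=0.75), 6))  # 0.7
```

A larger α leans on wording; a smaller α leans on where the alarm objects sit in the storage device's topology.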

[0014] In one optional implementation, determining the semantic features of the i-th alarm log based on the word vector and the word frequency feature of each valid token in the valid token set includes:

[0015] Based on the word vector and word frequency feature of each valid token in the valid token set, the semantic features of the i-th alarm log are determined by the following formula:

[0016]

[0017] Here, S_i denotes the semantic features of the i-th alarm log; the two per-token terms are the word vector of W_j^i and the word frequency feature of W_j^i; W_j^i denotes the j-th valid token in the valid token set of the i-th alarm log; and l is the total number of tokens in that set.
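The exact expression in [0016] is not reproduced in this text, so the sketch below assumes one common way of combining the two per-token terms: a TF-IDF-weighted average of word vectors, S_i = (1/l) Σ_j tfidf(W_j^i) · v(W_j^i). The helper names, the ln(N/df) + 1 smoothing, and the toy 2-dimensional word vectors are illustrative assumptions, not the patent's formula.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def tfidf_weights(docs: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Return one {token: tf-idf weight} dict per valid token set.

    tf  = occurrences of the token in the log / number of tokens in the log
    idf = ln(N / df) + 1   (the +1 is an assumed smoothing so that tokens
                            present in every log keep a nonzero weight)
    """
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            tok: (cnt / len(doc)) * (math.log(n_docs / df[tok]) + 1.0)
            for tok, cnt in counts.items()
        })
    return weights


def semantic_feature(doc: Sequence[str],
                     word_vectors: Dict[str, Vector],
                     weights: Dict[str, float]) -> Vector:
    """Assumed form of [0016]: S_i = (1/l) * sum_j tfidf(W_j^i) * v(W_j^i)."""
    dim = len(word_vectors[doc[0]])
    s = [0.0] * dim
    for tok in doc:                      # j = 1..l, repeated tokens counted each time
        w = weights[tok]
        for k, x in enumerate(word_vectors[tok]):
            s[k] += w * x
    return [x / len(doc) for x in s]     # divide by l


if __name__ == "__main__":
    docs = [["disk", "error"], ["disk", "ok"]]
    vecs = {"disk": [1.0, 0.0], "error": [0.0, 1.0], "ok": [0.0, -1.0]}  # toy vectors
    w = tfidf_weights(docs)
    print(semantic_feature(docs[0], vecs, w[0]))
```

Tokens shared by every log ("disk") get a lower weight than distinctive ones ("error"), so the distinctive words dominate S_i.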

[0018] In an optional implementation, determining the text similarity between the first alarm log and the second alarm log based on the semantic features of the first alarm log and the second alarm log includes:

[0019] Based on the semantic features of the first alarm log and the semantic features of the second alarm log, the text similarity is determined using the following formula:

[0020] $$\mathrm{texual}(a,b)=\frac{S_a\,S_b^{T}}{\lVert S_a\rVert_2\,\lVert S_b\rVert_2}$$

[0021] Here, a is the first alarm log, b is the second alarm log, and texual(a,b) is the text similarity between them; S_a denotes the semantic features of the first alarm log, S_b denotes the semantic features of the second alarm log, T denotes vector transposition, and ||·||_2 denotes the l2 norm.
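The symbols defined in [0021] (a transpose in the numerator, l2 norms in the denominator) describe the cosine similarity of the two semantic feature vectors. A plain-Python sketch; the function name `text_similarity` is hypothetical, and returning 0.0 for an all-zero vector is an assumed convention the text does not address.

```python
import math
from typing import Sequence


def text_similarity(s_a: Sequence[float], s_b: Sequence[float]) -> float:
    """Cosine similarity: texual(a, b) = S_a . S_b^T / (||S_a||_2 * ||S_b||_2)."""
    if len(s_a) != len(s_b):
        raise ValueError("semantic features must have the same dimension")
    dot = sum(x * y for x, y in zip(s_a, s_b))
    norm_a = math.sqrt(sum(x * x for x in s_a))
    norm_b = math.sqrt(sum(y * y for y in s_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero feature vector
    return dot / (norm_a * norm_b)
```

Because the vectors are normalized, two logs with the same wording pattern score close to 1 even if one contains more tokens than the other.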

[0022] In one optional implementation, the step of clustering the N alarm logs according to the fusion similarity to obtain K clusters includes: clustering the N alarm logs using the fusion similarity as a distance metric according to a density-based clustering algorithm to obtain the K clusters.

[0023] Since density-based clustering algorithms do not require the number of clusters to be specified in advance, this implementation uses a density-based clustering algorithm to cluster the N alarm logs, which can improve clustering speed.
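One way to use fusion similarity as the distance metric of a density-based algorithm such as DBSCAN is to take distance = 1 - similarity and eps = 1 - threshold, so that two logs are neighbours exactly when their fusion similarity exceeds the preset threshold. The dependency-free sketch below follows that assumption; the function name and min_pts = 2 are hypothetical. DBSCAN groups density-connected logs, so two members of one cluster may be linked through a chain of neighbours rather than directly, and how noise points are handled is not stated in the text.

```python
from typing import List, Optional, Sequence

NOISE = -1


def dbscan_by_similarity(sim: Sequence[Sequence[float]],
                         threshold: float,
                         min_pts: int = 2) -> List[int]:
    """Density-based clustering of N alarm logs from an N x N fusion-similarity matrix.

    Logs p and q are neighbours when sim[p][q] > threshold, i.e. when
    distance = 1 - sim < eps with eps = 1 - threshold. A log is a core point
    when its neighbourhood (itself included) has at least min_pts members.
    Returns one cluster label per log; NOISE (-1) marks unclustered logs.
    """
    n = len(sim)
    labels: List[Optional[int]] = [None] * n

    def neighbours(p: int) -> List[int]:
        return [q for q in range(n) if q != p and sim[p][q] > threshold]

    cluster = 0
    for p in range(n):
        if labels[p] is not None:
            continue
        seeds = neighbours(p)
        if len(seeds) + 1 < min_pts:
            labels[p] = NOISE          # may later be claimed as a border point
            continue
        labels[p] = cluster
        queue = list(seeds)
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster    # border point: joins, but is not expanded
                continue
            if labels[q] is not None:
                continue               # already assigned
            labels[q] = cluster
            q_seeds = neighbours(q)
            if len(q_seeds) + 1 >= min_pts:
                queue.extend(q_seeds)  # q is a core point: keep growing
        cluster += 1
    return [label if label is not None else NOISE for label in labels]
```

With scikit-learn available, the same effect could be obtained by passing the matrix 1 - sim to `DBSCAN(eps=1 - threshold, metric="precomputed")`, noting that scikit-learn's neighbourhood test uses distance <= eps rather than a strict inequality.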

[0024] In one optional implementation, determining at least one representative alarm log from each of the K clusters includes:

[0025] The representative alarm log for each of the K clusters is determined using the following formula:

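[0026] $$\mathrm{centroid}=\underset{1\le c\le n}{\arg\max}\;\frac{1}{n-1}\sum_{d=1,\,d\neq c}^{n}\mathrm{similarity}(c,d)$$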

[0027] Where centroid is the representative alarm log, n represents the total number of alarm logs in the corresponding cluster, c and d are any two different alarm logs in the corresponding cluster, 1≤c≤n, 1≤d≤n.
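The selection rule (highest average fusion similarity to the other logs in the cluster) can be sketched as follows. The `top` parameter is a hypothetical way to return more than one representative, since the method allows at least one per cluster; averaging over n - 1 others versus n does not change which log ranks first.

```python
from typing import List, Sequence


def representatives(cluster: Sequence[int],
                    sim: Sequence[Sequence[float]],
                    top: int = 1) -> List[int]:
    """Return the `top` logs of a cluster with the highest average fusion
    similarity to the other logs in that cluster (the 'centroid')."""
    n = len(cluster)
    if n == 1:
        return [cluster[0]]

    def avg_similarity(c: int) -> float:
        return sum(sim[c][d] for d in cluster if d != c) / (n - 1)

    # sorted() is stable even with reverse=True, so ties keep cluster order
    ranked = sorted(cluster, key=avg_similarity, reverse=True)
    return ranked[:top]
```

For a three-log cluster with pairwise similarities 0.9 (logs 0 and 1), 0.8 (0 and 2), and 0.7 (1 and 2), the averages are 0.85, 0.80, and 0.75, so log 0 is chosen.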

[0028] In one optional implementation, after determining at least one representative alarm log in each of the K clusters, the method further includes: deleting redundant alarm logs, which are alarm logs in each of the K clusters other than the at least one representative alarm log.

[0029] This implementation saves storage space by deleting redundant logs.

[0030] In one optional implementation, obtaining the N alarm logs includes: obtaining the N alarm logs from a lightweight data collection client.

[0031] In a second aspect, the present invention provides an alarm log push device, the device comprising: an acquisition module, configured to acquire N alarm logs, where N is an integer greater than or equal to 2; a first processing module, configured to determine the fusion similarity between any two alarm logs among the N alarm logs, the fusion similarity being used to characterize the degree of similarity between the two alarm logs; a second processing module, configured to perform clustering processing on the N alarm logs according to the fusion similarity to obtain K clusters, where K is an integer and 1≤K<N, and the fusion similarity between any two alarm logs in the clusters is greater than a preset threshold; a third processing module, configured to determine at least one representative alarm log in each of the K clusters, the representative alarm log being the alarm log in the cluster with the highest average fusion similarity to other alarm logs; and a sending module, configured to send the at least one representative alarm log in each of the K clusters to a client.

[0032] In an optional embodiment, the apparatus further includes: a fourth processing module, configured to preprocess the N alarm logs to obtain a set of effective lexical units for each of the N alarm logs, wherein the set of effective lexical units includes at least one effective lexical unit, and the preprocessing includes at least one of log sorting, log deduplication, log cleaning, log word segmentation, stop word removal, and lexical standardization; the first processing module includes: a first processing unit, configured to determine the fusion similarity between any two alarm logs among the N alarm logs based on the set of effective lexical units.

[0033] In an optional implementation, the first processing unit includes: a first processing subunit, configured to input each valid word in the set of valid words of the i-th alarm log into a continuous bag-of-words model to obtain a word vector for each valid word, i = 1, 2, ..., N; a second processing subunit, configured to determine the word frequency feature of each valid word in the set of valid words of the i-th alarm log according to a term frequency-inverse document frequency model; a third processing subunit, configured to determine the semantic features of the i-th alarm log according to the word vector and the word frequency feature of each valid word in the set of valid words; a fourth processing subunit, configured to determine the text similarity between the first alarm log and the second alarm log according to the semantic features of the first alarm log and the semantic features of the second alarm log, wherein the first alarm log is one of the two alarm logs and the second alarm log is the other of the two alarm logs; a fifth processing subunit, configured to determine the topological relevance between the first alarm log and the second alarm log based on a software topology diagram and a hardware topology diagram used to describe the architecture of the storage device; and a sixth processing subunit, configured to determine the fusion similarity between the first alarm log and the second alarm log based on the text similarity and the topological relevance.

[0034] In an optional implementation, the sixth processing subunit includes: a first calculation unit, configured to determine the fusion similarity based on the weight coefficients, the text similarity, and the topological relevance using the following formula:

[0035] similarity(a,b)=α×texual(a,b)+(1-α)×correlation(a,b)

[0036] Wherein, a represents the first alarm log, b represents the second alarm log, similarity(a,b) represents the fusion similarity between the first alarm log and the second alarm log, α represents the weight coefficient, texual(a,b) represents the text similarity between the first alarm log and the second alarm log, and correlation(a,b) represents the topological correlation between the first alarm log and the second alarm log.

[0037] In one optional implementation, the second processing module includes: a second processing unit, configured to perform clustering processing on the N alarm logs using the fusion similarity as a distance metric according to a density-based clustering algorithm, to obtain the K clusters.

[0038] In one optional implementation, the apparatus further includes a deletion module for deleting redundant alarm logs, wherein the redundant alarm logs are alarm logs in each of the K clusters other than the at least one representative alarm log.

[0039] In one optional implementation, the acquisition module further includes: a first acquisition unit, configured to acquire the N alarm logs from a lightweight acquisition client.

[0040] In a third aspect, the present invention provides a computer device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the method described in the first aspect or any corresponding embodiment thereof.

[0041] In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to perform the method described in the first aspect or any corresponding embodiment thereof.

Attached Figure Description

[0042] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0043] Figure 1 is a flowchart illustrating an alarm log push method according to an embodiment of the present invention;

[0044] Figure 2 is a flowchart illustrating another alarm log push method according to an embodiment of the present invention;

[0045] Figure 3 is a flowchart illustrating the alarm log preprocessing process according to an embodiment of the present invention;

[0046] Figure 4 is a schematic diagram of a continuous bag-of-words model according to an embodiment of the present invention;

[0047] Figure 5 is a flowchart illustrating the process of determining the semantic features of alarm logs according to an embodiment of the present invention;

[0048] Figure 6 is a hardware topology diagram of a storage device according to an embodiment of the present invention;

[0049] Figure 7 is a flowchart illustrating another alarm log push method according to an embodiment of the present invention;

[0050] Figure 8 is a schematic diagram of the process for determining clusters according to an embodiment of the present invention;

[0051] Figure 9 is a schematic diagram of the process for determining at least one representative alarm log according to an embodiment of the present invention;

[0052] Figure 10 is a structural block diagram of an alarm log push device according to an embodiment of the present invention;

[0053] Figure 11 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present invention.

Detailed Implementation

[0054] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0055] This invention provides an alarm log push method that can be applied to storage devices in systems such as servers and hosts.

[0056] The alarm log push method provided in this invention can accurately classify all similar or related alarm logs from multiple alarm logs and push representative alarm logs from different categories to operations and maintenance personnel. This can reduce the number of alarm logs that operations and maintenance personnel need to analyze, thereby improving the efficiency and accuracy of alarm processing.

[0057] According to an embodiment of the present invention, an alarm log push method embodiment is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.

[0058] This embodiment provides an alarm log push method, which can be used in the storage devices of the aforementioned server, host, and other systems. Figure 1 is a flowchart of the alarm log push method according to an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0059] Step S101: Obtain N alarm logs.

[0060] Where N is an integer greater than or equal to 2.

[0061] For example, alarm logs include information such as alarm time, level, alarm object, or alarm details, and multiple alarm logs can be stored in a database. Specifically, after obtaining multiple alarm logs, an index can be created for each alarm log to facilitate querying.

[0062] Step S102: Determine the fusion similarity between any two alarm logs among the N alarm logs.

[0063] Here, the fusion similarity characterizes the degree of similarity between the two alarm logs.

[0064] For example, if four alarm logs a1, a2, a3 and a4 are obtained, the fusion similarity between a1 and a2, the fusion similarity between a1 and a3, the fusion similarity between a1 and a4, the fusion similarity between a2 and a3, the fusion similarity between a2 and a4, and the fusion similarity between a3 and a4 are determined.
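Enumerating the pairs for the four-log example with `itertools.combinations` reproduces the six pairs listed above, in the same order. In general N logs give N(N - 1)/2 pairs, so the pairwise step grows quadratically with N. The function name is illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def all_pairs(logs: Sequence[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of logs: N * (N - 1) / 2 fusion-similarity evaluations."""
    return list(combinations(logs, 2))


if __name__ == "__main__":
    for a, b in all_pairs(["a1", "a2", "a3", "a4"]):
        print(a, b)
```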

[0065] Specifically, the fusion similarity makes it possible to identify similar or related alarm logs among the multiple alarm logs.

[0066] Step S103: Cluster the N alarm logs according to the fusion similarity to obtain K clusters.

[0067] Where K is an integer, and 1 ≤ K < N. The fusion similarity between any two alarm logs in a cluster is greater than a preset threshold. The preset threshold can be configured by operations and maintenance personnel according to actual needs.

[0068] Specifically, clustering groups together alarm logs with high similarity (greater than the preset threshold).

[0069] For example, clustering algorithms such as K-means clustering, spectral clustering, or density-based spatial clustering of applications with noise (DBSCAN) can be used to cluster N alarm logs.

[0070] Step S104: Determine at least one representative alarm log in each of the K clusters.

[0071] Here, the representative alarm log is the alarm log with the highest average fusion similarity to the other alarm logs in the cluster.

[0072] Specifically, a cluster can have only one representative alarm log or multiple representative alarm logs.

[0073] Step S105: Send at least one representative alarm log from each of the K clusters to the client.

[0074] For example, the client can be a terminal device such as a computer or laptop.

[0075] Specifically, by identifying at least one representative alarm log in each of multiple clusters and sending at least one representative alarm log to the client, the number of alarm logs that operations and maintenance personnel need to analyze can be reduced, thereby improving the processing efficiency and accuracy of alarm logs.

[0076] The alarm log push method provided in this embodiment, after acquiring multiple alarm logs, first determines the fusion similarity between any two alarm logs. Secondly, it clusters the multiple alarm logs based on the fusion similarity, accurately classifying alarm logs reflecting different aspects of the fault. Then, it identifies at least one representative alarm log in each of the multiple clusters and sends this representative alarm log to the client, reducing the number of alarm logs that maintenance personnel need to analyze. In other words, through the above steps, the storage device can identify similar or related alarm logs, classify the alarm logs, and reduce the number of alarm logs that maintenance personnel need to analyze, alleviating their workload and improving the efficiency and accuracy of alarm information processing.

[0077] This embodiment provides an alarm log push method, which can be used in the aforementioned storage device. Figure 2 is a flowchart of the alarm log push method according to an embodiment of the present invention. As shown in Figure 2, the method includes the following steps:

[0078] Step S201: Obtain N alarm logs.

[0079] For details, see step S101 of the embodiment shown in Figure 1; they are not repeated here.

[0080] Step S202: Preprocess the N alarm logs to obtain the set of valid words for each alarm log in the N alarm logs.

[0081] The valid token set includes at least one token, that is, one or more tokens. Preprocessing includes at least one of the following: log sorting, log deduplication, log cleaning, log segmentation, stop word removal, and token standardization.

[0082] In some optional implementations, as shown in Figure 3, step S202 may include the following steps:

[0083] Step S2021: Sort the N alarm logs.

[0084] Specifically, log sorting can refer to arranging N alarm logs in descending (or ascending) order according to their alarm time. The alarm logs include alarm time information.

[0085] Step S2022: Deduplicate the N alarm logs.

[0086] Specifically, after sorting the N alarm logs, deduplication is performed on the sorted N alarm logs. Deduplication involves deleting duplicate alarm logs from the N alarm logs.

[0087] Because storage devices write logs very frequently, the N alarm logs obtained may contain a large number of duplicates, which hinders subsequent analysis and processing. For example, a fixed time window can be set, and identical alarm logs within that window are deduplicated so that only the first one is retained, achieving an initial compression of the alarm logs.
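A minimal sketch of steps S2021 and S2022, assuming a hypothetical `AlarmLog` record with a numeric alarm time in seconds and treating two logs as identical when every field except the time matches. It sorts in ascending time order and opens a window at each retained log; the 60-second default and this interpretation of the fixed window are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlarmLog:
    """Hypothetical alarm log record (fields follow [0061])."""
    time: float    # alarm time in seconds (assumed unit)
    level: str
    obj: str       # alarm object
    details: str


def sort_and_dedup(logs: List[AlarmLog], window: float = 60.0) -> List[AlarmLog]:
    # S2021: sort by alarm time, ascending, so the first occurrence comes first.
    ordered = sorted(logs, key=lambda log: log.time)
    # S2022: for each distinct (level, object, details), remember when the
    # currently open window started and drop repeats that fall inside it.
    window_start: Dict[Tuple[str, str, str], float] = {}
    kept: List[AlarmLog] = []
    for log in ordered:
        key = (log.level, log.obj, log.details)
        start = window_start.get(key)
        if start is not None and log.time - start < window:
            continue                      # duplicate within the window: drop it
        window_start[key] = log.time      # retain it and open a new window
        kept.append(log)
    return kept
```

A repeat that arrives after the window has closed is kept, so a fault that persists still shows up periodically rather than disappearing after its first report.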

[0088] Step S2023: Clean the N alarm logs.

[0089] Specifically, after deduplicating N alarm logs, log cleaning is performed on the deduplicated N alarm logs. Log cleaning can involve replacing variables appearing in the alarm logs with common strings.

[0090] For example, alarm log content is semi-structured, and variables can include Internet Protocol (IP) addresses, port numbers, path addresses, website addresses, and email addresses. Various IP addresses can be replaced with "ipaddr," and various email addresses can be replaced with "email." Log cleaning can reduce storage space and eliminate noisy data in alarm logs.
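A regex-based sketch of step S2023. The placeholders `ipaddr` and `email` come from the text; `url` and `path` are hypothetical, IP addresses with a trailing :port are folded into `ipaddr`, and the patterns are deliberately simple rather than RFC-complete. Order matters: URLs and e-mail addresses are replaced before IP addresses and file paths so their parts are not matched separately.

```python
import re

# (pattern, placeholder) pairs, applied in order.
_RULES = [
    (re.compile(r"\bhttps?://\S+"), "url"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "email"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d{1,5})?\b"), "ipaddr"),  # optional :port
    (re.compile(r"(?:/[\w.-]+)+"), "path"),
]


def clean_log(text: str) -> str:
    """Replace variable fields in an alarm log with fixed placeholder strings."""
    for pattern, placeholder in _RULES:
        text = pattern.sub(placeholder, text)
    return text
```

After cleaning, two alarms that differ only in the affected host's address produce identical text, which is what lets the later similarity step treat them as the same kind of fault.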

[0091] Step S2024: Perform log segmentation on the N alarm logs.

[0092] Specifically, after cleaning N alarm logs, log segmentation is performed on the cleaned N alarm logs. Log segmentation converts the data in the alarm logs into a set of tokens.

[0093] For example, the log text is split according to the way words are concatenated in the alarm log to obtain a token set: if words are concatenated in camelCase, the text is split at uppercase letters; if words are joined with underscores, the text is split at underscores.
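A sketch of step S2024: split on underscores, whitespace, and other punctuation first, then split each chunk at camelCase boundaries. The regular expressions and function name are illustrative assumptions; runs of capitals such as `HTTP` are kept together, and case is left untouched because lowercasing belongs to token standardization.

```python
import re
from typing import List

_SEPARATORS = re.compile(r"[^A-Za-z0-9]+")                 # underscores, spaces, punctuation
_CAMEL_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_tokens(text: str) -> List[str]:
    """Split an alarm log string into tokens at separators and camelCase boundaries."""
    tokens: List[str] = []
    for chunk in _SEPARATORS.split(text):
        if chunk:                                           # skip empty pieces at the edges
            tokens.extend(_CAMEL_PARTS.findall(chunk))
    return tokens
```

For example, `split_tokens("jdbcPoolStatus_run")` yields `["jdbc", "Pool", "Status", "run"]`.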

[0094] Step S2025: Remove stop words from N alarm logs.

[0095] Specifically, after obtaining N token sets, stop words are removed from each of the N token sets.

[0096] Stop word removal refers to removing tokens from a token set that lack practical meaning (substantive information). For example, deleting words like "the," "to," "in," and "is" from the token set. For instance, all stop words in the token set can be deleted based on a general stop word list. Specifically, the token set is iterated through; if a token in the token set is in the general stop word list, it is deleted; otherwise, it is retained.
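Step S2025 as described: iterate over the token set and keep only tokens that are not in the stop word list. The short `STOP_WORDS` set is a hypothetical stand-in for a general stop word list, and the case-insensitive lookup is an assumption.

```python
from typing import AbstractSet, List, Sequence

# Hypothetical stand-in for a general stop word list.
STOP_WORDS = frozenset({"a", "an", "and", "at", "in", "is", "of", "on", "the", "to"})


def remove_stop_words(tokens: Sequence[str],
                      stop_words: AbstractSet[str] = STOP_WORDS) -> List[str]:
    """Keep a token only if it is not in the stop word list (case-insensitive)."""
    return [tok for tok in tokens if tok.lower() not in stop_words]
```

Using a set makes each membership check constant-time on average, so the pass is linear in the number of tokens.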

[0097] Step S2026: Perform lexical standardization on N alarm logs.

[0098] Specifically, token standardization refers to restoring each token to its base form. After stop words are removed from the N token sets, token standardization is performed on each of them to obtain N valid token sets, which correspond one-to-one with the N alarm logs.

[0099] Token standardization can include case normalization, stemming, and lemmatization. For example, case normalization converts a token with an initial capital letter to lowercase, such as Error -> error. Stemming removes prefixes and suffixes from a token to obtain its root, such as errors -> error and plays / playing / played -> play. Lemmatization uses a dictionary to transform inflected token forms into their base forms, such as drove -> drive and is / are / been -> be.
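A toy sketch of step S2026 combining the three operations above. The lemma dictionary and the suffix list are hypothetical and deliberately tiny; a production system would use a full dictionary and an established stemmer such as Porter's, since naive suffix stripping over-stems words like `status`.

```python
from typing import Dict, List, Sequence

# Hypothetical, deliberately tiny resources.
LEMMAS: Dict[str, str] = {"is": "be", "are": "be", "been": "be", "drove": "drive"}
SUFFIXES = ("ing", "ed", "s")
MIN_STEM = 3  # never strip a suffix if fewer than 3 characters would remain


def stem(token: str) -> str:
    """Crude suffix stripping: errors -> error, playing -> play."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM:
            return token[: -len(suffix)]
    return token


def normalize(tokens: Sequence[str]) -> List[str]:
    out = []
    for tok in tokens:
        tok = tok.lower()              # case normalization: Error -> error
        if tok in LEMMAS:
            out.append(LEMMAS[tok])    # dictionary lemmatization: is -> be
        else:
            out.append(stem(tok))      # suffix stripping for everything else
    return out
```

The dictionary is consulted before stemming so irregular forms such as `drove` are mapped correctly instead of being passed to the suffix rules.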

[0100] Step S203: Determine the fusion similarity between any two alarm logs among the N alarm logs based on the valid token sets.

[0101] Specifically, step S203 is one specific implementation of step S102 in the embodiment shown in Figure 1.

[0102] For example, step S203 above may include the following steps:

[0103] Step S2031: Input each valid word in the set of valid words of the i-th alarm log into the continuous bag-of-words model to obtain the word vector of each valid word.

[0104] Where i = 1, 2, ..., N.

[0105] Specifically, the Continuous Bag of Words (CBOW) model is a neural network model used to generate word vectors. Word vectors (semantic features) represent words as fixed-length real-valued vectors that capture the semantic and syntactic relationships between words and their contexts. For example, the CBOW model may be structured as shown in Figure 4.

[0106] For example, the CBOW model is a common approach in natural language processing (NLP). For each token $W_j^i$ in the valid token set, it extracts the semantics shared between the token and its neighboring tokens: a neural network is trained to mine the token's contextual information, and the token's word vector is then extracted. Here, $W_j^i$ denotes the j-th token in the valid token set of the i-th alarm log, and l denotes the total number of tokens in the valid token set.

[0107] The following example illustrates how this invention determines the word vector of each token in a valid token set.

[0108] Suppose three alarm logs are obtained, and after preprocessing their valid token sets are {jdbc pool status run}, {node ping alert host check time out}, and {disk current value exceed threshold}. To determine the word vector of each token, a vocabulary listing all unique tokens in the corpus can first be built from these valid token sets. For example, the vocabulary might be {"jdbc", "pool", "status", "run", "node", "ping", "alert", "host", "check", "time", "out", "disk", "current", "value", "exceed", "threshold"}.

[0109] Next, training samples for the CBOW model can be created from the vocabulary. Each training sample consists of two parts:

- the context tokens $(w_{j-1}, w_{j+1})$ within a window (e.g., a window size of 2);
- the target token $w_j$ that lies between them.

Specifically, a fixed-size window can be slid over each valid token set to obtain training samples. For example, with a window size of 2, one training sample can have input [jdbc, status] and target [pool], and another can have input [pool, run] and target [status].
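The sliding-window construction of (context, target) pairs can be sketched as follows. This is a minimal sketch, and `cbow_samples` is a hypothetical name. Following the example above, a window size of 2 is taken to mean one context token on each side of the target. Positions without a full context on both sides are skipped, which is an assumption.

```python
def cbow_samples(tokens: list[str], window: int = 2) -> list[tuple[list[str], str]]:
    """Slide a window over one valid token set and emit (context, target) pairs."""
    half = window // 2                          # context tokens on each side
    samples = []
    for j in range(half, len(tokens) - half):   # skip positions lacking full context
        context = tokens[j - half:j] + tokens[j + 1:j + 1 + half]
        samples.append((context, tokens[j]))
    return samples
```

For the first valid token set in the example, `cbow_samples(["jdbc", "pool", "status", "run"])` yields `[(["jdbc", "status"], "pool"), (["pool", "run"], "status")]`. These are exactly the two samples described in the text.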

[0110] After multiple training samples are obtained, a CBOW model can be constructed (e.g., as shown in Figure 4) and trained on them. For example, training can maximize the log-likelihood of the target token given its context, $L = \log P(w_j \mid w_{j-1}, w_{j+1})$. The CBOW model takes the context token vectors as input and outputs a probability distribution over the target token. After training, the CBOW model yields a word vector for each token. For example, the word vector for "jdbc" might be [0.2, -0.4, 0.1, ...], and the word vector for "pool" might be [0.3, 0.5, -0.2, ...].
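The training objective above can be made concrete with a very small CBOW model in pure Python. This is an illustrative sketch only, not the invention's implementation. It uses a full softmax over the vocabulary, and its hyperparameters (`dim`, `lr`, `epochs`, `seed`) are arbitrary assumptions. The hidden layer averages the context word vectors. Plain stochastic gradient descent then lowers $-\log P(w_j \mid \text{context})$. In practice, a library implementation would be used instead, for example gensim's `Word2Vec`, whose `sg=0` setting selects CBOW.

```python
import math
import random


def _forward(w_in, w_out, ctx):
    """Hidden layer = mean of context vectors; output = softmax over the vocabulary."""
    dim, vocab_size = len(w_out), len(w_out[0])
    h = [sum(w_in[c][d] for c in ctx) / len(ctx) for d in range(dim)]
    scores = [sum(h[d] * w_out[d][k] for d in range(dim)) for k in range(vocab_size)]
    top = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return h, [e / total for e in exps]


def train_cbow(samples, vocab, dim=8, lr=0.2, epochs=200, seed=0):
    """Minimal full-softmax CBOW trained with SGD on -log P(w_j | context).

    Returns ({token: word vector}, mean loss over the final epoch).
    """
    rng = random.Random(seed)
    index = {w: k for k, w in enumerate(vocab)}
    V = len(vocab)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]   # word vectors
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(dim)]  # output weights
    mean_loss = float("nan")
    for _ in range(epochs):
        total_loss = 0.0
        for context, target in samples:
            ctx, t = [index[w] for w in context], index[target]
            h, probs = _forward(w_in, w_out, ctx)
            total_loss -= math.log(probs[t])
            grad = probs[:]                    # d(loss)/d(scores) = probs - one_hot(target)
            grad[t] -= 1.0
            # Gradient w.r.t. the hidden layer, computed before w_out changes.
            dh = [sum(w_out[d][k] * grad[k] for k in range(V)) for d in range(dim)]
            for d in range(dim):
                for k in range(V):
                    w_out[d][k] -= lr * h[d] * grad[k]
            for c in ctx:                      # each context word gets an equal share of dh
                for d in range(dim):
                    w_in[c][d] -= lr * dh[d] / len(ctx)
        mean_loss = total_loss / len(samples)
    return {w: w_in[index[w]] for w in vocab}, mean_loss
```

The returned loss makes it easy to confirm that training lowers the objective. The vectors returned for each token play the role of the word vectors described above.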

[0111] Step S2032: Determine the word frequency features of each valid word in the valid word set of the i-th alarm log according to the term frequency-inverse document frequency (TF-IDF) model.

[0112] Specifically, the word frequency features of valid tokens characterize how much those tokens contribute to the semantics of the alarm log.

[0113] In the Term Frequency-Inverse Document Frequency (TF-IDF) model, term frequency (TF) represents how often a token appears in its valid token set. The term frequency of $W_j^i$ can be determined by formula (1).

[0114] $$TF_{i,j} = \frac{n_{i,j}}{l} \qquad (1)$$

[0115] Here, $TF_{i,j}$ is the term frequency of $W_j^i$, $n_{i,j}$ is the number of times $W_j^i$ appears in its valid token set, and $l$ is the total number of tokens in that valid token set.

[0116] In the TF-IDF model, inverse document frequency (IDF) reflects the number of different positions at which a token appears across all valid token sets. The fewer positions a token appears at, the more important it is. The inverse document frequency of $W_j^i$ can be determined by formula (2).

[0117]

[0118] Here, $IDF_{i,j}$ is the inverse document frequency of $W_j^i$. The location of a token $W_h^i$ is defined as its ordinal position within its valid token set (i.e., which token it is, counting from the start). Another symbol in formula (2) denotes the greatest length among all valid token sets, and a further term in formula (2) can be determined by formula (3).

[0119]

[0120] The TF-IDF model can be expressed as the product of TF and IDF. That is, the word frequency feature of $W_j^i$ can be determined by formula (4).

[0121] $$TFIDF_{i,j} = TF_{i,j} \times IDF_{i,j} \qquad (4)$$

[0122] Here, $TFIDF_{i,j}$ is the word frequency feature of $W_j^i$.

[0123] Step S2033: Determine the semantic features of the i-th alarm log based on the word vector and word frequency features of each valid word in the valid word set.

[0124] Specifically, as shown in Figure 5, the semantic features of each alarm log can be determined as a weighted average of the word vectors (semantic features) of the tokens in its valid token set, weighted by their word frequency features.

[0125] For example, the semantic features of the i-th alarm log can be determined by formula (5).

[0126]

[0127] Here, $S_i$ denotes the semantic features of the i-th alarm log, and the other two symbols in formula (5) denote the word vector and the word frequency feature of $W_j^i$, respectively.
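Formula (5) itself is not reproduced in this text, so the following is only a sketch of one plausible reading of "weighted average". It assumes each word vector is weighted by its word frequency feature and that the sum is normalized by the total weight. The fallback to an unweighted mean when all weights are zero is also an assumption.

```python
def semantic_feature(vectors: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of one alarm log's word vectors (assumed form of formula (5))."""
    total = sum(weights)
    if total == 0:                               # assumption: fall back to a plain mean
        weights, total = [1.0] * len(vectors), float(len(vectors))
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total for d in range(dim)]
```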

[0128] Step S2034: Determine the text similarity between the first alarm log and the second alarm log based on the semantic features of the first alarm log and the second alarm log.

[0129] The first alarm log is one of the two alarm logs mentioned above, and the second alarm log is the other of the two alarm logs mentioned above.

[0130] Specifically, after obtaining the semantic features of each alarm log in the N alarm logs, the text similarity between any two alarm logs in the N alarm logs can be determined by formula (6).

[0131] $$texual(a,b) = \frac{S_a^{T} S_b}{\lVert S_a \rVert_2 \, \lVert S_b \rVert_2} \qquad (6)$$

[0132] Here, a denotes the first alarm log, b denotes the second alarm log, and texual(a,b) denotes the text similarity between them. $S_a$ and $S_b$ are the semantic features of the first and second alarm logs, T denotes the vector transpose, and $\lVert\cdot\rVert_2$ denotes the $\ell_2$ norm.

[0133] Step S2035: Determine the topology correlation between the first alarm log and the second alarm log based on the software topology diagram and hardware topology diagram used to describe the architecture of the storage device.

[0134] Specifically, storage devices exhibit two types of topology: software topology and hardware topology. Both the topological relationships between hardware components and the topological relationships between software components within a storage device can be represented by undirected graphs, namely, hardware topology graphs and software topology graphs. For example, hardware and software topology graphs can be obtained from a Configuration Management Database (CMDB). The alarm object field included in alarm logs can identify the hardware and software that generated the alarm log.

[0135] For example, the topological correlation between any two alarm logs in N alarm logs can be determined by formula (7).

[0136]

[0137] Here, correlation(a,b) denotes the topological correlation between the first and second alarm logs, and topological(a,b) denotes the topological distance between them. $node\_s_m$ and $node\_s_n$ are any two nodes in the node set $\Lambda_s$ of the software topology graph. $\max(path_{software}(node\_s_m, node\_s_n))$ is the maximum of the shortest path lengths between any two nodes in the software topology graph. Likewise, $node\_h_p$ and $node\_h_q$ are any two nodes in the node set $\Lambda_h$ of the hardware topology graph. $\max(path_{hardware}(node\_h_p, node\_h_q))$ is the maximum of the shortest path lengths between any two nodes in the hardware topology graph.

[0138] For example, topological(a,b) can be determined by formula (8).

[0139] $$topological(a,b) = path_{software}(a,b) + path_{hardware}(a,b) \qquad (8)$$

[0140] Here, $path_{software}(a,b)$ is the shortest path length, on the software topology graph, between the nodes corresponding to the software that generated the first alarm log and the software that generated the second alarm log. $path_{hardware}(a,b)$ is the shortest path length, on the hardware topology graph, between the nodes corresponding to the hardware that generated the first alarm log and the hardware that generated the second alarm log.

[0141] For example, the hardware topology of a storage device may be as shown in Figure 6, which is used below to explain how $path_{hardware}(a,b)$ is calculated.

[0142] Assume the first alarm log a is generated by storage pool 11 and the second alarm log b is generated by switch 31. As shown in the figure, six paths connect storage pool 11 to switch 31:

- storage pool 11-volume 21-switch 31;
- storage pool 11-volume 21-switch 32-switch 31;
- storage pool 11-volume 22-switch 32-switch 31;
- storage pool 11-volume 22-switch 32-volume 21-switch 31;
- storage pool 11-volume 23-switch 32-switch 31;
- storage pool 11-volume 23-switch 32-volume 21-switch 31.

The shortest path is storage pool 11-volume 21-switch 31. Its length is 3, because path length here counts the nodes on the path, so $path_{hardware}(a,b)$ can be 3. If alarm logs a and b come from the same hardware, for example both from volume 21, $path_{hardware}(a,b)$ can be 1. If the hardware that generated a and the hardware that generated b are not connected in the hardware topology graph, $path_{hardware}(a,b)$ can be 0.

[0143] The calculation of $path_{software}(a,b)$ is similar to that of $path_{hardware}(a,b)$ and is not described in detail here.

[0144] Furthermore, the hardware topology diagram shown in Figure 6 includes 10 nodes. The maximum shortest path length is obtained by determining the shortest path length between every pair of nodes in the hardware topology diagram and then taking the maximum. For storage pool 11, the shortest path lengths to the other nodes are:

- 2 to each of volumes 21, 22, and 23;
- 3 to each of switches 31 and 32;
- 4 to host 41;
- 5 to each of host disks 51, 52, and 53.

Therefore, the maximum shortest path length from storage pool 11 to all other nodes is 5. The corresponding maximum for every other node (volumes 21, 22, and 23, switches 31 and 32, host 41, and host disks 51, 52, and 53) is obtained in the same way. Finally, taking the maximum of all these values yields the maximum shortest path length between any two nodes.
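The path-length convention in paragraphs [0142] to [0144] can be sketched with a breadth-first search. Under that convention, a path's length is its node count, the same node gives 1, and unreachable nodes give 0. The `EDGES` list is reconstructed from the six paths listed in [0142]. The links from the switches to host 41 and from host 41 to the host disks are assumptions, chosen to be consistent with the distances stated in [0144]. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hardware topology reconstructed from the listed paths; links to host 41
# and to host disks 51-53 are assumptions consistent with paragraph [0144].
EDGES = [(11, 21), (11, 22), (11, 23), (21, 31), (21, 32), (22, 32), (23, 32),
         (31, 32), (31, 41), (32, 41), (41, 51), (41, 52), (41, 53)]


def build_graph(edges):
    """Undirected adjacency sets."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def path_length(graph, a, b):
    """Shortest path length counted in nodes: same node -> 1, unreachable -> 0."""
    if a == b:
        return 1
    seen, queue = {a}, deque([(a, 1)])
    while queue:                                  # breadth-first search
        node, length = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return length + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, length + 1))
    return 0


def max_path_length(graph):
    """Largest shortest-path length over all node pairs (paragraph [0144])."""
    return max(path_length(graph, a, b) for a, b in combinations(graph, 2))


def topological(sw_graph, hw_graph, sw_a, sw_b, hw_a, hw_b):
    """Formula (8): software path length + hardware path length."""
    return path_length(sw_graph, sw_a, sw_b) + path_length(hw_graph, hw_a, hw_b)
```

On this reconstructed graph, `path_length(g, 11, 31)` returns 3 and `max_path_length(g)` returns 5, which matches the worked example.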

[0145] Step S2036: Determine the fusion similarity between the first alarm log and the second alarm log based on text similarity and topological relevance.

[0146] For example, the fusion similarity can be determined by formula (9) based on the weight coefficient, text similarity and topological relevance.

[0147] similarity(a,b)=α×texual(a,b)+(1-α)×correlation(a,b) (9)

[0148] Here, similarity(a,b) represents the fusion similarity between the first and second alarm logs, and α is the weighting coefficient. α can be configured by operations personnel based on experience; for example, α can be 0.6.
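Formula (9) is a convex combination of the two similarity measures, sketched below. The function name is hypothetical, and the default α = 0.6 follows the example value above. The range check on α is an added safeguard and is not stated in the text.

```python
def fusion_similarity(text_sim: float, corr: float, alpha: float = 0.6) -> float:
    """Formula (9): alpha * texual(a,b) + (1 - alpha) * correlation(a,b)."""
    if not 0.0 <= alpha <= 1.0:                  # added safeguard (assumption)
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * text_sim + (1.0 - alpha) * corr
```

For instance, a text similarity of 0.8 and a topological correlation of 0.5 give 0.6 × 0.8 + 0.4 × 0.5 = 0.68.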

[0149] Step S204: Cluster the N alarm logs according to the fusion similarity to obtain K clusters.

[0150] For details, see step S103 of the embodiment shown in Figure 1; they are not repeated here.

[0151] Step S205: Determine at least one representative alarm log in each of the K clusters.

[0152] For details, see step S104 of the embodiment shown in Figure 1; they are not repeated here.

[0153] Step S206: Send at least one representative alarm log from each of the K clusters to the client.

[0154] For details, see step S105 of the embodiment shown in Figure 1; they are not repeated here.

[0155] The alarm log push method provided in this embodiment preprocesses the N alarm logs before determining the fusion similarity of any two alarm logs. This reduces noise in the original alarm logs, lowers the difficulty of determining fusion similarity, and improves the processing efficiency of alarm logs. Furthermore, this embodiment mines the semantic features of effective words using the CBOW model and analyzes the frequency features of effective words using the TF-IDF model. Fusing the semantic and frequency features allows for more accurate semantic analysis of the alarm logs. The alarm object field then links the alarm logs to the software and hardware topology of the storage device, analyzing the correlation between alarm logs in a spatial dimension, which more accurately determines the fusion similarity and improves clustering accuracy. Additionally, this embodiment determines the fusion similarity based on text similarity obtained from the CBOW and TF-IDF models and clusters the N alarm logs using a clustering algorithm. This eliminates the need for operations and maintenance experts to configure different rules for different scenarios and conditions, reduces the requirement for knowledge in the operations and maintenance domain, decreases the number of aggregation rules, and has strong adaptability to new scenarios. Adding new fault logs does not require additional configuration of new aggregation rules.

[0156] This embodiment provides an alarm log push method, which can be used in the aforementioned storage device. Figure 7 is a flowchart of the alarm log push method according to an embodiment of the present invention. As shown in Figure 7, the method includes the following steps:

[0157] Step S701: Obtain N alarm logs.

[0158] In some alternative implementations, the N alarm logs can be retrieved from a lightweight acquisition client. For example, the lightweight acquisition client can be a log collection tool such as Logstash, Filebeat, Fluentd, Logagent, or rsyslog.

[0159] Specifically, the lightweight data collection client can monitor alarm logs generated during system operation in real time, and collect and integrate alarm logs from different sources. In addition, the lightweight data collection client can parse the collected alarm logs and extract key information from the original alarm logs using regular expressions. The collected fields include alarm time, level, alarm object, and alarm details.
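Regular-expression extraction of the four collected fields can be sketched as follows. The line layout matched by `ALARM_PATTERN` is purely hypothetical, because the text does not specify the raw log format, and `parse_alarm` is likewise a hypothetical name. Real collectors would use patterns that match their own sources.

```python
import re
from typing import Dict, Optional

# Hypothetical layout: "<date> <time> [<LEVEL>] object=<id> detail=<free text>"
ALARM_PATTERN = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<level>[A-Z]+)\]\s+"
    r"object=(?P<object>\S+)\s+"
    r"detail=(?P<detail>.*)$"
)


def parse_alarm(line: str) -> Optional[Dict[str, str]]:
    """Extract alarm time, level, alarm object and details; None if the line does not match."""
    match = ALARM_PATTERN.match(line.strip())
    return match.groupdict() if match else None
```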

[0160] Step S702: Determine the fusion similarity between any two alarm logs among the N alarm logs.

[0161] In some alternative implementations, the fusion similarity can be determined using a pre-trained machine learning model. Specifically, the input to the pre-trained machine learning model is any two alarm logs, and the output is their fusion similarity.

[0162] Step S703: Based on the density-based clustering algorithm, cluster the N alarm logs using fusion similarity as the distance metric to obtain K clusters.

[0163] Specifically, the density-based clustering algorithm may be DBSCAN, which does not require the number of clusters to be specified in advance and clusters quickly.

[0164] Step S703 is explained in detail below with reference to Figure 8.

[0165] For example, step S703 above may include the following steps:

[0166] Step S7031: Determine whether the i-th alarm log is a core alarm log.

[0167] A core alarm log is an alarm log for which more than minPoints other alarm logs have a fusion similarity with it greater than the preset fusion similarity d. Here, minPoints is the preset number threshold.

[0168] Specifically, the N alarm logs are traversed to obtain all alarm logs whose fusion similarity with the i-th alarm log is greater than d. If the number of these alarm logs is greater than minPoints, the i-th alarm log is a core alarm log. It is then added to the core alarm log list, and step S7032 is executed.

[0169] If the i-th alarm log is not a core alarm log, then set i = i + 1 and re-execute step S7031.

[0170] In addition, if the fusion similarity between an alarm log and every other alarm log is less than the preset fusion similarity d, that alarm log is marked as a noise alarm log and deleted to save storage space.

[0171] Step S7032: A temporary cluster is formed with the i-th alarm log as the core.

[0172] Specifically, if the i-th alarm log is a core alarm log, then the i-th alarm log and all alarm logs whose fusion similarity with the i-th alarm log is greater than d form a temporary cluster.

[0173] Step S7033: Determine whether i is N.

[0174] Specifically, if the i-th alarm log is the N-th alarm log, execute step S7034. If the i-th alarm log is not the N-th alarm log, set i = i + 1 and re-execute step S7031.

[0175] Step S7034: Merge the temporary clusters to obtain K clusters.

[0176] For example, temporary clusters of alarm logs that are density-connected are merged to obtain a new cluster. Specifically, it is determined whether each alarm log in the current temporary cluster is a core alarm log. If the alarm log is a core alarm log, the current temporary cluster is merged with the temporary cluster formed with that alarm log as the core, and the merging operation is repeated until all temporary clusters are processed.
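Steps S7031 to S7034 can be sketched as a DBSCAN-style pass over a precomputed fusion-similarity matrix. This is a sketch under stated assumptions, and `cluster_alarms` is a hypothetical name. A log is a core log when more than `min_points` other logs have similarity greater than `d` with it. The temporary clusters of core logs that reach each other are merged by growing each cluster through every core log it absorbs. Non-core logs join a cluster but do not expand it. Logs that end up in no cluster are returned as noise. This noise criterion is slightly broader than paragraph [0170] and is an assumption.

```python
def cluster_alarms(sim: list[list[float]], d: float, min_points: int):
    """Density-based clustering on a fusion-similarity matrix.

    Returns (clusters, noise), each as lists of alarm-log indices.
    """
    n = len(sim)
    neighbours = [[j for j in range(n) if j != i and sim[i][j] > d] for i in range(n)]
    is_core = [len(nb) > min_points for nb in neighbours]   # step S7031
    labels = [None] * n
    clusters = []
    for i in range(n):
        if not is_core[i] or labels[i] is not None:
            continue
        # Steps S7032-S7034: start from core log i and merge in the temporary
        # cluster of every core log reached (density connection).
        cid = len(clusters)
        members, stack = [], [i]
        labels[i] = cid
        while stack:
            p = stack.pop()
            members.append(p)
            if not is_core[p]:
                continue                  # border log: joins but does not expand
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cid
                    stack.append(q)
        clusters.append(sorted(members))
    noise = [i for i in range(n) if labels[i] is None]
    return clusters, noise
```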

[0177] Step S704: Determine at least one representative alarm log in each of the K clusters.

[0178] Specifically, the representative alarm log is the alarm log with the highest average fusion similarity to other alarm logs in the cluster.

[0179] For example, the representative alarm log of a cluster is recorded as the centroid of that cluster. The centroid of the k-th cluster can be determined using formula (10).

[0180] $$centroid = \arg\max_{c} \sum_{d=1,\, d \neq c}^{n} similarity(c,d) \qquad (10)$$

[0181] Here, centroid represents the centroid of the cluster, and n represents the total number of alarm logs in the cluster. c and d are any two distinct alarm logs in the cluster, where c represents the c-th alarm log and d represents the d-th alarm log, and 1 ≤ c ≤ n and 1 ≤ d ≤ n. Specifically, if the sum of the fusion similarities between alarm log c and other logs d (c is different from d) within the cluster is maximized, then c is the centroid of the cluster.
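The argmax in formula (10) can be sketched in a single expression. The function name is hypothetical. Members are indices into a fusion-similarity matrix `sim`. Ties go to the first maximizing member, because Python's `max` returns the first maximal element.

```python
def centroid(members: list[int], sim: list[list[float]]) -> int:
    """Formula (10): the member c maximising the sum of similarity(c, d) over d != c."""
    return max(members, key=lambda c: sum(sim[c][d] for d in members if d != c))
```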

[0182] When the number of representative alarm logs is m (2 ≤ m ≪ n), the process of determining multiple representative alarm logs for a cluster is explained with reference to Figure 9.

[0183] The process of identifying multiple representative alarm logs may include the following steps:

[0184] Step S7041: Determine the preset number of representative alarm logs.

[0185] For example, the preset number of representative alarm logs can be set to 5.

[0186] Step S7042: Determine the centroid of the cluster and add the centroid to the representative alarm log set.

[0187] Specifically, each alarm log in the cluster is traversed, and the centroid of the cluster is determined using formula (10) above. The centroid is then added to the representative alarm log set $\Omega_k$.

[0188] Step S7043: Delete the centroid from the cluster and update the cluster.

[0189] Step S7044: Determine whether the number of alarm logs in the representative alarm log set is less than the preset number.

[0190] Specifically, if the number of alarm logs in the representative alarm log set is less than a preset number, step S7042 is re-executed. The process ends if the number of alarm logs in the representative alarm log set is greater than or equal to the preset number. Afterwards, the representative alarm log set can be sent to the client.
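Steps S7041 to S7044 repeatedly take the current centroid, add it to the representative set, and remove it from the cluster. A minimal sketch follows, and `representatives` is a hypothetical name. The centroid computation of formula (10) is inlined so that the block is self-contained. The loop also stops if the cluster runs out of logs before the preset number is reached. That guard is an assumption, not stated in the text.

```python
def representatives(members: list[int], sim: list[list[float]], preset: int = 5) -> list[int]:
    """Collect up to `preset` representative alarm logs for one cluster."""
    remaining = list(members)
    chosen = []                                   # plays the role of the set Omega_k
    while len(chosen) < preset and remaining:     # S7044 (+ assumed empty-cluster guard)
        # S7042: centroid of the current cluster per formula (10).
        c = max(remaining, key=lambda x: sum(sim[x][y] for y in remaining if y != x))
        chosen.append(c)
        remaining.remove(c)                       # S7043: delete centroid, update cluster
    return chosen
```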

[0191] Step S705: Delete redundant alarm logs.

[0192] Redundant alarm logs are the alarm logs in each of the K clusters other than that cluster's representative alarm log(s).

[0193] Specifically, deleting redundant alarm logs can save storage space on storage devices.

[0194] Step S706: Send at least one representative alarm log from each of the K clusters to the client.

[0195] For details, see step S105 of the embodiment shown in Figure 1; they are not repeated here.

[0196] This embodiment also provides an alarm log push device, which is used to implement the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can be a combination of software and / or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.

[0197] This embodiment provides an alarm log push device. As shown in Figure 10, it includes:

[0198] The acquisition module 1001 is used to acquire N alarm logs, where N is an integer greater than or equal to 2.

[0199] The first processing module 1002 is used to determine the fusion similarity between any two alarm logs from N alarm logs. The fusion similarity is used to characterize the degree of similarity between any two alarm logs.

[0200] The second processing module 1003 is used to cluster N alarm logs based on their fusion similarity to obtain K clusters. Here, K is an integer, and 1≤K<N, and the fusion similarity between any two alarm logs in a cluster is greater than a preset threshold.

[0201] The third processing module 1004 is used to determine at least one representative alarm log in each of the K clusters. The representative alarm log is the alarm log in the cluster with the highest average fusion similarity to other alarm logs.

[0202] The sending module 1005 is used to send at least one representative alarm log from each of the K clusters to the client.

[0203] In some alternative embodiments, the apparatus further includes:

[0204] The fourth processing module is used to preprocess N alarm logs to obtain a set of valid tokens for each of the N alarm logs. The set of valid tokens includes at least one valid token. Preprocessing includes at least one of the following: log sorting, log deduplication, log cleaning, log segmentation, stop word removal, and token standardization.

[0205] The first processing module 1002 includes:

[0206] The first processing unit is used to determine the fusion similarity between any two alarm logs among the N alarm logs based on the valid token sets.

[0207] In some optional implementations, the first processing unit includes:

[0208] The first processing subunit is used to input each valid word from the set of valid words in the i-th alarm log into the continuous bag-of-words model to obtain the word vector of each valid word. i = 1, 2, ..., N.

[0209] The second processing subunit is used to determine the word frequency features of each valid word in the set of valid words of the i-th alarm log according to the term frequency-inverse document frequency model.

[0210] The third processing subunit is used to determine the semantic features of the i-th alarm log based on the word vector and word frequency features of each valid word in the set of valid words.

[0211] The fourth processing subunit is used to determine the text similarity between the first alarm log and the second alarm log based on the semantic features of the first alarm log and the second alarm log. The first alarm log is one of any two alarm logs, and the second alarm log is the other of any two alarm logs.

[0212] The fifth processing subunit is used to determine the topology correlation between the first alarm log and the second alarm log based on the software topology diagram and hardware topology diagram used to describe the architecture of the storage device.

[0213] The sixth processing subunit is used to determine the fusion similarity between the first alarm log and the second alarm log based on text similarity and topological relevance.

[0214] In some alternative implementations, the sixth processing subunit includes:

[0215] The first calculation unit determines the fusion similarity based on the weighting coefficients, text similarity, and topological relevance using the following formula:

[0216] similarity(a,b)=α×texual(a,b)+(1-α)×correlation(a,b)

[0217] Where a represents the first alarm log, b represents the second alarm log, similarity(a,b) represents the fusion similarity between the first alarm log and the second alarm log, α represents the weight coefficient, texual(a,b) represents the text similarity between the first alarm log and the second alarm log, and correlation(a,b) represents the topological correlation between the first alarm log and the second alarm log.

[0218] In some optional implementations, the third processing subunit includes:

[0219] The second calculation unit is used to determine the semantic features of the i-th alarm log based on the word vector and word frequency features of each valid word in the valid word set, using the following formula:

[0220]

[0221] Here, $S_i$ is the semantic feature of the i-th alarm log, and the remaining two symbols in the formula denote the word vector and the word frequency feature of $W_j^i$, respectively. $W_j^i$ is the j-th valid word in the set of valid words of the i-th alarm log, and l is the total number of words in that set.

[0222] In some alternative implementations, the fourth processing subunit includes:

[0223] The third calculation unit is used to determine the text similarity based on the semantic features of the first alarm log and the semantic features of the second alarm log using the following formula:

[0224] $$texual(a,b) = \frac{S_a^{T} S_b}{\lVert S_a \rVert_2 \, \lVert S_b \rVert_2}$$

[0225] Here, a denotes the first alarm log, b denotes the second alarm log, and texual(a,b) denotes the text similarity between them. $S_a$ is the semantic feature of the first alarm log, $S_b$ is the semantic feature of the second alarm log, T denotes the vector transpose, and $\lVert\cdot\rVert_2$ denotes the $\ell_2$ norm.

[0226] In some optional implementations, the second processing module 1003 includes:

[0227] The second processing unit is used to cluster N alarm logs using a density-based clustering algorithm, with fusion similarity as the distance metric, to obtain K clusters.

[0228] In some optional implementations, the third processing module 1004 includes:

[0229] The third processing unit is used to determine the representative alarm logs for each of the K clusters using the following formula:

[0230] $$centroid = \arg\max_{c} \sum_{d=1,\, d \neq c}^{n} similarity(c,d)$$

[0231] Where centroid is the representative alarm log, n is the total number of alarm logs in the corresponding cluster, c and d are any two different alarm logs in the corresponding cluster, 1≤c≤n, 1≤d≤n.

[0232] In some alternative embodiments, the apparatus further includes:

[0233] The deletion module is used to delete redundant alarm logs, that is, the alarm logs in each of the K clusters other than that cluster's representative alarm log(s).

[0234] In some optional implementations, the acquisition module 1001 further includes:

[0235] The first acquisition unit is used to acquire N alarm logs from the lightweight acquisition client.

[0236] Further functional descriptions of the above modules and units are the same as those in the corresponding embodiments described above, and will not be repeated here.

[0237] In this embodiment, the alarm log push device is presented in the form of a functional unit. Here, a unit refers to an application-specific integrated circuit (ASIC), a processor and memory that execute one or more software or fixed programs, and / or other devices that can provide the above functions.

[0238] The present invention also provides a computer device that includes the alarm log push device shown in Figure 10.

[0239] Figure 11 is a schematic structural diagram of a computer device provided in an optional embodiment of the present invention. As shown in Figure 11, the computer device includes one or more processors 1110, a memory 1120, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components communicate with each other via different buses and can be mounted on a common motherboard or installed in other ways as needed. The processor can process instructions executed within the computer device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In some alternative implementations, multiple processors and/or multiple buses can be used with multiple memories and multiple memory modules, if desired. Similarly, multiple computer devices can be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). Figure 11 takes one processor 1110 as an example.

[0240] Processor 1110 may be a central processing unit, a network processor, or a combination thereof. Processor 1110 may further include a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The programmable logic device may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

[0241] The memory 1120 stores instructions executable by at least one processor 1110 to cause the at least one processor 1110 to perform the method shown in the above embodiments.

[0242] The memory 1120 may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created based on the use of the computer device. Furthermore, the memory 1120 may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 1120 may optionally include memory remotely located relative to the processor 1110, and these remote memories may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0243] The memory 1120 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk or solid-state drive; the memory 1120 may also include a combination of the above types of memory.

[0244] The computer device also includes a communication interface 1130 for communicating with other devices or communication networks.

[0245] This invention also provides a computer-readable storage medium. The methods described above according to embodiments of the invention can be implemented in hardware or firmware, or implemented as computer code that can be recorded on a storage medium, or implemented as computer code downloaded via a network and originally stored on a remote storage medium or a non-transitory machine-readable storage medium and then stored on a local storage medium. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, or solid-state drive, etc.; further, the storage medium can also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code, which, when accessed and executed by the computer, processor, or hardware, implements the methods shown in the above embodiments.

[0246] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims

1. A method for pushing alarm logs, characterized in that the method is applied to a storage device and comprises:
acquiring N alarm logs, where N is an integer greater than or equal to 2;
preprocessing the N alarm logs to obtain a set of valid words for each alarm log, the set of valid words including at least one valid word, wherein the preprocessing includes at least one of the following: log sorting, log deduplication, log cleaning, log word segmentation, stop-word removal, and word standardization;
determining, based on the sets of valid words, the fusion similarity between any two of the N alarm logs, the fusion similarity characterizing the degree of similarity between the two alarm logs;
clustering the N alarm logs based on the fusion similarity to obtain K clusters, where K is an integer and 1 ≤ K < N, the fusion similarity between any two alarm logs in a cluster being greater than a preset threshold;
determining at least one representative alarm log in each of the K clusters, the representative alarm log being the alarm log in the cluster with the highest average fusion similarity to the other alarm logs in that cluster; and
sending the at least one representative alarm log of each of the K clusters to a client;
wherein determining the fusion similarity between any two of the N alarm logs based on the sets of valid words comprises:
inputting each valid word in the set of valid words of the i-th alarm log into a continuous bag-of-words (CBOW) model to obtain a word vector for each valid word, where i = 1, 2, ..., N;
determining, based on a term frequency-inverse document frequency (TF-IDF) model, the word frequency feature of each valid word in the set of valid words of the i-th alarm log;
determining the semantic feature of the i-th alarm log based on the word vector and the word frequency feature of each valid word in its set of valid words;
determining the text similarity between a first alarm log and a second alarm log based on the semantic feature of the first alarm log and the semantic feature of the second alarm log, the first alarm log being one of the two alarm logs and the second alarm log being the other;
determining, based on a software topology graph and a hardware topology graph that describe the architecture of the storage device, the topological correlation between the first alarm log and the second alarm log using a formula (not reproduced in this text) whose terms are: the topological correlation between the first and second alarm logs; the topological distance between the first and second alarm logs; the set of nodes in the software topology graph, any two nodes in that set, and the maximum shortest-path length between any two nodes in the software topology graph; and the set of nodes in the hardware topology graph, any two nodes in that set, and the maximum shortest-path length between any two nodes in the hardware topology graph; and
determining the fusion similarity between the first alarm log and the second alarm log based on the text similarity and the topological correlation.
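Claim 1 normalizes the topological distance using the maximum shortest-path length in each topology graph, that is, the diameter of the software graph and of the hardware graph. The sketch below computes that quantity with breadth-first search on an unweighted, undirected adjacency map. The graph contents, the `distance` argument (assumed here to be a hop count between the components the two logs refer to), and the function `topo_correlation` are hypothetical; because the claim's formula is not reproduced in this text, `topo_correlation` shows only one plausible normalization into [0, 1], not the patented one.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def max_shortest_path(graph):
    """Maximum shortest-path length over all reachable node pairs (the diameter)."""
    best = 0
    for node in graph:
        best = max(best, max(bfs_distances(graph, node).values()))
    return best


def topo_correlation(distance, d_max_sw, d_max_hw):
    # HYPOTHETICAL normalization, not the patent's formula: scale the hop
    # distance by the combined diameters and clamp the result to [0, 1].
    total = d_max_sw + d_max_hw
    if total == 0:
        return 1.0  # degenerate single-node topologies: treat as fully related
    return min(1.0, max(0.0, 1.0 - distance / total))
```

Because the two diameters depend only on the topology, a deployment following this sketch could compute them once and recompute them only when the storage device's architecture changes.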

2. The method according to claim 1, characterized in that determining the fusion similarity between the first alarm log and the second alarm log based on the text similarity and the topological correlation comprises: determining the fusion similarity from a weighting coefficient, the text similarity, and the topological correlation using a formula (not reproduced in this text) whose terms denote the first alarm log, the second alarm log, the fusion similarity between the first and second alarm logs, the weighting coefficient, the text similarity between the first and second alarm logs, and the topological correlation between the first and second alarm logs.
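Claim 2 combines the text similarity and the topological correlation with a weighting coefficient, but its exact formula is not reproduced here. A common way to do this, assumed rather than taken from the patent, is a convex combination alpha * text_sim + (1 - alpha) * topo_corr, which keeps the result in [0, 1] whenever both inputs are. The default `alpha=0.7` is a hypothetical value.

```python
def fusion_similarity(text_sim: float, topo_corr: float, alpha: float = 0.7) -> float:
    """Hypothetical convex combination of text similarity and topological correlation.

    alpha is an assumed weighting coefficient: alpha = 1 uses text only,
    alpha = 0 uses topology only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * text_sim + (1.0 - alpha) * topo_corr
```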

3. The method according to claim 1, characterized in that determining the semantic feature of the i-th alarm log based on the word vector and the word frequency feature of each valid word in its set of valid words comprises: determining the semantic feature of the i-th alarm log using a formula (not reproduced in this text) whose terms are the semantic feature of the i-th alarm log, the word vector of the j-th valid word in the set of valid words of the i-th alarm log, the word frequency feature of that j-th valid word, and the total number of valid words in the set.
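The semantic feature in claim 3 combines each valid word's CBOW word vector with its TF-IDF word frequency feature. The sketch below assumes a TF-IDF-weighted average over the m distinct valid words and uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that words appearing in every log still get a nonzero weight; both choices are assumptions. The word vectors are taken as given (training CBOW is out of scope; in gensim, for example, `Word2Vec(..., sg=0)` selects CBOW), and the toy vectors in the usage are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Per-log TF-IDF weight of each valid word, using an assumed smoothed IDF."""
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (c / total) * (math.log((1 + n_docs) / (1 + df[word])) + 1.0)
            for word, c in counts.items()
        })
    return weights


def semantic_feature(doc, doc_weights, word_vecs):
    """TF-IDF-weighted average of word vectors over the distinct valid words of one log."""
    vocab = list(dict.fromkeys(doc))  # the set of valid words, in first-seen order
    dim = len(word_vecs[vocab[0]])
    feat = [0.0] * dim
    for word in vocab:
        w = doc_weights[word]
        for k, x in enumerate(word_vecs[word]):
            feat[k] += w * x
    m = len(vocab)
    return [x / m for x in feat]
```

Whether or not the patent divides by m does not affect the text similarity in claim 4: cosine similarity is invariant to positive scaling of either vector.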

4. The method according to claim 1, characterized in that determining the text similarity between the first alarm log and the second alarm log based on the semantic feature of the first alarm log and the semantic feature of the second alarm log comprises: determining the text similarity using the following formula:
$$S_{\text{text}}(A, B) = \frac{\boldsymbol{v}_A^{\mathsf{T}} \boldsymbol{v}_B}{\lVert \boldsymbol{v}_A \rVert \, \lVert \boldsymbol{v}_B \rVert}$$
where $A$ is the first alarm log, $B$ is the second alarm log, $S_{\text{text}}(A, B)$ is the text similarity between the first and second alarm logs, $\boldsymbol{v}_A$ and $\boldsymbol{v}_B$ are the semantic features of the first and second alarm logs respectively, $\mathsf{T}$ denotes the transpose of a vector, and $\lVert \cdot \rVert$ denotes the norm.
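Claim 4's text similarity is the inner product of the two semantic features divided by the product of their norms, i.e. cosine similarity. The sketch assumes the Euclidean (L2) norm and, as an added convention, returns 0 when either feature vector is all zeros.

```python
import math


def text_similarity(v_a, v_b):
    """Cosine similarity v_a^T v_b / (||v_a|| ||v_b||), assuming the L2 norm."""
    if len(v_a) != len(v_b):
        raise ValueError("semantic features must have the same dimension")
    dot = sum(x * y for x, y in zip(v_a, v_b))
    norm_a = math.sqrt(sum(x * x for x in v_a))
    norm_b = math.sqrt(sum(y * y for y in v_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)
```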

5. The method according to any one of claims 1 to 4, characterized in that clustering the N alarm logs based on the fusion similarity to obtain the K clusters comprises: clustering the N alarm logs with a density-based clustering algorithm, using the fusion similarity as the distance metric, to obtain the K clusters.
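Claim 5 names a density-based clustering algorithm without specifying it. The sketch below assumes DBSCAN, a standard density-based method, on a precomputed distance matrix, converting fusion similarity to distance as 1 - S and setting the neighbourhood radius to 1 - threshold. The conversion, the `min_pts` default, and labelling unclustered logs as -1 (noise) are all assumptions.

```python
def dbscan_precomputed(dist, eps, min_pts):
    """Minimal DBSCAN on an n x n distance matrix.

    Returns one label per point: 0, 1, 2, ... for clusters, -1 for noise.
    """
    n = len(dist)
    labels = [None] * n  # None marks a point not yet visited
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbors) < min_pts:
            labels[p] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[p] = cluster
        seeds = [q for q in neighbors if q != p]
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster  # former noise reached from a core point: border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:  # q is a core point: keep expanding
                seeds.extend(q_neighbors)
    return labels


def cluster_by_fusion_similarity(sim, threshold, min_pts=2):
    """Hypothetical wrapper: distance = 1 - fusion similarity, eps = 1 - threshold."""
    dist = [[1.0 - s for s in row] for row in sim]
    return dbscan_precomputed(dist, eps=1.0 - threshold, min_pts=min_pts)
```

DBSCAN grows clusters through chains of neighbours, so it does not by itself guarantee that every pair inside a cluster exceeds the threshold; an implementation that must satisfy the pairwise condition stated in claim 1 would need an extra check.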

6. The method according to any one of claims 1 to 4, characterized in that determining at least one representative alarm log in each of the K clusters comprises: determining the representative alarm log of each of the K clusters using the following formula:
$$L^{*} = \arg\max_{1 \le c \le n} \frac{1}{n-1} \sum_{d=1,\, d \ne c}^{n} S(L_c, L_d)$$
where $L^{*}$ is the representative alarm log, $n$ is the total number of alarm logs in the corresponding cluster, $L_c$ and $L_d$ ($1 \le c \le n$, $1 \le d \le n$, $c \ne d$) are any two distinct alarm logs in the corresponding cluster, and $S(L_c, L_d)$ is the fusion similarity between them.

7. The method according to any one of claims 1 to 4, characterized in that, after determining at least one representative alarm log in each of the K clusters, the method further comprises: deleting redundant alarm logs, the redundant alarm logs being the alarm logs in each of the K clusters other than the at least one representative alarm log.
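Claims 6 and 7 pick, in each cluster, the alarm log with the highest average fusion similarity to the other members, and treat the remaining members as redundant. The sketch assumes cluster labels are given per log with -1 marking unclustered logs (an assumed convention; the claims do not say how such logs are handled), breaks ties by taking the first member, and returns the redundant logs' indices rather than deleting anything.

```python
def representative_index(members, sim):
    """Member index with the highest average fusion similarity to the other members."""
    if len(members) == 1:
        return members[0]

    def avg_sim(c):
        return sum(sim[c][d] for d in members if d != c) / (len(members) - 1)

    return max(members, key=avg_sim)  # max() keeps the first member on ties


def representatives_and_redundant(labels, sim):
    """Map each cluster label to its representative; list the redundant log indices."""
    clusters = {}
    for idx, label in enumerate(labels):
        if label == -1:
            continue  # assumed convention: -1 means the log joined no cluster
        clusters.setdefault(label, []).append(idx)
    reps = {label: representative_index(m, sim) for label, m in clusters.items()}
    rep_set = set(reps.values())
    redundant = [i for m in clusters.values() for i in m if i not in rep_set]
    return reps, redundant
```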

8. The method according to any one of claims 1 to 4, characterized in that acquiring the N alarm logs comprises: obtaining the N alarm logs from a lightweight data collection client.

9. A device for pushing alarm logs, characterized in that the device comprises:
an acquisition module, configured to acquire N alarm logs, where N is an integer greater than or equal to 2;
a fourth processing module, configured to preprocess the N alarm logs to obtain a set of valid words for each of the N alarm logs, the set of valid words including at least one valid word, wherein the preprocessing includes at least one of the following: log sorting, log deduplication, log cleaning, log word segmentation, stop-word removal, and word standardization;
a first processing module, configured to determine, based on the sets of valid words, the fusion similarity between any two of the N alarm logs, the fusion similarity characterizing the degree of similarity between the two alarm logs;
a second processing module, configured to cluster the N alarm logs according to the fusion similarity to obtain K clusters, where K is an integer and 1 ≤ K < N, the fusion similarity between any two alarm logs in a cluster being greater than a preset threshold;
a third processing module, configured to determine at least one representative alarm log in each of the K clusters, the representative alarm log being the alarm log in the cluster with the highest average fusion similarity to the other alarm logs in that cluster; and
a sending module, configured to send the at least one representative alarm log of each of the K clusters to a client;
wherein the first processing module is configured to:
input each valid word in the set of valid words of the i-th alarm log into a continuous bag-of-words (CBOW) model to obtain a word vector for each valid word, where i = 1, 2, ..., N;
determine, based on a term frequency-inverse document frequency (TF-IDF) model, the word frequency feature of each valid word in the set of valid words of the i-th alarm log;
determine the semantic feature of the i-th alarm log based on the word vector and the word frequency feature of each valid word in its set of valid words;
determine the text similarity between a first alarm log and a second alarm log based on the semantic feature of the first alarm log and the semantic feature of the second alarm log, the first alarm log being one of the two alarm logs and the second alarm log being the other;
determine, based on a software topology graph and a hardware topology graph that describe the architecture of the storage device, the topological correlation between the first alarm log and the second alarm log using a formula (not reproduced in this text) whose terms are: the topological correlation between the first and second alarm logs; the topological distance between the first and second alarm logs; the set of nodes in the software topology graph, any two nodes in that set, and the maximum shortest-path length between any two nodes in the software topology graph; and the set of nodes in the hardware topology graph, any two nodes in that set, and the maximum shortest-path length between any two nodes in the hardware topology graph; and
determine the fusion similarity between the first alarm log and the second alarm log based on the text similarity and the topological correlation.
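The fourth processing module's preprocessing lists six optional steps. The sketch below chains all of them for English-like log text, assuming each log is a `(timestamp, message)` tuple. The stop-word list, the synonym map used for word standardization, and the regular expressions used for cleaning (hex IDs and bare numbers) are hypothetical choices; Chinese-language logs would need a dictionary-based word segmenter instead of the regex tokenizer.

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "on", "at", "of", "to", "has", "been"}  # hypothetical
SYNONYMS = {"disconnected": "offline", "down": "offline", "temp": "temperature"}  # hypothetical


def preprocess(logs):
    """logs: list of (timestamp, message). Returns one list of valid words per kept log."""
    # 1. Log sorting: order records by timestamp.
    ordered = sorted(logs, key=lambda rec: rec[0])
    # 2. Log deduplication: keep only the first occurrence of each raw message.
    seen, unique = set(), []
    for _, msg in ordered:
        if msg not in seen:
            seen.add(msg)
            unique.append(msg)
    result = []
    for msg in unique:
        # 3. Log cleaning: strip hex identifiers and bare numbers (assumed noise).
        cleaned = re.sub(r"0x[0-9a-fA-F]+|\b\d+\b", " ", msg)
        # 4. Word segmentation: simple regex tokenizer on lowercased text.
        tokens = re.findall(r"[a-z]+", cleaned.lower())
        # 5. Stop-word removal, then 6. word standardization via the synonym map.
        words = [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]
        if words:  # every set of valid words must contain at least one word
            result.append(words)
    return result
```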

10. A computer device, characterized by comprising: a memory and a processor communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the method according to any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to perform the method according to any one of claims 1 to 8.