Log anomaly detection method and device for improving security of distributed system
By using a BiGRU network with dynamic graph modeling and attention mechanism, and integrating log semantics and frequency features, the problems of dynamic structure dependency and noise interference of log events are solved, achieving high-precision distributed system security protection and fault diagnosis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XINJIANG AIR & EARTH INTEGRATION LABORATORY TECHNOLOGY CO LTD
- Filing Date
- 2026-05-21
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This invention relates to the field of network platforms, and in particular to a method and apparatus for detecting log anomalies to improve the security of distributed systems.
Background Technology
[0002] In large-scale distributed systems such as cloud computing, edge computing, and computing power networks, system logs are the core data source for recording critical information such as computing node operations, resource scheduling, and service interactions. Effective log monitoring and fault diagnosis are crucial for ensuring high service reliability. Traditional log analysis methods mainly rely on manual review and rule matching, which are insufficient to cope with the exponentially increasing scale and complexity of modern systems.
[0003] In recent years, automated log anomaly detection methods based on deep learning have become mainstream. Sequence models (e.g., Long Short-Term Memory (LSTM) and Temporal Convolutional Networks (TCN)) treat logs as text sequences, effectively capturing sequential dependencies between events, but neglecting potential structural relationships between log events. On the other hand, Graph Neural Networks (GNNs) model dependencies between logs by constructing event graphs; however, most existing works use static graphs for modeling, failing to reflect the dynamic evolution of real log streams over time. Furthermore, log messages themselves are highly unstable; frequent changes in log formats during system iterations and version updates introduce a large amount of abnormal noise and interference, severely reducing the model's detection accuracy and generalization ability in real-world fault scenarios and secure operation and maintenance environments, and weakening the model's robustness to complex system failures.
[0004] Therefore, there is an urgent need for an anomaly detection method that can capture the dynamic structural dependencies of log events and effectively handle log evolution noise, so as to improve the security protection capability and fault diagnosis accuracy of distributed systems in complex evolutionary environments.
Summary of the Invention
[0005] This invention provides a log anomaly detection method and apparatus for improving the security of distributed systems. The invention captures the structural relationships of log events through dynamic graph modeling, integrates log semantic features and fault association frequency features, and combines a bidirectional gated recurrent neural network (BiGRU) with an attention mechanism to achieve high-precision detection and early warning of security anomalies and potential faults in distributed systems. Simultaneously, it uses a random projection hash algorithm to effectively solve the noise interference problem caused by log template updates during system iteration, ensuring the robustness of the model in complex security and fault diagnosis scenarios. This provides reliable technical support for distributed system security protection and fault diagnosis. See the description below for details.
[0006] In a first aspect, a log anomaly detection method for improving the security of distributed systems is provided, the method comprising:
[0007] Log template hash buckets are constructed based on the random projection hash algorithm. During the testing phase, hash retrieval and semantic similarity matching are performed on new log events that have not yet appeared, and redundant templates are automatically merged.
[0008] A sliding window is set to sample structured log template files, and log events are mapped to graph nodes. A dynamic event graph that evolves over time is constructed based on the order of events.
[0009] For the topological change part of the dynamic event graph, the graph attention network is used to adaptively aggregate and weightedly fuse the features of neighbor nodes to update the node embedding representation, and the semantic embedding features of the nodes are learned by combining residual connections.
[0010] The frequency of each log template within the sliding window is counted, and the normalized frequency vector is input into a lightweight encoder consisting of convolutional layers, activation functions, and pooling layers to output fixed-dimensional log frequency features.
[0011] The semantic embedding features obtained through the dynamic graph attention network are concatenated with the log frequency features to obtain the fused feature representation. The fused feature representation is then input into the attention-based BiGRU network to determine whether the node is abnormal.
[0012] Specifically, the method of constructing log template hash buckets based on the random projection hash algorithm, and performing hash retrieval and semantic similarity matching on new log events that have not yet appeared during the testing phase, and automatically merging redundant templates, involves:
[0013] In the training set, RPH hash buckets are constructed for the semantic embeddings of all known log templates;
[0014] During the testing phase, for log events that did not appear in the training set, their hash values are calculated and candidate templates are retrieved from the corresponding buckets;
[0015] If the cosine similarity between a new event and any candidate template exceeds a preset threshold, it is merged into the known candidate template; otherwise, it is retained as a new candidate template.
[0016] Specifically, the process of concatenating semantic embedding features with log frequency features to obtain the fused feature representation is as follows:
[0017] Calculate the changed topology of the current graph $G_t$ relative to the graph $G_{t-1}$ of the previous time step, $\Delta A_t = A_t - A_{t-1}$, where $A_t$ is the adjacency matrix at time t, $A_{t-1}$ is the adjacency matrix at time t-1, and $\Delta A_t$ is the adjacency matrix of the changes relative to the previous time step. A graph attention mechanism is applied to the changed topology:
[0018] $e_{ij} = a^{T}\,\mathrm{LeakyReLU}\left(W\,[h_i \,\|\, h_j]\right)$;
[0019] $\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \dfrac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$;
[0020] $h_i' = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W h_j\right)$;
[0021] $z_i = h_i + h_i'$;
[0022] where $e_{ij}$ is the attention score; $W$ is a learnable weight matrix; $a$ is a learnable vector; LeakyReLU, softmax, and $\sigma$ are nonlinear activation functions; $T$ denotes the transpose and $\|$ denotes vector concatenation; $h_i$ and $h_j$ are the semantic features of node i and node j; $\mathcal{N}_i$ is the set of neighboring nodes of node i; $h_i'$ is the output feature representation of node i in DyGATv2; $\alpha_{ij}$ is the attention coefficient between node i and node j; and $z_i$ is the final semantic embedding representation. The frequency vector is then encoded using a frequency encoder, and the features are fused:
[0023] $f_t = \mathrm{MLP}\left(\mathrm{GAP}\left(\mathrm{Conv1D}\left(c_t(v)\right)\right)\right)$;
[0024] $H_t = [Z_t \,\|\, f_t]$;
[0025] where $f_t$ is the frequency feature of the t-th time window; $v$ is a log event; MLP is a multilayer perceptron; GAP is a global average pooling layer; Conv1D is a one-dimensional convolutional layer; $c_t(v)$ is the frequency vector of log events; $Z_t$ denotes the semantic embeddings $z_i$ obtained for the t-th time window; $H_t$ is the fused feature representation for the t-th time window; and $\|$ is the concatenation operation.
[0026] The attention-based BiGRU network consists of two independent GRU units. The forward GRU processes the fused feature representation sequentially along the feature sequence to capture historical context information; the backward GRU processes the fused feature representation in reverse order along the feature sequence to obtain future context information. The BiGRU network captures both historical and future context in the fused feature sequence through a bidirectional structure, and assigns different weight coefficients to the hidden states at different positions in the fused feature sequence through an attention mechanism, focusing on key segments where potential anomalies occur. Finally, the anomaly probability of the node is output through a fully connected layer and a sigmoid activation function to determine whether the node is an anomaly.
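The bidirectional pass described above can be illustrated with a minimal pure-Python sketch. It assumes the standard GRU cell formulation (update gate, reset gate, candidate state); the parameter layout, the helper names `init_gru`, `gru_step`, and `bigru`, and the random initialization are assumptions made for this example and are not taken from the invention, which would normally rely on a deep learning framework.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _affine(W, x, U, h, b):
    # Computes W x + U h + b row by row.
    return [
        bi + sum(w * xi for w, xi in zip(w_row, x)) + sum(u * hi for u, hi in zip(u_row, h))
        for w_row, u_row, bi in zip(W, U, b)
    ]


def init_gru(in_dim, hidden, rng):
    """Hypothetical helper: small random parameters for one GRU direction."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for gate in "zrn":  # update gate, reset gate, candidate state
        params["W" + gate] = mat(hidden, in_dim)
        params["U" + gate] = mat(hidden, hidden)
        params["b" + gate] = [0.0] * hidden
    return params


def gru_step(x, h, p):
    """One GRU step: returns the next hidden state."""
    z = [_sigmoid(v) for v in _affine(p["Wz"], x, p["Uz"], h, p["bz"])]
    r = [_sigmoid(v) for v in _affine(p["Wr"], x, p["Ur"], h, p["br"])]
    gated = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(v) for v in _affine(p["Wn"], x, p["Un"], gated, p["bn"])]
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def bigru(seq, fwd, bwd, hidden):
    """Runs two independent GRUs and concatenates their states per position."""
    h, forward_states = [0.0] * hidden, []
    for x in seq:                      # historical context
        h = gru_step(x, h, fwd)
        forward_states.append(h)
    h, backward_states = [0.0] * hidden, []
    for x in reversed(seq):            # future context
        h = gru_step(x, h, bwd)
        backward_states.append(h)
    backward_states.reverse()
    return [f + b for f, b in zip(forward_states, backward_states)]
```

Each output vector has twice the hidden size: the first half depends only on the current and earlier inputs, the second half only on the current and later inputs.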
[0027] In a second aspect, a log anomaly detection device for improving the security of a distributed system is provided, the device comprising: a processor and a memory, the memory storing program instructions, the processor calling the program instructions stored in the memory to cause the device to execute the method described in any one of the first aspects.
[0028] In a third aspect, a computer-readable storage medium storing a computer program is provided, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method described in any one of the first aspects.
[0029] The beneficial effects of the technical solution provided by this invention are:
[0030] 1. This invention proposes a graph-based log anomaly detection framework, LogSFG. This framework uses a log frequency encoder to learn the frequency characteristics of log counts, which can more effectively capture low-frequency security anomalies and hidden faults in distributed systems. By combining the semantic and frequency features of log sequences and inputting them into an attention-based BiGRU for anomaly detection, it can accurately identify anomaly patterns related to system security and faults, improve the detection efficiency of security anomalies and faults, and provide accurate support for system security protection and fault diagnosis.
[0031] 2. Dynamic graph modeling: The LogSFG framework proposed in this invention uses a sliding window to construct a dynamic event graph that evolves over time, effectively reflecting the system state. It uses dynamic graph attention (DyGATv2) for local embedding updates, avoiding the high overhead of full graph recomputation, and can quickly capture the dynamic evolution characteristics of system security anomalies and faults.
[0032] 3. Effectively address noise interference caused by log format evolution: In actual operation and maintenance environments, system version iterations often lead to frequent changes in log templates, generating a large number of redundant logs with similar semantics but different formats, which can easily interfere with the accurate identification of security anomalies and faults. This invention integrates the Random Projective Hash (RPH) algorithm to perform fast semantic matching and template merging on new log events during the testing phase, effectively suppressing the noise introduced by log evolution and ensuring that the model can still detect faults stably and accurately during system iteration.
[0033] 4. This invention uses precision, recall, and F1 score as evaluation metrics to effectively evaluate the model. It has achieved excellent performance on publicly available computer log datasets (BGL and Thunderbird) (F1 scores of 97.8 and 99.9 respectively), verifying its high accuracy and reliability in real-world complex scenarios. By analyzing node logs to determine the status of computing nodes, it can promptly isolate faulty nodes, ensuring the stable operation of the distributed system and thus improving the user experience.
Attached Figure Description
[0034] Figure 1 is a topological graph showing changes in the event graph;
[0035] Figure 2 is a diagram of the computing power network computing node anomaly detection architecture of the present invention;
[0036] Figure 3 is a schematic diagram of the structure of a BiGRU based on an attention mechanism;
[0037] Figure 4 shows the performance metrics on the log datasets with and without dynamic graphs in this invention.
Detailed Implementation
[0038] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention will be described in further detail below.
[0039] Example 1
[0040] A log anomaly detection method for improving the security of distributed systems, see Figures 1-4. This method comprises six core steps: log parsing and semantic embedding, log template denoising, dynamic event graph construction, local graph embedding learning, frequency feature encoding, and feature fusion with anomaly classification, as detailed below:
[0041] 101: Collect raw logs from computing nodes and use log parsing tools to obtain structured log templates; use BERT (a natural language processing model based on Transformer) to semantically vectorize the log templates and generate semantic representations of log messages with fixed dimensions;
[0042] 102: Log template denoising with the Random Projection Hash (RPH) algorithm. Log template hash buckets are constructed based on the RPH algorithm. During the testing phase, hash retrieval and semantic similarity matching are performed on new log events that have not appeared before, and redundant templates are merged automatically, which suppresses the noise interference caused by log format evolution during system iteration.
[0043] 103: Dynamic event graph construction based on a sliding window. A sliding window is used to sample the structured log template files. Log events are mapped to graph nodes, and directed edges are constructed according to the event sequence, forming a dynamic event graph that evolves over time and explicitly models the transition dependencies between log events.
[0044] 104: Incremental graph attention embedding update. A graph attention network is applied only to the topological changes in the dynamic event graph. It adaptively aggregates and weights the features of neighbor nodes and updates the node embedding representation, which is combined with a residual connection to form the semantic embedding features of the nodes. This avoids the high overhead of recomputing the entire graph;
[0045] 105: Log frequency feature encoder. The occurrence frequency of each log template within the sliding window is counted to construct the original frequency vector, and L2 normalization is applied to eliminate window length differences. The normalized frequency vector is then input into a lightweight nonlinear encoder consisting of convolutional layers, activation functions, and pooling layers, which outputs fixed-dimensional log frequency features that capture low-frequency abnormal events and faults.
[0046] 106: Semantic-frequency dual feature fusion and attention-based BiGRU anomaly detection. The semantic embedding features obtained through the dynamic graph attention network (DyGATv2) are concatenated with the log frequency features to obtain a fused feature representation. The fused feature representation is input into an attention-based BiGRU network, which captures bidirectional temporal dependencies while the attention mechanism assigns higher weights to important features. Finally, a fully connected layer and a sigmoid activation function output the anomaly probability of a node, thereby determining whether the node is abnormal.
[0047] In summary, the embodiments of the present invention provide reliable technical support for the security protection and fault diagnosis of distributed systems through the robustness of the above steps 101-106 in complex security scenarios and fault diagnosis scenarios.
[0048] Example 2
[0049] The solution in Example 1 is further described below with specific calculation formulas and examples, with reference to Figures 1-4:
[0050] 201: Obtain the raw logs of the compute nodes and parse the raw logs of the compute nodes to extract the log template sequence;
[0051] This step includes: using a log parsing tool to perform structured processing on the original system logs, separating the log template and variable parameters; and using the BERT model to semantically vectorize the log template, generating a fixed-dimensional semantic embedding vector.
[0052] First, raw system log streams are collected from the target compute nodes or service clusters. Since raw logs contain many dynamic variables (e.g., timestamps, IP addresses, process IDs, etc.), direct modeling yields poor results. Therefore, efficient log parsing tools (Drain, Spell, etc.) are used to perform structured parsing of the logs. Taking Drain as an example, it uses a fixed-depth parse tree to merge similar logs into the same template, outputting in the following format:
[0053] “Node failed to connect to server ” → Template
[0054] This process maps each log entry to a log template ID, turning the log stream into a sequence of template IDs and effectively removing noise variables while preserving the semantic core.
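As a rough illustration of the template extraction step above, the following Python sketch masks variable fields with regular expressions and assigns an integer ID to each distinct template. It is a simplified stand-in for a parser such as Drain, not an implementation of it. The masking rules, the placeholder tokens (`<IP>`, `<HEX>`, `<NUM>`), and the function names are all assumptions made for this example.

```python
import re

# Hypothetical masking rules, applied in order. A real parser such as Drain
# learns templates with a fixed-depth parse tree instead of fixed patterns.
_RULES = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message):
    """Replaces variable fields in a raw log message with placeholder tokens."""
    for pattern, token in _RULES:
        message = pattern.sub(token, message)
    return message


def assign_template_ids(messages):
    """Returns (template -> ID mapping, template ID sequence) for a log stream."""
    ids = {}
    sequence = []
    for message in messages:
        template = to_template(message)
        if template not in ids:
            ids[template] = len(ids)   # IDs are assigned in order of first appearance
        sequence.append(ids[template])
    return ids, sequence
```

Two messages that differ only in their numeric fields or addresses collapse to the same template and therefore share one ID in the resulting sequence.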
[0055] 202: RPH Hash Bucket Construction and Redundant Template Merging Mechanism - During the training phase, RPH hash buckets are constructed for the semantic embeddings of all known log templates. During the testing phase, candidate templates are retrieved by hashing new events. If the cosine similarity exceeds the threshold, they are merged; otherwise, they are retained as new templates, effectively suppressing log evolution noise.
[0056] The Random Projective Hash (RPH) algorithm preprocesses log templates to merge redundant templates, including:
[0057] In the training set, RPH hash buckets are constructed for the semantic embeddings of all known log templates;
[0058] During the testing phase, for log events that did not appear in the training set, their hash values are calculated and candidate templates are retrieved from the corresponding buckets;
[0059] If the cosine similarity between a new event and any candidate template exceeds a preset threshold, it is merged into the known template; otherwise, it is retained as a new template.
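The bucket construction and merge rule above can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions: the hash key is the sign pattern of the embedding under random Gaussian hyperplanes (a common form of random projection hashing), and the class name `RPHIndex`, the number of hash bits, and the 0.9 threshold are hypothetical values chosen for the example, not values given by the invention.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class RPHIndex:
    """Hypothetical random-projection hash index over template embeddings."""

    def __init__(self, dim, n_bits=8, threshold=0.9, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per hash bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.threshold = threshold
        self.buckets = defaultdict(list)   # hash key -> [(template_id, embedding)]

    def _key(self, vec):
        return tuple(
            int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in self.planes
        )

    def add(self, template_id, vec):
        """Training phase: register a known template embedding."""
        self.buckets[self._key(vec)].append((template_id, vec))

    def match_or_add(self, new_id, vec):
        """Testing phase: merge into the best candidate or keep as a new template."""
        best_id, best_sim = None, -1.0
        for template_id, emb in self.buckets.get(self._key(vec), []):
            sim = cosine(vec, emb)
            if sim > best_sim:
                best_id, best_sim = template_id, sim
        if best_id is not None and best_sim > self.threshold:
            return best_id             # redundant template: merged
        self.add(new_id, vec)
        return new_id                  # retained as a new template
```

Only the templates in the matching bucket are compared, so the cosine similarity check runs on a small candidate set rather than on every known template.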
[0060] 203: Dynamic event graph construction and changed topology calculation. The log template sequence is sampled using a sliding window, and a directed dynamic event graph that evolves over time is constructed within each window. The changed topology is obtained by calculating the difference between the adjacency matrices of the current window and the previous window, and the graph embedding is updated only for the changed parts.
[0061] This step includes assigning a fixed, globally unique integer node ID to each unique log template (e.g.: , , …).
[0062] Subsequently, the log template text was semantically vectorized using BERT (a Transformer-based natural language processing model). Specifically, the template text was input into the BERT model, and the output vectors of all words were averaged to obtain the semantic embedding vector for the template.
[0063] Set the sliding window length w (e.g., w = 100 log entries) and the sliding step size s (e.g., s = 10). Perform sliding sampling on the log template ID sequence to obtain a series of continuous subsequences. For the log event sequence within the t-th window, construct a directed dynamic event graph $G_t = (V_t, E_t, X_t)$.
[0064] Here, the node set $V_t$ contains all unique template IDs appearing within this window; the edge set $E_t$ contains a directed edge from one event to another whenever the former immediately precedes the latter in the sequence; and the node attributes $X_t$ consist of the BERT embeddings obtained in step 201.
[0065] This graph structure can explicitly model the transition dependencies between log events and evolve dynamically as the time window slides.
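The window sampling and graph construction described above can be sketched as follows. The default values w = 100 and s = 10 follow the examples in the text; the function names and the choice to represent a graph as a pair of Python sets are assumptions made for this illustration.

```python
def sliding_windows(template_ids, w=100, s=10):
    """Yields windows of length w taken every s entries.

    A sequence shorter than w yields a single, shorter window.
    """
    last_start = max(len(template_ids) - w, 0)
    for start in range(0, last_start + 1, s):
        yield template_ids[start:start + w]


def build_event_graph(window):
    """Returns (nodes, directed edges) for one window of template IDs.

    An edge (a, b) is added when event a immediately precedes event b.
    """
    nodes = set(window)
    edges = set(zip(window, window[1:]))
    return nodes, edges
```

Because template IDs are global, the same node ID refers to the same template in every window, which is what makes the graphs of consecutive windows directly comparable.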
[0066] 204: Local embedding learning is performed on the changed topology using dynamic graph attention (DyGATv2);
[0067] To efficiently handle the dynamic nature of the graph, this embodiment employs a dynamic graph attention network (DyGATv2) with an incremental local embedding learning strategy. First, the changed topology of the current graph relative to the previous time step is calculated. The graph attention mechanism then calculates attention scores and updates embeddings only on the changed parts, while the embeddings of unchanged nodes are read directly from the embedding dictionary, which reduces computational overhead. The entire calculation process is as follows:
[0068] $\Delta A_t = A_t - A_{t-1}$;
[0069] $e_{ij} = a^{T}\,\mathrm{LeakyReLU}\left(W\,[h_i \,\|\, h_j]\right)$;
[0070] $\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \dfrac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$;
[0071] $h_i' = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W h_j\right)$;
[0072] where $A_t$ is the adjacency matrix at time t; $A_{t-1}$ is the adjacency matrix at time t-1; $\Delta A_t$ is the adjacency matrix of the changes relative to the previous time step; $e_{ij}$ is the attention score; $W$ is a learnable weight matrix; $a$ is a learnable vector; LeakyReLU, softmax, and $\sigma$ are nonlinear activation functions; $T$ denotes the transpose and $\|$ denotes the vector concatenation operation; $h_i$ and $h_j$ are the semantic features of node i and node j; $\mathcal{N}_i$ is the set of neighboring nodes of node i; $h_i'$ is the output feature representation of node i in DyGATv2; $\alpha_{ij}$ is the attention coefficient between node i and node j; and $z_i$ is the final semantic embedding representation, $z_i = h_i + h_i'$.
[0073] Residual connection and embedding dictionary caching: the updated embedding and the original embedding are combined through a residual connection to obtain the final semantic representation of node i. The embeddings of nodes with unchanged topology are obtained directly from the embedding dictionary, which balances incremental updates against the preservation of historical information.
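The incremental update with residual connection and embedding cache can be sketched in pure Python as below. This is only an illustration under several assumptions: the attention follows the GATv2 form, with the weight matrix split into a part `W_l` for the target node and a part `W_r` for the neighbor; the neighbors of a node are taken to be its predecessors in the directed graph; `tanh` stands in for the unspecified output activation; and the matrices are square so that the residual sum is well defined. All function names are hypothetical.

```python
import math


def changed_nodes(edges_t, edges_prev):
    """Nodes touched by any edge added or removed between two windows."""
    delta = edges_t ^ edges_prev       # nonzero entries of A_t - A_{t-1}
    return {node for edge in delta for node in edge}


def _matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gatv2_node_update(i, neighbors, h, W_l, W_r, a, slope=0.2):
    """Returns (residual-updated embedding of node i, attention coefficients)."""
    left = _matvec(W_l, h[i])
    msgs = [_matvec(W_r, h[j]) for j in neighbors]
    scores = []
    for msg in msgs:
        z = [l + m for l, m in zip(left, msg)]
        z = [v if v > 0 else slope * v for v in z]          # LeakyReLU
        scores.append(sum(ak * zk for ak, zk in zip(a, z)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]              # stable softmax
    total = sum(exps)
    alphas = [e / total for e in exps]
    agg = [sum(al * msg[k] for al, msg in zip(alphas, msgs)) for k in range(len(left))]
    updated = [math.tanh(v) for v in agg]                   # assumed activation
    return [x + u for x, u in zip(h[i], updated)], alphas   # residual connection


def incremental_update(edges_t, edges_prev, h, cache, W_l, W_r, a):
    """Recomputes embeddings only for changed nodes; others come from the cache."""
    for i in changed_nodes(edges_t, edges_prev):
        preds = sorted(src for src, dst in edges_t if dst == i)
        if preds:
            cache[i], _ = gatv2_node_update(i, preds, h, W_l, W_r, a)
        else:
            cache.pop(i, None)         # no neighbors left: fall back to raw features
    nodes = {n for edge in edges_t for n in edge}
    return {n: cache.get(n, h[n]) for n in nodes}
```

The `cache` dictionary plays the role of the embedding dictionary: entries written in one window are reused in later windows until the topology around that node changes again.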
[0074] 205: The frequency vector is encoded by an encoder to output frequency features of fixed dimensions;
[0075] The frequency vector is generated by counting the occurrences of each log event (node) within the current sliding window. L2 normalization is applied to this frequency vector to eliminate the influence of window length differences. The normalized frequency vector is then input into a lightweight nonlinear encoder consisting of a one-dimensional convolutional layer, a ReLU activation function, and a global average pooling layer, which extracts frequency features that capture anomalous patterns in the frequency distribution.
[0076] Specifically, the occurrence frequency of each log template within the t-th time window is counted to form the raw frequency vector $c_t(v)$. Templates that do not appear are counted as 0, so the vector has one entry per log template. After normalization, the frequency vector is input into the lightweight frequency encoder, which consists of a one-dimensional convolutional layer (kernel_size=3, filters=64), a ReLU activation function, and a global average pooling layer, and outputs the fixed-dimensional frequency feature $f_t$.
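The counting, normalization, and encoding pipeline above can be sketched in pure Python as follows. The kernels here are passed in explicitly and would be learned in practice; the text specifies kernel_size=3 and filters=64, while the function names and the "valid" (no padding) convolution are assumptions made for this illustration.

```python
import math
from collections import Counter


def frequency_vector(window, num_templates):
    """Counts each template ID in the window; absent templates count as 0."""
    counts = Counter(window)
    return [float(counts.get(t, 0)) for t in range(num_templates)]


def l2_normalize(vec):
    """Scales the vector to unit L2 norm (an all-zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def conv1d_relu_gap(x, kernels, biases):
    """1D convolution without padding, then ReLU, then global average pooling.

    Returns one feature per kernel, so the output size does not depend on len(x).
    Requires len(x) >= kernel size.
    """
    k = len(kernels[0])
    features = []
    for kernel, bias in zip(kernels, biases):
        outputs = [
            max(0.0, bias + sum(kernel[m] * x[p + m] for m in range(k)))
            for p in range(len(x) - k + 1)
        ]
        features.append(sum(outputs) / len(outputs))
    return features
```

Global average pooling is what makes the output dimension fixed: it equals the number of kernels regardless of how many templates the frequency vector covers.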
[0077] 206: Semantic-frequency dual feature fusion and attention-based BiGRU bidirectional temporal modeling. The semantic embedding features obtained through the dynamic graph attention network (DyGATv2) are concatenated with the frequency features to obtain the fused feature representation, which is input into an attention-based BiGRU network. The BiGRU network captures forward and backward temporal dependencies simultaneously and assigns higher weights to more important features. Finally, a fully connected layer and a sigmoid activation function output the anomaly probability that determines whether a node is an anomaly, achieving collaborative anomaly detection based on structural semantics and statistical frequency and supporting the safe and stable operation of the distributed system. The calculation process is as follows:
[0078] $h_t = [\overrightarrow{h_t} \,\|\, \overleftarrow{h_t}]$;
[0079] $\alpha_t = \mathrm{softmax}\left(\tanh\left(W_a h_t\right)\right)$, with $W_a \in \mathbb{R}^{1 \times d_a}$;
[0080] $\hat{y} = \sigma\left(W_c \sum_t \alpha_t h_t\right)$;
[0081] where $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ are the forward and backward hidden states of the BiGRU, respectively; $h_t$ is the hidden state of the BiGRU; $\alpha_t$ is the attention score; $W_a$ is a learnable weight matrix; $d_a$ is the dimension of the attention hidden space; tanh is the hyperbolic tangent activation function; $W_c$ is the classification weight matrix; and $\sigma$ is a nonlinear activation function used to output a normalized probability distribution.
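The attention weighting and final classification described in step 206 can be sketched as below. This is a minimal sketch assuming a single attention weight vector scored through tanh and softmax, followed by one fully connected unit with a sigmoid; the function names and the bias term `b_c` are assumptions made for this example, and the hidden states are assumed to come from a BiGRU computed elsewhere.

```python
import math


def attention_pool(hidden_states, w_a):
    """Weights each hidden state by softmax(tanh(w_a . h_t)) and sums them."""
    scores = [math.tanh(sum(w * h for w, h in zip(w_a, h_t))) for h_t in hidden_states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # numerically stable softmax
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [
        sum(alpha * h_t[k] for alpha, h_t in zip(alphas, hidden_states))
        for k in range(dim)
    ]
    return context, alphas


def anomaly_probability(context, w_c, b_c=0.0):
    """Fully connected layer followed by a sigmoid."""
    logit = b_c + sum(w * c for w, c in zip(w_c, context))
    return 1.0 / (1.0 + math.exp(-logit))
```

A node would then be flagged as anomalous when the returned probability exceeds a decision threshold, for example 0.5; the threshold itself is not specified in the text.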
[0082] In summary, the embodiments of the present invention provide reliable technical support for the security protection and fault diagnosis of distributed systems through the robustness of the above steps 201-206 in complex security scenarios and fault diagnosis scenarios.
[0083] Example 3
[0084] A log anomaly detection device for improving the security of a distributed system includes a processor, a memory, an input / output interface, and a communication module. The memory stores program instructions, and the processor calls the program instructions stored in the memory to make the device execute the steps in Embodiment 1.
[0085] This device can be deployed in the following scenarios:
[0086] Cloud-based operations and maintenance platform: Receives log streams from thousands of servers in real time for centralized anomaly detection;
[0087] Edge computing nodes: Deploy lightweight versions (such as using DistilBERT to replace BERT and simplifying the number of BiGRU layers) on resource-constrained edge devices to achieve rapid local alarms;
[0088] Hybrid architecture: preliminary filtering and feature extraction are performed at the edge, while complex graph modeling and final decision-making are performed in the cloud, balancing efficiency and accuracy;
[0089] The entire system supports advanced features such as hot model updates, log backtracking analysis, and anomaly root cause localization, and can be seamlessly integrated into existing AIOps platforms.
[0090] The processor described above performs the following steps:
[0091] Log template hash buckets are constructed based on the random projection hash algorithm. During the testing phase, hash retrieval and semantic similarity matching are performed on new log events that have not yet appeared, and redundant templates are automatically merged.
[0092] A sliding window is set to sample structured log template files, and log events are mapped to graph nodes. A dynamic event graph that evolves over time is constructed based on the order of events.
[0093] For the topological change part of the dynamic event graph, the graph attention network is used to adaptively aggregate and weightedly fuse the features of neighbor nodes, update the node embedding representation, and combine the residual as the semantic embedding feature of the node.
[0094] The frequency of each log template within the sliding window is counted, and the normalized frequency vector is input into a lightweight encoder consisting of convolutional layers, activation functions, and pooling layers to output fixed-dimensional log frequency features.
[0095] The semantic embedding features obtained through the dynamic graph attention network are concatenated with the log frequency features to obtain the fused feature representation. The fused feature representation is then input into the attention-based BiGRU network to determine whether the node is abnormal.
[0096] Specifically, a log template hash bucket is constructed based on a random projection hash algorithm. During the testing phase, hash retrieval and semantic similarity matching are performed on new log events that have not yet appeared, and redundant templates are automatically merged, as follows:
[0097] In the training set, RPH hash buckets are constructed for the semantic embeddings of all known log templates;
[0098] During the testing phase, for log events that did not appear in the training set, their hash values are calculated and candidate templates are retrieved from the corresponding buckets;
[0099] If the cosine similarity between a new event and any candidate template exceeds a preset threshold, it is merged into the known candidate template; otherwise, it is retained as a new candidate template.
[0100] Specifically, the semantic embedding features and log frequency features are concatenated to obtain the fused feature representation, as follows:
[0101] Calculate the changed topology of the current graph $G_t$ relative to the graph $G_{t-1}$ of the previous time step, $\Delta A_t = A_t - A_{t-1}$, where $A_t$ is the adjacency matrix at time t, $A_{t-1}$ is the adjacency matrix at time t-1, and $\Delta A_t$ is the adjacency matrix of the changes relative to the previous time step. A graph attention mechanism is applied to the changed topology:
[0102] $e_{ij} = a^{T}\,\mathrm{LeakyReLU}\left(W\,[h_i \,\|\, h_j]\right)$;
[0103] $\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \dfrac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$;
[0104] $h_i' = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W h_j\right)$;
[0105] $z_i = h_i + h_i'$;
[0106] where $e_{ij}$ is the attention score; $W$ is a learnable weight matrix; $a$ is a learnable vector; LeakyReLU, softmax, and $\sigma$ are nonlinear activation functions; $T$ denotes the transpose and $\|$ denotes the vector concatenation operation; $h_i$ and $h_j$ are the semantic features of node i and node j; $\mathcal{N}_i$ is the set of neighboring nodes of node i; $h_i'$ is the output feature representation of node i in DyGATv2; $\alpha_{ij}$ is the attention coefficient between node i and node j; and $z_i$ is the final semantic embedding representation. The frequency vector is then encoded using a frequency encoder, and the features are fused:
[0107] $f_t = \mathrm{MLP}\left(\mathrm{GAP}\left(\mathrm{Conv1D}\left(c_t(v)\right)\right)\right)$;
[0108] $H_t = [Z_t \,\|\, f_t]$;
[0109] where $f_t$ is the frequency feature of the t-th time window; $v$ is a log event; MLP is a multilayer perceptron; GAP is a global average pooling layer; Conv1D is a one-dimensional convolutional layer; $c_t(v)$ is the frequency vector of log events; $Z_t$ denotes the semantic embeddings $z_i$ obtained for the t-th time window; $H_t$ is the fused feature representation for the t-th time window; and $\|$ is the concatenation operation.
[0110] The attention-based BiGRU network consists of two independent GRU units: the forward GRU processes the fused feature representation sequentially along the feature sequence to capture historical context information; and the reverse GRU processes the fused feature representation in reverse order along the feature sequence to obtain future context information.
[0111] The BiGRU network captures both historical and future contexts in the fused feature sequence through a bidirectional structure. It assigns different weight coefficients to the hidden states at different positions in the fused feature sequence through an attention mechanism, focusing on key segments where potential anomalies may occur. Finally, it outputs the anomaly probability of the node through a fully connected layer and a sigmoid activation function, thereby determining whether the node is an anomaly.
[0112] In summary, through the above operations, the embodiments of the present invention remain robust in complex security and fault diagnosis scenarios and thereby provide reliable technical support for the security protection and fault diagnosis of distributed systems.
[0113] Example 4
[0114] The embodiments of the present invention were simulated and verified in accordance with standard scientific experimental procedures to evaluate the effectiveness of the proposed method.
[0115] 1. Experimental Objective
[0116] The objective is to verify the performance of the LogSFG framework proposed in this invention on real distributed system log datasets and to compare it with current mainstream advanced methods.
[0117] 2. Experimental Environment and Dataset: Hardware Environment: A computing server configured with an Intel Xeon Gold 5218R processor, 256GB of RAM, and an NVIDIA A40 high-performance graphics card. Software Environment: Linux operating system, Python 3.9.24 programming language, PyTorch 2.4.1 deep learning framework, and torch-geometric 2.6.1 graph neural network library.
[0118] Datasets: Two large-scale real-world system log datasets were selected for validation. The BGL dataset, collected from the IBM Blue Gene/L supercomputer system, primarily covers hardware failures, software errors, and operational anomalies. The Thunderbird dataset comes from the Thunderbird cluster at Sandia National Laboratories (SNL).
[0119] 3. Evaluation Indicators
[0120] The following classification evaluation metrics, commonly used in the field of anomaly detection, are adopted. Precision: the proportion of samples judged anomalous by the model that are actually anomalous, calculated as Precision = TP / (TP + FP). Recall: the proportion of truly anomalous samples that the model correctly detects, calculated as Recall = TP / (TP + FN). F1 score: the harmonic mean of precision and recall, comprehensively reflecting the model's detection performance, calculated as F1 = 2 × Precision × Recall / (Precision + Recall). Here, TP is the number of samples correctly identified as anomalous, FP is the number of normal samples incorrectly labeled as anomalous, and FN is the number of anomalous samples that were not detected.
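The three metrics above can be computed directly from the confusion counts. The following short sketch shows one way to do so; the function names and the toy labels are illustrative assumptions and are not taken from the experiments, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN for binary labels, where 1 means anomalous."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn

def precision_recall_f1(tp, fp, fn):
    # Return 0.0 when a denominator is zero (a convention assumed in this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy labels (hypothetical): 1 = anomalous, 0 = normal
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
tp, fp, fn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

For these toy labels there are three true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 0.75.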
[0121] 4. Experimental Results and Comparative Analysis
[0122] The method of this invention is compared with various current unsupervised and supervised methods on the BGL and Thunderbird datasets. The comparison results are shown in Table 1:
[0123] Table 1: Performance Comparison of Different Methods
[0124]
[0125] Results Analysis: The experimental results show that unsupervised methods such as PCA and OCSVM perform poorly, reflecting the significant limitations of traditional machine learning methods in handling high-dimensional and complex log data. On the BGL dataset, the method of this invention achieves a precision of 0.986, indicating that it effectively reduces false positives while maintaining high detection accuracy. Logser also achieves high precision, but its overall performance is poor because the BGL test set contains many log templates (i.e., "new events") that did not appear during the training phase. On the Thunderbird dataset, the log structure is relatively stable, and the method of this invention, NeuralLog, and LogRobust all achieve a precision of 1.000. The unsupervised methods Logs2Graph and Logser also perform well, but overall, supervised models outperform unsupervised models. The method of this invention achieves the best F1 score on both datasets (0.978 for BGL and 0.999 for Thunderbird), validating the effectiveness of dynamic graph modeling and frequency feature fusion. Unless otherwise specified, the models of the various devices in this embodiment of the invention are not limited, and any device that can perform the above functions is acceptable.
[0126] Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of a preferred embodiment, and the sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0127] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A log anomaly detection method for improving the security of a distributed system, characterized in that the method comprises:
constructing log template hash buckets based on a random projection hash algorithm and, during the testing phase, performing hash retrieval and semantic similarity matching on new log events that have not previously appeared, and automatically merging redundant templates;
setting a sliding window to sample structured log template files, mapping log events to graph nodes, and constructing a dynamic event graph that evolves over time based on the order of events;
for the topologically changed part of the dynamic event graph, using a graph attention network to adaptively aggregate and weight the features of neighboring nodes so as to update the node embedding representations, and learning the semantic embedding features of the nodes in combination with residual connections;
counting the frequency of each log template within the sliding window, and inputting the normalized frequency vector into a lightweight encoder consisting of convolutional layers, activation functions, and pooling layers to output fixed-dimensional log frequency features;
concatenating the semantic embedding features obtained through the dynamic graph attention network with the log frequency features to obtain a fused feature representation; and
inputting the fused feature representation into an attention-based BiGRU network to determine whether the node is abnormal.
2. The log anomaly detection method for improving the security of a distributed system according to claim 1, characterized in that the process of constructing log template hash buckets based on the random projection hash algorithm, performing hash retrieval and semantic similarity matching on new log events that have not previously appeared during the testing phase, and automatically merging redundant templates specifically comprises: in the training set, constructing RPH hash buckets for the semantic embeddings of all known log templates; during the testing phase, for log events that did not appear in the training set, calculating their hash values and retrieving candidate templates from the corresponding buckets; if the cosine similarity between a new event and any candidate template exceeds a preset threshold, merging the new event into that known candidate template; otherwise, keeping it as a new candidate template.
3. The log anomaly detection method for improving the security of a distributed system according to claim 1, characterized in that the process of concatenating the semantic embedding features with the log frequency features to obtain a fused feature representation is as follows:
calculating the changed topology of the current graph $G_t$ relative to the graph $G_{t-1}$ of the previous time step as $\Delta A_t = A_t - A_{t-1}$, where $A_t$ is the adjacency matrix at time $t$, $A_{t-1}$ is the adjacency matrix at time $t-1$, and $\Delta A_t$ represents the change in the adjacency matrix relative to the previous time step, and applying a graph attention mechanism to the changed topology:
$e_{ij} = a^{T}\,\mathrm{LeakyReLU}\left(W\,[h_i \,\|\, h_j]\right)$;
$\alpha_{ij} = \mathrm{softmax}_j\left(e_{ij}\right) = \dfrac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$;
$h_i' = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W h_j\right)$;
$z_i = h_i' + h_i$;
where $e_{ij}$ represents the attention score; $W$ is a learnable weight matrix; $a$ is a learnable vector; LeakyReLU, softmax, and $\sigma$ are non-linear activation functions; $T$ denotes the transpose; $\|$ denotes vector concatenation; $h_i$ is the semantic feature of node $i$; $h_j$ is the semantic feature of node $j$; $\mathcal{N}_i$ is the set of neighboring nodes of node $i$; $h_i'$ is the output feature representation of node $i$ in DyGATv2; $\alpha_{ij}$ is the attention coefficient between node $i$ and node $j$; and $z_i$ is the final semantic embedding representation;
encoding the frequency vector using a frequency encoder and fusing the features:
$F_t = \mathrm{MLP}\left(\mathrm{GAP}\left(\mathrm{Conv1D}\left(f(v)\right)\right)\right)$;
$H_t = Z_t \,\|\, F_t$;
where $F_t$ represents the frequency features of the $t$-th time window; $v$ is a log event; MLP is a multilayer perceptron; GAP is a global average pooling layer; Conv1D is a one-dimensional convolutional layer; $f(v)$ is the frequency vector of log events; $Z_t$ is the final semantic embedding representation for the $t$-th time window; $H_t$ represents the fused feature representation of the $t$-th time window; and $\|$ denotes the concatenation (splicing) operation.
4. The log anomaly detection method for improving the security of a distributed system according to claim 1, characterized in that the attention-based BiGRU network consists of two independent GRU units: the forward GRU processes the fused feature representation sequentially along the feature sequence to capture historical context information, and the backward GRU processes the fused feature representation in reverse order along the feature sequence to obtain future context information; the BiGRU network captures both historical and future context in the fused feature sequence through its bidirectional structure, and assigns different weight coefficients to the hidden states at different positions in the fused feature sequence through an attention mechanism, focusing on key segments where potential anomalies may occur; finally, the anomaly probability of the node is output through a fully connected layer and a sigmoid activation function to determine whether the node is anomalous.
5. A log anomaly detection device for improving the security of a distributed system, characterized in that the device includes a processor and a memory, the memory storing program instructions, and the processor invoking the program instructions stored in the memory to cause the device to perform the method according to any one of claims 1 to 4.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 4.