A hybrid log collection and alerting method and system
By combining a lock-free circular buffer, NetLink and UDP dual-mode transmission with a Bloom filter in the log processing pipeline, the contradiction between real-time log collection and throughput in existing technologies is resolved. This achieves efficient log transmission and storage optimization, and avoids the waste of storage resources caused by duplicate logs.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA UNICOM INTERNET OF THINGS CO LTD
- Filing Date
- 2026-05-21
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to achieve both high real-time performance and high throughput log collection without impacting system performance in kernel anomaly and system fault detection, and duplicate log storms lead to wasted storage resources.
A log processing pipeline employing a lock-free circular buffer combined with NetLink, user-space UDP dual-mode transmission, and Bloom filter pre-suppression achieves zero-blocking high-speed log writing and low-latency cross-layer data transmission, while intercepting redundant logs at the storage layer.
Without affecting system performance, high real-time and high throughput log transmission is achieved, effectively avoiding the waste of storage resources due to duplicate logs and ensuring that the system can still maintain light-load operation when facing failures.
Figure CN122240424A_ABST
Description
Technical Field
[0001] This invention belongs to the field of log processing and operation and maintenance monitoring technology, and in particular relates to a hybrid log collection and alarm method and system.
Background Technology
[0002] Existing log management schemes in operating system kernels and system software face a core technical contradiction: on the one hand, timely detection of kernel anomalies and system failures requires log collection to have high real-time performance and low latency; on the other hand, the explosive growth of log data volume, especially high-frequency repetitive abnormal log storms, demands extremely high throughput and resource utilization in the transmission and storage processes. Existing solutions, such as those based on Syslog, require logs to be transferred from the kernel to user space via the file system, introducing significant I/O latency and context switching overhead, which cannot meet real-time requirements. While kernel probe-based solutions can directly obtain kernel information, they are complex to implement and incur significant system performance overhead. These solutions often use TCP or simple file writing at the transport layer, and they either face a trade-off between real-time performance and throughput or lack effective source suppression measures when facing repetitive log storms, resulting in a large amount of computing and storage resources being consumed by meaningless redundant data. Therefore, how to achieve both high real-time performance and high throughput with minimal impact on kernel performance, while effectively avoiding the waste of storage resources by repetitive logs, is a pressing technical problem to be solved in this field.
Summary of the Invention
[0003] This application provides a hybrid log collection and alarm method and system. By constructing a log processing pipeline that combines a lock-free ring buffer with NetLink, user-space UDP dual-mode transmission, and Bloom filter pre-suppression, it achieves zero-blocking high-speed log writing and low-latency cross-layer data transmission bypassing the file system in kernel mode. At the transport layer, it balances the real-time performance and throughput of log transmission, and at the storage layer, it pre-intercepts redundant logs with minimal overhead, thereby optimizing the performance of the entire log generation, transmission, and storage chain.
[0004] The first aspect of this application discloses a hybrid log collection and alarm method, comprising the following steps: Log data is generated in kernel mode and written to a lock-free circular buffer according to the preset format specification. The user-space server obtains the log data stored in the lock-free circular buffer through the NetLink protocol, and sends it to the log server via the UDP protocol in single real-time mode or batch asynchronous mode, depending on the characteristics of the log data. The log server uses a Bloom filter to determine whether the received log data is a duplicate, asynchronously writes log data determined to be non-duplicate to storage, and suppresses log data determined to be duplicate. The lock-free circular buffer is a circular data structure that supports a multi-producer, single-consumer model and lock-free write operations.
[0005] Optionally, generating and writing log data to the lock-free circular buffer in kernel mode includes: The kernel-mode module invokes a preset macro to generate a target format log containing the module identifier, log level, and content; Get the current write index of the lock-free circular buffer; Copy the target format log to the buffer slot pointed to by the write index; Update the write index.
[0006] Optionally, the step in which the user-space server obtains the log data stored in the lock-free circular buffer via the NetLink protocol includes: The user-space server constructs a NetLink message and sends it to the kernel space, the message containing an identifier of the target buffer; The kernel-mode NetLink processing routine locates the lock-free circular buffer based on the identifier and reads the current read index; Based on the read index, log data is retrieved in batches from the lock-free circular buffer and encapsulated into a NetLink response message.
[0007] Optionally, the step of sending the log data to the log server via UDP protocol in single real-time mode or batch asynchronous mode, based on the characteristics of the log data, includes: The log data obtained from the kernel space is parsed to determine its corresponding event level; If the event level is higher than a preset threshold, a UDP packet containing the log data will be immediately sent to the log server. If the event level is not higher than the preset threshold, the log data is stored in the user-mode sending buffer. When the amount of data in the user-mode sending buffer reaches the preset capacity or the preset time interval expires, the multiple log data in the user-mode sending buffer are encapsulated into a UDP data packet and sent to the log server.
[0008] Optionally, the step of encapsulating multiple log data entries in the user-space send buffer into a single UDP packet and sending it to the log server includes: Determine whether the amount of data in the user-mode send buffer has reached the preset capacity; If so, immediately perform encapsulation and transmission; If not, it continuously checks whether the time interval since the last transmission has reached the preset time interval, and performs encapsulation and transmission when it does.
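The routing and flush conditions described in the two paragraphs above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the class name `DualModeSender`, the numeric level threshold, the 1400-byte batch capacity, the 0.5 s interval, and the newline separator between batched records are all hypothetical choices. The `send` callable stands in for a UDP `sendto` call.

```python
import time


class DualModeSender:
    """Sends urgent logs alone and batches the rest (illustrative sketch)."""

    def __init__(self, send, level_threshold=3, max_batch_bytes=1400,
                 max_wait_s=0.5, clock=time.monotonic):
        self.send = send                      # callable taking one datagram (bytes)
        self.level_threshold = level_threshold
        self.max_batch_bytes = max_batch_bytes
        self.max_wait_s = max_wait_s
        self.clock = clock
        self.pending = []                     # user-mode send buffer
        self.pending_bytes = 0
        self.last_flush = clock()

    def submit(self, level, record):
        if level > self.level_threshold:
            self.send(record)                 # single real-time mode
            return
        self.pending.append(record)           # batch asynchronous mode
        self.pending_bytes += len(record) + 1  # +1 for the separator
        self.poll()

    def poll(self):
        """Flush when the capacity condition or the time condition is met."""
        if not self.pending:
            return
        full = self.pending_bytes >= self.max_batch_bytes
        stale = self.clock() - self.last_flush >= self.max_wait_s
        if full or stale:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(b"\n".join(self.pending))  # many logs, one datagram
        self.pending = []
        self.pending_bytes = 0
        self.last_flush = self.clock()
```

In a real sender, `poll()` would also be called from a timer so that a quiet buffer is still flushed once the interval expires.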
[0009] Optionally, the step in which the log server uses a Bloom filter to determine whether the received log data is a duplicate, asynchronously writes non-duplicate log data to storage, and suppresses duplicate log data includes: Extract key information from the log data, including the log generation module, log level, and feature values of the log content; Input the key information into the Bloom filter for querying; If the query result indicates that the key information may already exist, the log data is determined to be a duplicate and discarded; If the query result indicates that the key information does not exist, the log data is determined to be non-duplicate, it is placed in the asynchronous write queue, and the key information is added to the Bloom filter.
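The query-then-insert flow of this paragraph can be illustrated with a small Bloom filter. The sketch below is an assumption-laden example rather than the filter used in this application: the bit-array size, the number of hash functions, the double-hashing scheme built on SHA-256, and the choice of a truncated content hash as the "feature value" are all hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def might_contain(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)


def dedup_key(module, level, content):
    # Assumption: the content "feature value" is a truncated hash of the text.
    feature = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    return f"{module}|{level}|{feature}".encode("utf-8")


def filter_logs(records, bloom, write_queue):
    """Queue non-duplicates for writing; return how many records were suppressed."""
    suppressed = 0
    for module, level, content in records:
        key = dedup_key(module, level, content)
        if bloom.might_contain(key):
            suppressed += 1               # possibly seen before: discard
            continue
        bloom.add(key)                    # first sighting: remember and store
        write_queue.append((module, level, content))
    return suppressed
```

Because a Bloom filter can report false positives, a small fraction of genuinely new logs may be suppressed by this check.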
[0010] Optionally, after suppressing log data identified as duplicates, the process may also include: The intelligent analysis daemon collects newly written log data from the asynchronous write queue at variable polling intervals; The collected log data is input into a preset analysis model for pattern matching to obtain risk analysis results; The risk analysis results are sent to the alarm platform, and an alarm notification is triggered when the risk analysis results meet the rules associated with the preset alarm strategy.
[0011] Optionally, the step of collecting newly written log data from the asynchronous write queue at a variable polling interval includes: Get the rate at which new log entries were written during the previous statistical period; Dynamically adjust the interval before the next poll based on that write rate and a preset negatively correlated mapping, so that a higher write rate yields a shorter interval.
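One possible negatively correlated mapping is an inverse-proportional rule clamped to a fixed range. The function below is a hypothetical sketch: the application does not specify the mapping, and the parameter names `target_batch`, `min_s`, and `max_s` and their default values are assumptions made for illustration.

```python
def next_poll_interval(write_rate, target_batch=100.0, min_s=0.1, max_s=10.0):
    """Return the delay before the next poll, in seconds.

    write_rate is the number of new log entries per second observed in the
    previous statistical period. The interval is chosen so that roughly
    target_batch entries accumulate between polls, within [min_s, max_s].
    """
    if write_rate <= 0:
        return max_s                      # idle system: poll as rarely as allowed
    return max(min_s, min(max_s, target_batch / write_rate))
```

With these defaults, 50 entries per second gives a 2-second interval, while a storm of 10,000 entries per second is clamped to the 0.1-second minimum.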
[0012] Optionally, the step of generating and writing log data to a lock-free circular buffer in kernel mode according to a preset format specification includes: The user-space client generates user-space logs that conform to the preset format specification by calling the dynamic library; The user-mode client sends the user-mode logs to the log server via the UDP protocol for duplicate detection and asynchronous writing.
[0013] The second aspect of this application discloses a hybrid log collection and alarm system for performing the method described in the first aspect.
[0014] In the log collection process provided in this application, firstly, a lock-free circular buffer unit in kernel space is used to handle bursty log generation requests. When the kernel producer writes logs to this buffer, the operation involves only one atomic-level write index update, achieving non-blocking writes and eliminating performance jitter on the kernel side at the source. Then, a data acquisition technique using NetLink is introduced. This technique does not rely on passive polling of the file system, but rather on user-space consumers actively initiating batch read requests through sockets. Combined with the lock-free buffer, an efficient data migration from kernel space to user space is completed at the memory level, significantly reducing the time logs reside in kernel space. This is the foundation for achieving low latency and high throughput. After obtaining this batch of log data, the UDP dual-mode sending mechanism comes into play. It processes the log stream according to pre-defined system rules: logs with high real-time requirements are encapsulated into UDP datagrams and sent immediately, avoiding the waiting latency caused by batch processing; while the vast majority of regular logs are stored in a local buffer, waiting for a certain quantity or time condition to be met before a UDP datagram containing multiple logs is sent in a single system call. This amortizes the overhead of network data encapsulation and I/O operations by hundreds of times, resolving the contradiction between low latency and high throughput at the transport layer. While this batch processing mechanism is efficient, it may send a large number of duplicate logs to the server during a log storm. In this case, the Bloom filter's duplicate detection and suppression step forms the last line of defense before the server writes to disk. It uses minimal space and constant time in memory to determine whether the key information of a log already exists in the set; if it does, the log is immediately discarded from memory, avoiding subsequent time-consuming disk seeks and write operations. This mechanism effectively reduces the amount of data entering the disk subsystem, protecting storage devices and saving valuable I/O bandwidth. This allows the system to maintain a light load even when encountering massive amounts of duplicate log printing caused by a fault, preventing it from crashing due to storage resource exhaustion. It is evident that the various features of the entire technical solution are interconnected and mutually conditional: lock-free buffering and NetLink provide high-speed data output, UDP dual-mode provides flexible data transmission, and the Bloom filter provides source filtering of useless data. These three elements work together to ensure optimal system performance throughout the entire process.
Description of the Drawings
[0015] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0016] Figure 1 is a flowchart of the hybrid log collection and alarm method in the embodiments of this application;
Figure 2 is a flowchart of S100 of the hybrid log collection and alarm method in the embodiments of this application;
Figure 3 is a flowchart of S200 of the hybrid log collection and alarm method in the embodiments of this application;
Figure 4 is a further flowchart of S200 of the hybrid log collection and alarm method in the embodiments of this application;
Figure 5 is a flowchart of S300 of the hybrid log collection and alarm method in the embodiments of this application;
Figure 6 is a flowchart of steps S400-S600 of the hybrid log collection and alarm method in the embodiments of this application;
Figure 7 is a flowchart of S400 of the hybrid log collection and alarm method in the embodiments of this application;
Figure 8 is a further flowchart of S100 of the hybrid log collection and alarm method in the embodiments of this application.
Detailed Implementation
[0017] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not limiting, in order to provide a thorough understanding of the embodiments of this application. However, those skilled in the art will understand that this application can also be implemented in other embodiments without such specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so as not to obscure the description of this application with unnecessary detail.
[0018] In this application embodiment, when faced with the contradiction between real-time performance and throughput in kernel log collection, the conventional design approach of existing technologies is usually limited by the principle of layered decoupling. That is, they tend to optimize kernel-mode data transmission, user-mode protocol transmission, and server-side storage strategies as independent modules. This technological inertia leads engineers to improve them separately. At the kernel layer, for example, they optimize the number of data copies in NetLink communication, or design a separate ring buffer to smooth burst traffic. At the transport layer, they focus on making fixed protocol choices between TCP and UDP. At the storage layer, they deal with traffic surges by adding write buffers or batch disk writes. However, this separate optimization approach cannot solve the "weakest link" effect caused by the mismatch in processing capabilities of each link in the overall chain. The kernel generates logs extremely quickly, user-mode transmission is rate-limited, and the server is forced to block due to repeated writes. This solution breaks through the technical biases of traditional separate designs, proposing a cross-layer transmission structure that strongly couples a lock-free circular buffer with NetLink. Furthermore, it pre-applies a Bloom filter, typically used only for cache lookups, to the source of the log write path, making it functionally mutually defined with the UDP dual-mode sending mechanism. Specifically, applying NetLink's copy-less transmission characteristics directly to the physical memory pages of the lock-free circular buffer requires overcoming the challenge of ensuring memory order consistency between NetLink read operations and buffer write index updates between kernel-mode producers and user-mode consumers without introducing lock synchronization. 
This avoids log gaps or duplicate reads caused by out-of-order CPU execution, and it is considerably more complex than simply using NetLink or designing a lock-free queue in isolation. Meanwhile, transforming the Bloom filter from a read-accelerating cache into a write-intercepting log suppressor, and integrating it with the asynchronous batch UDP sending mechanism, requires resolving a key implementation conflict. UDP batch sending naturally prioritizes high aggregation to amortize overhead, but if an aggregated data block that contains duplicate logs is suppressed as a whole, the non-duplicate valid logs in it will be lost. Therefore, this solution must perform fine-grained, line-by-line extraction of key log information and Bloom filter judgment within the UDP packet, embedding this judgment logic into the minimum processing path after the data arrives from the network card to maintain high throughput. Those skilled in the art will understand that, given existing technologies, engineers typically choose to perform batch deduplication before writing to disk on the server side, or perform single-line deduplication before sending in user space. The former leads to a large amount of duplicate data consuming network bandwidth, while the latter increases user-space computational overhead. This solution, by performing line-by-line deduplication immediately after receiving data on the server side, combined with the advantages of UDP batch sending, effectively achieves a balance between network bandwidth and computational overhead. Furthermore, in existing technologies, to ensure the consistency of data transmission between kernel mode and user mode, lock synchronization mechanisms are typically employed. For example, a mutex lock is added to the circular buffer in kernel mode, and the lock is acquired both when a kernel module writes data and when a user module reads data. 
However, this lock synchronization mechanism can cause kernel modules to be blocked when writing log data, especially when multiple kernel modules are writing simultaneously. Lock contention can severely impact kernel performance and even lead to kernel malfunctions. This solution utilizes a lock-free circular buffer supporting a multi-producer, single-consumer model, combined with the batch read mechanism of the NetLink protocol, achieving efficient data transmission between kernel mode and user mode without introducing lock synchronization. To ensure memory order consistency, the inventors, with a deep understanding of the kernel's memory model and the CPU's instruction set, used appropriate memory barrier instructions when updating and reading indexes to prevent data errors caused by out-of-order CPU execution. Additionally, in existing technologies, Bloom filters are commonly used in caching systems to accelerate queries for hot data and reduce access to backend storage. In this application scenario, a misjudgment by the Bloom filter will only result in one unnecessary backend storage access, without causing data loss. This solution uses a Bloom filter for log suppression, but false positives can lead to the discarding of valid log data, which is something to avoid in a logging system. To reduce the false positive rate, this solution optimizes the parameters of the Bloom filter and employs a strategy of periodically resetting the Bloom filter to prevent the false positive rate from increasing due to prolonged operation. Furthermore, this solution adds a time dimension constraint to the Bloom filter, only performing duplicate checks on log data from the most recent period. This not only reduces the false positive rate but also prevents the Bloom filter's bit array from becoming full.
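The periodic reset and the line-by-line check inside a batched datagram, both described above, might look like the sketch below. It is a simplified illustration, not the design of this application: the 60-second window, the BLAKE2b-based hashing, the newline record separator, and the use of the whole line as the deduplication key are assumptions, and a production design could rotate two filters instead of clearing one.

```python
import hashlib
import time


class WindowedBloom:
    """Bloom filter that forgets everything once its time window expires."""

    def __init__(self, num_bits=1 << 16, num_hashes=4, window_s=60.0,
                 clock=time.monotonic):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.window_s = window_s
        self.clock = clock
        self.bits = bytearray((num_bits + 7) // 8)
        self.window_start = clock()

    def _positions(self, key):
        digest = hashlib.blake2b(key, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def seen_before(self, key):
        """Return True if key was probably seen in this window, then record it."""
        now = self.clock()
        if now - self.window_start >= self.window_s:
            self.bits = bytearray(len(self.bits))   # periodic reset
            self.window_start = now
        positions = self._positions(key)
        hit = all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions)
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)
        return hit


def keep_unique_lines(datagram, bloom):
    # Judge each record in the batch separately so that one duplicate
    # does not cause the whole datagram to be dropped.
    return [line for line in datagram.split(b"\n")
            if line and not bloom.seen_before(line)]
```

Clearing the bit array bounds how full it can get, at the cost of letting one repeat of each log through after every reset.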
[0019] This application provides a hybrid log collection and alarm method. As shown in Figure 1, the method includes:
S100: According to the preset format specification, generate log data in kernel mode and write it to the lock-free circular buffer;
S200: The user-space server obtains the log data stored in the lock-free circular buffer through the NetLink protocol, and sends it to the log server through the UDP protocol in single real-time mode or batch asynchronous mode according to the characteristics of the log data;
S300: The log server uses a Bloom filter to determine whether the received log data is a duplicate, asynchronously writes log data determined to be non-duplicate to storage, and suppresses log data determined to be duplicate.
The lock-free circular buffer is a circular data structure that supports a multi-producer, single-consumer model and lock-free write operations.
[0020] In this embodiment, the hybrid log collection and alarm method first generates log data conforming to a preset format specification in kernel space and writes it to a lock-free circular buffer. The preset format specification unifies the structure of kernel-space logs, facilitating subsequent parsing and processing by user space and the server. The format specification may include fields such as the log generation timestamp, module identifier, log level, log content length, and the log content itself. The lock-free circular buffer is a circular data structure that supports a multi-producer, single-consumer model and lock-free write operations. Internally, it contains a fixed-size array and read and write indices pointing to the read and write positions, respectively. The multi-producer, single-consumer model means that multiple kernel modules can simultaneously act as producers, writing log data to the buffer, while only one user-space server acts as a consumer, reading data from the buffer. This model avoids contention between multiple consumers, simplifies implementation logic, and improves performance. Lock-free write operations mean that when writing log data, the kernel module does not need to acquire any mutex locks; it only needs to update the write index atomically to complete the write, thereby eliminating the performance loss caused by lock contention and the risk of kernel-space blocking. The size of the lock-free circular buffer can be configured according to the system's memory size and the log generation rate. For example, for servers with ample memory, the size of the lock-free circular buffer can be set to 16MB or 32MB to accommodate more bursts of log data; for embedded devices with limited memory, the size can be set to 1MB or 2MB to conserve memory resources. The size of each slot in the lock-free circular buffer can also be configured based on the average size of the log data. 
For instance, if the average log data size is 256 bytes, each slot can be set to 512 bytes to accommodate most of the log data and prevent truncation. It's important to note that when the lock-free circular buffer is full (i.e., when the write index catches up with the read index), new log data will not be able to be written to the buffer. To prevent the loss of important log data, a strategy of overwriting older data can be adopted, where new log data overwrites the oldest log data when the buffer is full. Alternatively, a strategy of discarding new data can be used, where new log data is discarded when the buffer is full. In practical applications, the appropriate strategy can be chosen based on the importance of the log data. For example, for critical error logs in kernel mode, a strategy of overwriting old data can be adopted to ensure that important log data can be written; for ordinary information-level logs, a strategy of discarding new data can be adopted to avoid system problems caused by buffer overflow.
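The sizing arithmetic and the two buffer-full strategies described above can be modelled in a few lines. This is a single-threaded, user-space illustration only: the class name `BoundedRing` and its fields are hypothetical, concurrency is ignored, and oversize records are simply truncated to the slot size.

```python
class BoundedRing:
    """Fixed-slot ring with a configurable policy for the buffer-full case."""

    def __init__(self, buffer_bytes, slot_bytes, overwrite_oldest):
        self.capacity = buffer_bytes // slot_bytes   # number of slots
        self.slot_bytes = slot_bytes
        self.overwrite_oldest = overwrite_oldest
        self.slots = [None] * self.capacity
        self.read = 0        # sequence number of the next record to read
        self.write = 0       # sequence number of the next record to write
        self.lost = 0        # records lost to either policy

    def put(self, record):
        record = record[:self.slot_bytes]            # truncate oversize records
        if self.write - self.read == self.capacity:  # write caught up with read
            self.lost += 1
            if not self.overwrite_oldest:
                return False                         # discard-new policy
            self.read += 1                           # overwrite-oldest policy
        self.slots[self.write % self.capacity] = record
        self.write += 1
        return True

    def get(self):
        if self.read == self.write:
            return None                              # buffer is empty
        record = self.slots[self.read % self.capacity]
        self.read += 1
        return record
```

For the figures quoted above, a 16 MB buffer with 512-byte slots gives `BoundedRing(16 * 1024 * 1024, 512, True).capacity == 32768` slots.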
[0021] It's important to note that the user-space server retrieves log data stored in the lock-free circular buffer via the NetLink protocol. NetLink is a socket protocol used for communication between kernel and user space. It provides a full-duplex communication method, allowing user space to actively send requests to kernel space and kernel space to push events to user space. Compared to traditional file system-based communication methods, NetLink eliminates the need for file system intermediaries; data is transferred directly between kernel memory and user memory, reducing the number of data copies and context switching overhead, resulting in lower latency and higher throughput. After retrieving the log data, the user-space server selects either single real-time mode or batch asynchronous mode to send it to the log server via UDP, depending on the log data's characteristics. UDP is a connectionless transport layer protocol; it doesn't require establishing or maintaining connections, and it doesn't wait for confirmation from the other party before sending data. Therefore, it features high transmission speed and low overhead, making it suitable for scenarios like log data where reliability requirements are relatively low, but real-time performance and throughput requirements are high.
[0022] For example, the characteristics of log data can include the event level of the log, the importance of the log content, and the frequency of log generation. For high-level log data, such as logs indicating serious system errors, it needs to be sent immediately in single real-time mode so that the log server can obtain and process it promptly, preventing the fault from escalating. For low-level log data, such as logs indicating normal system operation, it can be sent in batch asynchronous mode, encapsulating multiple log data in a single UDP packet, thereby reducing the number of network transmissions and overhead, and improving transmission efficiency. After receiving the log data, the log server uses a Bloom filter to check for duplicates. A Bloom filter is a space-efficient probabilistic data structure consisting of a bit array and multiple hash functions, capable of quickly determining whether an element exists in a set. The characteristic of a Bloom filter is that when it determines that an element does not exist in the set, that element definitely does not exist; when it determines that an element exists in the set, that element may or may not exist, meaning there is a certain false positive rate. Since Bloom filters perform both query and insert operations in memory and have a constant time complexity, they are well-suited for fast duplicate detection of log data.
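The false positive rate mentioned above can be estimated with the standard Bloom filter approximation. This is a textbook result rather than a formula stated in this application; the symbols m (bits in the array), k (number of hash functions), n (distinct keys inserted), and p (false positive probability) are introduced here for illustration.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2, \qquad
m = -\frac{n \ln p}{(\ln 2)^{2}}
```

As a worked example under assumed numbers, holding n = 10^6 distinct keys at a target p = 0.01 requires about m ≈ 9.59 × 10^6 bits (roughly 1.2 MB) and k = 7 hash functions.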
[0023] Specifically, for log data determined to be non-duplicate, the log server performs asynchronous write-to-storage; for log data determined to be duplicate, the log server performs a suppression operation, discarding the log data and not storing it. Asynchronous write-to-storage means that the log server places non-duplicate log data into an asynchronous write queue, and a dedicated write thread retrieves the data from the queue and writes it to the storage device, instead of performing a write operation immediately after receiving the log data. This method decouples the log data receiving and writing operations, preventing the write operation from blocking the receiving operation and improving the concurrency processing capability of the log server. At the same time, the asynchronous write queue can also smooth out sudden log traffic, preventing the storage device from being overloaded by receiving a large number of write requests in a short period of time.
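The decoupling of receiving from writing can be sketched with a bounded queue and one dedicated writer thread. The names `accept`, `start_writer`, and `STOP` are hypothetical, and dropping a record when the queue is full is an assumption made so that the receive path never blocks; the `store` callable stands in for the actual write to the storage device.

```python
import queue
import threading

STOP = object()   # sentinel that tells the writer thread to exit


def accept(write_queue, record):
    """Called on the receive path; never blocks. Returns False if the queue is full."""
    try:
        write_queue.put_nowait(record)
        return True
    except queue.Full:
        return False


def start_writer(write_queue, store):
    """Start the dedicated thread that drains the queue into store(record)."""
    def run():
        while True:
            record = write_queue.get()    # blocks only the writer thread
            if record is STOP:
                break
            store(record)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The queue bound determines how large a burst can be absorbed before records are dropped or the policy has to change.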
[0024] In an optional embodiment, as shown in Figure 2, S100 further includes:
S110: Kernel-mode modules call preset macros to generate target format logs containing module identifiers, log levels, and content;
S120: Get the current write index of the lock-free circular buffer;
S130: Copy the target format log to the buffer slot pointed to by the write index;
S140: Update the write index.
[0025] In this embodiment, the process of generating and writing log data to a lock-free circular buffer in kernel mode first involves the kernel-mode module calling preset macros to generate a target format log containing the module identifier, log level, and content. The preset macros are a set of macro functions predefined in the kernel-mode module. Different macro functions correspond to different log levels, such as macros for generating information-level logs, macros for generating warning-level logs, and macros for generating error-level logs. When the kernel-mode module needs to record logs, it only needs to call the corresponding macro function and pass in the module identifier and log content. The macro function automatically assembles the current timestamp, module identifier, log level, log content length, and log content into target format log data according to the preset format specification. This method simplifies the logging operation of the kernel-mode module and ensures the uniformity of the log format. The definition of preset macros usually includes variable parameters so that the kernel-mode module can pass in log content of arbitrary length. When the macro function expands, it first calculates the length of the log content, and then copies the timestamp, module identifier, log level, log content length, and log content sequentially to a temporary buffer according to the preset format specification. Then, the macro function calls the write function of the lock-free circular buffer to write the log data in the temporary buffer to the lock-free circular buffer. It should be noted that in kernel mode, obtaining the current timestamp requires using the kernel-provided time function, which returns a high-precision kernel time. The module identifier is a predefined string used to identify the kernel module that generates the log data, such as representing the network module, block device module, or file system module. 
The log level is an integer; different log levels correspond to different macro functions, namely, informational logs, warning logs, error logs, and critical error logs.
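A record layout matching the fields listed above (timestamp, module identifier, log level, content length, content) could be packed as shown below. The exact binary layout is not given in this application, so the field widths, the little-endian byte order, the 16-byte module field, and the numeric level values are all assumptions; the per-level helper functions play the role of the per-level macros.

```python
import struct
import time

# Assumed header layout: u64 timestamp (ns), 16-byte module name, u8 level, u16 length.
HEADER = struct.Struct("<Q16sBH")
LEVELS = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def pack_log(module, level, content, timestamp_ns=None):
    """Assemble one target-format record from its fields."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    body = content.encode("utf-8")
    header = HEADER.pack(timestamp_ns, module.encode("ascii"),
                         LEVELS[level], len(body))
    return header + body


def unpack_log(data):
    """Parse a record back into (timestamp_ns, module, level, content)."""
    timestamp_ns, module, level, length = HEADER.unpack_from(data)
    body = data[HEADER.size:HEADER.size + length]
    return timestamp_ns, module.rstrip(b"\0").decode("ascii"), level, body.decode("utf-8")


def log_error(module, content, timestamp_ns=None):
    # One helper per level, mirroring the one-macro-per-level convention.
    return pack_log(module, "error", content, timestamp_ns)
```

Storing the content length in the header lets the consumer walk variable-length records without scanning for a terminator.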
[0026] It's important to note that after generating the target format log, the kernel-mode module retrieves the current write index of the lock-free circular buffer. The write index is an atomic variable indicating the position of the next buffer slot where log data can be written. Retrieving the write index is atomic, ensuring no conflict occurs when multiple kernel modules retrieve it simultaneously. After retrieving the write index, the kernel-mode module copies the target format log data to the buffer slot pointed to by the write index. After the copy operation is complete, the kernel-mode module updates the write index, incrementing its value by 1 and taking the modulo of the number of buffer slots to ensure it always points to a valid position within the buffer. Updating the write index is also atomic, guaranteeing that the update will not be interrupted by operations from other kernel modules. Specifically, updating the write index uses an atomic increment operation, which atomically increments the write index value and returns the value before the update. This ensures that when multiple kernel modules call the atomic increment operation simultaneously, each module receives a unique write index value, preventing conflicts. When copying log data to the buffer slot, a memory copy function is used to copy the log data from the temporary buffer to the slot pointed to by the write index. After the copy is complete, a write memory barrier needs to be inserted to ensure that the log data has been completely copied into the buffer slot before the write index is updated. This prevents out-of-order CPU execution from making the write index update visible before the log data has been fully copied, which would result in incomplete log data being read by the user space.
[0027] For example, suppose the lock-free circular buffer has 1024 slots, each slot being 4096 bytes. When a kernel module needs to write log data, it first calls an atomic operation function to obtain the current write index, assuming the current write index value is 512. Then, the kernel module copies the generated target format log to slot 512 in the buffer. After copying, the kernel module calls an atomic operation function to update the write index value to 513. If the write index value reaches 1024, it will automatically wrap around to 0, thus achieving cyclic use of the circular buffer.
[0028] Those skilled in the art will understand that in a multi-producer, single-consumer model, multiple kernel modules can perform the aforementioned write operations simultaneously. Since both acquiring and updating the write index are atomic operations, multiple kernel modules will not write to the same slot, nor will write index update errors occur. Furthermore, because write operations do not require acquiring any mutex locks, even if a large number of kernel modules write log data simultaneously, it will not cause kernel-level blocking, thus ensuring log write performance.
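As a non-limiting illustration, the index arithmetic of the write path can be modelled in a few lines of Python. This is only a single-process sketch: Python offers neither kernel atomics nor memory barriers, so `fetch_and_advance` is a hypothetical stand-in for the atomic increment operation, and the class and method names are assumptions made for this example.

```python
class RingBufferModel:
    """Models slot reservation and wrap-around only; it is not itself lock-free."""

    def __init__(self, slots=1024, slot_size=4096):
        self.slots = slots
        self.slot_size = slot_size
        self.buf = [None] * slots
        self.write_index = 0  # next slot to write, always in [0, slots)

    def fetch_and_advance(self):
        # Stand-in for an atomic increment that returns the value before the update.
        old = self.write_index
        self.write_index = (old + 1) % self.slots
        return old

    def write(self, record: bytes) -> int:
        if len(record) > self.slot_size:
            raise ValueError("record larger than one slot")
        slot = self.fetch_and_advance()
        # In kernel code a write memory barrier would order the copy
        # against the point at which the slot becomes visible to readers.
        self.buf[slot] = record
        return slot
```

With the write index at 512, one call to `write` fills slot 512 and leaves the index at 513; a write at index 1023 wraps the index back to 0.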
[0029] In an optional embodiment, as shown in Figure 3, the step in which the user-space server obtains the log data stored in the lock-free circular buffer via the NetLink protocol further includes: S210: The user-space server constructs a NetLink message and sends it to the kernel space, the message containing an identifier of the target buffer; S220: The kernel-mode NetLink processing routine locates the lock-free circular buffer based on the identifier and reads the current read index; S230: Based on the read index, log data is retrieved in batches from the lock-free circular buffer and encapsulated into a NetLink response message.
[0030] In this embodiment, the process of a user-space server retrieving log data stored in a lockless circular buffer via the NetLink protocol first involves the user-space server constructing a NetLink message and sending it to kernel space. This message contains an identifier for the target buffer. Upon startup, the user-space server creates a NetLink socket and binds it to a specific NetLink protocol family. When it needs to retrieve log data from kernel space, the user-space server constructs a NetLink message of type "Log Retrieval Request," containing a unique identifier for the target lockless circular buffer so that kernel space can locate the corresponding buffer based on this identifier. When creating the NetLink socket, the user-space server uses a system call to specify the protocol family as the corresponding NetLink protocol family, the protocol type as the raw socket type, and the protocol number as a custom NetLink protocol number. Then, it uses a bind system call to bind the socket to a local address. The address structure specifies the protocol family as the corresponding NetLink protocol family, the process ID as the current process's PID, and the multicast group as 0. It should be noted that the NetLink message structure consists of a header and a message body. The header contains fields such as message length, type, flags, sequence number, and process ID. The content of the message body varies depending on the message type. For log retrieval request messages, the message body contains an identifier for the target buffer; for log retrieval response messages, the message body contains a batch of log data.
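As a non-limiting illustration, a log retrieval request can be assembled with Python's `struct` module. The header layout follows the standard Linux `nlmsghdr` (length, type, flags, sequence number, process ID). The message type value `LOG_FETCH_REQUEST` and the 32-bit buffer identifier in the body are assumptions made for this sketch, not values given in this application.

```python
import os
import struct

NLMSG_HDR = struct.Struct("=IHHII")  # nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLM_F_REQUEST = 0x01                 # standard NetLink request flag
LOG_FETCH_REQUEST = 0x10             # hypothetical custom message type


def build_log_request(buffer_id, seq, pid=None):
    """Return header plus a body holding the target buffer identifier."""
    pid = os.getpid() if pid is None else pid
    body = struct.pack("=I", buffer_id)  # assumed body layout: one 32-bit identifier
    total = NLMSG_HDR.size + len(body)
    header = NLMSG_HDR.pack(total, LOG_FETCH_REQUEST, NLM_F_REQUEST, seq, pid)
    return header + body


def parse_header(message):
    """Return (length, type, flags, sequence number, process ID)."""
    return NLMSG_HDR.unpack_from(message)
```

On Linux, such a message could be sent over a socket created with `socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, proto)` and bound with `bind((os.getpid(), 0))`, where `proto` is the custom NetLink protocol number and 0 is the multicast group mask.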
[0031] It should be noted that NetLink processing routines are pre-registered in kernel mode to handle NetLink messages sent from user mode. When kernel mode receives a log retrieval request message from user mode, the NetLink processing routine first parses the target buffer identifier in the message, and then searches for the corresponding lock-free circular buffer in the system based on this identifier. After finding the corresponding lock-free circular buffer, the NetLink processing routine reads the current read index. The read index is an atomic variable used to indicate the position of the next buffer slot where log data can be read. Upon receiving the request message from user mode, the kernel-mode NetLink processing routine first checks whether the message type and length are valid. If the message is invalid, it is discarded, and an error response message is sent to user mode. If the message is valid, the corresponding lock-free circular buffer is searched based on the target buffer identifier in the message. After finding the buffer, the number of unread log data entries in the current buffer is calculated as the write index minus the read index, taken modulo the buffer size so that the result remains correct after the write index has wrapped around. If the number of unread log data entries is 0, an empty response message is sent to user mode. If the number of unread log data entries is greater than 0, the number of entries to be read is determined based on the preset maximum batch size. Then, starting from the slot pointed to by the read index, log data is read sequentially and copied into the message body of the NetLink response message. After reading is complete, the value of the read index is updated using an atomic addition operation, and a read memory barrier is inserted to ensure that the log data has been completely copied into the response message before the read index is updated. Finally, the NetLink response message is sent to the user-space server.
[0032] Specifically, the NetLink processing routine retrieves log data in batches from the lock-free circular buffer based on the read index. The number of log data items retrieved in a batch can be adjusted according to a preset maximum batch size, for example, a maximum of 64 log data items can be retrieved at a time. After retrieving the log data, the NetLink processing routine encapsulates this log data into a NetLink response message and sends it to the user-space server. Upon receiving the NetLink response message, the user-space server parses the log data in the message and performs subsequent processing.
[0033] For example, suppose the NetLink request message constructed by the user-space server contains a target buffer identifier of 0. Upon receiving this message, the kernel-space NetLink processing routine searches for the lock-free circular buffer with the identifier 0 in the system. After finding the buffer, it reads the current read index. Assuming the current read index value is 512 and the write index value is 520, this indicates that there are 8 unread log data entries in the buffer. The NetLink processing routine will start reading the 8 log data entries sequentially from the slot with index 512, and then encapsulate these log data entries into a NetLink response message and send it to the user-space server. After sending, the NetLink processing routine will update the read index value to 520 so that the next read will start from the new position.
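As a non-limiting illustration, the batch read can be sketched as follows. The function name `read_batch` and the use of a Python list for the slots are assumptions made for this example; the modulo arithmetic is included so that the unread count stays correct after the indices wrap around.

```python
def read_batch(slots, read_index, write_index, max_batch=64):
    """Return (batch, new_read_index) for a circular buffer of len(slots) slots."""
    size = len(slots)
    unread = (write_index - read_index) % size   # handles wrap-around
    count = min(unread, max_batch)               # honour the preset maximum batch size
    batch = [slots[(read_index + i) % size] for i in range(count)]
    new_read_index = (read_index + count) % size
    return batch, new_read_index
```

With a read index of 512 and a write index of 520, the function returns 8 entries and a new read index of 520, matching the example above.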
[0034] This application is not limited to this. In some embodiments, the user-space server can use a timed polling method to send NetLink request messages to the kernel space to periodically obtain log data from the kernel space. The polling interval can be dynamically adjusted according to the log generation rate. When the log generation rate is high, the polling interval is shortened to ensure the real-time performance of the logs; when the log generation rate is low, the polling interval is extended to reduce unnecessary NetLink communication and kernel trap overhead.
[0035] In an optional embodiment, as shown in Figure 4, the step of sending the log data to the log server via UDP protocol in single real-time mode or batch asynchronous mode, based on the characteristics of the log data, further includes: S201: Parse the log data obtained from the kernel mode and determine its corresponding event level; S202: If the event level is higher than the preset threshold, a UDP packet containing the log data is sent to the log server immediately; S203: If the event level is not higher than the preset threshold, the log data is stored in the user-mode sending buffer, and when the amount of data in the user-mode sending buffer reaches the preset capacity or the preset time interval expires, the multiple log data entries in the user-mode sending buffer are encapsulated into a UDP data packet and sent to the log server.
[0036] In this embodiment, the process of the user-space server sending log data to the log server via UDP protocol in single real-time mode or batch asynchronous mode, based on the characteristics of the log data, first involves parsing the log data obtained from the kernel space to determine its corresponding event level. The event level of the log data is set by the kernel-space module when the log data is generated; it indicates the importance of the log data. Common event levels can include debug, information, warning, error, and critical error levels. Different event levels correspond to different values, and a higher event level means more important log data. The division of event levels can be adjusted according to actual needs. For example, event levels can be divided into levels 0 to 7, where level 0 represents an emergency, level 1 an alarm, level 2 a critical error, level 3 an error, level 4 a warning, level 5 a notification, level 6 information, and level 7 debug. In this numbering, a smaller value denotes a higher (more severe) event level. Preset event level thresholds can be set through a configuration file. For example, the threshold can be set to level 3 (error), meaning that log data with an event level higher than error, that is, levels 0 to 2, needs to be sent in real time, while log data with an event level no higher than error, that is, levels 3 to 7, can be sent in batches.
[0037] It's important to note that the user-space server pre-sets an event level threshold to distinguish between log data that needs to be sent in real-time and log data that can be sent in batches. If the event level of the parsed log data is higher than the preset threshold, it indicates that the log data is important and needs to be sent to the log server for processing promptly. Therefore, the user-space server will immediately send a UDP packet containing this log data to the log server. If the event level of the parsed log data is not higher than the preset threshold, it indicates that the log data is relatively less important and can be sent in batches. Therefore, the user-space server will store this log data in the user-space send buffer. The user-space send buffer can be a circular buffer or a linked list. The advantage of a circular buffer is its simplicity and fast access speed; the advantage of a linked list is that its size can be dynamically adjusted without wasting memory space. In practical applications, the appropriate buffer structure can be chosen based on the log generation rate and memory resources.
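As a non-limiting illustration, the routing decision can be sketched as follows. The `Severity` ranks (a larger rank means a more important entry) and the `send_now` callback are assumptions made for this example; an implementation using a different numbering would only need to invert the comparison.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical ranks: larger means more important.
    DEBUG = 0
    INFO = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4


def dispatch(level, payload, send_now, pending, threshold=Severity.ERROR):
    """Send entries above the threshold at once; queue the rest for batch sending."""
    if level > threshold:
        send_now(payload)        # single real-time mode
        return "realtime"
    pending.append(payload)      # batch asynchronous mode
    return "batch"
```

With the threshold at the error level, a critical error entry is sent immediately, while error, warning, information, and debug entries go to the send buffer.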
[0038] Specifically, the user-space send buffer is a circular buffer allocated in user-space memory, used to temporarily store log data that needs to be sent in batches. When the amount of data in the user-space send buffer reaches a preset capacity, or when the time interval since the last send reaches a preset time interval, the user-space server will encapsulate multiple log data entries in the user-space send buffer into a single UDP packet and send it to the log server. The preset capacity and preset time interval can be configured according to the actual network conditions and log generation rate; for example, the preset capacity can be set to 64KB, and the preset time interval can be set to 1 second. When encapsulating the UDP packet, a total length field and a log count field need to be added to the packet header. The total length field indicates the length of the entire UDP packet, and the log count field indicates the number of log data entries contained in the packet. Then, the length and content of each log data entry are added to the packet sequentially. In this way, after receiving the UDP packet, the log server can first read the total length field and the log count field, and then, based on the log count field, sequentially read the length and content of each log data entry, thereby correctly parsing all the log data.
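As a non-limiting illustration, the batch packet layout described above can be sketched with Python's `struct` module. The field widths (a 32-bit total length, a 16-bit log count, and a 16-bit length per entry) and network byte order are assumptions made for this example. Note also that a single UDP datagram over IPv4 carries at most 65,507 bytes of payload, so a batch near 64KB would have to stay under that limit or be split across datagrams.

```python
import struct

HEADER = struct.Struct("!IH")    # total length, log count (assumed widths)
ENTRY_LEN = struct.Struct("!H")  # length prefix of each entry (assumed width)


def pack_batch(entries):
    """Encode a list of byte strings into one batch packet."""
    body = b"".join(ENTRY_LEN.pack(len(e)) + e for e in entries)
    return HEADER.pack(HEADER.size + len(body), len(entries)) + body


def unpack_batch(packet):
    """Decode a batch packet back into its list of entries."""
    total, count = HEADER.unpack_from(packet)
    if total != len(packet):
        raise ValueError("total length field does not match packet size")
    offset = HEADER.size
    entries = []
    for _ in range(count):
        (length,) = ENTRY_LEN.unpack_from(packet, offset)
        offset += ENTRY_LEN.size
        entries.append(packet[offset:offset + length])
        offset += length
    return entries
```

The receiver reads the two header fields first and then walks the entries by their length prefixes, so no delimiter scanning is needed.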
[0039] For example, suppose the preset event level threshold is error level. Log data with an event level higher than error level needs to be sent in real time, while log data with an event level no higher than error level can be sent in batches. When the user-space server parses a log data entry with a severe error level, it immediately encapsulates the log data into a UDP packet and sends it to the log server via a UDP socket. When the user-space server parses a log data entry with an information level, it stores the log data in the user-space send buffer. If the data volume in the user-space send buffer has reached the preset 64KB, the user-space server will immediately encapsulate all the log data in the buffer into a UDP packet and send it out. If the data volume has not yet reached the preset capacity, the user-space server will continue to wait until the time interval since the last transmission reaches 1 second before encapsulating and sending the log data in the buffer.
[0040] Those skilled in the art will understand that a UDP transmission method combining single-record real-time mode and batch asynchronous mode can balance real-time performance and throughput in log transmission. For important log data, single-record real-time mode ensures timely arrival at the log server; for less important log data, batch asynchronous mode reduces network transmission frequency and overhead, improving transmission efficiency. This approach solves the problem that a single transmission mode cannot simultaneously balance latency and throughput.
[0041] In an optional embodiment, the step of encapsulating multiple log data entries in the user-space send buffer into a single UDP packet and sending it to the log server includes: Determine whether the amount of data in the user-mode send buffer has reached the preset capacity; If so, immediately perform encapsulation and transmission; If not, it continuously checks whether the time interval since the last transmission has reached the preset time interval, and performs encapsulation and transmission when it does.
[0042] In this embodiment, the process of encapsulating multiple log data entries in the user-space send buffer into a single UDP packet and sending it to the log server first determines whether the amount of data in the user-space send buffer has reached a preset capacity. The preset capacity is the maximum amount of data the user-space send buffer can hold. When the amount of data in the buffer reaches the preset capacity, it indicates that the buffer is full and immediate transmission is required; otherwise, new log data cannot be stored in the buffer. The configuration of the preset capacity and preset time interval needs to comprehensively consider network bandwidth, log generation rate, and log real-time requirements. If the network bandwidth is large, the log generation rate is high, and the log real-time requirements are high, the preset capacity and preset time interval can be appropriately reduced, for example, setting the preset capacity to 32KB and the preset time interval to 0.5 seconds. If the network bandwidth is small, the log generation rate is low, and the log real-time requirements are not high, the preset capacity and preset time interval can be appropriately increased, for example, setting the preset capacity to 128KB and the preset time interval to 2 seconds.
[0043] It should be noted that if the judgment result indicates that the amount of data in the user-space send buffer has reached the preset capacity, the user-space server will immediately perform encapsulation and sending operations. Encapsulation refers to assembling multiple log data entries in the buffer into a single UDP packet according to a preset format. Each log data entry can be separated by a specific delimiter, or the length information of each log data entry can be added to the header of the packet so that the log server can correctly parse each log data entry. Sending refers to sending the encapsulated UDP packet to the log server via a UDP socket. When the amount of data in the user-space send buffer reaches the preset capacity, the user-space server will immediately perform encapsulation and sending operations and clear the send buffer. When the time interval since the last transmission reaches the preset time interval, the user-space server will also perform encapsulation and sending operations and clear the send buffer. This embodiment ensures that log data does not remain in the send buffer for too long and also avoids excessive network overhead caused by frequent sending operations.
[0044] Specifically, if the judgment result indicates that the amount of data in the user-space send buffer has not yet reached the preset capacity, the user-space server will continuously check whether the time interval since the last send has reached the preset time interval. The preset time interval is the maximum time interval between two batch sends. When the time interval since the last send reaches the preset time interval, even if the amount of data in the buffer has not reached the preset capacity, it is still necessary to send the data to avoid the log data remaining in the buffer for too long, which would affect the real-time performance of the logs. When judging whether the time interval since the last send has reached the preset time interval, the system's timer function can be used. For example, in a Linux system, a timer can be created using a timer creation function. When the timer expires, an event is triggered. The user-space server can listen for this event through an event listener system call. When the event occurs, the send operation is performed. The method used in this embodiment is more efficient than polling to check the time interval and can reduce CPU usage.
[0045] For example, assume the user-space send buffer has a preset capacity of 64KB and a preset time interval of 1 second. After storing an information-level log data entry into the buffer, the user-space server first checks the amount of data in the buffer. Assume the data volume is currently 32KB, which is less than the preset capacity of 64KB. Then, the user-space server checks the time interval since the last transmission. Assume the last transmission was 0.5 seconds ago, which is less than the preset time interval of 1 second, so the user-space server will continue to wait. During this waiting period, if new log data is stored in the buffer, bringing the data volume to 64KB, the user-space server will immediately perform encapsulation and transmission operations. If the data volume in the buffer does not reach 64KB before the 1-second interval elapses, the user-space server will encapsulate the 32KB of log data in the buffer into a UDP packet and send it out when the 1-second time interval is reached.
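As a non-limiting illustration, the two flush triggers can be modelled as follows. The class name `SendBuffer`, the `tick` method, and the explicit `now` argument (used here instead of a system timer so that the behaviour is easy to follow) are assumptions made for this example.

```python
class SendBuffer:
    """Collects entries and releases them on capacity or on elapsed time."""

    def __init__(self, capacity=64 * 1024, interval=1.0, now=0.0):
        self.capacity = capacity
        self.interval = interval
        self.entries = []
        self.size = 0
        self.last_send = now

    def add(self, entry, now):
        """Store an entry; return the batch if the capacity trigger fires."""
        self.entries.append(entry)
        self.size += len(entry)
        return self.flush(now) if self.size >= self.capacity else None

    def tick(self, now):
        """Called on timer expiry; return the batch if the interval has elapsed."""
        if self.entries and now - self.last_send >= self.interval:
            return self.flush(now)
        return None

    def flush(self, now):
        """Hand over all buffered entries and clear the buffer."""
        batch, self.entries, self.size = self.entries, [], 0
        self.last_send = now
        return batch
```

The returned batch would then be encapsulated into a UDP packet and sent; calling `flush` directly corresponds to the forced send on shutdown.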
[0046] This application is not limited to this. In some embodiments, when the user-mode server is shut down, all remaining log data in the user-mode send buffer will be forcibly encapsulated and sent to the log server to avoid log data loss. In addition, when the user-mode server detects poor network conditions, the preset time interval can be appropriately increased to reduce the number of transmissions and avoid network congestion; when the network conditions are good, the preset time interval can be appropriately decreased to improve the real-time performance of the logs.
[0047] In an optional embodiment, as shown in Figure 5, S300 further includes: S310: Extract key information from the log data, including the log generation module, log level, and feature values of the log content; S320: Input the key information into the Bloom filter for querying; S330: If the query result indicates that the key information may already exist, then the log data is determined to be a duplicate log and discarded; S340: If the query result indicates that the key information does not exist, then the log data is determined to be a non-duplicate log, it is placed in the asynchronous write queue, and the key information is added to the Bloom filter.
[0048] In this embodiment, the log server uses a Bloom filter to determine the duplicate nature of the received log data. Log data identified as non-duplicate is asynchronously written to storage, while duplicate log data is suppressed. The process begins by extracting key information from the log data. This key information uniquely identifies a log entry and includes the log generation module, log level, and log content feature values. The log generation module indicates which kernel or user-space module generated the log entry; the log level indicates the importance of the log entry; and the log content feature value is a fixed-length value obtained by hashing the log content, uniquely representing its characteristics. The log content feature value can be calculated using hash algorithms such as MD5, SHA-1, and SHA-256. To improve computation speed, lightweight hash algorithms like MurmurHash and CityHash can be chosen. These hash algorithms are fast and have low collision rates, making them suitable for calculating log content feature values.
[0049] It's important to note that after extracting key information, the log server inputs this information into a Bloom filter for querying. The Bloom filter contains a bit array and multiple hash functions. During the query, the key information is input into each of these hash functions, resulting in multiple hash values, each corresponding to a position in the bit array. Then, it checks if all the values at these positions in the bit array are 1. If all are 1, the key information may already exist in the Bloom filter; if any position has a value of 0, the key information definitely does not exist in the Bloom filter. The bit array of the Bloom filter can be implemented using a bitmap data structure. For example, in C, an unsigned char array can be used to represent the bit array, with each unsigned char occupying 8 bits and representing 8 positions. The choice of hash function needs to satisfy the characteristic of uniform distribution; that is, for different inputs, the hash function should output hash values with a uniform distribution, thus reducing the false positive rate of the Bloom filter.
[0050] Specifically, if the query result indicates that the key information may already exist, the log server will determine that the log data is a duplicate log and perform a suppression operation, i.e., discard the log data without further processing. If the query result indicates that the key information definitely does not exist, the log server will determine that the log data is a non-duplicate log, put it into the asynchronous write queue, and add the key information to the Bloom filter. The process of adding key information to the Bloom filter is similar to the query process; the key information is input into multiple hash functions to obtain multiple hash values, and then the values at these positions in the bit array are set to 1. When adding key information to the Bloom filter, the various fields of the key information need to be concatenated into a string, and then this string is input into the hash function for calculation. For example, the feature values of the log generation module, log level, and log content can be concatenated into a string, and then the hash value of this string can be calculated. In this way, different key information will generate different hash values, thereby ensuring that the Bloom filter can correctly distinguish different log data. In addition, to avoid the false positive rate of the Bloom filter increasing over time, a strategy of periodically resetting the Bloom filter can be adopted. For example, the Bloom filter can be reset every hour, setting all bits in the bit array to 0. This way, the Bloom filter will only detect duplicate log data from the most recent hour, effectively reducing the false positive rate. Furthermore, since duplicate logs are typically generated in large quantities within a short period, periodically resetting the Bloom filter will not affect the effectiveness of duplicate log suppression.
[0051] For example, suppose the Bloom filter uses three hash functions and the bit array is 1024 bits in size. When the log server receives a log entry, it extracts its key information and inputs it into the three hash functions respectively, obtaining hash values 100, 200, and 300. Then, it checks the values at bits 100, 200, and 300 of the bit array. If all three values are 1, it means the key information may already exist, and the log entry is considered a duplicate and discarded. If any of these three values is 0, it means the key information definitely does not exist, and the log entry is considered a non-duplicate. It is then placed in the asynchronous write queue, and the values at bits 100, 200, and 300 of the bit array are all set to 1.
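As a non-limiting illustration, the query, insertion, and periodic reset can be sketched as follows. This example derives the bit positions from a single SHA-256 digest by double hashing, which is a technique substituted here for brevity; this application only requires multiple uniformly distributed hash functions. The names `make_key` and `accept` and the `|` separator are likewise assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)  # 8 positions per byte

    def _positions(self, key: bytes):
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        return [(h1 + i * h2) % self.bits for i in range(self.hashes)]

    def might_contain(self, key: bytes) -> bool:
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.array[p // 8] |= 1 << (p % 8)

    def reset(self) -> None:
        self.array = bytearray(len(self.array))  # periodic reset: all bits to 0


def make_key(module: str, level: str, content: str) -> bytes:
    feature = hashlib.sha256(content.encode()).hexdigest()  # log content feature value
    return f"{module}|{level}|{feature}".encode()


def accept(bloom: BloomFilter, module: str, level: str, content: str) -> bool:
    """Return True for a non-duplicate entry (and record it), False for a duplicate."""
    key = make_key(module, level, content)
    if bloom.might_contain(key):
        return False
    bloom.add(key)
    return True
```

An entry for which `accept` returns True would be placed in the asynchronous write queue; one for which it returns False would be discarded.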
[0052] Those skilled in the art will understand that the false positive rate of a Bloom filter is related to the size of the bit array, the number of hash functions, and the number of entries inserted. For a given number of entries, a larger bit array lowers the false positive rate at the cost of memory, while the number of hash functions has an optimum for each bit array size: too few or too many hash functions both raise the false positive rate, and each additional hash function adds computational overhead. In practical applications, the appropriate bit array size and number of hash functions can be selected based on the allowable false positive rate and the amount of log data. For example, for a scenario with an allowable false positive rate of 1% and 1 million log records, the bit array size would be approximately 9.6 million bits (about 1.2 MB), and the number of hash functions would be approximately 7.
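As a non-limiting illustration, the sizing can be computed from the standard Bloom filter formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, where n is the number of entries and p is the allowable false positive rate. The function name `bloom_parameters` is an assumption made for this example.

```python
import math


def bloom_parameters(n: int, p: float):
    """Return (bit array size in bits, number of hash functions) for n entries at rate p."""
    bits = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n * math.log(2)))
    return bits, hashes
```

For 1 million entries and a 1% false positive rate, this yields roughly 9.6 million bits, which is about 1.2 MB, and 7 hash functions.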
[0053] In an optional embodiment, as shown in Figure 6, after suppressing log data identified as duplicates, the process also includes: S400: The intelligent analysis daemon collects newly written log data from the asynchronous write queue at variable polling intervals; S500: Input the collected log data into the preset analysis model for pattern matching to obtain risk analysis results; S600: Send the risk analysis results to the alarm platform, and trigger an alarm notification when the risk analysis results meet the rules associated with the preset alarm strategy.
[0054] In this embodiment, after suppressing log data identified as duplicates, the intelligent analysis daemon collects newly written log data from the asynchronous write queue at a variable polling interval. The intelligent analysis daemon is an independent process running in the background of the log server. Its main function is to perform real-time analysis of the stored log data to identify potential system risks and anomalies. The variable polling interval means that the interval at which the intelligent analysis daemon collects log data is not fixed but dynamically adjusted according to the log data generation rate. This ensures real-time analysis while reducing unnecessary system overhead. The intelligent analysis daemon can adopt a multi-threaded architecture, with one thread responsible for collecting log data from the asynchronous write queue, and multiple threads responsible for inputting the log data into the analysis model for pattern matching. This improves the processing capacity of the intelligent analysis daemon and adapts to large-scale log data processing.
[0055] It should be noted that after the intelligent analysis daemon collects newly written log data, it inputs the collected log data into a preset analysis model for pattern matching to obtain risk analysis results. The preset analysis model can be either a rule-based model or a machine learning-based model. A rule-based model predefines a series of anomaly rules; when log data matches a rule, the corresponding risk analysis result is triggered. A machine learning-based model is trained on a large amount of historical log data; it can automatically discover anomaly patterns in the log data and identify potential system risks. Rules for the rule-based analysis model can be defined through a configuration file. The rule format can include fields such as rule ID, rule name, rule description, matching conditions, and risk level. Matching conditions can be a comparison of log data fields with specific values, or a logical combination of multiple conditions. For example, a rule's matching conditions could be that the log level equals the error level and the log content contains disk failure-related fields, with the rule's risk level set to severe.
[0056] Specifically, after obtaining the risk analysis results, the intelligent analysis daemon sends them to the alarm platform. The alarm platform is a system for managing and processing alarm information. It receives alarm information from various systems and processes it according to preset alarm policies. When the risk analysis results meet the rules associated with the preset alarm policies, the alarm platform triggers an alarm notification, notifying relevant personnel via phone, SMS, email, etc., to promptly address system anomalies. When the intelligent analysis daemon collects a log data entry, it sequentially matches that log data with all rules. If the log data matches a rule, a corresponding risk analysis result is generated, including the rule ID, rule name, risk level, matched log data, and timestamp. The intelligent analysis daemon then sends this risk analysis result to the alarm platform. If the log data does not match any rules, no risk analysis result is generated. Machine learning-based analysis models can employ supervised learning algorithms, such as decision trees, random forests, and support vector machines. First, a large amount of historical log data needs to be collected and labeled to identify which log data is normal and which is abnormal. Then, the model is trained using the labeled dataset to obtain a trained model. During actual operation, the collected log data is input into the trained model, and the model outputs the probability that the log data is abnormal. If the probability exceeds a preset threshold, it is judged as abnormal, and a risk analysis result is generated.
[0057] For example, suppose a pre-defined rule in the analysis model states that if more than 10 identical error-level log entries appear within one minute, the system is considered to have an anomaly risk. After the intelligent analysis daemon collects new log data from the asynchronous write queue, it counts the number of identical error-level log entries within one minute. Assuming the count is 15, exceeding the rule's limit of 10, the analysis model outputs a risk analysis result indicating a system anomaly risk. The intelligent analysis daemon sends this risk analysis result to the alarm platform. The alarm platform checks the pre-defined alarm policy and finds that the risk analysis result meets the rules associated with the alarm policy, thus triggering an alarm notification by sending an SMS and email to the system administrator, informing them of the system anomaly risk.
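As a non-limiting illustration, the counting rule in this example can be sketched with a sliding time window. The class name `RepeatRule`, the rule identifier `repeat-error`, and the fields of the returned result are assumptions made for this example.

```python
from collections import defaultdict, deque


class RepeatRule:
    """Flags a risk when more than `limit` identical entries of one level fall within `window` seconds."""

    def __init__(self, limit=10, window=60.0, level="error"):
        self.limit = limit
        self.window = window
        self.level = level
        self.seen = defaultdict(deque)  # log content -> timestamps inside the window

    def observe(self, timestamp, level, content):
        if level != self.level:
            return None
        times = self.seen[content]
        times.append(timestamp)
        while timestamp - times[0] > self.window:  # drop entries older than the window
            times.popleft()
        if len(times) > self.limit:
            return {"rule": "repeat-error", "risk": "anomaly",
                    "count": len(times), "content": content, "timestamp": timestamp}
        return None
```

A non-`None` result would be forwarded to the alarm platform, which then decides under its own alarm policy whether to notify.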
[0058] This application is not limited to this. In some embodiments, the intelligent analysis daemon can also write the risk analysis results into log data and store them together with the original log data for subsequent querying and analysis. Furthermore, the alarm platform can support multi-level alarm policies, triggering different levels of alarm notifications based on the severity of the risk analysis results. For example, for severe system failures, a telephone alarm can be triggered; for general system warnings, an email alarm can be triggered.
[0059] In an optional embodiment, as shown in Figure 7, S400 further includes: S410: Obtain the log generation rate of newly written logs within the previous statistical period; S420: Based on the log generation rate and the preset negative correlation mapping relationship, dynamically adjust the interval of the next polling.
[0060] In this embodiment, the intelligent analysis daemon collects newly written log data from the asynchronous write queue at a variable polling interval. First, it obtains the log generation rate of newly written logs within the previous statistical period. The statistical period is the time interval used to calculate the log generation rate, for example, it can be set to 1 minute. The log generation rate refers to the amount of log data newly written to the asynchronous write queue within the statistical period, reflecting the current log generation status of the system. The length of the statistical period can be set according to the fluctuations in log generation. If the log generation rate fluctuates significantly, the statistical period can be appropriately shortened, for example, set to 30 seconds, to more promptly reflect changes in the log generation rate; if the log generation rate is relatively stable, the statistical period can be appropriately extended, for example, set to 2 minutes, to reduce statistical overhead.
[0061] It should be noted that after obtaining the log generation rate for the previous statistical period, the intelligent analysis daemon dynamically adjusts the next polling interval based on this rate and a preset negative correlation mapping. Under the negative correlation mapping, a higher log generation rate results in a shorter next polling interval, and a lower log generation rate results in a longer one. This ensures that when the log generation rate is high, the intelligent analysis daemon collects log data more frequently and detects system anomalies promptly; when the log generation rate is low, it collects less often, lowering system overhead. The negative correlation mapping can use non-linear functions, such as exponential or logarithmic functions, to achieve more flexible polling interval adjustments. For example, the polling interval can be defined as the maximum polling interval multiplied by the value of an exponential function whose argument is the negated log generation rate divided by a coefficient. With this mapping, the polling interval equals the maximum polling interval when the log generation rate is zero, falls steeply as the rate begins to rise, and approaches zero (in practice, the minimum polling interval) at high rates.
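The exponential mapping described above can be sketched as follows. This is an illustration only: the function name `exponential_interval`, the default bounds of 1 and 10 seconds, and the coefficient value of 500 are assumptions chosen for the sketch, not values given in this application.

```python
import math


def exponential_interval(rate, max_interval=10.0, min_interval=1.0, scale=500.0):
    """Next polling interval in seconds for a given log generation rate.

    interval = max_interval * exp(-rate / scale), clamped to
    [min_interval, max_interval]. `scale` plays the role of the coefficient.
    """
    interval = max_interval * math.exp(-rate / scale)
    return max(min_interval, min(max_interval, interval))


idle = exponential_interval(0)        # no new logs: the maximum interval
busy = exponential_interval(500)      # rate equal to the coefficient
storm = exponential_interval(5000)    # very high rate: clamped to the minimum
```

A smaller coefficient makes the interval shrink faster as the rate rises, so the coefficient controls how aggressively the daemon reacts to a log burst.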
[0062] Specifically, the pre-defined negative correlation mapping can be implemented using a function or a mapping table. For example, a linear function can be defined where the polling interval equals the maximum polling interval minus the log generation rate multiplied by a coefficient. A minimum and maximum polling interval can be set to avoid intervals that are too short or too long. Alternatively, a segmented mapping table can be defined, dividing the log generation rate into multiple intervals, each corresponding to a fixed polling interval. When calculating the next polling interval, the minimum and maximum polling intervals must also be considered. The minimum value prevents the polling interval from being too short, causing the intelligent analysis daemon to frequently collect log data and consume excessive CPU resources; the maximum value prevents the polling interval from being too long, preventing log data from being analyzed in a timely manner and affecting the real-time performance of anomaly detection. For example, the minimum polling interval can be set to 1 second, and the maximum to 30 seconds.
[0063] For example, suppose the statistical period is 1 minute, the maximum polling interval is 10 seconds, the minimum polling interval is 1 second, and the coefficient is 0.01. If the log generation rate in the previous statistical period was 500 records / minute, then the next polling interval will be 10 minus 500 multiplied by 0.01, which equals 5 seconds. If the log generation rate in the previous statistical period was 1000 records / minute, then the next polling interval will be 10 minus 1000 multiplied by 0.01, which equals 0 seconds, and the polling interval will be set to the minimum value of 1 second. If the log generation rate in the previous statistical period was 0 records / minute, then the next polling interval will be 10 seconds, which is the maximum value.
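The arithmetic of this example can be checked with a short sketch of the linear mapping with clamping. The function name `linear_interval` is hypothetical; the parameter values are the ones stated in the example.

```python
def linear_interval(rate_per_minute, max_interval=10.0, min_interval=1.0,
                    coefficient=0.01):
    """Next polling interval in seconds under the linear mapping.

    raw = max_interval - rate * coefficient, then clamped so that the result
    never leaves the range [min_interval, max_interval].
    """
    raw = max_interval - rate_per_minute * coefficient
    return max(min_interval, min(max_interval, raw))


moderate = linear_interval(500)    # 10 - 500 * 0.01
heavy = linear_interval(1000)      # raw value falls below the minimum
quiet = linear_interval(0)         # raw value equals the maximum
```

The three calls correspond to the three cases in the example: the unclamped result, the result raised to the minimum polling interval, and the result equal to the maximum polling interval.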
[0064] Those skilled in the art will understand that using a variable polling interval can strike a balance between the real-time nature of log analysis and system overhead. When the log generation rate is high, shortening the polling interval can promptly detect system anomalies and prevent the fault from escalating; when the log generation rate is low, extending the polling interval can reduce the CPU and memory usage of the intelligent analysis daemon and improve the overall system performance.
[0065] In an optional embodiment, as shown in Figure 8, S100 further includes: S101: The user-space client generates user-space logs that conform to the preset format specification by calling the dynamic library; S102: The user-space client sends the user-space logs to the log server via the UDP protocol for duplicate detection and asynchronous writing.
[0066] In this embodiment, in addition to generating log data in kernel mode and writing it to a lock-free circular buffer according to a preset format specification, S100 also includes a user-mode client generating user-mode logs conforming to the preset format specification by calling a dynamic library. The dynamic library is a pre-compiled shared library file that provides a set of interface functions for generating and sending log data. User-mode applications can link to this dynamic library and call its interface functions to record log data without needing to implement the log generation and sending logic themselves. The interface of the dynamic library should be simple and easy to use, providing one logging function for each log level. The parameters of the interface functions should include a module identifier, log content, and variable parameters so that user-mode applications can pass in formatted log content.
[0067] It should be noted that user-space logs generated by user-space clients also conform to the preset format specification and have the same structure as kernel-space log data, including fields such as timestamp, module identifier, log level, log content length, and log content. This allows the log server to use unified processing logic to handle both kernel-space and user-space logs, eliminating the need to write different processing code for log data from different sources and simplifying the log server's implementation. When sending log data, user-space clients can employ the same UDP dual-mode sending method as the user-space server. Specifically, user-space logs with event levels higher than a preset threshold are sent in single-entry real-time mode; user-space logs with event levels no higher than the preset threshold are sent in batch asynchronous mode. This further improves the transmission efficiency of user-space logs.
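As an illustration of a shared record layout, the following sketch packs the five fields listed above into one binary record. The concrete layout (an 8-byte timestamp, a 16-byte module identifier, a 1-byte level, and a 2-byte length in network byte order) and the names `HEADER`, `pack_log`, and `unpack_log` are assumptions made for this sketch; this application does not fix field widths.

```python
import struct
import time

# Assumed header: timestamp (float64), module id (16 bytes), level (uint8),
# content length (uint16), all in network byte order without padding.
HEADER = struct.Struct("!d16sBH")


def pack_log(module, level, content, timestamp=None):
    """Serialize one log entry into header + UTF-8 content."""
    body = content.encode("utf-8")
    ts = time.time() if timestamp is None else timestamp
    return HEADER.pack(ts, module.encode("utf-8"), level, len(body)) + body


def unpack_log(data):
    """Parse a record produced by pack_log back into its fields."""
    ts, module, level, length = HEADER.unpack_from(data)
    body = data[HEADER.size:HEADER.size + length]
    return ts, module.rstrip(b"\x00").decode("utf-8"), level, body.decode("utf-8")


record = pack_log("net", 3, "link down", timestamp=1.5)
fields = unpack_log(record)
```

Because kernel-space and user-space records would share one layout, a single parser such as `unpack_log` is enough on the log server side.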
[0068] Specifically, after the user-space client generates user-space logs conforming to a preset format, it sends the user-space logs to the log server via the UDP protocol. Upon receiving the user-space logs, the log server uses the same method as for kernel-space logs, employing a Bloom filter to determine if the user-space logs are duplicated. Non-duplicate user-space logs are asynchronously written to storage, while duplicate logs are suppressed. This way, user-space logs also benefit from duplicate log suppression, reducing unnecessary storage write operations. During initialization, the dynamic library creates a UDP socket and reads parameters from the configuration file, such as the log server's IP address and port number, event level threshold, preset capacity of the user-space send buffer, and preset time interval. Then, the dynamic library starts a background thread to handle batch asynchronously sent log data. When a user-space application calls a logging function, the dynamic library generates log data conforming to the preset format. If the event level of the log data is higher than the preset threshold, it is immediately sent to the log server via the UDP socket; otherwise, the log data is stored in the user-space send buffer. The background thread periodically checks the status of the send buffer. When the sending conditions are met, it encapsulates the log data in the buffer and sends it to the log server.
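The dual-mode sending behavior of the dynamic library can be sketched as follows. This is a simplified illustration, not the library's implementation: the class name `DualModeSender`, its parameters, and the newline used to join batched records are assumptions, and the configuration values are passed as arguments instead of being read from a configuration file.

```python
import socket
import threading


class DualModeSender:
    """Sends urgent records at once and batches the rest (illustrative only)."""

    def __init__(self, address, threshold=3, capacity=4, interval=1.0):
        self._addr = address            # (ip, port) of the log server
        self._threshold = threshold     # event level threshold
        self._capacity = capacity       # preset capacity of the send buffer
        self._interval = interval       # preset time interval in seconds
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._buffer = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, level, record):
        """Send `record` (bytes) immediately or queue it, depending on level."""
        if level > self._threshold:
            self._sock.sendto(record, self._addr)   # single-entry real-time mode
            return
        with self._lock:
            self._buffer.append(record)
            full = len(self._buffer) >= self._capacity
        if full:
            self.flush()                            # capacity condition met

    def flush(self):
        """Pack all buffered records into one UDP datagram and send it."""
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self._sock.sendto(b"\n".join(batch), self._addr)

    def _run(self):
        # Background thread: flush whenever the preset interval elapses.
        while not self._stop.wait(self._interval):
            self.flush()

    def close(self):
        """Stop the background thread, send what is left, release the socket."""
        self._stop.set()
        self._worker.join()
        self.flush()
        self._sock.close()
```

The lock here only protects the user-space send buffer inside one process; it is unrelated to the lock-free circular buffer used in kernel mode.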
[0069] For example, if a user-space application needs to log an information-level message, it first calls the logging interface function provided by the dynamic library, passing in the module identifier and the log content. The interface function automatically obtains the current timestamp and assembles the timestamp, module identifier, log level, log content length, and log content into a user-space log according to a preset format specification. Then, the interface function sends the user-space log to the log server via a UDP socket. After receiving the user-space log, the log server extracts its key information, inputs it into a Bloom filter for querying, and if it determines that the log is non-duplicate, it puts it into an asynchronous write queue for storage; if it determines that the log is duplicate, it discards it.
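The server-side duplicate check in this example can be sketched with a small Bloom filter. The class `BloomFilter`, the function `handle`, the filter size, the number of hash functions, and the use of the module, level, and full content as the key information are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter backed by a bit array (illustrative only)."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def __contains__(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)


def handle(bloom, write_queue, module, level, content):
    """Queue a non-duplicate log for storage; return False if suppressed."""
    key = f"{module}|{level}|{content}".encode("utf-8")
    if key in bloom:
        return False                     # possibly seen before: discard
    bloom.add(key)
    write_queue.append((module, level, content))
    return True


bloom, write_queue = BloomFilter(), []
first = handle(bloom, write_queue, "net", 3, "link down")
second = handle(bloom, write_queue, "net", 3, "link down")
```

A Bloom filter can report a false positive but never a false negative, so a repeated log is always suppressed, while a new log is occasionally suppressed by mistake; a larger bit array lowers that probability.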
[0070] For example, the following demonstrates a user-space application written in Python calling the dynamic library to record log data. The application imports the timing module together with the context management function, the logging functions for the different log levels, and the module class provided by the dynamic library's Python module. It then creates a main module object with the module name `python` and defines two functions, one for recording informational logs and the other for recording critical error logs. The function that records informational logs creates a submodule object named `info` and calls the informational logging function, passing in the submodule object and the log content. The function that records critical error logs creates a submodule object named `critical`, generates a relatively long log entry, and calls the critical error logging function, passing in the submodule object and the formatted log entry. Inside each logging function, log data conforming to the preset format is generated automatically and sent to the log server via the UDP protocol. In the main program, a logging context is created using a context manager with a buffer capacity of 64 slots, and the two functions are then called in sequence. The context manager manages the lifecycle of the logging client: the client is initialized on entry, and when the context manager's scope ends, the client sends any log data remaining in the send buffer to the log server and cleans up related resources.
[0071] Based on the hybrid log collection and alarm method in the above embodiments, this application also provides a hybrid log collection and alarm system for executing the hybrid log collection and alarm method in the above embodiments.
[0072] In this embodiment, a hybrid log collection and alarm system is used to execute the aforementioned hybrid log collection and alarm method. The system comprises four main parts: a kernel-mode module, a user-mode server, a log server, and an alarm platform. The kernel-mode module runs in the operating system kernel and is primarily responsible for generating kernel-mode log data and writing it to a lock-free circular buffer. The user-mode server runs in the operating system user mode and is primarily responsible for obtaining log data from the lock-free circular buffer in kernel mode via the NetLink protocol and sending it to the log server via the UDP protocol in either single-entry real-time mode or batch asynchronous mode, depending on the characteristics of the log data. The log server runs on a separate server or cluster and is primarily responsible for receiving log data from the user-mode server and user-mode clients, using a Bloom filter to determine whether the log data is duplicated, asynchronously writing non-duplicate log data to storage, and providing the log data to the intelligent analysis daemon for analysis. The alarm platform is primarily responsible for receiving risk analysis results sent by the intelligent analysis daemon and triggering alarm notifications according to preset alarm policies. The modules of the system communicate with each other through standard network protocols and interfaces, which gives the system good scalability and compatibility. The kernel-mode module can be compiled as a loadable kernel module, which can be dynamically loaded and unloaded at system runtime without recompiling the kernel. The user-mode server and the log server can be compiled into executable files and deployed on different operating systems.
[0073] Those skilled in the art will understand that the various modules of the above system can be deployed and expanded according to actual application scenarios. For example, the log server can be deployed as a distributed cluster to support larger-scale log data processing; the intelligent analysis daemon can be deployed on a separate server to avoid impacting the performance of the log server; and the alarm platform can be integrated with existing operation and maintenance management systems to achieve unified alarm management.
[0074] Those skilled in the art will understand that the above description is merely a preferred embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application. Further details are omitted here.
Claims
1. A hybrid log collection and alarm method, characterized in that, The method includes the following steps: Log data is generated in kernel mode and written to a lock-free circular buffer according to the preset format specification. The user-space server obtains the log data stored in the lock-free circular buffer through the NetLink protocol, and sends it to the log server via UDP protocol in single real-time mode or batch asynchronous mode, depending on the characteristics of the log data. The log server uses a Bloom filter to determine whether the received log data is duplicated, performs asynchronous writing and storage on log data that is determined to be non-duplicate, and performs suppression on log data that is determined to be duplicate. The lock-free circular buffer is a circular data structure that supports a multi-producer, single-consumer model and lock-free write operations.
2. The method according to claim 1, characterized in that, The process of generating and writing log data to a lock-free circular buffer in kernel mode includes: The kernel-mode module invokes a preset macro to generate a target format log containing the module identifier, log level, and content. Get the current write index of the lock-free circular buffer; Copy the target format log to the buffer slot pointed to by the write index; Update the write index.
3. The method according to claim 1, characterized in that, The user-space server retrieves the log data stored in the lock-free circular buffer via the NetLink protocol, including: The user-space server constructs a NetLink message and sends it to the kernel space. The message contains an identifier of the target buffer. The kernel-mode NetLink processing routine locates the lock-free circular buffer based on the identifier and reads the current read index; Based on the read index, log data is retrieved in batches from the lock-free circular buffer and encapsulated into a NetLink response message.
4. The method according to claim 1, characterized in that, The step of sending the log data to the log server via UDP protocol in either single real-time mode or batch asynchronous mode, based on the characteristics of the log data, includes: The log data obtained from the kernel space is parsed to determine its corresponding event level; If the event level is higher than a preset threshold, a UDP packet containing the log data will be immediately sent to the log server. If the event level is not higher than the preset threshold, the log data is stored in the user-mode sending buffer. When the amount of data in the user-mode sending buffer reaches the preset capacity or the preset time interval expires, the multiple log data in the user-mode sending buffer are encapsulated into a UDP data packet and sent to the log server.
5. The method according to claim 4, characterized in that, The step of encapsulating multiple log data entries in the user-space send buffer into a single UDP packet and sending it to the log server includes: Determine whether the amount of data in the user-mode send buffer has reached the preset capacity; If so, immediately perform encapsulation and transmission; If not, it continuously checks whether the time interval since the last transmission has reached the preset time interval, and performs encapsulation and transmission when it does.
6. The method according to claim 1, characterized in that, The log server uses a Bloom filter to determine whether the received log data is duplicated, asynchronously writes and stores log data that is determined to be non-duplicate, and suppresses log data that is determined to be duplicate, including: Extract key information from the log data, including the log generation module, log level, and feature values of the log content; Input the key information into the Bloom filter for querying; If the query result indicates that the key information may already exist, the log data is determined to be a duplicate log and is discarded. If the query result indicates that the key information does not exist, the log data is determined to be a non-duplicate log, it is placed in the asynchronous write queue, and the key information is added to the Bloom filter.
7. The method according to claim 6, characterized in that, After suppressing log data identified as duplicates, the process also includes: The intelligent analysis daemon collects newly written log data from the asynchronous write queue at variable polling intervals; The collected log data is input into a preset analysis model for pattern matching to obtain risk analysis results; The risk analysis results are sent to the alarm platform, and an alarm notification is triggered when the risk analysis results meet the rules associated with the preset alarm strategy.
8. The method according to claim 7, characterized in that, The step of collecting newly written log data from the asynchronous write queue at a variable polling interval includes: Obtain the log generation rate of newly written logs within the previous statistical period; The interval of the next polling is dynamically adjusted based on the log generation rate and the preset negative correlation mapping relationship.
9. The method according to claim 1, characterized in that, The step of generating and writing log data to a lock-free circular buffer in kernel mode according to a preset format specification includes: The user-space client generates user-space logs that conform to the preset format specification by calling the dynamic library; The user-mode client sends the user-mode logs to the log server via the UDP protocol for duplicate detection and asynchronous writing.
10. A hybrid log collection and alarm system, characterized in that, Used to perform the method as described in any one of claims 1 to 9.