A non-intrusive log compression method, system, device and storage medium
By obtaining the parameters of the log function through the eBPF application and compressing them in kernel mode, the problem of large storage space occupied by log files is solved, and efficient log file storage and traceability capabilities are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- YUNSHAN NETWORKS BEIJING INC
- Filing Date
- 2023-05-24
- Publication Date
- 2026-06-16
AI Technical Summary
In existing technologies, log files occupy a large amount of storage space, resulting in wasted storage space and affecting the efficiency of subsequent problem tracing.
The eBPF application obtains multiple parameters of the log function and compresses them in kernel mode. The parameters are then sorted and stored according to their length, data type, and order, achieving efficient compression of the log file.
It effectively reduces the storage space occupied by log files, improves the storage efficiency of log files, and reduces storage space requirements without affecting traceability.
Figure CN116501710B_ABST (abstract drawing)
Abstract
Description
Technical Field
[0001] This application relates to the technical field of computer application performance monitoring, and in particular to a non-intrusive log compression method, system, device and storage medium.
Background Technology
[0002] Network devices, system platforms, and service programs running on them all generate an event record called a log, also known as a log file, during their operation.
[0003] Log files vary depending on the service application, and may include application logs, security logs, system logs, scheduler service logs, FTP logs, WWW logs, and DNS server logs. When users perform operations on a service application, these log files typically record relevant information about those operations. This information is highly useful for system security personnel, such as enabling source tracing. Therefore, log files are saved to facilitate tracing the source of problems should any issues arise in the service application later.
[0004] In the existing technology, in order to ensure the integrity of log files, log files are often stored directly in the storage space. As the service program runs for longer, the storage space occupied by the generated log files will also increase.
[0005] The existing technical solutions mentioned above have the following drawbacks: log files occupy a large amount of storage space.
Summary of the Invention
[0006] To address the issue of large storage space requirements for log files, this application provides a non-intrusive log compression method, system, device, and storage medium.
[0007] In a first aspect of this application, a non-intrusive log compression method is provided. The method includes:
[0008] obtaining a call request for a logging function;
[0009] when the call request triggers a preset eBPF application, obtaining multiple parameters of the logging function; and
[0010] compressing the multiple parameters to obtain a compressed log, and storing the compressed log.
[0011] As can be seen from the above technical solutions, when a request to call the logging function is received, it indicates that a log record will be generated. At the same time, the eBPF application will be triggered. The eBPF application can obtain multiple parameters of the logging function, compress the multiple parameters, and thus compress the log file, obtain the compressed log, and store it. This has the effect of reducing the storage space occupied by the log file.
[0012] In one possible implementation, obtaining multiple parameters of the logging function includes:
[0013] Obtain the execution location of the logging function;
[0014] The operating system is switched to kernel mode based on the execution location.
[0015] By setting a probe at the execution location, multiple parameters of the logging function can be obtained.
[0016] As can be seen from the above technical solutions, by triggering the eBPF application, the execution location of the logging function can be obtained. After obtaining the execution location, the operating system can be switched to kernel mode, and then multiple parameters of the logging function can be captured by the probe, providing a data foundation for the compression of log files.
[0017] In one possible implementation, switching the operating system to kernel mode based on the execution location includes:
[0018] When execution reaches the location of the logging function, the initial instruction at that location is replaced with an interrupt instruction, and the operating system enters kernel mode.
[0019] In one possible implementation, after the multiple parameters of the logging function are obtained, the interrupt instruction is restored to the initial instruction, the operating system returns to user mode, and the logging function is called.
[0020] In one possible implementation, the parameters include a timestamp, a list of variable values, and a log type, wherein the list of variable values includes a dictionary list of variables and a non-dictionary list of variables.
[0021] In one possible implementation, compressing the plurality of parameters to obtain a compressed log and storing the compressed log includes:
[0022] Based on the parameters, the length, data type, and order of the parameters are obtained;
[0023] Based on the length and the data type, the compressed data corresponding to each parameter is determined;
[0024] Based on the order of the multiple parameters, the multiple pieces of compressed data are sorted to obtain a compressed log, and the compressed log is stored.
[0025] As can be seen from the above technical solutions, after obtaining the parameters, the parameters are compressed according to their length, data type, and order. Then, they are combined according to the compression order to obtain compressed logs and store them, thereby compressing the log files and reducing the storage space occupied by the log files.
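The length, data type, and order steps above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the encoding used by the application: the type tags, the little-endian byte order, the varint length prefix, and the name `encodeParams` are all hypothetical choices made for this example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Type tags used by this sketch (hypothetical, not from the application).
const (
	tagInt64  byte = 1
	tagString byte = 2
)

// encodeParams writes each parameter in call order as a one-byte type tag
// followed by either a fixed 8-byte value (int64) or a varint length prefix
// plus the raw bytes (string).
func encodeParams(params []interface{}) ([]byte, error) {
	var buf bytes.Buffer
	var scratch [binary.MaxVarintLen64]byte
	for _, p := range params {
		switch v := p.(type) {
		case int64:
			buf.WriteByte(tagInt64)
			binary.LittleEndian.PutUint64(scratch[:8], uint64(v))
			buf.Write(scratch[:8])
		case string:
			buf.WriteByte(tagString)
			n := binary.PutUvarint(scratch[:], uint64(len(v)))
			buf.Write(scratch[:n])
			buf.WriteString(v)
		default:
			return nil, fmt.Errorf("unsupported parameter type %T", p)
		}
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := encodeParams([]interface{}{int64(0), "string"})
	if err != nil {
		panic(err)
	}
	// 1+8 bytes for the integer, 1+1+6 bytes for the string.
	fmt.Println(len(out)) // 17
}
```

Because the parameters are written in the order they were passed, a reader that knows the tags can recover them in the same order.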
[0026] In one possible implementation, the method further includes:
[0027] Obtain the text size of the compressed log;
[0028] Calculate the storage percentage of the compressed log, where the storage percentage = text size / preset space size;
[0029] When the storage percentage exceeds a preset value, a deletion prompt message is output, which includes the deletion time and log name.
[0030] Delete the compressed log corresponding to the log name based on the deletion time.
[0031] In a second aspect of this application, a non-intrusive log compression system is provided. The system includes:
[0032] The request retrieval module is used to retrieve call requests for the log function;
[0033] The parameter acquisition module acquires multiple parameters of the logging function when the call request triggers a preset eBPF application.
[0034] The log compression module is used to compress the multiple parameters to obtain compressed logs and store the compressed logs.
[0035] In a third aspect of this application, an electronic device is provided. The electronic device includes a memory and a processor, wherein the memory stores a computer program, and the processor executes the program to implement the method described above.
[0036] In a fourth aspect of this application, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the method according to the first aspect of this application.
[0037] In summary, this application includes at least one of the following beneficial technical effects:
[0038] 1. By obtaining the call request of the logging function, the eBPF application is triggered to obtain multiple parameters of the logging function. By compressing multiple parameters, the log file is compressed, and the compressed log is obtained and stored, which has the effect of reducing the storage space occupied by the log file.
[0039] 2. By triggering the eBPF application, the execution location of the logging function is obtained, and then the operating system is switched to kernel mode. Then, multiple parameters of the logging function are captured by the probe, providing a data foundation for the compression of the log file.
Description of the Drawings
[0040] Figure 1 is a flowchart illustrating the non-intrusive log compression method provided in this application.
[0041] Figure 2 is a schematic diagram of the non-intrusive log compression system provided in this application.
[0042] Figure 3 is a schematic diagram of the structure of the electronic device provided in this application.
[0043] In the drawings, 200 is the non-intrusive log compression system; 201 is the request retrieval module; 202 is the parameter acquisition module; 203 is the log compression module; 301 is the CPU; 302 is the ROM; 303 is the RAM; 304 is the I/O interface; 305 is the input section; 306 is the output section; 307 is the storage section; 308 is the communication section; 309 is the drive; and 310 is the removable medium.
Detailed Implementation
[0044] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0045] Furthermore, the term "and/or" in this document merely describes an association between related objects and indicates that three relationships can exist. For example, A and/or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character "/" in this document, unless otherwise specified, generally indicates an "or" relationship between the objects before and after it.
[0046] Network devices, systems, and service programs all generate event records called logs, also known as log files, during operation. These logs record the program's running status, helping users and developers quickly identify problems when they occur. Log libraries allow various levels of logging for each statement (FATAL, ERROR, WARN, INFO, DEBUG, etc.). Multiple log files can be maintained.
[0047] As the service program runs, the number of log files increases. To limit the storage space occupied by log files, the current practice is to delete or migrate them periodically. However, this results in the loss of some log files, which may hinder subsequent problem tracing.
[0048] For problems that occur while a program is running, it is impossible to know in advance how long before the problem surfaces its cause will have been logged. For example, although users cannot know when a problem will occur, its source can certainly be found in the log files from the 24 hours before it occurs. Therefore, retaining as many log files as possible helps in tracing the source of problems later. To retain more log files by compressing them, this application provides a non-intrusive log compression method.
[0049] The embodiments of this application will now be described in further detail with reference to the accompanying drawings.
[0050] This application provides a non-intrusive log compression method, the main process of which is described below.
[0051] As shown in Figure 1:
[0052] Step S101: Obtain the call request for the logging function.
[0053] Specifically, when a user or developer runs a service program, each operation of the program triggers a logging function. Calling the logging function records the operation details. The aforementioned call request refers to the request to invoke the logging function during the execution of the service program.
[0054] Step S102: When the call request triggers the preset eBPF application, obtain multiple parameters of the logging function.
[0055] eBPF (extended Berkeley Packet Filter) is a technology that allows user-written programs to run within the Linux kernel without modifying the kernel code or loading kernel modules.
[0056] First, an eBPF program is written in C and compiled into an object file, i.e., an eBPF bytecode file, using Clang. Clang is a compiler that can compile eBPF programs. Then, an application needs to load the compiled eBPF bytecode into the kernel by calling a Linux kernel system call. During loading, the kernel verifies the eBPF program to ensure its safety; then it compiles the eBPF bytecode into machine code. Afterward, the kernel calls the corresponding functions according to the bound hook points. The aforementioned "calling the corresponding functions according to the bound hook points" refers to the logging function call request triggering the pre-defined eBPF application.
[0057] Specifically, the eBPF application is loaded into the operating system kernel in advance. When the eBPF application is triggered, the execution location of the logging function can be obtained. It will be understood that the logging function is a fixed function: the parameters passed to it differ from call to call, but the execution location obtained is the same each time. Normally, executing the instructions of the logging function in order from the execution location would complete the logging. To compress the log file, however, the instruction at the execution location is modified once that location has been obtained: the first instruction of the logging function, i.e., the initial instruction, is replaced with an interrupt instruction. When the program then executes the interrupt instruction, the operating system switches from user mode to kernel mode. Once in kernel mode, the operating system dispatches a probe, which captures the parameters of the logging function, that is, the arguments passed to the logging function when it is called.
[0058] By setting a probe at the execution location, multiple parameters of the logging function are obtained. These parameters include a timestamp, a list of variable values, and a log type. The list of variable values includes both a dictionary variable list and a non-dictionary variable list. After these parameters are obtained, the interrupt instruction at the execution location is restored to the initial instruction, and the operating system returns to user mode to continue calling the logging function.
[0059] When the operating system is in user mode, the memory space and objects a process can access are limited, and the processor it occupies can be preempted. When the operating system is in kernel mode, it can access all memory space and objects, and the processor it occupies cannot be preempted. Therefore, when retrieving parameters for the logging function, it is done in kernel mode. Only after the parameters are retrieved and the system switches back to user mode will the logging function continue execution.
[0060] The log type among the logging function's parameters includes some repeated text. Take the log.Printf logging function as an example. Define variables a and b and assign them values: a is 0 and b is "string". Call the logging function with a and b as arguments, i.e., log.Printf("int = %v string = %v", a, b). The output of the logging function is then 2023/05/17 02:20:21 int = 0 string = string. Here, "2023/05/17 02:20:21" is the timestamp, indicating when the logging function was called; 0 and the second "string" are the values of variables a and b; and "timestamp int = 'value of variable a' string = 'value of variable b'" is the log type, which includes the repeated text "int = string = ". The Go standard library has similar logging functions, such as `log.Fatalf` and `log.Panicf`. These differ from `log.Printf` in what happens after the log is printed: `log.Fatalf` exits the process by calling os.Exit(1), and `log.Panicf` panics. Similar logging functions also exist in third-party libraries, such as `go-logging.Infof`.
[0061] The following are some specific function call statements. They log the same message text as the log.Printf call above.
[0062] log.Fatalf("int = %v string = %v", a, b);
[0063] log.Panicf("int = %v string = %v", a, b);
[0064] logger := logging.MustGetLogger("example");
[0065] logger.Infof("int = %v string = %v", a, b);
[0066] Step S103: Compress the multiple parameters to obtain a compressed log, and store the compressed log.
[0067] Specifically, based on the aforementioned parameters, the length, data type, and order of the parameters are obtained. Based on the length and data type, the compressed data after compression is determined. These parameters include constants, variables, and timestamps, and storage is completed according to their data types. For example, consider a parameter 'c' with a value of 9223372036854775807. If the log is stored using the current log storage method, parameter 'c' would be saved as a string, requiring at least 19 bytes. However, if stored according to the parameter's data type (i.e., as a 64-bit integer), only 8 bytes are needed. Storing each parameter according to its corresponding data type achieves space-saving. Each data type has its standard storage format, which is well-known to those skilled in the art and will not be elaborated upon here. Storing data according to the format corresponding to the data type yields compressed data. Based on the order of the aforementioned parameters, the compressed data is sorted to obtain a compressed log, which is then stored. The parameters include parameter data. Specifically, the parameter data is obtained by getting the length of the parameter from the parameter address, i.e., how many bytes each parameter corresponds to, and then parsing it according to the parameter type to obtain the parameter data.
[0068] It will be understood that most log files are text files with the .log extension, which can generally be opened and viewed using Notepad. Currently, when multiple log files exist, each containing multiple log records, compressing them typically involves creating a folder, placing all the log files in that folder, and compressing the folder into an archive, thereby reducing the storage space occupied by the log files.
[0069] File compression essentially establishes a mapping between the original text and the compressed text. If a log file containing multiple log records is compressed, the process requires iterating through the log file to identify duplicate entries and compressing them. However, this application obtains the parameters passed to the log function during the log function call. The data type of the parameters clearly identifies which parts are variables. Parameters within the variable value list are stored directly. When a parameter is a duplicate part of the log type, the index of that duplicate part is stored directly. For repeated entries, storing an integer index is more memory-efficient than storing large amounts of duplicate content. For log types with explicit duplicates, the compressed data (integer index) corresponding to the duplicate part can be directly retrieved and stored each time the log function is called, reducing the need for duplicate checks in the original compression method and improving the efficiency of log file compression.
[0070] It will be understood that when a parameter is in the list of variable values, its value is unlikely to repeat, so it need not be compressed and can be stored directly. When the parameter is a repeated part of a log type, the repeated part is stored once at some location; when the same repeated part is to be stored again, the integer index of that storage location is obtained and stored instead.
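One way to picture the index-based storage of repeated log type text is a small dictionary keyed by the format string. The following Go sketch is illustrative only: `TemplateDict`, `Index`, and `Lookup` are hypothetical names, and the application does not say how the index is assigned or where the dictionary is kept.

```go
package main

import "fmt"

// TemplateDict is a hypothetical dictionary that assigns an integer index
// to each distinct log type (format string) the first time it is seen.
type TemplateDict struct {
	index     map[string]uint32
	templates []string
}

// NewTemplateDict returns an empty dictionary.
func NewTemplateDict() *TemplateDict {
	return &TemplateDict{index: make(map[string]uint32)}
}

// Index returns the existing index for logType, or stores the text once
// and returns a newly assigned index.
func (d *TemplateDict) Index(logType string) uint32 {
	if i, ok := d.index[logType]; ok {
		return i
	}
	i := uint32(len(d.templates))
	d.index[logType] = i
	d.templates = append(d.templates, logType)
	return i
}

// Lookup recovers the log type text from a stored index.
func (d *TemplateDict) Lookup(i uint32) string {
	return d.templates[i]
}

func main() {
	d := NewTemplateDict()
	first := d.Index("int = %v string = %v")
	second := d.Index("int = %v string = %v") // repeated text: same index, nothing new stored
	other := d.Index("user %v logged in")
	fmt.Println(first, second, other) // 0 0 1
	fmt.Println(d.Lookup(first))      // int = %v string = %v
}
```

Since the log type arrives as a distinct parameter on every call, no scan of previously written text is needed to detect the repetition; a single map lookup suffices.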
[0071] Current compression methods require developers to modify business logic code to achieve automatic, scheduled compression of log files. This application utilizes eBPF technology to monitor log function calls and then obtains log function parameters by switching the operating system from user mode to kernel mode. This achieves efficient log file compression without modifying the original business logic code and without intruding on the original code.
[0072] The compressed logs described above can be stored on local disk space or in cloud disk, and the storage location can be selected according to the user's actual needs.
[0073] The non-intrusive log compression method further includes:
[0074] Obtain the text size of the compressed log; calculate the storage percentage of the compressed log, where the storage percentage = text size / preset space size; when the storage percentage exceeds the preset value, output a deletion prompt message, which includes the deletion time and log name; delete the compressed log corresponding to the log name based on the deletion time.
[0075] Despite compressing log files before storage, the storage space required for log files still increases as the service program runs. Each time compressed logs are stored, the storage space occupied by the compressed log (i.e., the text size) is obtained, and the storage percentage is calculated. The preset space size is a value set by the user based on actual needs, representing the maximum storage space allowed for log files. It is then checked whether the storage percentage exceeds the preset percentage value. If it does, to limit the storage space occupied by log files within the preset size, historical log files need to be deleted, and a deletion prompt message is output to inform the user when and which log file will be deleted. The deletion time indicates when the system will delete the file, and the log name indicates the name of the log file to be deleted. This deletion time will be later than the time the prompt message is output. After receiving the prompt message, if the user needs to save the log file, they need to save it to another location. When the deletion time is reached, the compressed log corresponding to the specified log name is deleted. The above log names are obtained by sorting all compressed logs in the storage space in ascending order according to the storage time of the compressed logs, and then retrieving one or more log names based on the sorting results.
[0076] This application provides a non-intrusive log compression system 200. Referring to Figure 2, the non-intrusive log compression system 200 includes:
[0077] The request retrieval module 201 is used to retrieve call requests for the log function;
[0078] The parameter acquisition module 202 acquires multiple parameters of the logging function when the call request triggers a preset eBPF application.
[0079] The log compression module 203 is used to compress the multiple parameters to obtain a compressed log and store the compressed log.
[0080] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working process of the described module can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0081] This application discloses an electronic device. Referring to Figure 3, the electronic device includes a central processing unit (CPU) 301, which can perform various appropriate actions and processes based on a program stored in a read-only memory (ROM) 302 or a program loaded from a storage section 307 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for system operation. The CPU 301, ROM 302, and RAM 303 are interconnected via a bus. An input/output (I/O) interface 304 is also connected to the bus.
[0082] The following components are connected to I/O interface 304: an input section 305 including a keyboard, mouse, etc.; an output section 306 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 307 including a hard disk, etc.; and a communication section 308 including a network interface card such as a LAN card, modem, etc. The communication section 308 performs communication processing via a network such as the Internet. A drive 309 is also connected to I/O interface 304 as needed. A removable medium 310, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on drive 309 as needed so that computer programs read from it can be installed into storage section 307 as needed.
[0083] Specifically, according to embodiments of this application, the process described above with reference to the flowchart of Figure 1 can be implemented as a computer software program. For example, embodiments of this application include a computer program product comprising a computer program carried on a machine-readable medium, the computer program containing program code for performing the methods shown in the flowchart. In such embodiments, the computer program can be downloaded and installed from a network via communication section 308, and/or installed from removable medium 310. When the computer program is executed by central processing unit (CPU) 301, it performs the functions defined in the apparatus of this application.
[0084] It should be noted that the computer-readable medium shown in this application can be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this application, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.
[0085] The above description is merely a preferred embodiment of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of this application is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the foregoing application concept. For example, technical solutions formed by substituting the above features with (but not limited to) technical features with similar functions claimed in this application.
Claims
1. A non-intrusive log compression method, characterized in that it comprises: obtaining a call request for a logging function; when the call request triggers a preset eBPF application, obtaining multiple parameters of the logging function; and compressing the multiple parameters to obtain a compressed log, and storing the compressed log; wherein, after a log file containing multiple log records is created, the process of compressing the log file includes: obtaining the parameters passed to the logging function when the logging function is called; identifying, from the data types of the parameters, which of the parameters are variables; when a parameter is in the variable value list, storing it directly; and when a parameter is a repeated part of the log type, storing the index of the repeated part, so that an integer index is saved for parts that appear multiple times.
2. The non-intrusive log compression method according to claim 1, characterized in that obtaining multiple parameters of the logging function includes: obtaining the execution location of the logging function; switching the operating system to kernel mode based on the execution location; and obtaining multiple parameters of the logging function by setting a probe at the execution location.
3. The non-intrusive log compression method according to claim 2, characterized in that switching the operating system to kernel mode based on the execution location includes: when execution reaches the location of the logging function, replacing the initial instruction at that location with an interrupt instruction, whereupon the operating system enters kernel mode.
4. The non-intrusive log compression method according to claim 3, characterized in that, after the multiple parameters of the logging function are obtained, the interrupt instruction is restored to the initial instruction, the operating system returns to user mode, and the logging function is called.
5. The non-intrusive log compression method according to claim 1, characterized in that the parameters include a timestamp, a list of variable values, and a log type, wherein the list of variable values includes a dictionary variable list and a non-dictionary variable list.
6. The non-intrusive log compression method according to claim 1, characterized in that compressing the multiple parameters to obtain a compressed log and storing the compressed log includes: obtaining, based on the parameters, the length, data type, and order of the parameters; determining, based on the length and the data type, the compressed data corresponding to each parameter; and sorting the multiple pieces of compressed data based on the order of the multiple parameters to obtain a compressed log, and storing the compressed log.
7. The non-intrusive log compression method according to claim 1, characterized in that the method further includes: obtaining the text size of the compressed log; calculating the storage percentage of the compressed log, where the storage percentage = text size / preset space size; when the storage percentage exceeds a preset value, outputting a deletion prompt message that includes the deletion time and log name; and deleting the compressed log corresponding to the log name based on the deletion time.
8. A non-intrusive log compression system, characterized in that it comprises: a request retrieval module, used to retrieve a call request for a logging function; a parameter acquisition module, used to acquire multiple parameters of the logging function when the call request triggers a preset eBPF application; and a log compression module, used to compress the multiple parameters to obtain a compressed log and store the compressed log; wherein, after a log file containing multiple log records is created, the process of compressing the log file includes: obtaining the parameters passed to the logging function when the logging function is called; identifying, from the data types of the parameters, which of the parameters are variables; when a parameter is in the variable value list, storing it directly; and when a parameter is a repeated part of the log type, storing the index of the repeated part, so that an integer index is saved for parts that appear multiple times.
9. An electronic device, characterized in that it includes a memory and a processor, wherein the memory stores a computer program that can be loaded by the processor to execute the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program that can be loaded by a processor to execute the method according to any one of claims 1 to 7.