Compression, search and decompression of log messages

By tokenizing log messages and compressing them losslessly, the method addresses the high storage and search costs of massive log data. It achieves efficient lossless compression and fast search, supports complex queries and custom analysis, and does not require access to the source code of the programs that generate the logs.

CN112800008B (Active) · Publication Date: 2026-06-16 · Assignee: 源维科技

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
源维科技
Filing Date
2020-11-16
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing technologies incur high storage and search costs when handling massive volumes of log data, and compressed logs cannot be searched efficiently. Conventional compression tools decompress slowly, so compressed logs cannot be analyzed directly. In addition, existing approaches require access to the source code of the program that generates the logs, which raises security and commercial concerns.

Method Used

Log messages are tokenized, and the variable parts are classified as numeric or non-numeric expressions. The messages are then stored losslessly as compressed log messages, together with their timestamps, a dictionary of non-numeric expressions, and a dictionary of log types. Searches can run without decompressing the logs, and an API is provided for custom analysis.
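A minimal Python sketch of this encoding follows. It assumes whitespace tokenization, treats exact integers as numeric expressions, and uses a heuristic that any other token containing a digit is a non-numeric variable; static text plus placeholders forms the log type. The placeholder characters (`\x11`, `\x12`) and the names `Dictionary`, `compress`, and `decompress` are hypothetical and not taken from the patent.

```python
import re
from dataclasses import dataclass, field

# Hypothetical placeholder characters, assumed never to occur in raw log text.
INT_PH = "\x11"   # stands in for a numeric value
DICT_PH = "\x12"  # stands in for a key into the non-numeric dictionary
# Integers that round-trip exactly through int()/str() (no leading zeros, no "-0").
_INT_RE = re.compile(r"0|-?[1-9][0-9]*")


@dataclass
class Dictionary:
    """Bidirectional string <-> integer-key map (hypothetical helper)."""
    ids: dict = field(default_factory=dict)
    values: list = field(default_factory=list)

    def get_or_add(self, s: str) -> int:
        if s not in self.ids:
            self.ids[s] = len(self.values)
            self.values.append(s)
        return self.ids[s]


def compress(timestamp, msg, logtypes, var_dict):
    """Encode one message as (timestamp, logtype key, values)."""
    if INT_PH in msg or DICT_PH in msg:
        raise ValueError("placeholder character found in raw message")
    parts, values = [], []
    # The capturing group keeps whitespace runs, so decoding is lossless.
    for tok in re.split(r"(\s+)", msg):
        if _INT_RE.fullmatch(tok):
            parts.append(INT_PH)
            values.append(int(tok))
        elif any(c in "0123456789" for c in tok):
            # Heuristic: any other token containing a digit is a variable.
            parts.append(DICT_PH)
            values.append(var_dict.get_or_add(tok))
        else:
            parts.append(tok)  # static text (including whitespace)
    return (timestamp, logtypes.get_or_add("".join(parts)), values)


def decompress(record, logtypes, var_dict):
    """Rebuild (timestamp, original message text) from a compressed record."""
    timestamp, logtype_key, values = record
    it = iter(values)
    text = "".join(
        str(next(it)) if ch == INT_PH
        else var_dict.values[next(it)] if ch == DICT_PH
        else ch
        for ch in logtypes.values[logtype_key]
    )
    return timestamp, text
```

For example, `Task 42 finished in 17 ms on node-3` and `Task 7 finished in 250 ms on node-3` share one log-type entry and differ only in their value lists, `[42, 17, 0]` and `[7, 250, 0]`. Repetitive static text is stored once, which is the intuition behind the storage savings.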

Benefits of Technology

It significantly reduces storage requirements, speeds up search by a factor of 100 to 1,000, supports complex queries and custom analysis, saves computing resources, and does not require access to the source code of the program that generates the logs.

Generated by Eureka AI based on the patent content.

Smart Images

  • Figure CN112800008B_ABST

Abstract

Log messages are compressed, searched, and decompressed. A dictionary stores the non-numeric expressions found in log messages. Both numeric and non-numeric expressions in a log message are represented by placeholders within a string of log "type" information, and a second dictionary stores these log types. A compressed log message contains a key into the log-type dictionary followed by a sequence of values, each of which is either a key into the non-numeric dictionary or a numeric value. Searches can be performed by parsing a search query into subqueries against the dictionaries and/or the contents of the compressed log messages. The dictionaries can reference segments containing many log messages, so some searches need not examine every log message.
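The search idea can be sketched with a toy archive in which every entry of the non-numeric dictionary records which segments use it. This is an assumed layout for illustration only: the `Archive` class, its methods, the placeholder characters, and the restriction to single-token glob queries are all hypothetical. A query first runs as a subquery against the dictionary; if nothing matches, it is answered without reading any segment. Otherwise only the referenced segments are scanned, and the comparison uses integer keys rather than decompressed text.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical placeholder characters, assumed never to occur in raw log text.
INT_PH, DICT_PH = "\x11", "\x12"
# Integers that round-trip exactly through int()/str() (no leading zeros, no "-0").
_INT_RE = re.compile(r"0|-?[1-9][0-9]*")


class Archive:
    """Toy archive: shared dictionaries plus segments of encoded messages.

    Each variable-dictionary entry records the ids of the segments that use it
    (an assumed layout, not the patent's actual storage format)."""

    def __init__(self):
        self.logtypes, self.logtype_ids = [], {}
        self.vars, self.var_ids, self.var_segments = [], {}, []
        self.segments = []  # segment id -> list of (logtype_id, values)

    def add_segment(self, messages):
        seg_id = len(self.segments)
        records = []
        for msg in messages:
            if INT_PH in msg or DICT_PH in msg:
                raise ValueError("placeholder character found in raw message")
            parts, values = [], []
            # Capturing split keeps whitespace runs as literal log-type text.
            for tok in re.split(r"(\s+)", msg):
                if _INT_RE.fullmatch(tok):
                    parts.append(INT_PH)
                    values.append(int(tok))
                elif any(c in "0123456789" for c in tok):
                    if tok not in self.var_ids:
                        self.var_ids[tok] = len(self.vars)
                        self.vars.append(tok)
                        self.var_segments.append(set())
                    var_id = self.var_ids[tok]
                    self.var_segments[var_id].add(seg_id)
                    parts.append(DICT_PH)
                    values.append(var_id)
                else:
                    parts.append(tok)
            logtype = "".join(parts)
            if logtype not in self.logtype_ids:
                self.logtype_ids[logtype] = len(self.logtypes)
                self.logtypes.append(logtype)
            records.append((self.logtype_ids[logtype], values))
        self.segments.append(records)

    def decode(self, logtype_id, values):
        it = iter(values)
        return "".join(
            str(next(it)) if ch == INT_PH
            else self.vars[next(it)] if ch == DICT_PH
            else ch
            for ch in self.logtypes[logtype_id]
        )

    def search_var(self, pattern):
        """Return (matching messages, number of segments scanned)."""
        # Subquery against the dictionary alone.
        hits = {i for i, v in enumerate(self.vars) if fnmatchcase(v, pattern)}
        if not hits:
            return [], 0  # answered without reading any segment
        # Only segments referenced by a matching entry are scanned.
        seg_ids = sorted(set().union(*(self.var_segments[i] for i in hits)))
        results = []
        for seg_id in seg_ids:
            for logtype_id, values in self.segments[seg_id]:
                kinds = [ch for ch in self.logtypes[logtype_id] if ch in (INT_PH, DICT_PH)]
                # Compare dictionary keys only; numeric values are skipped.
                if any(k == DICT_PH and v in hits for k, v in zip(kinds, values)):
                    results.append(self.decode(logtype_id, values))
        return results, len(seg_ids)
```

With two segments where only the second mentions `node-7`, `search_var("node-7")` scans a single segment, while a pattern that matches no dictionary entry returns `([], 0)` without touching any segment.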