A database monitoring operation and maintenance method and system based on time series data compression

By combining differential coding and Huffman coding, multi-level differential processing is performed on time-series data, which solves the problems of low compression rate and decompression complexity in existing technologies, and achieves efficient database monitoring and maintenance and real-time data management.

CN119512864B · Active · Publication Date: 2026-06-26 · BEIJING XINSHU TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING XINSHU TECH CO LTD
Filing Date
2024-11-04
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing time-series data compression technologies cannot meet the requirements of high compression ratio and fast decompression during database querying. The disk throughput of traditional databases has become a performance bottleneck, putting pressure on existing operation and maintenance management platforms. Furthermore, high compression ratio technologies cannot effectively handle useless fields and have weak aggregation capabilities.

Method used

A combination of differential coding and Huffman coding is used to perform multi-level differential processing on time-series data. By calculating the difference between adjacent data points and dynamic threshold coding, and combining relational database and distributed file system storage technologies, efficient data compression and fast decompression are achieved.

Benefits of technology

It improves data storage compression rate and processing efficiency, reduces storage costs, and enables real-time monitoring of database operation status and efficient data retrieval.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a database monitoring operation and maintenance method and system based on time series data compression, which realizes real-time monitoring of various index data of a database platform through efficient data acquisition and compression technology. The system can efficiently collect various performance data generated during the operation of the database, and optimize the time series data through a compression algorithm to reduce the redundant storage of data. In the data storage and management process, the first-order difference sequence and the second-order difference sequence are calculated to reduce the amount of stored data. Differencing removes the trend in the original data sequence, making the subsequent difference sequence easier to compress. At the same time, for non-constant differences, a dynamic threshold is set so that only the difference values greater than the threshold are recorded, and small changes are ignored on the premise of not losing important information, thereby improving the compression rate. In addition, $S_{relation}$ is used to determine the regularity of the difference sequence, so that the compression strategy can be applied selectively and suitable compression methods can be applied to time series data with different characteristics, thereby improving the processing efficiency. Through the above design, the storage and processing efficiency of the compressed data are improved, the system can quickly decompress and analyze the data, and real-time monitoring of the running state of the database is realized.

Description

Technical Field

[0001] This invention relates to a database monitoring and maintenance method and system based on time-series data compression, belonging to the field of database operation and maintenance.

Background Technology

[0002] In today's information-driven world, an increasing amount of data is being automatically collected by various monitoring devices, resulting in a continuously expanding scale of time-series data. This places enormous pressure on the storage and management of this data. In actual production environments, the disk throughput of traditional databases gradually becomes the bottleneck for their overall performance. To ensure the read and write performance of database systems, using memory caching of recent data in conjunction with persistent disk storage has become the mainstream solution. Simultaneously, with the rapid increase in the number of monitoring devices, existing operation and maintenance management platforms are also facing new pressures and challenges.

[0003] Many compression techniques for time-series data exist, significantly reducing storage space and data transfer costs. However, these techniques cannot consistently maintain high compression ratios for all features and patterns. Existing high compression ratio techniques often require extensive decompression during database queries, sometimes even decompressing the entire dataset, and so fail to meet the actual retrieval needs of databases. OpenTSDB uses a dictionary-like compression algorithm, reducing memory usage by encoding each tag in the index name and each sequence name. However, it still retains many useless fields that cannot be effectively compressed, and its aggregation capabilities are weak. The delta-of-delta algorithm is a variable-length compression algorithm that must be decoded sequentially from the first address of the data block during decompression, which increases the complexity of decompression.

Attached Figure Description

[0004] Figure 1 is a system flowchart of the present invention.

Summary of the Invention

[0005] To address the above problems, this invention proposes a database monitoring and maintenance method based on time-series data compression. Through efficient data acquisition and compression technologies, it enables real-time monitoring of various indicators of the database platform. The system can efficiently collect various performance data generated during database operation and optimize the time-series data using compression algorithms, reducing redundant data storage. The method includes the following steps:

[0006] 1. Data Collection: Collect relevant data from the database, including performance metrics, database activity, and log information.

[0007] 2. Database operation and maintenance management: Compress the collected time-series data, analyze the database operation status in real time, and promptly identify potential problems and issue early warnings.

[0008] 3. Display Operation and Maintenance Status: Display the database's operating status to users in a visual format.

[0009] Furthermore, in step (2), when the data storage and management submodule compresses the timestamps of time-series data, it includes the following steps:

[0010] 3.1 Input the original time series data $T=[t_1,t_2,t_3,\ldots,t_n,\ldots,t_N]$, where N is the number of data points and $n\in[1,N]$. Calculate the difference between adjacent data points to obtain the first-order difference sequence $\Delta T=[\delta_1,\delta_2,\ldots,\delta_{N-1}]$, where $\delta_i=t_{i+1}-t_i$, $i=1,2,\ldots,N-1$;

[0011] 3.2 Calculate $S_{relation}$ [formula missing in source], in which the mean term represents the sample mean of the time series.

[0012] 3.3 If $S_{relation}=1$, the first-order difference sequence $\Delta T$ is a constant difference sequence, and the initial value $t_1$, the difference $\delta_1$ of the first-order difference sequence, and the number of sequences N are recorded and stored;

[0013] 3.4 If $S_{relation}\neq 1$, the first-order difference sequence $\Delta T$ is a non-constant difference sequence, and $\Delta_{adaptive}$ is used as the encoding threshold: when $|\delta_i|\geq\Delta_{adaptive}$, record $\delta_i$; when $|\delta_i|<\Delta_{adaptive}$, ignore $\delta_i$, where

[0014] [formula for $\Delta_{adaptive}$ missing in source]; its terms include the standard deviation of $\Delta T$, and α and β are weight parameters. The second-order difference sequence $\Delta T_2=[\gamma_1,\gamma_2,\ldots,\gamma_{n-2}]$ is obtained by calculating the adjacent differences of the first-order difference sequence, where $\gamma_i=\delta_{i+1}-\delta_i$, $i=1,2,\ldots,n-2$; perform Huffman coding on the second-order difference sequence.
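Steps 3.1 and 3.3 can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `first_order_diff` and `encode_constant` are hypothetical, and because the formula for the regularity test of step 3.2 is not available in this text, the sketch substitutes a plain check that all first-order differences are equal.

```python
from typing import List, Optional, Tuple


def first_order_diff(t: List[int]) -> List[int]:
    """Return [t[i+1] - t[i]] for adjacent points (step 3.1)."""
    return [b - a for a, b in zip(t, t[1:])]


def encode_constant(t: List[int]) -> Optional[Tuple[int, int, int]]:
    """Return (t1, delta1, N) if every first-order difference is equal.

    Assumption: equality of all differences stands in for the
    regularity test of step 3.2, whose formula is not given here.
    """
    deltas = first_order_diff(t)
    if deltas and all(d == deltas[0] for d in deltas):
        return t[0], deltas[0], len(t)
    return None  # non-constant: handled by the threshold and Huffman path


timestamps = [0, 60, 120, 180, 240]
print(encode_constant(timestamps))  # (0, 60, 5)
```

For regularly sampled timestamps the whole sequence collapses to three numbers, which is why the constant case is handled separately.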

[0015] Based on the above method, this invention further proposes a database monitoring and maintenance system based on time-series data compression, which includes the following modules:

[0016] 1. Data Acquisition Module: This module collects relevant data from the database, including performance metrics, database activity, and log information.

[0017] 2. Database Operation and Maintenance Management Module: This module consists of a data storage and management submodule, a data decompression submodule, and a real-time analysis and early warning submodule. It compresses the collected time-series data, analyzes the database operation status in real time, and promptly detects potential problems and issues early warnings.

[0018] 3. Operation and Maintenance Status Display Module: This module displays the database's operating status to the user in a visual format.

[0019] Furthermore, in the database operation and maintenance management module, when the data storage and management submodule compresses the timestamps of time-series data, it includes the following steps:

[0020] 3.1 Input the original time series data $T=[t_1,t_2,t_3,\ldots,t_n,\ldots,t_N]$, where N is the number of data points and $n\in[1,N]$. Calculate the difference between adjacent data points to obtain the first-order difference sequence $\Delta T=[\delta_1,\delta_2,\ldots,\delta_{N-1}]$, where $\delta_i=t_{i+1}-t_i$, $i=1,2,\ldots,N-1$;

[0021] 3.2 Calculate $S_{relation}$ [formula missing in source], in which the mean term represents the sample mean of the time series.

[0022] 3.3 If $S_{relation}=1$, the first-order difference sequence $\Delta T$ is a constant difference sequence, and the initial value $t_1$, the difference $\delta_1$ of the first-order difference sequence, and the number of sequences N are recorded and stored;

[0023] 3.4 If $S_{relation}\neq 1$, the first-order difference sequence $\Delta T$ is a non-constant difference sequence, and $\Delta_{adaptive}$ is used as the encoding threshold: when $|\delta_i|\geq\Delta_{adaptive}$, record $\delta_i$; when $|\delta_i|<\Delta_{adaptive}$, ignore $\delta_i$, where

[0024] [formula for $\Delta_{adaptive}$ missing in source]; its terms include the standard deviation of $\Delta T$, and α and β are weight parameters. The second-order difference sequence $\Delta T_2=[\gamma_1,\gamma_2,\ldots,\gamma_{n-2}]$ is obtained by calculating the adjacent differences of the first-order difference sequence, where $\gamma_i=\delta_{i+1}-\delta_i$, $i=1,2,\ldots,n-2$; perform Huffman coding on the second-order difference sequence.

[0025] After completing the above compression algorithm, store the initial value t1 and the first difference δ1 of the first-level difference sequence. Use a suitable data storage structure, such as relational database, time-series database or distributed file system storage technology, to store the data in order to achieve efficient compressed data storage and fast retrieval.

[0026] This invention reduces the amount of data stored by calculating first-order and second-order difference sequences during data storage and management. Differencing removes trends from the original data sequence, making subsequent difference sequences easier to compress. Furthermore, for non-constant differences, a dynamic threshold is set to record only difference values greater than the threshold, ignoring minor changes while ensuring that important data information is not lost, thus improving the compression ratio. In addition, $S_{relation}$ identifies the regularity of the difference sequence, so compression strategies can be applied selectively and appropriate compression methods can be used for time-series data with different characteristics, thereby improving processing efficiency. Through this design, the storage and processing efficiency of compressed data is improved, enabling the system to decompress and analyze data more quickly and achieve real-time monitoring of the database's operational status.

Detailed Implementation

[0027] To achieve real-time monitoring of various indicators and data from the database platform and to compress time-series data to reduce redundant storage, this invention proposes a database monitoring and maintenance system based on time-series data compression. This platform mainly consists of the following components: data acquisition, responsible for obtaining various key indicators and operational status data from the database platform; database operation and maintenance management, responsible for receiving, storing, and analyzing data transmitted from the status acquisition devices; and operation and maintenance status display, a graphical interface for operation and maintenance personnel and managers, presenting complex monitoring data and analysis results through intuitive charts and dashboards. The main process of this system is shown in Figure 1. The system flow design is as follows:

[0028] 1. Data Acquisition Module:

[0029] This module acquires various key metrics and operational status data from the database platform. It uses sensors, data acquisition devices, and monitoring probes to collect real-time information such as database performance parameters, resource usage, network traffic, and error logs.

[0030] 2. Database Operation and Maintenance Management Module

[0031] (1) Data storage and management

[0032] The collected data is categorized according to data type and purpose, including information such as performance parameters, resource usage, network traffic, and error logs. The collected data is then segmented based on a time standard.

[0033] Determine the time duration of each data segment. Starting from the start time of the time series data, collect data points belonging to that time period into a single data segment on a monthly basis. Place the first period at the start time. Starting from the start time, slide the window sequentially until the entire time range is covered. Each time the window moves one step, a new data segment is generated, and the data stream is processed continuously. For each data segment, record its start time, end time, and the number of data points it contains.
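The segmentation described above can be sketched in Python as follows. This is only an illustration under assumptions: the function name `segment_by_window` is hypothetical, timestamps are taken as integer seconds, and the window is a fixed number of seconds that advances by its own length (non-overlapping), whereas the text describes monthly segments. Each segment records its start time, end time (exclusive), and number of data points.

```python
from typing import Dict, List, Tuple


def segment_by_window(
    points: List[Tuple[int, float]], window: int
) -> List[Dict[str, int]]:
    """Group (timestamp_seconds, value) points into consecutive windows.

    Windows start at the earliest timestamp and advance by `window`
    seconds. Windows that contain no points are skipped.
    """
    if not points:
        return []
    ordered = sorted(points)
    start = ordered[0][0]
    counts: Dict[int, int] = {}
    for ts, _ in ordered:
        idx = (ts - start) // window  # index of the window holding ts
        counts[idx] = counts.get(idx, 0) + 1
    return [
        {"start": start + i * window, "end": start + (i + 1) * window, "count": c}
        for i, c in sorted(counts.items())
    ]


samples = [(0, 45.0), (60, 48.0), (120, 50.0), (180, 55.0), (240, 60.0)]
for segment in segment_by_window(samples, window=120):
    print(segment)
```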

[0034] In addition, the platform's data storage and management module has designed a compression scheme for timestamps of time-series data, as shown below:

[0035] a. Input the original time series data $T=[t_1,t_2,t_3,\ldots,t_n,\ldots,t_N]$, where N represents the number of data points and $n\in[1,N]$. Calculate the difference between adjacent data points to obtain the first-order difference sequence $\Delta T=[\delta_1,\delta_2,\ldots,\delta_{N-1}]$, where $\delta_i=t_{i+1}-t_i$, $i=1,2,\ldots,N-1$.

[0036] b. For the first-order difference sequence, determine the regularity of the differences by calculating $S_{relation}$ [formula missing in source], in which the mean term is the sample mean of the time series. The regularity of the difference sequence can be analyzed based on the result of this function.

[0037] c. If $S_{relation}=1$, the sequence is a constant difference sequence, and the initial value $t_1$, the difference $\delta_1$ of the first-order difference sequence, and the number of sequences N are recorded and stored.

[0038] d. If $S_{relation}\neq 1$, the sequence is a non-constant difference sequence, and $\Delta_{adaptive}$ is used as the encoding threshold: when $|\delta_i|\geq\Delta_{adaptive}$, record $\delta_i$; when $|\delta_i|<\Delta_{adaptive}$, ignore $\delta_i$. The dynamic threshold formula is:

[0039] [formula missing in source], in which one term is the average value, and α and β are weighting parameters.
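The threshold filtering in step d can be sketched as below. Because the dynamic threshold formula is not available in this text, the sketch takes the threshold as a plain parameter; the function name `filter_by_threshold` and the sample numbers are hypothetical. Dropping values makes this step lossy, and the sketch does not record the positions of dropped values, which the text does not specify.

```python
from typing import List


def filter_by_threshold(deltas: List[float], threshold: float) -> List[float]:
    """Keep differences with |delta| >= threshold and drop the rest."""
    return [d for d in deltas if abs(d) >= threshold]


# Illustrative numbers only; the threshold would come from the
# dynamic threshold formula, which is assumed to be computed elsewhere.
kept = filter_by_threshold([3, 1, -4, 0, 2], threshold=2)
print(kept)  # [3, -4, 2]
```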

[0040] Then, the adjacent differences of the first-order difference sequence are calculated to obtain the second-order difference sequence:

[0041] $\Delta T_2=[\gamma_1,\gamma_2,\ldots,\gamma_{n-2}]$, where $\gamma_i=\delta_{i+1}-\delta_i$, $i=1,2,\ldots,n-2$. Huffman coding is applied to the second-order difference sequence to further reduce data redundancy.
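The second-order differencing and Huffman coding step can be sketched in Python. This is a minimal sketch of standard Huffman code construction using the standard-library `heapq` and `collections.Counter`; the function names `second_order_diff` and `huffman_codes` and the sample input are hypothetical, and the text does not specify how the code table or bitstream is actually stored.

```python
import heapq
from collections import Counter
from typing import Dict, List


def second_order_diff(deltas: List[int]) -> List[int]:
    """Return [deltas[i+1] - deltas[i]] for adjacent first-order differences."""
    return [b - a for a, b in zip(deltas, deltas[1:])]


def huffman_codes(symbols: List[int]) -> Dict[int, str]:
    """Build a prefix code from symbol frequencies (standard Huffman)."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


gammas = second_order_diff([3, 5, 5, -2, -2, 3])
codes = huffman_codes(gammas)
bits = "".join(codes[g] for g in gammas)
print(gammas, codes, bits)
```

Frequent second-order differences (often 0 for slowly changing series) receive codes no longer than those of rarer values, which is where the size reduction comes from.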

[0042] After completing the above compression algorithm, store the initial value t1 and the first difference δ1 of the first-order difference sequence. Design a suitable data storage structure and use storage technologies such as relational databases, time-series databases, or distributed file systems.

[0043] (2) Data decompression

[0044] The steps for decompressing constant difference data are as follows: Input the initial compression information t1, difference δ1, and sequence length N;

[0045] a. Iteratively recover the sequence: each data point $t_{i+1}$ is calculated from the previous data point $t_i$ as $t_{i+1}=t_i+\delta_1$; repeat this step until all data points in the sequence are calculated.

[0046] b. After completing the above iterative calculations, output the recovered original sequence.
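The constant-difference decompression steps above can be sketched as follows. The function name `decompress_constant` is hypothetical; the inputs are the three stored values named in the text (initial value, difference, and sequence length).

```python
from typing import List


def decompress_constant(t1: int, delta1: int, n: int) -> List[int]:
    """Rebuild a constant-difference sequence from (t1, delta1, N)."""
    if n <= 0:
        return []
    seq = [t1]
    for _ in range(n - 1):
        seq.append(seq[-1] + delta1)  # t[i+1] = t[i] + delta1
    return seq


print(decompress_constant(0, 60, 5))  # [0, 60, 120, 180, 240]
```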

[0047] Non-constant difference data decompression: To recover the original time-series data from the compressed data, the following steps are required:

[0048] a. Input the initial value t1 of the compressed data storage, the first difference δ1 of the first-level difference sequence, and the second-level difference sequence after Huffman encoding.

[0049] b. Decode the Huffman encoded data to recover the second-order difference sequence ΔT2.

[0050] c. Restore the first-order difference sequence: $\delta_1=\delta_1$, $\delta_{i+1}=\delta_i+\gamma_i$, where $i=1,2,\ldots,n-2$.

[0051] d. Restore the original time series data: $t_1=t_1$, $t_{i+1}=t_i+\delta_i$, where $i=1,2,\ldots,n-1$.
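Steps c and d amount to two running sums, which can be sketched with `itertools.accumulate` (the `initial` argument requires Python 3.8 or later). The function name `decompress_non_constant` is hypothetical. The sketch assumes the Huffman decoding of step b has already produced the second-order differences, that at least two data points exist, and that no first-order differences were dropped by the threshold, so the reconstruction is exact.

```python
from itertools import accumulate
from typing import List


def decompress_non_constant(t1: int, delta1: int, gammas: List[int]) -> List[int]:
    """Rebuild the series from t1, delta1, and second-order differences."""
    # Step c: delta[i+1] = delta[i] + gamma[i], starting from delta1.
    deltas = list(accumulate(gammas, initial=delta1))
    # Step d: t[i+1] = t[i] + delta[i], starting from t1.
    return list(accumulate(deltas, initial=t1))


# First-order differences [3, 2, 5, 5, -2] give gammas [-1, 3, 0, -7].
print(decompress_non_constant(45, 3, [-1, 3, 0, -7]))
```

With these inputs the sketch reproduces the series 45, 48, 50, 55, 60, 58.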

[0052] This algorithm dynamically adjusts the threshold based on data characteristics, enabling it to better adapt to data with different characteristics, improve compression efficiency and data restoration accuracy, and enhance data compression ratio and data fidelity.

[0053] (3) Real-time analysis and early warning

[0054] By leveraging machine learning algorithms and rule engines, the database's operational status can be analyzed in real time, and potential problems can be identified and warnings issued promptly.

[0055] 3. Operation and Maintenance Status Display Module

[0056] The intuitive charts and dashboard interface displays complex monitoring data and analysis results, mainly including the following sections:

[0057] Real-time monitoring: Displays the real-time operating status of the database, including key indicators such as CPU utilization, memory usage, disk I/O, and network traffic;

[0058] Historical data analysis: Trend analysis of historical data to show the patterns of database performance changes;

[0059] Warning notification: When the monitored data exceeds the preset threshold, the warning function will be displayed through the interface to notify relevant personnel.

[0060] To address the needs of in-memory time-series databases, the database operation and maintenance management module designs a timestamp compression scheme for time-series data. By dynamically adjusting thresholds and using multi-level differential encoding, it improves data compression efficiency and restoration accuracy, making it suitable for processing large amounts of high-frequency, complex, and changing time-series data, effectively reducing data storage costs. The operation and maintenance status display system uses intuitive charts and dashboards to display the real-time operating status of the database and trend analysis of historical data, helping operation and maintenance personnel understand the changing patterns of database performance. Through the organic combination of the data acquisition module, the database operation and maintenance management module, and the operation and maintenance status display module, the database monitoring and operation and maintenance platform of this invention can achieve comprehensive, real-time, and intelligent monitoring and operation and maintenance management of the database platform.

A more specific embodiment:

[0061] To better understand the practical application of database monitoring and maintenance systems, the following detailed explanation of the entire process is provided through specific examples, starting from data acquisition by the data acquisition module and continuing until data compression and decompression.

[0062] Suppose that status acquisition devices are running on the database platform, and these devices monitor the following key indicators in real time: CPU utilization, memory utilization, network traffic, etc.

[0063] In this example, assuming the collected metric is CPU utilization per minute (%), the raw time-series data obtained is as follows:

[0064] T=[("2024-05-27 00:00:00",45),("2024-05-27 00:01:00",48),("2024-05-27 00:02:00",50),("2024-05-27 00:03:00",55),("2024-05-27 00:04:00",60),("2024-05-27 00:05:00",58),("2024-05-27 00:06:00",56),("2024-05-27 00:07:00",54),("2024-05-27 00:08:00",57),("2024-05-27 00:09:00",59),("2024-05-27 00:10:00",63),("2024-05-27 00:11:00",65),("2024-05-27 00:12:00",68),("2024-05-27 00:13:00",70),("2024-05-27 00:14:00",69)]

[0065] First, convert the timestamps to seconds from the start time, then calculate the first-order difference sequence of the timestamps.

[0066] The start time "2024-05-27 00:00:00" is converted to a timestamp of 0 seconds, and other timestamps are incremented by 60 seconds in sequence.

[0067] $T_{seconds}$=[0,60,120,180,240,300,360,420,480,540,600,660,720,780,840]

[0068] Calculate the first-order difference sequence of timestamps: $\Delta T_{ts}$=[60,60,60,60,60,60,60,60,60,60,60,60,60,60]

[0069] The first-order difference sequence of the calculated value is: ΔV = [3,2,5,5,-2,-2,-2,3,2,4,2,3,2,-1]
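The conversion and differencing in this example can be checked with a short Python sketch. The variable names are illustrative; the timestamps are regenerated at one-minute intervals from the start time given in the example rather than typed out individually.

```python
from datetime import datetime

# CPU utilization values from the example, one sample per minute.
values = [45, 48, 50, 55, 60, 58, 56, 54, 57, 59, 63, 65, 68, 70, 69]
raw = [(f"2024-05-27 00:{m:02d}:00", v) for m, v in enumerate(values)]

fmt = "%Y-%m-%d %H:%M:%S"
times = [datetime.strptime(ts, fmt) for ts, _ in raw]

# Seconds elapsed since the first sample.
t_seconds = [int((t - times[0]).total_seconds()) for t in times]

# First-order differences of the timestamps and of the values.
delta_ts = [b - a for a, b in zip(t_seconds, t_seconds[1:])]
delta_v = [b - a for a, b in zip(values, values[1:])]

print(t_seconds)
print(delta_ts)
print(delta_v)
```

Since every timestamp difference is 60, the timestamp column falls into the constant-difference case and can be stored as the initial value, the difference, and the count.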

[0070] Calculate the dynamic threshold $\Delta_{adaptive}$: assuming α = 0.5 and β = 0.5, calculate ... and $\sigma_T\approx 7.43$, thus obtaining $\Delta_{adaptive}\approx 2$.

[0071] Encode the first-order difference sequence: when $|\delta_i|\geq\Delta_{adaptive}$, record $\delta_i$, that is, ΔT′=[3,5,5,-2,-2,3,-5,3,4,3]; calculate the second-order difference sequence ΔT2=[2,0,-7,0,5,-8,8,1]; perform Huffman coding on the second-order difference sequence ΔT2 to reduce data redundancy. This part of the coding process can be implemented using the Huffman coding algorithm.

[0072] The operations and maintenance status display system will graphically display the data obtained from the operations and maintenance management platform. It will utilize various charts and dashboards to display data such as CPU utilization, memory utilization, network traffic, and error logs.

[0073] This invention achieves efficient data compression and transmission through the above embodiments. It introduces a dynamic threshold, adjusting relevant parameter values based on actual data fluctuations. For example, assuming α = 0.5 and β = 0.5 in the embodiments, this improves the data compression ratio. Compared to traditional time-series data compression methods and database operation and maintenance systems, the data compression method of this invention is fast. This invention only needs to store the Huffman-coded second-order difference sequence ΔT2, while the original time-series data is:

[0074] T=[("2024-05-27 00:00:00",45),("2024-05-27 00:01:00",48),("2024-05-27 00:02:00",50),("2024-05-27 00:03:00",55),("2024-05-27 00:04:00",60),("2024-05-27 00:05:00",58),("2024-05-27 00:06:00",56),("2024-05-27 00:07:00",54),("2024-05-27 00:08:00",57),("2024-05-27 00:09:00",59),("2024-05-27 00:10:00",63),("2024-05-27 00:11:00",65),("2024-05-27 00:12:00",68),("2024-05-27 00:13:00",70),("2024-05-27 00:14:00",69)]. In comparison, this algorithm significantly reduces memory requirements while substantially improving the performance and efficiency of the database monitoring and maintenance system.

[0075] The units, devices, or modules described in the above embodiments can be implemented by computer chips or physical entities, or by products with certain functions. For ease of description, the above devices are described by dividing them into various modules according to their functions. Of course, in implementing this application, the functions of each module can be implemented in one or more software and / or hardware, or the module that implements the same function can be implemented by a combination of multiple sub-modules or sub-units, etc. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and there may be other division methods in actual implementation. For example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection between the devices or units shown or discussed can be through some interfaces, and the indirect coupling or communication connection between the devices or units can be electrical, mechanical, or other forms.

[0076] Those skilled in the art will also know that, besides implementing the controller using purely computer-readable program code, the same functions can be achieved by logically programming the method steps, making the controller function as logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers (PLCs), and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the devices within it used to implement various functions can also be considered structures within that hardware component. Alternatively, the devices used to implement various functions can be considered as both software modules implementing the method and structures within a hardware component.

[0077] This application can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc., that perform a specific task or implement a specific abstract data type. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0078] As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware platforms. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, mobile terminal, server, or network device, etc.) to execute the methods described in various embodiments or some parts of the embodiments of this application.

[0079] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to interchangeably. Each embodiment focuses on its differences from other embodiments. This application can be used in numerous general-purpose or special-purpose computer system environments or configurations. Examples include: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices, etc.

[0080] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of this application. It should be understood that the above descriptions are merely specific embodiments of this application and are not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.

Claims

1. A database monitoring and maintenance method based on time-series data compression, characterized in that the method includes the following steps: (1) Data collection: collect relevant data from the database, including performance metrics, database activity and log information; (2) Database operation and maintenance management: compress the collected time-series data, analyze the database operation status in real time, and promptly identify potential problems and issue early warnings; (3) Display the operation and maintenance status: display the database operation status to users in a visual form; in step (2), when the data storage and management submodule compresses the timestamps of time-series data, it includes the following steps: (2.1) Input the raw time-series data $T=[t_1,t_2,t_3,\ldots,t_n,\ldots,t_N]$ and calculate the difference between adjacent data points, where N is the number of data points, to obtain the first-order difference sequence $\Delta T=[\delta_1,\delta_2,\ldots,\delta_{N-1}]$, where $\delta_i=t_{i+1}-t_i$, $i=1,2,\ldots,N-1$; (2.2) Calculate $S_{relation}$ [formula missing in source], in which the mean term represents the sample mean of the time series; (2.3) If $S_{relation}=1$, the first-order difference sequence $\Delta T$ is a constant difference sequence, and the initial value $t_1$, the difference $\delta_1$ of the first-order difference sequence, and the number of sequences N are recorded and stored; (2.4) If $S_{relation}\neq 1$, the first-order difference sequence $\Delta T$ is a non-constant difference sequence, and $\Delta_{adaptive}$ is used as the encoding threshold: when $|\delta_i|\geq\Delta_{adaptive}$, record $\delta_i$; when $|\delta_i|<\Delta_{adaptive}$, ignore $\delta_i$, where [formula for $\Delta_{adaptive}$ missing in source], its terms include the standard deviation of $\Delta T$, and α and β are weight parameters; the adjacent differences of the first-order difference sequence are calculated to obtain the second-order difference sequence $\Delta T_2=[\gamma_1,\gamma_2,\ldots,\gamma_{n-2}]$, where $\gamma_i=\delta_{i+1}-\delta_i$, $i=1,2,\ldots,n-2$; Huffman coding is performed on the second-order difference sequence.

2. A database monitoring and maintenance system based on time-series data compression, characterized in that the system includes the following modules: (1) Data acquisition module: this module collects relevant data from the database, including performance indicators, database activity and log information; (2) Database operation and maintenance management module: this module consists of a data storage and management submodule, a data decompression submodule, and a real-time analysis and early warning submodule; it compresses the collected time-series data, analyzes the database operation status in real time, and promptly detects potential problems and issues early warnings; (3) Operation and maintenance status display module: this module displays the database's operating status to the user in a visual form; in the database operation and maintenance management module, when the data storage and management submodule compresses the timestamps of time-series data, it includes the following steps: (2.1) Input the raw time-series data $T=[t_1,t_2,t_3,\ldots,t_n,\ldots,t_N]$ and calculate the difference between adjacent data points, where N is the number of data points, to obtain the first-order difference sequence $\Delta T=[\delta_1,\delta_2,\ldots,\delta_{N-1}]$, where $\delta_i=t_{i+1}-t_i$, $i=1,2,\ldots,N-1$; (2.2) Calculate $S_{relation}$ [formula missing in source], in which the mean term represents the sample mean of the time series; (2.3) If $S_{relation}=1$, the first-order difference sequence $\Delta T$ is a constant difference sequence, and the initial value $t_1$, the difference $\delta_1$ of the first-order difference sequence, and the number of sequences N are recorded and stored; (2.4) If $S_{relation}\neq 1$, the first-order difference sequence $\Delta T$ is a non-constant difference sequence, and $\Delta_{adaptive}$ is used as the encoding threshold: when $|\delta_i|\geq\Delta_{adaptive}$, record $\delta_i$; when $|\delta_i|<\Delta_{adaptive}$, ignore $\delta_i$, where [formula for $\Delta_{adaptive}$ missing in source], its terms include the standard deviation of $\Delta T$, and α and β are weight parameters; the adjacent differences of the first-order difference sequence are calculated to obtain the second-order difference sequence $\Delta T_2=[\gamma_1,\gamma_2,\ldots,\gamma_{n-2}]$, where $\gamma_i=\delta_{i+1}-\delta_i$, $i=1,2,\ldots,n-2$; Huffman coding is performed on the second-order difference sequence.