System and method for accelerating reading compressed files based on a virtual file system

By introducing AVFS and the unified ZFS file interface into the ZNBase database, compressed file data can be read directly. This reduces the high decompression overhead ZNBase incurs when processing time-series data, improves system performance and processing capability, and supports multiple compression algorithms with parallel execution.

CN115757284B · Active · Publication Date: 2026-06-23 · 上海沄熹科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
上海沄熹科技有限公司
Filing Date
2022-10-24
Publication Date
2026-06-23

Smart Images

  • Figure CN115757284B_ABST

Abstract

The application discloses a system and method for accelerating the reading of compressed files based on A Virtual File System (AVFS), and belongs to the technical field of time-series databases. The technical problem to be solved is how to implement column-format compressed storage and reading of time-series data, reduce the I/O, memory and CPU consumption caused by decompression, and improve the system's ability to process time-series data. The technical solution is as follows. In the time-series engine of the open database ZNBase, the execution engine generates an execution plan from SQL semantic analysis, calls the ZNBase storage-layer interface, and queries precisely the required time-series historical-partition compressed data. The storage layer then executes the query quickly using the compressed-query interfaces provided by AVFS, accelerating database queries. The specific steps are: pre-parsing; compressed-file pre-reading; and reading compressed data.

Description

Technical Field

[0001] This invention relates to the field of time-series database technology, specifically a system and method for accelerating the reading of compressed files based on a Virtual File System.

Background Technology

[0002] The rapid development of the digital industry and the deep integration of new infrastructure have provided a favorable environment for the accelerated development of the database industry. In the era of the Industrial Internet, the scale of data is exploding, and data storage structures are becoming increasingly flexible and diverse, driving the continuous evolution of database technology. In the existing business database ZNBase, the storage engine is based on key-value storage and is primarily oriented towards transaction processing, i.e., OLTP (Online Transaction Processing) services. As Industrial Internet applications become more widespread, large amounts of time-series data will be generated. Achieving column-format compressed storage and retrieval of time-series data, reducing the I/O, memory and CPU consumption caused by decompression, and improving the system's ability to process time-series data has therefore become a pressing technical problem.

Summary of the Invention

[0003] The technical objective of this invention is to provide a system and method for accelerating the reading of compressed files based on a Virtual File System, in order to solve the problem of how to achieve column-format compressed storage and reading of time-series data, reduce the I/O, memory and CPU consumption caused by decompression, and improve the system's ability to process time-series data.

[0004] The technical objective of this invention is achieved as follows: a system for accelerating the reading of compressed files based on a Virtual File System (AVFS). This system includes a ZNBase execution layer and a ZNBase storage layer. To accelerate SQL queries by reading compressed files through AVFS, the ZNBase execution layer parses and analyzes the SQL statement and passes the table and partition range to be queried to the ZNBase storage layer by calling the storage layer's query interface. Using the metadata passed from the execution layer, the ZNBase storage layer locates the table and the corresponding partition files, computes the element at which the read starts, calls the ZFS Read operation to read the relevant data, converts the column-format data to row format, and returns the result to the ZNBase execution layer.
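The storage layer's final step converts column-format data into row format, beginning at the computed start position. That step can be sketched as a simple transpose. This is a minimal sketch under stated assumptions: the names `ColumnsToRows`, `Column` and `Row`, the use of `int64_t` for every column, and the `start`/`limit` parameters are hypothetical and are not taken from ZNBase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical columnar batch: one vector per column, all of equal length.
// Every column holds int64_t values only to keep the sketch short.
using Column = std::vector<std::int64_t>;
using Row = std::vector<std::int64_t>;

// Transpose column-format data into rows, beginning at element `start`
// (the computed start position of the read) and returning at most `limit` rows.
std::vector<Row> ColumnsToRows(const std::vector<Column>& columns,
                               std::size_t start, std::size_t limit) {
  std::vector<Row> rows;
  if (columns.empty()) return rows;
  const std::size_t n = columns.front().size();
  for (const Column& c : columns) {
    if (c.size() != n) throw std::invalid_argument("columns differ in length");
  }
  for (std::size_t i = start; i < n && rows.size() < limit; ++i) {
    Row row;
    row.reserve(columns.size());
    for (const Column& c : columns) row.push_back(c[i]);  // i-th element of each column
    rows.push_back(std::move(row));
  }
  return rows;
}
```

For example, the columns `{100, 200, 300}` (timestamps) and `{1, 2, 3}` (values) read from start position 1 yield the rows `{200, 2}` and `{300, 3}`.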

[0005] As a preferred approach, AVFS defines a series of file read/write interfaces that can read compressed file data directly, without first decompressing the file, and return it in the same format as the decompressed data.

[0006] As a preferred option, ZFS is a unified encapsulation interface for reading and writing underlying files in the ZNBase storage layer, and it encapsulates the read and write operations of compressed and uncompressed files into a unified interface;

[0007] The structure of ZFS is defined using pure virtual functions in C++.
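The sketch below shows one possible shape for a C++ interface built from pure virtual functions. The method names (`Open`, `Seek`, `Read`, `Close`) and the stdio-backed `PlainFile` implementation are assumptions for illustration only, since the patent's own ZFS definition is not reproduced in this text. A compressed-file implementation would derive from the same base class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical ZFS interface: every operation is a pure virtual function,
// so callers depend only on this abstract type.
class ZFS {
 public:
  virtual ~ZFS() = default;
  virtual bool Open(const std::string& path) = 0;
  virtual long Seek(long offset) = 0;                      // absolute; new position or -1
  virtual std::size_t Read(char* buf, std::size_t n) = 0;  // bytes read; 0 at end of file
  virtual void Close() = 0;
};

// Uncompressed-file implementation backed by C stdio.
class PlainFile : public ZFS {
 public:
  ~PlainFile() override { Close(); }
  bool Open(const std::string& path) override {
    Close();
    fp_ = std::fopen(path.c_str(), "rb");
    return fp_ != nullptr;
  }
  long Seek(long offset) override {
    if (fp_ == nullptr || std::fseek(fp_, offset, SEEK_SET) != 0) return -1;
    return std::ftell(fp_);
  }
  std::size_t Read(char* buf, std::size_t n) override {
    return fp_ == nullptr ? 0 : std::fread(buf, 1, n, fp_);
  }
  void Close() override {
    if (fp_ != nullptr) {
      std::fclose(fp_);
      fp_ = nullptr;
    }
  }

 private:
  std::FILE* fp_ = nullptr;
};
```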

[0008] More preferably, AVFS defines multiple file interfaces, including virt_open, virt_lseek and virt_read, which are fully compatible with the operating system's open, lseek and read system calls. It also supports files compressed or archived with various common formats such as zip, gzip, tar and rar.
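The `virt_*` functions are described as taking the same parameters as `open`, `lseek` and `read`. A caller can therefore hold these operations in a table of function pointers and switch backends without changing call sites. The sketch below fills that table with plain POSIX calls only. `FileOps`, `PosixOpen`, `kPosixOps` and `ReadAt` are hypothetical names. An AVFS backend is not linked here; it would be assumed to slot in with `virt_open`, `virt_lseek`, `virt_read` and a matching close function. The wrapper around `open` is needed because POSIX `open` is variadic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Function-pointer types with the POSIX open/lseek/read/close parameter lists.
using OpenFn = int (*)(const char*, int, mode_t);
using LseekFn = off_t (*)(int, off_t, int);
using ReadFn = ssize_t (*)(int, void*, size_t);
using CloseFn = int (*)(int);

// One table per backend; call sites only ever go through this struct.
struct FileOps {
  OpenFn open;
  LseekFn lseek;
  ReadFn read;
  CloseFn close;
};

// POSIX open() is variadic, so wrap it to get a fixed three-argument signature.
int PosixOpen(const char* path, int flags, mode_t mode) {
  return ::open(path, flags, mode);
}

// Plain POSIX backend. An AVFS backend (assumption, not linked here) would
// fill the same slots with the corresponding virt_* functions.
const FileOps kPosixOps = {PosixOpen, ::lseek, ::read, ::close};

// Read up to n bytes starting at `offset` through the given backend.
// Returns the number of bytes read, or -1 if open or seek fails.
ssize_t ReadAt(const FileOps& ops, const char* path, off_t offset,
               char* buf, size_t n) {
  const int fd = ops.open(path, O_RDONLY, 0);
  if (fd < 0) return -1;
  ssize_t total = -1;
  if (ops.lseek(fd, offset, SEEK_SET) == offset) {
    total = 0;
    while (static_cast<size_t>(total) < n) {
      const ssize_t got = ops.read(fd, buf + total, n - static_cast<size_t>(total));
      if (got <= 0) break;  // stop at end of file or on error
      total += got;
    }
  }
  ops.close(fd);
  return total;
}
```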

[0009] A method for accelerating the reading of compressed files based on a Virtual File System (AVFS) is proposed. In this method, the execution engine of the ZNBase time-series database generates an execution plan based on SQL semantic analysis, calls the ZNBase storage layer interface, and retrieves precisely the required time-series historical-partition compressed data. The storage layer uses the compressed-query interfaces provided by AVFS to execute queries rapidly within the storage layer, thus accelerating database queries. The details are as follows:

[0010] Pre-parsing;

[0011] Pre-reading of compressed files;

[0012] Read compressed data.

[0013] As a preferred option, the pre-parsing is as follows:

[0014] The filename required by the open-file interface provided by AVFS is generated according to the AVFS filename format requirements;

[0015] The parse function calls open and/or read to read the relevant metadata of the compressed file and stores it in the corresponding in-memory data structure as the basis for subsequent data reading.
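The sketch below illustrates pre-parsing. It builds an AVFS-style path and extracts a small amount of metadata from a gzip file held in memory. It assumes the AVFS convention of appending `#` to a file name to address its decoded contents. Gzip is used only because its layout is standardized in RFC 1952: the magic bytes are `1f 8b`, and the last four bytes of a single-member file hold ISIZE, the uncompressed length modulo 2^32. `MakeAvfsPath`, `CompressedMeta` and `ParseGzipMeta` are hypothetical names, not ZNBase or AVFS functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, assuming the AVFS convention that "file.gz#"
// addresses the decoded contents of "file.gz".
std::string MakeAvfsPath(const std::string& path) { return path + "#"; }

// Metadata kept in memory after pre-parsing (fields are illustrative).
struct CompressedMeta {
  std::uint64_t compressed_size;
  std::uint32_t uncompressed_size_mod_2_32;  // gzip ISIZE field
};

// Pre-parse a single-member gzip file held in `bytes`: check the magic
// number, then read ISIZE, stored little-endian in the last 4 bytes.
std::optional<CompressedMeta> ParseGzipMeta(const std::vector<std::uint8_t>& bytes) {
  // 10-byte header + 8-byte trailer is a lower bound on a gzip member.
  if (bytes.size() < 18 || bytes[0] != 0x1f || bytes[1] != 0x8b) return std::nullopt;
  const std::size_t t = bytes.size() - 4;
  std::uint32_t isize = 0;
  for (int i = 3; i >= 0; --i) isize = (isize << 8) | bytes[t + i];
  return CompressedMeta{bytes.size(), isize};
}
```

Keeping this metadata in memory lets later reads check requested offsets against the known uncompressed length without decompressing the file again.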

[0016] As a preferred method, the pre-reading of compressed files is as follows:

[0017] Based on the offset specified by Read, read the compressed data sequentially from the beginning of the compressed file and decompress the data into memory;

[0018] Determine if the total length of the decompressed data exceeds the offset:

[0019] If not, continue reading compressed data until the total length of the decompressed data exceeds the offset;

[0020] If so, copy the decompressed data that lies beyond the offset from memory to the result set.
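The sketch below shows the pre-reading loop with decompression abstracted away. `ChunkSource` is a hypothetical stand-in that replays already-decompressed chunks; a real implementation would decompress the next piece of the file on each call. The function keeps a running total of decompressed bytes. Once that total exceeds the requested offset, it returns only the tail of the chunk that straddles the offset.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for "read the next piece of compressed data and
// decompress it into memory"; here it simply replays prepared chunks.
class ChunkSource {
 public:
  explicit ChunkSource(std::vector<std::string> chunks) : chunks_(std::move(chunks)) {}
  bool Next(std::string& out) {
    if (pos_ == chunks_.size()) return false;  // end of compressed file
    out = chunks_[pos_++];
    return true;
  }

 private:
  std::vector<std::string> chunks_;
  std::size_t pos_ = 0;
};

// Decompress sequentially from the start until the total decompressed
// length exceeds `offset`, then return the bytes beyond `offset` (the data
// copied into the result set). Returns an empty string if the file ends first.
std::string PreReadToOffset(ChunkSource& src, std::size_t offset) {
  std::size_t total = 0;  // decompressed bytes seen so far
  std::string chunk;
  while (src.Next(chunk)) {
    total += chunk.size();
    if (total > offset) {
      // This chunk straddles the offset: keep only its tail.
      const std::size_t skip = chunk.size() - (total - offset);
      return chunk.substr(skip);
    }
  }
  return {};
}
```

With chunks `"abc"`, `"defg"`, `"hi"` and offset 5, the total first exceeds 5 after `"defg"`, so the function returns `"fg"`, the bytes at decompressed positions 5 and 6.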

[0021] As a preferred method, the compressed data is read as follows:

[0022] Copy the remaining data pre-read from the compressed file to the result set;

[0023] Determine whether the requested number of bytes has been read or the end of the file has been reached:

[0024] If the number of bytes to be read has been met or the end of the compressed file has been reached, the result set is returned directly.

[0025] If not, continue reading and decompressing the compressed file until the conditions are met and a result set is returned.
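The read step that follows can be sketched in the same spirit. It starts from the bytes left over by the pre-read and keeps pulling decompressed chunks until the requested length is met or the file ends. `NextChunk` and `ReadCompressed` are hypothetical names. The chunk source is again an assumption standing in for real decompression.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source: each call decompresses the next piece of the file
// into `out` and returns false once the end of the file is reached.
using NextChunk = std::function<bool(std::string& out)>;

// Copy the pre-read remainder into the result set, then keep decompressing
// until `n` bytes are collected or the file ends.
std::string ReadCompressed(std::string leftover, const NextChunk& next, std::size_t n) {
  std::string result = std::move(leftover);  // remainder from the pre-read
  std::string chunk;
  while (result.size() < n && next(chunk)) result += chunk;
  if (result.size() > n) result.resize(n);  // trim to the requested length
  return result;
}
```

Trimming the final chunk discards its surplus bytes; a real reader would keep them as the pre-read remainder for the next call.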

[0026] More preferably, when storing time-series data, the time-series engine compresses records in a columnar storage format and stores them in data files. At the same time, it partitions and compresses the data files by time period (the compression algorithm is selectable) to generate compressed files of historical partitions.
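Partitioning by time period maps each timestamp to one historical partition. A query's time range then maps to a contiguous run of partitions that the execution layer can pass to the storage layer. The sketch assumes fixed-length periods and floor division. `PartitionOf`, `PartitionsForRange` and the period length are hypothetical, since the text does not specify ZNBase's partitioning rule.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical rule: partition id = floor(timestamp / period), with
// timestamps in seconds since the epoch and a fixed period (e.g. one day).
std::int64_t PartitionOf(std::int64_t ts, std::int64_t period) {
  if (period <= 0) throw std::invalid_argument("period must be positive");
  std::int64_t q = ts / period;
  if (ts % period != 0 && ts < 0) --q;  // C++ division truncates; adjust to floor
  return q;
}

// Partition ids covering the closed time range [from, to].
std::vector<std::int64_t> PartitionsForRange(std::int64_t from, std::int64_t to,
                                             std::int64_t period) {
  std::vector<std::int64_t> ids;
  if (from > to) return ids;
  const std::int64_t last = PartitionOf(to, period);
  for (std::int64_t p = PartitionOf(from, period); p <= last; ++p) ids.push_back(p);
  return ids;
}
```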

[0027] When a query requires reading historical partition compressed data files, instead of decompressing the entire file, the compressed file is read directly through the compression query interface provided by AVFS. This reduces the decompression time of compressed files and reduces I/O, thereby improving processing efficiency.

[0028] A computer-readable storage medium storing a computer program that can be executed by a processor to implement the method described above for accelerating the reading of compressed files based on a Virtual File System.

[0029] The system and method for accelerating the reading of compressed files based on a Virtual File System, as described in this invention, have the following advantages:

[0030] (i) This invention improves query performance by adding a time-series engine to the service database storage engine to achieve column-format compression storage and retrieval of time-series data types;

[0031] (ii) The ZNBase execution layer of the present invention reads relevant data from different partitions by calling the ZNBase storage layer interface through the execution plan, thereby reducing the IO, memory and CPU consumption caused by decompression, supporting parallel execution and ultimately improving the system's ability to process time-series data;

[0032] (iii) The ZNBase execution layer of this invention parses the input SQL statement and calls the storage layer interface to read the data according to the data range to be queried by the SQL. The storage layer reads compressed files based on AVFS, which accelerates the query of SQL statements. The specific advantages of using AVFS are as follows:

[0033] ① AVFS defines interfaces such as virt_open, virt_lseek and virt_read, which have the same parameters as the system calls open, lseek and read. Compressed files can be read directly through these interfaces, and the read results have the same format as the decompressed data.

[0034] ② The relevant interfaces can support concurrent execution, and can make full use of the advantages of modern CPUs to perform multi-threaded parallel execution;

[0035] ③ Supports multiple compressed file formats, including zip, gzip, tar, rar and other commonly used compression algorithms;
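The sketch below reads several historical partitions concurrently with `std::async`, one task per partition, and collects the results in partition order. `PartitionReader` and `ReadPartitionsParallel` are hypothetical names. The reader function stands in for a storage-layer read of one partition file through ZFS and AVFS, which is assumed to be safe to call from multiple threads.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-partition reader (stand-in for a ZFS/AVFS partition read).
using PartitionReader = std::string (*)(int partition);

// Launch one asynchronous task per partition, then join them in order so
// the output lines up with the input partition list.
std::vector<std::string> ReadPartitionsParallel(const std::vector<int>& partitions,
                                                PartitionReader read) {
  std::vector<std::future<std::string>> tasks;
  tasks.reserve(partitions.size());
  for (int p : partitions) tasks.push_back(std::async(std::launch::async, read, p));
  std::vector<std::string> out;
  out.reserve(tasks.size());
  for (auto& t : tasks) out.push_back(t.get());  // waits for each task
  return out;
}
```

Joining in submission order keeps results deterministic even though the reads themselves may finish in any order.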

[0036] (iv) AVFS defines a series of file read and write interfaces that read compressed files directly and return decompressed data, without a separate decompression step. This makes full use of memory and CPU for parallel processing while greatly reducing I/O and improving system performance. AVFS also supports files compressed with various algorithms, so it can serve applications in many different scenarios.

[0037] (v) The ZNBase storage layer uses a unified ZFS interface to implement a unified read and write interface, which solves the problem of interface redundancy caused by the difference between reading compressed and uncompressed files at the underlying level. It is very user-friendly for interface callers, shields the complex internal implementation, and simplifies the complexity of the system. In subsequent development, it is not necessary to consider whether the underlying file being operated on is a compressed or uncompressed file.

[0038] (vi) This invention shields the storage layer from the interface differences caused by the distinction between reading and writing compressed and uncompressed files, thus simplifying the interface complexity.

Attached Figure Description

[0039] The invention will be further described below with reference to the accompanying drawings.

[0040] Figure 1 is a schematic diagram of the structure of a system for accelerating the reading of compressed files based on a Virtual File System;

[0041] Figure 2 is a flowchart of the method for accelerating the reading of compressed files based on a Virtual File System.

Detailed Implementation

[0042] The system and method for accelerating the reading of compressed files based on a Virtual File System according to the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

[0043] Example 1:

[0044] As shown in Figure 1, this embodiment provides a system for accelerating the reading of compressed files based on a Virtual File System. The system includes a ZNBase execution layer and a ZNBase storage layer. To accelerate SQL queries by reading compressed files through AVFS (A Virtual File System), the ZNBase execution layer parses and analyzes the SQL statement and, by calling the ZNBase storage layer query interface, passes the table and partition range to be queried to the ZNBase storage layer. Using the metadata passed from the execution layer, the ZNBase storage layer locates the table and the corresponding partition files, computes the element at which the read starts, calls the ZFS Read operation to read the relevant data, converts the column-format data into row format, and returns the result to the ZNBase execution layer.

[0045] In this embodiment, AVFS defines a series of file read and write interfaces that can directly read compressed file data without decompressing the compressed file and return the same data format as after decompression.

[0046] In this embodiment, ZFS is a unified encapsulation interface for reading and writing underlying files in the ZNBase storage layer. The key code is as follows:

[0047]

[0048] ZFS encapsulates the read and write operations of compressed and uncompressed files into a unified interface. The structure of ZFS is defined using pure virtual functions in C++. Its advantages are that it shields the storage layer from the interface differences caused by the difference between reading and writing compressed and uncompressed files, thus simplifying the interface complexity.
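The benefit described above is that callers never branch on whether a file is compressed. It can be sketched with a factory that returns a `ZFS` implementation behind one pure virtual interface. The class names, the `.rle` suffix test and the toy run-length format are all assumptions. Run-length encoding is used only as a self-contained stand-in for a real codec reached through AVFS, and both implementations work on in-memory bytes to keep the example runnable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical unified interface: Read always yields *uncompressed* bytes.
class ZFS {
 public:
  virtual ~ZFS() = default;
  virtual std::size_t Read(std::size_t offset, std::size_t n, std::string& out) = 0;
};

// Uncompressed file: bytes are returned as stored.
class PlainZFS : public ZFS {
 public:
  explicit PlainZFS(std::string data) : data_(std::move(data)) {}
  std::size_t Read(std::size_t offset, std::size_t n, std::string& out) override {
    out = offset < data_.size() ? data_.substr(offset, n) : std::string();
    return out.size();
  }

 private:
  std::string data_;
};

// Toy "compressed" file of (count, byte) pairs. Reads walk the runs and
// emit decompressed bytes without materializing the whole file.
class RleZFS : public ZFS {
 public:
  explicit RleZFS(std::string data) : data_(std::move(data)) {}
  std::size_t Read(std::size_t offset, std::size_t n, std::string& out) override {
    out.clear();
    std::size_t pos = 0;  // decompressed position where the current run starts
    for (std::size_t i = 0; i + 1 < data_.size() && out.size() < n; i += 2) {
      const std::size_t end = pos + static_cast<unsigned char>(data_[i]);
      if (end > offset) {
        const std::size_t from = std::max(pos, offset);
        out.append(std::min(end - from, n - out.size()), data_[i + 1]);
      }
      pos = end;
    }
    return out.size();
  }

 private:
  std::string data_;
};

// The factory is the only place that knows whether a file is compressed.
std::unique_ptr<ZFS> OpenZFS(const std::string& name, std::string bytes) {
  const std::string ext = ".rle";
  const bool compressed = name.size() >= ext.size() &&
                          name.compare(name.size() - ext.size(), ext.size(), ext) == 0;
  if (compressed) return std::make_unique<RleZFS>(std::move(bytes));
  return std::make_unique<PlainZFS>(std::move(bytes));
}
```

A caller that holds a `std::unique_ptr<ZFS>` issues the same `Read` call for both files and receives the same bytes.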

[0049] In this embodiment, AVFS defines multiple file interfaces, including `virt_open`, `virt_lseek` and `virt_read`, which are fully compatible with the operating system's `open`, `lseek` and `read` system calls. It also supports files compressed or archived with various common formats such as zip, gzip, tar and rar. AVFS wraps the POSIX `open`, `lseek` and `read` interfaces so that a compressed file can be read directly as if it were uncompressed. From a system-architecture perspective, AVFS reads decompressed data directly from compressed files, which simplifies the system architecture, improves parallelism and enhances overall system performance.

[0050] Example 2:

[0051] As shown in Figure 2, this embodiment provides a method for accelerating the reading of compressed files based on a Virtual File System. In this method, the execution engine in the ZNBase time-series database generates an execution plan based on SQL semantic analysis, calls the ZNBase storage layer interface, and queries precisely the required time-series historical-partition compressed data. The storage layer uses the compressed-query interfaces provided by AVFS to execute queries quickly within the storage layer, accelerating database queries. Specifically:

[0052] S1. Generate the filename required by the open-file interface provided by AVFS, according to the AVFS filename format requirements. The parse function then calls open and/or read to read the relevant metadata of the compressed file and stores it in the corresponding in-memory data structure as the basis for subsequent data reading;

[0053] S2. Based on the offset specified by Read, read the compressed data sequentially from the beginning of the compressed file and decompress the data into memory;

[0054] S3. Determine if the total length of the decompressed data exceeds the offset:

[0055] ① If not, continue reading the compressed data until the total length of the decompressed data exceeds the offset;

[0056] ② If so, proceed to step S4;

[0057] S4. Copy the remaining data pre-read from the compressed file to the result set;

[0058] S5. Determine whether reading has been completed or the end of the file has been reached:

[0059] ① If the number of bytes to be read has been met or the end of the compressed file has been reached, then proceed to step S6;

[0060] ② If not, continue reading and decompressing the compressed file data until the conditions are met and a result set is returned;

[0061] S6. Directly return the result set.

[0062] In this embodiment, when storing time-series data, the time-series engine compresses the records in a columnar storage format and stores them in a data file. At the same time, it partitions and compresses the data files by time period (the compression algorithm is selectable) to generate compressed files of historical partitions.

[0063] When a query requires reading historical partition compressed data files, instead of decompressing the entire file, the compressed file is read directly through the compression query interface provided by AVFS. This reduces the decompression time of compressed files and reduces I/O, thereby improving processing efficiency.

[0064] Example 3:

[0065] This embodiment also provides a computer-readable storage medium storing multiple instructions, which are loaded by a processor to cause the processor to execute the method for accelerating the reading of compressed files based on a Virtual File System according to any embodiment of the present invention. Specifically, a system or apparatus equipped with a storage medium may be provided, on which software program code implementing the functions of any of the above embodiments is stored, and the computer (or CPU or MPU) of the system or apparatus may read and execute the program code stored in the storage medium.

[0066] In this case, the program code read from the storage medium can itself implement the function of any of the above embodiments, and therefore the program code and the storage medium storing the program code constitute part of the present invention.

[0067] Storage media embodiments for providing program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, program code can be downloaded from a server computer via a communication network.

[0068] Furthermore, it should be clear that not only can the program code read by the computer be executed, but also the operating system or other components operating on the computer can be instructed based on the program code to perform some or all of the actual operations, thereby realizing the function of any of the embodiments described above.

[0069] Furthermore, it is understood that the program code read from the storage medium may be written to memory provided on an expansion board inserted into the computer or in an expansion unit connected to the computer. Then, based on the instructions of the program code, the CPU or other components installed on the expansion board or expansion unit execute some or all of the actual operations, thereby realizing the function of any of the embodiments described above.

[0070] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A system for accelerating the reading of compressed files based on a Virtual File System, characterized in that the system includes a ZNBase execution layer and a ZNBase storage layer. The ZNBase execution layer reads compressed files based on AVFS to accelerate SQL querying, and parses and analyzes the SQL statements. By calling the query interface of the ZNBase storage layer, the table and partition range to be queried are passed to the ZNBase storage layer. The ZNBase storage layer obtains the table and corresponding partition file to be queried through the metadata passed by the execution layer, calculates the element at the starting position of the read, calls the ZFS Read operation, reads the relevant data, converts the column-format data into row format, and returns the result to the ZNBase execution layer. AVFS defines a series of file read and write interfaces, which can read compressed file data directly, without decompressing the compressed file, and return the data in the same format as after decompression. At the same time, AVFS supports compressed files with multiple compression algorithms and can support applications in a variety of different scenarios. ZFS is a unified encapsulation interface for reading and writing underlying files in the ZNBase storage layer, and it encapsulates the read and write operations of compressed and uncompressed files into a unified interface.

2. The system for accelerating the reading of compressed files based on a Virtual File System according to claim 1, characterized in that the structure of ZFS is defined using pure virtual functions in C++.

3. The system for accelerating the reading of compressed files based on a Virtual File System according to claim 1 or 2, characterized in that AVFS defines multiple interfaces for files, including virt_open, virt_lseek and virt_read, and is fully compatible with the operating system's open, lseek and read system calls. It also supports files compressed or archived with various common formats such as zip, gzip, tar and rar.

4. A method for accelerating the reading of compressed files based on a Virtual File System, characterized in that this method uses the time-series engine of the ZNBase database. The execution engine generates an execution plan based on SQL semantic analysis, calls the ZNBase storage layer interface, and retrieves precisely the required time-series historical-partition compressed data. The storage layer uses the compressed-query interfaces provided by AVFS to execute queries rapidly, accelerating database queries. Specifically: pre-parsing; pre-reading of compressed files; reading compressed data. AVFS defines a series of file read and write interfaces, which can read compressed file data directly, without decompressing the compressed file, and return the data in the same format as after decompression. At the same time, AVFS supports compressed files with multiple compression algorithms and can support applications in a variety of different scenarios. ZFS is a unified encapsulation interface for reading and writing underlying files in the ZNBase storage layer, and it encapsulates the read and write operations of compressed and uncompressed files into a unified interface.

5. The method for accelerating the reading of compressed files based on a Virtual File System according to claim 4, characterized in that the pre-parsing is as follows: the filename required by the AVFS-provided open-file interface is generated according to the AVFS filename format requirements; the `parse` function calls `open` and/or `read` to read the relevant metadata of the compressed file and stores it in the corresponding in-memory data structure as the basis for subsequent data reading.

6. The method for accelerating the reading of compressed files based on a Virtual File System according to claim 4, characterized in that the pre-reading of compressed files is as follows: based on the offset specified by Read, read the compressed data sequentially from the beginning of the compressed file and decompress the data into memory; determine whether the total length of the decompressed data exceeds the offset: if not, continue reading compressed data until the total length of the decompressed data exceeds the offset; if so, copy the remaining decompressed data from memory to the result set.

7. The method for accelerating the reading of compressed files based on a Virtual File System according to claim 4, characterized in that reading compressed data comprises the following steps: copy the remaining data pre-read from the compressed file to the result set; determine whether the requested number of bytes has been read or the end of the file has been reached: if the number of bytes to be read has been met or the end of the compressed file has been reached, the result set is returned directly; if not, continue reading and decompressing the compressed file until the conditions are met and a result set is returned.

8. The method for accelerating the reading of compressed files based on a Virtual File System according to any one of claims 4 to 7, characterized in that, when storing time-series data, the time-series engine compresses records in a columnar storage format and stores them in data files. At the same time, it partitions and compresses the data files by time period to generate compressed files of historical partitions. When a query requires reading historical partition compressed data files, the compressed files can be read directly through the compression query interface provided by AVFS.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that can be executed by a processor to implement the method for accelerating the reading of compressed files based on a Virtual File System according to any one of claims 4 to 8.