Data compression
By employing a two-stage data compression method that combines software and hardware technologies, the problem of low storage efficiency in existing data compression schemes has been solved, achieving high compression ratios and high-performance data storage.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- CLOUD INTELLIGENCE ASSETS HOLDING (SINGAPORE) PTE LTD
- Filing Date
- 2025-12-02
- Publication Date
- 2026-07-02
AI Technical Summary
Existing data compression schemes suffer from write amplification, space amplification, and read amplification issues, resulting in low storage efficiency and an inability to achieve high compression ratios for data storage while ensuring performance.
A two-stage data compression method is adopted. First, the business data is compressed according to the preset storage sector size, and then the padding data is further compressed at the hardware level. Compact storage is achieved by using a variable-length FTL layer.
It improves the data compression ratio, reduces read/write amplification and space waste, and provides a high-performance data storage solution.
Description
Data compression
Technical Field
[0001] This disclosure relates to the field of database technology, and in particular to data compression.
Background Technology
[0002] With the development of computer technology, the amount of data generated is becoming increasingly massive. Storing such large amounts of data inevitably consumes a significant amount of storage space, leading to high storage costs. In the field of data storage, performance and cost are the core competitive advantages. How to achieve high data compression ratios and reduce storage space while ensuring high-performance data processing has become a pressing technical problem for engineers.
Summary of the Invention
[0003] In view of this, embodiments of this disclosure provide a data compression method. One or more embodiments of this disclosure also relate to a data compression apparatus, an electronic device, a computer-readable storage medium, and a computer program product, to address the technical deficiencies existing in the prior art.
[0004] According to a first aspect of the present disclosure, a data compression method is provided, applied to a smart storage device, comprising: acquiring at least one piece of business data to be processed; compressing each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; compressing fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill free space in the storage sector.
[0005] According to a second aspect of the present disclosure, a data compression apparatus is provided, applied to a smart storage device, comprising: an acquisition module configured to acquire at least one piece of business data to be processed; a first compression module configured to compress each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; and a second compression module configured to compress fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill free space in the storage sector.
[0006] According to a third aspect of the present disclosure, an electronic device is provided, comprising: a memory and a processor; the memory is used to store computer programs / instructions, and the processor is used to execute the computer programs / instructions, wherein the computer programs / instructions, when executed by the processor, implement the steps of the above-described method.
[0007] According to a fourth aspect of the present disclosure, a computer-readable storage medium is provided that stores a computer program / instructions that, when executed by a processor, implement the steps of the above-described method.
[0008] According to a fifth aspect of the present disclosure, a computer program product is provided, including a computer program / instructions that, when executed by a processor, implement the steps of the above-described method.
[0009] According to a sixth aspect of the present disclosure, a database product is provided, including a computer program / instructions that, when executed by a processor, perform the following steps: acquiring at least one piece of business data to be processed; compressing each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; compressing fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill free space in the storage sector.
[0010] The data compression method provided in this disclosure compresses the original business data at a preset storage sector size granularity, aligns the compressed business data at the storage sector granularity, and fills any empty spaces with padding data. During the hardware compression stage, the padding data in each storage sector is compressed, improving the data compression ratio.
Attached Figure Description
[0011] Figure 1 is a flowchart of a data compression method provided in an embodiment of this disclosure.
[0012] Figure 2 is a schematic diagram of data compression provided in one embodiment of this disclosure.
[0013] Figure 3 is a schematic diagram of hardware compression provided in an embodiment of this disclosure.
[0014] Figure 4 is a schematic diagram of space reuse provided in one embodiment of this disclosure.
[0015] Figure 5 is a schematic diagram of data compression of a data compression method provided in an embodiment of this disclosure.
[0016] Figure 6 is a schematic diagram of the structure of a data compression device provided in an embodiment of this disclosure.
[0017] Figure 7 is a structural block diagram of an electronic device provided in an embodiment of this disclosure.
Detailed Implementation
[0018] Numerous specific details are set forth in the following description to provide a full understanding of this disclosure. However, this disclosure can be implemented in many other ways than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of this disclosure. Therefore, this disclosure is not limited to the specific implementations disclosed below.
[0019] The terminology used in one or more embodiments of this disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of this disclosure. The singular forms “a,” “an,” and “the” as used in one or more embodiments of this disclosure and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used in one or more embodiments of this disclosure refers to and includes any or all possible combinations of one or more associated listed items.
[0020] It should be understood that although the terms first, second, etc., may be used to describe various information in one or more embodiments of this disclosure, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first may also be referred to as second without departing from the scope of one or more embodiments of this disclosure, and similarly, second may also be referred to as first. Depending on the context, the word “if” as used herein may be interpreted as “when”, “in response to a determination”, or “when…”.
[0021] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this disclosure are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant regions, and corresponding operation portals are provided for users to choose to authorize or refuse.
[0022] First, the terms and concepts involved in one or more embodiments of this disclosure will be explained.
[0023] CSD: an abbreviation for computational storage drives, which are hard drives with computing capabilities that can offload compressed data organization and computation to the disk.
[0024] LSM-Tree: short for log-structured merge-tree, a storage structure that uses sequential writes to improve write performance.
[0025] Compression ratio: The ratio between the size of the data before compression and the size of the data after compression.
[0026] Sector size: The smallest unit of data storage that can be written to a hard drive, typically 4096 bytes in solid-state drives.
[0027] With the development of computer technology, the amount of data generated is also increasing. This necessitates the use of substantial storage space. Performance and cost are the core competitive advantages of data storage. Data storage must provide high compression ratios while maintaining controllable performance impact, and it's also crucial to flexibly choose between performance and compression ratios for different business scenarios.
[0028] Current data compression schemes include LSM-Tree-like schemes, in which compressed data is compactly arranged and appended to disk. This requires periodic garbage collection of invalid data blocks: valid data is read from an old block and appended to a new block, after which the old block is released. Such a scheme inevitably suffers from write amplification, space amplification, and read amplification. Write amplification occurs because writing the last 4KB (the preset storage sector size) of non-aligned data requires in-place overwriting, and a write that crosses a 4KB boundary requires an additional storage sector to be written. For example, if the compressed data size is 6KB, and this 6KB crosses a 4KB boundary, it requires two storage sectors (8KB) for storage. Space amplification occurs because invalid data cannot be promptly reclaimed or reused; its space is freed only by periodic garbage collection, which lowers the overall compression ratio. Read amplification occurs because a non-4KB-aligned I/O must read an additional 4KB of data whenever the requested data crosses a 4KB boundary.
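The boundary-crossing cost described above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the disclosed method; the names `SECTOR_SIZE`, `sectors_touched`, and `amplification` are hypothetical, and the 4KB sector size follows the example in the text.

```python
SECTOR_SIZE = 4096  # bytes; assumed 4KB sector size, as in the example above


def sectors_touched(offset: int, length: int, sector_size: int = SECTOR_SIZE) -> int:
    """Count the sectors overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return last - first + 1


def amplification(offset: int, length: int, sector_size: int = SECTOR_SIZE) -> float:
    """Bytes physically transferred divided by bytes logically requested."""
    return sectors_touched(offset, length, sector_size) * sector_size / length
```

Under these assumptions, a 6KB extent that starts on a sector boundary touches two sectors (8KB transferred for 6KB of data), while the same extent starting 3KB into a sector touches three.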
[0029] Another current data compression scheme aligns data according to the storage sector size. For example, with a sector size of 4KB, the size of the compressed data is rounded up to a multiple of the sector size, and invalid space can be reclaimed through reuse or trim, eliminating the need for periodic garbage collection. However, this scheme achieves a relatively poor compression ratio and wastes a significant amount of space.
[0030] Based on this, a data compression method is provided in this disclosure. This disclosure also relates to a data compression apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which will be described in detail in the following embodiments.
[0031] Referring to Figure 1, Figure 1 shows a flowchart of a data compression method according to an embodiment of the present disclosure, which is applied to a smart storage device and specifically includes the following steps.
[0032] Step 102: Obtain at least one piece of business data to be processed.
[0033] In this context, a smart storage device is a storage device with data processing capabilities. The business data to be processed can be understood as business data that needs to be persistently stored, i.e., business data that needs to be written to the hard disk. In one or more specific embodiments provided in this disclosure, the business data to be processed can be understood as raw business data that has not yet been compressed. The business data to be processed can be data corresponding to any business scenario, such as image data in an image processing scenario, text data in a text processing scenario, audio data in a speech processing scenario, etc. The method provided in the embodiments of this disclosure does not limit the specific business scenario of the business data to be processed.
[0034] There are many ways to obtain at least one piece of business data to be processed. It can be obtained from the Internet, transmitted from other terminals, or obtained from a removable storage medium connected to the current terminal. In the embodiments provided in this disclosure, the method of obtaining the business data to be processed is not limited, and the actual application shall prevail.
[0035] In the specific embodiments provided in this disclosure, image processing is taken as an example: a user operates the current terminal to access the Internet and obtains multiple images to be processed from the Internet. These images are the business data to be processed.
[0036] Step 104: Compress each piece of business data to be processed according to the preset storage sector size to obtain the storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector.
[0037] The preset storage sector size can be understood as the sector size, a basic concept in disk storage. A sector is the smallest unit of physical storage on a hard disk. Each sector holds a fixed amount of data, stored contiguously on the disk, and the sector size determines the minimum amount of data for a disk read or write operation. For example, in a specific embodiment provided in this disclosure, the preset storage sector size is 4KB, meaning that one storage sector can store 4KB of data.
[0038] Typically, the preset storage sector size of a terminal's storage device is fixed. During the data writing process, data is sequentially written to consecutive storage sectors, thus obtaining the storage blocks to be compressed corresponding to each piece of business data to be processed. These storage blocks to be compressed can be understood as a collection of storage sectors used to store the compressed business data to be processed.
[0039] For example, Figure 2 shows a schematic diagram of data compression provided in this disclosure. The example in Figure 2 uses 10KB of business data to be processed and a preset storage sector size of 4KB. After data compression, the business data to be processed is reduced to 7KB. This 7KB of compressed data is stored in two storage sectors, which together form the storage block to be compressed corresponding to the business data to be processed.
[0040] Specifically, the process of compressing each piece of business data to be processed according to a preset storage sector size to obtain the corresponding storage block to be compressed includes the following steps.
[0041] S1042. Determine the target business data to be processed, wherein the target business data to be processed is any one of the at least one piece of business data to be processed.
[0042] In practical applications, there may be one, two, or more business data to be processed. To better explain the method provided in this embodiment, this explanation uses one piece of business data to be processed as an example. That is, a target piece of business data to be processed is selected from at least one piece of business data to be processed for explanation.
[0043] S1044. Compress the target business data to be processed based on a preset compression algorithm to obtain compressed business data, and obtain the compression capacity information corresponding to the compressed business data.
[0044] The preset compression algorithm can be understood as a commonly used data compression algorithm, such as the Zstandard algorithm, the LZ4 algorithm (a fast Lempel-Ziv, LZ77-family algorithm), the GZIP algorithm (GNU zip), or the BZIP2 algorithm (a Burrows-Wheeler Transform-based compression algorithm). In the method provided in the embodiments of this disclosure, different preset compression algorithms can be flexibly selected according to the data characteristics and performance requirements.
[0045] Based on the data characteristics of the target business data to be processed, a corresponding preset compression algorithm is selected. The target business data to be processed is compressed using the preset compression algorithm to obtain compressed business data, and at the same time, the compression capacity information corresponding to the compressed business data is obtained. The compression capacity information can be understood as the data size of the compressed business data.
[0046] For example, take target business data A with a size of 16KB. Compressing it with a preset compression algorithm yields compressed business data A1, whose compression capacity is 5KB. That is, after data compression, target business data A is reduced from 16KB to 5KB.
[0047] As another example, take target business data B with a size of 16KB. Compressing it with a preset compression algorithm yields compressed business data B1, whose compression capacity is 10KB. That is, after data compression, target business data B is reduced from 16KB to 10KB.
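Step S1044 can be illustrated with a minimal sketch that compresses a buffer and reports the compressed size (the compression capacity information). It uses Python's standard `zlib` module (DEFLATE, the algorithm underlying GZIP) only as a stand-in for whichever preset algorithm is chosen; the function name `compress_business_data` and the sample payload are assumptions made for illustration.

```python
import zlib


def compress_business_data(raw: bytes, level: int = 6) -> tuple[bytes, int]:
    """Compress raw data; return (compressed bytes, compressed size in bytes)."""
    compressed = zlib.compress(raw, level)
    return compressed, len(compressed)


# Hypothetical, highly repetitive payload standing in for business data.
raw = b"sample business record; " * 1024
compressed, size = compress_business_data(raw)
print(f"{len(raw)} bytes -> {size} bytes")
```

The achieved size depends entirely on the input data and the algorithm, which is why the method determines the number of sectors only after compression.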
[0048] S1046. Determine at least one target storage sector corresponding to the compressed business data based on the compression capacity information and the preset storage sector size.
[0049] After the above processing, at least one target storage sector can be determined for storing the compressed business data based on the compression capacity information and the preset storage sector size. The preset storage sector size is the size of the storage sector used for storing data as specified by the system. The target storage sector is the storage sector used to store the compressed business data.
[0050] In the method provided in this embodiment, once the compression capacity information corresponding to the compressed business data is known, it is combined with the preset storage sector size to determine how many target storage sectors are needed to store the compressed business data.
[0051] For example, continuing with the previous example, if the compression capacity of compressed business data A1 is 5KB and the preset storage sector size is 4KB, then two target storage sectors are needed to store compressed business data A1.
[0052] Similarly, if the compression capacity of compressed business data B1 is 10KB and the preset storage sector size is 4KB, then three target storage sectors are needed to store compressed business data B1.
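The sector count in step S1046 amounts to a ceiling division of the compressed size by the sector size. A minimal sketch follows; the function name `target_sector_count` is hypothetical.

```python
def target_sector_count(compressed_size: int, sector_size: int = 4096) -> int:
    """Number of fixed-size sectors needed to hold compressed_size bytes."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    # Integer ceiling division: round up to a whole number of sectors.
    return (compressed_size + sector_size - 1) // sector_size
```

With a 4KB sector, `target_sector_count(5 * 1024)` gives 2 and `target_sector_count(10 * 1024)` gives 3, matching the A1 and B1 examples.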
[0053] S1048. Save the compressed business data to the at least one target storage sector, and determine the at least one target storage sector as the storage block to be compressed corresponding to the target business data to be processed.
[0054] After the at least one target storage sector corresponding to the compressed business data is determined, the compressed business data is written into it. For ease of subsequent description, the at least one target storage sector used to store the compressed business data is referred to as the storage block to be compressed corresponding to the target business data to be processed, and equivalently as the storage block to be compressed corresponding to the compressed business data.
[0055] In practical applications, the capacity of a target storage sector is fixed (usually 4KB; in this disclosure, the preset storage sector size is not limited and is subject to the actual application), and the compressed business data may not completely fill the last target storage sector. A target storage sector that is not completely filled still leaves room for further compression. For this reason, the target storage sectors used to store the compressed business data are determined as the storage block to be compressed.
[0056] In one specific embodiment provided in this disclosure, saving the compressed business data to the at least one target storage sector includes: saving the compressed business data to the at least one target storage sector in storage order; and writing padding data into the free space when it is detected that there is free space in the last target storage sector.
[0057] In practical applications, the storage block to be compressed consists of compressed business data and padding data. The compressed business data is first written sequentially to the at least one target storage sector according to the storage order, which is typically the order of the target storage sectors. After all the compressed business data has been written, the last target storage sector is checked for free space. If there is any, padding data is written into the free space.
[0058] The padding data can be understood as the data that will be compressed during hardware compression in subsequent processing. In a specific embodiment provided in this disclosure, the padding data can consist of 0s or of 1s. Using padding data to fill the free space facilitates hardware compression in subsequent processing, thereby further improving the compression ratio.
[0059] For example, if the compression capacity of compressed business data A1 is 5KB and the preset storage sector size is 4KB, two target storage sectors are used to store compressed business data A1. The first target storage sector stores 4KB of the data, and the second target storage sector stores the remaining 1KB. This leaves 3KB of free space in the second target storage sector, into which padding data (0s) is written.
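The write-then-pad behavior of step S1048 can be sketched as follows. This is an illustrative in-memory model only; `build_storage_block` is a hypothetical helper, and zero bytes are assumed as the padding value, as in the example above.

```python
SECTOR_SIZE = 4096  # bytes; assumed preset storage sector size


def build_storage_block(compressed: bytes,
                        sector_size: int = SECTOR_SIZE,
                        fill: bytes = b"\x00") -> list[bytes]:
    """Split compressed data into fixed-size sectors, padding the last one."""
    sectors = []
    for start in range(0, len(compressed), sector_size):
        chunk = compressed[start:start + sector_size]
        # Only the final chunk can be short; ljust pads it up to a full sector.
        sectors.append(chunk.ljust(sector_size, fill))
    return sectors


block = build_storage_block(b"\x01" * (5 * 1024))  # 5KB of stand-in compressed data
```

Here `block` holds two sectors: the first is full, and the second carries 1KB of data followed by 3KB of zero padding.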
[0060] After performing the compression process described above on each piece of pending business data, a storage block to be compressed corresponding to each piece of business data can be obtained. Each storage block contains at least one storage sector, and the last storage sector in each block may contain padding data. This padding data can be compressed in subsequent hardware compression to further improve the compression ratio.
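The effect of padding on the compression ratio can be shown with the 16KB to 5KB example. The sketch below assumes, as an idealization, that hardware compression reduces the padding to a negligible size, so the second figure is an upper bound rather than a measured result; the helper name `compression_ratio` is hypothetical.

```python
SECTOR_SIZE = 4096  # bytes; assumed preset storage sector size


def compression_ratio(original_size: int, stored_size: int) -> float:
    """Size before compression divided by size actually occupied on the medium."""
    return original_size / stored_size


original = 16 * 1024     # business data A
compressed = 5 * 1024    # compressed business data A1
aligned = -(-compressed // SECTOR_SIZE) * SECTOR_SIZE  # rounded up to 2 sectors = 8KB

ratio_sector_aligned = compression_ratio(original, aligned)  # padding stored as-is
ratio_padding_removed = compression_ratio(original, compressed)  # idealized second stage
```

Under these assumptions the sector-aligned layout alone yields a ratio of 2.0, and removing the 3KB of padding raises it to at most 3.2.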
[0061] Through the above processing, the storage blocks to be compressed are obtained for the pieces of business data to be processed; more specifically, each storage block to be compressed stores the compressed business data corresponding to one piece of business data to be processed. Hardware compression can then be used to further compress the data in these storage blocks to be compressed.
[0062] Hardware data compression is a technology that uses dedicated hardware to compress data, aiming to reduce the size of stored data, save storage space, and improve data transmission efficiency.
[0063] Hardware data compression reduces data size based on data redundancy and pattern recognition. Through specific hardware circuits and algorithms, data is transformed into a more compact representation. This can significantly reduce the space required for storage and transmission without compromising data integrity and availability.
[0064] During hardware-based data compression, padding data within the storage blocks to be compressed can be compressed, writing the data to the hardware medium in a more compact layout and further improving the compression ratio. Therefore, after obtaining each storage block to be compressed, it is necessary to further identify whether padding data exists in each block, as well as the location and size of the padding data, to avoid compressing the actual business data.
[0065] In one specific embodiment provided in this disclosure, identifying the padding data in each storage block to be compressed includes: determining a target storage block to be compressed, wherein the target storage block to be compressed is any one of the storage blocks to be compressed; obtaining storage metadata corresponding to the target storage block to be compressed, and determining the compressed business data corresponding to the target storage block to be compressed based on the storage metadata; and determining the padding data corresponding to the target storage block to be compressed based on the target storage block to be compressed and the compressed business data.
[0066] In practical applications, each piece of compressed business data has corresponding metadata, which is stored together with the compressed business data in the storage block to be compressed. For ease of explanation, this embodiment uses one target storage block to be compressed as an example, where the target storage block to be compressed is any one of the storage blocks to be compressed.
[0067] After the target storage block to be compressed is identified, its storage metadata can be read from it. The storage metadata is the metadata of the compressed business data within the target storage block to be compressed, including the start and end positions of the compressed business data within that block. The storage metadata therefore identifies the compressed business data within the target storage block to be compressed.
[0068] Once the compressed business data is determined, the padding data in the target storage block to be compressed can be determined as the part of the block that is not occupied by the compressed business data.
[0069] For example, if the target storage block to be compressed includes two storage sectors (8KB in total) and the storage metadata shows that the compressed business data is 5KB in size, then the padding data is identified as the 3KB of data at the end of the last storage sector.
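Locating the padding from the storage metadata reduces to subtracting the data extent from the block extent. The sketch below is a simplification under stated assumptions: the `StorageMetadata` fields and `locate_padding` are hypothetical names, the padding is assumed to sit only after the compressed data, and the space taken by the metadata itself is ignored.

```python
from dataclasses import dataclass


@dataclass
class StorageMetadata:
    start: int  # byte offset where the compressed data begins within the block
    end: int    # byte offset one past the last byte of compressed data


def locate_padding(block_size: int, meta: StorageMetadata) -> tuple[int, int]:
    """Return (offset, length) of the padding that follows the compressed data."""
    if not 0 <= meta.start <= meta.end <= block_size:
        raise ValueError("metadata does not describe a range inside the block")
    return meta.end, block_size - meta.end
```

For a two-sector (8KB) block holding 5KB of data from offset 0, `locate_padding(8192, StorageMetadata(0, 5120))` returns `(5120, 3072)`, i.e. 3KB of padding at the end of the last sector.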
[0070] Step 106: Compress the fill data in each storage block to be compressed to obtain the target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill the free space in the storage sector.
[0071] The method provided in this disclosure is applied to smart storage devices with data processing capabilities, such as Smart SSDs (smart solid-state drives). A Smart SSD is a solid-state drive that integrates data processing functions, designed to improve system performance and energy efficiency. It incorporates an FPGA (Field-Programmable Gate Array), supporting high-speed computation at the data storage location, thereby improving data processing speed and efficiency.
[0072] Smart SSDs integrate memory and computing power into flash memory devices, providing high-performance compression capabilities through built-in FPGA cards or ASIC acceleration chips. In a specific embodiment provided in this disclosure, the padding data in each storage block to be compressed is compressed through a data translation layer in the smart storage device. Specifically, this can be achieved by modifying the FTL (Flash Translation Layer) to support variable-length data organization, allowing the compressed data to be stored compactly on the NAND medium.
[0073] FTL stands for Flash Translation Layer, which translates or maps the host's (or user's) logical address space to the flash memory's physical address space. Each time an SSD writes user logical data into the flash memory address space, it records the mapping relationship between that logical address and the physical address. When the host wants to read that data, the SSD uses this mapping to read the data from the flash memory and then returns it to the user. The FTL has the ability to organize data; in the method provided in this disclosure, hardware compression of data can be performed using the FTL.
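The idea of a variable-length mapping can be sketched with a toy, in-memory model. This is an assumption-laden illustration rather than the device's actual FTL: the class `VariableLengthFTL` and its methods are hypothetical, trimming trailing fill bytes stands in for hardware compression of the padding, and the model is append-only with no garbage collection or wear leveling.

```python
class VariableLengthFTL:
    """Toy map from logical sector number to a variable-length physical extent."""

    def __init__(self, sector_size: int = 4096, fill: int = 0) -> None:
        self.sector_size = sector_size
        self._fill = bytes([fill])
        self._media = bytearray()                    # stand-in for the NAND medium
        self._map: dict[int, tuple[int, int]] = {}   # lba -> (offset, length)

    def write(self, lba: int, sector: bytes) -> None:
        if len(sector) != self.sector_size:
            raise ValueError("host writes must be exactly one sector")
        payload = sector.rstrip(self._fill)          # drop trailing padding bytes
        self._map[lba] = (len(self._media), len(payload))
        self._media += payload                       # store extents back to back

    def read(self, lba: int) -> bytes:
        offset, length = self._map[lba]
        payload = bytes(self._media[offset:offset + length])
        return payload.ljust(self.sector_size, self._fill)  # restore the padding

    @property
    def used_bytes(self) -> int:
        return len(self._media)
```

The host still reads and writes whole sectors, while the medium holds only the non-padding bytes; because reads re-pad with the same fill value, the round trip is lossless even if real data happens to end in fill bytes.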
[0074] In the steps described above, compressed business data is written to storage sectors of a storage device with data processing capabilities. In this current step, the data processing capabilities of the storage device are used to compress the fill data in each storage sector, thereby achieving data compression on the hardware level. After the fill data in each storage block to be compressed is compressed, the target storage block can be obtained.
[0075] In one specific embodiment provided in this disclosure, compressing the padding data in each storage block to be compressed to obtain a target storage block includes: determining a first storage block to be compressed and a second storage block to be compressed according to the storage block order of the storage blocks to be compressed; if the first storage block to be compressed contains first padding data, compressing the first padding data and concatenating the compressed business data in the second storage block to be compressed with the compressed business data in the first storage block to be compressed to obtain a reference storage block; using the reference storage block as the first storage block to be compressed and the next storage block to be compressed as the second storage block to be compressed, and repeating the foregoing compressing and concatenating operation; and, once the last storage block to be compressed has been processed, determining the resulting reference storage block as the target storage block.
[0076] This embodiment further explains how the padding data in each storage block to be compressed is compressed. The storage blocks to be compressed are stored in the Smart SSD in a specific order, and the first and second storage blocks to be compressed are determined based on this storage block order. The first storage block to be compressed is the one that comes earlier in the storage block order, and the second storage block to be compressed is the one that comes later.
[0077] It should be noted that, in the method provided in this disclosure embodiment, compressing each storage block to be compressed refers to the process of compressing the filling data in each storage block to be compressed.
[0078] After determining the first storage block to be compressed, it is determined whether there is first padding data in the first storage block to be compressed. The first padding data can be understood as the padding data in the first storage block to be compressed. If the first padding data exists, it can be compressed by hardware, and then the data in the second storage block to be compressed is concatenated with the compressed service data in the first storage block to be compressed. A reference storage block is obtained, which can be understood as the storage block obtained after the first padding data in the first storage block to be compressed is compressed, and then the compressed service data in the first storage block to be compressed is concatenated with the second storage block to be compressed.
[0079] It is then determined whether the second storage block to be compressed is the last storage block to be compressed. If not, the reference storage block is designated as the new first storage block to be compressed, the next storage block to be compressed is designated as the second storage block to be compressed, and the above process continues. If the second storage block to be compressed is the last one, the padding data remaining in the reference storage block is compressed, and the resulting reference storage block is designated as the target storage block.
[0080] Referring to Figure 3, which illustrates a hardware compression schematic of an embodiment of this disclosure, the explanation uses three storage blocks to be compressed as an example. Take storage block to be compressed 1: it has two target storage sectors and holds 7KB of compressed service data, so with 4KB storage sectors it contains 1KB of padding data. Storage block to be compressed 1 is designated as the first storage block to be compressed, and storage block to be compressed 2 as the second storage block to be compressed. The padding data in storage block to be compressed 1 is compressed and its space is released. The compressed service data in storage block to be compressed 2 is then concatenated with the compressed service data in storage block to be compressed 1 to obtain reference storage block 1.
[0081] Because storage block to be compressed 2 is not the last storage block to be compressed, reference storage block 1 is used as the new first storage block to be compressed, and storage block to be compressed 3 is used as the second storage block to be compressed. The padding data in the new first storage block to be compressed (i.e., the padding data that came from storage block to be compressed 2) is compressed, and the compressed service data in storage block to be compressed 3 is concatenated with the compressed service data in reference storage block 1 to obtain reference storage block 2.
[0082] At this point, storage block 3 is the last storage block to be compressed. Then, the padding data in reference storage block 2 (i.e., the padding data in storage block 3) is compressed to obtain the final target storage block.
[0083] It should be noted that the above is only one example of a hardware compression method. In practical applications, the padding data in the storage blocks to be compressed can also be compressed in parallel, so that the data is written onto the hardware medium in a more compact layout and the compression ratio is further improved.
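The block-by-block procedure of Figure 3 can be sketched in a few lines of Python. This is a minimal in-memory model under stated assumptions, not the device's implementation: the names `Block`, `pad_to_sectors`, and `squeeze_padding` are hypothetical, the 4KB sector size is taken from the examples in this disclosure, and hardware compression of padding data is modelled simply as dropping the trailing zero bytes.

```python
from dataclasses import dataclass

SECTOR_SIZE = 4 * 1024  # assumed preset storage sector size (4KB)


@dataclass
class Block:
    """Hypothetical model of a storage block to be compressed."""
    data: bytes       # compressed service data followed by zero padding
    payload_len: int  # number of leading bytes that are compressed service data


def pad_to_sectors(payload: bytes) -> Block:
    """Align a payload to whole sectors, filling the tail with zero padding."""
    sectors = max(1, -(-len(payload) // SECTOR_SIZE))  # ceiling division
    padding = bytes(sectors * SECTOR_SIZE - len(payload))
    return Block(payload + padding, len(payload))


def squeeze_padding(blocks: list[Block]) -> Block:
    """Fold blocks in storage order into one compact target block."""
    if not blocks:
        raise ValueError("at least one storage block is required")
    reference = blocks[0]
    for second in blocks[1:]:
        # Drop the padding of the first block (stand-in for hardware compression),
        # then concatenate the second block behind its compressed service data.
        head = reference.data[:reference.payload_len]
        reference = Block(head + second.data,
                          reference.payload_len + second.payload_len)
    # After the last block, squeeze out the padding that remains at the tail.
    return Block(reference.data[:reference.payload_len], reference.payload_len)
```

For three payloads of 7KB, 5KB, and 3KB, the sector-aligned blocks occupy 8KB + 8KB + 4KB = 20KB, while the block returned by `squeeze_padding` holds only the 15KB of compressed service data.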
[0084] The data compression method provided in this disclosure employs a two-stage compression process: the first stage uses software compression, and the second stage, based on the software compression, performs further data compression via hardware. This combined software and hardware approach provides a data compression solution with high compression ratios, high performance, and low read / write amplification.
[0085] In the software compression stage, the original business data is compressed at a preset storage sector size. The compressed business data is then aligned to the storage sector size, and any empty spaces are filled with padding data. In the hardware compression stage, the padding data in each storage sector is compressed, and variable-length FTL is fully utilized to make the data compactly arranged. This improves the data compression ratio.
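The software stage described above (compress, align to the sector size, pad the remainder) might look like the following sketch. Here `zlib` stands in for the unspecified preset compression algorithm, and the function name `software_compress` and the 4KB sector size are assumptions made for illustration.

```python
import zlib

SECTOR_SIZE = 4 * 1024  # assumed preset storage sector size (4KB)


def software_compress(raw: bytes, sector_size: int = SECTOR_SIZE) -> tuple[bytes, int]:
    """Compress raw business data and align the result to whole sectors.

    Returns the sector-aligned bytes and the length of the compressed payload,
    so that the zero padding at the tail can be told apart from real data.
    """
    payload = zlib.compress(raw)  # stand-in for the preset compression algorithm
    sectors = max(1, -(-len(payload) // sector_size))  # ceiling division
    padding = bytes(sectors * sector_size - len(payload))  # zero-filled padding
    return payload + padding, len(payload)
```

Recording the payload length alongside the block is one way to let a reader separate compressed data from padding; the disclosure does not specify how this boundary is tracked.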
[0086] In another specific embodiment provided in this disclosure, the method further includes: receiving a data adjustment instruction, wherein the data adjustment instruction includes data identification information; determining a reference storage block to be compressed based on the data identification information, wherein the reference storage block to be compressed stores compressed service data corresponding to the data identification information; and adjusting the compressed service data in the reference storage block to be compressed.
[0087] After software compression but before hardware compression, the data may still be adjusted. Specifically, in this embodiment, a data adjustment instruction for business data is received. This instruction carries data identification information, which identifies the part of the business data being adjusted.
[0088] Upon receiving a data adjustment instruction, the reference storage block to be compressed can be found based on the data identification information in the instruction. The reference storage block to be compressed can be understood as the storage block to be compressed that contains the compressed business data corresponding to the data identification information.
[0089] Once the reference storage block to be compressed is determined, the compressed business data stored in that reference storage block can be adjusted.
[0090] In practical applications, data adjustment commands can be data deletion commands, data update commands, etc. If the size of the adjusted compressed business data still meets the storage space requirements of the reference storage block to be compressed, it can be directly stored in the original reference storage block. If the size of the adjusted compressed business data does not meet the storage space requirements of the reference storage block to be compressed, or if a deletion command is received and the compressed business data in the reference storage block to be compressed is deleted, then the data in the reference storage block to be compressed is in an invalid state, meaning that there is invalid space in the reference storage block to be compressed. In this case, to better utilize this space, relevant processing can be performed on the invalid space to improve space utilization.
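The decision described in this paragraph can be sketched as a small classifier. This is only an illustration: the names `Outcome` and `classify_adjustment` are hypothetical, the 4KB sector size is taken from the examples, and "meets the storage space requirements" is interpreted here as "fits within the block's total capacity", which is an assumption rather than something the disclosure states.

```python
from enum import Enum
from typing import Optional

SECTOR_SIZE = 4 * 1024  # assumed preset storage sector size (4KB)


class Outcome(Enum):
    STORED_IN_PLACE = "stored in the original block"
    BLOCK_INVALID = "block now holds invalid space"


def classify_adjustment(block_sectors: int, adjusted_len: Optional[int]) -> Outcome:
    """Classify an adjustment; adjusted_len is None for a deletion command."""
    if adjusted_len is None:
        return Outcome.BLOCK_INVALID  # deleted data leaves invalid space behind
    if adjusted_len <= block_sectors * SECTOR_SIZE:
        return Outcome.STORED_IN_PLACE  # assumed meaning of "meets the requirements"
    return Outcome.BLOCK_INVALID  # data no longer fits and must be stored elsewhere
```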
[0091] In another specific embodiment provided in this disclosure, the method further includes: obtaining reference storage space information of the reference storage block to be compressed; determining reference storage information based on the reference storage space information, and determining the compressed data to be written based on the reference storage information; and writing the compressed data to be written into the reference storage block to be compressed.
[0092] In this embodiment, if it is determined that there is invalid space in the reference storage block to be compressed, reference storage space information of the reference storage block to be compressed can be obtained. The reference storage space information can be understood as storage information about the reference storage block to be compressed. For example, it may include the number of storage sectors in the reference storage block to be compressed, or the storage space of the reference storage block to be compressed.
[0093] After the reference storage space information is obtained, the reference storage information, which describes what can be written to the reference storage block to be compressed, can be determined based on it. This reference storage information is then used to determine the compressed data to be written to the reference storage block to be compressed. The compressed data to be written can be understood as new compressed service data.
[0094] It should be noted that the compressed data to be written here refers to compressed service data that conforms to the reference storage information corresponding to the reference storage space information. In a specific embodiment provided in this disclosure, determining the reference storage information based on the reference storage space information includes: determining the number of reference storage sectors or the reference storage space range based on the reference storage space information; correspondingly, determining the compressed data to be written based on the reference storage information includes: determining the compressed data to be written based on the number of reference storage sectors or the reference storage space range.
[0095] In this embodiment, as described above, the reference storage information may be the number of reference storage sectors or the reference storage space range.
[0096] The number of reference storage sectors refers to the number of storage sectors in the reference storage block to be compressed, and the reference storage space range refers to the range of data sizes that the reference storage block to be compressed can hold.
[0097] For example, if the reference storage block to be compressed contains 3 reference storage sectors, each 4KB in size, then the reference storage space information could be 3 reference storage sectors, corresponding to a reference storage space range of 8KB-12KB.
[0098] The compressed data to be written can be determined based on either the number of reference storage sectors or the reference storage space range.
[0099] Specifically, determining the compressed data to be written based on the number of reference storage sectors or the reference storage space range includes: determining compressed service data whose data size meets the reference storage space range as compressed data to be written; or, determining compressed service data whose number is less than or equal to the number of reference storage sectors and whose data size is less than or equal to a preset storage sector size as compressed data to be written.
[0100] Taking the reference storage space range as an example, compressed business data whose size falls within that range is determined to be the compressed data to be written. Continuing with the previous example, if the reference storage space range is 8KB-12KB, compressed business data with a size of 8KB-12KB is determined to be the compressed data to be written.
[0101] Taking the number of reference storage sectors as an example, pieces of compressed business data whose count is less than or equal to the number of reference storage sectors, and each of whose size is less than or equal to the preset storage sector size, are determined to be the compressed data to be written. Continuing with the previous example, if the reference storage space information is 3 reference storage sectors, then three pieces of compressed business data, each no larger than 4KB, can be determined to be the compressed data to be written and written to the three reference storage sectors respectively.
[0102] Referring to Figure 4, which illustrates a schematic diagram of space reuse according to an embodiment of this disclosure, the shaded area represents a reference storage block to be compressed comprising three reference storage sectors with a total capacity of 12KB. As shown in Figure 4, one option is to write a single piece of compressed service data with a size greater than 8KB and less than 12KB; another option is to write two pieces of compressed service data, one with a size greater than 4KB and less than 8KB and the other with a size less than or equal to 4KB.
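The two selection rules (by reference storage space range, or by number of reference storage sectors) can be sketched as follows. The function names are hypothetical, the 4KB sector size comes from the example, and the treatment of the range as exclusive at the lower bound and inclusive at the upper bound is an assumption, since the disclosure gives the range only as "8KB-12KB".

```python
from typing import Optional

SECTOR_SIZE = 4 * 1024  # assumed preset storage sector size (4KB)


def space_range(n_sectors: int, sector_size: int = SECTOR_SIZE) -> tuple[int, int]:
    """Return (exclusive lower bound, inclusive upper bound) in bytes."""
    return (n_sectors - 1) * sector_size, n_sectors * sector_size


def pick_by_range(sizes: list[int], n_sectors: int) -> Optional[int]:
    """Pick the first candidate whose size falls in the block's space range."""
    low, high = space_range(n_sectors)
    return next((s for s in sizes if low < s <= high), None)


def pick_by_sector_count(sizes: list[int], n_sectors: int) -> list[int]:
    """Pick up to n_sectors candidates that each fit in a single sector."""
    fitting = [s for s in sizes if 0 < s <= SECTOR_SIZE]
    return fitting[:n_sectors]
```

For a three-sector block, `space_range(3)` gives 8192 to 12288 bytes. The mixed option shown in Figure 4 (one piece spanning two sectors plus one piece in a single sector) is not covered by this sketch.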
[0103] In another specific embodiment provided in this disclosure, the method further includes: receiving a storage release request for the reference storage block to be compressed, and releasing the storage space corresponding to the reference storage block to be compressed based on the storage release request; or, writing fill data into the reference storage block to be compressed.
[0104] In the method provided in this disclosure, in addition to reusing the reference storage block to be compressed, it can also be released at the physical hardware level. Specifically, a storage release request for the reference storage block to be compressed can be received. The storage space corresponding to the reference storage block to be compressed can be released based on the storage release request. Physical space on the physical disk can be released by sending a trim request. Alternatively, padding data can be written into the reference storage block to be compressed, allowing it to be compressed in a subsequent hardware compression stage.
[0105] Whether using the trim method, the padding data writing method, or the space reuse method, none of these methods require data movement, resulting in minimal read / write operations and a simple mechanism. Space reuse minimizes fragmentation, and existing fragments can be reclaimed by releasing or writing padding data. This simplifies the data compression process and improves the compression ratio.
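The padding-write option can be illustrated with a short sketch showing why a zero-filled block costs almost nothing once a compressor sees it. The names `overwrite_with_padding` and `compressed_footprint` are hypothetical, and `zlib` is used only as a software stand-in for the hardware compression stage. The trim path is not shown, because it depends on the device and operating system interface.

```python
import zlib

SECTOR_SIZE = 4 * 1024  # assumed preset storage sector size (4KB)


def overwrite_with_padding(block_sectors: int) -> bytes:
    """Return the zero-filled contents to write over an invalid block."""
    return bytes(block_sectors * SECTOR_SIZE)


def compressed_footprint(data: bytes) -> int:
    """Size of the data after a generic compressor (stand-in for hardware)."""
    return len(zlib.compress(data))
```

With these definitions, a three-sector (12KB) invalid block overwritten with padding shrinks to well under one sector after compression, and no valid data has to be moved to reclaim the space.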
[0106] The data compression method provided by the present disclosure will be further explained below with reference to Figure 5, which shows a schematic diagram of data compression provided by an embodiment of the present disclosure.
[0107] As shown in Figure 5, the intelligent storage device acquires three pieces of business data to be processed, each 16KB in size. The device compresses each piece using its corresponding preset compression algorithm, yielding three pieces of compressed business data, which are then written to the storage sectors of the intelligent storage device. Specifically, the first piece of compressed business data is 5KB in size, and its corresponding storage block to be compressed 1 occupies two storage sectors; the second piece is 10KB in size, and its corresponding storage block to be compressed 2 occupies three storage sectors; and the third piece is 3KB in size, and its corresponding storage block to be compressed 3 occupies one storage sector.
[0108] The intelligent storage device then performs hardware-level compression on the padding data in the three storage blocks to be compressed, merging the three pieces of compressed business data into one target storage block that occupies only 5 storage sectors (18KB of compressed business data stored in 4KB sectors). Compared with the 6 storage sectors occupied after software compression and the 12 storage sectors occupied by the original business data to be processed, this significantly saves disk space and improves the data compression ratio.
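The sector counts in the Figure 5 example follow from simple ceiling arithmetic, reproduced in the sketch below. The helper names are hypothetical, and the 4KB sector size is the one implied by the example (a 5KB payload occupying two sectors, a 3KB payload occupying one).

```python
KB = 1024
SECTOR_SIZE = 4 * KB  # sector size implied by the example


def sectors_needed(size_bytes: int) -> int:
    """Number of whole sectors required to hold size_bytes."""
    return -(-size_bytes // SECTOR_SIZE)  # ceiling division


def figure5_sector_counts() -> tuple[int, int, int]:
    """Sectors used before compression, after software, and after hardware."""
    original = [16 * KB] * 3
    compressed = [5 * KB, 10 * KB, 3 * KB]
    before = sum(sectors_needed(s) for s in original)
    after_software = sum(sectors_needed(s) for s in compressed)  # each block aligned
    after_hardware = sectors_needed(sum(compressed))  # padding squeezed out
    return before, after_software, after_hardware
```

Calling `figure5_sector_counts()` yields 12, 6, and 5 sectors, matching the figures given for the original data, the software-compressed blocks, and the merged target storage block.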
[0109] The data compression method provided in this embodiment has several advantages. Large-block compression in the software compression stage creates the precondition for a high compression ratio. Aligning the compressed data to 4KB simplifies data organization, lowers its cost, and minimizes read / write amplification and space amplification. The solution as a whole does not require periodic garbage collection, which helps it achieve high performance. The aligned data is padded with zeros (the padding data), so the software and hardware compression layers work closely together, with the hardware compression layer squeezing out the padding data to obtain a better compression ratio. Finally, the software compression layer allows free selection of the compression algorithm and compression granularity, which provides greater flexibility: parameters and algorithms can be chosen according to performance requirements.
[0110] Corresponding to the above method embodiments, this disclosure also provides a data compression device embodiment. Figure 6 shows a schematic diagram of the structure of a data compression device provided in one embodiment of this disclosure. As shown in Figure 6, the device is applied to an intelligent storage device and includes: an acquisition module 602 configured to acquire at least one business data to be processed; a first compression module 604 configured to compress each business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each business data to be processed, wherein the storage block to be compressed includes at least one storage sector; and a second compression module 606 configured to compress the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each business data to be processed, and the fill data is used to fill the free space in the storage sector.
[0111] Optionally, the first compression module 604 is further configured to: determine target business data to be processed, wherein the target business data to be processed is any one of at least one business data to be processed; compress the target business data to be processed based on a preset compression algorithm to obtain compressed business data, and obtain compression capacity information corresponding to the compressed business data; determine at least one target storage sector corresponding to the compressed business data according to the compression capacity information and the preset storage sector size; save the compressed business data to the at least one target storage sector, and determine the at least one target storage sector as the storage block to be compressed corresponding to the target business data to be processed.
[0112] Optionally, the first compression module 604 is further configured to: save the compressed service data to at least one target storage sector in storage order; and write fill data into the empty space if free space is detected in the last target storage sector.
[0113] Optionally, the second compression module 606 is further configured to: determine a first storage block to be compressed and a second storage block to be compressed according to the storage block order of each storage block to be compressed; if there is first padding data in the first storage block to be compressed, compress the first padding data, and concatenate the compressed service data in the second storage block to be compressed with the compressed service data in the first storage block to be compressed to obtain a reference storage block; use the reference storage block as the first storage block to be compressed, use the next storage block to be compressed as the second storage block to be compressed, and continue to perform the operation of compressing the first padding data and concatenating the compressed service data in the second storage block to be compressed with the compressed service data in the first storage block to be compressed to obtain a reference storage block if there is first padding data in the first storage block to be compressed; and determine the reference storage block after the last storage block to be compressed is the target storage block.
[0114] Optionally, the device further includes an adjustment module configured to: receive a data adjustment instruction, wherein the data adjustment instruction includes data identification information; determine a reference storage block to be compressed based on the data identification information, wherein the reference storage block to be compressed stores compressed service data corresponding to the data identification information; and adjust the compressed service data in the reference storage block to be compressed.
[0115] Optionally, the device further includes a rewrite module configured to: acquire reference storage space information of the reference storage block to be compressed; determine reference storage information based on the reference storage space information, and determine the compressed data to be written based on the reference storage information; and write the compressed data to be written to the reference storage block to be compressed.
[0116] Optionally, the rewrite module is further configured to: determine the number of reference storage sectors or the reference storage space range based on the reference storage space information; and determine the compressed data to be written based on the number of reference storage sectors or the reference storage space range.
[0117] Optionally, the rewrite module is further configured to: determine compressed service data whose data size meets the reference storage space range as compressed data to be written; or, determine compressed service data whose number is less than or equal to the number of reference storage sectors and whose data size is less than or equal to the preset storage sector size as compressed data to be written.
[0118] Optionally, the device further includes a release module configured to: receive a storage release request for the reference storage block to be compressed, and release the storage space corresponding to the reference storage block to be compressed based on the storage release request; or, write fill data into the reference storage block to be compressed.
[0119] The data compression apparatus provided in this disclosure employs a two-stage data compression process. The first stage uses software compression, and the second stage, based on the software compression, performs further data compression via hardware. This combined hardware and software approach provides a data compression solution with high compression ratios, high performance, and low read / write amplification.
[0120] In the software compression stage, the original business data is compressed at a preset storage sector size. The compressed business data is then aligned to the storage sector size, and any empty spaces are filled with padding data. In the hardware compression stage, the padding data in each storage sector is compressed, and variable-length FTL is fully utilized to make the data compactly arranged. This improves the data compression ratio.
[0121] The above is an illustrative scheme of a data compression device according to this embodiment. It should be noted that the technical solution of this data compression device and the technical solution of the data compression method described above belong to the same concept. For details not described in detail in the technical solution of the data compression device, please refer to the description of the technical solution of the data compression method described above.
[0122] Figure 7 shows a structural block diagram of an electronic device 700 according to an embodiment of this application. The components of the electronic device 700 include, but are not limited to, a memory 710 and a processor 720. The processor 720 is connected to the memory 710 via a bus 730, and a database 750 is used to store data.
[0123] Electronic device 700 also includes access device 740, which enables electronic device 700 to communicate via one or more networks 760. Examples of these networks include Public Switched Telephone Network (PSTN), Local Area Network (LAN), Wide Area Network (WAN), Personal Area Network (PAN), or combinations of communication networks such as the Internet. Access device 740 may include one or more of any type of wired or wireless network interface (e.g., network interface controller (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Wi-MAX (Worldwide Interoperability for Microwave Access) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so on.
[0124] In one embodiment of this application, the aforementioned components of the electronic device 700, as well as other components not shown in FIG. 7, may be interconnected, for example, via a bus. It should be understood that the electronic device structural block diagram shown in FIG. 7 is merely for illustrative purposes and is not intended to limit the scope of this application. Those skilled in the art can add or replace other components as needed.
[0125] Electronic device 700 can be any type of stationary or mobile electronic device, including mobile computers or mobile electronic devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (e.g., smartphones), wearable electronic devices (e.g., smartwatches, smart glasses, etc.) or other types of mobile devices, or stationary electronic devices such as desktop computers or personal computers (PCs). Electronic device 700 can also be a mobile or stationary server.
[0126] The processor 720 is configured to execute a computer program / instructions which, when executed by the processor, implement the steps of the above-described data compression method.
[0127] The above is an illustrative scheme of an electronic device according to this embodiment. It should be noted that the technical solution of this electronic device and the technical solution of the data compression method described above belong to the same concept. For details not described in detail in the technical solution of the electronic device, please refer to the description of the technical solution of the data compression method described above.
[0128] An embodiment of this disclosure also provides a computer-readable storage medium storing a computer program / instructions that, when executed by a processor, implement the steps of the above-described data compression method.
[0129] The various embodiments in this disclosure are described in a progressive manner. For similar or identical parts, the embodiments can be referred to one another; each embodiment focuses on its differences from the others. In particular, the computer-readable storage medium embodiments are basically similar to the data compression method embodiments, so their description is relatively brief; for relevant parts, refer to the description of the data compression method embodiments.
[0130] An embodiment of this disclosure also provides a computer program product, including a computer program / instructions that, when executed by a processor, implement the steps of the above-described data compression method.
[0131] The above is an illustrative scheme of a computer program product according to this embodiment. It should be noted that the technical solution of this computer program product and the technical solution of the data compression method described above belong to the same concept. For details not described in detail in the technical solution of the computer program product, please refer to the description of the technical solution of the data compression method described above.
[0132] One embodiment of this disclosure also provides a database product, including a computer program / instructions. When executed by a processor, the computer program / instructions perform the following steps: acquiring at least one business data to be processed; compressing each business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each business data to be processed, wherein the storage block to be compressed includes at least one storage sector; compressing the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each business data to be processed, and the fill data is used to fill the free space in the storage sector.
[0133] The above is an illustrative scheme of a database product according to this embodiment. It should be noted that the technical solution of this database product and the technical solution of the data compression method described above belong to the same concept. For details not described in detail in the technical solution of the database product, please refer to the description of the technical solution of the data compression method described above.
[0134] The foregoing has described specific embodiments of this disclosure. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
[0135] The computer instructions include computer program code, which may be in the form of source code, object code, executable file, or certain intermediate forms. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording media, USB flash drive, portable hard drive, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium may be appropriately added or removed according to the requirements of patent practice. For example, in some regions, according to patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
[0136] It should also be noted that those skilled in the art should understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to the embodiments of this disclosure.
[0137] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0138] The preferred embodiments disclosed above are merely illustrative of this disclosure. The optional embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the embodiments of this disclosure. These embodiments are selected and specifically described in this disclosure to better explain the principles and practical applications of the embodiments of this disclosure, thereby enabling those skilled in the art to better understand and utilize this disclosure. This disclosure is limited only by the claims and their full scope and equivalents.
Claims
1. A data compression method, applied to an intelligent storage device, comprising: Obtain at least one piece of pending business data; Compress each business data to be processed according to the preset storage sector size to obtain the storage block to be compressed corresponding to each business data to be processed, wherein the storage block to be compressed includes at least one storage sector. Compress the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed service data corresponding to each service data to be processed, and the fill data is used to fill the free space in the storage sector.
2. The method as described in claim 1, wherein each piece of pending service data is compressed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of pending service data, comprising: Determine the target business data to be processed, wherein the target business data to be processed is any one of at least one business data to be processed; The target business data to be processed is compressed based on a preset compression algorithm to obtain compressed business data, and the compression capacity information corresponding to the compressed business data is obtained. Determine at least one target storage sector corresponding to the compressed service data based on the compressed capacity information and the preset storage sector size; The compressed service data is saved to the at least one target storage sector, and the at least one target storage sector is determined to be the storage block to be compressed corresponding to the target service data to be processed.
3. The method of claim 2, wherein saving the compressed business data to the at least one target storage sector comprises: saving the compressed business data to the at least one target storage sector in storage order; and, when free space is detected in the last target storage sector, filling the free space with padding data.
4. The method of claim 1, wherein compressing the padding data in each to-be-compressed storage block to obtain the target storage block comprises: determining a first to-be-compressed storage block and a second to-be-compressed storage block according to the storage block order of the to-be-compressed storage blocks; when first padding data exists in the first to-be-compressed storage block, compressing the first padding data, and concatenating the compressed business data in the second to-be-compressed storage block with the compressed business data in the first to-be-compressed storage block to obtain a reference storage block; taking the reference storage block as the first to-be-compressed storage block and the next to-be-compressed storage block as the second to-be-compressed storage block, and continuing to perform the operation of, when first padding data exists in the first to-be-compressed storage block, compressing the first padding data and concatenating the compressed business data in the second to-be-compressed storage block with the compressed business data in the first to-be-compressed storage block to obtain a reference storage block; and determining the reference storage block obtained after the last to-be-compressed storage block has been processed as the target storage block.
5. The method of claim 1, wherein compressing the padding data in each to-be-compressed storage block comprises: compressing, by a data conversion layer in the intelligent storage device, the padding data in each to-be-compressed storage block.
6. The method of claim 1, further comprising: receiving a data adjustment instruction, wherein the data adjustment instruction comprises data identification information; determining a reference to-be-compressed storage block based on the data identification information, wherein the reference to-be-compressed storage block stores the compressed business data corresponding to the data identification information; and adjusting the compressed business data in the reference to-be-compressed storage block.
7. The method of claim 6, further comprising: obtaining reference storage space information of the reference to-be-compressed storage block; determining reference storage information based on the reference storage space information, and determining compressed data to be written based on the reference storage information; and writing the compressed data to be written into the reference to-be-compressed storage block.
8. The method of claim 7, wherein determining the reference storage information based on the reference storage space information comprises: determining a number of reference storage sectors or a reference storage space range based on the reference storage space information; and, correspondingly, determining the compressed data to be written based on the reference storage information comprises: determining the compressed data to be written based on the number of reference storage sectors or the reference storage space range.
9. The method of claim 8, wherein determining the compressed data to be written based on the number of reference storage sectors or the reference storage space range comprises: identifying compressed business data whose data size falls within the reference storage space range as the compressed data to be written; or identifying compressed business data whose quantity is less than or equal to the number of reference storage sectors and whose data size is less than or equal to the preset storage sector size as the compressed data to be written.
10. The method of claim 6, further comprising: receiving a storage release request for the reference to-be-compressed storage block, and releasing the storage space corresponding to the reference to-be-compressed storage block based on the storage release request; or writing padding data into the reference to-be-compressed storage block.
11. A data compression apparatus, applied to an intelligent storage device, comprising: an acquisition module configured to obtain at least one piece of to-be-processed business data; a first compression module configured to compress each piece of to-be-processed business data according to a preset storage sector size to obtain a to-be-compressed storage block corresponding to each piece of to-be-processed business data, wherein the to-be-compressed storage block comprises at least one storage sector; and a second compression module configured to compress the padding data in each to-be-compressed storage block to obtain a target storage block, wherein the target storage block comprises the compressed business data corresponding to each piece of to-be-processed business data, and the padding data is used to fill free space in the storage sector.
12. An electronic device, comprising: a memory and a processor; wherein the memory is configured to store a computer program / instructions, and the processor is configured to execute the computer program / instructions, which, when executed by the processor, implement the steps of the method according to any one of claims 1 to 10.
13. A computer-readable storage medium storing a computer program / instructions that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 10.
14. A computer program product comprising a computer program / instructions that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 10.
15. A database product comprising a computer program / instructions that, when executed by a processor, perform the following steps: obtaining at least one piece of to-be-processed business data; compressing each piece of to-be-processed business data according to a preset storage sector size to obtain a to-be-compressed storage block corresponding to each piece of to-be-processed business data, wherein the to-be-compressed storage block comprises at least one storage sector; and compressing the padding data in each to-be-compressed storage block to obtain a target storage block, wherein the target storage block comprises the compressed business data corresponding to each piece of to-be-processed business data, and the padding data is used to fill free space in the storage sector.
16. The database product of claim 15, wherein compressing each piece of to-be-processed business data according to the preset storage sector size to obtain the to-be-compressed storage block corresponding to each piece of to-be-processed business data comprises: determining target to-be-processed business data, wherein the target to-be-processed business data is any one of the at least one piece of to-be-processed business data; compressing the target to-be-processed business data based on a preset compression algorithm to obtain compressed business data, and obtaining compression capacity information corresponding to the compressed business data; determining at least one target storage sector corresponding to the compressed business data based on the compression capacity information and the preset storage sector size; and saving the compressed business data to the at least one target storage sector, and determining the at least one target storage sector as the to-be-compressed storage block corresponding to the target to-be-processed business data.
17. The database product of claim 16, wherein saving the compressed business data to the at least one target storage sector comprises: saving the compressed business data to the at least one target storage sector in storage order; and, when free space is detected in the last target storage sector, filling the free space with padding data.
18. The database product of claim 15, wherein compressing the padding data in each to-be-compressed storage block to obtain the target storage block comprises: determining a first to-be-compressed storage block and a second to-be-compressed storage block according to the storage block order of the to-be-compressed storage blocks; when first padding data exists in the first to-be-compressed storage block, compressing the first padding data, and concatenating the compressed business data in the second to-be-compressed storage block with the compressed business data in the first to-be-compressed storage block to obtain a reference storage block; taking the reference storage block as the first to-be-compressed storage block and the next to-be-compressed storage block as the second to-be-compressed storage block, and continuing to perform the operation of, when first padding data exists in the first to-be-compressed storage block, compressing the first padding data and concatenating the compressed business data in the second to-be-compressed storage block with the compressed business data in the first to-be-compressed storage block to obtain a reference storage block; and determining the reference storage block obtained after the last to-be-compressed storage block has been processed as the target storage block.
19. The database product of claim 15, wherein compressing the padding data in each to-be-compressed storage block comprises: compressing, by a data conversion layer in the intelligent storage device, the padding data in each to-be-compressed storage block.
20. The database product of claim 15, wherein the steps further comprise: receiving a data adjustment instruction, wherein the data adjustment instruction comprises data identification information; determining a reference to-be-compressed storage block based on the data identification information, wherein the reference to-be-compressed storage block stores the compressed business data corresponding to the data identification information; and adjusting the compressed business data in the reference to-be-compressed storage block.
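The two-stage flow recited in claims 1 to 4 can be illustrated with a minimal Python sketch. It forms no part of the claims and is not the applicant's implementation. The 4096-byte sector size, the zero padding byte, the use of zlib as the preset compression algorithm, and the names stage_one and stage_two are all assumptions made for illustration. In particular, the second stage here simply omits the padding and records an (offset, length) index, which only stands in for hardware-level compression of the padding data by the storage device.

```python
import zlib
from typing import List, Tuple

SECTOR_SIZE = 4096   # assumed preset storage sector size, in bytes
PAD_BYTE = b"\x00"   # assumed padding value used to fill free space


def stage_one(record: bytes, sector_size: int = SECTOR_SIZE) -> Tuple[bytes, int]:
    """First stage (hypothetical): compress one record and pad it to whole sectors.

    Returns the sector-aligned block and the length of the compressed payload,
    which plays the role of the compression capacity information.
    """
    payload = zlib.compress(record)
    sectors = -(-len(payload) // sector_size)        # ceiling division
    free_space = sectors * sector_size - len(payload)
    block = payload + PAD_BYTE * free_space          # fill free space in the last sector
    return block, len(payload)


def stage_two(blocks: List[Tuple[bytes, int]]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Second stage (hypothetical): walk the blocks in order, drop each block's
    padding, and concatenate the compressed payloads into one target block.

    Dropping the padding is a simplification of compressing it; a real device
    would do this below the sector interface.
    """
    target = bytearray()
    index = []                                       # (offset, length) per record
    for block, payload_len in blocks:
        index.append((len(target), payload_len))
        target += block[:payload_len]                # payload only, no padding
    return bytes(target), index


if __name__ == "__main__":
    records = [b"a" * 10000, bytes(range(256)) * 40, b"hello"]
    blocks = [stage_one(r) for r in records]
    target, index = stage_two(blocks)
    aligned = sum(len(b) for b, _ in blocks)
    print("sector-aligned bytes:", aligned, "compact bytes:", len(target))
    for record, (offset, length) in zip(records, index):
        assert zlib.decompress(target[offset:offset + length]) == record
```

In this sketch the host-visible layout stays sector-aligned after the first stage, so each record can be addressed by whole sectors, while the compact layout produced by the second stage shows how much of the aligned space was padding.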