Data compression method and apparatus

By employing a two-stage data compression method, combining software and hardware technologies, the problem of insufficient compression ratio and storage efficiency in existing data storage solutions has been solved, achieving efficient data storage compression and performance improvement.

CN122308703A · Pending · Publication Date: 2026-06-30 · ALIBABA CLOUD COMPUTING CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ALIBABA CLOUD COMPUTING CO LTD
Filing Date
2024-12-27
Publication Date
2026-06-30

Abstract

This specification provides a data compression method and apparatus. The data compression method is applied to an intelligent storage device and includes: acquiring at least one piece of business data to be processed; compressing each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; compressing the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill the free space in the storage sector. This method compresses the original business data at a preset storage sector size granularity, aligns the compressed business data at the storage sector granularity, and fills the empty space with fill data. In the hardware compression stage, the fill data in each storage sector is compressed, improving the data compression ratio.

Description

Technical Field

[0001] The embodiments in this specification relate to the field of database technology, and in particular to a data compression method.

Background Technology

[0002] With the development of computer technology, the amount of data generated is becoming increasingly massive. Storing such large amounts of data inevitably consumes a significant amount of storage space, leading to high storage costs. In the field of data storage, performance and cost are the core competitive advantages. How to achieve high data compression ratios and reduce storage space while ensuring high-performance data processing has become a pressing technical problem for engineers.

Summary of the Invention

[0003] In view of this, embodiments of this specification provide a data compression method. One or more embodiments of this specification also relate to a data compression apparatus, an electronic device, a computer-readable storage medium, and a computer program product, to address the technical deficiencies existing in the prior art.

[0004] According to a first aspect of the embodiments of this specification, a data compression method is provided, applied to a smart storage device, comprising: obtaining at least one piece of business data to be processed; compressing each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; and compressing the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill the free space in the storage sector.

[0005] According to a second aspect of the embodiments of this specification, a data compression apparatus is provided, applied to a smart storage device, comprising: an acquisition module configured to acquire at least one piece of business data to be processed; a first compression module configured to compress each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; and a second compression module configured to compress the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill the free space in the storage sector.

[0006] According to a third aspect of the embodiments of this specification, an electronic device is provided, comprising a memory and a processor; the memory is used to store a computer program / instructions, and the processor is used to execute the computer program / instructions, which, when executed by the processor, implement the steps of the above method.

[0007] According to a fourth aspect of the embodiments of this specification, a computer-readable storage medium is provided that stores a computer program / instructions that, when executed by a processor, implement the steps of the above-described method.

[0008] According to a fifth aspect of the embodiments of this specification, a computer program product is provided, including a computer program / instructions that, when executed by a processor, implement the steps of the above-described method.

[0009] According to a sixth aspect of the embodiments of this specification, a database product is provided, including a computer program / instructions that, when executed by a processor, perform the following steps: obtaining at least one piece of business data to be processed; compressing each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; and compressing the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill the free space in the storage sector.

[0010] The data compression method provided in the embodiments of this specification compresses the original business data at a preset storage sector size. The compressed business data is then aligned to the storage sector size, and any empty spaces are filled with padding data. During the hardware compression stage, the padding data in each storage sector is compressed, improving the data compression ratio.

Attached Figure Description

[0011] Figure 1 is a flowchart illustrating a data compression method provided in one embodiment of this specification; Figure 2 is a schematic diagram illustrating data compression provided in one embodiment of this specification; Figure 3 is a schematic diagram of hardware compression provided in one embodiment of this specification; Figure 4 is a schematic diagram of space reuse provided in one embodiment of this specification; Figure 5 is a schematic diagram of data compression according to an embodiment of the data compression method provided in this specification; Figure 6 is a schematic diagram of the structure of a data compression device provided in one embodiment of this specification; Figure 7 is a structural block diagram of an electronic device provided in one embodiment of this specification.

Detailed Implementation

[0012] Many specific details are set forth in the following description to provide a full understanding of this specification. However, this specification can be implemented in many other ways than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of this specification. Therefore, this specification is not limited to the specific implementations disclosed below.

[0013] The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments of this specification. The singular forms “a,” “said,” and “the” as used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and / or” as used in one or more embodiments of this specification refers to and includes any or all possible combinations of one or more associated listed items.

[0014] It should be understood that although the terms first, second, etc., may be used to describe various information in one or more embodiments of this specification, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, first may also be referred to as second without departing from the scope of one or more embodiments of this specification, and similarly, second may also be referred to as first. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to a determination."

[0015] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this specification are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use, and processing of related data must comply with the relevant laws, regulations, and standards of the relevant regions, and corresponding operation portals are provided for users to choose to authorize or refuse.

[0016] First, the terms and concepts used in one or more embodiments of this specification will be explained.

[0017] CSD: an abbreviation for computational storage drives, which are hard drives with computing capabilities that can offload compressed data organization and computation to the disk.

[0018] LSM-Tree: short for log-structured merge-tree, a storage structure that uses sequential writes to improve performance.

[0019] Compression ratio: The ratio between the size of the data before compression and the size of the data after compression.

[0020] sectorsize: The storage sector size, the smallest unit of writing on a hard drive, typically 4096 bytes in solid-state drives.

[0021] With the development of computer technology, the amount of data generated is also increasing. This necessitates the use of substantial storage space. Performance and cost are the core competitive advantages of data storage. Data storage must provide high compression ratios while maintaining controllable performance impact, and it's also crucial to flexibly choose between performance and compression ratios for different business scenarios.

[0022] Current data compression schemes include LSM-Tree-like schemes, in which compressed data is packed compactly and appended to disk. Such schemes require periodic garbage collection of invalid data blocks: the valid data in a block is reread and appended to a new block, after which the old block is released. They inevitably suffer from write amplification, space amplification, and read amplification. Write amplification occurs because data that is not aligned to 4KB (the preset storage sector size) requires the last, partially filled 4KB sector to be overwritten in place, and a write that crosses a 4KB boundary requires an additional storage sector to be written. For example, if the compressed data is 6KB and crosses a 4KB boundary, two storage sectors (8KB) are needed to store it. Space amplification occurs because invalid data cannot be reclaimed and reused promptly: space becomes reusable only after periodic garbage collection has moved the valid data elsewhere, which lowers the overall compression ratio. Read amplification occurs because, with I / O that is not 4KB-aligned, data crossing a 4KB boundary requires an additional 4KB to be read.
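The sector arithmetic behind these amplification effects can be illustrated with a small sketch. The helper `sectors_touched` and the byte offsets are hypothetical and are not part of this specification; a 4KB sector is assumed, as in the example above.

```python
SECTOR_SIZE = 4096  # assumed sector size in bytes (4KB)
KB = 1024


def sectors_touched(offset: int, length: int, sector_size: int = SECTOR_SIZE) -> int:
    """Return how many sectors the byte range [offset, offset + length) overlaps."""
    if length <= 0:
        return 0
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return last - first + 1


# 6KB of compactly appended data that crosses a 4KB boundary occupies 2 sectors (8KB)
print(sectors_touched(1 * KB, 6 * KB))  # 2

# A 2KB record fits in one sector when aligned, but spans two when it starts 3KB in
print(sectors_touched(0, 2 * KB))       # 1
print(sectors_touched(3 * KB, 2 * KB))  # 2
```

In the last case, reading or rewriting 2KB of data touches 8KB of media, which is the extra sector described above.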

[0023] Another current data compression scheme aligns data to the storage sector size. For example, with a sector size of 4KB, the size of the compressed data is rounded up to a multiple of the sector size. Invalid space can be reclaimed through reuse or trim, eliminating the need for periodic garbage collection. However, this scheme achieves a relatively poor compression ratio and wastes a significant amount of space.

[0024] Based on this, a data compression method is provided in this specification. This specification also relates to a data compression device, an electronic device, a computer-readable storage medium, and a computer program product, which will be described in detail in the following embodiments.

[0025] Referring to Figure 1, Figure 1 shows a flowchart of a data compression method according to an embodiment of this specification. The method is applied to a smart storage device and specifically includes the following steps.

[0026] Step 102: Obtain at least one piece of business data to be processed.

[0027] In this context, intelligent storage devices are storage devices with data processing capabilities. The business data to be processed can be understood as business data that needs to be persistently stored, i.e., business data that needs to be written to the hard disk. In one or more specific embodiments provided in this specification, the business data to be processed can be understood as raw business data that has not yet been compressed. The business data to be processed can be data corresponding to any business scenario, such as image data in an image processing scenario, text data in a text processing scenario, audio data in a speech processing scenario, etc. The methods provided in the embodiments of this specification do not limit the specific business scenario of the business data to be processed.

[0028] There are many ways to obtain at least one piece of business data to be processed. It can be obtained from the Internet, transmitted from other terminals, or obtained from a removable storage medium connected to the current terminal. In the embodiments provided in this specification, the method of obtaining the business data to be processed is not limited, and the actual application shall prevail.

[0029] In the specific implementation provided in this specification, image processing service is used as an example for explanation. When a user operates the current terminal to access the Internet, multiple images to be processed can be obtained from the Internet. These multiple images to be processed are the data to be processed in the service.

[0030] Step 104: Compress each business data to be processed according to the preset storage sector size to obtain the storage block to be compressed corresponding to each business data to be processed, wherein the storage block to be compressed includes at least one storage sector.

[0031] The preset storage sector size can be understood as sectorsize. Sectorsize is a basic concept in disk storage. A sector is the smallest unit of physical storage on a hard disk. Each sector contains a certain amount of data, which is stored contiguously on the disk. The sector size determines the minimum amount of data for disk read and write operations. The preset storage sector size is the sector size. For example, in a specific embodiment provided in this specification, the preset storage sector size is 4KB, meaning that one storage sector can store 4KB of data.

[0032] Typically, the preset storage sector size of a terminal's storage device is fixed. During the data writing process, data is sequentially written to consecutive storage sectors, thus obtaining the storage blocks to be compressed corresponding to each piece of business data to be processed. These storage blocks to be compressed can be understood as a collection of storage sectors used to store the compressed business data to be processed.

[0033] For example, referring to Figure 2, Figure 2 shows a schematic diagram illustrating the data compression provided in this specification. As shown in Figure 2, taking 10KB of business data to be processed and a preset storage sector size of 4KB as an example, the data can be compressed to 7KB. This 7KB of data can be stored in two storage sectors, which together form the storage block to be compressed corresponding to the business data. The storage block to be compressed thus contains two storage sectors, and these two sectors store the compressed data corresponding to the business data to be processed.

[0034] Specifically, each piece of business data to be processed is compressed according to a preset storage sector size to obtain the corresponding storage block to be compressed, including: S1042. Determine the target business data to be processed, wherein the target business data to be processed is any one of at least one business data to be processed.

[0035] In practical applications, there may be one, two, or more business data to be processed. To better explain the method provided in the embodiments of this specification, this embodiment uses one piece of business data to be processed as an example for explanation. That is, a target piece of business data to be processed is selected from at least one piece of business data to be processed for explanation.

[0036] S1044. Compress the target business data to be processed based on a preset compression algorithm to obtain compressed business data, and obtain the compression capacity information corresponding to the compressed business data.

[0037] The preset compression algorithm can be understood as a commonly used data compression algorithm, such as the Zstandard algorithm, the LZ4 algorithm (a Lempel-Ziv family algorithm), the GZIP algorithm (GNU zip), and the BZIP2 algorithm (a Burrows-Wheeler Transform-based compression algorithm). In the methods provided in the embodiments of this specification, different preset compression algorithms can be flexibly selected according to the data characteristics and performance requirements.

[0038] Based on the data characteristics of the target business data to be processed, a corresponding preset compression algorithm is selected. The target business data to be processed is compressed using the preset compression algorithm to obtain compressed business data, and at the same time, the compression capacity information corresponding to the compressed business data is obtained. The compression capacity information can be understood as the data size of the compressed business data.

[0039] For example, let's take a target business data A with a size of 16KB as an example for explanation. By using a preset compression algorithm, the data is compressed to obtain compressed business data A1. The compressed capacity of business data A1 is 5KB, meaning that after data compression, the target business data A's size is reduced from 16KB to 5KB.

[0040] For example, taking a target business data B with a size of 16KB as an example, it is compressed using a preset compression algorithm to obtain compressed business data B1, which has a compression capacity of 10KB. That is, after data compression, the target business data B is compressed from 16KB to 10KB.
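As a minimal sketch of step S1044, the following compresses a payload and reports its compressed size. It uses Python's standard `zlib` module purely as a stand-in for whichever preset algorithm is chosen; the function name `compress_with_size` and the sample payload are illustrative assumptions, not part of this specification.

```python
import zlib


def compress_with_size(raw: bytes, level: int = 6) -> tuple[bytes, int]:
    """Compress raw bytes and return (compressed bytes, compressed size in bytes)."""
    compressed = zlib.compress(raw, level)  # zlib stands in for the preset algorithm
    return compressed, len(compressed)


# Hypothetical 16KB payload with repetitive content, so it compresses well
raw = (b"order_id=12345;status=paid;" * 607)[: 16 * 1024]

compressed, size = compress_with_size(raw)
ratio = len(raw) / size  # compression ratio: size before / size after
print(f"raw={len(raw)} bytes, compressed={size} bytes, ratio={ratio:.1f}")
```

The returned size plays the role of the compression capacity information used in the next step to decide how many storage sectors are needed.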

[0041] S1046. Determine at least one target storage sector corresponding to the compressed service data based on the compressed capacity information and the preset storage sector size.

[0042] After the above processing, at least one target storage sector can be determined for storing the compressed business data based on the compression capacity information and the preset storage sector size. The preset storage sector size is the size of the storage sector used for storing data as specified by the system. The target storage sector is the storage sector used to store the compressed business data.

[0043] In the method provided in the embodiments of this specification, after determining the compression capacity information corresponding to the compressed business data, and combining it with the preset storage sector size, it is possible to determine how many target storage sectors are needed to store the compressed business data.

[0044] For example, continuing with the previous example, the compressed capacity of compressed business data A1 is 5KB and the preset storage sector size is 4KB. Because 5KB exceeds one 4KB sector but fits within two, two target storage sectors are required to store compressed business data A1.

[0045] For example, if the compressed capacity of compressed business data B1 is 10KB and the preset storage sector size is 4KB, three target storage sectors are needed to store compressed business data B1.
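The sector counts in these two examples follow from a ceiling division of the compressed size by the sector size. A minimal sketch, where the function name `target_sector_count` is a hypothetical helper:

```python
KB = 1024


def target_sector_count(compressed_size: int, sector_size: int = 4 * KB) -> int:
    """Number of whole sectors needed to hold compressed_size bytes (ceiling division)."""
    if compressed_size <= 0:
        return 0
    return (compressed_size + sector_size - 1) // sector_size


print(target_sector_count(5 * KB))   # A1: 5KB  -> 2 sectors
print(target_sector_count(10 * KB))  # B1: 10KB -> 3 sectors
```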

[0046] S1048. Save the compressed service data to the at least one target storage sector, and determine the at least one target storage sector as the storage block to be compressed corresponding to the target service data to be processed.

[0047] After determining at least one target storage sector corresponding to the compressed business data, data writing can be performed to write the compressed business data into the corresponding at least one target storage sector. For ease of subsequent understanding and description, the at least one target storage sector used to store the compressed business data is determined as the storage block to be compressed corresponding to the target business data to be processed, and is also the storage block to be compressed corresponding to the compressed business data.

[0048] In practical applications, during the process of writing compressed business data to the target storage sector, since the capacity of the target storage sector is fixed (usually 4KB; in this specification, the preset storage sector size is not limited and is subject to actual application), and the compressed business data may not completely fill the space in the target storage sector, there is still a possibility of data compression for the target storage sector that is not completely filled. Thus, the target storage sector used to store the compressed business data is determined as the storage block to be compressed.

[0049] In one specific embodiment provided in this specification, saving the compressed service data to the at least one target storage sector includes: saving the compressed business data to the at least one target storage sector in storage order; and, if free space is detected in the last target storage sector, filling the free space with fill data.

[0050] In practical applications, the storage block to be compressed consists of compressed business data and filler data. Furthermore, the compressed business data is first written sequentially to at least one target storage sector according to storage order. Typically, this is done in the order of the target storage sectors. After all the compressed business data has been written to the target storage sectors, it is necessary to check if there is any free space in the last target storage sector. If so, filler data is written into the free space.

[0051] The padding data can be understood as the data that will be compressed during hardware compression in subsequent processing. In a specific embodiment provided in this specification, the padding data can be 0 or 1. Using padding data to fill empty spaces facilitates hardware compression in subsequent processes, thereby further improving the compression ratio.

[0052] For example, taking the compressed capacity of business data A1 as 5KB, and the preset storage sector size as 4KB, we can determine that there are two target storage sectors for storing compressed business data A1. The 5KB compressed business data A1 is stored in two target storage sectors. The first target storage sector stores 4KB of the data, and the second target storage sector stores 1KB of the data. There are then 3KB of free space in the second target storage sector. We can write filler data (0s) into this 3KB of free space.
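The write-and-fill step in this example can be sketched as follows. This is an illustration under stated assumptions: sectors are modeled as in-memory byte strings, the fill byte is zero, and `build_block_to_compress` is a hypothetical name.

```python
SECTOR_SIZE = 4096   # assumed preset storage sector size (4KB)
FILL_BYTE = b"\x00"  # fill data chosen as zeros, as in the example above


def build_block_to_compress(compressed: bytes, sector_size: int = SECTOR_SIZE) -> list[bytes]:
    """Split compressed data into sectors in order, zero-filling the last sector's free space."""
    sectors = []
    for start in range(0, len(compressed), sector_size):
        chunk = compressed[start:start + sector_size]
        free = sector_size - len(chunk)  # non-zero only for the last sector
        sectors.append(chunk + FILL_BYTE * free)
    return sectors


a1 = b"\xab" * (5 * 1024)  # hypothetical stand-in for 5KB of compressed business data
block = build_block_to_compress(a1)
print(len(block))                            # 2 sectors
print(block[1].count(FILL_BYTE, 1024))       # 3072 bytes of fill data in the second sector
```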

[0053] After performing the compression process described above on each piece of pending business data, a storage block to be compressed corresponding to each piece of business data can be obtained. Each storage block contains at least one storage sector, and the last storage sector in each block may contain padding data. This padding data can be compressed in subsequent hardware compression to further improve the compression ratio.

[0054] Through the above processing, a storage block to be compressed is obtained for each piece of business data to be processed; more specifically, each storage block to be compressed stores the compressed business data corresponding to one piece of business data to be processed. Hardware compression can then be used to further compress the data in these storage blocks to be compressed.

[0055] Hardware data compression is a technology that uses dedicated hardware to compress data, aiming to reduce the size of stored data, save storage space, and improve data transmission efficiency.

[0056] Hardware data compression reduces data size based on data redundancy and pattern recognition. Through specific hardware circuits and algorithms, data is transformed into a more compact representation. This can significantly reduce the space required for storage and transmission without compromising data integrity and availability.

[0057] During hardware-based data compression, padding data within the storage blocks to be compressed can be compressed, writing the data to the hardware medium in a more compact layout and further improving the compression ratio. Therefore, after obtaining each storage block to be compressed, it is necessary to further identify whether padding data exists in each block, as well as the location and size of the padding data, to avoid compressing the actual business data.

[0058] In one specific embodiment provided in this specification, identifying the padding data in each storage block to be compressed includes: determining a target storage block to be compressed, wherein the target storage block to be compressed is any one of the storage blocks to be compressed; obtaining the storage metadata corresponding to the target storage block to be compressed, and determining the compressed service data corresponding to the target storage block to be compressed based on the storage metadata; and determining the padding data corresponding to the target storage block to be compressed based on the target storage block to be compressed and the compressed service data.

[0059] In practical applications, each compressed service data item has corresponding metadata stored together with the compressed service data in the storage block to be compressed. For better explanation, this embodiment uses a target storage block to be compressed as an example, where the target storage block to be compressed is any one of the various storage blocks to be compressed.

[0060] After identifying the target storage block to be compressed, its corresponding storage metadata can be obtained from it. This storage metadata consists of the metadata corresponding to the compressed service data within the target storage block, including the start and end positions of the compressed service data within that block. This storage metadata allows us to determine the compressed service data within the target storage block.

[0061] Once the compressed service data is determined, the filling data in the target storage block to be compressed can be determined based on the compressed service data and the target storage block to be compressed.

[0062] For example, if the target storage block to be compressed includes two storage sectors, the size of the compressed business data can be determined to be 5KB using storage metadata. Then, the filling data can be further identified as 3KB of data in the last storage sector.
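A minimal sketch of locating the padding data from the storage metadata follows. It assumes the metadata has already been parsed into start and end byte offsets within the block; the `StorageMetadata` class and `locate_fill_data` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageMetadata:
    start: int  # byte offset where the compressed service data begins in the block
    end: int    # byte offset one past the last byte of the compressed service data


def locate_fill_data(block_size: int, meta: StorageMetadata) -> tuple[int, int]:
    """Return (offset, length) of the padding that follows the compressed data."""
    if not (0 <= meta.start <= meta.end <= block_size):
        raise ValueError("metadata does not fit inside the block")
    return meta.end, block_size - meta.end


# Two 4KB sectors holding 5KB of compressed data, as in the example above
offset, length = locate_fill_data(2 * 4096, StorageMetadata(start=0, end=5 * 1024))
print(offset, length)  # 5120 3072: the padding is the last 3KB of the second sector
```

Knowing the exact offset and length is what lets the later hardware stage act on the padding without touching the compressed service data.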

[0063] Step 106: Compress the fill data in each storage block to be compressed to obtain the target storage block, wherein the target storage block includes compressed service data corresponding to each service data to be processed, and the fill data is used to fill the free space in the storage sector.

[0064] The methods provided in the embodiments of this specification are applied to intelligent storage devices with data processing capabilities, i.e., storage devices with data processing capabilities, such as Smart SSDs (intelligent solid-state drives). A Smart SSD is a solid-state drive that integrates data processing functions, designed to improve system performance and energy efficiency. Smart SSDs incorporate FPGAs (Field-Programmable Gate Arrays), supporting high-speed computation at the data storage location, thereby improving data processing speed and efficiency.

[0065] Smart SSDs integrate memory and computing power into flash memory devices, providing high-performance compression capabilities through built-in FPGA cards or ASIC acceleration chips. In a specific embodiment provided in this specification, the fill data in each storage block to be compressed is compressed through a data translation layer in the smart storage device. Specifically, this can be achieved by modifying the FTL (Flash Translation Layer) to support variable-length data organization, allowing the compressed data to be stored compactly on the NAND medium.

[0066] FTL stands for Flash Translation Layer, which translates or maps the host's (or user's) logical address space to the flash memory's physical address space. Each time an SSD writes user logical data into the flash memory address space, it records the mapping between that logical address and the physical address. When the host wants to read that data, the SSD uses this mapping to read the data from the flash memory and then returns it to the user. The FTL has the ability to organize data; in the methods provided in the embodiments of this specification, hardware compression of data can be performed using the FTL.
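The mapping idea can be illustrated with a toy model. This is not how a real FTL is implemented (there is no wear leveling, garbage collection, or page and block geometry here); the `ToyFTL` class is a hypothetical sketch showing only a logical-to-physical map whose entries record a variable length, so that data can be laid out compactly.

```python
class ToyFTL:
    """Toy logical-to-physical map with variable-length entries; not a real FTL."""

    def __init__(self) -> None:
        self.flash = bytearray()  # stands in for the NAND medium
        self.mapping: dict[int, tuple[int, int]] = {}  # logical address -> (physical offset, length)

    def write(self, lba: int, data: bytes) -> None:
        # Append the data and record where it landed and how long it is.
        # Rewriting an address leaves the old bytes behind as stale data.
        self.mapping[lba] = (len(self.flash), len(data))
        self.flash += data

    def read(self, lba: int) -> bytes:
        offset, length = self.mapping[lba]
        return bytes(self.flash[offset:offset + length])


ftl = ToyFTL()
ftl.write(7, b"x" * 1000)  # variable-length entries sit back to back on the medium
ftl.write(9, b"y" * 300)
print(ftl.mapping[9])      # (1000, 300)
print(len(ftl.read(7)))    # 1000
```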

[0067] In the steps described above, compressed business data is written to storage sectors of a storage device with data processing capabilities. In this current step, the data processing capabilities of the storage device are used to compress the fill data in each storage sector, thereby achieving data compression on the hardware level. After the fill data in each storage block to be compressed is compressed, the target storage block can be obtained.

[0068] In one specific embodiment provided in this specification, compressing the padding data in each storage block to be compressed to obtain the target storage block includes: determining a first storage block to be compressed and a second storage block to be compressed according to the storage block order of the storage blocks to be compressed; if first padding data exists in the first storage block to be compressed, compressing the first padding data and concatenating the compressed service data in the second storage block to be compressed with the compressed service data in the first storage block to be compressed to obtain a reference storage block; taking the reference storage block as the first storage block to be compressed and the next storage block to be compressed as the second storage block to be compressed, and repeating the compressing and concatenating operation to obtain an updated reference storage block; and determining the reference storage block obtained after the last storage block to be compressed has been processed as the target storage block.

[0069] In this embodiment, the compression of the padding data in each storage block to be compressed is further explained. The storage blocks to be compressed are stored in the Smart SSD in a specific order. The first and second storage blocks to be compressed are determined based on this storage block order. The first storage block to be compressed can be understood as the one that comes earlier in the storage block order, and the second storage block to be compressed can be understood as the one that comes later.

[0070] It should be noted that, in the method provided in the embodiments of this specification, compressing each storage block to be compressed refers to the process of compressing the padding data in each storage block to be compressed.

[0071] After determining the first storage block to be compressed, it is determined whether there is first padding data in the first storage block to be compressed. The first padding data can be understood as the padding data in the first storage block to be compressed. If the first padding data exists, it can be compressed by hardware, and then the data in the second storage block to be compressed is concatenated with the compressed service data in the first storage block to be compressed. A reference storage block is obtained, which can be understood as the storage block obtained after the first padding data in the first storage block to be compressed is compressed, and then the compressed service data in the first storage block to be compressed is concatenated with the second storage block to be compressed.

[0072] It is then determined whether the second storage block to be compressed is the last storage block to be compressed. If not, the reference storage block is designated as the new first storage block to be compressed, the next storage block to be compressed is designated as the second storage block to be compressed, and the above process continues. If the second storage block to be compressed is the last one, the padding data remaining in the reference storage block is compressed, and the resulting reference storage block is designated as the target storage block.

[0073] See Figure 3, which shows a schematic diagram of hardware compression provided in one embodiment of this specification. As shown in Figure 3, the following explanation uses three storage blocks to be compressed as an example. Take storage block 1 to be compressed: it has two target storage sectors, and the size of its compressed business data is 7KB, so storage block 1 contains 1KB of fill data. Storage block 1 is designated as the first storage block to be compressed, and storage block 2 is designated as the second storage block to be compressed. The fill data in storage block 1 is compressed and its space released. The compressed business data in storage block 1 is then concatenated with the data in storage block 2 to obtain reference storage block 1.

[0074] If storage block 2 to be compressed is not the last storage block to be compressed, reference storage block 1 is used as the new first storage block to be compressed, and storage block 3 to be compressed is used as the second storage block to be compressed. The fill data in the new first storage block to be compressed (i.e., the fill data that originated in storage block 2) is compressed, and the compressed business data in storage block 3 is concatenated with the compressed business data in reference storage block 1 to obtain reference storage block 2.

[0075] At this point, storage block 3 is the last storage block to be compressed. The padding data in reference storage block 2 (i.e., the padding data in storage block 3) is then compressed to obtain the final target storage block.
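As a rough illustration of the iterative procedure in paragraphs [0068] to [0075], the following Python sketch folds the storage blocks in order. The names `Block`, `merge_blocks`, and `sectors_used` are hypothetical, the 4KB sector size is assumed, and the fill data is "compressed" by simply discarding it, which is a simplification of whatever the hardware actually does. Only the 7KB/1KB figures for block 1 come from the Figure 3 example; the sizes of blocks 2 and 3 are made up.

```python
from dataclasses import dataclass

SECTOR_SIZE = 4096  # assumed 4KB storage sector


@dataclass
class Block:
    payload: bytes  # compressed business data
    padding: int    # number of fill bytes at the end of the last sector


def merge_blocks(blocks):
    """Fold blocks in storage order: drop fill data, concatenate payloads."""
    if not blocks:
        raise ValueError("at least one storage block is required")
    reference = blocks[0]
    for nxt in blocks[1:]:
        # The fill data of the current first block is discarded, and the
        # next block's data is spliced directly onto its payload.
        reference = Block(reference.payload + nxt.payload, nxt.padding)
    # The fill data left over from the last block is discarded as well.
    return Block(reference.payload, 0)


def sectors_used(n_bytes):
    # Ceiling division: number of sectors needed to hold n_bytes.
    return -(-n_bytes // SECTOR_SIZE)


blocks = [
    Block(b"a" * 7168, 1024),  # block 1: 7KB data + 1KB fill (Figure 3)
    Block(b"b" * 5120, 3072),  # block 2: hypothetical sizes
    Block(b"c" * 3072, 1024),  # block 3: hypothetical sizes
]
target = merge_blocks(blocks)
```

With these assumed sizes the three blocks occupy five sectors before merging, while the 15KB target payload needs four.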

[0076] It should be noted that the above is only an example of one hardware compression method. In practical applications, each storage block to be compressed can be compressed in parallel, thereby writing the data onto the hardware medium in a more compact layout and further improving the compression ratio.

[0077] The data compression method provided in the embodiments of this specification employs a two-stage data compression approach. The first stage uses software compression, and the second stage, based on the software compression, performs further data compression using hardware. This combination of software and hardware provides a data compression solution with high compression ratios, high performance, and low read / write amplification.

[0078] In the software compression stage, the original business data is compressed at a preset storage sector size. The compressed business data is then aligned to the storage sector size, and any empty spaces are filled with padding data. In the hardware compression stage, the padding data in each storage sector is compressed, and variable-length FTL is fully utilized to make the data compactly arranged. This improves the data compression ratio.
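The software compression stage summarized above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiments: `zlib` stands in for the unspecified preset compression algorithm, the 4KB sector size is assumed, zero bytes are used as fill data, and the function name `software_compress` is hypothetical.

```python
import zlib

SECTOR_SIZE = 4096  # assumed preset storage sector size


def software_compress(data):
    """Compress data, then pad with zeros to a whole number of sectors.

    Returns the sector-aligned block and the length of the compressed
    payload inside it.
    """
    compressed = zlib.compress(data)
    # Ceiling division; even an empty payload occupies one sector here.
    n_sectors = max(1, -(-len(compressed) // SECTOR_SIZE))
    padding = n_sectors * SECTOR_SIZE - len(compressed)
    return compressed + b"\x00" * padding, len(compressed)


block, payload_len = software_compress(b"x" * 16384)
```

Because the fill bytes are all zeros, a later hardware stage can compress them away almost entirely, which is the cooperation between the two stages that the paragraph describes.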

[0079] In another specific embodiment provided in this specification, the method further includes: Receive a data adjustment instruction, wherein the data adjustment instruction includes data identification information; A reference storage block to be compressed is determined based on the data identification information, wherein the reference storage block to be compressed stores the compressed service data corresponding to the data identification information; Adjust the compressed service data in the reference storage block to be compressed.

[0080] After software compression but before hardware compression, the data may still be adjusted. Specifically, in this embodiment, a data adjustment instruction for business data is received. This instruction carries data identification information, which identifies the part of the business data that is being adjusted.

[0081] Upon receiving a data adjustment instruction, the reference storage block to be compressed can be found based on the data identification information in the instruction. The reference storage block to be compressed can be understood as the storage block to be compressed that contains the compressed business data corresponding to the data identification information.

[0082] Once the reference storage block to be compressed is determined, the compressed business data stored in that reference storage block can be adjusted.

[0083] In practical applications, data adjustment commands can be data deletion commands, data update commands, etc. If the size of the adjusted compressed business data still meets the storage space requirements of the reference storage block to be compressed, it can be directly stored in the original reference storage block. If the size of the adjusted compressed business data does not meet the storage space requirements of the reference storage block to be compressed, or if a deletion command is received and the compressed business data in the reference storage block to be compressed is deleted, then the data in the reference storage block to be compressed is in an invalid state, meaning that there is invalid space in the reference storage block to be compressed. In this case, to better utilize this space, relevant processing can be performed on the invalid space to improve space utilization.
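The decision described in this paragraph can be sketched in Python. The specification does not define exactly when adjusted data "meets the storage space requirements"; the sketch assumes it means the adjusted data needs no more sectors than the block already has. The function name `handle_adjustment`, the return labels, and the 4KB sector size are likewise assumptions.

```python
SECTOR_SIZE = 4096  # assumed preset storage sector size


def handle_adjustment(block_sectors, new_size):
    """Classify an adjustment to a block holding `block_sectors` sectors.

    new_size is the size in bytes of the adjusted compressed data,
    or None for a deletion.
    """
    if new_size is None:
        # Deletion: the block's contents become invalid space.
        return "invalid"
    needed = -(-new_size // SECTOR_SIZE)  # ceiling division
    # Assumed rule: the update stays in place if it still fits the block.
    return "in_place" if needed <= block_sectors else "invalid"
```

A block classified as `"invalid"` is a candidate for the reuse, release, or padding handling described in the embodiments that follow.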

[0084] In yet another specific embodiment provided in this specification, the method further includes: obtaining reference storage space information of the reference storage block to be compressed; determining reference storage information based on the reference storage space information, and determining compressed data to be written based on the reference storage information; and writing the compressed data to be written to the reference storage block to be compressed.

[0085] In this embodiment, if it is determined that there is invalid space in the reference storage block to be compressed, reference storage space information of the reference storage block to be compressed can be obtained. The reference storage space information can be understood as storage information about the reference storage block to be compressed. For example, it may include the number of storage sectors in the reference storage block to be compressed, or the storage space of the reference storage block to be compressed, etc.

[0086] After determining the reference storage space information, the reference storage information that can be written to the storage block to be compressed can be determined based on this information. This reference storage information is then used to further determine the compressed data to be written to the reference storage block to be compressed. The compressed data to be written can be understood as new compressed service data.

[0087] It should be noted that the compressed data to be written here refers to compressed business data that conforms to the reference storage information corresponding to the reference storage space information. In a specific embodiment provided in this specification, determining the reference storage information based on the reference storage space information includes: The number of reference storage sectors or the range of reference storage space are determined based on the reference storage space information. Accordingly, the compressed data to be written is determined based on the reference storage information, including: The compressed data to be written is determined based on the number of reference storage sectors or the reference storage space range.

[0088] In this embodiment, as described above, the reference storage information may be the number of reference storage sectors or the reference storage space range.

[0089] The reference storage sector number refers to the number of storage sectors corresponding to the reference storage block to be compressed, and the reference storage space range refers to the range of data size that the reference storage block to be compressed can hold.

[0090] For example, if there are 3 reference storage sectors in the reference storage block to be compressed, and each reference storage sector is 4KB in size, then the reference storage space information could be 3 reference storage sectors, with a reference storage space range of 8KB-12KB.

[0091] The compressed data to be written can be determined based on either the number of reference storage sectors or the reference storage space range.

[0092] Specifically, determining the compressed data to be written based on the number of reference storage sectors or the reference storage space range includes: determining compressed service data whose size falls within the reference storage space range as the compressed data to be written; or, determining pieces of compressed service data whose quantity is less than or equal to the number of reference storage sectors, and each of whose data size is less than or equal to the preset storage sector size, as the compressed data to be written.

[0093] Taking the reference storage space range as an example, compressed business data whose size falls within that range is determined as the compressed data to be written. Continuing with the previous example, if the reference storage space range is 8KB-12KB, compressed business data with a size of 8KB-12KB is determined as the compressed data to be written.

[0094] Taking the number of reference storage sectors as an example, pieces of compressed business data whose quantity is less than or equal to the number of reference storage sectors, and each of whose data size is less than or equal to the preset storage sector size, are identified as the compressed data to be written. Continuing with the previous example, if the reference storage space information is 3 reference storage sectors, then 3 pieces of compressed business data, each no larger than 4KB, can be identified as the compressed data to be written and written to the three reference storage sectors respectively.

[0095] See Figure 4, which shows a schematic diagram of space reuse provided in one embodiment of this specification. As shown in Figure 4, the shaded area represents the reference storage block to be compressed, which includes three reference storage sectors with a total capacity of 12KB. For this block, the first method in Figure 4 can be chosen, which writes one piece of compressed business data larger than 8KB and smaller than 12KB; alternatively, the second method in Figure 4 can be chosen, which writes two pieces of compressed service data, one larger than 4KB and smaller than 8KB and the other smaller than or equal to 4KB.
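The two selection rules in paragraphs [0092] to [0094] can be sketched in Python. The function names `fits_by_range` and `pick_by_sector_count` are hypothetical, the 4KB sector size is assumed, and the treatment of the range boundaries (exclusive lower bound, inclusive upper bound) is an assumption, since the specification states the range only as "8KB-12KB".

```python
SECTOR_SIZE = 4096  # assumed preset storage sector size


def fits_by_range(size, n_sectors):
    """True if `size` bytes falls in the range that needs exactly n_sectors."""
    low = (n_sectors - 1) * SECTOR_SIZE
    high = n_sectors * SECTOR_SIZE
    return low < size <= high


def pick_by_sector_count(sizes, n_sectors):
    """Pick at most n_sectors items, each no larger than one sector."""
    small = [s for s in sizes if s <= SECTOR_SIZE]
    return small[:n_sectors]
```

For a three-sector block, `fits_by_range(10 * 1024, 3)` accepts a 10KB item, while `pick_by_sector_count` would instead fill the block with up to three items of at most 4KB each.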

[0096] In yet another specific embodiment provided in this specification, the method further includes: Receive a storage release request for the reference storage block to be compressed, and release the storage space corresponding to the reference storage block to be compressed based on the storage release request; or, Write padding data into the reference storage block to be compressed.

[0097] In the methods provided in the embodiments of this specification, in addition to reusing the reference storage block to be compressed, it can also be released at the physical hardware level. Specifically, a storage release request for the reference storage block to be compressed can be received. The storage space corresponding to the reference storage block to be compressed can be released based on the storage release request. Physical space on the physical disk can be released by sending a trim request. Alternatively, padding data can be written into the reference storage block to be compressed, allowing it to be compressed in a subsequent hardware compression stage.

[0098] Whether using the trim method, the padding data writing method, or the space reuse method, none of these methods require data movement, resulting in minimal read / write operations and a simple mechanism. Space reuse minimizes fragmentation, and existing fragments can be reclaimed by releasing or writing padding data. This simplifies the data compression process and improves the compression ratio.

[0099] The data compression method provided in the embodiments of this specification is further explained below with reference to Figure 5, which shows a schematic diagram of data compression for a data compression method provided in one embodiment of this specification.

[0100] As shown in Figure 5, the intelligent storage device acquires three pieces of business data to be processed, each 16KB in size. It then compresses these three pieces of data using a preset compression algorithm, resulting in three pieces of compressed business data, which are written to the storage sectors of the intelligent storage device. Specifically, the first piece of compressed business data is 5KB in size and occupies two storage sectors (storage block 1 to be compressed); the second is 10KB in size and occupies three storage sectors (storage block 2 to be compressed); and the third is 3KB in size and occupies one storage sector (storage block 3 to be compressed).

[0101] The intelligent storage device performs hardware-level data compression on the fill data in the three storage blocks to be compressed, thereby merging the three pieces of compressed business data into one, so that together they occupy only 5 storage sectors. Compared to the 6 storage sectors occupied after software compression and the 12 storage sectors occupied by the original business data to be processed, this significantly saves disk space and improves the data compression ratio.
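The sector counts in the Figure 5 example can be checked with a few lines of Python. The 4KB sector size is taken from the alignment described in this specification; the helper name `sectors` is hypothetical.

```python
SECTOR_KB = 4  # 4KB storage sector


def sectors(kb):
    # Ceiling division: sectors needed to hold `kb` kilobytes.
    return -(-kb // SECTOR_KB)


original = [16, 16, 16]  # KB, business data to be processed
compressed = [5, 10, 3]  # KB, after software compression

raw_sectors = sum(sectors(k) for k in original)          # no compression
software_sectors = sum(sectors(k) for k in compressed)   # sector-aligned
hardware_sectors = sectors(sum(compressed))              # fill data removed

print(raw_sectors, software_sectors, hardware_sectors)  # prints: 12 6 5
```

The saving in the hardware stage comes entirely from the fill data: 6 sectors hold 24KB, of which only 18KB is compressed business data.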

[0102] The data compression method provided in this embodiment creates the precondition for a high compression ratio through large-block compression in the software compression stage. The compressed data is aligned to 4KB, which simplifies data organization, reduces its cost, and minimizes read / write amplification and space amplification. The entire solution does not require periodic garbage collection, which yields high performance. The aligned data is padded with zeros (the fill data), so that software and hardware compression work closely together, with the hardware compression layer squeezing out the fill data to obtain a better compression ratio. The software compression layer allows free selection of the compression algorithm and compression granularity, providing greater flexibility, so that parameters and algorithms can be chosen according to performance requirements.

[0103] Corresponding to the above method embodiments, this specification also provides embodiments of a data compression apparatus. Figure 6 shows a schematic diagram of a data compression apparatus according to one embodiment of this specification. As shown in Figure 6, the apparatus is applied to an intelligent storage device and includes: an acquisition module 602, configured to acquire at least one piece of business data to be processed; a first compression module 604, configured to compress each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; and a second compression module 606, configured to compress the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed service data corresponding to each piece of service data to be processed, and the fill data is used to fill the free space in the storage sector.

[0104] Optionally, the first compression module 604 is further configured to: Determine the target business data to be processed, wherein the target business data to be processed is any one of at least one business data to be processed; The target business data to be processed is compressed based on a preset compression algorithm to obtain compressed business data, and the compression capacity information corresponding to the compressed business data is obtained. Determine at least one target storage sector corresponding to the compressed service data based on the compressed capacity information and the preset storage sector size; The compressed service data is saved to the at least one target storage sector, and the at least one target storage sector is determined to be the storage block to be compressed corresponding to the target service data to be processed.

[0105] Optionally, the first compression module 604 is further configured to: save the compressed business data to the at least one target storage sector in storage order; and, if free space is detected in the last target storage sector, fill the free space with fill data.

[0106] Optionally, the second compression module 606 is further configured to: determine a first storage block to be compressed and a second storage block to be compressed according to the storage block order of the storage blocks to be compressed; if first padding data exists in the first storage block to be compressed, compress the first padding data, and concatenate the compressed service data in the second storage block to be compressed with the compressed service data in the first storage block to be compressed to obtain a reference storage block; take the reference storage block as the first storage block to be compressed and the next storage block to be compressed as the second storage block to be compressed, and repeat the operation of compressing the first padding data (where it exists) and concatenating the compressed service data to obtain a reference storage block; and determine the reference storage block obtained after the last storage block to be compressed has been processed as the target storage block.

[0107] Optionally, the device further includes an adjustment module configured to: Receive a data adjustment instruction, wherein the data adjustment instruction includes data identification information; A reference storage block to be compressed is determined based on the data identification information, wherein the reference storage block to be compressed stores the compressed service data corresponding to the data identification information; Adjust the compressed service data in the reference storage block to be compressed.

[0108] Optionally, the device further includes a rewrite module configured to: obtain reference storage space information of the reference storage block to be compressed; determine reference storage information based on the reference storage space information, and determine compressed data to be written based on the reference storage information; and write the compressed data to be written to the reference storage block to be compressed.

[0109] Optionally, the rewrite module is further configured to: The number of reference storage sectors or the range of reference storage space are determined based on the reference storage space information. The compressed data to be written is determined based on the number of reference storage sectors or the reference storage space range.

[0110] Optionally, the rewrite module is further configured to: Compressed service data whose size meets the reference storage space range is determined as the data to be written for compression; or, Compressed service data whose quantity is less than or equal to the number of reference storage sectors and whose data size is less than or equal to the preset storage sector size are identified as the data to be written.

[0111] Optionally, the device further includes a release module configured to: Receive a storage release request for the reference storage block to be compressed, and release the storage space corresponding to the reference storage block to be compressed based on the storage release request; or, Write padding data into the reference storage block to be compressed.

[0112] The data compression device provided in the embodiments of this specification employs a two-stage data compression process. The first stage uses software compression, and the second stage, based on the software compression, performs further data compression through hardware. This combination of software and hardware provides a data compression solution with high compression ratio, high performance, and low read / write amplification.

[0113] In the software compression stage, the original business data is compressed at a preset storage sector size. The compressed business data is then aligned to the storage sector size, and any empty spaces are filled with padding data. In the hardware compression stage, the padding data in each storage sector is compressed, and variable-length FTL is fully utilized to make the data compactly arranged. This improves the data compression ratio.

[0114] The above is an illustrative scheme of a data compression device according to this embodiment. It should be noted that the technical solution of this data compression device and the technical solution of the data compression method described above belong to the same concept. For details not described in detail in the technical solution of the data compression device, please refer to the description of the technical solution of the data compression method described above.

[0115] Figure 7 shows a structural block diagram of an electronic device 700 according to an embodiment of this application. The components of the electronic device 700 include, but are not limited to, a memory 710 and a processor 720. The processor 720 is connected to the memory 710 via a bus 730, and a database 750 is used to store data.

[0116] Electronic device 700 also includes access device 740, which enables electronic device 700 to communicate via one or more networks 760. Examples of these networks include Public Switched Telephone Network (PSTN), Local Area Network (LAN), Wide Area Network (WAN), Personal Area Network (PAN), or combinations of communication networks such as the Internet. Access device 740 may include one or more of any type of wired or wireless network interface (e.g., network interface card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, Wi-MAX (Worldwide Interoperability for Microwave Access) interface, Ethernet interface, Universal Serial Bus (USB) interface, cellular network interface, Bluetooth interface, Near Field Communication (NFC) interface, and so on.

[0117] In one embodiment of this application, the above-described components of the electronic device 700, as well as other components not shown in Figure 7, can also be connected to each other, for example, via a bus. It should be understood that the electronic device block diagram shown in Figure 7 is for illustrative purposes only and is not intended to limit the scope of this application. Those skilled in the art can add or replace other components as needed.

[0118] Electronic device 700 can be any type of stationary or mobile electronic device, including mobile computers or mobile electronic devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (e.g., smartphones), wearable electronic devices (e.g., smartwatches, smart glasses, etc.) or other types of mobile devices, or stationary electronic devices such as desktop computers or personal computers (PCs). Electronic device 700 can also be a mobile or stationary server.

[0119] The processor 720 is used to execute a computer program / instructions which, when executed by the processor, implement the steps of the above-described data compression method.

[0120] The above is an illustrative scheme of an electronic device according to this embodiment. It should be noted that the technical solution of this electronic device and the technical solution of the data compression method described above belong to the same concept. For details not described in detail in the technical solution of the electronic device, please refer to the description of the technical solution of the data compression method described above.

[0121] An embodiment of this specification also provides a computer-readable storage medium storing a computer program / instructions that, when executed by a processor, implement the steps of the data compression method described above.

[0122] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the computer-readable storage medium embodiments are basically similar to the data compression method embodiments, so the description is relatively simple; relevant parts can be referred to the description of the data compression method embodiments.

[0123] An embodiment of this specification also provides a computer program product, including a computer program / instructions that, when executed by a processor, implement the steps of the above-described data compression method.

[0124] The above is an illustrative scheme of a computer program product according to this embodiment. It should be noted that the technical solution of this computer program product and the technical solution of the data compression method described above belong to the same concept. For details not described in detail in the technical solution of the computer program product, please refer to the description of the technical solution of the data compression method described above.

[0125] An embodiment of this specification also provides a database product, including a computer program / instructions that, when executed by a processor, perform the following steps: Obtain at least one piece of pending business data; Compress each business data to be processed according to the preset storage sector size to obtain the storage block to be compressed corresponding to each business data to be processed, wherein the storage block to be compressed includes at least one storage sector. Compress the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed service data corresponding to each service data to be processed, and the fill data is used to fill the free space in the storage sector.

[0126] The above is an illustrative scheme of a database product according to this embodiment. It should be noted that the technical solution of this database product and the technical solution of the data compression method described above belong to the same concept. For details not described in detail in the technical solution of the database product, please refer to the description of the technical solution of the data compression method described above.

[0127] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0128] The computer instructions include computer program code, which may be in the form of source code, object code, executable file, or certain intermediate forms. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording media, USB flash drive, portable hard drive, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium may be appropriately added or removed according to the requirements of patent practice. For example, in some regions, according to patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

[0129] It should be noted that the above description describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recorded in the claims can be performed in a different order than that shown in the embodiments and still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous. Secondly, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to the embodiments of this specification.

[0130] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0131] The preferred embodiments disclosed above are merely illustrative of this specification. The optional embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the embodiments described herein. These embodiments are selected and specifically described in this specification to better explain the principles and practical applications of the embodiments, thereby enabling those skilled in the art to better understand and utilize this specification. This specification is limited only by the claims and their full scope and equivalents.

Claims

1. A data compression method, applied to an intelligent storage device, comprising: acquiring at least one piece of business data to be processed; compressing each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; and compressing the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill the free space in the storage sector.

2. The method according to claim 1, wherein compressing each piece of business data to be processed according to the preset storage sector size to obtain the storage block to be compressed corresponding to each piece of business data to be processed comprises: determining target business data to be processed, wherein the target business data to be processed is any one of the at least one piece of business data to be processed; compressing the target business data to be processed based on a preset compression algorithm to obtain compressed business data, and obtaining compressed capacity information corresponding to the compressed business data; determining at least one target storage sector corresponding to the compressed business data based on the compressed capacity information and the preset storage sector size; and saving the compressed business data to the at least one target storage sector, and determining the at least one target storage sector as the storage block to be compressed corresponding to the target business data to be processed.

3. The method according to claim 2, wherein saving the compressed business data to the at least one target storage sector comprises: saving the compressed business data to the at least one target storage sector in storage order; and if free space is detected in the last target storage sector, filling the free space with fill data.

4. The method according to claim 1, wherein compressing the fill data in each storage block to be compressed to obtain the target storage block comprises: determining a first storage block to be compressed and a second storage block to be compressed according to the storage block order of the storage blocks to be compressed; if first fill data exists in the first storage block to be compressed, compressing the first fill data, and concatenating the compressed business data in the second storage block to be compressed with the compressed business data in the first storage block to be compressed to obtain a reference storage block; taking the reference storage block as the first storage block to be compressed and the next storage block to be compressed as the second storage block to be compressed, and continuing to perform the operation of compressing the first fill data if the first fill data exists in the first storage block to be compressed and concatenating the compressed business data in the second storage block to be compressed with the compressed business data in the first storage block to be compressed to obtain a reference storage block; and determining the reference storage block obtained after the last storage block to be compressed has been processed as the target storage block.

5. The method according to claim 1, wherein compressing the fill data in each storage block to be compressed comprises: compressing, by a data conversion layer in the intelligent storage device, the fill data in each storage block to be compressed.

6. The method according to claim 1, further comprising: receiving a data adjustment instruction, wherein the data adjustment instruction includes data identification information; determining a reference storage block to be compressed based on the data identification information, wherein the reference storage block to be compressed stores the compressed business data corresponding to the data identification information; and adjusting the compressed business data in the reference storage block to be compressed.

7. The method according to claim 6, further comprising: obtaining reference storage space information of the reference storage block to be compressed; determining reference storage information based on the reference storage space information, and determining compressed data to be written based on the reference storage information; and writing the compressed data to be written to the reference storage block to be compressed.

8. The method according to claim 7, wherein determining the reference storage information based on the reference storage space information comprises: determining a number of reference storage sectors or a reference storage space range based on the reference storage space information; and correspondingly, determining the compressed data to be written based on the reference storage information comprises: determining the compressed data to be written based on the number of reference storage sectors or the reference storage space range.

9. The method according to claim 8, wherein determining the compressed data to be written based on the number of reference storage sectors or the reference storage space range comprises: determining compressed business data whose data size falls within the reference storage space range as the compressed data to be written; or determining compressed business data whose quantity is less than or equal to the number of reference storage sectors and whose data size is less than or equal to the preset storage sector size as the compressed data to be written.

10. The method according to claim 6, further comprising: receiving a storage release request for the reference storage block to be compressed, and releasing the storage space corresponding to the reference storage block to be compressed based on the storage release request; or writing fill data into the reference storage block to be compressed.

11. A data compression apparatus, applied to an intelligent storage device, comprising: an acquisition module configured to acquire at least one piece of business data to be processed; a first compression module configured to compress each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; and a second compression module configured to compress the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill the free space in the storage sector.

12. An electronic device, comprising: a memory and a processor; wherein the memory is used to store a computer program/instructions, and the processor is used to execute the computer program/instructions, which, when executed by the processor, implement the steps of the method according to any one of claims 1 to 10.

13. A computer-readable storage medium storing a computer program/instructions that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 10.

14. A computer program product comprising a computer program/instructions that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 10.

15. A database product comprising a computer program/instructions that, when executed by a processor, perform the following steps: acquiring at least one piece of business data to be processed; compressing each piece of business data to be processed according to a preset storage sector size to obtain a storage block to be compressed corresponding to each piece of business data to be processed, wherein the storage block to be compressed includes at least one storage sector; and compressing the fill data in each storage block to be compressed to obtain a target storage block, wherein the target storage block includes compressed business data corresponding to each piece of business data to be processed, and the fill data is used to fill the free space in the storage sector.
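
For illustration only, and not as part of the claims, the two-stage flow recited in claims 1 to 4 might be sketched in Python as follows. Everything named here is an assumption made for the sketch: the 4096-byte sector size, the zero-byte fill pattern, the use of zlib as the "preset compression algorithm", and the helper names `to_sector_aligned_block`, `pack_blocks`, and `read_record`. The second stage, which the specification assigns to the storage device, is modelled in software by dropping the fill entirely, i.e. the limiting case of compressing a run of identical bytes.

```python
import zlib
from dataclasses import dataclass

SECTOR_SIZE = 4096    # assumed preset storage sector size, in bytes
FILL_BYTE = b"\x00"   # assumed fill pattern for free space in a sector


@dataclass
class Block:
    data: bytes       # sector-aligned bytes: compressed payload followed by fill
    payload_len: int  # compressed capacity information (payload size in bytes)


def to_sector_aligned_block(record: bytes, sector_size: int = SECTOR_SIZE) -> Block:
    """Stage 1: compress one record and pad it to a whole number of sectors."""
    payload = zlib.compress(record)
    sectors = -(-len(payload) // sector_size)  # ceiling division
    fill = FILL_BYTE * (sectors * sector_size - len(payload))
    return Block(payload + fill, len(payload))


def pack_blocks(blocks):
    """Stage 2 (software stand-in): remove the fill from each block in order
    and concatenate the compressed payloads into one target block."""
    target = bytearray()
    index = []  # (offset, length) of each payload inside the target block
    for block in blocks:
        index.append((len(target), block.payload_len))
        target += block.data[:block.payload_len]
    return bytes(target), index


def read_record(target: bytes, index, i: int) -> bytes:
    """Recover the i-th original record from the packed target block."""
    offset, length = index[i]
    return zlib.decompress(target[offset:offset + length])


records = [b"a" * 10000, b"hello world" * 50, bytes(range(256)) * 4]
blocks = [to_sector_aligned_block(r) for r in records]
target, index = pack_blocks(blocks)
```

In this sketch each entry of `blocks` occupies a whole number of sectors, while `target` holds only the concatenated compressed payloads, so the space that the fill occupied after the first stage is no longer stored. The `index` list is a hypothetical bookkeeping structure added so that individual records remain addressable; the specification does not state how the device tracks payload boundaries.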