A hardware compression and decompression method based on a Huffman decoding table

By dynamically generating a Huffman decoding table and a dictionary during compression, the method addresses the low compression ratio and slow decoding rate of hardware Gzip compression and achieves efficient data compression and decompression.

CN115189696B · Active · Publication Date: 2026-06-23 · HANGZHOU DIANZI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HANGZHOU DIANZI UNIV
Filing Date
2022-08-01
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing hardware Gzip compression methods have a low compression ratio, consume too many hardware storage resources, and decode slowly during decompression.

Method used

A dynamic Huffman decoding table is adopted. A dictionary is built with a sliding window and a hash function. Symbol frequencies in the LZ77 compression result are counted dynamically to generate a Huffman decoding table suitable for hardware decompression, and the table is stored in the hardware storage format. Decompression uses a dynamic lookup.

Benefits of technology

It improves the compression ratio, saves storage resources, and enhances decompression efficiency and accuracy.

✦ Generated by Eureka AI based on patent content.

Figure CN115189696B_ABST

Abstract

The application discloses a hardware compression and decompression method based on a Huffman decoding table. The original data is first compressed on a PC; during compression, a Huffman decoding table and the compressed data are generated dynamically, and both parts are burned into a hardware storage device. Decompression is carried out in an embedded system: the embedded system sends the corresponding instructions and addresses to a hardware decompression module, which addresses and searches the Huffman decoding table according to those instructions to decompress the compressed data, and the decompressed data is sent back to the embedded system. The open-source Gzip compression code is modified so that, during compression, it generates a dynamic Huffman decoding table suited to the hardware decompression module of the application and stores it in the corresponding format. The decoding table can therefore be used directly during decompression, which greatly improves decompression efficiency and accuracy.

Description

Technical Field

[0001] This invention belongs to the field of hardware data processing and specifically relates to a hardware compression and decompression method based on a Huffman decoding table.

Background Technology

[0002] In some embedded systems, system operation or communication between systems requires large amounts of data. The size of the data and the transmission speed of inter-system communication often limit system efficiency, so data compression can improve it. Many existing compression methods use the Deflate compression algorithm, whose most important parts are LZ77 compression and Huffman compression; improving these two parts can significantly improve compression efficiency. LZ77 is a dictionary-based compression algorithm. In data containing many repeated strings, LZ77 compression places these strings into a hash linked list; when the same string appears again, the hash linked list is searched for a match, and the repeated string is replaced by a combination of matching distance and matching length, which achieves compression. Building an efficient hash linked list therefore improves the compression ratio of LZ77. Data compressed with LZ77 is already considerably smaller than the original, but further compression is still possible, so the Deflate algorithm applies Huffman compression to the LZ77 output. Here there are two approaches: static encoding and dynamic encoding. Static encoding compresses and decompresses with one or more encoding sets predefined by Gzip. However, applying the same code to different original data gives inconsistent and often low compression ratios. Static encoding also assigns codes to characters, match lengths, or match distances that never occur, wasting significant resources; storing these codes in a hardware storage device consumes additional storage space and reduces compression and decompression efficiency.
Dynamic encoding, on the other hand, re-encodes the LZ77 compression result according to the frequency of each character, match length, and match distance. High-frequency data is encoded with shorter codes and low-frequency data with longer codes, and characters, match lengths, or match distances that never occur are not encoded at all, which saves resources. Different original data therefore produce different Huffman trees, and the dynamically generated Huffman tree is used to further encode the LZ77 compressed data into the final compressed result. This makes dynamic encoding the better-suited method, and it achieves good compression results.
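To make the static-versus-dynamic argument concrete, the sketch below totals the encoded size of a small symbol stream under three codes. The frequencies and the Huffman code are the standard textbook example, not data from this patent. The 3-bit and 8-bit uniform codes are simplified stand-ins: one for a static code sized to this alphabet, one for a static code that reserves codewords for all 256 byte values whether or not they occur.

```python
# Illustrative frequencies (per 100 symbols) with a Huffman code built for them.
# These numbers are a textbook example, not measurements from the patent.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
DYNAMIC_CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def total_bits(freq, code_len):
    """Encoded size in bits when symbol s costs code_len(s) bits."""
    return sum(n * code_len(sym) for sym, n in freq.items())


def is_prefix_free(codewords):
    """In sorted order, a codeword that prefixes another also prefixes its successor."""
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


dynamic_bits = total_bits(FREQ, lambda s: len(DYNAMIC_CODE[s]))
uniform_bits = total_bits(FREQ, lambda s: 3)  # 3 bits distinguish 6 symbols
byte_bits = total_bits(FREQ, lambda s: 8)     # room for all 256 byte values

print(dynamic_bits, uniform_bits, byte_bits)  # 224 300 800
assert is_prefix_free(DYNAMIC_CODE.values())
```

The frequency-fitted code needs about 25% fewer bits than the uniform code for this alphabet, and far fewer than a code that spends space on byte values that never appear.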

[0003] One existing technology employs a software compression and hardware decompression method. This method uses software compression on the PC side, ensuring data accuracy during the compression process. First, the original data is compressed using LZ77, then further compressed using a static Huffman compression method, encoding the LZ77 compression result using predefined Huffman codes. After compression, the compressed data is burned into a storage device in the hardware system, such as flash memory. This completes the data compression and storage. The decompression process uses hardware decompression. The compressed data is first decompressed using a static Huffman decompression module, and then restored to the original data using an LZ77 decompression module.

[0004] Existing technical solutions generally employ static Huffman compression. On the one hand, static Huffman compression requires storing multiple pre-generated Huffman trees and Huffman decoding tables in the hardware storage module, which consumes significant storage resources. During compression, the data is compressed with every one of these Huffman trees and the set with the highest compression ratio is selected as the final result, which takes considerable time and lowers efficiency. Furthermore, static compression does not achieve high compression ratios on large amounts of text, so the compressed data requires even more storage space. On the other hand, the structure of the compressed data and the Huffman decoding tables is not suited to the current hardware decompression storage structure. Decoding with tables in this format is inefficient and fails to exploit the advantages of hardware decompression. Moreover, selecting the appropriate decoding table from multiple sets makes the decompression process cumbersome, resulting in low decompression speed and efficiency.

Summary of the Invention

[0005] In view of this, the present invention mainly addresses the technical problems of existing hardware Gzip compression methods, such as low compression ratio, excessive hardware storage resource consumption, and slow decoding speed during decompression. Specifically, it provides a hardware compression method based on a Huffman decoding table, including the following steps:

[0006] S11: read the input raw data into a sliding window;

[0007] S12: in the sliding window, every 3 bytes are fed into a hash function and the result is used as the hash value; strings with the same first three bytes have the same hash value and are chained together to form a hash linked list, i.e., a dictionary;

[0008] S13: update the dictionary and check whether a matching string exists; if one exists, check whether it is the best match;

[0009] S14: during LZ77 compression, dynamically count the frequencies of unmatched characters, matching distances, and matching lengths, and build a dynamic Huffman tree from these statistics. Characters whose frequency is above the preset range are encoded with shorter codes, and characters whose frequency is below the preset range are encoded with longer codes;

[0010] S15: dynamically generate a Huffman decoding table suitable for hardware decompression, in a format consisting of codeword lengths, starting-address pointers, and decompressed-data storage ranges;

[0011] S16: encode the LZ77 compression result with the dynamically generated Huffman tree to generate the final compressed result, store the Huffman decoding table together with the compressed result, and burn them into the hardware storage device.
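Step S16 amounts to replacing each LZ77 token with its codeword. A minimal sketch under several assumptions not stated in the patent: both code tables (characters/match lengths, and match distances) are already available as symbol-to-bit-string dictionaries; a match length L is carried as the hypothetical symbol 256 + L in the character/length alphabet (Deflate instead uses length codes 257 to 285 plus extra bits); and bits are packed most-significant-bit first (Deflate packs them least-significant-bit first). A token is either a literal byte value or a (length, distance) tuple.

```python
LENGTH_BASE = 256  # hypothetical: match length L is coded as symbol 256 + L


def encode_tokens(tokens, lit_len_codes, dist_codes):
    """Replace each LZ77 token by its Huffman codeword(s); returns a '0'/'1' string."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            length, distance = tok
            out.append(lit_len_codes[LENGTH_BASE + length])  # length first...
            out.append(dist_codes[distance])                 # ...then its distance
        else:
            out.append(lit_len_codes[tok])                   # literal byte
    return "".join(out)


def pack_bits(bits):
    """Pack a '0'/'1' string into bytes, MSB first, zero-padding the last byte."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


lit_len_codes = {66: "0", 65: "10", 259: "11"}  # 'B', 'A', match length 3
dist_codes = {1: "0"}
bits = encode_tokens([65, 66, (3, 1)], lit_len_codes, dist_codes)
print(bits, pack_bits(bits).hex())  # 100110 98
```

The padded byte string is what would be burned into flash; because padding is ambiguous, a real format would also need the bit count or an end marker, which the patent does not describe.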

[0012] Preferably, step S13 specifically includes updating the dictionary information every three new input bytes. First, the original bytes are added to the corresponding chain. Then the hash chain is searched to check whether the current input string has a matching string, and the chain is traversed to find the best match. Each match is described by a pair of matching information. After all the data has been processed, the original data takes two forms, character literals and matching distance + matching length (distance + length), which completes LZ77 compression.
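The dictionary of S12 and the chain update of S13 can be sketched with the two structures that gzip-style implementations keep: `head`, the most recent position for each hash value, and `prev`, the previous position with the same hash. The shift-and-XOR hash over three bytes and its 15-bit width are assumptions modeled on gzip's deflate, not values given in the patent. Because different strings can share a hash value, each candidate from the chain is still compared byte by byte.

```python
HASH_BITS = 15                 # assumed hash width
HASH_MASK = (1 << HASH_BITS) - 1
MIN_MATCH = 3                  # the hash covers three bytes, as in S12


def hash3(data, pos):
    """Shift-and-XOR hash of data[pos:pos+3]; equal 3-byte strings hash equally."""
    h = 0
    for byte in data[pos:pos + MIN_MATCH]:
        h = ((h << 5) ^ byte) & HASH_MASK
    return h


class HashChains:
    """Dictionary of earlier positions, chained by the hash of their first 3 bytes."""

    def __init__(self):
        self.head = {}  # hash value -> most recent position with that hash
        self.prev = {}  # position -> previous position with the same hash (-1 = none)

    def insert(self, data, pos):
        h = hash3(data, pos)
        self.prev[pos] = self.head.get(h, -1)
        self.head[h] = pos

    def candidates(self, data, pos):
        """Yield earlier positions, newest first, that start with data[pos:pos+3]."""
        p = self.head.get(hash3(data, pos), -1)
        while p != -1:
            if data[p:p + MIN_MATCH] == data[pos:pos + MIN_MATCH]:  # filter collisions
                yield p
            p = self.prev[p]


data = b"abcXabc"
chains = HashChains()
for pos in range(4):           # insert the strings starting at positions 0..3
    chains.insert(data, pos)
print(list(chains.candidates(data, 4)))  # [0]
```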

[0013] Preferably, the matching information is a combination of distance and matching length.
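Putting S12 and S13 together gives a greedy LZ77 parse. This is an illustration under assumptions, not the patent's modified Gzip code: it uses the 4 KB window from the embodiment in [0040], Deflate's limits of 3 to 258 bytes per match, an assumed cap of 128 chain steps per position, a 15-bit hash, and no lazy matching. Each match is emitted as a (length, distance) tuple, the order Deflate transmits them in, and each unmatched byte as an integer literal.

```python
MIN_MATCH, MAX_MATCH = 3, 258  # Deflate's match-length limits
WINDOW = 4096                  # 4 KB sliding window, as in [0040]
MAX_CHAIN = 128                # assumed cap on hash-chain steps per position
HASH_MASK = (1 << 15) - 1      # assumed 15-bit hash


def hash3(data, pos):
    return ((data[pos] << 10) ^ (data[pos + 1] << 5) ^ data[pos + 2]) & HASH_MASK


def lz77_compress(data):
    """Greedy LZ77: returns a list of literal ints and (length, distance) tuples."""
    head, prev, tokens = {}, {}, []

    def insert(pos):
        if pos + MIN_MATCH <= len(data):
            h = hash3(data, pos)
            prev[pos] = head.get(h, -1)
            head[h] = pos

    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        if pos + MIN_MATCH <= len(data):
            cand, steps = head.get(hash3(data, pos), -1), 0
            limit = min(MAX_MATCH, len(data) - pos)
            # Walk the chain (newest first) while candidates stay inside the window.
            while cand != -1 and pos - cand <= WINDOW and steps < MAX_CHAIN:
                length = 0
                while length < limit and data[cand + length] == data[pos + length]:
                    length += 1
                if length > best_len:  # the longest match is the "best" one
                    best_len, best_dist = length, pos - cand
                cand, steps = prev[cand], steps + 1
        if best_len >= MIN_MATCH:
            tokens.append((best_len, best_dist))
            for p in range(pos, pos + best_len):  # keep the dictionary up to date
                insert(p)
            pos += best_len
        else:
            tokens.append(data[pos])
            insert(pos)
            pos += 1
    return tokens


print(lz77_compress(b"abcabcabcabc"))  # [97, 98, 99, (9, 3)]
```

The single match of length 9 at distance 3 overlaps the bytes it copies, which LZ77 allows and which is how repeated runs compress well.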

[0014] Preferably, the Huffman decoding table is structured as follows. With 0x00 as the starting address, the codeword lengths of the characters or match lengths are stored from address 0x00 onward. Following the dynamically generated Huffman tree, they are stored in order of codeword length from shortest to longest, one entry for each codeword length that occurs.

[0015] Starting at address 0x10, an address pointer is stored for each codeword length of the characters or match lengths. Each pointer gives the starting address at which the decoded characters or match lengths with the corresponding codeword length are stored.

[0016] Starting at address 0x2000, the decoded characters or match lengths are stored sequentially in ascending order of codeword length;

[0017] Starting at address 0x4000, the codeword lengths of the matching distances are stored. Following the dynamically generated Huffman tree, they are stored in ascending order, one entry for each codeword length that occurs.

[0018] Starting at address 0x4010, an address pointer is stored for each codeword length of the matching distances. Each pointer gives the starting address of the decoded matching distances with the corresponding codeword length.

[0019] Starting at address 0x6000, the decoded matching distances are stored sequentially in order of codeword length from shortest to longest.

[0020] The Huffman decoding table is generated dynamically, in the hardware storage format and at the addresses above, from the statistics of the LZ77 compression result, and it is a single set of tables belonging to the current compressed data.
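The layout in [0014] to [0020] can be illustrated by writing each section (characters/match lengths, then distances) into a sparse memory image. Several details are not specified in the patent and are assumptions here: each table entry occupies one addressable word; only codeword lengths that occur get a slot (at most 16, since 0x00 to 0x0F lie below the pointer area at 0x10); symbols of equal length are stored in ascending symbol order; and one extra pointer after the last one marks the end of the decoded-data range.

```python
# (code-length area, pointer area, decoded-data area) for each alphabet, per [0014]-[0019]
LIT_LEN_SECTION = (0x0000, 0x0010, 0x2000)
DIST_SECTION = (0x4000, 0x4010, 0x6000)


def write_section(memory, section, code_lengths):
    """Store one alphabet's decoding table; code_lengths maps symbol -> codeword length."""
    lengths_addr, pointers_addr, data_addr = section
    distinct = sorted(set(code_lengths.values()))  # shortest codeword length first
    if len(distinct) > 16:
        raise ValueError("only 16 codeword-length slots fit below the pointer area")
    addr = data_addr
    for slot, n in enumerate(distinct):
        memory[lengths_addr + slot] = n          # codeword length
        memory[pointers_addr + slot] = addr      # start of symbols with this length
        for sym in sorted(s for s, length in code_lengths.items() if length == n):
            memory[addr] = sym                   # decoded character / length / distance
            addr += 1
    memory[pointers_addr + len(distinct)] = addr  # assumed end-of-range pointer
    return memory


memory = {}
write_section(memory, LIT_LEN_SECTION, {65: 2, 66: 1, 67: 3, 68: 3})  # 'A'..'D'
write_section(memory, DIST_SECTION, {1: 1, 4: 1})
print(hex(memory[0x10]), memory[0x2000])  # 0x2000 66
```

With equal-length symbols kept in a fixed order, the stored lengths and symbols are enough to describe a canonical Huffman code, so the hardware never needs the tree itself.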

[0021] To achieve the above objectives, the present invention also provides a hardware decompression method based on a Huffman decoding table. Corresponding to the compression method described above, the decompression method includes the following steps:

[0022] S31: use a dynamic lookup method to find the pointer information corresponding to the length of the current binary codeword;

[0023] S32: based on the pointer information, find the starting address of the decoded data for that codeword length; traverse the range of decoded data for that codeword length to restore the binary code to a character, match length, or match distance. After all binary codes have been restored, the current data has been decoded into characters and match length + match distance pairs;

[0024] S33: send these two kinds of data to the LZ77 decompression module, traverse the match lengths and match distances, and restore them to the corresponding strings, thus completing LZ77 decompression and generating the original data.
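S33 can be sketched in a few lines, assuming the decoded stream is a list of literal byte values and (length, distance) tuples. Copying one byte at a time matters: when the distance is shorter than the length, the copy reads bytes it has just written, which is how LZ77 represents runs.

```python
def lz77_decompress(tokens):
    """Rebuild the original bytes from literal ints and (length, distance) tuples."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            length, distance = tok
            if not 0 < distance <= len(out):
                raise ValueError("distance points before the start of the output")
            for _ in range(length):         # byte by byte, so overlapping copies work
                out.append(out[-distance])
        else:
            out.append(tok)                 # literal byte
    return bytes(out)


print(lz77_decompress([97, 98, 99, (9, 3)]))  # b'abcabcabcabc'
```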

[0025] The beneficial effects of this invention are as follows. For compression, the invention uses dynamic compression, which significantly improves the compression ratio and saves storage space. It also dynamically generates a Huffman decoding table suited to hardware decompression, so multiple sets of Huffman trees and decoding tables no longer need to be stored in the hardware storage module, which saves substantial storage resources. Decompression uses the decoding table generated during compression together with a dynamic lookup method that avoids entry-by-entry searches of the decoding table, which improves decompression efficiency and accuracy.

Attached Figure Description

[0026] To make the objectives, technical solutions, and beneficial effects of this invention clearer, the following figures are provided for illustration:

[0027] Figure 1 is a schematic diagram of a hardware compression and decompression method based on a Huffman decoding table according to an embodiment of the present invention;

[0028] Figure 2 is a flowchart of the steps of a hardware compression method based on a Huffman decoding table according to an embodiment of the present invention;

[0029] Figure 3 is a schematic diagram of the structure of the Huffman decoding table in a hardware compression and decompression method based on a Huffman decoding table according to an embodiment of the present invention;

[0030] Figure 4 is a flowchart of the steps of a hardware decompression method based on a Huffman decoding table according to an embodiment of the present invention.

Detailed Implementation

[0031] The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0032] Referring to Figure 1, a hardware compression and decompression method based on a Huffman decoding table according to an embodiment of the present invention includes the following steps:

[0033] S10: compress the data on the PC; during compression, the Huffman decoding table and the compressed data are generated dynamically;

[0034] S20: burn these two parts into the hardware storage device;

[0035] S30: carry out decompression in the embedded system. The embedded system sends the corresponding instructions and addresses to the hardware decompression module, which addresses and looks up the Huffman decoding table according to these instructions to decompress the compressed data;

[0036] S40: send the decompressed data to the embedded system.

[0037] The compression process is completed on the PC. Here, the invention improves the open-source Gzip compression code so that it generates a dynamic Huffman decoding table suitable for the hardware decompression module of the invention during the compression process and stores it in the corresponding format. Thus, the decoding table can be used directly during the decompression process, which greatly improves the decompression efficiency and accuracy.

[0038] Specific Embodiment of Compression

[0039] Referring to Figure 2, the compression method includes the following steps:

[0040] S11: read the input raw data into a 4 KB (4096-byte) sliding window;

[0041] S12: in the sliding window, every 3 bytes are fed into a hash function and the result is used as the hash value; strings with the same first three bytes have the same hash value and are chained together to form a hash linked list, i.e., a dictionary;

[0042] S13: update the dictionary information every three new input bytes. First, the current string is added to the corresponding hash chain. Then the hash chain is searched to check whether the current input string has a matching string, and the chain is traversed to find the best match. Each match is described by a pair of matching information (a combination of distance and matching length). After all the data has been processed, the original data is transformed into two forms, characters (literals) and matching distance + matching length (distance + length), which completes LZ77 compression;

[0043] S14: during LZ77 compression, dynamically count the frequencies of unmatched characters, matching distances, and matching lengths, and build a dynamic Huffman tree from the results. Characters whose frequency is above the preset range are encoded with shorter codes, while characters whose frequency is below the preset range are encoded with longer codes. Code lengths are assigned from these frequencies, sorted in descending order of frequency: high-frequency symbols receive short codes such as 001, and low-frequency symbols receive long codes such as 01010101. This is a dynamic process driven by the data currently being compressed; "short" and "long" are relative, and different data yield different code lengths;

[0044] S15: dynamically generate a Huffman decoding table suitable for hardware decompression, in a format consisting of codeword lengths, starting-address pointers, and decompressed-data storage ranges;

[0045] S16: encode the LZ77 compression result with the dynamically generated Huffman tree to generate the final compressed result, store the Huffman decoding table together with the compressed result, and burn them into the hardware storage device.

[0046] Because the dynamically generated Huffman tree and Huffman decoding table are derived from the statistics of the data being compressed, they fit that data better than predefined static codes and therefore yield a higher compression ratio.
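The dynamic statistics of S14 and the tree construction can be sketched as follows. Assumptions: characters and match lengths share one alphabet and distances use a second one (mirroring the two sections of the decoding table); a match length L is counted as the hypothetical symbol 256 + L; and no maximum code length is enforced (Deflate caps codes at 15 bits). The builder tracks only depths, because the decoding table stores codeword lengths rather than the tree. As a sanity check, the Kraft sum of a complete Huffman code is exactly 1.

```python
import heapq
from collections import Counter
from itertools import count

LENGTH_BASE = 256  # hypothetical: match length L is counted as symbol 256 + L


def count_symbols(tokens):
    """Frequencies for the character/length alphabet and the distance alphabet."""
    lit_len, dist = Counter(), Counter()
    for tok in tokens:
        if isinstance(tok, tuple):
            length, distance = tok
            lit_len[LENGTH_BASE + length] += 1
            dist[distance] += 1
        else:
            lit_len[tok] += 1
    return lit_len, dist


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length}; only symbols that actually occur get a code."""
    if len(freqs) <= 1:
        return {sym: 1 for sym in freqs}  # a lone symbol still needs one bit
    tiebreak = count()
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)  # merge the two rarest subtrees
        f2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1                # each merge adds one level of depth
        heapq.heappush(heap, (f1 + f2, next(tiebreak), group1 + group2))
    return lengths


tokens = [97, 97, 97, 98, (3, 4), (3, 4), (5, 1)]
lit_len_freq, dist_freq = count_symbols(tokens)
lit_len_lengths = huffman_code_lengths(lit_len_freq)
print(lit_len_lengths)  # {97: 1, 98: 3, 259: 2, 261: 3}
print(sum(2 ** -n for n in lit_len_lengths.values()))  # 1.0
```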

[0047] For the structure of the Huffman decoding table generated for hardware decompression and the storage layout of the compressed data, refer to Figure 3. With 0x00 as the starting address, the codeword lengths of the characters or match lengths are stored starting at address 0x00. Following the dynamically generated Huffman tree, they are stored in ascending order of codeword length, one entry for each codeword length that occurs.

[0048] Starting at address 0x10, an address pointer is stored for each codeword length of the characters or match lengths. Each pointer gives the starting address at which the decoded characters or match lengths with the corresponding codeword length are stored.

[0049] Starting at address 0x2000, the decoded characters or match lengths are stored sequentially in ascending order of codeword length;

[0050] Starting at address 0x4000, the codeword lengths of the matching distances are stored. Following the dynamically generated Huffman tree, they are stored in ascending order, one entry for each codeword length that occurs.

[0051] Starting at address 0x4010, an address pointer is stored for each codeword length of the matching distances. Each pointer gives the starting address of the decoded matching distances with the corresponding codeword length.

[0052] Starting at address 0x6000, the decoded matching distances are stored sequentially in order of codeword length from shortest to longest.

[0053] The Huffman decoding table is dynamically generated based on the hardware storage format and address after dynamically calculating the LZ77 compression results, and it corresponds to a set of tables for the current compressed data. Therefore, there is no need to select from multiple tables, which improves decompression efficiency and makes it more practical and efficient.
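Because the table in [0047] to [0052] keeps only codeword lengths and the symbols sorted by length, the decoder can rebuild the codewords on its own only if they follow a fixed rule. A common rule, and the one Deflate uses (RFC 1951, section 3.2.2), is the canonical Huffman code: within each length, consecutive symbols get consecutive codes, and each longer length continues from where the previous one ended, shifted left. The sketch assumes this rule; the patent does not name it.

```python
def canonical_codes(code_lengths):
    """Assign canonical Huffman codewords from {symbol: codeword length}."""
    codes = {}
    code, prev_len = 0, 0
    # Shorter codewords first; equal lengths in ascending symbol order,
    # the same order in which the decoding table stores the symbols.
    for sym in sorted(code_lengths, key=lambda s: (code_lengths[s], s)):
        n = code_lengths[sym]
        code <<= n - prev_len            # extend the running code to the new length
        codes[sym] = format(code, f"0{n}b")
        code += 1
        prev_len = n
    return codes


# The example from RFC 1951: A=2, B=1, C=3, D=3 bits
print(canonical_codes({"A": 2, "B": 1, "C": 3, "D": 3}))
# {'B': '0', 'A': '10', 'C': '110', 'D': '111'}
```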

[0054] To achieve the above objectives, the present invention also provides a hardware decompression method based on a Huffman decoding table. Corresponding to the compression method described above, the data to be processed is a complete compressed data packet. Each different binary code may correspond to a character, matching length, or matching distance. When a binary code is encountered, the table is looked up according to the structure of the dynamic Huffman decoding table generated during the compression process. Since this decoding table avoids encoding characters, matching lengths, or distances that have not appeared, and adopts a storage structure suitable for hardware, the lookup efficiency is higher.
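The table walk of S31 and S32 can be sketched against a sparse memory image laid out as in [0047] to [0052]. This assumes canonical code ordering, one word per table entry, unused length slots reading as 0, and an extra pointer after the last one that marks the end of the decoded-data range; none of these details is fixed by the patent. The decoder reads bits only until the code reaches the next stored length and checks one address range per length, instead of comparing against every table entry.

```python
def decode_symbol(memory, lengths_addr, pointers_addr, bits):
    """Decode one symbol from an iterator of bits using a stored decoding table."""
    code = first = prev_len = 0
    for slot in range(16):
        n = memory.get(lengths_addr + slot, 0)
        if n == 0:                     # no more codeword lengths stored
            break
        while prev_len < n:            # read bits up to the current codeword length
            code = (code << 1) | next(bits)
            first <<= 1
            prev_len += 1
        start = memory[pointers_addr + slot]
        count = memory[pointers_addr + slot + 1] - start  # symbols with length n
        if code - first < count:       # code falls inside this length's range
            return memory[start + code - first]
        first += count                 # first code of the next length (before shifting)
    raise ValueError("bit pattern not found in decoding table")


# Table for A=2, B=1, C=3, D=3 bits, with 65..68 standing for 'A'..'D'.
memory = {0x00: 1, 0x01: 2, 0x02: 3,
          0x10: 0x2000, 0x11: 0x2001, 0x12: 0x2002, 0x13: 0x2004,
          0x2000: 66, 0x2001: 65, 0x2002: 67, 0x2003: 68}
bits = iter([1, 0, 0, 1, 1, 0, 1, 1, 1])   # codewords 10 0 110 111
print(bytes(decode_symbol(memory, 0x00, 0x10, bits) for _ in range(4)))  # b'ABCD'
```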

[0055] For the detailed decompression procedure, refer to Figure 4; it includes the following steps:

[0056] S31: use a dynamic lookup method to find the pointer information corresponding to the length of the current binary codeword;

[0057] S32: based on the pointer information, find the starting address of the decoded data for that codeword length; traverse the range of decoded data for that codeword length to restore the binary code to a character, match length, or match distance. After all binary codes have been restored, the current data has been decoded into characters and match length + match distance pairs;

[0058] S33: send these two kinds of data to the LZ77 decompression module, traverse the match lengths and match distances, and restore them to the corresponding strings, thus completing LZ77 decompression and generating the original data.
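The tokens handed to S33 come from alternating between the two tables: a decoded character is complete on its own, while a decoded match length must be followed by a distance codeword. The sketch below shows that alternation with plain dictionary lookups standing in for the hardware table walk. It reuses the hypothetical 256 + L numbering for lengths and assumes the bit string carries no padding, since the patent does not describe an end-of-block marker.

```python
LENGTH_BASE = 256  # hypothetical: match length L is coded as symbol 256 + L


def decode_stream(bits, lit_len_codes, dist_codes):
    """Turn a '0'/'1' string back into literal ints and (length, distance) tuples."""
    lit_len_rev = {cw: sym for sym, cw in lit_len_codes.items()}
    dist_rev = {cw: sym for sym, cw in dist_codes.items()}
    pos = 0

    def read(table):
        nonlocal pos
        word = ""
        while word not in table:       # prefix-free, so the first hit is the codeword
            if pos >= len(bits):
                raise ValueError("bit stream ends inside a codeword")
            word += bits[pos]
            pos += 1
        return table[word]

    tokens = []
    while pos < len(bits):
        sym = read(lit_len_rev)
        if sym < LENGTH_BASE:
            tokens.append(sym)                                  # literal byte
        else:
            tokens.append((sym - LENGTH_BASE, read(dist_rev)))  # length, then distance
    return tokens


lit_len_codes = {66: "0", 65: "10", 259: "11"}
dist_codes = {1: "0"}
print(decode_stream("100110", lit_len_codes, dist_codes))  # [65, 66, (3, 1)]
```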

[0059] The compression process of this invention employs a dynamic compression method. The compression process can dynamically generate a Huffman decoding table in hardware storage format and store the decoding table after the compression result. Only one set of decoding tables needs to be stored, and the decompression method uses dynamic lookup.

[0060] Finally, it should be noted that the above preferred embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail through the above preferred embodiments, those skilled in the art should understand that various changes can be made to it in form and detail without departing from the scope defined by the claims of the present invention.

Claims

1. A hardware compression method based on a Huffman decoding table, characterized in that it includes the following steps: S11: read the input raw data into a sliding window; S12: in the sliding window, every 3 bytes are fed into a hash function and the result is used as the hash value; strings with the same first three bytes have the same hash value and are chained together to form a hash linked list, i.e., a dictionary; S13: update the dictionary and check whether a matching string exists; if one exists, check whether it is the best match; S14: during LZ77 compression, dynamically count the frequencies of unmatched characters, matching distances, and matching lengths, and build a dynamic Huffman tree from these statistics, in which characters whose frequency is above the preset range are encoded with shorter codes and characters whose frequency is below the preset range are encoded with longer codes; S15: dynamically generate a Huffman decoding table suitable for hardware decompression, in a format consisting of codeword lengths, starting-address pointers, and decompressed-data storage ranges; S16: encode the LZ77 compression result with the dynamically generated Huffman tree to generate the final compressed result, store the Huffman decoding table together with the compressed result, and burn them into the hardware storage device. S13 specifically includes updating the dictionary information every three new input bytes: first, the original bytes are added to the corresponding chain; then the hash chain is searched to check whether the current input string has a matching string, and the chain is traversed to find the best match; each match is described by a pair of matching information. After all the data has been processed, the original data takes two forms, character literals and matching distance + matching length (distance + length), which completes LZ77 compression. The structure of the Huffman decoding table is as follows: with 0x00 as the starting address, the codeword lengths of the characters or match lengths are stored starting from address 0x00, in order of codeword length from shortest to longest according to the dynamically generated Huffman tree, with as many entries as there are codeword lengths; starting at address 0x10, an address pointer is stored for each codeword length of the characters or match lengths, pointing to the starting address at which the decoded characters or match lengths with the current codeword length are stored; starting at address 0x2000, the decoded characters or match lengths are stored sequentially in ascending order of codeword length; starting at address 0x4000, the codeword lengths of the matching distances are stored in ascending order according to the dynamically generated Huffman tree, with as many entries as there are codeword lengths; starting at address 0x4010, an address pointer is stored for each codeword length of the matching distances, pointing to the starting address of the decoded matching distances with the current codeword length; starting at address 0x6000, the decoded matching distances are stored sequentially in order of codeword length from shortest to longest. The Huffman decoding table is dynamically generated according to the hardware storage format and addresses from the statistics of the LZ77 compression result, and it is a single set of tables corresponding to the current compressed data.

2. The hardware compression method based on the Huffman decoding table according to claim 1, characterized in that the matching information is a combination of matching distance and matching length.

3. The hardware decompression method based on the Huffman decoding table according to any one of claims 1-2, characterized in that the decompression method includes the following steps: S31: use a dynamic lookup method to find the pointer information corresponding to the length of the current binary codeword; S32: based on the pointer information, find the starting address of the decoded data for that codeword length; traverse the range of decoded data for that codeword length to restore the binary code to a character, match length, or match distance; after all binary codes have been restored, the current data has been decoded into characters and match length + match distance pairs; S33: send these two kinds of data to the LZ77 decompression module, traverse the match lengths and match distances, and restore them to the corresponding strings, thus completing LZ77 decompression and generating the original data.