Image hash lossless compression method and image hash lossless reconstruction method

By combining cross-format parsing and reverse derivation of native compression parameters with a deep lossless compression algorithm, the method addresses the limited compression ratios and inconsistent hash values of existing technologies and provides an efficient, reliable solution for image data storage and transmission.

CN121547599B · Active · Publication Date: 2026-06-12 · SANDSTONE DATA TECH CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SANDSTONE DATA TECH CO LTD
Filing Date: 2026-01-20
Publication Date: 2026-06-12

Abstract

The application discloses an image hash lossless compression method and an image hash lossless reconstruction method. The compression method comprises the following steps: performing format analysis on an original image to separate structured metadata from pixel data; analyzing and reconstructing the internal parameters of the native compression algorithm while decoding the pixel data; performing deep lossless recompression on the metadata and the decoded pixel data respectively; and finally packaging the results into a hash lossless compressed file. The reconstruction method is the reverse process, comprising the following steps: disassembling the compressed file to obtain each module; restoring the pixel data; re-encoding with the reconstructed parameters to generate a standard pixel code stream; and decompressing and restoring the metadata, which is reassembled with the code stream into the original image file. The method realizes efficient cross-format compression and significantly improves the compression ratio and processing speed while ensuring that the hash values of the reconstructed file and the original file are completely consistent.

Description

Technical Field

[0001] This invention relates to image compression and reconstruction technology, and in particular to an image hash lossless compression method and an image hash lossless reconstruction method.

Background Technology

[0002] In storage and related applications, image data is being generated and accumulated on an unprecedented scale. For industries that rely on massive amounts of images for analysis, detection, and recognition, such as image retrieval, digital forensics, medical imaging, and record management, maintaining the integrity and originality of image data is crucial. Any minute pixel-level modification can lead to deviations in analysis results, loss of legal validity, or diagnostic errors. Therefore, these applications strictly require that images achieve true "bit-level" or "hash-level" losslessness during storage and transmission, meaning that the decoded file must be completely consistent with the original file.

[0003] To address storage cost pressures, traditional image compression techniques are typically categorized into lossy and lossless types. While lossy compression can significantly reduce file size, it irreversibly loses information, failing to meet the stringent data integrity requirements of the aforementioned fields. Standard lossless compression techniques, while preserving information, often have limited compression ratios, making it difficult to significantly reduce the overall storage overhead of massive images while maintaining hash consistency. Furthermore, existing general-purpose lossless compression schemes usually do not deeply analyze the internal format structure of images (such as PNG's filtering strategies and JPEG's Huffman tables), thus failing to perform in-depth optimization for encoding redundancy specific to certain formats, and also making it difficult to accurately reconstruct the encapsulation structure and internal parameters of the compressed file to be completely identical to the original.

[0004] Therefore, the core challenge facing existing technologies lies in how to design a compression method that can achieve compression efficiency far exceeding that of conventional lossless compression to reduce storage costs, while ensuring that the decompressed and restored image file is completely consistent with the original file at the hash value level, thereby meeting the practical needs of professional fields that have strict requirements for the immutability of image data.

[0005] It should be noted that the information disclosed in the background section above is only for understanding the background of this application, and therefore may include information that does not constitute prior art known to those skilled in the art.

Summary of the Invention

[0006] The main objective of this invention is to overcome the deficiencies in the aforementioned background technology and provide an image hash lossless compression method and an image hash lossless reconstruction method.

[0007] To achieve the above objectives, the present invention adopts the following technical solution:

[0008] In a first aspect of the present invention, an image hash lossless compression method includes the following steps:

[0009] A1. Input Image Format Parsing and Metadata Extraction: The raw image data to be processed is parsed in container format, and all structured metadata except for pixel data is extracted and serialized.

[0010] A2. Original compression parameter reconstruction: While decoding pixel data, analyze the original image compression algorithm to obtain its internal encoding parameters, and perform consistency verification to obtain the reconstructed set of original compression parameters.

[0011] A3. Lossless recompression of pixel data and metadata: The fully decoded pixel data is recompressed using a deep lossless compression algorithm; the structured metadata is compressed using a combined compression algorithm based on dictionary encoding and context modeling.

[0012] A4. Encapsulation of compressed outputs: The compressed structured metadata, the recompressed pixel data, and the reconstructed original compression parameter set are encapsulated to form the final hash lossless compressed image file.

[0013] In a second aspect of the present invention, an image hash lossless reconstruction method includes the following steps:

[0014] B1. Compressed Image Decomposition: The hash lossless compressed image file is parsed to decompose the compressed structured metadata, highly compressed pixel data, and the reconstructed original compression parameter set.

[0015] B2. Pixel data restoration: Decompress the highly compressed pixel data to restore it to the original uncompressed pixel data;

[0016] B3. Original format reconstruction: Based on the recovered uncompressed pixel data and the reconstructed original compression parameter set, the encoding process of the original image format is re-executed to generate a pixel compressed bitstream that conforms to the original format specification;

[0017] B4. Image file recovery: Decompress and deserialize the compressed structured metadata, and reassemble it with the generated pixel compressed bitstream into a standard format image file to restore the original image with consistent hash values.

[0018] The present invention has the following beneficial effects:

[0019] The image hash lossless compression and reconstruction method provided by this invention can efficiently compress images while ensuring that the original file can be completely restored after decoding, achieving lossless restoration at the hash value level, thereby meeting the strict requirements of storage-sensitive industries for the integrity of image data.

[0020] The core advantages of this invention stem from its multi-layered technological innovations. First, by constructing a unified and scalable parsing and recovery framework, it can perform differentiated and precise decomposition and reconstruction for various heterogeneous image formats such as PNG, JPEG, TIFF, and BMP, achieving broad cross-format compatibility. Second, it innovatively employs a probabilistic statistical model and a dynamically updated local cache table mechanism, which can automatically search for and infer the image's internal compression parameters without requiring original encoder information, thereby reliably achieving reversible reconstruction of the original compression process—a key to achieving hash-based lossless recovery. Furthermore, by abstracting all non-pixel structured data of the original image into a serialized data stream and processing it using a combined compression technique combining dictionary encoding and two-stage context modeling, it significantly improves the efficiency of metadata compression and the scalability of cross-format processing. These technologies work together to provide an efficient solution for the storage and transmission of massive image data while ensuring absolute data integrity.

[0021] The method of this invention exhibits significant advantages in storage savings and processing efficiency. Practical tests show that for large-scale JPEG image datasets, this method can achieve considerable size reduction in a very short compression time and fully restore the original image at a faster decoding speed; it also demonstrates efficient compression capabilities and fast decoding performance for high-resolution PNG image sets. This proves that the method possesses excellent compression ratio and fast encoding / decoding performance while ensuring hash consistency.

[0022] Other beneficial effects of the embodiments of the present invention will be further described below.

Description of the Drawings

[0023] Figure 1 is a flowchart of the image hash lossless compression method according to an embodiment of the present invention.

[0024] Figure 2 is a flowchart of the image hash lossless reconstruction method according to an embodiment of the present invention.

[0025] Figure 3 is a screenshot of the command line output for querying the MD5 checksum and file size in an embodiment of the present invention.

[0026] Figure 4 is a screenshot of the image hash lossless compression parameter configuration and operation log of an embodiment of the present invention.

[0027] Figure 5 is a screenshot of the command line output for querying the size of compressed files in an embodiment of the present invention.

[0028] Figure 6 is a screenshot of the image hash lossless reconstruction parameter configuration and operation log of an embodiment of the present invention.

[0029] Figure 7 is a screenshot of the command line output for querying the size of the reconstructed image file in an embodiment of the present invention.

[0030] Figure 8 is a screenshot of the command line output for batch compression-decoding-diff verification of the dataset in an embodiment of the present invention.

[0031] Figure 9 is a screenshot of the command line output for batch compression-decoding-diff verification of the dataset in an embodiment of the present invention.

Detailed Implementation

[0032] The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary and not intended to limit the scope and application of the present invention.

[0033] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of embodiments of the present invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0034] This invention aims to address the problem that existing lossless image compression technologies struggle to significantly improve compression ratios while ensuring hash-level data consistency. It proposes an image processing method that combines cross-format parsing, reverse derivation of native compression parameters, and deep recompression. This method achieves efficient compression and fast decoding of multiple image formats while ensuring that the reconstructed file is completely consistent with the original file, thus providing a reliable storage optimization solution for fields with strict requirements for data integrity.

[0035] Referring to Figure 1, this invention provides an image hash lossless compression method comprising the following steps:

[0036] Step A1, Input Image Format Parsing and Metadata Extraction: Parse the container format of the raw image data to be processed, and extract and serialize all structured metadata except for pixel data;

[0037] In some embodiments, the container format parsing in step A1 includes: for PNG format, parsing and extracting key data blocks, and serializing all modules except for the compressed pixel data; for JPEG format, extracting format header data including Huffman tables and scan header information; for TIFF and BMP formats, parsing their format headers and related structure contents.

[0038] Step A2, Original Compression Parameter Reconstruction: While decoding pixel data, analyze the original image compression algorithm to obtain its internal encoding parameters, and perform consistency verification to obtain the reconstructed set of original compression parameters.

[0039] In some embodiments, the reconstruction of the original compression parameters in step A2 includes: for PNG format, while decoding the pixel data stream, searching for and estimating the compression filter type and prediction mode parameters based on a probabilistic statistical model and a local cache table, and verifying the correctness of the inference through a verification mechanism; for JPEG, TIFF, and BMP formats, performing reverse derivation from the quantization table, Huffman table, or format header parameters already present in the file to recover the compression parameters of the original encoder, and verifying the consistency of the recovered bitstream structure. In addition, abnormal data and redundant data in the bitstream are identified and merged into the reconstruction parameters.

[0040] In some embodiments, the probability statistical model and the local cache table update mechanism include: weighting the compression parameters based on image size and data block size, constructing and maintaining a weighted compression parameter cache table; during the compression process, searching and verifying the parameters in the cache table according to their priority, and dynamically updating the usage frequency weight of each parameter in the cache table according to the verification results.

[0041] Step A3, Lossless recompression of pixel data and metadata: The fully decoded pixel data is recompressed using a deep lossless compression algorithm; the structured metadata is compressed using a combined compression algorithm based on dictionary encoding and context modeling.

[0042] In some embodiments, in step A3, structured metadata is compressed using a combination of a variant LZ77 algorithm and two-stage context modeling; pixel data is compressed using a deep lossless compression algorithm that is independent of the image format structure, in order to obtain a higher compression ratio than the original format.

[0043] Step A4, Encapsulation of compressed outputs: Encapsulate the compressed structured metadata, the recompressed pixel data, and the reconstructed original compression parameter set to form the final hash lossless compressed image file.

[0044] In some embodiments, the encapsulation process of step A4 may specifically include: first constructing a format header containing data block type identifiers, encapsulating the compressed structured metadata, the recompressed pixel data, and the reconstructed original compression parameter set into independent data blocks, with each data block corresponding one-to-one with the identifier in the format header, and then integrating all parts according to preset rules to form a hash lossless compressed image file that can support accurate disassembly of the corresponding modules during decoding.
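The encapsulation in step A4 and the disassembly in step B1 can be illustrated with a minimal sketch. The patent does not disclose the byte layout, so everything concrete here is an assumption made for illustration: the `HLC1` magic value, the numeric block type identifiers, and the header layout (one type byte plus a 32-bit length per block) are all hypothetical.

```python
import struct

MAGIC = b"HLC1"  # hypothetical file signature, not taken from the patent

# Hypothetical block type identifiers stored in the format header.
BLOCK_METADATA = 1   # compressed structured metadata
BLOCK_PIXELS = 2     # recompressed pixel data
BLOCK_PARAMS = 3     # reconstructed original compression parameter set


def pack_container(blocks: dict) -> bytes:
    """Write a header listing (type, length) per block, then the payloads."""
    header = MAGIC + struct.pack("<B", len(blocks))
    body = b""
    for block_type, payload in blocks.items():
        header += struct.pack("<BI", block_type, len(payload))
        body += payload
    return header + body


def unpack_container(blob: bytes) -> dict:
    """Read the header, then slice each payload out by its recorded length."""
    if blob[:4] != MAGIC:
        raise ValueError("unknown container signature")
    (count,) = struct.unpack_from("<B", blob, 4)
    pos = 5
    entries = []
    for _ in range(count):
        block_type, length = struct.unpack_from("<BI", blob, pos)
        entries.append((block_type, length))
        pos += 5  # "<BI" is 1 + 4 bytes with no padding
    blocks = {}
    for block_type, length in entries:
        blocks[block_type] = blob[pos:pos + length]
        pos += length
    if pos != len(blob):
        raise ValueError("container is truncated or has trailing bytes")
    return blocks
```

Because every block is located through its identifier and length in the header, the decoder can separate the three modules without inspecting their contents, which is the one-to-one correspondence the paragraph above describes.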

[0045] Referring to Figure 2, this invention also provides an image hash lossless reconstruction method comprising the following steps:

[0046] Step B1, Compressed Image Decomposition: The hash lossless compressed image file is parsed to decompose the compressed structured metadata, highly compressed pixel data, and the reconstructed original compression parameter set.

[0047] In some embodiments, step B1 specifically includes: parsing the format header and data block layout of the compressed file, and separating data blocks that store compressed metadata, compressed pixel data and compressed parameters respectively.

[0048] Step B2, Pixel Data Restoration: Decompress the highly compressed pixel data to restore it to the original uncompressed pixel data.

[0049] In some embodiments, the pixel data restoration in step B2 specifically includes: using the decompression algorithm corresponding to the deep lossless compression algorithm used in the encoding process to decode the pixel data module and recover the complete original pixel matrix.

[0050] Step B3, Native Format Reconstruction: Based on the recovered uncompressed pixel data and the reconstructed original compression parameter set, the encoding process of the original image format is re-executed to generate a pixel compressed bitstream that conforms to the original format specification.

[0051] In some embodiments, step B3 specifically includes: using the reconstructed original compression parameter set, including compression strategy, level, and filter parameters, to re-encode the recovered uncompressed pixel data in a lossless manner that conforms to the original format specification.

[0052] Step B4, Image File Recovery: Decompress and deserialize the compressed structured metadata, and reassemble it with the generated pixel compressed bitstream into a standard format image file to restore the original image with consistent hash values.

[0053] In some embodiments, step B4 specifically includes: decoding the compressed non-pixel metadata module using a decompression algorithm corresponding to the encoding stage to obtain a serialized metadata binary stream; deserializing the binary stream to restore the original image format structure module; and assembling the restored format structure module with the pixel compressed bitstream generated in step B3 according to the original image format specification to generate the final image file.

[0054] By constructing a unified cross-format parsing and reconstruction framework, the image hash lossless compression and reconstruction methods proposed in this invention can perform refined processing on various image formats such as PNG, JPEG, TIFF, and BMP, achieving significant storage savings while ensuring lossless data recovery at the hash level. Their core advantage lies in a parameter reverse derivation mechanism based on a probabilistic model and dynamic caching, which accurately reconstructs the original compression parameters without relying on the original encoder. Combined with a deeply optimized lossless recompression algorithm, the methods achieve compression efficiency that surpasses conventional lossless compression while maintaining bit-level consistency with the original file. Furthermore, the caching strategy significantly improves batch processing performance, providing an efficient and reliable solution for storage applications with strict requirements for data integrity.

[0055] The following further describes the implementation methods and experimental verification of specific embodiments of the present invention.

[0056] This invention proposes an image hash lossless compression method and a corresponding lossless reconstruction method, aiming to solve the problem of limited compression ratio in existing technologies while ensuring data integrity. The image hash lossless compression method mainly includes: first, performing unified cross-format parsing on the input image to separate pixel data from structured metadata; second, reversibly simulating the encoding process by reverse analyzing and reconstructing the core parameters of the original compression algorithm during pixel decoding; and finally, performing lossless recompression on both the metadata and the original pixel data after deep optimization, and encapsulating all information. The corresponding decoding process reverses the above steps, accurately reconstructing an image with a hash value completely consistent with the original file using the saved parameters and data.

[0057] Figures 1 and 2 are flowcharts of the image hash lossless compression method and the image hash lossless reconstruction method, respectively. Right-angled rectangles represent data entities, and rounded rectangles represent actions that perform tasks.

[0058] As shown in Figure 1, the encoding chain of the image hash lossless compression method includes the following processes:

[0059] Input Image and Format Structure Parsing. This step acquires the raw image data to be processed and parses its container format step by step to extract and serialize all structured metadata except for pixel data. Specifically, for PNG format, key data blocks such as IHDR and IDAT are parsed and extracted, and all modules except for the compressed pixel data are serialized; for JPEG format, format header data such as the Huffman tables (DHT) and scan header information (SOS) are extracted; and for other formats such as TIFF and BMP, their format headers and related structure contents are parsed.
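For the PNG case, the parsing step can be sketched as follows. This is a minimal illustration rather than the patented implementation: it walks the standard PNG chunk layout (4-byte length, 4-byte type, data, CRC-32), collects the IDAT payload, and keeps everything else verbatim together with its file offset. The function names and the tiny 2x2 grayscale sample image are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def split_png(png: bytes):
    """Return (non_pixel_segments, idat_payload).

    non_pixel_segments is a list of (offset, raw_bytes) for the signature and
    every non-IDAT chunk; idat_payload is the concatenated zlib stream.
    """
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    segments = [(0, PNG_SIGNATURE)]
    idat = bytearray()
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        end = pos + 12 + length
        (crc,) = struct.unpack_from(">I", png, end - 4)
        if crc != zlib.crc32(ctype + data) & 0xFFFFFFFF:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        if ctype == b"IDAT":
            idat += data
        else:
            segments.append((pos, png[pos:end]))
        pos = end
    return segments, bytes(idat)


def make_sample_png():
    """Hypothetical 2x2 8-bit grayscale image used only to exercise split_png."""
    raw = b"\x00\x10\x20\x00\x30\x40"  # per row: filter byte 0, then 2 pixels
    ihdr = struct.pack(">IIBBBBB", 2, 2, 8, 0, 0, 0, 0)
    png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
           + make_chunk(b"IDAT", zlib.compress(raw)) + make_chunk(b"IEND", b""))
    return png, raw
```

A byte-exact reconstruction would presumably also need the size of each individual IDAT chunk, since the same zlib stream can be split across chunks in different ways; this sketch does not record that.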

[0060] The raw image data is first input to the decoder, which then performs decoding and original compression parameter reconstruction in parallel. The decoder extracts and separates the image metadata, decodes the pixel data, and analyzes the image's native compression algorithm in real time to obtain its internal encoding parameters. For PNG format, while the IDAT data stream is being decoded, reconstruction parameters such as filter type and prediction mode are searched for and estimated based on a probabilistic statistical model and a local cache table. The correctness of the inference is verified by a CRC32 check, and the local cache table is then dynamically updated. For JPEG, TIFF, and BMP formats, the original encoder's compression parameters are derived directly from the quantization table, Huffman table, or format header parameters already present in the file, and the consistency of the recovered bitstream structure is checked. This step also identifies abnormal data (such as illegal quantization tables) and redundant data unrelated to the image format in the bitstream and merges them into the reconstruction parameter set. These reconstruction parameters are finally compressed using Brotli and saved as the original image compression strategy for complete restoration during decoding.

[0061] The next step is efficient lossless recompression. This step processes two types of data separately. The separated structured metadata is compressed using a combination of the variant LZ77 algorithm and two-stage context modeling. The fully decoded pixel data or DCT data is sent to a dedicated lossless compression module, which uses a deep lossless compression algorithm that is independent of the format structure and aims for a higher compression ratio than the original format while keeping the pixel information lossless.

[0062] Finally, the assembler encapsulates the compressed outputs. It packages the three types of information obtained in the previous steps (the serialized and compressed non-pixel modules of the original image, the recompressed pixel data modules, and the reconstructed parameter set of the original compression algorithm, i.e., the original image compression strategy) to form the final hash lossless compressed image file.

[0063] As shown in Figure 2, the decoding chain of the image hash lossless reconstruction method includes the following processes:

[0064] First, the compressed image structure is decomposed. That is, the input hash lossless compressed image file is parsed by the decomposer to restore its three core components: image metadata (corresponding to non-pixel modules), compressed pixel data modules, and original image reconstruction strategy and data (corresponding to the reconstruction parameter set of the original compression algorithm).

[0065] The pixel module restoration stage then begins. In this stage, the compressed pixel data module undergoes decompression using a lossless decoder, restoring it to its original, uncompressed state.

[0066] Next, native format reconstruction and pixel recompression are performed. This step inputs the restored image metadata, the original pixel data, and the original image reconstruction strategy and data obtained from the decomposition into the reconstruction encoder. Based on this information, the complete encoding process of the original image format is re-executed to generate a pixel compressed bitstream that conforms to the corresponding format specification (such as a zlib/deflate stream).

[0067] Finally, metadata deserialization and image restoration are completed. In this stage, non-pixel modules are decompressed and deserialized to restore them to the original structured metadata. Then, this metadata is reassembled with the pixel compressed bitstream generated in the previous step according to the standard image format specification, ultimately achieving the restoration of the original image with completely consistent hash values.

[0068] Taking PNG format as an example, its complete hash lossless compression and recovery process is as follows. All pixel data in a PNG file is stored in contiguous IDAT data blocks, which hold a zlib-format stream compressed using the deflate algorithm. A zlib stream typically contains four parts: the compression method and flags byte (CMF, 1 byte), the additional flags byte (FLG, 1 byte, which includes the header check bits), the compressed data blocks (variable length), and the Adler-32 checksum (4 bytes).
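The four-part layout can be made concrete with a short sketch that reads the two header bytes and the trailing checksum of a zlib stream. The field positions follow the zlib format specification (RFC 1950); the function name and the returned dictionary keys are illustrative choices, and the sketch ignores the optional preset-dictionary identifier beyond reporting its flag.

```python
import struct
import zlib


def parse_zlib_stream(stream: bytes) -> dict:
    """Decode the 2-byte zlib header and the 4-byte Adler-32 trailer."""
    if len(stream) < 6:
        raise ValueError("too short to be a zlib stream")
    cmf, flg = stream[0], stream[1]
    # The 16-bit header value must be a multiple of 31 (FCHECK bits ensure it).
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("zlib header check failed")
    (adler,) = struct.unpack(">I", stream[-4:])
    return {
        "method": cmf & 0x0F,            # CM: 8 means deflate
        "window_bits": (cmf >> 4) + 8,   # CINFO: log2(window size) - 8
        "fdict": bool(flg & 0x20),       # preset dictionary flag
        "flevel": flg >> 6,              # coarse compression-level hint, 0..3
        "adler32": adler,                # checksum of the uncompressed data
    }
```

For example, `parse_zlib_stream(zlib.compress(b"hello" * 100))` reports `method` 8, and its `adler32` value equals `zlib.adler32(b"hello" * 100)`. Note that FLEVEL only narrows the possible compression level to a range, which is why the parameters still have to be searched for.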

[0069] A complete PNG compression chain begins with disassembling the PNG data. The compression software parses the PNG's block structure, primarily extracting the IHDR and IDAT blocks. Basic information such as the image's width, height, and bit depth is obtained from the IHDR block for subsequent lossless compression; the deflate-compressed data stream is read from the IDAT blocks for subsequent analysis of the original compression parameters. The software also records all binary data outside the IDAT blocks in full so that the original non-pixel data can be stored later.

[0070] The original image compression method is then analyzed. Since the original deflate-compressed data stream does not record the specific compression parameters used during zlib encoding (such as buffer size, compression strategy, compression level, memory level, and filter settings), this method reconstructs these parameters through a search mechanism. While decoding each IDAT block to obtain pixel data, the software attempts to recompress this pixel data using zlib and verifies in memory that the compression parameters used can achieve hash-level lossless reconstruction. Specifically, this method combines the semantics of the CMF and FLG fields in the zlib format header (e.g., the correspondence between the FLEVEL field and the compression level) with the distribution patterns of common PNG compression parameters to define a series of prioritized search paths. For example, it tries different compression strategies in a specific order or prioritizes certain filter parameters based on image features. To improve performance during industrial batch processing, the system builds and maintains a dynamically weighted compression parameter table during operation. This table takes factors such as image size and IDAT block size into account to evaluate and sort the parameters, so that high-weight parameter combinations are tried and verified first when subsequent PNG images are processed, which significantly improves the efficiency of parameter search and reconstruction. These operations yield an accurate original image reconstruction strategy, ensuring that the original file can be restored exactly during decoding.
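A much simplified version of this search can be sketched with Python's `zlib` module. It is only an illustration under stated assumptions: it searches level, memory level, and strategy, takes the window size from the CMF byte, uses FLEVEL only to decide which levels to try first, and compares the recompressed bytes directly instead of using the CRC32 check and cache table described above. The `LEVELS_BY_FLEVEL` ordering is a hypothetical heuristic, and an exact match presumably requires the same zlib implementation that produced the original stream.

```python
import zlib

STRATEGIES = [zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_HUFFMAN_ONLY,
              zlib.Z_RLE, zlib.Z_FIXED]

# Hypothetical priority order: levels suggested by the FLEVEL hint go first.
LEVELS_BY_FLEVEL = {0: [1, 0], 1: [2, 3, 4, 5], 2: [6], 3: [9, 8, 7]}


def recompress(raw, level, wbits, mem_level, strategy):
    """Compress raw bytes into a zlib stream with explicit parameters."""
    enc = zlib.compressobj(level, zlib.DEFLATED, wbits, mem_level, strategy)
    return enc.compress(raw) + enc.flush()


def find_zlib_params(stream):
    """Return parameters that reproduce `stream` byte for byte, or None."""
    raw = zlib.decompress(stream)
    wbits = (stream[0] >> 4) + 8          # window size from the CMF byte
    hinted = LEVELS_BY_FLEVEL[stream[1] >> 6]
    levels = hinted + [lv for lv in range(10) if lv not in hinted]
    for level in levels:
        for strategy in STRATEGIES:
            for mem_level in (8, 9, 7, 6, 5, 4, 3, 2, 1):
                try:
                    candidate = recompress(raw, level, wbits, mem_level, strategy)
                except (ValueError, zlib.error):
                    continue  # parameter combination rejected by this zlib build
                if candidate == stream:
                    return {"level": level, "wbits": wbits,
                            "mem_level": mem_level, "strategy": strategy}
    return None
```

The returned dictionary can be passed straight back as `recompress(raw, **params)` at decode time. Several parameter combinations may produce the same bytes, so the search returns the first match rather than the encoder's actual settings, which is sufficient for byte-exact reconstruction.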

[0071] After the compression parameters of the original PNG image have been recovered, the pixel data compression stage begins. In this stage, based on parameters such as image width, height, bit depth, and number of channels obtained from blocks such as IHDR, the fully decoded original pixel data is further compressed using a deep lossless compression algorithm. This process is independent of any specific image format structure and aims to achieve a higher compression ratio than the original PNG file format without any loss of pixel information, ultimately producing a deeply compressed pixel data module.

[0072] Next, the non-pixel data from the original image is saved. The system marks and extracts all data from the PNG file except for the IDAT blocks and, depending on the data volume, attempts to compress it using the Brotli algorithm. It then saves whichever of the compressed and original data is smaller. The system also records the compression method used and the exact location of that data within the original PNG file, ultimately producing the compressed non-pixel data module.
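The "keep whichever is smaller and record the method and location" rule can be sketched as below. To keep the example dependent only on the standard library, `zlib` stands in for Brotli here; that substitution, the record layout, and the method constants are assumptions made for illustration and are not taken from the patent.

```python
import zlib

# Hypothetical method flags stored alongside each segment.
METHOD_RAW = 0
METHOD_COMPRESSED = 1


def store_segment(offset: int, data: bytes) -> dict:
    """Compress one non-pixel segment, falling back to raw if that is smaller."""
    packed = zlib.compress(data, 9)  # stand-in for Brotli in this sketch
    if len(packed) < len(data):
        return {"offset": offset, "method": METHOD_COMPRESSED, "payload": packed}
    return {"offset": offset, "method": METHOD_RAW, "payload": data}


def load_segment(record: dict):
    """Return (offset, original_bytes) for a stored segment."""
    payload = record["payload"]
    if record["method"] == METHOD_COMPRESSED:
        payload = zlib.decompress(payload)
    return record["offset"], payload
```

Very short segments such as the 8-byte PNG signature grow when compressed, so they end up stored raw, while larger text or profile blocks are stored compressed. Recording the offset with each segment is what lets the decoder put the bytes back in exactly the original position.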

[0073] Finally, the compressed image is assembled. The resulting compressed file contains a format header and several data blocks: the deeply compressed pixel data blocks generated in the previous steps, the original image reconstruction strategy blocks, and the compressed non-pixel data blocks. These parts are assembled according to a predetermined format to obtain the final hash lossless compressed file.

[0074] The corresponding decoding process is the reverse of the encoding process. First, the compressed file is disassembled to extract basic image information (such as width, height, number of channels, and bit depth), as well as the deeply compressed pixel data blocks, the original image reconstruction strategy blocks, and the compressed non-pixel data blocks. Second, the deeply compressed pixel data blocks are decompressed to recover the original RGB pixel bitstream. Then, using the extracted original image reconstruction strategy (i.e., the recovered compression parameters), the RGB pixel bitstream is recompressed and encoded according to the PNG standard to generate the pixel data portion. Finally, the compressed non-pixel data blocks are decompressed and restored, and merged with the reconstructed pixel portion to completely recover the original PNG file, ensuring that the hash values are completely consistent.
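The final re-encoding and merging steps for PNG can be sketched as follows. This is a simplified illustration under several assumptions: the recovered data is the filtered scanline byte stream, the strategy record holds only a zlib level and the sizes of the original IDAT chunks (both hypothetical field names), the encoder used default zlib settings otherwise, and the non-pixel segments are already decompressed `(offset, bytes)` pairs.

```python
import struct
import zlib

SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def rebuild_png(scanlines: bytes, params: dict, non_pixel: list) -> bytes:
    """Re-encode the pixel stream and merge it with the stored segments."""
    stream = zlib.compress(scanlines, params["level"])
    if sum(params["idat_sizes"]) != len(stream):
        raise ValueError("re-encoded stream does not match the recorded layout")
    idat_chunks, pos = [], 0
    for size in params["idat_sizes"]:
        idat_chunks.append(chunk(b"IDAT", stream[pos:pos + size]))
        pos += size
    pending = iter(idat_chunks)
    out = bytearray()
    for offset, data in sorted(non_pixel):
        # Fill the gap before this segment with the next IDAT chunks.
        while len(out) < offset:
            nxt = next(pending, None)
            if nxt is None:
                raise ValueError("not enough pixel data to reach offset")
            out += nxt
        if len(out) != offset:
            raise ValueError("segment offset does not line up")
        out += data
    for rest in pending:
        out += rest
    return bytes(out)
```

If the rebuilt bytes equal the original file, every hash of the two files is necessarily equal as well, so a byte comparison is the strongest form of the consistency check described above.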

[0075] In the above process, the probabilistic statistical model mainly defines the depth-first search order used when searching for PNG compression parameters. The local cache table stores the weighted compression parameters, and its design rules are as follows: the outer layer is a hash table, where the key is a combination of the image width and height, and the corresponding value is a max-heap; this max-heap uses the hash value of the compression parameter as the key and the frequency weight of the parameter as the value. Furthermore, the system uses a linked list structure to manage the first-in, first-out (FIFO) eviction mechanism for cached items.
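The cache table layout described above can be approximated with standard Python containers. This is a sketch under assumptions: an `OrderedDict` plays the role of both the outer hash table and the linked list used for FIFO eviction, the parameter tuple itself is used as the key instead of its hash value, and the max-heap is built on demand with `heapq` (using negated weights). The class and method names are hypothetical.

```python
import heapq
from collections import OrderedDict


class ParamCache:
    """Weighted compression-parameter cache keyed by image dimensions."""

    def __init__(self, max_entries: int = 1024):
        self.max_entries = max_entries
        # (width, height) -> {params: frequency weight}; insertion-ordered.
        self._table = OrderedDict()

    def record_success(self, width: int, height: int, params: tuple) -> None:
        """Raise the weight of a parameter set that passed verification."""
        key = (width, height)
        if key not in self._table:
            if len(self._table) >= self.max_entries:
                self._table.popitem(last=False)  # FIFO: evict the oldest key
            self._table[key] = {}
        weights = self._table[key]
        weights[params] = weights.get(params, 0) + 1

    def candidates(self, width: int, height: int):
        """Yield cached parameter sets, highest weight first."""
        weights = self._table.get((width, height), {})
        # heapq is a min-heap, so weights are negated; the index breaks ties.
        heap = [(-w, i, p) for i, (p, w) in enumerate(weights.items())]
        heapq.heapify(heap)
        while heap:
            _, _, params = heapq.heappop(heap)
            yield params
```

A search routine would try `cache.candidates(width, height)` before falling back to the full search order, then call `record_success` with whichever parameters verified, so that frequently seen encoder settings are tried first on later images of the same size.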

[0076] For the compression of non-pixel data, the system uniformly adopts the Brotli algorithm, which incorporates a two-stage context modeling technique. All non-pixel data, including any redundant data, is ultimately compressed with Brotli and stored in the compressed file, so that it can be completely recovered during decoding. For PNG, BMP, and TIFF formats, non-pixel data is read directly from its binary structure before Brotli compression. For JPEG format, an additional serialization and deserialization step is added: the structured header information is first serialized into a binary data stream before Brotli compression, and during decoding, decompression and deserialization are both required to restore the original structure.

[0077] Specifically, serialization and deserialization work as follows. For PNG, TIFF, and BMP formats, the non-pixel binary data is compressed directly with Brotli, and the position coordinates of each data segment in the file are recorded, forming a "coordinates + data" structure. For JPEG format, existing solutions (such as jpeg-lepton) are used for format parsing and reconstruction.
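The "coordinates + data" structure can be illustrated with the sketch below. Note the assumptions: the record layout (8-byte offset, 4-byte length, then the bytes) and the function names are hypothetical, and `zlib` from the Python standard library is used here only as a stand-in for Brotli so that the example has no third-party dependency. It is not the compressor named in the source.

```python
import struct
import zlib

RECORD = struct.Struct("<QI")  # hypothetical record header: file offset, segment length


def pack_segments(segments):
    """Serialize (offset, data) pairs and compress them (zlib standing in for Brotli)."""
    raw = b"".join(RECORD.pack(offset, len(data)) + data for offset, data in segments)
    return zlib.compress(raw, 9)


def unpack_segments(blob):
    """Inverse of pack_segments: return the list of (offset, data) pairs."""
    raw = zlib.decompress(blob)
    segments, pos = [], 0
    while pos < len(raw):
        offset, length = RECORD.unpack_from(raw, pos)
        pos += RECORD.size
        segments.append((offset, raw[pos:pos + length]))
        pos += length
    return segments


def write_segments(buffer, segments):
    """Copy each segment back to its recorded position in a bytearray."""
    for offset, data in segments:
        buffer[offset:offset + len(data)] = data
```

On the decoding side, the non-pixel segments and the re-encoded pixel segment are written into one output buffer at their recorded offsets, which restores the original byte order of the file.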

[0078] Experimental Test

[0079] To verify the practical effect of this invention, a portion of the images from the COCO 2014, Kodak, and CLIC 2025 datasets were selected for compression testing. The file sizes of the original images and their corresponding MD5 hash values are shown in Figure 3.

[0080] The command-line parameters used during the test are shown in Figure 4: `indir` specifies the input folder, `outdir` specifies the output folder, `-v 2` sets the log level to 2, `threads -1` uses as many threads as there are CPU cores, and `lossless_type 8` enables the hash lossless compression mode. The sizes of the compressed files are shown in Figure 5.

[0081] During the reconstruction phase, decoding is performed using exactly the same parameters as during encoding. A screenshot of the reconstruction process is shown in Figure 6. The sizes of the reconstructed image files and their MD5 hash values are shown in Figure 7; they are verified to be completely consistent with those of the original images, demonstrating the effectiveness of hash lossless recovery.

[0082] To further evaluate the performance of this method in batch processing, large-scale dataset tests were conducted. On the COCO 2014 dataset (7819 JPEG images, with a total size of approximately 1.3GB), compression took 27 seconds, the compressed size was approximately 1.1GB, and decoding took 10 seconds. Verification using a file comparison tool confirmed that the contents of the original folder and the reconstructed folder were completely identical. The results are shown in Figure 8.

[0083] On the CLIC 2025 validation and test sets (62 PNG images, totaling approximately 218MB), compression took 9.4 seconds, the compressed size was approximately 152MB, and decoding took 6 seconds. Difference verification confirmed that all files in the reconstructed folder were completely identical to those in the original folder, as shown in Figure 9. Table 1 summarizes the batch compression and decoding performance for the different datasets.

[0084] Table 1 Comparison of batch compression and decoding performance for different datasets
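As a rough cross-check of the figures reported in paragraphs [0082] and [0083], the derived ratios and throughputs can be computed as in the sketch below. The input numbers are taken from the text and are approximate. The conversion 1GB = 1000MB and the helper name `summarize` are assumptions made for this example, so the throughput values are indicative only.

```python
def summarize(name, original_mb, compressed_mb, encode_s, decode_s):
    """Derive size ratio, space saving, and throughput from the reported figures."""
    return {
        "dataset": name,
        "ratio": compressed_mb / original_mb,              # compressed / original
        "saving_pct": (1 - compressed_mb / original_mb) * 100,
        "encode_mb_per_s": original_mb / encode_s,         # input consumed per second
        "decode_mb_per_s": original_mb / decode_s,         # output produced per second
    }


# Approximate values from the text; GB converted with 1GB = 1000MB (assumption).
rows = [
    summarize("COCO 2014 (JPEG)", 1300, 1100, 27, 10),
    summarize("CLIC 2025 (PNG)", 218, 152, 9.4, 6),
]

for row in rows:
    print(
        f"{row['dataset']}: ratio {row['ratio']:.3f}, "
        f"saving {row['saving_pct']:.1f}%, "
        f"encode {row['encode_mb_per_s']:.1f} MB/s, "
        f"decode {row['decode_mb_per_s']:.1f} MB/s"
    )
```

With these inputs the space saving comes to roughly 15% for the JPEG set and roughly 30% for the PNG set, which fits the expectation that already entropy-coded JPEG data leaves less room for further lossless gains.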

[0085] It should be noted that hash lossless compression depends on whether the original compression can be reconstructed. For example, some PNG images may have been produced with a custom LZ77 implementation instead of standard zlib, in which case lossless parameter reconstruction may not be possible. To handle such non-reconstructable cases, this method performs a trial decoding and a hash consistency check during compression. Compression is completed only when it is confirmed that the original hash can be fully recovered, which guarantees the reliability of the reconstruction at the decoding end.
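The compress-time check described above can be sketched as a brute-force search over zlib parameters followed by a byte-exact comparison. This is a simplified illustration, not the search procedure of the invention: it ignores the probabilistic model and cache table, fixes the window size at `wbits=15`, compares bytes directly instead of comparing hashes, and uses hypothetical function names. A return value of `None` corresponds to the case where the original stream cannot be reproduced.

```python
import itertools
import zlib

STRATEGIES = (
    zlib.Z_DEFAULT_STRATEGY,
    zlib.Z_FILTERED,
    zlib.Z_HUFFMAN_ONLY,
    zlib.Z_RLE,
    zlib.Z_FIXED,
)


def deflate(data, level, mem_level, strategy, wbits=15):
    """Compress data in one shot with explicit zlib parameters."""
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits, mem_level, strategy)
    return comp.compress(data) + comp.flush()


def find_zlib_params(raw, target_stream):
    """Return (level, mem_level, strategy) reproducing target_stream, or None."""
    search_space = itertools.product(range(10), range(1, 10), STRATEGIES)
    for level, mem_level, strategy in search_space:
        if deflate(raw, level, mem_level, strategy) == target_stream:
            return (level, mem_level, strategy)
    return None  # not reproducible with this search space
```

Several parameter sets can produce identical bytes, so the returned tuple is one valid choice rather than necessarily the one the original encoder used. What matters for reconstruction is only that re-encoding with it reproduces the original stream.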

[0086] In summary, this invention provides an image hash lossless compression and reconstruction method that significantly improves compression efficiency while strictly ensuring data integrity. Its key contributions and technical points are as follows. First, it constructs a unified and scalable cross-format parsing and reconstruction framework that processes heterogeneous image formats such as PNG, JPEG, TIFF, and BMP in a differentiated manner, achieving fine-grained decomposition and accurate restoration of their format structure modules and laying the foundation for broad compatibility. Second, it proposes and implements an automatic compression parameter search and reconstruction mechanism based on a probabilistic statistical model and a dynamic local cache table. This mechanism infers the internal compression parameters of an image without access to the original encoder, so that the original compression process can be reproduced exactly, which is the core technical guarantee of hash-level lossless recovery. Finally, by abstracting all non-pixel structured data of the original image into a unified serialized data stream and processing it with a high-efficiency compression algorithm that combines an LZ77 variant with two-stage context modeling, the efficiency of metadata compression is significantly improved, further enhancing the scalability and flexibility of the technical solution across different image formats. Together, these techniques form a complete solution that performs well in compression ratio, processing speed, and data fidelity.

[0087] This invention also provides a storage medium for storing a computer program, which, when executed, performs at least the methods described above.

[0088] This invention also provides a control device, including a processor and a storage medium for storing a computer program; wherein the processor executes the computer program by performing at least the method described above.

[0089] This invention also provides a processor that executes a computer program, at least performing the methods described above.

[0090] The storage medium can be implemented by any type of non-volatile storage device, or a combination thereof. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferromagnetic random access memory (FRAM), flash memory, magnetic surface memory, an optical disc, or compact disc read-only memory (CD-ROM); magnetic surface memory can be disk storage or magnetic tape storage. The storage media described in the embodiments of this invention are intended to include, but are not limited to, these and any other suitable types of memory.

[0091] In the several embodiments provided by this invention, it should be understood that the disclosed systems and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods, such as: multiple units or components can be combined, or integrated into another system, or some features can be ignored or not executed. In addition, the coupling or direct coupling or communication connection between the various components shown or discussed can be through some interfaces, and the indirect coupling or communication connection between devices or units can be electrical, mechanical, or other forms.

[0092] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected to achieve the purpose of this embodiment according to actual needs.

[0093] In addition, in the various embodiments of the present invention, each functional unit can be integrated into one processing unit, or each unit can be a separate unit, or two or more units can be integrated into one unit; the integrated unit can be implemented in hardware or in the form of hardware plus software functional units.

[0094] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0095] Alternatively, if the integrated units of this invention are implemented as software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this invention, or the parts that contribute to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, ROM, RAM, magnetic disks, or optical disks.

[0096] The methods disclosed in the several method embodiments provided by this invention can be arbitrarily combined without conflict to obtain new method embodiments.

[0097] The features disclosed in the several product embodiments provided by this invention can be arbitrarily combined without conflict to obtain new product embodiments.

[0098] The features disclosed in the several method or device embodiments provided by the present invention can be arbitrarily combined without conflict to obtain new method or device embodiments.

[0099] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various equivalent substitutions or obvious modifications can be made without departing from the concept of the present invention, and all such modifications, achieving the same performance or application, should be considered within the scope of protection of the present invention.

Claims

1. An image hash lossless compression method, characterized in that it comprises the following steps:
A1. Input image format parsing and metadata extraction: the raw image data to be processed is parsed in container format, and all structured metadata except for pixel data is extracted and serialized. The container format parsing includes: for PNG format, parsing and extracting key data blocks, and serializing all data blocks except for pixel compression data; for JPEG format, extracting format header data including Huffman tables and scan header information; for TIFF and BMP formats, parsing their format headers and related structure contents.
A2. Original compression parameter reconstruction: while decoding pixel data, the original image compression algorithm is analyzed to obtain its internal encoding parameters, and consistency verification is performed to obtain the reconstructed original compression parameter set. For PNG format, during the decoding of the pixel data stream, the compression filter type and prediction mode parameters are searched and estimated based on a probabilistic statistical model and a local cache table, and the correctness of the inference is verified through a verification mechanism. The update mechanism of the probabilistic statistical model and the local cache table includes: weighting the compression parameters based on image size and data block size, and constructing and maintaining a weighted compression parameter cache table; during compression, searching and verifying the parameters according to their priority in the cache table, and dynamically updating the usage frequency weight of each parameter in the cache table based on the verification results. For JPEG format, reverse derivation is performed based on the quantization table and Huffman table already existing in the file. For TIFF and BMP formats, reverse derivation is performed based on the format header parameters already existing in the file to recover the original encoder's compression parameters, and consistency verification is performed on the recovered bitstream structure. Abnormal and redundant data in the bitstream are identified and merged into the reconstructed parameter set for complete restoration during decoding.
A3. Lossless recompression of pixel data and metadata: the fully decoded pixel data is recompressed using a lossless compression algorithm; the structured metadata is compressed using a combined compression algorithm based on dictionary encoding and context modeling.
A4. Encapsulation of compressed outputs: the compressed structured metadata, the recompressed pixel data, and the reconstructed original compression parameter set are encapsulated to form the final hash lossless compressed image file.

2. An image hash lossless reconstruction method for reconstructing a hash lossless compressed image file obtained by the image hash lossless compression method of claim 1, characterized in that it comprises the following steps:
B1. Compressed image decomposition: the hash lossless compressed image file is parsed to decompose it into the compressed structured metadata, the highly compressed pixel data, and the reconstructed original compression parameter set; wherein the format header and data block layout of the compressed file are parsed to separate the data blocks that store the compressed metadata, the compressed pixel data, and the compression parameters, respectively.
B2. Pixel data restoration: the highly compressed pixel data is decompressed to restore the original uncompressed pixel data; wherein a decompression algorithm corresponding to the lossless compression algorithm used in the encoding process is used to decode the pixel data blocks and restore the complete original pixel matrix.
B3. Original format reconstruction: based on the recovered uncompressed pixel data and the reconstructed original compression parameter set, the encoding process of the original image format is re-executed to generate a pixel compressed bitstream that conforms to the original format specification; wherein, using the reconstructed original compression parameter set, including compression strategy, level, and filter parameters, the recovered uncompressed pixel data is re-encoded with lossless compression that conforms to the original format specification.
B4. Image file recovery: the compressed structured metadata is decompressed and deserialized, and then reassembled with the generated pixel compressed bitstream into a standard format image file to restore the original image with consistent hash values. Specifically, the compressed non-pixel metadata blocks are decoded using a decompression algorithm corresponding to the encoding stage to obtain a serialized metadata binary stream. This binary stream is then deserialized to restore the original image format structure module. The restored format structure module is then assembled with the pixel compressed bitstream generated in step B3 according to the original image format specification to generate the final image file.