An Enhanced QOI Encoder Implemented on FPGA
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNIV OF ELECTRONICS SCI & TECH OF CHINA
- Filing Date
- 2024-12-20
- Publication Date
- 2026-06-30
AI Technical Summary
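However, existing QOI encoding implementations rely mainly on software platforms, and their performance is limited by the computing power and memory bandwidth of general-purpose processors.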
Figure CN119729000B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of lossless image encoding, and specifically to an enhanced QOI encoder implemented on an FPGA. The encoder is based on an enhanced QOI encoding strategy and aims to improve the efficiency and throughput of QOI image encoding through programmable hardware circuits.
Background Technology
[0002] With the rapid development of digital image technology, image coding has become an important part of digital image processing. QOI (Quite OK Image), an emerging lossless image coding algorithm, has shown great potential in image storage and transmission because of its simplicity, efficiency, and fast encoding and decoding speed. However, existing QOI encoding implementations rely mainly on software platforms, and their performance is limited by the computing power and memory bandwidth of general-purpose processors. Exploring hardware implementations of QOI encoding strategies is therefore of great significance for improving the speed and efficiency of image coding.
Summary of the Invention
[0003] This invention proposes a hardware implementation of a QOI image encoding strategy. The encoding strategy is derived from the patent "An Improved QOI Image Encoding Strategy" and is referred to in this invention as the enhanced QOI encoding strategy. The FPGA-implemented enhanced QOI encoder converts an image pixel stream (RGB888, 24-bit true color, in which each pixel has red, green, and blue components of 8 bits each) into an enhanced QOI encoded data stream. Both the pixel stream and the encoded data stream conform to the AXI Stream bus interface protocol, with a bus data width of 64 bits. The enhanced QOI encoder mainly consists of an input bit-width transformation, a JPEG-LS predictor, an enhanced QOI encoding unit, a coding group FIFO, a compact bit-width transformation, and a keep signal specification module. Its hardware architecture is shown in Figure 1.
[0004] The input bit width conversion module converts the input pixel stream from 64 bits to 48 bits, enabling parallel input of two pixels.
[0005] Both the JPEG-LS predictor and the enhanced QOI coding unit employ a pipelined design, enabling the encoding of two pixels per clock cycle.
[0006] The predictor calculates the predicted values of the two pixels according to the prediction rules and passes them, together with the original pixel values, to the enhanced QOI coding unit. The coding unit performs enhanced QOI encoding and outputs a 9-byte encoding group containing at most 9 valid bytes.
[0007] The encoding group FIFO buffers the encoding group data.
[0008] The compact bit-width transformation module generates two 8-byte outputs from a fully valid 9-byte encoding group.
[0009] The keep signal specification module compacts the AXIS encoded data stream and removes empty bytes.
Attached Figure Description
[0010] Figure 1 shows the hardware architecture of the enhanced QOI encoder;
[0011] Figure 2 shows an example of the bit-width transformation of the input pixel stream (image resolution 24×24);
[0012] Figure 3 shows the JPEG-LS prediction rules;
[0013] Figure 4 shows the composition of the 9-byte encoding group output by the enhanced QOI encoding unit;
[0014] Figure 5 shows two possible cases of the 9-byte encoding group;
[0015] Figure 6 shows the preliminary shaping of a 9-byte encoding group;
[0016] Figure 7 is a schematic diagram of the 16-byte ring register buffer.
Detailed Implementation
[0017] To further clarify the design considerations and advantages of the enhanced QOI encoder, the encoder architecture shown in Figure 1 is explained module by module below, including the function and design rationale of each sub-module.
[0018] The encoder consists of an input bit-width transform, a JPEG-LS predictor, an enhanced QOI coding unit, a coding group FIFO, a compact bit-width transform, and a keep signal specification.
[0019] The input bit-width transformation converts the data width of the input pixel stream from 64 bits to 48 bits. Taking a 24×24 image as an example, the bit-width transformation of its input pixel stream is shown in Figure 2. The image occupies 24 × 24 × 3 = 1728 bytes, so there are 216 data transfers on the input AXIS bus, none of which contains empty bytes. As long as the product width × height is divisible by 8, every transfer on the input AXIS bus consists entirely of valid bytes.
[0020] After the bit width is converted to 48 bits (2 pixels in parallel), the number of transfers becomes 288. The input bit-width conversion can be implemented with a state machine that controls the ready signal of the input AXIS bus. Consistent with the 216-to-288 transfer counts above, every three 64-bit input words (192 bits) are converted into four 48-bit output words.
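The repacking can be modeled in software with a simple bit accumulator. The sketch below is illustrative only: the function name `repack_64_to_48` and the little-endian lane order (first byte of the stream in the least significant byte of each word) are assumptions, and the real module is a state machine driving the AXIS ready signal rather than a loop.

```python
def repack_64_to_48(words64):
    """Repack 64-bit bus words into 48-bit words (two RGB888 pixels each).

    Assumption: the first byte on the bus sits in the least significant
    byte of each word (little-endian lane order).
    """
    mask64 = (1 << 64) - 1
    mask48 = (1 << 48) - 1
    out = []
    acc = 0       # accumulator holding bits that have not been emitted yet
    acc_bits = 0  # number of valid bits currently held in acc
    for word in words64:
        acc |= (word & mask64) << acc_bits
        acc_bits += 64
        # Emit as many complete 48-bit words as the accumulator holds.
        while acc_bits >= 48:
            out.append(acc & mask48)
            acc >>= 48
            acc_bits -= 48
    return out


# A 24x24 RGB888 image: 216 input transfers become 288 output transfers.
print(len(repack_64_to_48([0] * 216)))
```

In this model the accumulator empties after every third input word, which matches the 216-to-288 (3:4) transfer ratio of the 24×24 example.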
[0021] The predictor processes 2 pixels at a time, applying the simple nonlinear prediction shown in Figure 3 to generate the predicted value for each pixel position.
[0022] The predicted value at position x is determined from the values of the surrounding pixels a, b, and c. When pixels a, b, and c do not exist, i.e., x is at the top-left corner of the image and the first pixel of the image is being encoded, the predicted value of x is {0, 0, 0}. When pixels b and c do not exist, x is in the first row of the image, and the predicted value of x is the value of pixel a. When a and c do not exist, x is in the first column of the image, and the predicted value of x is the value of pixel b. In the general case, if c is greater than or equal to the maximum of a and b, the predicted value of x is the minimum of a and b; if c is less than or equal to the minimum of a and b, the predicted value of x is the maximum of a and b; otherwise, the predicted value of x is a + b - c.
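These rules correspond to the median edge detector used in JPEG-LS, whose general case is a + b - c. A minimal software sketch follows, assuming that a is the left neighbor, b the upper neighbor, and c the upper-left neighbor (consistent with the first-row and first-column rules), and that the rule is applied independently to each 8-bit color channel. The function names and the list-of-rows image layout are hypothetical.

```python
def med_predict_channel(a, b, c):
    """Median edge detector for one 8-bit channel."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    # Here min(a, b) < c < max(a, b), so the result stays within 0..255.
    return a + b - c


def predict_pixel(img, row, col):
    """Predict the (r, g, b) pixel at img[row][col].

    Assumed layout: img is a list of rows, each row a list of (r, g, b)
    tuples; a = left, b = above, c = upper-left neighbor.
    """
    if row == 0 and col == 0:
        return (0, 0, 0)          # first pixel of the image
    if row == 0:
        return img[0][col - 1]    # first row: use a
    if col == 0:
        return img[row - 1][0]    # first column: use b
    a = img[row][col - 1]
    b = img[row - 1][col]
    c = img[row - 1][col - 1]
    return tuple(med_predict_channel(a[k], b[k], c[k]) for k in range(3))
```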
[0023] The predictor is a fully pipelined data stream processing unit with a total of 3 pipeline stages. It outputs the original pixel values and the predicted values of 2 pixels, totaling 96 bits. To avoid a pair of pixels being split across 2 rows, the width of the input image is restricted to an even number, which simplifies the design.
[0024] The enhanced QOI encoding unit performs enhanced QOI encoding based on pixels and their predicted values to generate encoding groups. The enhanced QOI encoding unit is also a fully pipelined data stream processing unit with a total of 8 pipeline stages. It encodes 2 pixels at a time, with the first 6 stages completing the encoding and the last 2 stages shaping the encoding groups.
[0025] The enhanced QOI encoding strategy provides seven encoding strategies, listed in descending order of priority: QOI_RUN, QOI_INDEX, QOI_DIFF1, QOI_DIFF2, QOI_LUMA, QOI_DIFF3, and QOI_RGB. The corresponding encoding byte lengths also increase sequentially, with QOI_RUN, QOI_INDEX, and QOI_DIFF1 being the shortest, requiring only 1 byte, while QOI_RGB is the longest, requiring 4 bytes.
[0026] During encoding, all six encoding strategies other than QOI_RGB are evaluated. Because the enhanced QOI encoding unit encodes two pixels in parallel, the dependency between the two pixels must also be taken into account. The evaluation is distributed across the first five stages of the pipeline. Each of the two pixels being encoded has a 6-bit encoding flag; when a pixel satisfies the condition of an encoding strategy, the corresponding flag bit is set. In the sixth pipeline stage, the final encoding is selected according to the priority of the encoding flags (i.e., the priority of the encoding strategies), generating a 9-byte encoding group.
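The sixth-stage selection amounts to a priority encoder over the 6-bit flag. The sketch below illustrates only that selection step. The bit assignment (bit 0 = QOI_RUN up to bit 5 = QOI_DIFF3) and the name `select_strategy` are assumptions, and the conditions that set each flag are not modeled because they are defined by the enhanced QOI encoding strategy itself.

```python
# Strategies in descending priority; QOI_RGB is the fallback with no flag bit.
PRIORITY = ["QOI_RUN", "QOI_INDEX", "QOI_DIFF1",
            "QOI_DIFF2", "QOI_LUMA", "QOI_DIFF3"]


def select_strategy(flags):
    """Return the highest-priority strategy whose flag bit is set.

    Assumed encoding: bit i of `flags` corresponds to PRIORITY[i].
    """
    for i, name in enumerate(PRIORITY):
        if (flags >> i) & 1:
            return name
    return "QOI_RGB"  # none of the six shorter encodings applies


print(select_strategy(0b000110))  # QOI_INDEX outranks QOI_DIFF1
```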
[0027] The composition of a 9-byte encoding group is shown in Figure 4. It consists of the encodings of the two pixels plus a preceding run encoding. The encoding group may contain empty bytes, so it carries a 9-bit keep signal in which a 0 indicates that the corresponding byte of the encoding group is empty. Figure 5(a) and Figure 5(b) give two possible cases of the 9-byte encoding group.
[0028] The 7th and 8th stages of the enhanced QOI coding unit pipeline perform the preliminary shaping of the encoding group. The shaping of the encoding group in Figure 5(a) is shown in Figure 6, and the encoding group in Figure 5(b) is likewise shaped, so that the final 9-byte encoding group data is right-aligned. In addition, the encoding group data carries a 4-bit user signal that indicates the effective length of the encoding group.
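The preliminary shaping can be pictured as squeezing out the empty bytes while preserving byte order. The sketch below is a behavioral model under stated assumptions: "right-aligned" is taken to mean that the valid bytes occupy the highest list indices, the returned length stands in for the 4-bit user signal, and `shape_group` is a hypothetical name. The hardware performs this in two pipeline stages rather than with a list comprehension.

```python
def shape_group(group, keep):
    """Right-align the valid bytes of a 9-byte encoding group.

    group: list of 9 byte values; keep: list of 9 booleans (False = empty).
    Returns (shaped, length), where the valid bytes keep their original
    order and sit at the end of the 9-byte list.
    """
    valid = [byte for byte, k in zip(group, keep) if k]
    length = len(valid)                  # 0..9, fits the 4-bit user signal
    shaped = [0] * (9 - length) + valid  # empty positions padded with 0
    return shaped, length


shaped, length = shape_group(
    [0xAA, 0, 0xBB, 0, 0, 0xCC, 0, 0, 0],
    [True, False, True, False, False, True, False, False, False],
)
print(shaped, length)
```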
[0029] The compact bit-width transformation converts the 72-bit code group output to 64 bits (the data width commonly used by DMA, Direct Memory Access, engines), in preparation for the subsequent keep signal normalization. The transformation only takes effect for code groups that contain a preceding RUN code and in which both pixels are encoded as QOI_RGB. In that case the code group has 9 effective bytes (1 + 4 + 4), and the compact bit-width transformation module splits it into two 8-byte blocks that are output over two clock cycles. Code groups of any other length have at most 8 effective bytes and are simply truncated to a single 8-byte block for transmission. Code groups with 9 effective bytes are relatively rare and constitute a special case. Handling them in this simple way simplifies the hardware design, and the resulting throughput loss is almost negligible.
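A behavioral sketch of this step is given below. The text does not specify how the 9 bytes are distributed over the two output blocks, so the 8 + 1 split, the right-aligned placement of the last byte, and the name `compact_to_64` are all assumptions made for illustration.

```python
def compact_to_64(shaped, length):
    """Convert a right-aligned 9-byte code group into 8-byte beats.

    shaped: list of 9 byte values with the valid bytes at the end.
    length: number of valid bytes (0..9).
    Returns a list of (data, keep) beats, each 8 entries wide.
    """
    if length == 9:
        # Special case: two beats over two clock cycles (assumed 8 + 1 split).
        first = (shaped[:8], [True] * 8)
        second = ([0] * 7 + [shaped[8]], [False] * 7 + [True])
        return [first, second]
    # At most 8 valid bytes: the leading byte is empty and can be dropped.
    data = shaped[1:]
    keep = [False] * (8 - length) + [True] * length
    return [(data, keep)]


beats = compact_to_64([0, 0, 0, 0, 0, 0, 7, 8, 9], 3)
print(len(beats), beats[0])
```

Only the 9-byte case costs an extra cycle, which is why an upstream buffer is needed to absorb the resulting stall.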
[0030] Both the JPEG-LS predictor and the enhanced QOI coding unit use pipelined processing and a common data interface (a data signal and a valid signal). This is reflected in Figure 1, where the small rectangles represent AXIS bus interfaces. When the compact bit-width transformation processes a 9-byte code group, it stalls its input, which would affect the upstream prediction and encoding pipelines. A code group FIFO is therefore placed before the compact bit-width transformation as a buffer. While the compact bit-width transformation module is stalled, the FIFO buffers the code group data. When the amount of buffered data reaches a set threshold, the FIFO deasserts s_ready; the data input to the JPEG-LS predictor is then blocked and the pipeline pauses. The threshold is chosen so that the code group FIFO cannot overflow while buffering code group data.
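The threshold mechanism can be illustrated with a small software model. The sketch assumes that, because the predictor and coding unit expose only data and valid signals, code groups already inside their 3 + 8 = 11 pipeline stages still arrive after s_ready is deasserted, so the threshold must leave at least that much headroom. The class name, the depth of 32, and the in-flight count of 11 are illustrative assumptions, not values stated in the text.

```python
from collections import deque


class CodeGroupFifo:
    """Behavioral model of a FIFO with threshold-based backpressure."""

    def __init__(self, depth, threshold):
        assert threshold <= depth
        self.depth = depth
        self.threshold = threshold
        self.queue = deque()

    @property
    def s_ready(self):
        # Upstream input is accepted only while occupancy is below threshold.
        return len(self.queue) < self.threshold

    def push(self, group):
        if len(self.queue) >= self.depth:
            raise OverflowError("code group FIFO overflow")
        self.queue.append(group)

    def pop(self):
        return self.queue.popleft() if self.queue else None


# Assumed sizing: depth 32 and 11 groups in flight give a threshold of 21.
IN_FLIGHT = 11
fifo = CodeGroupFifo(depth=32, threshold=32 - IN_FLIGHT)
```

With this sizing, the groups that drain out of the pipeline after s_ready falls still fit in the remaining 11 entries.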
[0031] After the initial shaping and compact bit-width transformation of the enhanced QOI coding unit, the AXIS data stream still has a large number of empty bytes. Most DMAs do not allow empty bytes in the AXIS data stream, so the keep signal normalization module is needed to integrate the sparse coded data and remove empty bytes.
[0032] The keep signal normalization module uses a 16-byte ring register buffer, divided into two 8-byte fields, to concatenate the sparse encoded data, as shown in Figure 7. The write field pointer points to the field in which the write byte pointer is currently located, and the read field pointer points to the other field. Each time 8 bytes of data are input, the valid bytes are written into the ring register in order and the write byte pointer is updated. If the updated write byte pointer crosses a field boundary, the read and write field pointers are swapped, and the field now indicated by the read field pointer (8 bytes of data in total) can be read out.
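The two-field ring buffer can be modeled as follows. This is a behavioral sketch under assumptions: the class and method names are hypothetical, the field pointers are derived from the write byte pointer instead of being stored separately, and flushing a final partial field at the end of the stream is omitted because the text does not describe it.

```python
class KeepNormalizer:
    """Model of the 16-byte ring register with two 8-byte fields."""

    def __init__(self):
        self.buf = [0] * 16
        self.wr = 0  # write byte pointer, 0..15

    def push(self, data, keep):
        """Write the valid bytes of one 8-byte beat.

        Returns a completed 8-byte field when the write byte pointer
        crosses a field boundary, otherwise None.
        """
        write_field = self.wr // 8  # field being filled before this beat
        for byte, k in zip(data, keep):
            if k:
                self.buf[self.wr] = byte
                self.wr = (self.wr + 1) % 16
        if self.wr // 8 != write_field:
            # Boundary crossed: the old write field is full and becomes
            # the read field.
            base = write_field * 8
            return self.buf[base:base + 8]
        return None


norm = KeepNormalizer()
print(norm.push([1, 2, 3, 0, 0, 0, 0, 0], [True] * 3 + [False] * 5))
print(norm.push([4, 5, 6, 7, 8, 9, 0, 0], [True] * 6 + [False] * 2))
```

Because a beat carries at most 8 valid bytes, the write byte pointer can cross at most one field boundary per beat, so at most one full field becomes readable per input.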
[0033] Logic synthesis was performed for the design. The target FPGA device is the Xilinx xc7z020clg400-2, the clock period is set to 6 ns (i.e., a clock frequency of approximately 167 MHz), and the clock uncertainty is set to 10% of the clock period. The worst-path setup slack is 0.549 ns, so the encoder meets the target clock frequency of 167 MHz.
[0034] In terms of resource consumption, this design used a total of 439 Slices (configurable logic units inside the FPGA, consisting of lookup tables (LUTs), flip-flops (FFs), multiplexers (Muxes), and carry chains) and 2.5 BRAMs (36kb).
[0035] The design in the paper "High-throughput architecture for both lossless and near-lossless compression modes of LOCO-I algorithm" is implemented on an FPGA with a parallelism of 2. The FPGA used in that design is a Xilinx Virtex6-75t, and its resource consumption is 8.3K slices and 7 BRAMs (36kb). In comparison, the enhanced QOI encoder proposed in this invention has an advantage in resource consumption.
Claims
1. An enhanced QOI encoder based on FPGA implementation, characterized in that: the encoder converts the RGB888 image pixel stream into an enhanced QOI encoded data stream. Both the RGB888 image pixel stream and the enhanced QOI encoded data stream follow the AXI Stream bus interface protocol with a bus data bit width of 64 bits. The encoder consists of an input bit width conversion unit, a JPEG-LS predictor, an enhanced QOI encoding unit, an encoding group FIFO unit, a compact bit width conversion unit, and a keep signal specification module. The input bit width conversion unit converts the data bit width of the input RGB888 image pixel stream from 64 bits to 48 bits. The JPEG-LS predictor processes two pixels at a time, performs non-linear prediction, and generates a predicted value for the corresponding position. The JPEG-LS predictor is a fully pipelined data stream processing unit with three pipeline stages, outputting the original pixel values and the predicted values of two pixels, totaling 96 bits. The enhanced QOI encoding unit performs enhanced QOI encoding based on the original pixel value and its predicted value to generate an encoding group. The enhanced QOI encoding unit is a fully pipelined data stream processing unit with 8 pipeline stages. It encodes 2 pixels at a time, with the first 6 stages completing the encoding and the last 2 stages shaping the encoding group. The encoding group FIFO unit buffers the data of the encoding group. When the accumulated data reaches a set threshold, it blocks the data input to the JPEG-LS predictor, thereby pausing the pipeline and ensuring that the encoding group FIFO unit does not overflow when buffering the encoding group.
The compact bit-width conversion unit converts the 72-bit encoding group output to 64 bits, preparing for the subsequent keep signal specification module; The keep signal specification module uses a 16-byte ring register buffer divided into two fields to splice sparse encoded data, generating an encoded AXIS data stream without empty bytes. When the updated write byte pointer crosses the field boundary, the read and write field pointers are swapped. At this time, the field pointed to by the read field pointer can be read. The field pointed to by the read field pointer contains a total of 8 bytes of data.