Image encoding device, image decoding device, control method for image encoding device, and control method for image decoding device
The image encoding device addresses the limitation of VVC by enabling quantization matrix usage for inter-predicted blocks, improving encoding efficiency and image quality through frame-based prediction.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- CANON KK
- Filing Date
- 2025-12-18
- Publication Date
- 2026-07-02
AI Technical Summary
The existing Versatile Video Coding (VVC) encoding method cannot perform quantization control using a quantization matrix on transformation coefficients obtained by inter-LFNST, which limits the achievable subjective image quality.
An image encoding device enables quantization using a quantization matrix for transformation coefficients derived from predicted images generated from different frames. It has a first and a second conversion means, and applies or omits the quantization matrix depending on which conversion means generated the coefficients.
Facilitates improved quantization control for inter-predicted blocks, enhancing encoding efficiency and image quality.
Description
[0001] The present disclosure relates to encoding / decoding technology.
[0002] As an encoding method for compression recording of moving images, a Versatile Video Coding (VVC) encoding method (hereinafter referred to as VVC) is known. In VVC, in order to improve encoding efficiency, a basic block of up to 128x128 pixels is divided into sub-blocks not only in the shape of a conventional square but also in the shape of a rectangle.
[0003] In VVC, a process called quantization matrix processing is used, in which coefficients (orthogonal transform coefficients) after orthogonal transformation are weighted according to frequency components. By further reducing the data of high-frequency components, whose degradation is less noticeable to human vision, it is possible to improve compression efficiency while maintaining image quality. Patent Document 1 discloses a technique for encoding such a quantization matrix.
[0004] In recent years, the Joint Video Experts Team (JVET) that standardized VVC has been conducting technical studies to achieve higher compression efficiency than VVC. In order to improve encoding efficiency, a new technique (hereinafter referred to as inter-LFNST) is being studied that extends the low-frequency non-separable transform (LFNST), which VVC applies only to intra-predicted blocks, to inter-predicted blocks.
[0005] Japanese Unexamined Patent Application Publication No. 2013-38758
[0006] The quantization matrix in VVC is premised on LFNST being applied only to intra-predicted blocks and cannot handle inter-LFNST. Therefore, there is a problem that quantization control using a quantization matrix cannot be performed on the transform coefficients to which inter-LFNST is applied, and the subjective image quality cannot be improved.
[0007] This disclosure provides a technique for enabling quantization using a quantization matrix even for transformation coefficients obtained by transforming the difference between a predicted image generated based on a frame that is temporally different from the frame to which the block to be encoded belongs, and the block to be encoded.
[0008] One aspect of the present disclosure is an image encoding device for encoding image data in block units, comprising: prediction means for generating a predicted image using pixels of a picture different from the picture to which a block of interest of a predetermined size to be encoded in the image belongs, and generating a prediction error which is the difference between the block of interest and the predicted image; conversion means for frequency-converting the prediction error obtained by the prediction means to generate conversion coefficients; quantization means for quantizing the conversion coefficients obtained by the conversion means to generate quantized conversion coefficients; and encoding means for entropy-encoding the quantized conversion coefficients obtained by the quantization means, wherein the conversion means further comprises a first conversion means and a second conversion means, and the quantization means is characterized in that it quantizes the conversion coefficients generated using only the second conversion means using a quantization matrix to generate quantized conversion coefficients, and quantizes the conversion coefficients generated using both the first conversion means and the second conversion means without using a quantization matrix to derive the quantized conversion coefficients.
[0009] According to this disclosure, quantization using a quantization matrix is also possible for transformation coefficients obtained by transforming the difference between a predicted image generated based on a frame that is temporally different from the frame to which the block to be encoded belongs, and the block to be encoded.
[0010] Other features and advantages of the technical ideas derived from this disclosure will become apparent from the following description with reference to the attached drawings. In the attached drawings, the same or similar components are given the same reference numeral.
[0011] The attached drawings are included in the specification and constitute part thereof, illustrating embodiments of this disclosure, and are used together with their descriptions to explain the technical ideas derived from this disclosure.
Block diagram showing an example of the functional configuration of an image encoding device.
Block diagram showing an example of the functional configuration of an image decoding device.
Flowchart of the processing performed by the image encoding device.
Flowchart of the processing performed by the image decoding device.
Block diagram showing an example of the hardware configuration of a computer device.
Diagram showing an example of the configuration of a bitstream.
Diagram showing an example of a subblock partitioning pattern.
Diagram showing an example of a subblock partitioning pattern.
Diagram showing an example of a subblock partitioning pattern.
Diagram showing an example of a subblock partitioning pattern.
Diagram showing an example of a subblock partitioning pattern.
Diagram showing an example of a quantization matrix configuration.
Diagram showing an example of a quantization matrix configuration.
Diagram showing an example of a scan order.
Diagram showing an example of a one-dimensional difference matrix configuration.
Diagram showing an example of a one-dimensional difference matrix configuration.
Diagram showing an example of a one-dimensional difference matrix configuration.
Diagram showing an example of a coding table configuration.
Diagram showing an example of a coding table configuration.
Diagram illustrating an example of template fit prediction.
[0012] The embodiments will be described in detail below with reference to the attached drawings. Note that the following embodiments do not limit the scope of the claims. While the embodiments describe multiple features, not all of these features are necessary, and the features may be combined in any way. Furthermore, in the attached drawings, identical or similar configurations are given the same reference numerals, and redundant descriptions are omitted.
[0013] [First Embodiment] First, an example of the functional configuration of an image encoding device, which is an example of an encoding device according to this embodiment, will be explained using the block diagram in Figure 1. The block division unit 102 acquires the input frame (input image, image data). The method of acquiring frames by the block division unit 102 is not limited to a specific acquisition method. For example, the block division unit 102 may acquire each frame in a moving image (for example, 30 frames / second) captured by an imaging device connected to the image encoding device. Alternatively, for example, the block division unit 102 may acquire each frame in a moving image stored in an external device such as a server device or an external memory device connected to the image encoding device. Alternatively, for example, the block division unit 102 may acquire each frame in a moving image stored in the memory of the image encoding device. The block division unit 102 then divides the acquired frame into a plurality of basic blocks.
[0014] The holding unit 103 stores multiple quantization matrices used for quantization processing. The method for obtaining the quantization matrices is not limited to a specific method; for example, the quantization matrices may be input to the image encoding device in response to user operation, or the image encoding device may calculate the quantization matrices from the characteristics of the frames. Also, for example, pre-specified initial values may be used as the quantization matrices held by the holding unit 103.
[0015] In this embodiment, as an example, a two-dimensional quantization matrix corresponding to an orthogonal transformation (frequency transformation) of 8x8 pixels, as shown in Figures 8A to 8C, is generated and held in the holding unit 103.
[0016] The control unit 114 acquires information indicating whether or not to apply the quantization matrix in the quantization and inverse quantization processes performed in the subsequent conversion / quantization unit 105 and inverse quantization / inverse conversion unit 106, respectively, as "quantization matrix control information". The method of acquiring the quantization matrix control information by the control unit 114 is not limited to a specific method. For example, the control unit 114 may acquire the quantization matrix control information input to the image encoding device in response to user operation, or the image encoding device may calculate the quantization matrix control information from the characteristics of the frame. Alternatively, for example, the control unit 114 may acquire a value predetermined as an initial value as the quantization matrix control information.
[0017] The prediction unit 104 divides the basic block into multiple subblocks (subblock division), and performs prediction processing (intra-prediction, which is intra-frame prediction; inter-prediction, which is inter-frame prediction; template fitting prediction, etc.) on a subblock basis to generate a predicted image for the subblock. The prediction unit 104 then calculates the prediction error from the subblock and the predicted image generated for the subblock. The prediction unit 104 also outputs information necessary for prediction (for example, information such as subblock division, prediction mode, motion vector, etc.) as prediction information.
[0018] The conversion / quantization unit 105 performs an orthogonal transformation (frequency transformation) on the prediction error of each subblock generated by the prediction unit 104 to generate the conversion coefficients of the subblock. The conversion / quantization unit 105 also generates, for each subblock, low-frequency conversion information indicating whether or not to perform low-frequency conversion on the low-frequency component of the subblock's conversion coefficients. For each subblock whose low-frequency conversion information indicates "perform low-frequency conversion on the low-frequency component of the subblock's conversion coefficients," the conversion / quantization unit 105 then performs low-frequency conversion on the low-frequency component of that subblock's conversion coefficients. Finally, the conversion / quantization unit 105 quantizes the conversion coefficients of each subblock to generate the quantized coefficients (quantized conversion coefficients) of the subblock.
[0019] The inverse quantization / inverse conversion unit 106 performs the reverse operation of the conversion / quantization unit 105 to regenerate the prediction error. In other words, the inverse quantization / inverse conversion unit 106 inversely quantizes the quantized coefficients generated by the conversion / quantization unit 105 to regenerate the conversion coefficients. Then, among the regenerated conversion coefficients, the inverse quantization / inverse conversion unit 106 performs an inverse low-frequency conversion on the conversion coefficients of each subblock whose low-frequency conversion information indicates "perform low-frequency conversion on the low-frequency component of the subblock's conversion coefficients." Finally, the inverse quantization / inverse conversion unit 106 performs an inverse orthogonal transformation on the conversion coefficients of each subblock to regenerate the prediction error.
[0020] The image playback unit 107 generates a predicted image by appropriately referring to the frame memory 108 based on the prediction information output from the prediction unit 104. Then, the image playback unit 107 generates a reconstructed image from the generated predicted image and the prediction error regenerated by the inverse quantization / inverse conversion unit 106, and stores the generated reconstructed image in the frame memory 108.
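As a non-limiting sketch of this reconstruction step, the following Python fragment adds the regenerated prediction error to the predicted image sample by sample. The function name `reconstruct_block` and the clipping to an 8-bit sample range are assumptions made for illustration; the disclosure does not specify a bit depth.

```python
def reconstruct_block(predicted, prediction_error, max_value=255):
    """Add the regenerated prediction error to the predicted image.

    Clipping to [0, max_value] assumes 8-bit samples (an assumption,
    not something stated in the disclosure).
    """
    return [
        [min(max(p + e, 0), max_value) for p, e in zip(p_row, e_row)]
        for p_row, e_row in zip(predicted, prediction_error)
    ]


# Hypothetical 2x2 example values.
predicted = [[100, 250], [3, 128]]
prediction_error = [[5, 10], [-7, 0]]
reconstructed = reconstruct_block(predicted, prediction_error)
print(reconstructed)  # [[105, 255], [0, 128]]
```

Because the same reconstruction is performed on the decoding side, the encoder and decoder refer to identical reconstructed images in the frame memory.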
[0021] The in-loop filter unit 109 performs in-loop filtering, such as deblocking filtering and sample adaptive offset, on the reconstructed image stored in the frame memory 108, and then stores the filtered reconstructed image back into the frame memory 108.
[0022] The encoding unit 110 entropy-encodes the quantized coefficients and low-frequency conversion information generated by the conversion / quantization unit 105 and the prediction information output from the prediction unit 104 to generate coded data of the image. The encoding unit 113 encodes the quantization matrices held in the holding unit 103 to generate coded data of the quantization matrices.
[0023] The integrated encoding unit 111 generates header code data using the code data of the quantization matrix generated by the encoding unit 113, the quantization matrix control information generated by the control unit 114, and so on. Furthermore, the integrated encoding unit 111 forms a bitstream by combining the generated header code data with the code data of the image generated by the encoding unit 110, and outputs the formed bitstream.
[0024] The control unit 150 controls the operation of the entire image encoding device. For example, the control unit 150 controls the operation of each of the above-mentioned functional units in the image encoding device. As a result, each of the above-mentioned functional units in the image encoding device operates under the control of the control unit 150.
[0025] Next, the encoding operation of frames by the image encoding device will be explained in more detail. Below, we will describe the case in which the holding unit 103 generates and holds the quantization matrix.
[0026] As described above, the control unit 114 acquires quantization matrix control information for each subblock. In the following explanation, as an example, a case is described in which the quantization matrix control information for a subblock that performs low-frequency conversion is set to a value of "1" indicating that "a quantization matrix is not used for quantization," and the quantization matrix control information for a subblock that does not perform low-frequency conversion is set to a value of "0" indicating that "a quantization matrix is used for quantization." The method of setting such values for quantization matrix control information is not limited to a specific method.
[0027] The quantization matrix is generated according to the size of the subblock to be encoded and the type of prediction method. In this embodiment, the holding unit 103 generates a quantization matrix of 8x8 pixels, corresponding to a basic block of a predetermined size of 8x8 pixels, as shown in Figure 7A. However, the size of the generated quantization matrix is not limited to 8x8 pixels; quantization matrices corresponding to the shape of the subblock, such as 4x8 pixels, 8x4 pixels, or 4x4 pixels, may also be generated.
[0028] The method for determining the values of each element in a quantization matrix is not limited to a specific method. For example, a predetermined value may be used for each element in the quantization matrix, a different value may be used for each element, or a value may be used that is appropriate to the characteristics of the frame.
[0029] The holding unit 103 holds the quantization matrix generated in this manner. In this embodiment, we will describe the case in which the holding unit 103 generates and holds the quantization matrices shown in Figures 8A, 8B, and 8C, respectively.
[0030] The quantization matrix shown in Figure 8A is an example of a quantization matrix used for quantizing transformation coefficients generated based on prediction errors obtained in intra-prediction. The quantization matrix shown in Figure 8B is an example of a quantization matrix used for quantizing transformation coefficients generated based on prediction errors obtained in inter-prediction. The quantization matrix shown in Figure 8C is an example of a quantization matrix used for quantizing transformation coefficients generated based on prediction errors obtained in template-fit prediction.
[0031] As shown in Figures 8A, 8B, and 8C, the quantization matrix has 8x8 elements (quantization step values). In this embodiment, the quantization matrix held by the holding unit 103 is assumed to be a two-dimensional array of quantization step values, as shown in Figures 8A, 8B, and 8C, but the form of holding the quantization step values in the quantization matrix is not limited to a specific form.
[0032] Furthermore, the holding unit 103 can also hold multiple quantization matrices for the same prediction method, depending on the size of the subblock or whether the encoding target is a luminance block or a chrominance block. Generally, in order to realize quantization processing according to human visual characteristics, the quantization matrix has a small quantization step value for the DC component corresponding to the upper left corner of the quantization matrix, as shown in Figures 8A, 8B, and 8C, and a large quantization step value for the AC component corresponding to the lower right corner.
[0033] The encoding unit 113 reads the quantization matrices shown in Figures 8A, 8B, and 8C from the holding unit 103, scans each element in the read quantization matrices to calculate the difference between elements, and generates a one-dimensional array by arranging the calculated differences.
[0034] In this embodiment, each element in the quantization matrix is scanned according to the scanning order shown in Figure 9, and for each scanned element, the difference between the value of that element (quantization step value) and the value of the element immediately preceding it in the scanning order (quantization step value) is calculated.
[0035] For example, when each element in the quantization matrix in Figure 8C is scanned according to the scanning order shown in Figure 9, the first element located in the upper left corner (quantization step value "8") is scanned, followed by the element located directly below it (quantization step value "11"). The difference calculated is "3," which is the difference between the value of the former element "8" and the value of the latter element "11." Note that for the first element of the quantization matrix in the scanning order, the difference is calculated from a predetermined initial value (for example, "8"), but it is not limited to this; the difference can also be calculated from an arbitrary value or from the value of the first element itself.
[0036] In this way, the encoding unit 113 scans each element in the quantization matrix of Figure 8A according to the scanning order of Figure 9, calculates the differences between the elements, and arranges them to generate a one-dimensional array as the one-dimensional difference matrix shown in Figure 10A.
[0037] Similarly, the encoding unit 113 scans each element in the quantization matrix of Figure 8B according to the scanning order of Figure 9, calculates the differences between the elements, and arranges them to generate a one-dimensional array as the one-dimensional difference matrix shown in Figure 10B.
[0038] Similarly, the encoding unit 113 scans each element in the quantization matrix of Figure 8C according to the scanning order of Figure 9, calculates the difference between the elements, and arranges them to generate a one-dimensional array as the one-dimensional difference matrix shown in Figure 10C.
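The scan-and-difference procedure of paragraphs [0034] to [0038] can be sketched as follows. This is only an illustration under stated assumptions: Figure 9 is not reproduced here, so `diagonal_scan` is a hypothetical scanning order chosen to agree with the description that the upper-left element is scanned first and the element directly below it second; the 4x4 matrix values and the function names are likewise made up for the example.

```python
def diagonal_scan(size):
    """Assumed scan order: start at the upper-left element, then walk each
    anti-diagonal from its lower-left end to its upper-right end."""
    order = []
    for d in range(2 * size - 1):
        for row in range(min(d, size - 1), max(0, d - size + 1) - 1, -1):
            order.append((row, d - row))
    return order


def to_difference_array(matrix, initial=8):
    """Scan the matrix and store each element minus the previously scanned
    element; the first element is differenced against `initial`."""
    previous = initial
    differences = []
    for row, col in diagonal_scan(len(matrix)):
        differences.append(matrix[row][col] - previous)
        previous = matrix[row][col]
    return differences


def from_difference_array(differences, size, initial=8):
    """Inverse operation: accumulate the differences along the same scan."""
    matrix = [[0] * size for _ in range(size)]
    value = initial
    for (row, col), diff in zip(diagonal_scan(size), differences):
        value += diff
        matrix[row][col] = value
    return matrix


# Hypothetical 4x4 quantization matrix (not taken from Figures 8A to 8C).
example_matrix = [[8 + 3 * (r + c) for c in range(4)] for r in range(4)]
differences = to_difference_array(example_matrix)
print(differences[:3])  # [0, 3, 0]
```

Because neighbouring quantization step values tend to be close, the differences cluster around small values, which is what makes the subsequent table-based encoding compact.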
[0039] Then, the encoding unit 113 encodes the one-dimensional difference matrices of Figures 10A, 10B, and 10C generated for each of the quantization matrices of Figures 8A, 8B, and 8C to generate encoded data of the quantization matrix. In the present embodiment, the encoding unit 113 encodes the difference matrix using the encoding table shown in Figure 11A, but the encoding table used for encoding the difference matrix is not limited to the encoding table shown in Figure 11A. For example, the encoding unit 113 may encode the difference matrix using the encoding table shown in Figure 11B.
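The encoding tables of Figures 11A and 11B are not reproduced here. Purely as a stand-in, the following sketch encodes a one-dimensional difference array with signed Exp-Golomb codes, a common variable-length code that assigns shorter codewords to values near zero. The choice of code and the function names are assumptions for illustration and should not be read as the tables of this embodiment.

```python
def encode_differences(differences):
    """Encode signed integers as a bit string using signed Exp-Golomb codes
    (a substitute for the unspecified encoding table)."""
    bits = []
    for value in differences:
        # Map signed values to non-negative code numbers: 0, 1, -1, 2, -2, ...
        code_num = 2 * value - 1 if value > 0 else -2 * value
        binary = bin(code_num + 1)[2:]
        bits.append("0" * (len(binary) - 1) + binary)
    return "".join(bits)


def decode_differences(bits, count):
    """Decode `count` signed Exp-Golomb codewords from a bit string."""
    values, pos = [], 0
    for _ in range(count):
        zeros = 0
        while bits[pos] == "0":
            zeros += 1
            pos += 1
        code_num = int(bits[pos:pos + zeros + 1], 2) - 1
        pos += zeros + 1
        values.append((code_num + 1) // 2 if code_num % 2 else -(code_num // 2))
    return values


encoded = encode_differences([0, 3, 0, -1])
print(encoded)  # 1001101011
```

A difference of 0 costs a single bit under this code, so a smoothly varying quantization matrix yields a short encoded representation.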
[0040] The integrated encoding unit 111 encodes the header information necessary for encoding the frame, including quantization matrix control information, and integrates the encoded data of the quantization matrix generated by the encoding unit 113 into the encoded header information.
[0041] The block division unit 102 divides the input frame (input image for one frame) into a plurality of basic blocks. In the present embodiment, as described above, the size of the basic block is 8x8 pixels.
[0042] The prediction unit 104 determines a sub-block division method, which is a method of dividing a basic block into a plurality of sub-blocks (sub-block division), and determines a prediction process (prediction mode) for generating a predicted image of the sub-blocks.
[0043] Figures 7A to 7F are diagrams showing an example of a sub-block division pattern. The rectangle indicated by the thick outer frame in Figures 7A to 7F represents the basic block, and in the present embodiment, the basic block has a size of 8x8 pixels. Also, the rectangles inside the thick frame represent sub-blocks.
[0044] Figure 7A shows an example where the basic block = sub-block. Figure 7B represents an example of conventional square sub-block division, and a basic block having a size of 8x8 pixels is divided into four "sub-blocks having a size of 4x4 pixels".
[0045] Figures 7C to 7F show an example of rectangular sub-block division. In Figure 7C, a basic block having a size of 8x8 pixels is divided into two "vertically long sub-blocks having a size of 4x8 pixels". In Figure 7D, a basic block having a size of 8x8 pixels is divided into two "horizontally long sub-blocks having a size of 8x4 pixels". In Figure 7E, a basic block having a size of 8x8 pixels is divided into three sub-blocks: a "vertically long sub-block having a size of 2x8 pixels", a "vertically long sub-block having a size of 4x8 pixels", and a "vertically long sub-block having a size of 2x8 pixels". In Figure 7F, a basic block having a size of 8x8 pixels is divided into three sub-blocks: a "horizontally long sub-block having a size of 8x2 pixels", a "horizontally long sub-block having a size of 8x4 pixels", and a "horizontally long sub-block having a size of 8x2 pixels". Thus, not only squares but also rectangular sub-blocks are used for encoding processing.
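The division patterns of Figures 7A to 7F can be sketched as a small helper that returns the sub-block rectangles of a basic block. The function name `split_block` and the pattern labels are hypothetical names introduced for this illustration.

```python
def split_block(width, height, pattern):
    """Return sub-blocks as (x, y, width, height) tuples for one pattern."""
    if pattern == "none":                 # Figure 7A: basic block = sub-block
        return [(0, 0, width, height)]
    if pattern == "quad":                 # Figure 7B: four square sub-blocks
        hw, hh = width // 2, height // 2
        return [(x, y, hw, hh) for y in (0, hh) for x in (0, hw)]
    if pattern == "binary_vertical":      # Figure 7C: two vertically long sub-blocks
        hw = width // 2
        return [(0, 0, hw, height), (hw, 0, hw, height)]
    if pattern == "binary_horizontal":    # Figure 7D: two horizontally long sub-blocks
        hh = height // 2
        return [(0, 0, width, hh), (0, hh, width, hh)]
    if pattern == "ternary_vertical":     # Figure 7E: widths in a 1:2:1 ratio
        q = width // 4
        return [(0, 0, q, height), (q, 0, 2 * q, height), (3 * q, 0, q, height)]
    if pattern == "ternary_horizontal":   # Figure 7F: heights in a 1:2:1 ratio
        q = height // 4
        return [(0, 0, width, q), (0, q, width, 2 * q), (0, 3 * q, width, q)]
    raise ValueError(f"unknown pattern: {pattern}")


sizes_7e = [(w, h) for _, _, w, h in split_block(8, 8, "ternary_vertical")]
print(sizes_7e)  # [(2, 8), (4, 8), (2, 8)]
```

For an 8x8 basic block, every pattern covers all 64 pixels exactly once, which is a convenient sanity check when further patterns are added.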
[0046] In this embodiment, the case is described in which, as shown in Figure 7A, the sub-block division method leaves the basic block having a size of 8x8 pixels undivided and uses it as the sub-block. However, a quadtree division as shown in Figure 7B, a ternary tree division as shown in Figures 7E and 7F, or a binary tree division as shown in Figures 7C and 7D may also be used. That is, the block to be encoded (encoding target block) may be a basic block or a sub-block obtained by dividing the basic block.
[0047] When other sub-block division methods than Figure 7A are also used, it is necessary to generate a quantization matrix corresponding to the sub-blocks to be used. Also, the generated quantization matrix will be encoded by the encoding unit 113.
[0048] The method for determining the sub-block division method is not limited to a specific determination method. For example, it may be determined according to a user operation or according to a predetermined criterion. Also, the sub-block division method may be predetermined.
[0049] Furthermore, the prediction unit 104 determines the prediction mode for the subblock. In this embodiment, for each subblock, one of the following prediction processes (prediction mode) is determined: intra prediction, inter prediction, or template-fit prediction.
[0050] In intra-prediction, a predicted image of the target block is generated using pre-encoded pixels located spatially around the target block, and an intra-prediction mode is also generated that indicates intra-prediction methods such as horizontal prediction, vertical prediction, and DC prediction.
[0051] In inter prediction, a frame different from (temporally different from) the frame to which the block to be encoded belongs is used as a reference frame. A predicted image of the block to be encoded is generated using the encoded pixels in the reference frame, and motion information such as the reference frame and motion vectors is also generated.
[0052] In template fitting prediction, a group of pixels adjacent to the block to be encoded is used as a template. A group of pixels deemed similar to this template is searched for from "the encoded pixel group in the frame to which the block to be encoded belongs," and a predicted image of the block to be encoded is generated based on the searched pixel group.
[0053] Here, an example of template fitting prediction is explained using the specific example shown in Figure 12, namely a method for generating a predicted image (P) 1203 corresponding to the block to be encoded (C) 1201 in frame 1200.
[0054] The prediction unit 104 uses a pixel group (N) 1202, which consists of a group of pixels adjacent to the upper side of the block to be encoded 1201 and a group of pixels adjacent to the left side of the block to be encoded 1201, as a template.
[0055] The prediction unit 104 then searches for a group of pixels similar to the template from the encoded pixel group in frame 1200. The method for searching for the "group of pixels similar to the template" is not limited to a specific search method. For example, template matching may be performed between the region of the encoded pixel group in frame 1200 and the template, and the group of pixels in the region of the encoded pixel group in frame 1200 with the highest similarity to the template may be defined as the "group of pixels most similar to the template". Figure 12 shows a case in which the pixel group (T') 1204 was found as the group of pixels most similar to the template from the encoded pixel group in frame 1200.
[0056] The prediction unit 104 then generates a predicted image 1203 composed of encoded pixels of frame 1200 based on the pixel group 1204. The example in Figure 12 shows a case where the pixel group 1204 and the image in the rectangular area adjacent to it in the lower right are generated as the predicted image 1203.
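The search described in paragraphs [0054] to [0056] can be sketched as follows. This is a minimal illustration under several assumptions that the disclosure leaves open: the template is one pixel thick, similarity is measured by the sum of absolute differences, and candidates are restricted to positions lying entirely in rows above the block to be encoded so that only already encoded pixels are referenced. The function names and the test frame are hypothetical.

```python
def template_of(frame, x, y, n):
    """Pixels adjacent to the upper side and to the left side of the
    n x n block whose upper-left corner is at (x, y)."""
    top = [frame[y - 1][x + i] for i in range(n)]
    left = [frame[y + j][x - 1] for j in range(n)]
    return top + left


def predict_by_template(frame, bx, by, n):
    """Return the n x n block whose template is most similar to the template
    of the block at (bx, by), or None if no candidate position exists."""
    target = template_of(frame, bx, by, n)
    best = None
    for cy in range(1, by - n + 1):              # candidate rows end above row `by`
        for cx in range(1, len(frame[0]) - n + 1):
            candidate = template_of(frame, cx, cy, n)
            cost = sum(abs(a - b) for a, b in zip(target, candidate))
            if best is None or cost < best[0]:
                best = (cost, cx, cy)
    if best is None:
        return None
    _, cx, cy = best
    return [row[cx:cx + n] for row in frame[cy:cy + n]]


# Hypothetical 12x12 frame whose content repeats every 4 rows.
frame = [[(y % 4) * 16 + x for x in range(12)] for y in range(12)]
predicted = predict_by_template(frame, bx=4, by=8, n=2)
print(predicted)  # [[4, 5], [20, 21]]
```

In this repeating test frame the best-matching template sits four rows above the target block, and the block next to it reproduces the target exactly, which mirrors the repeated-texture case where template fitting prediction is expected to be effective.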
[0057] Template fit prediction is considered a technique that improves encoding efficiency, particularly in artificial images such as computer screens where the same characters or textures repeatedly appear within the frames being encoded and decoded.
[0058] The prediction unit 104 divides the basic block into multiple subblocks according to the subblock division method determined as described above. In this embodiment, the basic block is not divided as described above, so the subblocks in the following description are the same as the basic block.
[0059] The prediction unit 104 then calculates the difference between each subblock and the predicted image generated by the prediction process of the prediction mode determined for that subblock as the prediction error for that subblock.
[0060] Furthermore, the prediction unit 104 outputs information necessary for prediction (such as the subblock partitioning method, prediction mode (information indicating which prediction mode was used: intra prediction, inter prediction, or template fitting prediction), motion vector, etc.) as prediction information.
[0061] The conversion and quantization unit 105 performs orthogonal transformation (frequency transformation) and quantization on the prediction error for each subblock generated by the prediction unit 104 to generate the quantization coefficient (conversion coefficient after quantization) for the subblock.
[0062] More specifically, the conversion / quantization unit 105 generates conversion coefficients by performing an orthogonal transformation corresponding to the size of the prediction error of each subblock. The conversion / quantization unit 105 also generates low-frequency conversion information for each subblock. For example, if the conversion coefficients are concentrated in the low-frequency component and further improvement in compression efficiency can be expected by applying low-frequency conversion to the conversion coefficients, the conversion / quantization unit 105 generates low-frequency conversion information (1) indicating "perform low-frequency conversion on the low-frequency component of the subblock's conversion coefficients" as the low-frequency conversion information for the conversion coefficients. On the other hand, if this is not the case, the conversion / quantization unit 105 generates low-frequency conversion information (0) indicating "do not perform low-frequency conversion on the low-frequency component of the subblock's conversion coefficients" as the low-frequency conversion information for the conversion coefficients.
[0063] For each subblock for which low-frequency conversion information (1) indicating "perform low-frequency conversion on the low-frequency component of the subblock's conversion coefficients" has been generated, the conversion / quantization unit 105 treats the conversion coefficients of that subblock as the target conversion coefficients, sets the high-frequency components of the target conversion coefficients to 0 regardless of their values, and performs low-frequency conversion on the low-frequency components of the target conversion coefficients.
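The two operations of paragraph [0063], zeroing the high-frequency components and converting the low-frequency components, can be sketched as follows. The disclosure does not give the extent of the low-frequency region or the conversion kernel, so the `region` parameter, the function name, and the small Hadamard-like kernel in the example are all placeholders chosen for illustration only.

```python
def low_frequency_conversion(coeffs, kernel, region=4):
    """Apply a non-separable conversion to the upper-left `region` x `region`
    coefficients and set every other coefficient to 0.

    `kernel` is a (region*region) x (region*region) matrix supplied by the
    caller; the real kernel is not specified in the text.
    """
    height, width = len(coeffs), len(coeffs[0])
    # Flatten the low-frequency region row by row into a vector.
    low = [coeffs[r][c] for r in range(region) for c in range(region)]
    converted = [sum(k * v for k, v in zip(k_row, low)) for k_row in kernel]
    # Everything outside the low-frequency region stays 0.
    out = [[0] * width for _ in range(height)]
    for index, value in enumerate(converted):
        out[index // region][index % region] = value
    return out


# Hypothetical 8x8 coefficients and a placeholder kernel for a 2x2 region.
coeffs = [[r * 8 + c + 1 for c in range(8)] for r in range(8)]
kernel = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]
result = low_frequency_conversion(coeffs, kernel, region=2)
print(result[0][:2], result[1][:2])  # [22, -2] [-16, 0]
```

Since all coefficients outside the converted region are zero, the subsequent quantization and entropy encoding only have to represent a small number of values for such a subblock.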
[0064] The conversion / quantization unit 105 then quantizes the conversion coefficients of each subblock. When the value of the quantization matrix control information for the subblock is "0", the conversion / quantization unit 105 selects, from among the quantization matrices held by the holding unit 103, the quantization matrix that corresponds to the prediction mode of the subblock, and quantizes the conversion coefficients using the selected quantization matrix to generate quantized coefficients (quantized conversion coefficients). On the other hand, when the value of the quantization matrix control information for the subblock is "1", the conversion / quantization unit 105 generates quantized coefficients by quantizing the conversion coefficients of the subblock without using the quantization matrices held by the holding unit 103. For example, the conversion / quantization unit 105 generates quantized coefficients by quantizing all conversion coefficients of the subblock using the same quantization scale.
[0065] In this embodiment, the conversion / quantization unit 105 selects the quantization matrix shown in Figure 8A for subblocks where the prediction mode is intra-prediction. The conversion / quantization unit 105 also selects the quantization matrix shown in Figure 8B for subblocks where the prediction mode is inter-prediction. Furthermore, the conversion / quantization unit 105 selects the quantization matrix shown in Figure 8C for subblocks where the prediction mode is template-fit prediction. However, the quantization matrices used are not limited to these.
[0066] On the other hand, for subblocks for which low-frequency conversion information (0) indicating "no low-frequency conversion will be performed on the low-frequency component of the subblock's conversion coefficients" has been generated, i.e., subblocks on which low-frequency conversion is not performed, the conversion / quantization unit 105 quantizes the conversion coefficients of the subblock to generate quantized coefficients. At that time, the conversion / quantization unit 105 selects, from among the quantization matrices held by the holding unit 103, the quantization matrix that corresponds to the prediction mode of the subblock, and quantizes the conversion coefficients using the selected quantization matrix to generate quantized coefficients (quantized conversion coefficients).
[0067] In other words, the transformation / quantization unit 105 generates "quantized transformation coefficients" by quantizing the transformation coefficients generated using only the second transformation using a quantization matrix, and derives "quantized transformation coefficients" by quantizing the transformation coefficients generated using both the first and second transformations without using a quantization matrix.
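The selection logic described in the preceding paragraphs can be sketched as follows. This is only an illustration under stated assumptions: the arithmetic (a matrix value of 16 meaning "no extra weighting", and the step size being the quantization scale multiplied by the matrix value divided by 16) is assumed and is not specified in this disclosure, and the matrix values are placeholders rather than the values of Figures 8A to 8C. All function and variable names are hypothetical.

```python
NEUTRAL = 16  # assumed matrix value that leaves the quantization scale unchanged


def _round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)


def quantize_subblock(coeffs, pred_mode, lf_applied, qm_control, matrices, scale):
    """Quantize one subblock.

    The matrix for pred_mode is used when low-frequency conversion was not
    applied, or when it was applied and qm_control is 0. Otherwise every
    coefficient is quantized with the same quantization scale.
    """
    use_matrix = (not lf_applied) or qm_control == 0
    quantized = []
    for r, line in enumerate(coeffs):
        out_line = []
        for c, value in enumerate(line):
            weight = matrices[pred_mode][r][c] if use_matrix else NEUTRAL
            step = scale * weight / NEUTRAL
            out_line.append(_round_half_away(value / step))
        quantized.append(out_line)
    return quantized


# Placeholder 2x2 matrix for inter-prediction (not the values of Figure 8B)
matrices = {"inter": [[16, 32], [32, 64]]}
coeffs = [[64, 64], [-64, 64]]

with_matrix = quantize_subblock(coeffs, "inter", True, 0, matrices, scale=4)
flat = quantize_subblock(coeffs, "inter", True, 1, matrices, scale=4)
```

With these placeholder values, the matrix-based path quantizes higher-frequency positions more coarsely, while the path without the matrix treats every position identically.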
[0068] The inverse quantization / inverse conversion unit 106 first determines whether or not low-frequency conversion has been applied to the subblock based on the low-frequency conversion information input from the conversion / quantization unit 105. If low-frequency conversion has been applied to the subblock, that is, if the low-frequency conversion information for the subblock is 1, the inverse quantization / inverse conversion unit 106 inversely quantizes the quantization coefficients based on the quantization matrix control information input from the control unit 114 and reconstructs the low-frequency conversion coefficients. Specifically, if the quantization matrix control information indicates 0, it performs inverse quantization processing using the quantization matrix stored in the holding unit 103. Similar to the conversion / quantization unit 105, this inverse quantization processing using the quantization matrix uses a quantization matrix corresponding to the prediction mode of the subblock. Specifically, the same quantization matrix used in the conversion / quantization unit 105 is used.
[0069] Conversely, if the quantization matrix control information indicates 1, inverse quantization processing is performed without using the quantization matrix. In this case, all quantization coefficients within the subblock are inversely quantized using the same quantization scale, and the low-frequency conversion coefficients are reconstructed. The inverse quantization / inverse conversion unit 106 then performs inverse low-frequency conversion processing on the reconstructed low-frequency conversion coefficients to reconstruct orthogonal conversion coefficients, and further performs inverse orthogonal conversion on the orthogonal conversion coefficients to reconstruct prediction error data.
[0070] Thus, the inverse quantization / inverse transformation unit 106 can inversely quantize both the quantization coefficients of subblocks whose quantization matrix control information has a value of "0" and the quantization coefficients of subblocks whose quantization matrix control information has a value of "1".
[0071] On the other hand, if low-frequency conversion has not been applied to the subblock, the inverse quantization / inverse conversion unit 106 inversely quantizes the quantization coefficients of the input subblock using the quantization matrix stored in the holding unit 103 to regenerate the conversion coefficients. The inverse quantization / inverse conversion unit 106 further performs an inverse orthogonal transformation on the regenerated conversion coefficients to regenerate the prediction error data. Similar to the conversion / quantization unit 105, a quantization matrix corresponding to the prediction mode of the block to be encoded is used for the inverse quantization process. Specifically, the same quantization matrix used in the conversion / quantization unit 105 is used. The inverse quantization / inverse conversion unit 106 then supplies the regenerated prediction error data to the image playback unit 107.
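The inverse quantization described above mirrors the choice made at quantization time. The following is a minimal sketch under the same kind of assumptions as any scaling-list scheme: a matrix value of 16 is assumed to be neutral, and the reconstruction is assumed to be the quantized level multiplied by the step size. Neither detail is specified in this disclosure, and all names and matrix values are hypothetical.

```python
NEUTRAL = 16  # assumed matrix value that leaves the quantization scale unchanged


def dequantize_subblock(levels, pred_mode, lf_applied, qm_control, matrices, scale):
    """Inversely quantize one subblock.

    The same decision as on the quantization side is made: the matrix for
    pred_mode is used unless low-frequency conversion was applied and
    qm_control is 1, in which case one quantization scale is used throughout.
    """
    use_matrix = (not lf_applied) or qm_control == 0
    return [
        [level * scale
         * (matrices[pred_mode][r][c] if use_matrix else NEUTRAL) / NEUTRAL
         for c, level in enumerate(line)]
        for r, line in enumerate(levels)
    ]


# Placeholder 2x2 matrix for inter-prediction (not the values of Figure 8B)
matrices = {"inter": [[16, 32], [32, 64]]}
levels = [[16, 8], [-8, 4]]

restored = dequantize_subblock(levels, "inter", True, 0, matrices, scale=4)
restored_flat = dequantize_subblock(levels, "inter", True, 1, matrices, scale=4)
```

Because the decision depends only on the low-frequency conversion information, the quantization matrix control information, and the prediction mode, any unit that holds these three items can reproduce the encoder-side choice.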
[0072] The image playback unit 107 generates (plays back) a predicted image by appropriately referring to the frame memory 108 based on the prediction information output from the prediction unit 104. Then, the image playback unit 107 adds the generated predicted image and the prediction error of the subblock reproduced by the inverse quantization / inverse transform unit 106 to generate a reproduced image of the subblock, and stores the generated reproduced image in the frame memory 108.
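The addition of the predicted image and the reproduced prediction error can be sketched as below. Clipping the result to the valid sample range is an assumption made for this illustration (it is common practice but is not stated in this paragraph), and the function name and the `bit_depth` parameter are hypothetical.

```python
def reconstruct_subblock(predicted, residual, bit_depth=8):
    """Add the prediction error to the predicted image, sample by sample.

    Assumption: results are clipped to [0, 2**bit_depth - 1].
    """
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + e, 0), max_value) for p, e in zip(p_line, e_line)]
        for p_line, e_line in zip(predicted, residual)
    ]


predicted = [[250, 10], [100, 0]]
residual = [[10, -20], [5, 3]]
reproduced = reconstruct_subblock(predicted, residual)
```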
[0073] The in-loop filter unit 109 performs in-loop filtering, such as deblocking filtering and sample adaptive offsetting, on the playback image stored in the frame memory 108, and then stores the playback image with the in-loop filtering applied back into the frame memory 108.
[0074] The encoding unit 110 entropy encodes, for each subblock, the quantization coefficients and low-frequency conversion information generated by the conversion / quantization unit 105 for that subblock, along with the prediction information output from the prediction unit 104, to generate image encoding data. For entropy encoding, for example, Golomb coding, arithmetic coding, or Huffman coding can be used.
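As one concrete illustration of Golomb-type coding, the following sketch produces order-0 exponential-Golomb codewords as bit strings. The choice of the exponential-Golomb variant and the signed-value mapping are assumptions made for this example; this disclosure does not state which variant, if any, the encoding unit 110 uses.

```python
def exp_golomb_unsigned(n):
    """Order-0 exponential-Golomb codeword for a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    binary = bin(n + 1)[2:]                     # binary digits of n + 1
    return "0" * (len(binary) - 1) + binary     # zero prefix, then the digits


def exp_golomb_signed(v):
    """Map a signed value to a non-negative index, then encode it.

    Mapping used here: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ...
    """
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return exp_golomb_unsigned(code_num)


codewords = [exp_golomb_signed(v) for v in (0, 1, -1, 2)]
```

Small magnitudes receive short codewords, which suits quantization coefficients that are concentrated around zero.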
[0075] The integrated encoding unit 111 generates header code data as described above. Furthermore, the integrated encoding unit 111 forms a bitstream by multiplexing the generated header code data with the image code data generated by the encoding unit 110, and outputs the formed bitstream. The output destination of the bitstream by the integrated encoding unit 111 is not limited to a specific output destination. For example, the integrated encoding unit 111 may transmit the bitstream to an external device such as a server device via a network such as a LAN or the Internet, or it may output (store) the bitstream in the memory of the image encoding device. An example of the data structure of the bitstream generated and output by the integrated encoding unit 111 is shown in Figure 6.
[0076] The sequence header contains quantization matrix control information codes and coded data for the quantization matrix, and this coded data contains coded data for each element of the quantization matrix. However, the location in which the quantization matrix control information and the quantization matrix are encoded is not limited to the sequence header; they may also be encoded in the picture header or other header sections. Furthermore, when changing the quantization matrix within a single sequence, it is possible to update it by newly encoding the quantization matrix. In this case, the entire set of quantization matrices may be rewritten, or only a part of it may be changed by specifying the prediction mode corresponding to the quantization matrix to be rewritten.
[0077] Next, the processing performed by the image encoding device for encoding one frame in a moving image will be explained according to the flowchart in Figure 3. In step S301, the control unit 114 acquires quantization matrix control information for each subblock as described above.
[0078] In step S302, prior to the encoding process, the storage unit 103 generates and stores a quantization matrix, which is a two-dimensional array of quantization step values. In this embodiment, as described above, the storage unit 103 generates and stores the quantization matrices shown in Figures 8A to 8C (corresponding to subblocks having a size of 8x8 pixels, and corresponding to the prediction methods of intra prediction, inter prediction, and template fitting prediction).
[0079] In step S303, the encoding unit 113 reads the quantization matrix generated in step S302 and held in the storage unit 103, scans each element in the read quantization matrix to calculate the difference between elements, and generates a one-dimensional array of the calculated differences as a one-dimensional difference matrix. In this embodiment, as described above, the encoding unit 113 scans each element in the quantization matrix of Figure 8A according to the scan order shown in Figure 9 to calculate the difference between elements, and generates the difference matrix of Figure 10A by arranging the calculated differences. Furthermore, the encoding unit 113 scans each element in the quantization matrix of Figure 8B according to the scan order shown in Figure 9 to calculate the difference between elements, and generates the difference matrix of Figure 10B by arranging the calculated differences. Furthermore, the encoding unit 113 scans each element in the quantization matrix of Figure 8C according to the scan order shown in Figure 9 to calculate the difference between elements, and generates the difference matrix of Figure 10C by arranging the calculated differences.
[0080] The encoding unit 113 then refers to the encoding table shown in Figures 11A and 11B to identify the binary code corresponding to the value of each element (value to be encoded) in the one-dimensional difference matrix of the quantization matrix, and generates the set of identified binary codes as the code data of the quantization matrix.
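The generation of the one-dimensional difference matrix in step S303 can be sketched as follows. The scan order of Figure 9 and the encoding tables of Figures 11A and 11B are not reproduced here, so this sketch uses a simple diagonal scan as a stand-in and stops before the binary coding. The initial value against which the first element is differenced is passed as a parameter because it is not stated in this passage; the value 8 in the example, the matrix values, and all names are hypothetical.

```python
def diagonal_scan(size):
    """Stand-in scan order: visit anti-diagonals from the top-left corner.

    Assumption: this is NOT necessarily the scan order of Figure 9.
    """
    order = []
    for diagonal in range(2 * size - 1):
        for row in range(size):
            col = diagonal - row
            if 0 <= col < size:
                order.append((row, col))
    return order


def to_difference_vector(matrix, scan, initial):
    """Scan the matrix and emit each element minus the previous scanned one."""
    diffs = []
    previous = initial
    for row, col in scan:
        diffs.append(matrix[row][col] - previous)
        previous = matrix[row][col]
    return diffs


matrix = [[8, 11], [11, 16]]          # placeholder values, not Figure 8A
diffs = to_difference_vector(matrix, diagonal_scan(2), initial=8)
```

Because neighbouring elements of a quantization matrix usually have similar values, the differences tend to be small, which is what makes a variable-length code table effective for them.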
[0081] In step S304, the integrated encoding unit 111 encodes header information necessary for encoding the frame, such as quantization matrix control information acquired by the control unit 114 in step S301, and integrates the encoded header information with the encoded quantization matrix code data generated in step S303 to generate header encoded data.
[0082] In step S305, the block division unit 102 divides the input frame (an input image for one frame) into multiple basic blocks. In step S306, the prediction unit 104 selects, as the selected basic block, one of the basic blocks obtained in step S305 that has not yet been selected. The prediction unit 104 then divides the selected basic block into multiple subblocks, performs prediction processing on each subblock to generate a predicted image for that subblock, and calculates the difference between the subblock and its predicted image as the prediction error. The prediction unit 104 also outputs the information necessary for prediction as prediction information.
[0083] More specifically, the prediction unit 104 designates each subblock in the selected basic block as a focus subblock (focus block), and generates a predicted image of the focus subblock by performing the following processing on the focus subblock.
[0084] The prediction unit 104 performs intra-prediction on the subblock of interest by referring to the region of the encoded pixel group of the frame to which the subblock of interest belongs, and generates a predicted image (intra-predicted image) of the subblock of interest.
[0085] Furthermore, the prediction unit 104 performs inter-prediction on the subblock of interest by referring to an encoded frame different from the frame to which the subblock of interest belongs (for example, the frame encoded immediately before), and generates a predicted image (inter-predicted image) of the subblock of interest.
[0086] Furthermore, the prediction unit 104 performs template fitting prediction on the subblock of interest, as illustrated in Figure 12, to generate a predicted image (template fitting prediction image) of the subblock of interest.
[0087] The prediction unit 104 then generates a difference image between the subblock of interest and the intra-predicted image generated for the subblock of interest, and calculates the sum of the squared values (or absolute values, for example) of the pixel values of each pixel in the difference image as the first evaluation value.
[0088] Furthermore, the prediction unit 104 generates a difference image between the subblock of interest and the inter-predicted image generated for the subblock of interest, and calculates the sum of the squared values (or absolute values, for example) of the pixel values of each pixel in the difference image as the second evaluation value.
[0089] Furthermore, the prediction unit 104 generates a difference image between the subblock of interest and the template-fitted prediction image generated for the subblock of interest, and calculates the sum of the squared values (or absolute values, for example) of the pixel values of each pixel in the difference image as the third evaluation value.
[0090] The prediction unit 104 then identifies the smallest evaluation value among the first evaluation value, second evaluation value, and third evaluation value, and determines the prediction mode of the predicted image for which the smallest evaluation value was calculated as the prediction mode of the subblock of interest.
[0091] For example, if the smallest of the first evaluation value, second evaluation value, and third evaluation value is the first evaluation value, the prediction unit 104 determines that the prediction image for which the first evaluation value was calculated is an intra-predicted image, and therefore the prediction mode of the subblock of interest is the intra-prediction mode.
[0092] For example, if the second evaluation value is the smallest of the first evaluation value, second evaluation value, and third evaluation value, the prediction unit 104 determines that the prediction mode of the subblock of interest is the inter-prediction mode, because the predicted image from which the second evaluation value was calculated is an inter-prediction image.
[0093] For example, if the third evaluation value is the smallest of the first evaluation value, second evaluation value, and third evaluation value, the prediction unit 104 determines that the prediction mode of the subblock of interest is the template-fitting prediction mode, because the predicted image for which the third evaluation value was calculated is a template-fitting prediction image.
[0094] The prediction unit 104 then outputs prediction information including the prediction mode of the subblock of interest. If the prediction unit 104 determines that the prediction mode of the subblock of interest is the intra-prediction mode, it takes the difference between the subblock of interest and the intra-prediction image as the prediction error of the subblock of interest.
[0095] Furthermore, if the prediction unit 104 determines that the prediction mode of the subblock of interest is the inter-prediction mode, it sets the difference between the subblock of interest and the inter-predicted image as the prediction error of the subblock of interest.
[0096] Furthermore, if the prediction unit 104 determines that the prediction mode for the subblock of interest is the template-fitting prediction mode, it sets the difference between the subblock of interest and the template-fitting prediction image as the prediction error for the subblock of interest.
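The mode decision of step S306 can be sketched as follows: an evaluation value is computed for each candidate predicted image as the sum of squared differences, the mode with the smallest value is chosen, and the corresponding difference becomes the prediction error. The function names, the mode labels, and the sample values are hypothetical. How ties between equal evaluation values are resolved is not stated in this disclosure; this sketch simply keeps the first candidate.

```python
def sum_squared_error(block, predicted):
    """Evaluation value: sum of squared sample differences."""
    return sum(
        (b - p) ** 2
        for b_line, p_line in zip(block, predicted)
        for b, p in zip(b_line, p_line)
    )


def choose_prediction(block, candidates):
    """Pick the prediction mode with the smallest evaluation value.

    candidates maps a mode label to its predicted image.
    Returns (mode, prediction_error, evaluation_values).
    """
    costs = {mode: sum_squared_error(block, pred)
             for mode, pred in candidates.items()}
    best = min(costs, key=costs.get)   # first candidate wins on a tie
    error = [
        [b - p for b, p in zip(b_line, p_line)]
        for b_line, p_line in zip(block, candidates[best])
    ]
    return best, error, costs


block = [[10, 12], [14, 16]]
candidates = {
    "intra": [[10, 10], [10, 10]],
    "inter": [[10, 12], [13, 17]],
    "template": [[9, 12], [14, 18]],
}
mode, error, costs = choose_prediction(block, candidates)
```

Replacing the squared difference with an absolute difference in `sum_squared_error` gives the alternative evaluation value mentioned above.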
[0097] In step S307, the conversion / quantization unit 105 performs an orthogonal transformation on the prediction error data calculated in step S306 to generate conversion coefficients. Next, the conversion / quantization unit 105 decides whether or not to perform a low-frequency transformation on the orthogonal transformation coefficients generated in the subblock, and generates this information as low-frequency transformation information.
[0098] If it is decided to apply low-frequency conversion, the conversion / quantization unit 105 applies low-frequency conversion to the low-frequency component of the orthogonal conversion coefficient to generate low-frequency conversion coefficients, and then performs quantization processing based on the quantization matrix control information generated in step S301 and the quantization matrix generated and held in step S302 to generate quantization coefficients. Specifically, if the quantization matrix control information indicates 0, quantization processing is performed using the quantization matrix. In this case, the conversion / quantization unit 105 selects one of the quantization matrices generated and held in step S302 according to the prediction mode, quantizes the low-frequency conversion coefficient using the selected quantization matrix, and generates quantization coefficients. In this embodiment, the quantization matrix shown in Figure 8A is used for subblocks where prediction processing is performed in intra-prediction, the quantization matrix shown in Figure 8B is used for subblocks where inter-prediction is performed, and the quantization matrix shown in Figure 8C is used for subblocks where template fitting prediction is performed. However, the quantization matrix used is not limited to these.
[0099] Conversely, if the quantization matrix control information indicates 1, quantization processing is performed without using the quantization matrix. In this case, all low-frequency conversion coefficients within the subblock are quantized using the same quantization scale, and quantization coefficients are generated.
[0100] On the other hand, if it is decided not to perform low-frequency conversion, the conversion / quantization unit 105 selects one of the quantization matrices generated and held in step S302 based on the prediction information, performs quantization using the selected quantization matrix, and generates quantization coefficients. In this embodiment, the quantization matrix shown in Figure 8A is used for subblocks where intra-prediction is used, the matrix shown in Figure 8B is used for subblocks where inter-prediction is used, and the matrix shown in Figure 8C is used for subblocks where template-fitted prediction is used.
[0101] In step S308, the inverse quantization / inverse conversion unit 106 first determines whether or not low-frequency conversion has been applied to the subblock based on the low-frequency conversion information generated in step S307. If low-frequency conversion has been applied to the subblock, the inverse quantization / inverse conversion unit 106 inversely quantizes the quantization coefficients based on the quantization matrix control information acquired in step S301 and reconstructs the low-frequency conversion coefficients. Specifically, if the quantization matrix control information indicates 0, the inverse quantization process is performed using the quantization matrix generated and held in step S302. Similar to the conversion / quantization unit 105 in step S307, the quantization matrix corresponding to the prediction mode of the subblock is used for this inverse quantization process. Specifically, the same quantization matrix used in step S307 is used.
[0102] Conversely, if the quantization matrix control information indicates 1, inverse quantization processing is performed without using the quantization matrix. In this case, all quantization coefficients within the subblock are inversely quantized using the same quantization scale, and the low-frequency conversion coefficients are reconstructed. The inverse quantization / inverse conversion unit 106 then performs inverse low-frequency conversion processing on the reconstructed low-frequency conversion coefficients to reconstruct orthogonal conversion coefficients, and further performs inverse orthogonal conversion on the orthogonal conversion coefficients to reconstruct prediction error data.
[0103] On the other hand, if low-frequency conversion has not been performed on the subblock, the inverse quantization / inverse conversion unit 106 performs inverse quantization on the quantization coefficients generated in step S307 using the quantization matrix generated and held in step S302, and reconstructs the conversion coefficients. The inverse quantization / inverse conversion unit 106 further performs an inverse orthogonal transformation on the conversion coefficients and reconstructs the prediction error data. In this step, the same quantization matrix used in step S307 is used, and the inverse quantization process is performed.
[0104] In step S309, the image playback unit 107 generates a predicted image by appropriately referring to the frame memory 108 based on the prediction information output from the prediction unit 104 in step S306. The image playback unit 107 then generates (plays back) a replayed image (subblock) from the generated predicted image and the prediction error regenerated by the inverse quantization / inverse transformation unit 106 in step S308, and stores the generated replayed image in the frame memory 108.
[0105] In step S310, the encoding unit 110 entropy encodes the low-frequency conversion information and quantization coefficients generated by the conversion / quantization unit 105 in step S307, and the prediction information output from the prediction unit 104 in step S306, to generate image encoding data.
[0106] Then, the integrated encoding unit 111 forms a bitstream by combining the image encoding data generated by the encoding unit 110 with the header encoding data generated by the integrated encoding unit 111 in step S304, and outputs the formed bitstream.
[0107] In step S311, the control unit 150 determines whether all basic blocks in the frame have been selected as selected basic blocks. If, as a result of this determination, all basic blocks in the frame have been selected as selected basic blocks, the process proceeds to step S312. On the other hand, if there are still basic blocks in the frame that have not yet been selected as selected basic blocks, the process proceeds to step S306.
[0108] In step S312, the in-loop filter unit 109 performs in-loop filtering on the playback image generated in step S309 and stored in the frame memory 108, and then stores the playback image that has undergone the in-loop filtering back into the frame memory 108.
[0109] Furthermore, if the image encoding device performs encoding processing on a group of subsequent frames following the above-mentioned single frame, it will perform the processing in steps S305 to S312 above for each subsequent frame in the group of subsequent frames.
[0110] Thus, according to this embodiment, and in particular in steps S307 and S308, whether or not the quantization matrix is applied in the quantization and inverse quantization of subblocks in which low-frequency conversion is used is controlled based on the quantization matrix control information. Quantization can therefore be controlled according to the characteristics of the low-frequency conversion, thereby improving subjective image quality.
[0111] In this embodiment, a single piece of quantization matrix control information is used to uniformly control whether or not the quantization matrix is applied to subblocks in which low-frequency conversion is used, for all of the quantization matrices corresponding to intra-prediction, inter-prediction, and template-fit prediction. However, the embodiment is not limited to this configuration. For example, quantization matrix control information can be defined individually for each quantization matrix, and the use of each quantization matrix can be controlled individually. This allows for appropriate quantization control according to the properties of the conversion coefficients obtained by low-frequency conversion, even if the properties differ depending on the prediction mode, thereby improving subjective image quality.
[0112] Furthermore, for intra-prediction, inter-prediction, and template-fit prediction, two types of quantization matrix control information, namely first quantization matrix control information and second quantization matrix control information, may be provided to control whether or not the quantization matrix is applied. Specifically, the first quantization matrix control information controls whether or not the quantization matrix is applied to subblocks that use intra-prediction and low-frequency conversion, and the second quantization matrix control information controls whether or not the quantization matrix is applied to subblocks that use inter-prediction or template-fit prediction and low-frequency conversion. This minimizes the increase in the code amount of the quantization matrix control information itself, while enabling appropriate quantization control according to the properties of the low-frequency conversion coefficients (conversion coefficients converted by low-frequency conversion), which differ depending on the prediction mode, thereby improving subjective image quality.
[0113] Furthermore, while this embodiment describes a case in which two types of transformation processes are used, namely orthogonal transformations represented by discrete cosine transforms and low-frequency transformations, it is not limited to this. For example, it is also possible to use other transformation processes such as discrete sine transforms, wavelet transforms, or Karhunen-Loève transforms. In that case, the quantization matrix control information of this embodiment may be configured to control whether or not the quantization matrix is applied not only to subblocks in which low-frequency transformations are used, but also to subblocks in which transformations other than orthogonal transformations represented by discrete cosine transforms, such as discrete sine transforms or wavelet transforms, are used. This makes it possible to control the application of a quantization matrix, which was defined on the premise that only discrete cosine transforms are used, to subblocks in which other transformation processes are used, thereby improving subjective image quality.
[0114] Furthermore, in this embodiment, a two-stage transformation process is used, in which the prediction error is orthogonally transformed and the low-frequency component of the orthogonal transformation coefficients is subjected to a low-frequency transformation. However, the embodiment is not limited to this, and a different transformation process combining orthogonal transformation and low-frequency transformation may be used instead. In that case, the quantization matrix control information of this embodiment may be configured to control whether or not the quantization matrix is applied to the subblock in which this different transformation process is used. This makes it possible to control the application of the quantization matrix to subblocks in which this different transformation process is used, thereby improving subjective image quality.
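The variation with first and second quantization matrix control information can be sketched as a small decision function. The mode labels and the function name are hypothetical, and the convention that a value of 0 means "apply the quantization matrix" is carried over from the description of the single control information above.

```python
def matrix_applies(pred_mode, lf_applied, first_control, second_control):
    """Decide whether the quantization matrix is applied to a subblock.

    first_control governs intra-predicted subblocks that use low-frequency
    conversion; second_control governs inter-predicted and
    template-fit-predicted subblocks that use low-frequency conversion.
    Subblocks without low-frequency conversion always use the matrix.
    """
    if pred_mode not in ("intra", "inter", "template"):
        raise ValueError("unknown prediction mode: " + pred_mode)
    if not lf_applied:
        return True
    control = first_control if pred_mode == "intra" else second_control
    return control == 0


# Example: matrix kept for intra, disabled for inter and template-fit
decisions = {
    mode: matrix_applies(mode, True, first_control=0, second_control=1)
    for mode in ("intra", "inter", "template")
}
```

Two flags cover three prediction modes, which is how this variation limits the code amount of the control information while still distinguishing intra-predicted subblocks from the others.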
[0115] Furthermore, when transformation processes other than discrete cosine transform and low-frequency transform are used, it is also acceptable to use separate quantization matrix control information for each transformation process. For example, separate quantization matrix control information can be defined, such as quantization matrix control information corresponding to low-frequency transform and quantization matrix control information corresponding to discrete sine transform. This allows for control such as enabling the application of quantization matrices to transformation processes that have similar properties to discrete cosine transforms, and restricting the application of quantization matrices to transformation processes that have different properties from discrete cosine transforms, thereby improving subjective image quality.
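Separate quantization matrix control information for each transformation process can be pictured as a lookup keyed by the transformation. The following sketch is illustrative only: the keys, the function name, and the convention that 0 means "apply the quantization matrix" are assumptions carried over from the description of the single control information, and the flag values are examples.

```python
def matrix_applies_for_transform(transform, controls):
    """Look up the control information defined for one transformation.

    Subblocks that use only the discrete cosine transform ("dct" here) are
    assumed to always use the quantization matrix and have no control entry.
    """
    if transform == "dct":
        return True
    return controls[transform] == 0


# Example values: matrix disabled for low-frequency conversion,
# enabled for the discrete sine transform
controls = {"low_frequency": 1, "dst": 0}
applies = {name: matrix_applies_for_transform(name, controls)
           for name in ("dct", "low_frequency", "dst")}
```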
[0116] Furthermore, although this embodiment describes a case where the frame is an image, the frame is not limited to an image. For example, a two-dimensional array of feature data used in machine learning, such as object recognition, may be encoded as a frame and output as a bitstream. In this case, each element in this two-dimensional array can be treated as a pixel and this embodiment can be applied. This makes it possible to efficiently encode feature data used in machine learning.
[0117] [Second Embodiment] In this embodiment, an image decoding device, which is an example of a decoding device that acquires and decodes the bitstream for each frame generated by the image encoding device according to the first embodiment, will be described. In this embodiment, the differences from the first embodiment will be described, and unless otherwise specified below, it will be assumed to be the same as the first embodiment. First, an example of the functional configuration of the image decoding device according to this embodiment will be described using the block diagram in Figure 2.
[0118] The separation / decoding unit 202 acquires the bitstream generated by the image encoding device according to the first embodiment. The method by which the separation / decoding unit 202 acquires the bitstream is not limited to a specific acquisition method. For example, the separation / decoding unit 202 may acquire the bitstream transmitted from the image encoding device via a network such as a LAN or the Internet. Also, if the bitstream generated by the image encoding device is stored in an external device such as a server device, the separation / decoding unit 202 may acquire the bitstream from the external device.
[0119] The separation and decoding unit 202 then separates coded data related to the decoding process and coefficients from the acquired bitstream, and further separates coded data present in the header portion of the bitstream.
[0120] For example, the separation / decoding unit 202 separates the quantization matrix control information code from the sequence header of the bitstream shown in Figure 6A, decodes the separated quantization matrix control information code, and reconstructs the quantization matrix control information. The separation / decoding unit 202 then supplies the reconstructed quantization matrix control information to the inverse quantization / inverse transformation unit 204.
[0121] Furthermore, the separation / decoding unit 202 extracts the coded data of the quantization matrix shown in Figures 8A to 8C from the sequence header of the bitstream shown in Figure 6A, and supplies the extracted coded data of the quantization matrix to the quantization matrix decoding unit 209.
[0122] Furthermore, the separation and decoding unit 202 extracts the coded image data at the sub-block level (decoded block level) of the basic block in the picture data of the bitstream shown in Figure 6A, and supplies the extracted coded data to the decoding unit 203. In this way, the separation and decoding unit 202 performs the reverse of the operation of the integrated encoding unit 111 described above.
[0123] The quantization matrix decoding unit 209 decodes the coded data of the quantization matrix shown in Figures 8A to 8C supplied from the separation decoding unit 202 and reconstructs the one-dimensional difference matrix shown in Figures 10A to 10C. In this embodiment, as in the first embodiment, the coded data of the quantization matrix is decoded using the coding table shown in Figures 11A and 11B, but the coding table is not limited to this, and other coding tables may be used as long as they are the same as those in the first embodiment.
[0124] The quantization matrix decoding unit 209 then reconstructs a two-dimensional quantization matrix from the reconstructed one-dimensional difference matrix in the reverse process of the process by which the encoding unit 113 generates a one-dimensional difference matrix from the quantization matrix. As a result, the quantization matrix decoding unit 209 reconstructs the quantization matrices shown in Figures 8A to 8C from the difference matrices shown in Figures 10A to 10C, respectively.
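The reconstruction of a two-dimensional quantization matrix from a one-dimensional difference matrix can be sketched as an accumulation followed by an inverse scan. As the scan order of Figure 9 is not reproduced here, a simple diagonal scan is used as a stand-in, and the initial value is passed as a parameter because it is not stated in this passage. The value 8, the difference values, and all names are hypothetical; the only requirement is that the scan order and initial value match those used on the encoding side.

```python
def diagonal_scan(size):
    """Stand-in scan order: visit anti-diagonals from the top-left corner.

    Assumption: this is NOT necessarily the scan order of Figure 9.
    """
    order = []
    for diagonal in range(2 * size - 1):
        for row in range(size):
            col = diagonal - row
            if 0 <= col < size:
                order.append((row, col))
    return order


def rebuild_matrix(diffs, size, scan, initial):
    """Accumulate the differences and place each value at its scan position."""
    if len(diffs) != len(scan):
        raise ValueError("difference vector and scan order differ in length")
    matrix = [[0] * size for _ in range(size)]
    value = initial
    for diff, (row, col) in zip(diffs, scan):
        value += diff
        matrix[row][col] = value
    return matrix


diffs = [0, 3, 0, 5]                  # placeholder values, not Figure 10A
matrix = rebuild_matrix(diffs, 2, diagonal_scan(2), initial=8)
```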
[0125] The decoding unit 203 decodes the coded data supplied from the separation decoding unit 202 and reconstructs the quantization coefficients, low-frequency conversion information, and prediction information. The decoding unit 203 then supplies the reconstructed quantization coefficients and low-frequency conversion information to the inverse quantization / inverse conversion unit 204, and supplies the reconstructed prediction information to the image reproduction unit 205.
[0126] The inverse quantization / inverse conversion unit 204 first determines whether or not low-frequency conversion has been applied to the subblock based on the low-frequency conversion information input from the decoding unit 203. If low-frequency conversion has been applied to the subblock, the inverse quantization / inverse conversion unit 204 inverse quantizes the quantization coefficients based on the quantization matrix control information input from the separation / decoding unit 202 and reconstructs the low-frequency conversion coefficients. The inverse quantization / inverse conversion unit 204 then applies an inverse low-frequency conversion process to the reconstructed low-frequency conversion coefficients to reconstruct the orthogonal conversion coefficients, and further applies an inverse orthogonal conversion to the orthogonal conversion coefficients to reconstruct the prediction error data.
[0127] Specifically, if the quantization matrix control information indicates 0, the inverse quantization / inverse conversion unit 204 performs inverse quantization processing using the quantization matrix reconstructed by the quantization matrix decoding unit 209. As in the transformation / quantization unit 105 and the inverse quantization / inverse transformation unit 106, this inverse quantization processing uses the quantization matrix corresponding to the prediction mode of the subblock. That is, the same quantization matrix as that used in the transformation / quantization unit 105 and the inverse quantization / inverse transformation unit 106 is used.
[0128] Conversely, if the quantization matrix control information indicates 1, inverse quantization processing is performed without using the quantization matrix. In this case, all quantization coefficients within the subblock are inversely quantized using the same quantization scale, and the low-frequency conversion coefficients are reconstructed. The inverse quantization / inverse conversion unit 204 then performs inverse low-frequency conversion processing on the reconstructed low-frequency conversion coefficients to reconstruct orthogonal conversion coefficients, and further performs inverse orthogonal conversion on the orthogonal conversion coefficients to reconstruct prediction error data.
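The two cases in [0127] and [0128] differ only in the weight applied to each coefficient. The following is a minimal sketch under stated assumptions: a matrix entry of 16 is treated as neutral (no weighting), the result is kept as a floating-point value without rounding, and the names `inverse_quantize`, `levels`, and `scale` are hypothetical. The source does not specify the arithmetic.

```python
from typing import List, Optional

Block = List[List[int]]


def inverse_quantize(levels: Block, scale: int,
                     matrix: Optional[Block] = None) -> List[List[float]]:
    """Scale quantized levels back to transform coefficients.

    With a matrix, each position is weighted by matrix[r][c] / 16 (assumed
    normalization). With matrix=None, every position uses the same scale.
    """
    neutral = 16  # assumed value meaning "no frequency weighting"
    out = []
    for r, row in enumerate(levels):
        out_row = []
        for c, level in enumerate(row):
            weight = neutral if matrix is None else matrix[r][c]
            out_row.append(level * scale * weight / neutral)
        out.append(out_row)
    return out


levels = [[2, -1], [1, 0]]
matrix = [[16, 32], [32, 64]]
with_matrix = inverse_quantize(levels, scale=4, matrix=matrix)  # control information 0
flat = inverse_quantize(levels, scale=4)                        # control information 1
```

In the flat case the higher-frequency positions are reconstructed with the same step size as the lowest-frequency position, which is the behavior [0128] describes.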
[0129] On the other hand, if low-frequency conversion has not been applied to the subblock, the inverse quantization / inverse transformation unit 204 selects one of the quantization matrices reconstructed by the quantization matrix decoding unit 209. The inverse quantization / inverse transformation unit 204 then inverse quantizes the input quantization coefficients using the selected quantization matrix to generate orthogonal transformation coefficients. The inverse quantization / inverse transformation unit 204 further performs an inverse orthogonal transformation to reconstruct the prediction error data and supplies the reconstructed prediction error data to the image playback unit 205.
[0130] In this embodiment, the inverse quantization / inverse transformation unit 204 determines the quantization matrix to be used in the inverse quantization process according to the prediction mode of the block to be decoded, which is determined according to the prediction information reconstructed by the decoding unit 203. Specifically, the quantization matrix shown in Figure 8A is selected for subblocks where intra prediction is used, the quantization matrix shown in Figure 8B is selected for subblocks where inter prediction is used, and the quantization matrix shown in Figure 8C is selected for subblocks where template fitting prediction is used. However, the quantization matrix used is not limited to these, and may be the same as the quantization matrix used in the transformation / quantization unit 105 and the inverse quantization / inverse transformation unit 106 of the first embodiment.
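The selection logic of [0126] to [0130] can be summarized in a few lines. This is a sketch only: the `PredMode` enumeration, the function name `select_matrix`, and the use of `None` to mean "inverse quantize without a quantization matrix" are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List, Optional

Matrix = List[List[int]]


class PredMode(Enum):
    INTRA = "intra"        # matrix of Figure 8A
    INTER = "inter"        # matrix of Figure 8B
    TEMPLATE = "template"  # matrix of Figure 8C


def select_matrix(lf_applied: bool, qm_control: int, mode: PredMode,
                  matrices: Dict[PredMode, Matrix]) -> Optional[Matrix]:
    """Return the matrix for inverse quantization, or None for a flat scale."""
    if lf_applied and qm_control == 1:
        # Low-frequency conversion was used and the control information
        # disables the matrix: all coefficients share one quantization scale.
        return None
    # Either no low-frequency conversion, or control information is 0:
    # use the matrix that corresponds to the prediction mode.
    return matrices[mode]


# Placeholder 1x1 matrices stand in for the matrices of Figures 8A to 8C.
matrices = {PredMode.INTRA: [[16]], PredMode.INTER: [[17]], PredMode.TEMPLATE: [[18]]}
chosen = select_matrix(lf_applied=True, qm_control=0, mode=PredMode.INTER, matrices=matrices)
```

Note that the control information is consulted only when low-frequency conversion was applied. Subblocks without it always use the matrix for their prediction mode.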
[0131] The image playback unit 205 appropriately refers to the frame memory 206 based on the prediction information reproduced by the decoding unit 203, performs prediction processing for each subblock, and generates (plays back) a predicted image for that subblock. The image playback unit 205 identifies the prediction mode of the subblock by referring to the prediction information, and refers to the frame memory 206 to perform prediction processing according to the identified prediction mode to generate a predicted image for that subblock. As described in the first embodiment, there are three types of prediction modes: intra prediction, inter prediction, and template fitting prediction. The image playback unit 205 generates a predicted image for that subblock by performing the prediction processing corresponding to the prediction mode of the subblock from among these three types of prediction processing. Each of these three types of prediction processing is performed in the same manner as the prediction unit 104.
[0132] The image playback unit 205 then adds the predicted image of the subblock to the prediction error of the subblock reproduced by the inverse quantization / inverse transformation unit 204 to generate a reproduced image of the subblock, and stores the generated reproduced image in the frame memory 206. The stored reproduced image becomes a prediction reference candidate when decoding other subblocks.
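The addition in [0132] is a per-sample sum of the predicted image and the prediction error. The sketch below additionally clips the result to the valid sample range. The clipping and the 8-bit default are assumptions not stated in the source, and `reconstruct` is a hypothetical name.

```python
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, err: Block, bit_depth: int = 8) -> Block:
    """Add prediction error to the predicted image, sample by sample."""
    max_val = (1 << bit_depth) - 1  # assumed sample range: 0 .. 2^bit_depth - 1
    return [
        [min(max(p + e, 0), max_val) for p, e in zip(pred_row, err_row)]
        for pred_row, err_row in zip(pred, err)
    ]


block = reconstruct([[100, 250], [3, 0]], [[5, 10], [-7, 0]])
```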
[0133] The in-loop filter unit 207, like the in-loop filter unit 109, performs in-loop filtering, such as deblocking filtering and sample adaptive offset, on the playback image stored in the frame memory 206, and then stores the playback image with the in-loop filtering applied back into the frame memory 206.
[0134] The control unit 250 controls the operation of the entire image decoding device. For example, the control unit 250 controls the operation of each of the above-mentioned functional units in the image decoding device. As a result, each of the above-mentioned functional units in the image decoding device operates under the control of the control unit 250.
[0135] Next, the process by which the image decoding device decodes the bitstream of one frame of a moving image will be explained with reference to the flowchart in Figure 4. Since the details of each step are as described above, only a brief explanation is given below.
[0136] In step S401, the separation / decoding unit 202 acquires the bitstream generated by the image encoding device according to the first embodiment. The separation / decoding unit 202 then separates the acquired bitstream into information related to the decoding process and coded data related to coefficients, and decodes the coded data present in the header portion of the bitstream.
[0137] For example, the separation / decoding unit 202 separates the quantization matrix control information code from the bitstream sequence header, decodes the separated quantization matrix control information code, and reconstructs the quantization matrix control information.
[0138] Furthermore, the separation / decoding unit 202 extracts the coded data of the quantization matrix from the bitstream sequence header and supplies the extracted coded data of the quantization matrix to the quantization matrix decoding unit 209.
[0139] Furthermore, the separation / decoding unit 202 extracts the coded image data in subblock units of the basic blocks in the picture data of the bitstream and supplies the extracted coded data to the decoding unit 203.
[0140] In step S402, the quantization matrix decoding unit 209 decodes the coded data of the quantization matrix shown in Figures 8A to 8C supplied from the separation / decoding unit 202 and reconstructs the one-dimensional difference matrix shown in Figures 10A to 10C. Then, the quantization matrix decoding unit 209 reconstructs the two-dimensional quantization matrix shown in Figures 8A to 8C from the reconstructed one-dimensional difference matrix in the reverse process of the process by which the encoding unit 113 generates the one-dimensional difference matrix from the quantization matrix.
[0141] In step S403, the decoding unit 203 decodes the coded image data supplied from the separation / decoding unit 202 and reconstructs the quantization coefficients, prediction information, and low-frequency conversion information.
[0142] In step S404, the inverse quantization / inverse conversion unit 204 first determines whether or not low-frequency conversion has been applied to the subblock based on the low-frequency conversion information reconstructed in step S403. If low-frequency conversion has been applied to the subblock, the inverse quantization / inverse conversion unit 204 inversely quantizes the quantization coefficients based on the quantization matrix control information reconstructed in step S401 and reconstructs the low-frequency conversion coefficients. Specifically, if the quantization matrix control information indicates 0, the inverse quantization process is performed using the quantization matrix reconstructed in step S402. As in the conversion / quantization unit 105 in step S306 and the inverse quantization / inverse conversion unit 106 in step S307, this inverse quantization process uses the quantization matrix corresponding to the prediction mode of the subblock. That is, the same quantization matrix as that used in steps S306 and S307 is used. The inverse quantization / inverse conversion unit 204 then performs an inverse low-frequency conversion process on the reconstructed low-frequency conversion coefficients to reconstruct orthogonal conversion coefficients, and further performs an inverse orthogonal conversion on the orthogonal conversion coefficients to reconstruct prediction error data.
[0143] Conversely, if the quantization matrix control information indicates 1, inverse quantization processing is performed without using the quantization matrix. In this case, all quantization coefficients within the subblock are inversely quantized using the same quantization scale, and the low-frequency conversion coefficients are reconstructed. The inverse quantization / inverse conversion unit 204 then performs inverse low-frequency conversion processing on the reconstructed low-frequency conversion coefficients to reconstruct orthogonal conversion coefficients, and further performs inverse orthogonal conversion on the orthogonal conversion coefficients to reconstruct prediction error data.
[0144] On the other hand, if low-frequency conversion has not been applied to the subblock, the inverse quantization / inverse conversion unit 204 performs inverse quantization on the quantization coefficients using the quantization matrix reconstructed in step S402 to obtain orthogonal conversion coefficients. The inverse quantization / inverse conversion unit 204 further performs an inverse orthogonal conversion to reconstruct prediction error data. In this embodiment, the inverse quantization / inverse conversion unit 204 determines the quantization matrix to be used according to the prediction mode determined by the prediction information reconstructed in step S403. That is, the quantization matrix shown in Figure 8A is used for subblocks where intra prediction is used, the quantization matrix shown in Figure 8B is used for subblocks where inter prediction is used, and the quantization matrix shown in Figure 8C is used for subblocks where template fitting prediction is used. However, the quantization matrix used is not limited to these, as long as it is the same as the quantization matrix used in steps S306 and S307 of the first embodiment.
[0145] In step S405, the image playback unit 205 appropriately refers to the frame memory 206 based on the prediction information reproduced by the decoding unit 203 in step S403, performs prediction processing for each subblock, and generates (plays back) the predicted image for that subblock.
[0146] The image playback unit 205 then adds the predicted image of the subblock to the prediction error of the subblock reproduced by the inverse quantization / inverse transformation unit 204 in step S404 to generate a reproduced image of the subblock, and stores the generated reproduced image in the frame memory 206.
[0147] In step S406, the control unit 250 determines whether the processing in steps S403 to S405 has been performed for all basic blocks in the frame. If the processing in steps S403 to S405 has been performed for all basic blocks in the frame, the process proceeds to step S407. On the other hand, if basic blocks remain in the frame that have not yet undergone the processing in steps S403 to S405 (unprocessed basic blocks), the process returns to step S403, and the processing in steps S403 to S405 is performed for the unprocessed basic blocks.
[0148] In step S407, the in-loop filter unit 207 performs in-loop filtering on the playback image stored in the frame memory 206 in step S405, and stores the playback image that has undergone the in-loop filtering back into the frame memory 206.
[0149] Furthermore, the handling of the in-loop filtered playback image stored in the frame memory 206 is not limited to any specific method. For example, the control unit 250 may transmit the in-loop filtered playback image stored in the frame memory 206 to an external device via a network such as a LAN or the Internet, or it may display it on a display screen of the image decoding device.
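The control flow of Figure 4, as described in steps S401 to S407, can be traced with a small sketch. Only the step ordering comes from the description above. The function name `decode_frame_steps` and the idea of returning a list of step labels are illustrative assumptions, and the sketch presumes a frame contains at least one basic block.

```python
from typing import List


def decode_frame_steps(num_basic_blocks: int) -> List[str]:
    """Return the order in which the steps of Figure 4 run for one frame."""
    steps = ["S401", "S402"]  # separate and decode headers, rebuild matrices
    remaining = num_basic_blocks
    while remaining > 0:  # S406: is there an unprocessed basic block?
        steps += ["S403", "S404", "S405"]  # decode, inverse quantize, reconstruct
        remaining -= 1
    steps.append("S407")  # in-loop filtering of the stored playback image
    return steps


print(decode_frame_steps(2))
```

The quantization matrices are rebuilt once per bitstream header (S402), whereas steps S403 to S405 repeat for every basic block before the single in-loop filtering pass.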
[0150] Thus, according to this embodiment, even for subblocks to which low-frequency conversion is applied in a bitstream generated by the image encoding device according to the first embodiment, whether or not to apply the quantization matrix can be controlled based on the quantization matrix control information, making it possible to decode a bitstream with improved subjective image quality.
[0151] In this embodiment, a single piece of quantization matrix control information is used to uniformly control, for the quantization matrices corresponding to intra prediction, inter prediction, and template fitting prediction alike, whether or not the quantization matrix is applied to subblocks where low-frequency conversion is used. However, the configuration is not limited to this. For example, quantization matrix control information can be defined individually for each quantization matrix, and the use of each quantization matrix can be controlled individually. This makes it possible to perform appropriate quantization control according to the properties of the conversion coefficients obtained by low-frequency conversion, even if the properties differ depending on the prediction mode, and to decode a bitstream with improved subjective image quality.
[0152] Furthermore, for intra prediction, inter prediction, and template fitting prediction, two types of quantization matrix control information, namely first quantization matrix control information and second quantization matrix control information, may be provided to control whether or not the quantization matrix is applied. Specifically, the first quantization matrix control information controls whether or not the quantization matrix is applied to subblocks where intra prediction and low-frequency conversion are used, and the second quantization matrix control information controls whether or not the quantization matrix is applied to subblocks where inter prediction or template fitting prediction is used together with low-frequency conversion. This suppresses the increase in the code amount of the quantization matrix control information itself while allowing appropriate quantization control according to the properties of the low-frequency conversion coefficients (conversion coefficients obtained by low-frequency conversion), which differ depending on the prediction mode, thereby enabling decoding of a bitstream with improved subjective image quality.
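The three signalling granularities described in [0151] and [0152] can be compared side by side. This is a sketch under stated assumptions: the value 0 is taken to mean "apply the quantization matrix" for every flag, following the convention of [0127], and the function names and the string keys for the prediction modes are hypothetical. Each function answers, per prediction mode, whether the matrix is applied to subblocks that use low-frequency conversion.

```python
from typing import Dict

MODES = ("intra", "inter", "template")


def applies_single(ctrl: int) -> Dict[str, bool]:
    """One flag shared by all three quantization matrices."""
    return {mode: ctrl == 0 for mode in MODES}


def applies_per_matrix(ctrl: Dict[str, int]) -> Dict[str, bool]:
    """One flag per quantization matrix (three flags in total)."""
    return {mode: ctrl[mode] == 0 for mode in MODES}


def applies_two_flags(first: int, second: int) -> Dict[str, bool]:
    """First flag governs intra; second governs inter and template fitting."""
    return {"intra": first == 0, "inter": second == 0, "template": second == 0}


two = applies_two_flags(first=0, second=1)
```

The two-flag variant sits between the other two: it costs one more flag than the single-flag design but still lets intra-predicted subblocks be treated differently from subblocks predicted by the other two modes.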
[0153] Furthermore, while this embodiment describes a case in which two types of transformation processes are used, namely orthogonal transformations represented by discrete cosine transforms and low-frequency transformations, it is not limited to this. For example, it is also possible to use other transformation processes such as discrete sine transforms, wavelet transforms, or Karhunen-Loève transforms. In that case, the quantization matrix control information of this embodiment may be configured to control whether or not the quantization matrix is applied not only to subblocks in which low-frequency transformations are used, but also to subblocks in which transformations other than orthogonal transformations represented by discrete cosine transforms, such as discrete sine transforms or wavelet transforms, are used. This makes it possible to control the application of a quantization matrix, defined on the premise that only discrete cosine transforms are used, to subblocks in which other transformation processes are used, and to decode a bitstream with improved subjective image quality.
[0154] Furthermore, in this embodiment, a two-stage inverse transformation process is used, in which the low-frequency transformation coefficients obtained by inverse quantization are subjected to inverse low-frequency transformation, and the resulting orthogonal transformation coefficients are subjected to inverse orthogonal transformation. However, the embodiment is not limited to this, and a configuration using another inverse transformation process that combines inverse low-frequency transformation and inverse orthogonal transformation may be used instead. In that case, the quantization matrix control information of this embodiment may be configured to control whether or not the quantization matrix is applied to the subblocks for which this other inverse transformation process is used. This makes it possible to control the application of the quantization matrix to subblocks encoded using the corresponding other transformation process, thereby improving subjective image quality.
[0155] Furthermore, when transformation processes other than discrete cosine transform and low-frequency transform are used, it is also acceptable to use separate quantization matrix control information for each transformation process. For example, separate quantization matrix control information can be defined, such as quantization matrix control information corresponding to low-frequency transform and quantization matrix control information corresponding to discrete sine transform. This allows for control such as enabling the application of quantization matrices to transformation processes that have similar properties to discrete cosine transforms, and restricting the application of quantization matrices to transformation processes that have different properties from discrete cosine transforms, thereby enabling the decoding of bitstreams with improved subjective image quality.
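The per-transformation control described in [0155] amounts to a lookup keyed on the transformation process rather than on the prediction mode. The following is a minimal sketch: the string keys (`"dct"`, `"low_frequency"`, `"dst"`), the function name `matrix_enabled`, and the convention that 0 means "apply" (carried over from [0127]) are assumptions for illustration.

```python
from typing import Dict


def matrix_enabled(transform: str, ctrl_by_transform: Dict[str, int]) -> bool:
    """Decide whether the quantization matrix is applied for a given transform."""
    if transform == "dct":
        # The matrices are defined on the premise of the discrete cosine
        # transform, so no control information is consulted here.
        return True
    # Every other transformation process has its own control information.
    return ctrl_by_transform[transform] == 0


# Example: allow the matrix for the discrete sine transform,
# but not for the low-frequency transform.
ctrl = {"low_frequency": 1, "dst": 0}
decisions = {t: matrix_enabled(t, ctrl) for t in ("dct", "low_frequency", "dst")}
```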
[0156] Furthermore, although this embodiment describes a case where the frame is an image, the frame is not limited to an image. For example, a bitstream in which a two-dimensional array of feature data used in machine learning, such as object recognition, is encoded may be the target of decoding. In this case, each element in this two-dimensional array can be treated as a pixel and the embodiment can be applied accordingly. This makes it possible to efficiently decode a bitstream in which feature data used in machine learning is encoded.
[0157] Note that the image encoding device according to the first embodiment and the image decoding device according to the second embodiment may be separate devices. Alternatively, a single device may be configured having the above-described functions of the image encoding device according to the first embodiment and the above-described functions of the image decoding device according to the second embodiment.
[0158] Furthermore, the image encoding device may also have an imaging unit for capturing moving images, in which case the image encoding device can be implemented as an imaging device capable of capturing moving images. Such an imaging device may further have the functions of the image decoding device according to the second embodiment.
[0159] [Third Embodiment] Each of the functional units shown in Figures 1 and 2 may be implemented in hardware, or the functional units excluding frame memory 108 and frame memory 206 may be implemented in software (computer program). In the latter case, a computer device capable of executing such a computer program can be applied to an encoding device or a decoding device. Such a computer device can be a PC, smartphone, tablet terminal, or other computer device. An example of the hardware configuration of such a computer device will be explained using the block diagram in Figure 5.
[0160] The CPU 501 executes various processes using computer programs and data stored in the RAM 502. In doing so, the CPU 501 controls the operation of the entire computer system and also executes or controls the various processes described as being performed by the encoding and decoding devices.
[0161] The RAM 502 has an area for storing computer programs and data loaded from the ROM 503 and storage device 506, and an area for storing computer programs and data received from an external device via the I / F 507. Furthermore, the RAM 502 has a work area used by the CPU 501 when executing various processes. In this way, the RAM 502 can provide various areas as appropriate.
[0162] ROM 503 stores configuration data for the computer device, computer programs and data related to the startup of the computer device, computer programs and data related to the basic operation of the computer device, and so on.
[0163] The operation unit 504 is a user interface such as a keyboard, mouse, or touch panel, which allows the user to input various instructions and information to the computer device through operation.
[0164] The display unit 505 has an LCD screen or a touch panel screen and can display the processing results of the CPU 501 as images, text, etc. The display unit 505 may also be a projection device such as a projector that projects images and text.
[0165] The storage device 506 is a large-capacity information storage device such as a hard disk drive or an SSD. The storage device 506 stores the OS (operating system), computer programs and data for the CPU 501 to execute or control the various processes described as being performed by the encoding device and decoding device. The data stored in the storage device 506 may include frames to be encoded and bitstreams to be decoded. The frame memory 108 and frame memory 206 described above can be implemented using memory devices such as RAM 502 and storage device 506.
[0166] I / F 507 may include a communication interface for a computer device to communicate data with external devices via a network such as a LAN or the Internet. I / F 507 may also include an interface for connecting external devices such as display devices, memory devices, and projectors to the computer device.
[0167] The CPU 501, RAM 502, ROM 503, operation unit 504, display unit 505, storage device 506, and I / F 507 are all connected to the system bus 508. Note that the hardware configuration shown in Figure 5 is merely one example of a hardware configuration for a computer device applicable to an encoding device or decoding device, and can be modified or changed as appropriate.
[0168] In the above configuration, when the power to the computer device is turned ON, the CPU 501 executes the boot program in the ROM 503, loads the OS stored in the storage device 506 into the RAM 502, and starts the OS. As a result, the computer device becomes capable of communication via the I / F 507 and functions as an encoding device and a decoding device. Then, under the control of the OS, the CPU 501 loads an application related to encoding (an application corresponding to the flowchart in Figure 3) from the storage device 506 into the RAM 502 and executes it, so that the CPU 501 functions as each of the functional units in Figure 1 (excluding the frame memory 108), and the computer device functions as an encoding device. On the other hand, the CPU 501 loads an application related to image decoding (an application corresponding to the flowchart in Figure 4) from the storage device 506 into the RAM 502 and executes it, so that the CPU 501 functions as each of the functional units in Figure 2 (excluding the frame memory 206), and the computer device functions as a decoding device.
[0169] The numerical values, processing timing, processing order, processing entity, data (information) structure / acquisition method / destination / source / storage location, etc., used in the above embodiment are given as examples for the purpose of providing a concrete explanation, and are not intended to limit the scope to such examples.
[0170] Furthermore, some or all of the embodiments described above may be used in appropriate combinations. Also, some or all of the embodiments described above may be used selectively.
(Other embodiments)
The present invention can also be realized by supplying a program that implements one or more of the functions of the above embodiments to a system or device via a network or storage medium, and by having one or more processors in the computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.
[0171] The technical ideas derived from this disclosure are not limited to the exemplary embodiments disclosed, but are intended to encompass various modifications of the exemplary embodiments, or substitutions with equivalent structures or functions. The scope of the following claims should be interpreted in the broadest way to encompass all such modifications and equivalent structures and functions.
[0172] This application claims priority based on Japanese Patent Application No. 2024-229383, filed on December 25, 2024, and all of its contents are incorporated herein by reference.
Claims
1. An image encoding device for encoding image data in block units, comprising: prediction means for generating a predicted image using pixels from a picture different from the picture to which a block of interest of a predetermined size to be encoded in the image belongs, and generating a prediction error which is the difference between the block of interest and the predicted image; conversion means for frequency-converting the prediction error obtained by the prediction means to generate conversion coefficients; quantization means for quantizing the conversion coefficients obtained by the conversion means to generate quantized conversion coefficients; and encoding means for entropy-encoding the quantized conversion coefficients obtained by the quantization means, wherein the conversion means further comprises a first conversion means and a second conversion means, and the quantization means quantizes the conversion coefficients generated using only the second conversion means using a quantization matrix to generate quantized conversion coefficients, and quantizes the conversion coefficients generated using both the first conversion means and the second conversion means without using a quantization matrix to derive the quantized conversion coefficients.
2. The image encoding device according to claim 1, characterized in that the encoding means further encodes control information that controls whether the quantization means quantizes the conversion coefficients generated using both the first conversion means and the second conversion means using a quantization matrix or quantizes them without using a quantization matrix.
3. A control method for an image encoding device that encodes an image, comprising: a prediction step of generating a predicted image using pixels from a picture different from the picture to which a block of interest of a predetermined size to be encoded in the image belongs, and generating a prediction error which is the difference between the block of interest and the predicted image; a conversion step of frequency-converting the prediction error obtained in the prediction step to generate conversion coefficients; a quantization step of quantizing the conversion coefficients obtained in the conversion step to generate quantized conversion coefficients; and an encoding step of entropy-encoding the quantized conversion coefficients obtained in the quantization step, wherein the conversion step further comprises a first conversion step and a second conversion step, and the quantization step is characterized in that it quantizes the conversion coefficients generated using only the second conversion step using a quantization matrix to generate quantized conversion coefficients, and quantizes the conversion coefficients generated using both the first conversion step and the second conversion step without using a quantization matrix to derive the quantized conversion coefficients.
4. An image decoding device for decoding image data in block units, comprising: decoding means for decoding quantized conversion coefficients; inverse quantization means for inversely quantizing the quantized conversion coefficients in order to derive the conversion coefficients; inverse transformation means for inversely transforming the conversion coefficients in order to derive a prediction error; and prediction means for generating a prediction image and reconstructing a block of interest using the prediction image and the prediction error, wherein the prediction means generates the prediction image using pixels from a picture different from the picture to which the block to be decoded belongs; the inverse transformation means further comprises a first inverse transformation means and a second inverse transformation means; the inverse quantization means derives the conversion coefficients by inversely quantizing the quantized conversion coefficients of a block from which the prediction error is derived using only the second inverse transformation means, using a quantization matrix; and derives the conversion coefficients by inversely quantizing the quantized conversion coefficients of a block from which the prediction error is derived using both the first inverse transformation means and the second inverse transformation means, without using a quantization matrix.
5. The image decoding device according to claim 4, characterized in that the decoding means further decodes control information that controls whether the inverse quantization means derives the conversion coefficients by inversely quantizing, using a quantization matrix, the quantized conversion coefficients of a block from which the prediction error is derived using both the first inverse transformation means and the second inverse transformation means, or derives the conversion coefficients by inversely quantizing those quantized conversion coefficients without using a quantization matrix.
6. A control method for an image decoding device that decodes image data in block units, comprising: a decoding step of decoding quantized conversion coefficients; an inverse quantization step of inversely quantizing the quantized conversion coefficients in order to derive the conversion coefficients; an inverse transformation step of inversely transforming the conversion coefficients in order to derive a prediction error; and a prediction step of generating a prediction image and reconstructing a block of interest using the prediction image and the prediction error, wherein the prediction step generates the prediction image using pixels from a picture different from the picture to which the block to be decoded belongs; the inverse transformation step further comprises a first inverse transformation step and a second inverse transformation step; the inverse quantization step derives the conversion coefficients by inversely quantizing the quantized conversion coefficients of a block from which the prediction error is derived using only the second inverse transformation step, using a quantization matrix; and derives the conversion coefficients by inversely quantizing the quantized conversion coefficients of a block from which the prediction error is derived using both the first inverse transformation step and the second inverse transformation step, without using a quantization matrix.
7. A computer program for causing a computer to function as each of the means of the image encoding device according to claim 1 or 2.
8. A computer program for causing a computer to function as each of the means of the image decoding device according to claim 4 or 5.