Video encoding / decoding method and apparatus
The CIIP mode enhances video encoding/decoding efficiency by generating predicted blocks through weighted averaging of intra- and inter-prediction signals, addressing the inefficiencies in existing technologies with increasing video data.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- HYUNDAI MOTOR CO LTD
- Filing Date
- 2022-06-30
- Publication Date
- 2026-06-19
AI Technical Summary
Existing video compression technologies struggle to efficiently handle increasing video size, resolution, and frame rate, necessitating more efficient compression methods with improved image quality.
A video encoding/decoding method and apparatus utilizing Combined Inter/Intra Prediction (CIIP) mode, which generates a predicted block by weighted-averaging intra- and inter-prediction signals, determining various intra-prediction modes and weighting values, and transmitting these values to enhance encoding/decoding efficiency.
Improves video encoding/decoding efficiency by generating and transmitting predicted blocks effectively, thereby addressing the challenges of increasing video data volumes.
Abstract
Description
[Technical Field]
[0001] The present invention relates to a video encoding / decoding method and apparatus, and more particularly to a video encoding / decoding method and apparatus that generates a predicted block of the current block using a combined inter / intra prediction (CIIP) mode.
[Background Art]
[0002] The content described below merely provides background information related to this embodiment and does not constitute prior art.
[0003] Because video data contains a much larger amount of data than audio or still image data, storing or transmitting it without compression requires significant hardware resources, including memory.
[0004] Therefore, when storing or transmitting video data, an encoder is typically used to compress the video data before storage or transmission, and a decoder receives the compressed video data, decompresses it, and plays it back. Such video compression technologies include H.264 / AVC and HEVC (High Efficiency Video Coding), as well as VVC (Versatile Video Coding), which improves encoding efficiency by more than 30% compared to HEVC.
[0005] However, as video size, resolution, and frame rate gradually increase, the amount of data to be encoded also increases accordingly. Therefore, there is a need for new compression technologies that are more efficient than conventional compression technologies and offer greater image quality improvement.
[0006] The Combined Inter / Intra Prediction (CIIP) mode is a method of generating a predicted block of a current block by weighted-averaging an intra-prediction signal and an inter-prediction signal. When performing combined inter / intra prediction, it is necessary to use various intra-prediction modes and various weighting values.
[Summary of the Invention]
[Problems to be Solved by the Invention]
[0007] An object of the present disclosure is to provide a method and apparatus for generating a predicted block of a current block based on the Combined Inter / Intra Prediction (CIIP) mode.
[0008] Furthermore, an object of the present disclosure is to provide a method and apparatus for determining various intra-prediction modes in the combined inter / intra prediction mode.
[0009] In addition, an object of the present disclosure is to provide a method and apparatus for determining various weighting values in the combined inter / intra prediction mode.
[0010] Furthermore, an object of the present disclosure is to provide a method and apparatus for transmitting various weighting values in the combined inter / intra prediction mode.
[0011] In addition, an object of the present disclosure is to provide a method and apparatus for improving video encoding / decoding efficiency.
[0012] Furthermore, an object of the present disclosure is to provide a recording medium storing a bitstream generated by the video encoding / decoding method or apparatus of the present disclosure.
[0013] In addition, an object of the present disclosure is to provide a method and apparatus for transmitting a bitstream generated by the video encoding / decoding method or apparatus of the present disclosure.
[Means for Solving the Problems]
[0014] The video decoding method according to this disclosure includes the steps of: generating an inter-prediction block for the current block based on a reference block present in the reference picture corresponding to the current block; generating an intra-prediction block for the current block based on the reference block and a first reference pixel adjacent to the reference block; deriving weight values to be assigned to the inter-prediction block and the intra-prediction block, based on the current block not being used in distortion calculation; and generating a CIIP (Combined Inter / Intra Prediction) prediction block for the current block based on the weight values, the inter-prediction block and the intra-prediction block.
[0015] In the video decoding method according to the present disclosure, the step of generating an intra-prediction block for the current block includes the step of generating a first intra-prediction block based on a first reference pixel adjacent to the reference block, the step of deriving an intra-prediction mode based on the distortion between the reference block and the first intra-prediction block, and the step of generating an intra-prediction block based on the intra-prediction mode and a second reference pixel adjacent to the current block.
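The mode derivation described in this paragraph can be sketched as follows. This is a minimal illustration only, under stated assumptions: the three toy modes (`DC`, `VER`, `HOR`), the use of the sum of absolute differences as the distortion measure, and all function names are hypothetical and are not taken from the disclosure.

```python
def predict(mode, top, left, w, h):
    """Toy intra predictor with three illustrative modes (not the full mode set)."""
    if mode == "DC":
        dc = (sum(top[:w]) + sum(left[:h]) + (w + h) // 2) // (w + h)
        return [[dc] * w for _ in range(h)]
    if mode == "VER":  # each column repeats the reference pixel above it
        return [list(top[:w]) for _ in range(h)]
    if mode == "HOR":  # each row repeats the reference pixel to its left
        return [[left[y]] * w for y in range(h)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def derive_intra_mode(ref_block, ref_top, ref_left, modes=("DC", "VER", "HOR")):
    """Pick the mode whose prediction from the reference block's neighbours
    (the first reference pixels) is closest to the reference block itself."""
    h, w = len(ref_block), len(ref_block[0])
    return min(modes, key=lambda m: sad(ref_block, predict(m, ref_top, ref_left, w, h)))


def intra_block_for_current(ref_block, ref_top, ref_left, cur_top, cur_left):
    """Apply the derived mode to the current block's own neighbours
    (the second reference pixels)."""
    h, w = len(ref_block), len(ref_block[0])
    mode = derive_intra_mode(ref_block, ref_top, ref_left)
    return mode, predict(mode, cur_top, cur_left, w, h)
```

Because the derivation uses only the reference block and already reconstructed neighbouring pixels, a decoder could repeat it without the mode being signaled.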
[0016] In the video decoding method relating to this disclosure, the weighting value is derived based on the intra-predictive coding and inter-predictive coding of the surrounding blocks adjacent to the current block.
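As an illustration of a neighbour-based weight and the weighted average that forms the CIIP prediction block, the sketch below follows the rule used by CIIP in VVC (intra weight 1, 2, or 3 out of 4, depending on how many of the top and left neighbours are intra-coded). That rule is an assumption borrowed for illustration; the disclosure does not state the exact mapping at this point, and the function names are hypothetical.

```python
def ciip_intra_weight(top_is_intra, left_is_intra):
    """VVC-style rule (assumed here): more intra-coded neighbours -> larger intra weight."""
    return 1 + int(top_is_intra) + int(left_is_intra)  # 1, 2, or 3 out of 4


def ciip_blend(inter_block, intra_block, w_intra, total=4):
    """Weighted average of the inter- and intra-prediction blocks with rounding."""
    w_inter = total - w_intra
    rnd = total // 2
    return [
        [(w_inter * p_inter + w_intra * p_intra + rnd) // total
         for p_inter, p_intra in zip(row_inter, row_intra)]
        for row_inter, row_intra in zip(inter_block, intra_block)
    ]


# Usage: one intra-coded neighbour gives equal weights (2 and 2 out of 4).
w = ciip_intra_weight(top_is_intra=True, left_is_intra=False)
blended = ciip_blend([[100, 100]], [[60, 20]], w)
```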
[0017] In the video decoding method relating to this disclosure, the step of deriving weight values to be assigned to the inter-prediction block and the intra-prediction block includes the step of deriving the distortion of the inter-prediction signal, the step of deriving the distortion of the intra-prediction signal, and the step of deriving the weight values based on the distortion of the inter-prediction signal and the distortion of the intra-prediction signal.
[0018] In the video decoding method according to this disclosure, the distortion of the inter-prediction signal is derived based on the difference between a second reference pixel adjacent to the current block and a first reference pixel adjacent to the reference block.
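A minimal sketch of this template-based distortion follows. The sum of absolute differences is assumed as the distortion measure. The mapping from the two distortions to weights (`weights_from_distortion`) is a purely hypothetical inverse-proportional rule added for illustration; the disclosure does not specify it here.

```python
def template_sad(cur_top, cur_left, ref_top, ref_left):
    """Inter-prediction distortion estimated from neighbouring pixels only:
    pixels adjacent to the current block vs. pixels adjacent to the reference block."""
    return (sum(abs(a - b) for a, b in zip(cur_top, ref_top))
            + sum(abs(a - b) for a, b in zip(cur_left, ref_left)))


def weights_from_distortion(d_inter, d_intra, total=4):
    """Hypothetical rule: the signal with the larger distortion gets the smaller weight.
    Returns (w_inter, w_intra) with w_inter + w_intra == total."""
    if d_inter + d_intra == 0:
        w_intra = total // 2
    else:
        denom = d_inter + d_intra
        w_intra = (total * d_inter + denom // 2) // denom
    w_intra = min(max(w_intra, 1), total - 1)  # keep both signals in the blend
    return total - w_intra, w_intra
```

Both inputs are available to the decoder before the current block is reconstructed, which is consistent with deriving the weights without using the current block in the distortion calculation.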
[0019] In the video decoding method according to this disclosure, the step of deriving the distortion of the intra-prediction signal includes the step of generating a second intra-prediction block based on the intra-prediction mode and a first reference pixel adjacent to the reference block, and the step of deriving the distortion of the intra-prediction signal based on the distortion between the reference block and the second intra-prediction block.
[0020] In the video decoding method according to this disclosure, the step of deriving the distortion of the intra-prediction signal includes the step of generating a third intra-prediction block based on the intra-prediction mode and a second reference pixel adjacent to the current block, and the step of deriving the distortion of the intra-prediction signal based on the distortion between the reference block and the third intra-prediction block.
[0021] In the video decoding method according to this disclosure, the step of deriving the distortion of the intra-prediction signal includes the step of generating a fourth intra-prediction block based on a planar mode and a second reference pixel adjacent to the current block, and the step of deriving the distortion of the intra-prediction signal based on the distortion between the reference block and the fourth intra-prediction block.
[0022] The video decoding method according to this disclosure further includes the steps of obtaining weights to be assigned to the inter-prediction block and the intra-prediction block, based on using the current block for distortion calculation, and generating a CIIP prediction block of the current block based on the weights, the inter-prediction block and the intra-prediction block, wherein the weights are obtained based on index information mapped to the weights.
[0023] In the video decoding method relating to this disclosure, the weighted value is derived based on at least one of the intra-prediction error distribution and the inter-prediction error distribution.
[0024] The video encoding method according to this disclosure includes the steps of: generating an inter-prediction block for the current block based on a reference block present in a reference picture corresponding to the current block; generating an intra-prediction block for the current block based on the reference block and a first reference pixel adjacent to the reference block; determining weight values to be assigned to the inter-prediction block and the intra-prediction block; and generating a CIIP prediction block for the current block based on the weight values, the inter-prediction block and the intra-prediction block.
[0025] In the video coding method according to the present disclosure, the step of generating an intra-prediction block for the current block includes the step of generating a first intra-prediction block based on a first reference pixel adjacent to the reference block, the step of determining an intra-prediction mode based on the distortion of the reference block and the first intra-prediction block, and the step of generating an intra-prediction block based on the intra-prediction mode and a second reference pixel adjacent to the current block.
[0026] In the video coding method relating to this disclosure, the weighting value is determined based on the intra-predictive coding and inter-predictive coding of the surrounding blocks adjacent to the current block.
[0027] In the video coding method relating to this disclosure, the step of determining the weight values to be assigned to the inter-prediction block and the intra-prediction block includes the step of determining the distortion of the inter-prediction signal, the step of determining the distortion of the intra-prediction signal, and the step of determining the weight values based on the distortion of the inter-prediction signal and the distortion of the intra-prediction signal.
[0028] In the video coding method according to the present disclosure, the distortion of the inter-prediction signal is determined on the basis of the difference between a second reference pixel adjacent to the current block and a first reference pixel adjacent to the reference block, and the step of determining the distortion of the intra-prediction signal includes the step of generating a second intra-prediction block on the basis of the intra-prediction mode and the first reference pixel adjacent to the reference block, and the step of determining the distortion of the intra-prediction signal on the basis of the distortion between the reference block and the second intra-prediction block.
[0029] In the video coding method according to this disclosure, the step of determining the distortion of the intra-prediction signal includes the step of generating a third intra-prediction block based on the intra-prediction mode and a second reference pixel adjacent to the current block, and the step of determining the distortion of the intra-prediction signal based on the distortion of the reference block and the third intra-prediction block.
[0030] In the video coding method according to the present disclosure, the step of determining the distortion of the intra-prediction signal includes the step of generating a fourth intra-prediction block based on a planar mode and a second reference pixel adjacent to the current block, and the step of determining the distortion of the intra-prediction signal based on the distortion of the reference block and the fourth intra-prediction block.
[0031] In the video coding method according to the present disclosure, the distortion of the inter-prediction signal is determined on the basis of the difference between the current block and the reference block, and the step of determining the distortion of the intra-prediction signal includes the step of generating a third intra-prediction block on the basis of the intra-prediction mode and a second reference pixel adjacent to the current block, and the step of determining the distortion of the intra-prediction signal on the basis of the distortion between the current block and the third intra-prediction block.
[0032] In the video coding method according to the present disclosure, the step of determining the distortion of the intra-prediction signal includes the step of generating a fourth intra-prediction block based on a planar mode and a second reference pixel adjacent to the current block, and the step of determining the distortion of the intra-prediction signal based on the distortion of the current block and the fourth intra-prediction block.
[0033] The video encoding method relating to this disclosure further includes the step of encoding an index mapped to the weighted values.
[0034] Furthermore, this disclosure provides a method for transmitting a bitstream generated by a video encoding method or apparatus relating to this disclosure.
[0035] Furthermore, this disclosure provides a recording medium that stores a bitstream generated by a video encoding method or apparatus relating to this disclosure.
[0036] Furthermore, according to this disclosure, a recording medium is provided which stores a bitstream that is received and decoded by the video decoding device related to this disclosure and used for image restoration.
[Effects of the Invention]
[0037] According to this disclosure, a method and apparatus for generating a predicted block for the current block based on the combined inter / intra prediction (CIIP) mode are provided.
[0038] Furthermore, this disclosure provides a method and apparatus for determining various intra-prediction modes in the combined inter / intra prediction mode.
[0039] Furthermore, this disclosure provides a method and apparatus for determining various weighting values in the combined inter / intra prediction mode.
[0040] Furthermore, this disclosure provides a method and apparatus for transmitting various weighting values in the combined inter / intra prediction mode.
[0041] Furthermore, this disclosure provides a method and apparatus for improving video encoding / decoding efficiency.
[0042] The effects derived from this disclosure are not limited to those mentioned above, and other effects not mentioned above will be clearly understood by a person with ordinary skill in the art to which this disclosure pertains from the following description.
[Brief Description of the Drawings]
[0043]
[Figure 1] An illustrative block diagram of a video encoding device that can embody the technology of this disclosure.
[Figure 2] A diagram illustrating how a block is divided using the QTBTTT structure.
[Figure 3a] A figure showing multiple intra-prediction modes, including wide-angle intra-prediction modes.
[Figure 3b] A figure showing multiple intra-prediction modes, including wide-angle intra-prediction modes.
[Figure 4] An illustrative diagram showing the surrounding blocks of the current block.
[Figure 5] An illustrative block diagram of a video decoding device that can embody the technology of this disclosure.
[Figure 6] A figure illustrating a method for generating a predicted block for the current block in the Combined Inter / Intra Prediction (CIIP) mode according to one embodiment of the present disclosure.
[Figure 7] A figure illustrating surrounding blocks referenced to determine weighting values in the combined inter / intra prediction mode according to one embodiment of the present disclosure.
[Figure 8] A figure illustrating a method for determining weighting values in the combined inter / intra prediction mode according to one embodiment of the present disclosure.
[Figure 9] A figure illustrating a method for using various intra-prediction modes in the combined inter / intra prediction mode according to one embodiment of the present disclosure.
[Figure 10] A figure illustrating the process of determining various intra-prediction modes in the combined inter / intra prediction mode according to one embodiment of the present disclosure.
[Figure 11] A figure illustrating surrounding blocks referenced to determine weighting values in the combined inter / intra prediction mode according to another embodiment of the present disclosure.
[Figure 12] A figure illustrating a method for determining weighting values in the combined inter / intra prediction mode according to another embodiment of the present disclosure.
[Figure 13] A figure illustrating a method for determining weighting values in the combined inter / intra prediction mode according to yet another embodiment of the present disclosure.
[Figure 14] A figure illustrating the weighting values by index according to another embodiment of the present disclosure.
[Figure 15] A figure illustrating a method for assigning a fixed-length code to a weighting value index according to one embodiment of the present disclosure.
[Figure 16] A figure illustrating a method for assigning a phased-in code to a weighting value index according to one embodiment of the present disclosure.
[Figure 17] A figure illustrating a method for assigning a variable-length code to a weighting value index according to one embodiment of the present disclosure.
[Figure 18a] A figure illustrating the error distribution of inter-prediction and intra-prediction according to one embodiment of the present disclosure.
[Figure 18b] A figure illustrating the error distribution of inter-prediction and intra-prediction according to one embodiment of the present disclosure.
[Figure 19a] A figure illustrating the weighting values for intra-prediction and inter-prediction of an 8x8 block according to one embodiment of the present disclosure.
[Figure 19b] A figure illustrating the weighting values for intra-prediction and inter-prediction of an 8x8 block according to one embodiment of the present disclosure.
[Figure 19c] A figure illustrating the weighting values for intra-prediction of an 8x8 block according to another embodiment of the present disclosure.
[Figure 19d] A figure illustrating the weighting values for intra-prediction of an 8x8 block according to another embodiment of the present disclosure.
[Figure 20] A diagram illustrating a video decoding process according to one embodiment of the present disclosure.
[Figure 21] A diagram illustrating a video encoding process according to one embodiment of the present disclosure.
[Modes for Carrying Out the Invention]
[0044] Hereinafter, embodiments of the present invention will be described in detail with reference to illustrative drawings. When assigning reference numerals to the components in each drawing, it should be noted that identical components will have the same reference numeral whenever possible, even if they are shown in other drawings. In describing these embodiments, if a specific description of a related known configuration or function would obscure the gist of these embodiments, such a detailed description will be omitted.
[0045] Figure 1 is an illustrative block diagram of a video encoding device embodying the technology of this disclosure. The video encoding device and its sub-configurations will be described below with reference to Figure 1.
[0046] The video encoding device is configured to include a picture splitting unit 110, a prediction unit 120, a subtractor 130, a transform unit 140, a quantization unit 145, a reordering unit 150, an entropy encoding unit 155, an inverse quantization unit 160, an inverse transform unit 165, an adder 170, a loop filter unit 180, and a memory 190.
[0047] Each component of the video encoding device may be embodied in hardware or software, or in a combination of hardware and software. Furthermore, the function of each component may be embodied in software, and a microprocessor may be configured to execute the software function corresponding to each component.
[0048] A single video consists of one or more sequences containing multiple pictures. Each picture is divided into multiple regions, and encoding is performed on each region. For example, a single picture is divided into one or more tiles and / or slices. Here, one or more tiles are defined as a tile group. Each tile and / or slice is divided into one or more Coding Tree Units (CTUs). Each CTU is then divided into one or more Coding Units (CUs) by a tree structure. Information applicable to each CU is encoded as the CU syntax, and information applicable to all CUs contained within a single CTU is encoded as the CTU syntax. Furthermore, information applicable to all blocks within a slice is encoded as the slice header syntax, and information applicable to all blocks constituting one or more pictures is encoded in the Picture Parameter Set (PPS) or picture header. In addition, information commonly referenced by multiple pictures is encoded in the Sequence Parameter Set (SPS). Information that one or more SPSs refer to in common is encoded in a Video Parameter Set (VPS). Furthermore, information that applies in common to a single tile or tile group may be encoded as the syntax of the tile or tile group header. The syntax contained in the SPS, PPS, slice header, tile, or tile group header is referred to as high-level syntax.
[0049] The picture splitting unit 110 determines the size of the Coding Tree Unit (CTU). Information regarding the size of the CTU (CTU size) is encoded as SPS or PPS syntax and transmitted to the video decoding device.
[0050] The picture splitting unit 110 divides each picture constituting the video into multiple Coding Tree Units (CTUs) of a predetermined size, and then recursively divides each CTU using a tree structure. In the tree structure, the leaf nodes become the basic units of encoding, which are called Coding Units (CUs).
[0051] Tree structures include the quad tree (QT), in which a parent node is divided into four child nodes of the same size; the binary tree (BT), in which a parent node is divided into two child nodes; the ternary tree (TT), in which a parent node is divided into three child nodes in a 1:2:1 ratio; and structures that combine two or more of the QT, BT, and TT structures. For example, a QTBT (Quad Tree plus Binary Tree) structure may be used, or a QTBTTT (Quad Tree plus Binary Tree Ternary Tree) structure may be used. Here, BT and TT are collectively called MTT (Multiple-Type Tree).
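The geometry of the three split types can be sketched as follows. The function name `split` and the labels `BT_VER`, `BT_HOR`, `TT_VER`, and `TT_HOR` are hypothetical; here `VER` means the block is cut by vertical lines (producing side-by-side children) and `HOR` by horizontal lines.

```python
def split(x, y, w, h, kind):
    """Return child blocks (x, y, width, height) for one split of a parent block."""
    if kind == "QT":  # four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if kind == "BT_VER":  # two equal halves, side by side
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if kind == "BT_HOR":  # two equal halves, stacked
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if kind == "TT_VER":  # three parts in a 1:2:1 width ratio
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if kind == "TT_HOR":  # three parts in a 1:2:1 height ratio
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split type: {kind}")


# Usage: a 32x32 block split by TT_VER yields widths 8, 16, 8.
children = split(0, 0, 32, 32, "TT_VER")
```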
[0052] Figure 2 is a diagram illustrating how to divide a block using the QTBTTT structure.
[0053] As shown in Figure 2, the CTU is initially split in a QT structure. Quad-tree splitting is repeated until the size of the split block reaches MinQTSize, the minimum block size of a leaf node allowed in the QT. A first flag, QT_split_flag, which indicates whether each node in the QT structure is split into four lower-layer nodes, is encoded by the entropy encoding unit 155 and signaled to the video decoding device. If a leaf node of the QT is not larger than the maximum block size of a root node allowed in the BT, it is further split in one or more BT or TT structures. In the BT and / or TT structures, there are multiple splitting directions. For example, there are two directions in which the block of the node in question is split: horizontal and vertical. As shown in Figure 2, when MTT splitting is initiated, a second flag, mtt_split_flag, indicating whether or not a node has been split, and, if split, additional flags indicating the splitting direction (vertical or horizontal) and / or the splitting type (binary or ternary), are encoded by the entropy encoding unit 155 and signaled to the video decoding device.
[0054] Alternatively, before encoding the first flag QT_split_flag, which indicates whether each node will be split into four lower layer nodes, the CU split flag split_cu_flag, which indicates whether the node will be split, may be encoded. If the value of the CU split flag split_cu_flag indicates that the node will not be split, the block of the node becomes a leaf node in the split tree structure and becomes a coding unit (CU), which is the basic unit of encoding. If the value of the CU split flag split_cu_flag indicates that the node will be split, the video encoding device starts encoding from the first flag in the manner described above.
[0055] When QTBT is used as another example of a tree structure, there are two types: one in which the block of the node is divided horizontally into two blocks of the same size (i.e., symmetric horizontal splitting) and another in which it is divided vertically (i.e., symmetric vertical splitting). A splitting flag, split_flag, indicating whether each node in the BT structure is to be split into a lower layer block, and splitting type information indicating the type of splitting, are encoded by the entropy encoding unit 155 and transmitted to the video decoding device. On the other hand, there may be an additional type in which the block of the node is divided into two blocks in an asymmetrical manner. The asymmetrical form may include a form in which the block of the node is divided into two rectangular blocks having a size ratio of 1:3, or a form in which the block of the node is divided diagonally.
[0056] CUs can have various sizes depending on the QTBT or QTBTTT partitioning from the CTU. Hereafter, the block corresponding to the CU to be encoded or decoded (i.e., a leaf node in the QTBTTT) will be referred to as the "current block." Depending on the QTBTTT partitioning used, the shape of the current block may be a rectangle as well as a square.
[0057] The prediction unit 120 predicts the current block and generates a predicted block. The prediction unit 120 includes an intra-prediction unit 122 and an inter-prediction unit 124.
[0058] Generally, each current block in a picture is predictively coded. The current block is predicted using an intra-prediction technique (using data from the picture containing the current block) or an inter-prediction technique (using data from pictures coded before the picture containing the current block). Inter-prediction includes both uni-directional prediction and bi-directional prediction.
[0059] The intra-prediction unit 122 predicts pixels within the current block using pixels (reference pixels) located around the current block within the current picture that contains the current block. Multiple intra-prediction modes exist depending on the prediction direction. For example, as shown in Figure 3a, the multiple intra-prediction modes include two non-directional modes, including the planar mode and the DC mode, and 65 directional modes. Each prediction mode is defined to use different surrounding pixels and calculation formulas.
[0060] For efficient directional prediction of rectangular current blocks, additional directional modes (intra-prediction modes 67 to 80 and -1 to -14) are used, as shown by the dashed arrows in Figure 3b. These are referred to as "wide-angle intra-prediction modes." In Figure 3b, the arrows point to the corresponding reference samples used for prediction, not to the prediction direction. The prediction direction is opposite to the direction indicated by the arrow. When the current block is rectangular, a wide-angle intra-prediction mode performs prediction in the direction opposite to a specific directional mode without additional bit transmission. Which of the wide-angle intra-prediction modes are available for the current block is determined by the ratio of the width to the height of the rectangular current block. For example, wide-angle intra-prediction modes with angles smaller than 45 degrees (intra-prediction modes 67 to 80) are available when the height of the current block is smaller than its width, and wide-angle intra-prediction modes with angles greater than -135 degrees (intra-prediction modes -1 to -14) are available when the height of the current block is larger than its width.
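The dependence of the wide-angle range on block shape can be sketched as below. This is a simplification under stated assumptions: it follows the VVC convention that wide blocks use modes above 66 and tall blocks use modes below 0, and it returns the whole candidate range, whereas the number of modes actually available also depends on the width-to-height ratio. The function name is hypothetical.

```python
def wide_angle_candidates(width, height):
    """Candidate wide-angle intra-prediction mode numbers for a block shape."""
    if width > height:   # wide block: modes 67 to 80
        return list(range(67, 81))
    if height > width:   # tall block: modes -1 to -14
        return list(range(-1, -15, -1))
    return []            # square block: no wide-angle modes
```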
[0061] The intra-prediction unit 122 determines the intra-prediction mode to use to encode the current block. In some examples, the intra-prediction unit 122 may encode the current block using various intra-prediction modes and select the appropriate intra-prediction mode from the tested modes. For example, the intra-prediction unit 122 may calculate rate-distortion values using rate-distortion analysis for the various tested intra-prediction modes and select the intra-prediction mode with the best rate-distortion characteristics among the tested modes.
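A common way to express such a selection is the Lagrangian cost J = D + λ·R, where D is the distortion and R is the number of bits. The text does not name a cost function, so this formulation, the dictionary keys, and the numbers in the usage lines are assumptions for illustration only.

```python
def select_mode(candidates, lam):
    """Return the tested mode with the smallest cost J = distortion + lam * bits."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["bits"])


# Usage with made-up measurements for three tested modes.
tested = [
    {"mode": "planar", "distortion": 100, "bits": 10},
    {"mode": "dc", "distortion": 120, "bits": 4},
    {"mode": "angular_34", "distortion": 80, "bits": 30},
]
best_low_lambda = select_mode(tested, lam=2)    # costs 120, 128, 140
best_high_lambda = select_mode(tested, lam=10)  # costs 200, 160, 380
```

A larger λ penalizes bits more heavily, so the selection shifts toward cheaper modes.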
[0062] The intra-prediction unit 122 selects one intra-prediction mode from among several intra-prediction modes and predicts the current block using the surrounding pixels (reference pixels) and calculation formula determined by the selected intra-prediction mode. Information regarding the selected intra-prediction mode is encoded by the entropy encoding unit 155 and transmitted to the video decoding device.
[0063] The inter-prediction unit 124 generates a predicted block for the current block using a motion compensation process. The inter-prediction unit 124 searches for the block most similar to the current block in a reference picture that has been encoded and decoded before the current picture, and generates a predicted block for the current block using the found block. It then generates a motion vector (MV) corresponding to the displacement between the current block in the current picture and the predicted block in the reference picture. Generally, motion estimation is performed on the luma component, and the motion vector calculated based on the luma component is used for both the luma and chroma components. Motion information, including information about the reference picture used to predict the current block and information about the motion vector, is encoded by the entropy encoding unit 155 and transmitted to the video decoding device.
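The search for the most similar block can be sketched as an exhaustive integer-sample block-matching search. The similarity measure (sum of absolute differences), the full-search strategy, and the function names are assumptions for illustration; practical encoders typically use faster search patterns.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n current block at (bx, by) and the reference block
    displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def motion_search(cur, ref, bx, by, n, search_range):
    """Full search over integer displacements; returns ((dx, dy), cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block would fall outside the reference picture
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```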
[0064] The inter-prediction unit 124 may perform interpolation on a reference picture or reference block to improve the accuracy of the prediction. That is, a subsample between two consecutive integer samples is interpolated by applying filter coefficients to a plurality of consecutive integer samples that include those two integer samples. When the interpolated reference picture is used to search for the block most similar to the current block, the motion vector is expressed with fractional-sample precision rather than integer-sample precision. The precision or resolution of the motion vector may be set differently for each unit of the target area to be encoded, such as a slice, tile, CTU, or CU. When such Adaptive Motion Vector Resolution (AMVR) is applied, information about the motion vector resolution applied to each target area must be signaled for each target area. For example, if the target area is a CU, information about the motion vector resolution applied to each CU is signaled. The information about the motion vector resolution is information indicating the precision of the differential motion vector, which will be described later.
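As an illustration of interpolating a subsample from consecutive integer samples, the sketch below applies the 8-tap half-sample luma filter of HEVC to one row of samples. The text does not specify filter coefficients, so the choice of this filter, the edge padding by clamping, and the function name are assumptions.

```python
HALF_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)  # HEVC half-sample luma taps, sum = 64


def half_sample(row, i, bit_depth=8):
    """Interpolate the sample halfway between row[i] and row[i + 1]."""
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(HALF_TAPS):
        pos = min(max(i - 3 + k, 0), last)  # clamp index: repeat edge samples
        acc += tap * row[pos]
    value = (acc + 32) >> 6                 # divide by 64 with rounding
    return min(max(value, 0), (1 << bit_depth) - 1)
```

On a flat row the filter returns the input value, and on a linear ramp it returns the midpoint of the two neighbouring samples (rounded down by the integer shift).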
[0065] On the other hand, the inter-prediction unit 124 performs inter-prediction using bi-prediction. In the case of bi-prediction, two reference pictures and two motion vectors representing the block position most similar to the current block within each reference picture are used. The inter-prediction unit 124 selects a first reference picture and a second reference picture from reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1), respectively, and searches for a block similar to the current block within each reference picture to generate a first reference block and a second reference block. Then, it generates a predicted block for the current block by averaging or weighting the first and second reference blocks. Finally, it transmits motion information, including information about the two reference pictures used to predict the current block and information about the two motion vectors, to the encoding unit 150. Here, reference picture list 0 consists of previously restored pictures that are prior to the current picture in display order, and reference picture list 1 consists of previously restored pictures that are from the current picture onwards in display order. However, it is not necessarily limited to this, and previously restored pictures that are from the current picture onwards in display order may be added to reference picture list 0, and conversely, previously restored pictures that are prior to the current picture may be added to reference picture list 1.
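The averaging or weighting of the two reference blocks can be sketched as follows. The integer rounding and the default equal weights are assumptions for illustration, and the function name is hypothetical.

```python
def bi_predict(block0, block1, w0=1, w1=1):
    """Combine two reference blocks sample by sample with weights w0 and w1."""
    total = w0 + w1
    rnd = total // 2
    return [
        [(w0 * a + w1 * b + rnd) // total for a, b in zip(row0, row1)]
        for row0, row1 in zip(block0, block1)
    ]


# Usage: plain averaging, then an unequal weighting of 3:5.
averaged = bi_predict([[10, 20]], [[20, 21]])
weighted = bi_predict([[10, 20]], [[20, 21]], w0=3, w1=5)
```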
[0066] Various methods are used to minimize the number of bits required to encode motion information.
[0067] For example, if the reference picture and motion vector of the current block are the same as those of a surrounding block, the motion information of the current block can be transmitted to the video decoding device by encoding information that identifies that surrounding block. This method is called "merge mode".
[0068] In merge mode, the inter-prediction unit 124 selects a predetermined number of merge candidate blocks (hereinafter referred to as "merge candidates") from the surrounding blocks of the current block.
[0069] As surrounding blocks used to derive merge candidates, as shown in Figure 4, all or part of the left block A0, lower left block A1, upper block B0, upper right block B1, and upper left block A2 adjacent to the current block in the current picture are used. Furthermore, blocks located in a reference picture (which may or may not be the same as the reference picture used to predict the current block) rather than in the current picture in which the current block is located may be used as merge candidates. For example, a block in the reference picture that is in the same position as the current block (a co-located block) or a block adjacent to that co-located block may additionally be used as a merge candidate. If the number of merge candidates selected by the method described above is less than the predetermined number, zero vectors are added to the merge candidates.
[0070] The inter-prediction unit 124 constructs a merge list containing a predetermined number of merge candidates using such surrounding blocks. From the merge candidates included in the merge list, it selects a merge candidate to be used as the motion information of the current block and generates merge index information to identify the selected candidate. The generated merge index information is encoded by the encoding unit 150 and transmitted to the video decoding device.
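The construction of the merge list can be illustrated with the following sketch. The candidate representation (a motion vector paired with a reference index), the list size of 6, and the removal of duplicate candidates are assumptions made for the example; only the padding with zero vectors comes from the description above.

```python
ZERO_CANDIDATE = ((0, 0), 0)  # zero motion vector, reference index 0 (assumed form)


def build_merge_list(spatial, temporal, max_candidates=6):
    """Build a merge list from spatial and temporal neighbor candidates.

    Each candidate is ((mv_x, mv_y), ref_idx), or None when the neighbor is
    unavailable or not inter-predicted. Duplicate pruning is an assumption.
    """
    merge_list = []
    for cand in list(spatial) + list(temporal):
        if len(merge_list) == max_candidates:
            break
        if cand is not None and cand not in merge_list:
            merge_list.append(cand)
    # Pad with zero vectors when too few candidates were found.
    while len(merge_list) < max_candidates:
        merge_list.append(ZERO_CANDIDATE)
    return merge_list
```

The encoder would then signal only the index of the selected entry, since the decoder can build the same list from already decoded blocks.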
[0071] The merge skip mode is a special case of the merge mode in which, when all transform coefficients to be entropy-coded are close to zero after quantization, only neighboring block selection information is transmitted without transmitting residual signals. By using the merge skip mode, relatively high coding efficiency can be achieved for videos with little motion, still images, and screen content.
[0072] Hereafter, merge mode and merge skip mode will be collectively referred to as merge / skip mode.
[0073] Another method for encoding motion information is AMVP (Advanced Motion Vector Prediction) mode.
[0074] In AMVP mode, the inter-prediction unit 124 uses the surrounding blocks of the current block to derive predicted motion vector candidates for the motion vector of the current block. The surrounding blocks used to derive predicted motion vector candidates include all or part of the left block A0, the lower left block A1, the upper block B0, the upper right block B1, and the upper left block A2 adjacent to the current block in the current picture shown in Figure 4. Furthermore, blocks located in a reference picture (which may be the same as or different from the reference picture used to predict the current block) rather than in the current picture in which the current block is located may be used as surrounding blocks to derive predicted motion vector candidates. For example, a block in the same position as the current block in the reference picture (a co-located block), or a block adjacent to that co-located block, may be used. If the number of motion vector candidates obtained by the method described above is less than a preset number, zero vectors are added to the motion vector candidates.
[0075] The inter-prediction unit 124 uses the motion vectors of the surrounding blocks to derive predicted motion vector candidates, and uses these candidates to determine the predicted motion vector for the current block's motion vector. It then subtracts the predicted motion vector from the current block's motion vector to calculate the differential motion vector.
[0076] The predicted motion vector may be obtained by applying a predefined function (e.g., median or mean calculation) to the predicted motion vector candidates. In this case, the video decoder also knows the predefined function. Furthermore, since the surrounding blocks used to derive the predicted motion vector candidates are already encoded and decoded blocks, the video decoder also already knows the motion vectors of those surrounding blocks. Therefore, the video encoder does not need to encode information to identify the predicted motion vector candidate. Consequently, in this case, information about the differential motion vector and information about the reference picture used to predict the current block are encoded.
[0077] Alternatively, the predicted motion vector may be determined by selecting one of the candidate predicted motion vectors. In this case, information to identify the selected candidate predicted motion vector is additionally encoded, along with information about the differential motion vector and the reference picture used to predict the current block.
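The variant in which one predicted motion vector candidate is selected and identified by an index can be sketched as follows. The selection criterion (smallest sum of absolute differential components) and the function names are assumptions; the disclosure only states that a candidate is selected and that its identifier and the differential motion vector are encoded.

```python
def encode_amvp(mv, candidates):
    """Pick a predictor and return (candidate index, differential motion vector)."""
    # Assumed criterion: the candidate giving the smallest differential vector.
    best_idx = min(
        range(len(candidates)),
        key=lambda i: abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1]),
    )
    mvp = candidates[best_idx]
    mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
    return best_idx, mvd


def decode_amvp(idx, mvd, candidates):
    """Reconstruct the motion vector from the signaled index and differential."""
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Because the decoder derives the same candidate list from already decoded blocks, the index and the differential motion vector are sufficient to recover the original motion vector.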
[0078] The subtractor 130 generates a residual block by subtracting the predicted block generated by the intra-prediction unit 122 or the inter-prediction unit 124 from the current block.
[0079] The transform unit 140 transforms the residual signals in the residual block, which have pixel values in the spatial domain, into transform coefficients in the frequency domain. The transform unit 140 may transform the residual signals in the residual block using the entire size of the residual block as the transform unit, or it may divide the residual block into a plurality of subblocks and transform using those subblocks as the transform unit. Alternatively, it may divide the residual block into two subblocks, a transform region and a non-transform region, and transform the residual signals using only the transform-region subblock as the transform unit. Here, the transform-region subblock may be one of two rectangular blocks having a size ratio of 1:1 based on the horizontal axis (or vertical axis). In this case, a flag cu_sbt_flag indicating that only the subblock was transformed, directional (vertical / horizontal) information cu_sbt_horizontal_flag, and / or position information cu_sbt_pos_flag are encoded by the entropy encoding unit 155 and signaled to the video decoding device. Furthermore, the transform-region subblock may have a size ratio of 1:3 based on the horizontal axis (or vertical axis). In such cases, a flag cu_sbt_quad_flag that distinguishes the relevant division is additionally encoded by the entropy encoding unit 155 and signaled to the video decoding device.
[0080] Meanwhile, the transform unit 140 performs transforms on the residual blocks separately in the horizontal and vertical directions. Various types of transform functions or transform matrices are used for the transforms. For example, a pair of transform functions for horizontal and vertical transforms is defined as an MTS (Multiple Transform Set). The transform unit 140 selects the transform function pair from the MTS that has the best transform efficiency and transforms the residual blocks in the horizontal and vertical directions, respectively. Information about the transform function pair selected from the MTS, mts_idx, is encoded by the entropy encoding unit 155 and signaled to the video decoding device.
[0081] The quantization unit 145 quantizes the transform coefficients output from the transform unit 140 using quantization parameters and outputs the quantized transform coefficients to the entropy coding unit 155. For some blocks or frames, the quantization unit 145 may quantize the associated residual block directly without a transform. The quantization unit 145 may apply different quantization coefficients (scaling values) depending on the position of the transform coefficients within the transform block. The quantization matrix applied to the two-dimensionally arranged quantized transform coefficients is encoded and signaled to the video decoding device.
[0082] The sorting unit 150 performs sorting of coefficient values for the quantized residual values.
[0083] The sorting unit 150 converts a two-dimensional coefficient array into a one-dimensional coefficient sequence using coefficient scanning. For example, the sorting unit 150 scans from the DC coefficient to the high-frequency region coefficients using a zig-zag scan or diagonal scan to output a one-dimensional coefficient sequence. Depending on the size of the transform unit and the intra-prediction mode, a vertical scan that scans the two-dimensional coefficient array in the column direction or a horizontal scan that scans the two-dimensional block-shaped coefficients in the row direction may be used instead of a zig-zag scan. In other words, the scanning method used may be determined from among zig-zag scan, diagonal scan, vertical scan, and horizontal scan depending on the size of the transform unit and the intra-prediction mode.
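The conversion between a two-dimensional coefficient array and a one-dimensional sequence can be illustrated with a zig-zag scan. The sketch below is a generic zig-zag for square blocks with hypothetical function names; the scan actually selected by the encoder depends on the transform unit size and intra-prediction mode as described above.

```python
def zigzag_order(n):
    """Return the (row, col) visiting order of a zig-zag scan of an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]
        if s % 2 == 0:
            diagonal.reverse()  # alternate the direction on every diagonal
        order.extend(diagonal)
    return order


def scan(block):
    """Flatten a 2-D coefficient block into a 1-D sequence (encoder side)."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def inverse_scan(seq, n):
    """Rebuild the 2-D coefficient block from the 1-D sequence (decoder side)."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(seq, zigzag_order(n)):
        block[r][c] = value
    return block
```

The scan starts at the DC coefficient in the top-left corner and ends at the highest-frequency coefficient, so trailing zeros after quantization tend to cluster at the end of the sequence.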
[0084] The entropy coding unit 155 generates a bitstream by coding the sequence of one-dimensional quantized transformation coefficients output from the sorting unit 150, using various coding schemes such as CABAC (Context-based Adaptive Binary Arithmetic Code) and Exponential Golomb.
[0085] Furthermore, the entropy coding unit 155 encodes information related to block partitioning, such as the CTU size, CU partitioning flag, QT partitioning flag, MTT partitioning type, and MTT partitioning direction, so that the video decoder can partition blocks in the same way as the video encoder. The entropy coding unit 155 also encodes information about the prediction type, indicating whether the current block was encoded by intra-prediction or inter-prediction, and encodes intra-prediction information (i.e., information about the intra-prediction mode) or inter-prediction information (information about the motion information encoding mode (merge mode or AMVP mode), the merge index in the case of merge mode, and the reference picture index and differential motion vector in the case of AMVP mode) depending on the prediction type. The entropy coding unit 155 also encodes information related to quantization, i.e., information about the quantization parameters and information about the quantization matrix.
[0086] The inverse quantization unit 160 inverse quantizes the quantized transform coefficients output from the quantization unit 145 to generate transform coefficients. The inverse transform unit 165 transforms the transform coefficients output from the inverse quantization unit 160 from the frequency domain to the spatial domain to restore the residual block.
[0087] The addition unit 170 adds the restored residual block and the predicted block generated by the prediction unit 120 to restore the current block. The pixels in the restored current block are used as reference pixels when intra-predicting the next block in order.
[0088] The loop filter section 180 performs filtering on the restored pixels to reduce blocking artifacts, ringing artifacts, blurring artifacts, etc., that occur due to block-based prediction and transformation / quantization. The filter section 180 includes all or part of a deblocking filter 182, a Sample Adaptive Offset (SAO) filter 184, and an Adaptive Loop Filter (ALF) 186 as in-loop filters.
[0089] The deblocking filter 182 filters the boundaries between restored blocks to remove blocking artifacts caused by block-level encoding / decoding, while the SAO filter 184 and the ALF 186 perform additional filtering on the deblocking-filtered image. The SAO filter 184 and the ALF 186 are used to compensate for the difference between restored pixels and original pixels caused by lossy coding. The SAO filter 184 improves not only subjective image quality but also encoding efficiency by applying an offset in CTU units. In contrast, the ALF 186 performs block-level filtering, compensating for distortion by applying different filters according to the edges and degree of change of the relevant block. Information regarding the filter coefficients used in the ALF is encoded and signaled to the video decoder.
[0090] The restored blocks filtered through the deblocking filter 182, the SAO filter 184, and the ALF 186 are stored in memory 190. Once all blocks in a picture have been restored, the restored picture is used as a reference picture to inter-predict the blocks in pictures to be encoded later.
[0091] Figure 5 is an exemplary block diagram of an image decoding device embodying the technology of this disclosure. The image decoding device and its sub-configurations will be described below with reference to Figure 5.
[0092] The video decoding device is configured to include an entropy decoding unit 510, a sorting unit 515, an inverse quantization unit 520, an inverse transform unit 530, a prediction unit 540, an adder 550, a loop filter unit 560, and a memory 570.
[0093] Similar to the video encoding device in Figure 1, each component of the video decoding device may be implemented in hardware or software, or in a combination of hardware and software. Furthermore, the function of each component may be implemented in software, and a microprocessor may be implemented to execute the software function corresponding to each component.
[0094] The entropy decoding unit 510 decodes the bitstream generated by the video encoding device and extracts information related to block division to determine the current block to be decoded, and extracts prediction information necessary to restore the current block, as well as information related to the residual signal.
[0095] The entropy decoding unit 510 extracts information about the CTU size from the SPS (Sequence Parameter Set) or PPS (Picture Parameter Set) to determine the size of the CTU and divides the picture into CTUs of the determined size. Then, it determines the CTU as the top layer of the tree structure, i.e., the root node, and extracts division information about the CTU to divide the CTU using the tree structure.
[0096] For example, when splitting a CTU using a QTBTTT structure, first, the first flag QT_split_flag associated with the QT split is extracted, and each node is split into four lower layer nodes. Then, for nodes corresponding to the QT leaf nodes, the second flag MTT_split_flag associated with the MTT split, along with split direction (vertical / horizontal) and / or split type (binary / ternary) information is extracted, and the corresponding leaf node is split into an MTT structure. This recursively splits each node below the QT leaf nodes into a BT or TT structure.
[0097] As another example, when splitting a CTU using a QTBTTT structure, a CU splitting flag, split_cu_flag, which indicates whether the CU is split, is extracted first. If the block is split, the first flag, QT_split_flag, is then extracted. During the splitting process, each node undergoes zero or more QT splits followed by zero or more MTT splits. For example, a CTU may undergo an MTT split immediately, or conversely, it may undergo only multiple QT splits.
[0098] As another example, when splitting a CTU using a QTBT structure, the first flag QT_split_flag associated with the splitting of QT is extracted, and each node is split into four lower layer nodes. Then, for nodes corresponding to the leaf nodes of QT, the split flag split_flag indicating whether or not to split further in BT and split direction information are extracted.
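How a node is divided once its split flags have been extracted can be sketched as below. The mode labels (`"QT"`, `"BT_VER"`, and so on), the function name `split_node`, and the 1:2:1 ratio used for the ternary split are assumptions for illustration; they are not syntax elements or values stated in this section.

```python
def split_node(x, y, w, h, mode):
    """Return the child rectangles (x, y, width, height) of one tree node."""
    if mode == "QT":  # quadtree: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_VER":  # binary split by a vertical line
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "BT_HOR":  # binary split by a horizontal line
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "TT_VER":  # ternary split, assumed 1:2:1 ratio
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "TT_HOR":
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")
```

A decoder would apply such a function recursively, first with QT splits from the CTU root and then with BT or TT splits below the QT leaf nodes.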
[0099] On the other hand, when the entropy decoding unit 510 determines the current block to be decoded using the division of the tree structure, it extracts information about the prediction type indicating whether the current block is intra-predicted or inter-predicted. If the prediction type information indicates intra-prediction, the entropy decoding unit 510 extracts syntax elements related to the intra-prediction information (intra-prediction mode) of the current block. If the prediction type information indicates inter-prediction, the entropy decoding unit 510 extracts syntax elements related to inter-prediction information, i.e., information representing the motion vector and the reference picture that the motion vector refers to.
[0100] Furthermore, the entropy decoding unit 510 extracts information related to quantization and information related to the residual signal, specifically information regarding the quantized transformation coefficients of the current block.
[0101] The sorting unit 515 converts the sequence of one-dimensional quantized transformation coefficients, which have been entropy-decoded by the entropy decoding unit 510, back into a two-dimensional coefficient array (i.e., a block) in the reverse order of the coefficient scanning performed by the video encoding device.
[0102] The inverse quantization unit 520 inversely quantizes the quantized transformation coefficients using quantization parameters. The inverse quantization unit 520 may apply different quantization coefficients (scaling values) to the two-dimensional array of quantized transformation coefficients. The inverse quantization unit 520 performs inverse quantization by applying a matrix of quantization coefficients (scaling values) from the video encoding device to the two-dimensional array of quantized transformation coefficients.
[0103] The inverse transform unit 530 generates a residual block for the current block by inversely transforming the inversely quantized transformation coefficients from the frequency domain to the spatial domain and restoring the residual signal.
[0104] Furthermore, when the inverse transform unit 530 inversely transforms only a portion of the transformation block (a subblock), it extracts a flag cu_sbt_flag indicating that only the subblock of the transformation block has been transformed, cu_sbt_horizontal_flag indicating the orientation (vertical / horizontal) of the subblock, and / or cu_sbt_pos_flag indicating the position of the subblock. It then reconstructs the residual signal by inversely transforming the transformation coefficients of the corresponding subblock from the frequency domain to the spatial domain, and generates the final residual block for the current block by filling the region that has not been inversely transformed with the value "0" as the residual signal.
[0105] Furthermore, when MTS is applied, the inverse transformation unit 530 uses the MTS information mts_idx signaled from the video encoding device to determine the transformation function or transformation matrix to be applied in the horizontal and vertical directions, respectively, and performs an inverse transformation on the transformation coefficients within the transformation block in the horizontal and vertical directions using the determined transformation function.
[0106] The prediction unit 540 includes an intra-prediction unit 542 and an inter-prediction unit 544. The intra-prediction unit 542 is activated when the prediction type of the current block is intra-prediction, and the inter-prediction unit 544 is activated when the prediction type of the current block is inter-prediction.
[0107] The intra-prediction unit 542 determines the intra-prediction mode for the current block from among multiple intra-prediction modes based on the syntax elements for the intra-prediction modes extracted from the entropy decoding unit 510, and predicts the current block using the reference pixels surrounding the current block according to the intra-prediction mode.
[0108] The inter-prediction unit 544 uses the syntax elements for the inter-prediction mode extracted from the entropy decoding unit 510 to determine the motion vector of the current block and the reference picture that the motion vector refers to, and then predicts the current block using the motion vector and the reference picture.
[0109] The adder 550 adds the residual block output from the inverse transform unit to the predicted block output from the inter-prediction unit or intra-prediction unit to restore the current block. The pixels in the restored current block are used as reference pixels when intra-predicting blocks to be decoded later.
[0110] The loop filter section 560 includes a deblocking filter 562, an SAO filter 564, and an ALF 566 as in-loop filters. The deblocking filter 562 deblocks the boundaries between restored blocks to remove blocking artifacts that occur due to block-level decoding. The SAO filter 564 and ALF 566 perform additional filtering on the restored blocks after deblocking to compensate for the difference between the restored pixels and the original pixels that occurs due to lossy coding. The filter coefficients of the ALF are determined using information about the filter coefficients decoded from the bitstream.
[0111] The restored blocks filtered through the deblocking filter 562, the SAO filter 564, and the ALF 566 are stored in memory 570. Once all blocks in a picture have been restored, the restored picture is used as a reference picture to inter-predict the blocks in pictures to be decoded later.
[0112] Figure 6 illustrates a method for generating a prediction block for the current block in Combined Inter / Intra Prediction (CIIP) mode according to one embodiment of the present disclosure. In the following, "in-screen prediction" is synonymous with "intra-prediction," and the two terms are used interchangeably. Likewise, "inter-screen prediction" is synonymous with "inter-prediction," and the two terms are used interchangeably. The "combined in-screen / inter-screen prediction mode" is synonymous with the "combined inter / intra prediction mode," and is used interchangeably with CIIP mode. In CIIP mode, the inter-prediction block is generated in the same way as in general merge mode. The intra-prediction block is generated by applying the planar mode to the reference pixels adjacent to the current block. Weights are applied to the generated inter-prediction block and intra-prediction block to generate the final CIIP-based prediction block.
[0113] Referring to Figure 6, the reference block P_inter within the reference picture is derived based on merge mode. The planar mode is applied to the reference pixels adjacent to the current block to generate the intra-prediction block P_Planar. Weights are applied to the reference block P_inter and the intra-prediction block P_Planar to generate the CIIP-based prediction block P_CIIP. The CIIP-based prediction block P_CIIP is given by the following equation.
[Equation]
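The weighted averaging of the inter-prediction block and the planar intra-prediction block described for Figure 6 can be sketched as follows. The equation of the disclosure is not available in this text, so the integer form used here (weights that sum to 4, with rounding before a right shift by 2) is an assumption, as are the names `ciip_blend`, `p_inter`, `p_planar`, and `w_intra`.

```python
def ciip_blend(p_inter, p_planar, w_intra):
    """Blend two equally sized blocks sample by sample.

    w_intra is the integer weight of the intra block on an assumed 0..4 scale;
    the inter block receives the remaining weight 4 - w_intra.
    """
    if not 0 <= w_intra <= 4:
        raise ValueError("w_intra must lie in the range 0..4")
    w_inter = 4 - w_intra
    return [
        [(w_inter * a + w_intra * b + 2) >> 2 for a, b in zip(row_i, row_p)]
        for row_i, row_p in zip(p_inter, p_planar)
    ]
```

With `w_intra = 1` the result stays close to the inter-prediction block, while `w_intra = 3` pulls it toward the intra-prediction block.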
[0114] Figure 7 illustrates the peripheral blocks referenced to determine weights in a combined inter / intra prediction mode.
[0115] Referring to Figure 7, in CIIP mode, the weighting is determined by considering whether the upper peripheral block A and the left peripheral block L adjacent to the current block are coded in intra-prediction mode.
[0116] Figure 8 illustrates a method for determining weights in the combined inter / intra prediction mode according to one embodiment of the present disclosure. When many of the peripheral blocks adjacent to the current block are coded in intra-prediction mode, a large weight is assigned to the intra-prediction block. Conversely, when few of the peripheral blocks adjacent to the current block are coded in intra-prediction mode, a small weight is assigned to the intra-prediction block.
[0117] Referring to Figure 8, if both the upper peripheral block A and the left peripheral block L of the current block in Figure 7 are coded in intra-prediction mode, the weight assigned to the intra-prediction block is 3. If the upper peripheral block A is coded in intra-prediction mode and the left peripheral block L is not, the weight assigned to the intra-prediction block is 2. If the upper peripheral block A is not coded in intra-prediction mode and the left peripheral block L is, the weight assigned to the intra-prediction block is likewise 2. If neither the upper peripheral block A nor the left peripheral block L is coded in intra-prediction mode, the weight assigned to the intra-prediction block is 1.
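The four cases of Figure 8 reduce to counting how many of the two peripheral blocks are intra-coded. A minimal sketch, with the hypothetical function name `ciip_intra_weight`:

```python
def ciip_intra_weight(above_is_intra, left_is_intra):
    """Weight of the intra-prediction block from peripheral blocks A and L.

    Returns 3 when both blocks are intra-coded, 2 when exactly one is,
    and 1 when neither is.
    """
    return 1 + int(bool(above_is_intra)) + int(bool(left_is_intra))
```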
[0118] The CIIP mode described in Figures 6 to 8 uses an intra-prediction mode fixed to the planar mode, and therefore does not utilize the directional information present around the current block. Furthermore, since the weighting value is determined only by whether specific peripheral blocks use intra-prediction mode, there are limitations in determining the weighting value.
[0119] Figure 9 is a diagram illustrating a method for using various intra-prediction modes in the combined inter / intra prediction mode according to one embodiment of the present disclosure.
[0120] Referring to FIG. 9, a reference block P_inter for the current block is derived using the merge mode. The first reference pixels adjacent to the periphery of the reference block P_inter are used to generate an intra prediction block P_intra1. The distortion between the reference block P_inter and the intra prediction block P_intra1 is compared across intra prediction modes, and the intra prediction mode with the least distortion is determined as the optimal intra prediction mode. Here, the distortion is calculated through various measures such as SAD (Sum of Absolute Differences) or SSE (Sum of Squared Errors). The determined optimal intra prediction mode is applied to the second reference pixels, which are adjacent to the current block, to generate an intra prediction block P_intra2. The intra prediction block P_intra2 corresponds to the final intra prediction block.
[0121] FIG. 10 is a diagram for explaining the process of determining various intra prediction modes in a combined intra - inter prediction mode according to an embodiment of the present disclosure.
[0122] Referring to FIG. 10, the encoding device determines a reference block corresponding to the current block using the merge mode (S1010). The encoding device generates an intra prediction block based on the first reference pixel adjacent to the periphery of the reference block (S1020). The encoding device compares the distortion between the reference block and the generated intra prediction block to determine the optimal intra prediction mode (S1030). The encoding device generates a final intra prediction block based on the determined optimal intra prediction mode and the second reference pixel (S1040). Since the CIIP mode according to the present disclosure uses various intra prediction modes, the encoding efficiency is improved.
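Steps S1020 to S1040 can be sketched as follows, taking the reference block from S1010 as given. The three predictors below (DC, horizontal, vertical) are simplified stand-ins for the actual intra-prediction modes, and all function and parameter names are hypothetical; SAD is used as the distortion measure, which is one of the options named in the description.

```python
def predict(mode, top, left):
    """Simplified intra predictors for an n x n block (illustrative only)."""
    n = len(top)
    if mode == "VER":  # copy the row of reference pixels above the block
        return [list(top) for _ in range(n)]
    if mode == "HOR":  # copy the column of reference pixels left of the block
        return [[left[r]] * n for r in range(n)]
    if mode == "DC":   # rounded mean of all reference pixels
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unknown mode: {mode}")


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def select_and_predict(p_inter, ref1_top, ref1_left, ref2_top, ref2_left,
                       modes=("DC", "HOR", "VER")):
    """Choose the mode that best predicts p_inter from the first reference
    pixels (S1020, S1030), then apply it to the second reference pixels (S1040)."""
    best = min(modes, key=lambda m: sad(p_inter, predict(m, ref1_top, ref1_left)))
    return best, predict(best, ref2_top, ref2_left)
```

Because every input to the mode decision is available at the decoder as well, the selected mode does not have to be signaled in this sketch.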
[0123] Figure 11 illustrates the peripheral blocks referenced to determine weights in the combined inter / intra prediction mode, according to another embodiment of the present disclosure. To determine weights in CIIP mode, information from all peripheral blocks adjacent to the current block is used. In this case, the encoder and decoder perform the same process to determine weights, so the encoder does not need to transmit weight information to the decoder. This constitutes an implicit method.
[0124] Referring to Figure 11, the weighting is determined using information from the peripheral blocks A1-A8, L1-L8, and AL adjacent to the current block. The weighting is determined by proportional distribution according to the ratio of intra-prediction-coded to inter-prediction-coded peripheral blocks. However, this disclosure is not limited to such embodiments: the weighting may be determined using any number of peripheral blocks at any positions, rather than all peripheral blocks. When the number of peripheral blocks coded by intra-prediction is N_intra and the number of peripheral blocks coded by inter-prediction is N_inter, the weight W_intra assigned to the intra-prediction block is given by the following equations.
[Equation]
[Equation]
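The proportional distribution of the weights can be illustrated as below. The equations of the disclosure are not available in this text, so the form W_intra = N_intra / (N_intra + N_inter), with the inter weight as its complement, is an assumption derived from the phrase "proportional distribution"; the function name and the equal-weight fallback for the case with no usable peripheral blocks are likewise hypothetical.

```python
def neighbor_ratio_weights(neighbor_modes):
    """Return (w_intra, w_inter) from the coding modes of peripheral blocks.

    neighbor_modes is an iterable of "intra" / "inter" labels; any other
    value (e.g. None for an unavailable block) is ignored.
    """
    modes = list(neighbor_modes)
    n_intra = sum(1 for m in modes if m == "intra")
    n_inter = sum(1 for m in modes if m == "inter")
    total = n_intra + n_inter
    if total == 0:
        return 0.5, 0.5  # assumed fallback: equal weights
    w_intra = n_intra / total
    return w_intra, 1.0 - w_intra
```

Since both the encoder and the decoder know the coding modes of the already reconstructed peripheral blocks, the same weights can be derived on both sides without signaling.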
[0125] Figure 12 illustrates a method for determining weights in the combined inter / intra prediction mode according to another embodiment of the present disclosure. In CIIP mode, weights are calculated using distortion. These weights are calculated based on the distortion of the intra-prediction signal and the distortion of the inter-prediction signal. In this case, the encoder and decoder perform the same process to determine the weights, so the encoder does not need to transmit weight information to the decoder. This constitutes an implicit method.
[0126] Referring to Figure 12, the distortion of the inter-prediction signal is determined by calculating the difference between the second reference pixels adjacent to the current block and the first reference pixels adjacent to the reference block P_inter derived in merge mode.
[0127] The distortion of the intra-prediction signal is calculated in one of three ways. In the first method, an intra-prediction block is generated using the first reference pixels adjacent to the periphery of the reference block P_inter. The optimal intra-prediction mode is determined by comparing the distortion between this intra-prediction block and the reference block P_inter. The determined optimal intra-prediction mode is applied to the first reference pixels to generate the intra-prediction block P_intra1. The distortion of the intra-prediction signal is obtained by calculating the distortion between the reference block P_inter and the intra-prediction block P_intra1.
[0128] In the second method, an intra-prediction block is generated using the first reference pixels adjacent to the periphery of the reference block P_inter. The optimal intra-prediction mode is determined by comparing the distortion between this intra-prediction block and the reference block P_inter. The determined optimal intra-prediction mode is applied to the second reference pixels to generate the intra-prediction block P_intra2. The distortion of the intra-prediction signal is obtained by calculating the distortion between the reference block P_inter and the intra-prediction block P_intra2.
[0129] In the third method, the planar mode is applied to the second reference pixels to generate a planar-mode-based intra-prediction block P_planar. The distortion of the intra-prediction signal is obtained by calculating the distortion between the reference block P_inter and the planar-mode-based intra-prediction block P_planar.
[0130] The distortion of the intra-prediction signal is calculated by selecting any one of the three methods described above. When the distortion of the inter-prediction signal is D_inter and the distortion of the intra-prediction signal is D_intra, the weight W_inter assigned to the inter-prediction signal is given by the following equations.
[Equation]
[Equation]
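A distortion-based weight derivation can be sketched as follows. The equations of the disclosure are not available in this text; the form used here, W_inter = D_intra / (D_inter + D_intra), is an assumption chosen so that the prediction signal with the smaller distortion receives the larger weight. The function name and the equal-weight fallback for zero total distortion are also hypothetical.

```python
def distortion_weights(d_inter, d_intra):
    """Return (w_inter, w_intra) from the two distortion values.

    Assumed rule: each signal is weighted by the other signal's share of the
    total distortion, so lower distortion means higher weight.
    """
    if d_inter < 0 or d_intra < 0:
        raise ValueError("distortion values must be non-negative")
    total = d_inter + d_intra
    if total == 0:
        return 0.5, 0.5  # assumed fallback when both predictions are exact
    w_inter = d_intra / total
    return w_inter, 1.0 - w_inter
```

For example, with D_inter = 10 and D_intra = 30 the inter-prediction signal would receive a weight of 0.75 under this assumed rule.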
[0131] Figure 13 illustrates a method for determining weights in a combined in-screen to-screen prediction mode, according to yet another embodiment of the present disclosure. In CIIP mode, weights are calculated using distortion. These weights are calculated based on the distortion of the in-screen prediction signal and the distortion of the to-screen prediction signal. In this case, the encoder calculates the distortion using information from the current block, which is the original signal that is not available to the decoder. Therefore, the encoder must transmit information about the weights calculated using distortion to the decoder. This constitutes an explicit method.
[0132] Referring to Figure 13, the distortion of the inter prediction signal is determined by calculating the difference between the current block and the reference block P_inter. The encoding device uses information of the current block, which the decoding device cannot use.
[0133] The distortion of the intra prediction signal is calculated in one of two ways. In the first method, an intra prediction block is generated using the first reference pixel adjacent to the periphery of the reference block P_inter. The optimal intra prediction mode is determined by comparing the distortion between this intra prediction block and the reference block P_inter. The determined optimal intra prediction mode is applied to the second reference pixel to generate the intra prediction block P_intra. The distortion of the intra prediction signal is calculated as the distortion between the current block and the intra prediction block P_intra.
[0134] In the second method, the planar mode is applied to the second reference pixel to generate a planar-mode-based intra prediction block P_planar. The distortion of the intra prediction signal is calculated as the distortion between the current block and the planar-mode-based intra prediction block P_planar.
[0135] The distortion of the intra prediction signal is calculated by selecting one of the two methods described above. When the distortion of the inter prediction signal is D_inter and the distortion of the intra prediction signal is D_intra, the weight W_inter assigned to the inter prediction signal is given as follows.
[Equation]
[Equation]
[0136] Figure 14 is a diagram illustrating index-based weights according to another embodiment of the present disclosure. When weights are calculated from distortion as in Figure 13, the encoder calculates the distortion using information of the current block that the decoder cannot use, so the encoder must transmit information about the calculated weights to the decoder. In general, a weight is an arbitrary fractional value between 0 and 1, and transmitting such a value directly requires many bits. The encoder therefore maps the fractional weight to a predefined table and transmits the corresponding index to the decoder, which reduces the number of bits transmitted.
[0137] Referring to Figure 14, there are three ways to map weights to indices. Method 1 corresponds to using three weights of 0.25, 0.5, and 0.75 and using indices 1 through 3. Method 2 corresponds to using a total of seven weights at intervals of 0.125 from 0.125 to 0.875 and using indices 1 through 7. Method 3 corresponds to using a total of nine weights at intervals of 0.1 from 0.1 to 0.9 and using indices 1 through 9. However, this disclosure is not limited to such embodiments. Weights can be used in any number and with any value.
[0138] The weights calculated as in Figure 13 are compared with the weights of the method selected from the three methods of Figure 14, and the index whose weight is most similar is determined. Prediction in CIIP mode is then performed using the weight W_intra of the intra prediction signal and the weight W_inter of the inter prediction signal that correspond to the determined index. The encoding device transmits the determined index to the decoding device. As an example, suppose the calculated weight W_intra of the intra prediction signal and weight W_inter of the inter prediction signal are 0.358 and 0.642, respectively. If method 2 of Figure 14 is selected, the determined index is 3, and the weights W_intra and W_inter corresponding to index 3 are 0.375 and 0.625, respectively. As a result, prediction in CIIP mode is performed using an intra prediction weight of 0.375 and an inter prediction weight of 0.625, and the encoding device transmits index 3 to the decoding device.
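The mapping from a calculated weight to a table index can be sketched as follows. The tables follow the three methods described for Figure 14. The sketch assumes that each table entry is the intra weight W_intra (consistent with the 0.358 → 0.375 example), that "most similar" means the smallest absolute difference, and that W_inter is 1 − W_intra. The names `WEIGHT_TABLES` and `quantize_weight` are hypothetical.

```python
# Weight tables as described for Figure 14: position i holds the weight of index i + 1.
WEIGHT_TABLES = {
    1: [0.25, 0.5, 0.75],
    2: [0.125 * i for i in range(1, 8)],            # 0.125 ... 0.875
    3: [round(0.1 * i, 1) for i in range(1, 10)],   # 0.1 ... 0.9
}


def quantize_weight(w_intra, method):
    """Return (index, quantized W_intra, quantized W_inter).

    ASSUMPTION: similarity is the absolute difference; ties go to the
    smaller index.
    """
    table = WEIGHT_TABLES[method]
    best = min(range(len(table)), key=lambda i: abs(table[i] - w_intra))
    w_q = table[best]
    return best + 1, w_q, 1.0 - w_q


# Worked example from the text: W_intra = 0.358 with method 2.
index, w_intra_q, w_inter_q = quantize_weight(0.358, method=2)
print(index, w_intra_q, w_inter_q)  # 3 0.375 0.625
```

Only `index` would need to be signaled; a decoder holding the same table can recover both weights from it.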
[0139] Figure 15 illustrates a method for assigning fixed-length codes to weight indices according to one embodiment of the present disclosure. Once a weight index is determined, it is transmitted using a fixed-length code (FLC). With fixed-length codes, every index is assigned a codeword of the same length.
[0140] Referring to Figure 15, Method 1 assigns a 2-bit fixed-length code to each index, Method 2 a 3-bit code, and Method 3 a 4-bit code. None of the methods uses all the codewords available at its bit length: Method 1 leaves codeword 11 unused, Method 2 leaves codeword 111 unused, and Method 3 leaves codewords 1001, 1010, 1011, 1100, 1101, 1110, and 1111 unused. This results in low encoding efficiency. However, this disclosure is not limited to such embodiments. The mapping of codewords to indices using fixed-length codes may be determined arbitrarily.
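The codeword waste described above can be reproduced with a short sketch. It assumes that index i is coded as the binary representation of i − 1, which is consistent with the unused codewords listed for each method, though the exact table of Figure 15 is not reproduced here. The function name `fixed_length_codes` is hypothetical.

```python
def fixed_length_codes(num_indices):
    """Return (codes, unused) for indices 1..num_indices.

    ASSUMPTION: index i is coded as the binary value i - 1, using the
    smallest bit length that can hold all indices.
    """
    bits = max(1, (num_indices - 1).bit_length())
    codes = {i + 1: format(i, f"0{bits}b") for i in range(num_indices)}
    unused = [format(v, f"0{bits}b") for v in range(num_indices, 2 ** bits)]
    return codes, unused


for method, count in ((1, 3), (2, 7), (3, 9)):
    codes, unused = fixed_length_codes(count)
    print(f"Method {method}: {len(codes[1])}-bit codes, unused = {unused}")
```

For 3, 7, and 9 indices this yields 1, 1, and 7 unused codewords, matching the lists given for Methods 1 to 3.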
[0141] Figure 16 illustrates a method for assigning phased-in codes to weight indices according to one embodiment of the present disclosure. To address the low coding efficiency of Figure 15, codewords are assigned to the indices using phased-in codes.
[0142] Referring to Figure 16, phased-in codes build on fixed-length codes but use codewords of different lengths, so that codewords can be assigned to indices without waste. Shorter codewords are assigned to frequently occurring indices and longer codewords to less frequently occurring indices. This eliminates codeword waste and improves coding efficiency. However, this disclosure is not limited to such embodiments. The mapping of codewords to indices using phased-in codes may be determined arbitrarily.
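A minimal sketch of one common phased-in construction (also known as truncated binary coding) is shown below. For n indices with k = floor(log2 n), the first 2^(k+1) − n indices receive k-bit codewords and the remaining ones receive (k+1)-bit codewords. Giving the shorter codewords to the smaller indices is an assumption, as is the function name `phased_in_codes`; the actual table of Figure 16 is not reproduced here.

```python
def phased_in_codes(num_indices):
    """Map indices 1..num_indices to phased-in (truncated binary) codewords.

    ASSUMPTION: smaller indices are the more frequent ones and therefore
    receive the shorter codewords.
    """
    k = num_indices.bit_length() - 1          # floor(log2(n))
    short = 2 ** (k + 1) - num_indices        # how many k-bit codewords exist
    codes = {}
    for value in range(num_indices):
        if value < short:
            codes[value + 1] = format(value, f"0{k}b") if k > 0 else ""
        else:
            # Longer codewords are offset so that no short codeword is a prefix.
            codes[value + 1] = format(value + short, f"0{k + 1}b")
    return codes


print(phased_in_codes(3))  # {1: '0', 2: '10', 3: '11'}
```

With 9 indices this construction produces seven 3-bit and two 4-bit codewords, so all codewords are used and most indices cost one bit less than the 4-bit fixed-length code.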
[0143] Figure 17 illustrates a method for assigning variable-length codes to weight indices according to one embodiment of the present disclosure. In this method, codewords of different lengths are assigned to the indices: shorter codewords to indices with a high occurrence frequency and longer codewords to indices with a low occurrence frequency. This improves coding efficiency. The occurrence frequency of each index is determined through offline training.
[0144] Referring to Figure 17, codewords are assigned using TR (Truncated Rice) codes. Method 1 generates codewords using TR codes with cMAX=2 and cRiceParam=0, Method 2 with cMAX=6 and cRiceParam=0, and Method 3 with cMAX=8 and cRiceParam=0. The generated codewords are then assigned to the indices. However, this disclosure is not limited to such embodiments, and variable-length codes may also be assigned to the indices using codes other than TR codes. Based on offline training, smaller indices are assigned to frequently occurring weights and larger indices to less frequently occurring weights. As a result, shorter codewords are assigned to frequently occurring weights and longer codewords to less frequently occurring weights, which improves coding efficiency.
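With cRiceParam=0, a Truncated Rice code reduces to a truncated unary code: a symbol below cMAX is written as that many ones followed by a zero, and the symbol equal to cMAX is written as cMAX ones with no terminating zero. The sketch below covers only this cRiceParam=0 case and assumes that index i corresponds to symbol value i − 1. The function names are hypothetical.

```python
def truncated_rice_codeword(symbol, c_max):
    """TR codeword for cRiceParam = 0, which is a truncated unary code."""
    if not 0 <= symbol <= c_max:
        raise ValueError("symbol must lie in [0, c_max]")
    if symbol < c_max:
        return "1" * symbol + "0"
    return "1" * c_max  # the largest symbol needs no terminating zero


def tr_codes(c_max):
    """Map indices 1..c_max + 1 to codewords (ASSUMPTION: symbol = index - 1)."""
    return {s + 1: truncated_rice_codeword(s, c_max) for s in range(c_max + 1)}


print(tr_codes(2))  # {1: '0', 2: '10', 3: '11'}
```

The values cMAX=2, 6, and 8 yield 3, 7, and 9 codewords, matching the number of indices in Methods 1 to 3.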
[0145] Figures 18a and 18b are diagrams illustrating the error distributions of inter prediction and intra prediction according to one embodiment of the present disclosure.
[0146] Referring to Figure 18a, the error distribution of inter prediction is shown as a function of the horizontal and vertical coordinates. In inter prediction, the motion vector is used with respect to the center of the current block. As a result, the inter prediction error increases with distance from the center of the current block.
[0147] Referring to Figure 18b, the error distribution of intra prediction is shown as a function of the horizontal and vertical coordinates. In intra prediction, the reference block used during prediction is located to the upper left of the current block. As a result, the intra prediction error increases from the upper left toward the lower right.
[0148] Figures 19a and 19b illustrate the weights for intra prediction and inter prediction of an 8x8 block according to one embodiment of the present disclosure. In intra prediction, the reference block used during prediction is located to the upper left of the current block. As a result, in the upper left region of the current block a large weight is assigned to the intra prediction signal, and in the lower right region a large weight is assigned to the inter prediction signal.
[0149] Referring to Figure 19a, in an 8x8 block, a large weight is assigned to the intra prediction signal in the upper left region and a small weight in the lower right region.
[0150] Referring to Figure 19b, in an 8x8 block, a large weight is assigned to the inter prediction signal in the lower right region and a small weight in the upper left region. However, this disclosure is not limited to such embodiments. The block may have any size and shape, and any weights may be assigned.
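One way to picture such position-dependent weights is a map in which the intra weight falls off linearly from the upper left corner to the lower right corner, with the inter weight as its complement. The linear fall-off, the bounds 0.75 and 0.25, and the function names are all hypothetical. The actual values of Figures 19a and 19b are not reproduced here.

```python
def diagonal_intra_weights(size=8, w_max=0.75, w_min=0.25):
    """Intra weights that decrease linearly from the top-left to the bottom-right.

    ASSUMPTION: a linear fall-off along x + y with hypothetical bounds.
    """
    steps = 2 * (size - 1)
    return [[w_max - (w_max - w_min) * (x + y) / steps for x in range(size)]
            for y in range(size)]


def complementary_weights(w_intra_map):
    """Inter weights, assuming W_inter = 1 - W_intra at every position."""
    return [[1.0 - w for w in row] for row in w_intra_map]


w_intra_map = diagonal_intra_weights()
w_inter_map = complementary_weights(w_intra_map)
```

In this sketch the upper left sample favors the intra signal and the lower right sample favors the inter signal, mirroring the tendency described for the two figures.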
[0151] Figures 19c and 19d illustrate the weights of intra prediction for an 8x8 block according to another embodiment of the present disclosure. In intra prediction, the reference blocks used during prediction are located above and to the left of the current block. As a result, a large weight is assigned to the intra prediction signal in the regions of the current block close to the upper and left reference blocks.
[0152] Referring to Figure 19c, in an 8x8 block, a large weight is assigned to the intra prediction signal in the upper region and a small weight in the lower region.
[0153] Referring to Figure 19d, in an 8x8 block, a large weight is assigned to the intra prediction signal in the left region and a small weight in the right region. However, this disclosure is not limited to such embodiments. The block may have any size and shape, and any weights may be assigned.
[0154] Figure 20 is a diagram illustrating the video decoding process according to one embodiment of the present disclosure.
[0155] Referring to Figure 20, the decoding device generates an inter-prediction block of the current block based on a reference block in the reference picture corresponding to the current block (S2010). The decoding device then generates an intra-prediction block of the current block based on the reference block and a first reference pixel adjacent to the reference block (S2020). Generating the intra-prediction block of the current block includes generating a first intra-prediction block based on the first reference pixel adjacent to the reference block, deriving an intra-prediction mode based on the distortion between the reference block and the first intra-prediction block, and generating the intra-prediction block based on the intra-prediction mode and a second reference pixel adjacent to the current block.
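The mode derivation in step S2020 can be sketched as a search over candidate intra modes on the reference block, which is possible on the decoder side because it does not touch the current block. The sketch below uses only three simplified modes (DC, horizontal, vertical) and SAD as the distortion measure. Both are assumptions, as are all function names; a real codec would use many more modes.

```python
def predict(mode, top, left):
    """Simplified intra prediction of an n x n block from its top row and left column."""
    n = len(top)
    if mode == "vertical":
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        dc = (sum(top) + sum(left)) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unknown mode: {mode}")


def sad(block_a, block_b):
    """Sum of absolute differences (ASSUMED distortion measure)."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def derive_intra_mode(ref_block, ref_top, ref_left, modes=("dc", "horizontal", "vertical")):
    """Pick the mode whose prediction from the reference block's neighbors best fits it."""
    return min(modes, key=lambda m: sad(predict(m, ref_top, ref_left), ref_block))


# Hypothetical data: a reference block with vertical stripes.
ref_block = [[10, 20, 30, 40] for _ in range(4)]
mode = derive_intra_mode(ref_block, ref_top=[10, 20, 30, 40], ref_left=[0, 0, 0, 0])
# The derived mode is then applied to the pixels adjacent to the current block.
p_intra = predict(mode, top=[1, 2, 3, 4], left=[9, 9, 9, 9])
```

Here `ref_top` and `ref_left` play the role of the first reference pixel, and the `top` and `left` arguments of the final call play the role of the second reference pixel.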
[0156] The decoding device then derives the weights to be assigned to the inter-prediction block and the intra-prediction block, on the basis that the current block is not used for distortion calculation (S2030). The weights are derived based on the intra-prediction coding and inter-prediction coding of the neighboring blocks adjacent to the current block. Deriving the weights to be assigned to the inter-prediction block and the intra-prediction block includes deriving the distortion of the inter-prediction signal, deriving the distortion of the intra-prediction signal, and deriving the weights based on the distortion of the inter-prediction signal and the distortion of the intra-prediction signal. The distortion of the inter-prediction signal is derived based on the difference between the second reference pixel adjacent to the current block and the first reference pixel adjacent to the reference block.
[0157] Deriving the distortion of the intra-prediction signal includes generating a second intra-prediction block based on the intra-prediction mode and the first reference pixel adjacent to the reference block, and deriving the distortion of the intra-prediction signal based on the distortion between the reference block and the second intra-prediction block. Alternatively, deriving the distortion of the intra-prediction signal includes generating a third intra-prediction block based on the intra-prediction mode and the second reference pixel adjacent to the current block, and deriving the distortion of the intra-prediction signal based on the distortion between the reference block and the third intra-prediction block. Alternatively, deriving the distortion of the intra-prediction signal includes generating a fourth intra-prediction block based on the planar mode and the second reference pixel adjacent to the current block, and deriving the distortion of the intra-prediction signal based on the distortion between the reference block and the fourth intra-prediction block. When the current block is used for distortion calculation, the decoding device obtains the weights to be assigned to the inter-prediction block and the intra-prediction block based on index information mapped to the weights. The weights may also be derived based on at least one of the error distributions of intra prediction and inter prediction. The decoding device then generates the CIIP prediction block of the current block based on the weights, the inter-prediction block, and the intra-prediction block (S2040).
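The final step (S2040) amounts to a weighted average of the two prediction blocks. A minimal sketch follows, assuming a single block-level weight pair with W_inter = 1 − W_intra and rounding to the nearest integer sample value. The function name `ciip_blend` and the sample values are hypothetical.

```python
def ciip_blend(p_intra, p_inter, w_intra):
    """Weighted average of intra and inter prediction blocks.

    ASSUMPTIONS: one weight pair for the whole block, W_inter = 1 - W_intra,
    non-negative samples rounded to the nearest integer.
    """
    w_inter = 1.0 - w_intra
    return [[int(w_intra * a + w_inter * b + 0.5) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(p_intra, p_inter)]


# Hypothetical 1 x 2 blocks blended with the 0.375 / 0.625 weights of the index example.
pred = ciip_blend([[100, 100]], [[60, 20]], w_intra=0.375)
print(pred)  # [[75, 50]]
```

A position-dependent variant would replace the scalar `w_intra` with a per-sample weight map of the same size as the block.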
[0158] Figure 21 is a diagram illustrating a video encoding process according to one embodiment of the present disclosure.
[0159] Referring to Figure 21, the encoding device generates an inter-prediction block of the current block based on a reference block in the reference picture corresponding to the current block (S2110). The encoding device then generates an intra-prediction block of the current block based on the reference block and a first reference pixel adjacent to the reference block (S2120). Generating the intra-prediction block of the current block includes generating a first intra-prediction block based on the first reference pixel adjacent to the reference block, determining an intra-prediction mode based on the distortion between the reference block and the first intra-prediction block, and generating the intra-prediction block based on the intra-prediction mode and a second reference pixel adjacent to the current block.
[0160] The encoding device then determines the weights to be assigned to the inter-prediction block and the intra-prediction block, on the basis that the current block is not used in the distortion calculation (S2130). The weights are determined based on the intra-prediction coding and inter-prediction coding of the neighboring blocks adjacent to the current block. Determining the weights to be assigned to the inter-prediction block and the intra-prediction block includes determining the distortion of the inter-prediction signal, determining the distortion of the intra-prediction signal, and determining the weights based on the distortion of the inter-prediction signal and the distortion of the intra-prediction signal. The distortion of the inter-prediction signal is determined based on the difference between the second reference pixel adjacent to the current block and the first reference pixel adjacent to the reference block.
[0161] Determining the distortion of the intra-prediction signal includes generating a second intra-prediction block based on the intra-prediction mode and the first reference pixel adjacent to the reference block, and determining the distortion of the intra-prediction signal based on the distortion between the reference block and the second intra-prediction block. Alternatively, determining the distortion of the intra-prediction signal includes generating a third intra-prediction block based on the intra-prediction mode and the second reference pixel adjacent to the current block, and determining the distortion of the intra-prediction signal based on the distortion between the reference block and the third intra-prediction block. Alternatively, determining the distortion of the intra-prediction signal includes generating a fourth intra-prediction block based on the planar mode and the second reference pixel adjacent to the current block, and determining the distortion of the intra-prediction signal based on the distortion between the reference block and the fourth intra-prediction block.
[0162] Alternatively, when the current block is used in the distortion calculation, the distortion of the inter-prediction signal is determined based on the difference between the current block and the reference block. In that case, determining the distortion of the intra-prediction signal includes generating a third intra-prediction block based on the intra-prediction mode and the second reference pixel adjacent to the current block, and determining the distortion of the intra-prediction signal based on the distortion between the current block and the third intra-prediction block. Alternatively, determining the distortion of the intra-prediction signal includes generating a fourth intra-prediction block based on the planar mode and the second reference pixel adjacent to the current block, and determining the distortion of the intra-prediction signal based on the distortion between the current block and the fourth intra-prediction block. The encoding process further includes encoding an index mapped to the weights. The encoding device then generates the CIIP prediction block of the current block based on the weights, the inter-prediction block, and the intra-prediction block (S2140).
[0163] Although the flowcharts / timing diagrams in this specification describe each process as being executed sequentially, this is merely an illustrative example of the technical concept of one embodiment of the present disclosure. In other words, the flowcharts / timing diagrams are not limited to a chronological order, as they can be modified and adapted in various ways by changing the order described in the flowcharts / timing diagrams, or by executing one or more of the processes in parallel, without departing from the essential characteristics of the embodiment of the present disclosure.
[0164] It should be understood that the exemplary embodiments described above can be embodied in many other ways. Functions or methods described in one or more examples can be embodied in hardware, software, firmware, or any combination thereof. It should be understood that the functional components described herein are labeled as “units” to particularly emphasize their independent implementation.
[0165] Meanwhile, the various functions or methods described in this embodiment may be embodied as instructions stored on a non-transitory recording medium that can be read and executed by one or more processors. The non-transitory recording medium includes, for example, any kind of recording device in which data is stored in a form readable by a computer system. For example, the non-transitory recording medium includes storage media such as EPROM (Erasable Programmable Read Only Memory), flash drives, optical drives, magnetic hard drives, and solid-state drives (SSDs).
[0166] The above description is merely illustrative of the technical concept of this embodiment, and a person with ordinary skill in the art to which this embodiment belongs could make various modifications and variations without departing from the essential characteristics of this embodiment. Therefore, this embodiment is for illustrative purposes only and not to limit the technical concept of this embodiment, and the scope of the technical concept of this embodiment is not limited by such embodiment. The scope of protection of this embodiment should be interpreted by the claims, and all technical concepts within an equivalent scope should be interpreted as being included in the scope of rights of this embodiment.
[Explanation of symbols]
[0167] 122 Intra Prediction Unit
510 Entropy Decoding Unit
542 Intra Prediction Unit
Claims
1. A video decoding method comprising: generating an inter-prediction block of a current block based on a reference block in a reference picture corresponding to the current block; generating an intra-prediction block of the current block based on the reference block and a first reference pixel adjacent to the reference block; deriving weight values to be assigned to the inter-prediction block and the intra-prediction block, on the basis that the current block is not used in distortion calculation; and generating a Combined Inter Intra Prediction (CIIP) prediction block of the current block based on the weight values, the inter-prediction block, and the intra-prediction block.
2. The video decoding method according to claim 1, wherein generating the intra-prediction block of the current block comprises: generating a first intra-prediction block based on the first reference pixel adjacent to the reference block; deriving an intra-prediction mode based on the distortion between the reference block and the first intra-prediction block; and generating the intra-prediction block based on the intra-prediction mode and a second reference pixel adjacent to the current block.
3. The video decoding method according to claim 1, wherein the weight values are derived based on the intra-prediction coding and inter-prediction coding of neighboring blocks adjacent to the current block.
4. The video decoding method according to claim 2, wherein deriving the weight values to be assigned to the inter-prediction block and the intra-prediction block comprises: deriving a distortion of an inter-prediction signal; deriving a distortion of an intra-prediction signal; and deriving the weight values based on the distortion of the inter-prediction signal and the distortion of the intra-prediction signal.
5. The video decoding method according to claim 4, wherein the distortion of the inter-prediction signal is derived based on the difference between the second reference pixel adjacent to the current block and the first reference pixel adjacent to the reference block.
6. The video decoding method according to claim 4, wherein deriving the distortion of the intra-prediction signal comprises: generating a second intra-prediction block based on the intra-prediction mode and the first reference pixel adjacent to the reference block; and deriving the distortion of the intra-prediction signal based on the distortion between the reference block and the second intra-prediction block.
7. The video decoding method according to claim 4, wherein deriving the distortion of the intra-prediction signal comprises: generating a third intra-prediction block based on the intra-prediction mode and the second reference pixel adjacent to the current block; and deriving the distortion of the intra-prediction signal based on the distortion between the reference block and the third intra-prediction block.
8. The video decoding method according to claim 4, wherein deriving the distortion of the intra-prediction signal comprises: generating a fourth intra-prediction block based on the planar mode and the second reference pixel adjacent to the current block; and deriving the distortion of the intra-prediction signal based on the distortion between the reference block and the fourth intra-prediction block.
9. The video decoding method according to claim 1, further comprising: obtaining weight values to be assigned to the inter-prediction block and the intra-prediction block, on the basis that the current block is used in distortion calculation; and generating a CIIP prediction block of the current block based on the weight values, the inter-prediction block, and the intra-prediction block, wherein the weight values are obtained based on index information mapped to the weight values.
10. The video decoding method according to claim 1, wherein the weight values are derived based on at least one of the error distributions of intra prediction and inter prediction.
11. A video coding method comprising: generating an inter-prediction block of a current block based on a reference block in a reference picture corresponding to the current block; generating an intra-prediction block of the current block based on the reference block and a first reference pixel adjacent to the reference block; determining weight values to be assigned to the inter-prediction block and the intra-prediction block; and generating a CIIP prediction block of the current block based on the weight values, the inter-prediction block, and the intra-prediction block.
12. The video coding method according to claim 11, wherein generating the intra-prediction block of the current block comprises: generating a first intra-prediction block based on the first reference pixel adjacent to the reference block; determining an intra-prediction mode based on the distortion between the reference block and the first intra-prediction block; and generating the intra-prediction block based on the intra-prediction mode and a second reference pixel adjacent to the current block.
13. The video coding method according to claim 11, wherein the weight values are determined based on the intra-prediction coding and inter-prediction coding of neighboring blocks adjacent to the current block.
14. The video coding method according to claim 12, wherein determining the weight values to be assigned to the inter-prediction block and the intra-prediction block comprises: determining a distortion of an inter-prediction signal; determining a distortion of an intra-prediction signal; and determining the weight values based on the distortion of the inter-prediction signal and the distortion of the intra-prediction signal.
15. The video coding method according to claim 14, wherein the distortion of the inter-prediction signal is determined based on the difference between the second reference pixel adjacent to the current block and the first reference pixel adjacent to the reference block, and wherein determining the distortion of the intra-prediction signal comprises: generating a second intra-prediction block based on the intra-prediction mode and the first reference pixel adjacent to the reference block; and determining the distortion of the intra-prediction signal based on the distortion between the reference block and the second intra-prediction block.
16. The video coding method according to claim 14, wherein determining the distortion of the intra-prediction signal comprises: generating a third intra-prediction block based on the intra-prediction mode and the second reference pixel adjacent to the current block; and determining the distortion of the intra-prediction signal based on the distortion between the reference block and the third intra-prediction block.
17. The video coding method according to claim 14, wherein determining the distortion of the intra-prediction signal comprises: generating a fourth intra-prediction block based on the planar mode and the second reference pixel adjacent to the current block; and determining the distortion of the intra-prediction signal based on the distortion between the reference block and the fourth intra-prediction block.
18. The video coding method according to claim 14, wherein the distortion of the inter-prediction signal is determined based on the difference between the current block and the reference block, and wherein determining the distortion of the intra-prediction signal comprises: generating a third intra-prediction block based on the intra-prediction mode and the second reference pixel adjacent to the current block; and determining the distortion of the intra-prediction signal based on the distortion between the current block and the third intra-prediction block.
19. The video coding method according to claim 14, wherein determining the distortion of the intra-prediction signal comprises: generating a fourth intra-prediction block based on the planar mode and the second reference pixel adjacent to the current block; and determining the distortion of the intra-prediction signal based on the distortion between the current block and the fourth intra-prediction block.
20. The video coding method according to claim 18, further comprising encoding an index mapped to the weight values.
21. A method for providing video data to a video decoding device, the method comprising: encoding the video data into a bitstream; and transmitting the bitstream to the video decoding device, wherein encoding the video data comprises: generating an inter-prediction block of a current block based on a reference block in a reference picture corresponding to the current block; generating an intra-prediction block of the current block based on the reference block and a first reference pixel adjacent to the reference block; determining weight values to be assigned to the inter-prediction block and the intra-prediction block; and generating a CIIP prediction block of the current block based on the weight values, the inter-prediction block, and the intra-prediction block.