Image encoding / decoding method and recording medium for the method
By employing a combination of multi-directional prediction and motion vector candidate lists in image encoding/decoding, the problem of low efficiency in high-resolution image encoding is solved, achieving more efficient image data compression.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ELECTRONICS & TELECOMM RES INST
- Filing Date
- 2017-05-23
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies have limited coding efficiency in high-resolution and high-quality image encoding/decoding, especially in motion compensation where only unidirectional and bidirectional predictions are used, making it difficult to meet the requirements for efficient compression.
A multi-directional prediction method is adopted, including unidirectional, bidirectional, tridirectional and quadridirectional prediction. By generating multiple candidate lists of motion vectors and combining spatial, temporal and predefined motion vector candidates, the final prediction block is generated.
It improves image encoding/decoding efficiency and enhances image data compression performance by combining motion compensation of motion vector candidates.
Description
[0001] This application is a divisional application of the invention patent application filed on May 23, 2017, with application number "201780032625.8" and titled "Image Encoding / Decoding Method and Recording Medium for the Method".
Technical Field
[0002] This invention relates to a method and apparatus for encoding / decoding images. More specifically, this invention relates to a method and apparatus for performing motion compensation using motion vector prediction.
Background Technology
[0003] Recently, the demand for high-resolution and high-quality images, such as high-definition (HD) and ultra-high-definition (UHD) images, has grown across various application areas. However, the data volume of higher-resolution and higher-quality image data increases compared to traditional image data. Therefore, the costs of transmission and storage increase when transmitting image data using media such as traditional wired and wireless broadband networks, or when storing image data using traditional storage media. To address these challenges arising from the increasing resolution and quality of image data, efficient image encoding / decoding technologies are needed for higher-resolution and higher-quality images.
[0004] Image compression techniques encompass a variety of methods, including: inter-frame prediction techniques that predict pixel values included in the current frame from previous or subsequent frames; intra-frame prediction techniques that predict pixel values included in the current frame using pixel information from the current frame; transform and quantization techniques for compacting the energy of residual signals; entropy coding techniques that assign short codes to values with a high frequency of occurrence and long codes to values with a low frequency of occurrence; and so on. By using such image compression techniques, image data can be effectively compressed and transmitted or stored.
[0005] In traditional motion compensation, only spatial motion vector candidates, temporal motion vector candidates, and zero motion vector candidates are added to the list of motion vector candidates to be used, and only unidirectional and bidirectional predictions are used, which limits the improvement of coding efficiency.
Summary of the Invention
[0006] Technical issues
[0007] The present invention provides a method and apparatus for improving the encoding / decoding efficiency of images by performing motion compensation through the use of combined motion vector candidates.
[0008] The present invention provides a method and apparatus for performing motion compensation by using unidirectional, bidirectional, three-directional and four-directional prediction to improve the encoding / decoding efficiency of images.
[0009] Solution
[0010] According to the present invention, a method for decoding an image may include: generating a plurality of motion vector candidate lists based on the inter-frame prediction direction of the current block; obtaining a plurality of motion vectors for the current block by using the plurality of motion vector candidate lists; determining a plurality of prediction blocks for the current block by using the plurality of motion vectors; and obtaining a final prediction block for the current block based on the plurality of prediction blocks.
[0011] According to the present invention, a method for encoding an image may include: generating a plurality of motion vector candidate lists based on the inter-frame prediction direction of a current block; obtaining a plurality of motion vectors for the current block by using the plurality of motion vector candidate lists; determining a plurality of prediction blocks for the current block by using the plurality of motion vectors; and obtaining a final prediction block for the current block based on the plurality of prediction blocks.
[0012] According to the method for encoding / decoding images, the inter-frame prediction direction can indicate unidirectional or multidirectional prediction, and the multidirectional prediction can include three-directional or more-directional prediction.
[0013] According to the method for encoding / decoding images, the motion vector candidate list can be generated based on a list of reference images.
[0014] According to the method for encoding / decoding an image, the motion vector candidate list may include at least one of the following motion vector candidates: spatial motion vector candidates obtained from spatially neighboring blocks of the current block, temporal motion vector candidates obtained from co-located blocks of the current block, and motion vector candidates with predefined values.
[0015] According to the method for encoding / decoding images, the motion vector candidate list may include a combined motion vector candidate generated by combining at least two of the following motion vector candidates: the spatial motion vector candidate, the temporal motion vector candidate, and the motion vector candidate of the predefined value.
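As a rough illustration of the candidate list described in the two preceding paragraphs, the sketch below collects spatial, temporal, and predefined motion vector candidates into one list and then appends a combined candidate. The function name, the list size limit of 5, the duplicate removal, and the use of a component-wise average as the combining rule are all assumptions made for illustration; the text above does not fix any of them.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # (mvX, mvY)


def build_candidate_list(
    spatial: List[MotionVector],
    temporal: List[MotionVector],
    predefined: List[MotionVector],
    max_candidates: int = 5,  # hypothetical limit, not taken from the text
) -> List[MotionVector]:
    """Collect unique candidates in the order spatial, temporal, predefined,
    then append one combined candidate if there is still room."""
    candidates: List[MotionVector] = []
    for mv in spatial + temporal + predefined:
        if mv not in candidates and len(candidates) < max_candidates:
            candidates.append(mv)

    # Assumed combining rule: component-wise average of the first two candidates.
    if 2 <= len(candidates) < max_candidates:
        (ax, ay), (bx, by) = candidates[0], candidates[1]
        combined = ((ax + bx) // 2, (ay + by) // 2)
        if combined not in candidates:
            candidates.append(combined)
    return candidates


candidate_list = build_candidate_list(
    spatial=[(4, -2), (4, -2), (6, 0)],  # the duplicate is dropped
    temporal=[(2, 2)],
    predefined=[(0, 0)],                 # zero motion vector
)
```

Under this scheme, one such list would be built per reference image list, so a three-directional prediction would use three lists.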
[0016] According to the method for encoding / decoding the image, the final prediction block can be determined based on the weighted sum of the multiple prediction blocks.
[0017] According to the method for encoding / decoding the image, the weights applied to the plurality of prediction blocks can be determined based on the predicted weight values and the weight differences.
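The weighted sum in the two preceding paragraphs can be sketched as follows, assuming each weight is the sum of a predicted weight value and a weight difference. The function name, the floating-point arithmetic, the rounding, and the clipping to an 8-bit sample range are illustrative assumptions rather than details taken from the text.

```python
from typing import List

Block = List[List[int]]


def final_prediction_block(
    prediction_blocks: List[Block],
    predicted_weights: List[float],
    weight_differences: List[float],
    bit_depth: int = 8,  # assumed sample bit depth
) -> Block:
    """Combine several prediction blocks into one by a weighted sum."""
    # Assumed rule: weight = predicted weight value + weight difference.
    weights = [p + d for p, d in zip(predicted_weights, weight_differences)]
    max_value = (1 << bit_depth) - 1
    rows, cols = len(prediction_blocks[0]), len(prediction_blocks[0][0])
    result: Block = []
    for r in range(rows):
        row = []
        for c in range(cols):
            value = sum(w * blk[r][c] for w, blk in zip(weights, prediction_blocks))
            # Round to an integer sample and clip to the valid range.
            row.append(min(max(int(round(value)), 0), max_value))
        result.append(row)
    return result


# Three prediction blocks, as in three-directional prediction.
blocks = [
    [[100, 104], [96, 100]],
    [[110, 100], [100, 108]],
    [[90, 96], [104, 92]],
]
# Resulting weights are 0.5, 0.25 and 0.25.
final_block = final_prediction_block(blocks, [0.25, 0.25, 0.25], [0.25, 0.0, 0.0])
```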
[0018] Beneficial effects
[0019] In this invention, a method and apparatus are provided for improving the encoding / decoding efficiency of an image by performing motion compensation through the use of combined motion vector candidates.
[0020] In this invention, a method and apparatus are provided for performing motion compensation by using unidirectional, bidirectional, three-directional and four-directional prediction to improve the encoding / decoding efficiency of images.
Attached Figure Description
[0021] Figure 1 is a block diagram illustrating the configuration of an encoding device according to an embodiment of the present invention.
[0022] Figure 2 is a block diagram illustrating the configuration of a decoding device according to an embodiment of the present invention.
[0023] Figure 3 is a schematic diagram illustrating the partitioning structure of an image when it is encoded and decoded.
[0024] Figure 4 is a diagram showing the forms of a prediction unit (PU) that can be included in a coding unit (CU).
[0025] Figure 5 is a diagram showing the forms of a transform unit (TU) that can be included in a coding unit (CU).
[0026] Figure 6 is a diagram used to explain an embodiment of intra-frame prediction processing.
[0027] Figure 7 is a diagram used to explain an embodiment of inter-frame prediction processing.
[0028] Figure 8 is a diagram used to explain transform sets based on the intra-frame prediction mode.
[0029] Figure 9 is a diagram used to explain transform processing.
[0030] Figure 10 is a diagram used to explain the scanning of quantized transform coefficients.
[0031] Figure 11 is a diagram used to explain block partitioning.
[0032] Figure 12 is a flowchart illustrating a method for encoding an image according to the present invention.
[0033] Figure 13 is a flowchart illustrating a method for decoding an image according to the present invention.
[0034] Figure 14 is a diagram illustrating an example of obtaining spatial motion vector candidates for the current block.
[0035] Figure 15 is a diagram illustrating an example of obtaining temporal motion vector candidates for the current block.
[0036] Figure 16 is a diagram illustrating an example of scaling the motion vector of a co-located block to obtain a temporal motion vector candidate for the current block.
[0037] Figure 17 is a diagram illustrating an example of generating a motion vector candidate list.
[0038] Figure 18 is a diagram illustrating an example of adding a motion vector with a predetermined value to a motion vector candidate list.
[0039] Figure 19 is a diagram illustrating an example of removing motion vector candidates from the motion vector candidate list.
[0040] Figure 20 is a diagram showing an example of a motion vector candidate list.
[0041] Figure 21 is a diagram illustrating an example of obtaining predicted motion vector candidates for the current block from a motion vector candidate list.
[0042] Figures 22a and 22b are diagrams illustrating an example of the syntax used for information about motion compensation.
Detailed Implementation
[0043] Various modifications can be made to this invention, and the invention has various embodiments, examples of which will now be provided and described in detail with reference to the accompanying drawings. However, the invention is not limited thereto, and the exemplary embodiments are to be interpreted as including all modifications, equivalents, or substitutions within the technical concept and scope of the invention. Similar reference numerals refer to the same or similar functions in various aspects. In the drawings, the shapes and sizes of elements may be exaggerated for clarity. In the following detailed description of the invention, reference is made to the accompanying drawings, which illustrate specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice this disclosure. It should be understood that the various embodiments of this disclosure, though different, are not necessarily mutually exclusive. For example, specific features, structures, and characteristics associated with one embodiment described herein may be implemented in other embodiments without departing from the spirit and scope of this disclosure. Furthermore, it should be understood that the positions or arrangements of the various elements within each disclosed embodiment may be modified without departing from the spirit and scope of this disclosure. Therefore, the following detailed description is not intended to be limiting, and the scope of this disclosure is defined by the appended claims (and, where appropriate, the full scope of the equivalents claimed in the claims).
[0044] The terms "first," "second," etc., used in this specification may be used to describe various components, but the components are not to be construed as being limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the invention, a "first" component may be referred to as a "second" component, and a "second" component may similarly be referred to as a "first" component. The term "and / or" includes a combination of multiple items or any one of multiple items.
[0045] It will be understood that in this specification, when an element is simply referred to as "connected to" or "joined to" another element rather than "directly connected to" or "directly joined to" another element, it can be "directly connected to" or "directly joined to" another element, or connected to or joined to another element with other elements inserted in between. Conversely, it should be understood that when an element is referred to as "directly joined" or "directly connected to" another element, there are no intermediate elements.
[0046] Furthermore, the components shown in the embodiments of the present invention are illustrated independently to present distinct functionalities. Therefore, this does not imply that each component is composed as a separate hardware or software unit. In other words, the components are enumerated separately for convenience of description. Thus, at least two of the components can be combined to form a single component, or a single component can be divided into multiple components to perform each function. Embodiments in which components are combined and embodiments in which a component is divided are also included within the scope of the invention, provided they do not depart from its spirit.
[0047] The terminology used in this specification is for describing particular embodiments only and is not intended to limit the invention. Expressions used in the singular include plural expressions unless they have a distinct meaning in the context. In this specification, it will be understood that terms such as “comprising,” “having,” etc., are intended to indicate the presence of features, quantities, steps, actions, elements, components, or combinations thereof disclosed in the specification, and are not intended to exclude the possibility that one or more other features, quantities, steps, actions, elements, components, or combinations thereof may be present or added. In other words, when a particular element is referred to as “comprising,” elements other than the corresponding element are not excluded; rather, additional elements may be included in embodiments of the invention or within the scope of the invention.
[0048] Furthermore, some components may not be essential for performing the necessary functions of the invention, but rather optional components that merely enhance its performance. The invention can be implemented by including only the essential components necessary for carrying out the invention itself, excluding components used to enhance performance. Structures that include only the essential components and exclude optional components used solely for enhancing performance are also included within the scope of the invention.
[0049] In the following, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing exemplary embodiments of the invention, well-known functions or structures will not be described in detail, as they would unnecessarily obscure the understanding of the invention. The same constituent elements in the drawings are denoted by the same reference numerals, and repeated descriptions of the same elements will be omitted.
[0050] Furthermore, in the following text, "image" can mean either a frame that constitutes a video or the video itself. For example, "encoding or decoding an image, or both" can mean "encoding or decoding a video, or both," and can also mean "encoding or decoding one of a plurality of images in a video, or both." Here, "frame" and "image" can have the same meaning.
[0051] Terminology Description
[0052] Encoder: can mean a device that performs encoding.
[0053] Decoder: can mean a device that performs decoding.
[0054] Parsing: This can mean determining the value of a syntax element by performing entropy decoding, or it can mean entropy decoding itself.
[0055] Block: can be interpreted as an M×N matrix of samples. Here, M and N are positive integers, and a block can be interpreted as a sample matrix in two-dimensional form.
[0056] Sample: A sample is the basic unit of a block and can indicate a value ranging from 0 to 2^Bd - 1 depending on the bit depth (Bd). In this invention, a sample can mean a pixel.
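The sample range stated above follows directly from the bit depth. A minimal sketch, in which the function names and the clipping helper are illustrative assumptions:

```python
def max_sample_value(bit_depth: int) -> int:
    """Largest value a sample can take for the given bit depth (2^Bd - 1)."""
    return (1 << bit_depth) - 1


def clip_sample(value: int, bit_depth: int) -> int:
    """Clamp a value into the valid sample range 0 .. 2^Bd - 1."""
    return min(max(value, 0), max_sample_value(bit_depth))
```

For example, an 8-bit sample ranges from 0 to 255, and a 10-bit sample from 0 to 1023.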
[0057] Unit: can be defined as a unit used for encoding and decoding an image. During image encoding and decoding, a unit can be a region created by partitioning an image. Furthermore, a unit can also be defined as a sub-partition unit when an image is partitioned into multiple sub-partition units during encoding or decoding. During image encoding and decoding, predetermined processing can be performed for each unit. A unit can be partitioned into sub-units smaller than the unit itself. Depending on its function, a unit can be defined as a block, macroblock, coding tree unit, coding tree block, coding unit, coding block, prediction unit, prediction block, transform unit, transform block, etc. Furthermore, to distinguish a unit from a block, a unit can include a luma component block, a chroma component block of the luma component block, and syntax elements for each color component block. Units can have various sizes and shapes; specifically, the shape of a unit can be a two-dimensional geometric shape, such as a rectangle, square, trapezoid, triangle, pentagon, etc. Additionally, unit information can include at least one of the following: unit type (indicating coding unit, prediction unit, transform unit, etc.), unit size, unit depth, and the order in which the unit is encoded and decoded.
[0058] Reconstructed neighboring unit: This can mean a reconstructed unit that has previously been encoded or decoded, spatially or temporally, and that is adjacent to the encoding / decoding target unit. Here, a reconstructed neighboring unit can mean a reconstructed neighboring block.
[0059] Neighboring block: Can be defined as a block adjacent to the target block being encoded / decoded. A block adjacent to the target block can also be defined as a block with a boundary that contacts the target block. A neighboring block can also be defined as a block located at an adjacent vertex of the target block. A neighboring block can also be defined as a reconstructed neighboring block.
[0060] Unit depth: This can be interpreted as the degree to which a unit is partitioned. In a tree structure, the root node can be the highest node, and the leaf nodes can be the lowest nodes.
[0061] Symbols: can refer to syntax elements, encoding parameters, transform coefficient values, etc. of the encoding / decoding target unit.
[0062] Parameter set: This can refer to the header information in the structure of a bitstream. A parameter set can include at least one of a video parameter set, a sequence parameter set, a picture parameter set, and an adaptation parameter set. Furthermore, a parameter set can also refer to slice header information, tile header information, etc.
[0063] Bitstream: can be defined as a string of bits that includes encoded image information.
[0064] Prediction Unit: This can be understood as the basic unit used when performing inter-frame or intra-frame prediction and compensation for the prediction. A prediction unit can be partitioned into multiple partitions. In this case, each of the multiple partitions can be a basic unit during prediction and compensation, and each partition obtained from the prediction unit partitioning can be a prediction unit. Furthermore, a prediction unit can be partitioned into multiple smaller prediction units. Prediction units can have various sizes and shapes, and specifically, the shape of a prediction unit can be a two-dimensional geometric figure, such as a rectangle, square, trapezoid, triangle, and pentagon.
[0065] Prediction unit partition: can be understood as the shape of a partitioned prediction unit.
[0066] Reference frame list: This can mean a list including at least one reference frame, wherein the at least one reference frame is used for inter-frame prediction or motion compensation. The reference frame list can be of type List Combined (LC), List 0 (L0), List 1 (L1), List 2 (L2), List 3 (L3), etc. At least one reference frame list can be used for inter-frame prediction.
[0067] Inter-frame prediction indicator: can mean one of the following: the inter-frame prediction direction (unidirectional prediction, bidirectional prediction, etc.) of the encoding / decoding target block in the case of inter-frame prediction, the number of reference frames used by the encoding / decoding target block to generate prediction blocks, and the number of reference blocks used by the encoding / decoding target block to perform inter-frame prediction or motion compensation.
[0068] Reference frame index: This can refer to the index of a specific reference frame in the reference frame list.
[0069] Reference frame: This can refer to a frame used by a specific unit for inter-frame prediction or motion compensation. A reference image can be called a reference frame.
[0070] Motion vector: A two-dimensional vector used for inter-frame prediction or motion compensation, which can be interpreted as the offset between the encoding / decoding target frame and the reference frame. For example, (mvX, mvY) can indicate a motion vector, where mvX indicates the horizontal component and mvY indicates the vertical component.
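To make the notion of a motion vector as an offset concrete, the sketch below reads a prediction block from a reference frame at the position of the current block shifted by (mvX, mvY). It assumes integer-sample precision and a displaced block that lies entirely inside the reference frame; the function name and the frame representation (a list of rows) are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def fetch_prediction_block(
    reference: Frame, x: int, y: int, width: int, height: int, mv: Tuple[int, int]
) -> Frame:
    """Return the width x height block of `reference` located at the current
    block position (x, y) displaced by the motion vector (mvX, mvY)."""
    mv_x, mv_y = mv
    ref_x, ref_y = x + mv_x, y + mv_y  # horizontal and vertical displacement
    return [row[ref_x:ref_x + width] for row in reference[ref_y:ref_y + height]]


# 6x6 reference frame whose sample at (row, col) equals row * 10 + col.
reference_frame = [[r * 10 + c for c in range(6)] for r in range(6)]
prediction = fetch_prediction_block(reference_frame, x=2, y=2, width=2, height=2, mv=(1, -1))
```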
[0071] Motion vector candidate: can mean a unit that becomes a prediction candidate when predicting motion vectors, or it can mean the motion vector of that unit.
[0072] Motion vector candidate list: This can mean a list configured by using motion vector candidates.
[0073] Motion vector candidate index: This can be interpreted as an indicator that points to a motion vector candidate in the motion vector candidate list. The motion vector candidate index can also be referred to as the index of motion vector predictors.
[0074] Motion information: can mean motion vectors, reference frame indexes and inter-frame prediction indicators, as well as information including at least one of the following: reference frame list information, reference frames, motion vector candidates, motion vector candidate indexes, etc.
[0075] Merge candidate list: This can mean a list configured by using merge candidates.
[0076] Merge candidate: Merge candidates can include spatial merge candidates, temporal merge candidates, combined merge candidates, combined bidirectional prediction merge candidates, zero merge candidates, etc. Merge candidates can include motion information such as prediction type information, reference frame indexes for each list, motion vectors, etc.
[0077] Merge Index: This can refer to information about merge candidates in the merge candidate list. Furthermore, the merge index can indicate which block among the reconstructed blocks spatially / temporally adjacent to the current block has become a merge candidate. Additionally, the merge index can indicate at least one of multiple motion information entries for a merge candidate.
[0078] Transform unit: This can be understood as the basic unit used to perform transformations, inverse transformations, quantization, dequantization, and encoding / decoding of transform coefficients on a residual signal. A transform unit can be divided into multiple smaller transform units. Transform units can have various sizes and shapes. Specifically, the shape of a transform unit can be a two-dimensional geometric figure, such as a rectangle, square, trapezoid, triangle, pentagon, etc.
[0079] Scaling: This can be understood as multiplying the transform coefficient levels by a factor, which generates the transform coefficients. Scaling can also be referred to as inverse quantization.
[0080] Quantization parameter: This can be interpreted as the value used to scale the transform coefficient levels during quantization and dequantization. Here, the quantization parameter can be a value mapped to the quantization step size.
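The relationship between the quantization parameter, the quantization step size, and scaling can be sketched as follows. The mapping used here, in which the step size doubles for every increase of 6 in the quantization parameter, is borrowed from the approximate rule in H.264/HEVC-style codecs and is an assumption; the text above only states that the parameter is mapped to a step size. The function names are hypothetical.

```python
def quantization_step(qp: int) -> float:
    """Assumed QP-to-step mapping: the step doubles every 6 QP, with QP 4 -> 1.0."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coefficient: float, qp: int) -> int:
    """Map a transform coefficient to a transform coefficient level."""
    return int(round(coefficient / quantization_step(qp)))


def scale(level: int, qp: int) -> float:
    """Scaling (inverse quantization): multiply the level by the step size."""
    return level * quantization_step(qp)


level = quantize(104.0, qp=22)        # the step size is 8.0 at QP 22
reconstructed = scale(level, qp=22)
```

In general the reconstructed coefficient only approximates the original, because rounding in `quantize` discards information; the input above was chosen to be a multiple of the step size.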
[0081] Delta quantization parameter: can be interpreted as the difference between the quantization parameter of the encoding / decoding target unit and the predicted quantization parameter.
[0082] Scan: This can refer to a method of sorting the coefficients within a block or matrix. For example, the operation of sorting a two-dimensional matrix into a one-dimensional matrix can be called a scan, and the operation of sorting a one-dimensional matrix into a two-dimensional matrix can be called a scan or inverse scan.
[0083] Transformation coefficients: These can be understood as the coefficient values generated after a transformation is performed. In this invention, the quantized transformation coefficient levels (i.e., the transformation coefficients to which quantization has been applied) can be referred to as transformation coefficients.
[0084] Non-zero transform coefficients: can be interpreted as transform coefficients whose values are not zero, or as levels of transform coefficients whose values are not zero.
[0085] Quantization matrix: This refers to a matrix used in quantization and dequantization to improve the subjective quality or objective quality of an image. The quantization matrix can also be called a scaling list.
[0086] Quantization matrix coefficients: These can be understood as each element of the quantization matrix. Quantization matrix coefficients are also referred to as matrix coefficients.
[0087] Default matrix: can mean a predefined quantization matrix that is defined in the encoder and decoder.
[0088] Non-default matrix: This can mean a quantization matrix sent / received by the user without being predefined in the encoder and decoder.
[0089] A coding tree unit can consist of one luminance component (Y) coding tree unit and two associated chrominance component (Cb, Cr) coding tree units. Each coding tree unit can be partitioned using at least one partitioning method (such as a quadtree, binary tree, etc.) to form sub-units such as coding units, prediction units, transform units, etc. The coding tree unit can be used as a term to indicate pixel blocks (i.e., processing units in the decoding / encoding process of an image, such as partitions of the input image).
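A quadtree partitioning of a coding tree unit, together with the unit depth defined earlier, can be sketched as a recursive split. The 64×64 root size, the minimum size of 8, and the split rule passed in as a callback are illustrative assumptions; an actual encoder would typically decide splits by comparing coding costs.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, size, depth)


def quadtree_partition(
    x: int,
    y: int,
    size: int,
    depth: int,
    should_split: Callable[[int, int, int, int], bool],
    min_size: int = 8,  # hypothetical smallest unit size
) -> List[Region]:
    """Recursively split a square region into four quadrants and return the
    leaf regions. The root has depth 0; each split adds 1 to the depth."""
    if size > min_size and should_split(x, y, size, depth):
        half = size // 2
        leaves: List[Region] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    quadtree_partition(x + dx, y + dy, half, depth + 1, should_split, min_size)
                )
        return leaves
    return [(x, y, size, depth)]


def split_rule(x: int, y: int, size: int, depth: int) -> bool:
    # Split the root, then split only its top-left quadrant once more.
    return depth == 0 or (depth == 1 and x == 0 and y == 0)


coding_units = quadtree_partition(0, 0, 64, 0, split_rule)
```

Here the result is four 16×16 units at depth 2 followed by three 32×32 units at depth 1, which together cover the 64×64 root.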
[0090] Coding tree block: can be used as a term to indicate one of the Y coding tree unit, Cb coding tree unit, and Cr coding tree unit.
[0091] Figure 1 is a block diagram illustrating the configuration of an encoding device according to an embodiment of the present invention.
[0092] Encoding device 100 can be a video encoding device or an image encoding device. Video may include one or more images. Encoding device 100 can encode one or more images of the video in chronological order.
[0093] Referring to Figure 1, the encoding device 100 may include a motion prediction unit 111, a motion compensation unit 112, an intra-frame prediction unit 120, a switcher 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy coding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference frame buffer 190.
[0094] Encoding device 100 can encode the input frame in intra-frame mode, inter-frame mode, or both. Furthermore, encoding device 100 can generate a bitstream by encoding the input frame and can output the generated bitstream. When intra-frame mode is used as the prediction mode, switcher 115 can switch to intra-frame mode. When inter-frame mode is used as the prediction mode, switcher 115 can switch to inter-frame mode. Here, intra-frame mode can be referred to as intra-frame prediction mode, and inter-frame mode can be referred to as inter-frame prediction mode. Encoding device 100 can generate a prediction block for an input block of the input frame. Furthermore, after generating the prediction block, encoding device 100 can encode the residual between the input block and the prediction block. The input frame, as the target of the current encoding, can be referred to as the current image. The input block, as the target of the current encoding, can be referred to as the current block or as the encoding target block.
[0095] When the prediction mode is intra-frame mode, the intra-frame prediction unit 120 can use the pixel values of previously coded blocks adjacent to the current block as reference pixels. The intra-frame prediction unit 120 can perform spatial prediction by using reference pixels and can generate prediction samples of the input block by using spatial prediction. Here, intra prediction can mean intra-frame prediction.
[0096] When the prediction mode is inter-frame mode, the motion prediction unit 111 can search for the region that best matches the input block from the reference frame during motion prediction processing, and obtain the motion vector by using the searched region. The reference frame can be stored in the reference frame buffer 190.
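The search for the best-matching region can be illustrated with an exhaustive block-matching search. The sum of absolute differences (SAD) as the matching cost, the square search window, and the integer-sample precision are assumptions chosen for simplicity; the text above does not specify the search strategy or the cost measure.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(block_a: Frame, block_b: Frame) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b) for row_a, row_b in zip(block_a, block_b) for a, b in zip(row_a, row_b)
    )


def full_search(
    current_block: Frame, reference: Frame, x: int, y: int, search_range: int
) -> Tuple[int, int]:
    """Try every integer displacement within +/- search_range around (x, y)
    and return the motion vector (mvX, mvY) with the lowest SAD."""
    height, width = len(current_block), len(current_block[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for mv_y in range(-search_range, search_range + 1):
        for mv_x in range(-search_range, search_range + 1):
            ref_x, ref_y = x + mv_x, y + mv_y
            # Skip displacements that fall outside the reference frame.
            if ref_x < 0 or ref_y < 0:
                continue
            if ref_y + height > len(reference) or ref_x + width > len(reference[0]):
                continue
            candidate = [row[ref_x:ref_x + width] for row in reference[ref_y:ref_y + height]]
            cost = sad(current_block, candidate)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mv_x, mv_y), cost
    return best_mv


# 8x8 reference frame with unique sample values; the current block at (3, 2)
# is an exact copy of the reference region whose top-left corner is at (4, 3).
reference_frame = [[r * 8 + c for c in range(8)] for r in range(8)]
current = [[28, 29], [36, 37]]
motion_vector = full_search(current, reference_frame, x=3, y=2, search_range=2)
```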
[0097] The motion compensation unit 112 can generate prediction blocks by performing motion compensation using motion vectors. Here, the motion vectors can be two-dimensional vectors used for inter-frame prediction. Furthermore, the motion vectors can indicate the offset between the current frame and a reference frame. Here, inter prediction can mean inter-frame prediction.
[0098] When the value of the motion vector is not an integer, the motion prediction unit 111 and the motion compensation unit 112 can generate a prediction block by applying an interpolation filter to a portion of the reference frame. To perform inter-frame prediction or motion compensation based on the coding unit, the method used for motion prediction and compensation in the coding unit can be determined from among skip mode, merge mode, AMVP mode, and current frame reference mode. Inter-frame prediction or motion compensation can be performed according to each mode. Here, the current frame reference mode can be understood as a prediction mode that uses a previously reconstructed region within the current frame containing the coding target block. To specify the previously reconstructed region, a motion vector can be defined for the current frame reference mode. Whether the coding target block is encoded according to the current frame reference mode can be determined by using the reference frame index of the coding target block.
[0099] Subtractor 125 can generate a residual block by using the difference between the input block and the prediction block. The residual block may be referred to as the residual signal.
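The role of the subtractor, and of the adder that later reverses it, can be sketched as element-wise operations on blocks. The function names are hypothetical, and the exact round trip shown here holds only because this sketch skips the transform and quantization stages that sit between the two in the encoding device.

```python
from typing import List

Block = List[List[int]]


def residual_block(input_block: Block, prediction_block: Block) -> Block:
    """Element-wise difference: input sample minus predicted sample."""
    return [
        [i - p for i, p in zip(input_row, pred_row)]
        for input_row, pred_row in zip(input_block, prediction_block)
    ]


def reconstruct_block(prediction_block: Block, residual: Block) -> Block:
    """Element-wise sum: predicted sample plus residual sample."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction_block, residual)
    ]


input_blk = [[52, 55], [61, 59]]
pred_blk = [[50, 56], [60, 60]]
residual = residual_block(input_blk, pred_blk)
```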
[0100] Transform unit 130 can generate transformation coefficients by transforming the residual block and can output the transformation coefficients. Here, the transformation coefficients can be coefficient values generated by transforming the residual block. In transform skip mode, transform unit 130 can skip the transformation of the residual block.
[0101] A quantized transformation coefficient level can be generated by applying quantization to the transformation coefficients. In the following, in embodiments of the invention, the quantized transformation coefficient level may be referred to as the transformation coefficient.
[0102] The quantization unit 140 can generate quantized transformation coefficient levels by quantizing the transformation coefficients according to quantization parameters, and can output the quantized transformation coefficient levels. Here, the quantization unit 140 can quantize the transformation coefficients using a quantization matrix.
[0103] The entropy coding unit 150 can generate a bitstream by performing entropy coding on values calculated by the quantization unit 140 or on coding parameter values calculated in the coding process according to a probability distribution, and can output the generated bitstream. The entropy coding unit 150 can perform entropy coding on information used for decoding the image, and can also perform entropy coding on information about the image's pixels. For example, the information used for decoding the image may include syntax elements, etc.
[0104] When entropy coding is applied, the size of the bitstream encoding the target symbol is reduced by allocating a small number of bits to symbols with high occurrence probabilities and a large number of bits to symbols with low occurrence probabilities. Therefore, the compression performance of image coding can be improved through entropy coding. For entropy coding, the entropy coding unit 150 can use coding methods such as exponential Golomb, context-adaptive variable-length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). For example, the entropy coding unit 150 can perform entropy coding by using a variable-length coding / code (VLC) table. Furthermore, the entropy coding unit 150 can obtain a binarization method for the target symbol and a probability model of the target symbol / bin, and can subsequently perform arithmetic coding by using the obtained binarization method or the obtained probability model.
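As a small illustration of assigning fewer bits to more probable symbols, the following sketch implements the order-0 unsigned exponential Golomb code, one of the methods named above. It assumes that smaller values are the more frequent ones; the function names are chosen for this sketch only.

```python
def exp_golomb_encode(value: int) -> str:
    """Return the order-0 unsigned Exp-Golomb codeword for value as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]              # binary form of value + 1
    return "0" * (len(binary) - 1) + binary  # prefix of zeros, then the binary form


def exp_golomb_decode(bits: str) -> int:
    """Decode a single order-0 unsigned Exp-Golomb codeword."""
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[leading_zeros:2 * leading_zeros + 1], 2) - 1


codewords = [exp_golomb_encode(v) for v in range(4)]
```

Here the value 0 costs one bit, the values 1 and 2 cost three bits each, and larger values cost progressively more.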
[0105] To encode the transform coefficient levels, the entropy coding unit 150 can transform the coefficients from two-dimensional block form to one-dimensional vector form using a transform coefficient scanning method. For example, by scanning the coefficients of the block using an upper-right scan, the two-dimensional coefficients can be transformed into one-dimensional vectors. Depending on the size of the transform unit and the intra-frame prediction mode, a vertical scan for scanning the coefficients in the two-dimensional block form along the column direction and a horizontal scan for scanning the coefficients in the two-dimensional block form along the row direction can be used instead of an upper-right scan. That is, depending on the size of the transform unit and the intra-frame prediction mode, it can be determined which scanning method among the upper-right scan, vertical scan, and horizontal scan will be used.
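The three scanning methods can be sketched as follows. The traversal direction within each diagonal of the upper-right scan (starting at the lower-left end and moving toward the upper right) is an assumption of this sketch; the text names the scan but does not define its exact order.

```python
def up_right_scan(block):
    """Scan anti-diagonals, each from its lower-left end toward the upper right."""
    height, width = len(block), len(block[0])
    out = []
    for d in range(height + width - 1):
        row = min(d, height - 1)
        col = d - row
        while row >= 0 and col < width:
            out.append(block[row][col])
            row -= 1
            col += 1
    return out


def vertical_scan(block):
    """Scan along the column direction, one column after another."""
    return [block[r][c] for c in range(len(block[0])) for r in range(len(block))]


def horizontal_scan(block):
    """Scan along the row direction, one row after another."""
    return [value for row in block for value in row]


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
```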
[0106] Encoding parameters may include information such as syntax elements encoded by the encoder and sent to the decoder, and may include information that can be obtained during the encoding or decoding process. Encoding parameters may mean information necessary for encoding or decoding an image. For example, encoding parameters may include at least one value or combination of the following: block size, block depth, block partitioning information, unit size, unit depth, unit partitioning information, quadtree partitioning flag, binary tree partitioning flag, binary tree partitioning direction, intra-frame prediction mode, intra-frame prediction direction, reference sample filtering method, prediction block boundary filtering method, filter taps, filter coefficients, inter-frame prediction mode, motion information, motion vectors, reference frame index, inter-frame prediction direction, inter-frame prediction indicator, reference frame list, motion vector predictor, motion vector candidate list, information on whether motion merging mode is used, motion merging candidates, motion merging candidate list, information on whether skip mode is used, interpolation filter type, motion vector size, accuracy of motion vector representation, transform type, transform size, information on whether an additional (secondary) transform is used, information on the presence of residual signals, coded block pattern, coded block flag, quantization parameters, quantization matrix, in-loop filter information, information on whether in-loop filters are applied, in-loop filter coefficients, binarization / debinarization method, context model, context bins, bypass bins, transform coefficients, transform coefficient levels, transform coefficient level scanning method, image display / output order, slice identification information, slice type, slice partition information, tile identification information, tile type, tile partition information, frame type, bit depth, and information on luminance or chrominance signals.
[0107] The residual signal can be interpreted as the difference between the original signal and the predicted signal. Alternatively, the residual signal can be a signal generated by transforming the difference between the original signal and the predicted signal. Alternatively, the residual signal can be a signal generated by transforming and quantizing the difference between the original signal and the predicted signal. A residual block can be the residual signal of a block unit.
[0108] When the encoding device 100 performs encoding using inter-frame prediction, the encoded current frame can be used as a reference frame for another image that will be processed subsequently. Therefore, the encoding device 100 can decode the encoded current frame and store the decoded image as a reference frame. To perform decoding, inverse quantization and inverse transform can be performed on the encoded current frame.
[0109] The quantized coefficients can be dequantized by the dequantization unit 160 and inverse transformed by the inverse transform unit 170. The dequantized and inverse transformed coefficients can be added to the prediction block by the adder 175, thereby generating the reconstructed block.
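A minimal sketch of the addition performed by the adder is given below. Clipping the result to the valid sample range and the 8-bit default are assumptions of this sketch and are not stated in the text.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add the reconstructed residual to the prediction, sample by sample."""
    max_value = (1 << bit_depth) - 1  # assumed valid sample range: 0..max_value
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


prediction = [[250, 10], [100, 0]]
residual = [[10, -3], [5, -4]]
reconstructed = reconstruct_block(prediction, residual)
```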
[0110] The reconstructed block can be processed by filter unit 180. Filter unit 180 can apply at least one of deblocking filter, sample adaptive offset (SAO), and adaptive loop filter (ALF) to the reconstructed block or reconstructed image. Filter unit 180 may be referred to as a loop filter.
[0111] Deblocking filters remove block distortion that occurs at the boundaries between blocks. Whether to apply a deblocking filter to a block can be determined based on the pixels included in several rows or columns within the block. When a deblocking filter is applied to a block, a strong or weak filter can be applied depending on the desired deblocking filter strength. Furthermore, horizontal and vertical filtering can be processed in parallel when applying a deblocking filter.
[0112] Sample-adaptive offset adds an optimal offset value to a pixel value to compensate for coding errors. Sample-adaptive offset corrects the offset between the deblocked image and the original image for each pixel. To perform offset correction on a specific image, one can use a method that considers the edge information of each pixel to apply the offset, or use the following method: divide the image's pixels into a predetermined number of regions, determine the regions to be offset corrected, and apply the offset correction to the determined regions.
[0113] An adaptive loop filter performs filtering based on values obtained by comparing the reconstructed image with the original image. The pixels of the image can be partitioned into predetermined groups, a filter can be determined for each group, and filtering can be performed differently for each group. Information regarding whether an adaptive loop filter is applied to the luminance signal can be sent for each coding unit (CU). The shape and filter coefficients of the adaptive loop filter applied to each block can vary. Furthermore, adaptive loop filters with the same form (fixed form) can be applied without considering the characteristics of the target block.
[0114] The reconstructed block after passing through the filter unit 180 can be stored in the reference frame buffer 190.
[0115] Figure 2 is a block diagram illustrating the configuration of a decoding device according to an embodiment of the present invention.
[0116] Decoding device 200 can be a video decoding device or an image decoding device.
[0117] Referring to Figure 2, the decoding device 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an intra-frame prediction unit 240, a motion compensation unit 250, an adder 255, a filter unit 260, and a reference frame buffer 270.
[0118] Decoding device 200 can receive bitstreams output from encoding device 100. Decoding device 200 can decode the bitstreams in intra-frame mode or inter-frame mode. In addition, decoding device 200 can generate reconstructed images by performing decoding and can output the reconstructed images.
[0119] When the prediction mode used in decoding is intra-frame mode, the switcher can be switched to intra-frame mode. When the prediction mode used in decoding is inter-frame mode, the switcher can be switched to inter-frame mode.
[0120] Decoding device 200 can obtain reconstructed residual blocks from the input bitstream and can generate prediction blocks. When the reconstructed residual blocks and prediction blocks are obtained, decoding device 200 can generate a reconstructed block as the decoding target block by adding the reconstructed residual blocks and the prediction blocks. The decoding target block can be referred to as the current block.
[0121] The entropy decoding unit 210 can generate symbols by performing entropy decoding on the bitstream according to a probability distribution. The generated symbols may include symbols with quantized transform coefficient levels. Here, the entropy decoding method may be similar to the entropy encoding method described above. For example, the entropy decoding method may be the inverse process of the entropy encoding method described above.
[0122] To decode the transform coefficient levels, the entropy decoding unit 210 can perform a transform coefficient scan, thereby transforming the coefficients from one-dimensional vector form to two-dimensional block form. For example, by scanning the coefficients of the block using an upper-right scan, the coefficients from one-dimensional vector form can be transformed into two-dimensional block form. Depending on the size of the transform unit and the intra-frame prediction mode, vertical and horizontal scans can be used instead of upper-right scans. That is, based on the size of the transform unit and the intra-frame prediction mode, it can be determined which scanning method among upper-right, vertical, and horizontal scans is used.
[0123] The quantized transform coefficient levels can be dequantized by dequantization unit 220 and inversely transformed by inverse transform unit 230. The quantized transform coefficient levels are dequantized and inversely transformed to generate a reconstruction residual block. Here, dequantization unit 220 can apply a quantization matrix to the quantized transform coefficient levels.
[0124] When the intra-frame mode is used, the intra-frame prediction unit 240 can generate a prediction block by performing spatial prediction, wherein the spatial prediction uses the pixel values of previously decoded blocks adjacent to the decoding target block.
[0125] When inter-frame mode is used, motion compensation unit 250 can generate prediction blocks by performing motion compensation, which uses a motion vector and the reference frame stored in reference frame buffer 270. When the value of the motion vector is not an integer, motion compensation unit 250 can generate prediction blocks by applying an interpolation filter to a portion of the reference frame. To perform motion compensation, based on the coding unit, the motion compensation method used by the prediction unit in the coding unit can be determined from among skip mode, merge mode, AMVP mode, and current frame reference mode. Furthermore, motion compensation can be performed according to the mode. Here, current frame reference mode can mean a prediction mode that uses a previously reconstructed region within the current frame that contains the decoding target block. The previously reconstructed region may not be adjacent to the decoding target block. To indicate the previously reconstructed region, a fixed vector can be used for the current frame reference mode. In addition, a flag or index indicating whether the decoding target block is a block decoded according to the current frame reference mode can be signaled, and can be obtained by using the reference frame index of the decoding target block. The current frame for the current frame reference mode can exist at a fixed position within the reference frame list for the decoding target block (e.g., the position with reference frame index 0 or the last position). Alternatively, the current frame can be variably located within the reference frame list; for this purpose, a reference frame index indicating the position of the current frame can be signaled.
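The handling of a non-integer motion vector can be sketched as follows. A two-tap bilinear filter is substituted here purely for brevity; the text does not specify the interpolation filter, and practical codecs typically use longer filters. The sketch also assumes that every referenced position is non-negative and lies inside the reference frame; the function names and argument layout are hypothetical.

```python
def sample_at(reference, x, y):
    """Bilinearly interpolate the reference frame at a fractional position (x, y)."""
    x0, y0 = int(x), int(y)                  # integer part (positions assumed >= 0)
    fx, fy = x - x0, y - y0                  # fractional part
    x1 = min(x0 + 1, len(reference[0]) - 1)  # clamp at the right edge
    y1 = min(y0 + 1, len(reference) - 1)     # clamp at the bottom edge
    top = reference[y0][x0] * (1 - fx) + reference[y0][x1] * fx
    bottom = reference[y1][x0] * (1 - fx) + reference[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def motion_compensate(reference, block_x, block_y, width, height, motion_vector):
    """Build a prediction block by displacing the block position by the motion vector."""
    mv_x, mv_y = motion_vector
    return [
        [sample_at(reference, block_x + c + mv_x, block_y + r + mv_y)
         for c in range(width)]
        for r in range(height)
    ]


reference = [[0, 10, 20],
             [30, 40, 50],
             [60, 70, 80]]
half_pel = motion_compensate(reference, 0, 0, 2, 2, (0.5, 0.0))
integer_pel = motion_compensate(reference, 0, 0, 2, 2, (1, 1))
```

With an integer motion vector the prediction block is a plain copy of reference samples; with a half-sample horizontal displacement each predicted sample is the average of two horizontally adjacent reference samples.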
[0126] The reconstructed residual block and the prediction block can be added together by adder 255. The resulting block, obtained by adding the reconstructed residual block and the prediction block, can be passed through filter unit 260. Filter unit 260 can apply at least one of deblocking filter, sample adaptive offset, and adaptive loop filter to the reconstructed block or reconstructed frame. Filter unit 260 can output the reconstructed frame. The reconstructed frame can be stored in reference frame buffer 270 and can be used for inter-frame prediction.
[0127] Figure 3 is a schematic diagram illustrating the partitioning structure of an image when it is encoded and decoded. Figure 3 schematically illustrates an embodiment of dividing a unit into multiple sub-units.
[0128] To effectively partition an image, coding units (CUs) can be used in encoding and decoding. Here, a coding unit can mean a unit that is encoded, and a unit can be a combination of 1) a syntax element and 2) a block that includes image samples. For example, "partitioning of a unit" can mean "partitioning of the block associated with the unit". Block partitioning information can include information about the depth of the unit. Depth information can indicate the number of times the unit is partitioned or the degree to which the unit is partitioned, or both.
[0129] Referring to Figure 3, image 300 is sequentially partitioned into largest coding units (LCUs), and the partitioning structure is determined for each LCU. Here, LCU and coding tree unit (CTU) have the same meaning. A unit may have depth information based on a tree structure and may be hierarchically partitioned. Each sub-unit from a partition may have depth information. The depth information indicates the number of times the unit is partitioned or the degree to which the unit is partitioned, or both; therefore, the depth information may include information about the size of the sub-units.
[0130] The partitioning structure can be understood as the distribution of coding units (CUs) in the LCU 310. A CU can be a unit used for effectively encoding an image. The distribution can be determined based on whether a CU is to be partitioned into multiple CUs (a positive integer equal to or greater than 2, such as 2, 4, 8, or 16). The width and height of the partitioned CUs can be half the width and half the height of the original CU, respectively. Alternatively, depending on the number of partitions, the width and height of the partitioned CUs can be smaller than the width and height of the original CU, respectively. A partitioned CU can be recursively partitioned into multiple further partitioned CUs, wherein, following the same partitioning method, the further partitioned CUs have a smaller width and height than the partitioned CU from which they originate.
[0131] Here, the partitioning of a CU can be performed recursively until a predetermined depth is reached. Depth information can be information indicating the size of the CU and can be stored in each CU. For example, the depth of an LCU can be 0, and the depth of a smallest coding unit (SCU) can be a predetermined maximum depth. Here, an LCU can be a coding unit with the maximum size, as described above, and an SCU can be a coding unit with the minimum size.
[0132] Partitioning starts from LCU 310, and each time the width and height of a CU decrease through a partitioning operation, the depth of the CU increases by 1. In the case of a CU that cannot be partitioned, the CU can have a size of 2N×2N for each depth. In the case of a CU that can be partitioned, a CU of size 2N×2N can be partitioned into multiple CUs of size N×N. Each time the depth increases by 1, the size of N is halved.
[0133] For example, when a coding unit is partitioned into four sub-coding units, the width and height of one of the four sub-coding units can be half the width and half the height of the original coding unit, respectively. For example, when a 32×32 coding unit is partitioned into four sub-coding units, each of the four sub-coding units can have a size of 16×16. When a coding unit is partitioned into four sub-coding units, the coding unit can be partitioned in the form of a quadtree.
[0134] For example, when a coding unit is partitioned into two sub-coding units, the width or height of one of the two sub-coding units can be half the width or half the height of the original coding unit, respectively. For example, when a 32×32 coding unit is vertically partitioned into two sub-coding units, each of the two sub-coding units can have a size of 16×32. For example, when a 32×32 coding unit is horizontally partitioned into two sub-coding units, each of the two sub-coding units can have a size of 32×16. When a coding unit is partitioned into two sub-coding units, the coding unit can be partitioned in a binary tree format.
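The quadtree and binary tree examples above can be summarized in a short sketch. Sizes are written as (width, height); the mode names "quad", "vertical", and "horizontal" are labels chosen for this sketch.

```python
def split_sizes(width, height, mode):
    """Return the (width, height) of each sub-coding unit for a given split."""
    if mode == "quad":        # four sub-units, half width and half height
        return [(width // 2, height // 2)] * 4
    if mode == "vertical":    # two sub-units side by side, half width
        return [(width // 2, height)] * 2
    if mode == "horizontal":  # two sub-units stacked, half height
        return [(width, height // 2)] * 2
    raise ValueError(f"unknown split mode: {mode}")
```

For a 32×32 coding unit this reproduces the sizes given above: 16×16 for the quadtree split, 16×32 for the vertical split, and 32×16 for the horizontal split.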
[0135] Referring to Figure 3, the size of an LCU with a minimum depth of 0 can be 64×64 pixels, and the size of an SCU with a maximum depth of 3 can be 8×8 pixels. Here, a CU with 64×64 pixels (i.e., LCU) can be represented by depth 0, a CU with 32×32 pixels can be represented by depth 1, a CU with 16×16 pixels can be represented by depth 2, and a CU with 8×8 pixels (i.e., SCU) can be represented by depth 3.
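The relationship between depth and CU size in this example amounts to halving the LCU size once per depth level, which can be written as a one-line sketch (the function name is chosen for illustration):

```python
def cu_size_at_depth(lcu_size, depth):
    """Side length of a square CU after `depth` halvings of the LCU size."""
    return lcu_size >> depth


sizes = [cu_size_at_depth(64, depth) for depth in range(4)]
```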
[0136] Furthermore, partition information of a CU can indicate whether or not a CU will be partitioned. Partition information can be 1 bit. Partition information can be included in all CUs except the SCU. For example, when the partition information value is 0, the CU may not be partitioned; when the partition information value is 1, the CU may be partitioned.
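One possible way to interpret a sequence of 1-bit partition flags is sketched below. The depth-first visiting order of the four sub-CUs (upper left, upper right, lower left, lower right) and the function name are assumptions of this sketch; the text only states that each CU other than the SCU carries a flag.

```python
def partition_lcu(split_flags, lcu_size=64, scu_size=8):
    """Return the leaf CUs as (x, y, size) from a depth-first list of 1-bit flags."""
    flags = iter(split_flags)
    leaves = []

    def visit(x, y, size):
        # An SCU carries no flag; any larger CU reads one flag (1 = partition).
        if size > scu_size and next(flags) == 1:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    visit(x + dx, y + dy, half)
        else:
            leaves.append((x, y, size))

    visit(0, 0, lcu_size)
    return leaves


leaves = partition_lcu([1, 0, 1, 0, 0, 0, 0, 0, 0])
```

In this example the LCU is partitioned once, and only its upper-right 32×32 CU is partitioned again into four 16×16 CUs.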
[0137] Figure 4 is a diagram showing the form of a prediction unit (PU) that can be included in a coding unit (CU).
[0138] The CUs that are no longer to be partitioned from the LCU can be partitioned into at least one prediction unit (PU). This process can also be referred to as partitioning.
[0139] A PU can be the basic unit used for prediction. A PU can be encoded and decoded according to any of the following modes: skip mode, inter-frame mode, and intra-frame mode. A PU can be partitioned in various forms according to the said mode.
[0140] Furthermore, the coding unit may not be partitioned into multiple prediction units, and the coding unit and the prediction unit may have the same size.
[0141] As shown in Figure 4, in skip mode, the CU may not be partitioned. In skip mode, a 2N×2N mode 410 with the same size as the unpartitioned CU can be supported.
[0142] In inter-frame mode, the CU supports eight partition modes. For example, in inter-frame mode, it supports 2N×2N mode 410, 2N×N mode 415, N×2N mode 420, N×N mode 425, 2N×nU mode 430, 2N×nD mode 435, nL×2N mode 440, and nR×2N mode 445. In intra-frame mode, it supports 2N×2N mode 410 and N×N mode 425.
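The eight partition modes can be tabulated as in the sketch below, with sizes written as (width, height). The 1:3 and 3:1 split ratios used for the asymmetric modes (2N×nU, 2N×nD, nL×2N, nR×2N) follow common video coding practice and are an assumption here; the text does not state them.

```python
def pu_sizes(mode, cu_size):
    """Return the (width, height) of each PU for a square CU of side cu_size (= 2N)."""
    n = cu_size // 2        # N
    quarter = cu_size // 4  # assumed size of the smaller asymmetric part
    rest = cu_size - quarter
    table = {
        "2Nx2N": [(cu_size, cu_size)],
        "2NxN": [(cu_size, n)] * 2,
        "Nx2N": [(n, cu_size)] * 2,
        "NxN": [(n, n)] * 4,
        "2NxnU": [(cu_size, quarter), (cu_size, rest)],
        "2NxnD": [(cu_size, rest), (cu_size, quarter)],
        "nLx2N": [(quarter, cu_size), (rest, cu_size)],
        "nRx2N": [(rest, cu_size), (quarter, cu_size)],
    }
    return table[mode]


INTER_MODES = ["2Nx2N", "2NxN", "Nx2N", "NxN", "2NxnU", "2NxnD", "nLx2N", "nRx2N"]
INTRA_MODES = ["2Nx2N", "NxN"]
```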
[0143] A coding unit can be partitioned into one or more prediction units. A prediction unit can be partitioned into one or more sub-prediction units.
[0144] For example, when a prediction unit is partitioned into four sub-prediction units, the width and height of one of the four sub-prediction units can be half the width and half the height of the original prediction unit. For example, when a 32×32 prediction unit is partitioned into four sub-prediction units, each of the four sub-prediction units can have a size of 16×16. When a prediction unit is partitioned into four sub-prediction units, the prediction unit can be partitioned in a quadtree format.
[0145] For example, when a prediction unit is partitioned into two sub-prediction units, the width or height of one of the two sub-prediction units can be half the width or half the height of the original prediction unit. For example, when a 32×32 prediction unit is vertically partitioned into two sub-prediction units, each of the two sub-prediction units can have a size of 16×32. For example, when a 32×32 prediction unit is horizontally partitioned into two sub-prediction units, each of the two sub-prediction units can have a size of 32×16. When a prediction unit is partitioned into two sub-prediction units, the prediction unit can be partitioned in a binary tree format.
[0146] Figure 5 is a diagram showing the form of a transform unit (TU) that can be included in a coding unit (CU).
[0147] A transform unit (TU) can be a basic unit within a CU used for transforming, quantizing, inverse transforming, and dequantizing. A TU can have a square or rectangular shape, etc. A TU can be independently determined according to the size or form of the CU, or both.
[0148] The CUs that are no longer partitioned from the LCU can be partitioned into at least one TU. Here, the partitioning structure of the TU can be a quadtree structure. For example, as shown in Figure 5, a CU 510 can be partitioned once or more according to a quadtree structure. A CU being partitioned at least once is referred to as recursive partitioning. By partitioning, a CU 510 can be formed from TUs of different sizes. Optionally, a CU can be partitioned into at least one TU based on the number of vertical lines or horizontal lines used to partition the CU, or both. A CU can be partitioned into TUs that are symmetrical to each other, or it can be partitioned into TUs that are asymmetrical to each other. To partition a CU into symmetrical TUs, information about the size / shape of the TUs can be signaled, and can be obtained from the size / shape information of the CUs.
[0149] Furthermore, coding units may not be partitioned into transform units, and coding units and transform units may have the same size.
[0150] A coding unit can be partitioned into at least one transform unit, and a transform unit can be partitioned into at least one sub-transform unit.
[0151] For example, when a transform unit is partitioned into four sub-transform units, the width and height of one of the four sub-transform units can be half the width and half the height of the original transform unit, respectively. For example, when a 32×32 transform unit is partitioned into four sub-transform units, each of the four sub-transform units can have a size of 16×16. When a transform unit is partitioned into four sub-transform units, the transform unit can be partitioned in a quadtree format.
[0152] For example, when a transform unit is partitioned into two sub-transform units, the width or height of one of the two sub-transform units can be half the width or half the height of the original transform unit, respectively. For example, when a 32×32 transform unit is vertically partitioned into two sub-transform units, each of the two sub-transform units can have a size of 16×32. For example, when a 32×32 transform unit is horizontally partitioned into two sub-transform units, each of the two sub-transform units can have a size of 32×16. When a transform unit is partitioned into two sub-transform units, the transform unit can be partitioned in a binary tree format.
[0153] When performing a transformation, the residual block can be transformed using at least one of a plurality of predetermined transformation methods. For example, the predetermined transformation methods may include Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), KLT, etc. Which transformation method is applied to the residual block can be determined by using at least one of the following: inter-frame prediction mode information of the prediction unit, intra-frame prediction mode information of the prediction unit, and the size / shape of the transformed block. Information indicating the transformation method can be signaled.
[0154] Figure 6 is a diagram illustrating an embodiment of the intra-frame prediction process.
[0155] Intra-frame prediction modes can be non-directional or directional. Non-directional modes can be DC or planar modes. Directional modes can be prediction modes with a specific direction or angle, and the number of directional modes can be M, equal to or greater than 1. A directional mode can be indicated by at least one of a mode number, a mode value, and a mode angle.
[0156] The number of intra-frame prediction modes, including non-directional and directional modes, can be N, equal to or greater than 1.
[0157] The number of intra-prediction modes can vary depending on the block size. For example, when the block size is 4×4 or 8×8, the number of intra-prediction modes can be 67; when the block size is 16×16, the number of intra-prediction modes can be 35; when the block size is 32×32, the number of intra-prediction modes can be 19; and when the block size is 64×64, the number of intra-prediction modes can be 7.
[0158] The number of intra-prediction modes can be fixed at N, regardless of the block size. For example, the number of intra-prediction modes can be fixed at 35 or 67, regardless of the block size.
[0159] The number of intra-frame prediction modes can vary depending on the type of color component. For example, the number of prediction modes can vary depending on whether the color component is a luminance signal or a chrominance signal.
[0160] Intra-frame coding and / or decoding can be performed using sample values or coding parameters included in the reconstructed neighboring blocks.
[0161] In order to encode / decode the current block according to intra-frame prediction, it is possible to identify whether samples included in the reconstructed neighboring blocks can be used as reference samples for encoding / decoding the target block. When there are samples that cannot be used as reference samples for encoding / decoding the target block, sample values can be copied and / or interpolated into those samples by using at least one of the samples included in the reconstructed neighboring blocks, so that they become usable as reference samples for encoding / decoding the target block.
[0162] In intra-frame prediction, filters can be applied to at least one of reference samples or prediction samples based on at least one of the intra-frame prediction mode and the size of the encoding / decoding target block. Here, the encoding / decoding target block can mean the current block, and can mean at least one of the encoding block, prediction block, and transform block. The type of filter applied to the reference sample or prediction sample can vary depending on at least one of the intra-frame prediction mode or the size / shape of the current block. The type of filter can vary depending on at least one of the number of filter taps, filter coefficient values, or filter strength.
[0163] In the non-directional planar mode of intra-frame prediction, when generating a prediction block for an encoding / decoding target block, the sample value in the prediction block can be generated by using the weighted sum of the upper reference sample of the current sample, the left reference sample of the current sample, the upper right reference sample of the current block, and the lower left reference sample of the current block, based on the sample position.
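The position-dependent weighted sum can be sketched as follows for a square N×N block whose side is a power of two. The specific weights and rounding are borrowed from the planar formula commonly used in block-based video coding and are an assumption here; the text only states that the four reference samples are weighted according to the sample position.

```python
def planar_predict(top, left, top_right, bottom_left):
    """Planar prediction from the top row, left column, and two corner references."""
    n = len(top)
    if n == 0 or n != len(left) or (n & (n - 1)) != 0:
        raise ValueError("expects a square block whose side is a power of two")
    shift = n.bit_length()  # equals log2(n) + 1 for a power of two
    return [
        [
            ((n - 1 - x) * left[y] + (x + 1) * top_right
             + (n - 1 - y) * top[x] + (y + 1) * bottom_left + n) >> shift
            for x in range(n)
        ]
        for y in range(n)
    ]


prediction = planar_predict(top=[10, 20], left=[30, 40], top_right=50, bottom_left=60)
```

When all reference samples share one value, every predicted sample takes that value, which is a quick sanity check on the weights.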
[0164] In the non-directional DC mode of intra-frame prediction, when generating a prediction block for an encoded / decoded target block, the prediction block can be generated using the average of the upper reference sample and the left reference sample of the current block. Furthermore, filtering can be performed on one or more upper rows and one or more left columns of the encoded / decoded block adjacent to the reference sample using the reference sample values.
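A minimal sketch of DC prediction is given below. It fills the block with the rounded average of the upper and left reference samples and omits the optional boundary filtering mentioned above; the rounding rule is an assumption of this sketch.

```python
def dc_predict(top, left):
    """Fill a len(left) x len(top) block with the rounded mean of the references."""
    count = len(top) + len(left)
    dc_value = (sum(top) + sum(left) + count // 2) // count  # rounded integer mean
    return [[dc_value] * len(top) for _ in range(len(left))]


prediction = dc_predict(top=[10, 20, 30, 40], left=[50, 60, 70, 80])
```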
[0165] In the case of the multiple directional modes (angular modes) among the intra-frame prediction modes, prediction blocks can be generated using upper-right reference samples and / or lower-left reference samples, and these directional modes can have different directions. Interpolation in real-number units can be performed to generate prediction sample values.
[0166] To perform intra-prediction, the intra-prediction mode of the current prediction block can be predicted from the intra-prediction modes of neighboring prediction blocks. When predicting the intra-prediction mode of the current prediction block using mode information predicted from neighboring intra-prediction modes, if the current prediction block and neighboring prediction blocks have the same intra-prediction mode, this information can be transmitted using predetermined flag information. If the intra-prediction mode of the current prediction block differs from that of neighboring prediction blocks, entropy coding can be performed to encode the intra-prediction mode information of the target block being encoded / decoded.
[0167] Figure 7 is a diagram illustrating an embodiment of the inter-frame prediction process.
[0168] The rectangles shown in Figure 7 can indicate images (or pictures). Furthermore, the arrows in Figure 7 can indicate the prediction direction. That is, an image can be encoded or decoded, or both, according to the prediction direction. Based on the encoding type, each image can be classified as an I-frame (intra-frame), P-frame (unidirectional prediction frame), B-frame (bidirectional prediction frame), etc. Each frame can be encoded and decoded according to its own encoding type.
[0169] When the target image is an I-frame, the frame itself can be intra-coded without inter-frame prediction. When the target image is a P-frame, the image can be encoded using inter-frame prediction or motion compensation performed only on the forward reference frame. When the target image is a B-frame, the image can be encoded using inter-frame prediction or motion compensation performed on both the forward and backward reference frames. Alternatively, the image can be encoded using inter-frame prediction or motion compensation performed on either the forward or backward reference frame. Here, when inter-frame prediction mode is used, the encoder can perform inter-frame prediction or motion compensation, and the decoder can perform motion compensation corresponding to that of the encoder. Images of P-frames and B-frames, which are encoded or decoded, or both, using reference frames, can be considered as images that use inter-frame prediction.
[0170] The inter-frame prediction according to the embodiments will be described in detail below.
[0171] Inter-frame prediction or motion compensation can be performed using both reference frames and motion information. Furthermore, inter-frame prediction can utilize the skip mode described above.
[0172] The reference frame can be at least one of the previous and subsequent frames of the current frame. Here, inter-frame prediction can predict blocks of the current frame based on the reference frame. Here, the reference frame can be the image used when predicting blocks. Here, the region within the reference frame can be indicated by using a reference frame index (refIdx) indicating the reference frame, motion vectors, etc.
[0173] Inter-frame prediction can select a reference frame and a reference block within that frame that is related to the current block. The predicted block for the current block can be generated using the selected reference block. The current block can be a block within the current frame that is the current encoding target or the current decoding target.
[0174] Motion information can be obtained from inter-frame prediction processing by encoding device 100 and decoding device 200. Furthermore, the obtained motion information can be used when performing inter-frame prediction. Here, encoding device 100 and decoding device 200 can improve encoding efficiency or decoding efficiency, or both, by using motion information of reconstructed neighboring blocks or co-located blocks (col blocks), or both. A col block can be a block within a previously reconstructed co-located frame (col frame) that spatially corresponds to the position of the encoding / decoding target block. Reconstructed neighboring blocks can be blocks within the current frame that were previously reconstructed through encoding or decoding, or both. Furthermore, a reconstructed block can be a block adjacent to the encoding / decoding target block, or a block located at the outer corner of the encoding / decoding target block, or both. Here, a block located at the outer corner of the encoding / decoding target block can be a block vertically adjacent to a horizontally adjacent neighboring block of the encoding / decoding target block. Alternatively, a block located at the outer corner of the encoding / decoding target block can be a block horizontally adjacent to a vertically adjacent neighboring block of the encoding / decoding target block.
[0175] Encoding device 100 and decoding device 200 can each determine a block existing within the col frame at a position spatially corresponding to the encoding / decoding target block, and can determine a predefined relative position based on the determined block. The predefined relative position can be a position inside or outside, or both inside and outside, the block existing at the position spatially corresponding to the encoding / decoding target block. Furthermore, encoding device 100 and decoding device 200 can each obtain the col block based on the determined predefined relative position. Here, the col frame can be one of at least one reference frame included in a reference frame list.
[0176] The method for obtaining motion information can vary depending on the prediction mode of the encoding / decoding target block. For example, prediction modes applied to inter-frame prediction may include Advanced Motion Vector Prediction (AMVP), merging mode, etc. Here, merging mode can be referred to as motion merging mode.
[0177] For example, when AMVP is applied as a prediction mode, encoding device 100 and decoding device 200 can each generate a motion vector candidate list by using motion vectors of reconstructed neighboring blocks or motion vectors of col blocks, or both. That is, the motion vectors of reconstructed neighboring blocks or the motion vectors of col blocks, or both, can be used as motion vector candidates. Here, the motion vectors of col blocks can be referred to as temporal motion vector candidates, and the motion vectors of reconstructed neighboring blocks can be referred to as spatial motion vector candidates.
[0178] Encoding device 100 can generate a bitstream, which may include motion vector candidate indices. That is, encoding device 100 can generate a bitstream by entropy encoding the motion vector candidate indices. The motion vector candidate indices can indicate the optimal motion vector candidate selected from the motion vector candidates included in the motion vector candidate list. The motion vector candidate indices can be transmitted from encoding device 100 to decoding device 200 via the bitstream.
[0179] The decoding device 200 can entropy decode the motion vector candidate index from the bitstream, and can select the motion vector candidate of the target block from the motion vector candidates included in the motion vector candidate list by using the entropy-decoded motion vector candidate index.
[0180] Encoding device 100 can calculate the motion vector difference (MVD) between the motion vector of the target block and the selected motion vector candidate, and entropy encode the MVD. The bitstream may include the entropy-encoded MVD. The MVD can be sent from encoding device 100 to decoding device 200 via the bitstream. Here, decoding device 200 can entropy decode the MVD received from the bitstream. Decoding device 200 can obtain the motion vector of the target block by summing the decoded MVD and the selected motion vector candidate.
[0181] The bitstream may include a reference frame index indicating a reference frame, and the reference frame index may be entropy encoded and transmitted from the encoding device 100 to the decoding device 200 via the bitstream. The decoding device 200 may predict the motion vector of the target block to be decoded using motion information from neighboring blocks, and may obtain the motion vector of the target block to be decoded using the predicted motion vector and the motion vector difference. The decoding device 200 may generate a predicted block of the target block to be decoded based on the obtained motion vector and the reference frame index information.
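The AMVP signaling described above (candidate index plus MVD on the encoder side, candidate plus MVD on the decoder side) can be illustrated with a minimal sketch. The function names `encode_mv` and `decode_mv` are hypothetical, and the rule used to pick the "optimal" candidate (smallest sum of absolute differences) is an assumption; the description does not specify the selection criterion. Entropy coding is omitted.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components


def encode_mv(mv: MV, candidates: List[MV]) -> Tuple[int, MV]:
    """Return (candidate index, MVD) for the target block's motion vector.

    Assumption: the "optimal" candidate is the one giving the smallest
    |MVD_x| + |MVD_y|; a real encoder may use a rate-distortion cost instead.
    """
    def cost(c: MV) -> int:
        return abs(mv[0] - c[0]) + abs(mv[1] - c[1])

    idx = min(range(len(candidates)), key=lambda i: cost(candidates[i]))
    cand = candidates[idx]
    return idx, (mv[0] - cand[0], mv[1] - cand[1])


def decode_mv(idx: int, mvd: MV, candidates: List[MV]) -> MV:
    """Reconstruct the motion vector as selected candidate + MVD."""
    cand = candidates[idx]
    return (cand[0] + mvd[0], cand[1] + mvd[1])


# Both devices are assumed to have built the same candidate list.
cand_list = [(4, -2), (10, 3)]
index, mvd = encode_mv((9, 4), cand_list)
reconstructed = decode_mv(index, mvd, cand_list)
```

Because both devices build the same candidate list, only the index and the (typically small) MVD need to be carried in the bitstream.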
[0182] As another method for obtaining motion information, a merging mode can be used. A merging mode can mean merging the motion of multiple blocks. A merging mode can also mean that the motion information of one block is applied to another block. When a merging mode is applied, the encoding device 100 and the decoding device 200 can each generate a merge candidate list by using the motion information of reconstructed neighboring blocks or the motion information of the col block, or both. The motion information may include at least one of the following: 1) a motion vector, 2) a reference frame index, and 3) an inter-frame prediction indicator. The prediction indicator may indicate unidirectional (L0 prediction, L1 prediction) or bidirectional prediction.
[0183] Here, the merging mode can be applied to each CU or each PU. When the merging mode is executed in each CU or each PU, the encoding device 100 can generate a bitstream by entropy encoding predefined information and can send the bitstream to the decoding device 200. The bitstream may include the predefined information. The predefined information may include: 1) a merging flag indicating whether the merging mode is executed for each block partition; and 2) a merging index indicating which block among the neighboring blocks adjacent to the target block is merged. For example, the neighboring blocks adjacent to the target block may include the left neighboring block of the target block, the upper neighboring block of the target block, the temporal neighboring block of the target block, etc.
[0184] The merge candidate list is a list storing motion information. Furthermore, the merge candidate list can be generated before executing the merging mode. The motion information stored in the merge candidate list can be at least one of the following: motion information of neighboring blocks adjacent to the encoding / decoding target block, motion information of co-located blocks in the reference frame corresponding to the encoding / decoding target block, motion information newly generated by combining motion information already existing in the merge candidate list, and zero merge candidates. Here, the motion information of neighboring blocks adjacent to the encoding / decoding target block can be referred to as spatial merge candidates. The motion information of co-located blocks in the reference frame corresponding to the encoding / decoding target block can be referred to as temporal merge candidates.
[0185] A skip mode can be a mode that applies the motion information of a neighboring block, as it is, to the encoding / decoding target block. The skip mode can be one of the modes used for inter-frame prediction. When a skip mode is used, the encoding device 100 can entropy-encode information about which block's motion information is used as the motion information for the encoding target block, and can send this information to the decoding device 200 via a bitstream. The encoding device 100 may not send other information (e.g., syntax element information) to the decoding device 200. The syntax element information may include at least one of motion vector difference information, a coded block flag, and a transform coefficient level.
[0186] The residual signal generated after intra-frame or inter-frame prediction can be transformed to the frequency domain through a transform process as part of the quantization process. Here, the first transform can use DCT Type 2 (DCT-II) as well as various DCT and DST kernels. These transform kernels can perform a separable transform, in which 1D transforms are applied to the residual signal along the horizontal and / or vertical directions, or they can perform a 2D non-separable transform on the residual signal.
[0187] For example, in the case of 1D transforms, the DCT and DST types used in the transform can be DCT-II, DCT-V, DCT-VIII, DST-I, and DST-VII, as shown in the following tables. For example, as shown in Tables 1 and 2, the DCT or DST types used in the transform can be derived by composing transform sets.
[0188] [Table 1]
[0189]
Transform set | Transforms
0 | DST-VII, DCT-VIII
1 | DST-VII, DST-I
2 | DST-VII, DCT-V
[0190] [Table 2]
[0191]
Transform set | Transforms
0 | DST-VII, DCT-VIII, DST-I
1 | DST-VII, DST-I, DCT-VIII
2 | DST-VII, DCT-V, DST-I
[0192] For example, as shown in Figure 8, different transform sets are defined for the horizontal and vertical directions according to the intra-prediction mode. Next, the encoder / decoder can perform transforms and / or inverse transforms using the transforms of the transform set associated with the intra-prediction mode of the current encoding / decoding target block. In this case, entropy encoding / decoding is not performed on the transform sets, and the encoder / decoder can define the transform sets according to the same rules. In this case, entropy encoding / decoding can be performed on information indicating which transform among the transforms of the transform set is used. For example, when the block size is equal to or less than 64×64, three transform sets are composed according to the intra-prediction mode, as shown in Table 2, and three transforms are available for each of the horizontal and vertical transforms, so that a total of nine combined multi-transform methods can be performed. Next, the residual signal is encoded / decoded using the optimal transform method, thereby improving coding efficiency. Here, truncated unary binarization can be used to entropy encode / decode information about which of the three transforms in a transform set is used. Here, for at least one of the vertical and horizontal transforms, entropy encoding / decoding can be performed on information indicating which transform among the transforms of the transform set is used.
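As an illustration of the nine horizontal / vertical combinations and of truncated unary binarization, the following sketch uses the transform sets of Table 2. The helper names are hypothetical, and the bit convention (a run of ones terminated by a zero, with the zero dropped for the largest index) is one common form of truncated unary coding assumed here; the description does not fix the bit polarity.

```python
from itertools import product

# Transform sets as listed in Table 2.
TRANSFORM_SETS = {
    0: ["DST-VII", "DCT-VIII", "DST-I"],
    1: ["DST-VII", "DST-I", "DCT-VIII"],
    2: ["DST-VII", "DCT-V", "DST-I"],
}


def transform_pairs(h_set: int, v_set: int):
    """All (horizontal, vertical) transform combinations for two sets."""
    return list(product(TRANSFORM_SETS[h_set], TRANSFORM_SETS[v_set]))


def truncated_unary(index: int, max_index: int) -> str:
    """Binarize index as `index` ones plus a terminating zero.

    The terminating zero is omitted when index == max_index (assumed convention).
    """
    if not 0 <= index <= max_index:
        raise ValueError("index out of range")
    bits = "1" * index
    if index < max_index:
        bits += "0"
    return bits


pairs = transform_pairs(0, 1)  # 3 x 3 = 9 combinations
codes = [truncated_unary(i, 2) for i in range(3)]  # one code per transform in a set
```

With three transforms per set, the index of the chosen transform costs at most two bins per direction.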
[0193] After the first transformation described above is completed, the encoder can, as shown in Figure 9, perform a secondary transformation on the transform coefficients to improve energy concentration. The secondary transformation can perform a separable transformation for 1D transformation along the horizontal and / or vertical directions, or a non-separable 2D transformation. The transformation information used can be transmitted, or can be derived by the encoder / decoder based on current and neighboring encoding information. For example, as with 1D transformation, a transform set for the secondary transformation can be defined. Entropy encoding / decoding is not performed on this transform set, and the encoder / decoder can define the transform set according to the same rules. In this case, information indicating which transform among the transforms in the transform set is used can be transmitted, and this information can be applied to at least one residual signal obtained through intra-frame prediction or inter-frame prediction.
[0194] At least one of the number or type of transform candidates can vary for each transform set. At least one of the number or type of transform candidates may be determined differently based on at least one of the following: the location, size, or partitioning form of the block (CU, PU, TU, etc.), the prediction mode (intra-frame / inter-frame mode), or the directionality / non-directionality of the intra-frame prediction mode.
[0195] The decoder can perform the second inverse transform depending on whether the second inverse transform is to be performed, and can then perform the first inverse transform on the result of the second inverse transform depending on whether the first inverse transform is to be performed.
[0196] The aforementioned first and second transformations can be applied to at least one signal component among the luminance / chrominance components, or can be applied according to the size / shape of any coded block. Entropy encoding / decoding can be performed on an index indicating both whether the first / second transformation is used in any coded block and which first / second transformation is used. Optionally, this index can be derived implicitly by the encoder / decoder based on at least one piece of current / neighboring coding information.
[0197] The residual signal obtained after intra-frame prediction or inter-frame prediction is quantized after the first and / or second transforms, and the quantized transform coefficients are entropy-coded. Here, as shown in Figure 10, the quantized transform coefficients can be scanned in the diagonal, vertical, or horizontal direction based on at least one of the intra-frame prediction mode or the size / shape of the minimum block.
[0198] Furthermore, the quantized transform coefficients that have undergone entropy decoding can be arranged in blocks by inverse scanning, and at least one of dequantization or inverse transform can be performed on the relevant blocks. Here, as a method of inverse scanning, at least one of diagonal scanning, horizontal scanning, and vertical scanning can be performed.
[0199] For example, when the current coded block size is 8×8, the residual signal for the 8×8 block can be subjected to a first transform, a second transform, and quantization. Then, scanning and entropy encoding can be performed on the quantized transform coefficients of each of the four 4×4 sub-blocks according to at least one of the three scanning orders shown in Figure 10. Furthermore, inverse scanning can be performed on the quantized transform coefficients obtained by entropy decoding. The quantized transform coefficients that have undergone inverse scanning become transform coefficients after dequantization, and at least one of a second inverse transform or a first inverse transform is performed, thereby generating the reconstructed residual signal.
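The scanning and inverse scanning of 4×4 sub-blocks can be sketched as follows. This is only an illustration under assumptions: the exact traversal of Figure 10 is not reproduced in the text, so the diagonal order below (each anti-diagonal visited from bottom-left to top-right, starting at the top-left coefficient) is an assumed convention, and all function names are hypothetical.

```python
def scan_positions(size, order):
    """(row, col) positions of a size x size block in the given scan order."""
    if order == "horizontal":
        return [(r, c) for r in range(size) for c in range(size)]
    if order == "vertical":
        return [(r, c) for c in range(size) for r in range(size)]
    if order == "diagonal":
        # Assumed convention: walk each anti-diagonal from bottom-left to top-right.
        positions = []
        for d in range(2 * size - 1):
            for c in range(size):
                r = d - c
                if 0 <= r < size:
                    positions.append((r, c))
        return positions
    raise ValueError("unknown scan order: " + order)


def scan(block, order):
    """Flatten a square block into a 1D coefficient list."""
    return [block[r][c] for r, c in scan_positions(len(block), order)]


def inverse_scan(coeffs, size, order):
    """Rebuild a square block from a 1D coefficient list."""
    block = [[0] * size for _ in range(size)]
    for value, (r, c) in zip(coeffs, scan_positions(size, order)):
        block[r][c] = value
    return block


def sub_blocks(block, sub=4):
    """Split a square block into sub x sub sub-blocks in raster order."""
    n = len(block)
    return [[row[c0:c0 + sub] for row in block[r0:r0 + sub]]
            for r0 in range(0, n, sub) for c0 in range(0, n, sub)]


block8 = [[r * 8 + c for c in range(8)] for r in range(8)]
subs = sub_blocks(block8)  # four 4x4 sub-blocks of the 8x8 block
```

Scanning followed by inverse scanning with the same order returns the original sub-block, which is the property the decoder relies on.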
[0200] In video encoding processing, a block can be partitioned as shown in Figure 11, and indicators corresponding to the partition information can be signaled. Here, the partition information can be at least one of the following: a partition flag (split_flag), a quadtree / binary tree flag (QB_flag), a quadtree partition flag (quadtree_flag), a binary tree partition flag (binarytree_flag), and a binary tree partition type flag (Btype_flag). Here, split_flag indicates whether the block is partitioned, QB_flag indicates whether the block is partitioned in quadtree or binary tree form, quadtree_flag indicates whether the block is partitioned in quadtree form, binarytree_flag indicates whether the block is partitioned in binary tree form, and Btype_flag indicates whether the block is vertically or horizontally partitioned in the case of binary tree partitioning.
[0201] When the partition flag is 1, it indicates that the partition is executed; when the partition flag is 0, it indicates that the partition is not executed. In the case of the quadtree / binary tree flag, 0 indicates a quadtree partition, and 1 indicates a binary tree partition. Optionally, 0 can indicate a binary tree partition, and 1 can indicate a quadtree partition. In the case of the binary tree partition type flag, 0 can indicate a horizontal partition, and 1 can indicate a vertical partition. Optionally, 0 can indicate a vertical partition, and 1 can indicate a horizontal partition.
[0202] For example, the partition information of Figure 11 can be signaled by transmitting at least one of quadtree_flag, binarytree_flag, and Btype_flag, as shown in Table 3.
[0203] [Table 3]
[0204]
[0205] For example, the partition information of Figure 11 can be signaled by transmitting at least one of split_flag, QB_flag, and Btype_flag, as shown in Table 4.
[0206] [Table 4]
[0207]
[0208] Depending on the size / shape of the block, partitioning can be performed only in quadtree form or only in binary tree form. In this case, split_flag can be interpreted as a flag indicating whether partitioning is performed in quadtree or binary tree form. The size / shape of the block can be obtained from the block's depth information, and the depth information can be signaled.
[0209] When the block size is within a predetermined range, partitioning can be performed only in quadtree form. Here, the predetermined range can be defined as at least one of the maximum block size or the minimum block size that can be partitioned only in quadtree form. Information indicating the maximum / minimum block size that allows quadtree-form partitioning can be signaled through the bitstream, and this information can be signaled in units of at least one of a sequence, a picture parameter, or a slice (segment). Alternatively, the maximum / minimum block size can be a fixed size preset in the encoder / decoder. For example, when the block size ranges from 256x256 to 64x64, partitioning can be performed only in quadtree form. In this case, split_flag can be interpreted as a flag indicating whether partitioning is performed in quadtree form.
[0210] When the block size is within a predetermined range, partitioning can be performed only in binary tree form. Here, the predetermined range can be defined as at least one of the maximum block size or the minimum block size that can be partitioned only in binary tree form. Information indicating the maximum / minimum block size that allows binary tree partitioning can be signaled through the bitstream, and this information can be signaled in units of at least one of a sequence, a picture parameter, or a slice (segment). Alternatively, the maximum / minimum block size can be a fixed size preset in the encoder / decoder. For example, when the block size ranges from 16x16 to 8x8, partitioning can be performed only in binary tree form. In this case, split_flag can be interpreted as a flag indicating whether partitioning is performed in binary tree form.
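The size-dependent interpretation of split_flag in the two examples above (quadtree only for 256x256 to 64x64, binary tree only for 16x16 to 8x8) can be sketched as below. The function names and the default ranges taken from the examples are illustrative assumptions; block sizes outside both example ranges are not covered by the text, so the sketch returns None for them rather than guessing.

```python
def allowed_forms(block_size, qt_range=(64, 256), bt_range=(8, 16)):
    """Partition forms permitted for a square block of the given size.

    The default ranges mirror the examples in the description; in practice
    they could be signaled in the bitstream or preset in the encoder / decoder.
    """
    forms = []
    if qt_range[0] <= block_size <= qt_range[1]:
        forms.append("quadtree")
    if bt_range[0] <= block_size <= bt_range[1]:
        forms.append("binarytree")
    return forms


def interpret_split_flag(split_flag, block_size):
    """Map split_flag to a partition decision when only one form is allowed."""
    if split_flag == 0:
        return "no split"
    forms = allowed_forms(block_size)
    if len(forms) == 1:
        return forms[0]  # split_flag alone identifies the partition form
    return None  # not covered by the examples; further flags would be needed


decision_128 = interpret_split_flag(1, 128)
decision_16 = interpret_split_flag(1, 16)
```

When exactly one form is permitted for a size, no additional flag such as QB_flag has to be interpreted for that block.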
[0211] After partitioning a block according to a binary tree, when the partitioned block is further partitioned, partitioning can be performed only according to the binary tree.
[0212] When a partitioned block cannot be partitioned further because of its width or length, at least one indicator may not be signaled.
[0213] In addition to quadtree-based binary tree partitioning, quadtree partitioning can also be performed after binary tree partitioning.
[0214] Based on the above description, the method for encoding / decoding images using motion vectors according to the present invention will be disclosed in detail.
[0215] Figure 12 is a flowchart illustrating a method for encoding an image according to the present invention. Figure 13 is a flowchart illustrating a method for decoding an image according to the present invention.
[0216] Referring to Figure 12, in step S1201, the encoding device may obtain motion vector candidates, and in step S1202, it may generate a motion vector candidate list based on the obtained motion vector candidates. When the motion vector candidate list is generated, in step S1203, a motion vector can be determined by using the generated motion vector candidate list. In step S1204, motion compensation can be performed by using the motion vector. Next, in step S1205, the encoding device may perform entropy encoding on the information regarding motion compensation.
[0217] Referring to Figure 13, in step S1301, the decoding device performs entropy decoding on the motion compensation information received from the encoding device, and in step S1302, it obtains motion vector candidates. Furthermore, in step S1303, the decoding device generates a motion vector candidate list based on the obtained motion vector candidates, and in step S1304, it determines a motion vector using the generated motion vector candidate list. Next, in step S1305, the decoding device performs motion compensation using the motion vector.
[0218] The steps shown in Figure 12 and Figure 13 will be disclosed in detail below.
[0219] First, the operations S1201 and S1302 for obtaining motion vector candidates will be disclosed in detail.
[0220] The motion vector candidates for the current block may include at least one of spatial motion vector candidates or temporal motion vector candidates.
[0221] The spatial motion vector candidates of the current block can be obtained from the reconstructed blocks adjacent to the current block. For example, the motion vector of a reconstructed block adjacent to the current block can be determined as a spatial motion vector candidate for the current block.
[0222] Figure 14 is a diagram illustrating an example of obtaining spatial motion vector candidates for the current block.
[0223] Referring to Figure 14, the spatial motion vector candidates for the current block can be obtained from the neighboring blocks adjacent to the current block X. Here, the neighboring blocks adjacent to the current block may include at least one of the following blocks: block B1 adjacent to the top of the current block, block A1 adjacent to the left side of the current block, block B0 adjacent to the upper right corner of the current block, block B2 adjacent to the upper left corner of the current block, and block A0 adjacent to the lower left corner of the current block.
[0224] When a motion vector exists at a neighboring block adjacent to the current block, the motion vector of that neighboring block can be determined as a spatial motion vector candidate for the current block. Whether a neighboring block's motion vector exists, or whether it can be used as a spatial motion vector candidate for the current block, can be determined based on factors such as whether the neighboring block exists or whether the neighboring block is encoded via inter-frame prediction. Here, the existence of a neighboring block's motion vector, or its usability as a spatial motion vector candidate for the current block, can be determined according to a predetermined priority. For example, in the example shown in Figure 14, the availability of motion vectors can be determined in the order of the blocks at positions A0, A1, B0, B1, and B2.
[0225] When the reference image of the current block differs from the reference image of a neighboring block that has motion vectors, the scaled motion vectors of the neighboring blocks can be determined as spatial motion vector candidates for the current block. Here, scaling can be performed based on at least one of the distance between the current image and the reference image referenced by the current block, and the distance between the current image and the reference images referenced by the neighboring blocks. For example, the motion vectors of the neighboring blocks can be scaled according to the difference between the distance between the current image and the reference image referenced by the current block and the distance between the current image and the reference images referenced by the neighboring blocks, thereby obtaining spatial motion vector candidates for the current block.
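A minimal sketch of the scaling step follows. It assumes that the scaling factor is the ratio of the two temporal distances (the convention commonly used in block-based video codecs); the description states only that scaling is based on these distances. The function name `scale_mv`, the use of picture-order values to measure distance, and the floating-point rounding are assumptions; practical codecs typically use fixed-point arithmetic with clipping.

```python
def scale_mv(mv, cur_order, cur_ref_order, nbr_ref_order):
    """Scale a neighboring block's motion vector to the current block's reference.

    tb: distance between the current image and the current block's reference image.
    td: distance between the current image and the neighboring block's reference image.
    Assumption: the vector is scaled by the ratio tb / td.
    """
    tb = cur_order - cur_ref_order
    td = cur_order - nbr_ref_order
    if td == 0 or tb == td:
        return mv  # nothing to scale (or distance undefined)
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))


# The neighbor points 2 images back, the current block references 4 images back,
# so the neighbor's vector is doubled.
scaled = scale_mv((8, -4), cur_order=8, cur_ref_order=4, nbr_ref_order=6)
```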
[0226] Even if the reference image list of the current block is different from that of the neighboring blocks, it is possible to determine whether to scale the motion vectors of the neighboring blocks based on whether the reference image of the current block is the same as that of the neighboring blocks. Here, the reference image list may include at least one of List0 (L0), List1 (L1), List2 (L2), List3 (L3), etc.
[0227] In summary, spatial motion vector candidates can be obtained based on at least one of the following: availability of neighboring blocks, whether neighboring blocks are encoded in intra-prediction mode, whether neighboring blocks have the same list of reference frames as the current block, or whether neighboring blocks have the same reference image as the current block. When neighboring blocks are available and are not encoded in intra-prediction mode, spatial motion vector candidates for the current block can be generated using the methods shown in Table 5 below.
[0228] [Table 5]
[0229]
[0230] As shown in Table 5, even if the reference image list of the current block is different from that of the neighboring blocks, the motion vector of the neighboring block can still be determined as a candidate for the spatial motion vector of the current block when the current block and the neighboring blocks have the same reference image.
[0231] Conversely, when the reference image of the current block is different from the reference image of the neighboring blocks, the motion vectors of the neighboring blocks can be scaled to be identified as candidates for the spatial motion vectors of the current block, regardless of whether the reference image list of the current block is the same as that of the neighboring blocks.
[0232] When obtaining spatial motion vector candidates for the current block from neighboring blocks, the order in which the spatial motion vector candidates for the current block are obtained can be determined based on whether the current block and neighboring blocks have the same reference image. For example, spatial motion vector candidates can be obtained preferentially from neighboring blocks that have the same reference image as the current block, and when the number of obtained spatial motion vector candidates (or the number of obtained motion vector candidates) is equal to or less than a preset maximum value, spatial motion vector candidates can be obtained from neighboring blocks that have a different reference image than the current block.
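The preference for neighbors that share the current block's reference image can be sketched as a two-pass derivation. The `Neighbor` structure, the function name, and the `scale` callback (standing in for whatever scaling rule is used) are hypothetical. The sketch also assumes that the neighbors are visited in the order A0, A1, B0, B1, B2 and that candidates are added only while the list holds fewer than `max_cands` entries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MV = Tuple[int, int]


@dataclass
class Neighbor:
    mv: Optional[MV]    # None if the block is unavailable or intra-coded
    ref: Optional[int]  # identifier of the reference image it points to


def spatial_candidates(neighbors: Dict[str, Neighbor], cur_ref: int,
                       scale: Callable[[MV, int], MV],
                       max_cands: int = 2) -> List[MV]:
    order = ["A0", "A1", "B0", "B1", "B2"]  # assumed visiting order
    usable = [neighbors[n] for n in order
              if n in neighbors and neighbors[n].mv is not None]
    cands: List[MV] = []
    # Pass 1: neighbors with the same reference image, used without scaling.
    for nb in usable:
        if len(cands) < max_cands and nb.ref == cur_ref:
            cands.append(nb.mv)
    # Pass 2: neighbors with a different reference image, used after scaling.
    for nb in usable:
        if len(cands) < max_cands and nb.ref != cur_ref:
            cands.append(scale(nb.mv, nb.ref))
    return cands


nbrs = {
    "A0": Neighbor(mv=(1, -1), ref=5),
    "A1": Neighbor(mv=(2, 2), ref=3),
    "B0": Neighbor(mv=None, ref=None),
    "B1": Neighbor(mv=(3, 3), ref=3),
}
double = lambda mv, ref: (mv[0] * 2, mv[1] * 2)  # placeholder scaling rule
two = spatial_candidates(nbrs, cur_ref=3, scale=double, max_cands=2)
three = spatial_candidates(nbrs, cur_ref=3, scale=double, max_cands=3)
```

With a maximum of two candidates, the scaled candidate from A0 is never needed, which avoids the scaling computation altogether.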
[0233] Alternatively, spatial motion vector prediction candidates for the current block can be determined based on whether the current block and neighboring blocks have the same reference image and based on the position of the neighboring blocks.
[0234] For example, depending on whether the reference image is the same, spatial motion vector candidates for the current block can be obtained from neighboring blocks A0 and A1 adjacent to the left of the current block. Next, depending on whether the reference image is the same, spatial motion vector candidates for the current block can be obtained from neighboring blocks B0, B1, and B2 adjacent to the top of the current block. Table 6 shows the order in which the spatial motion vector candidates for the current block are obtained.
[0235] [Table 6]
[0236]
[0237] The maximum number of spatial motion vector candidates for the current block can be preset to the same value in both the encoding and decoding devices. Alternatively, the encoding device can encode information indicating the maximum number of spatial motion vector candidates for the current block and send this information to the decoding device via a bitstream. For example, the encoding device can encode "maxNumSpatialMVPCand," indicating the maximum number of spatial motion vector candidates for the current block, and send "maxNumSpatialMVPCand" to the decoding device via a bitstream. Here, "maxNumSpatialMVPCand" can be set to a positive integer including zero. For example, "maxNumSpatialMVPCand" can be set to 2.
[0238] Temporal motion vector candidates for the current block can be obtained from reconstructed blocks included in the co-located frame of the current frame. Here, the co-located frame is a frame that has been encoded / decoded before the current frame, and may be a frame with a different temporal order than the current frame.
[0239] Figure 15 is a diagram illustrating an example of obtaining temporal motion vector candidates for the current block.
[0240] Referring to Figure 15, in the co-located frame of the current frame, a temporal motion vector candidate for the current block can be obtained from a block including a position outside the block that spatially corresponds to the current block X, or from a block including a position inside the block that spatially corresponds to the current block X. For example, a temporal motion vector candidate for the current block X can be obtained from block H, which is adjacent to the lower left corner of block C (the block that spatially corresponds to the current block), or from block C3, which includes the center point of block C. Blocks such as H or C3 used to obtain the temporal motion vector candidate for the current block can be referred to as "co-located blocks".
[0241] When the temporal motion vector candidate of the current block can be obtained from block H, which includes a position outside of block C, block H can be set as the co-located block of the current block. In this case, the temporal motion vector candidate of the current block can be obtained based on the motion vector of block H. Conversely, when the temporal motion vector candidate of the current block cannot be obtained from block H, block C3, which includes a position inside block C, can be set as the co-located block of the current block. In this case, the temporal motion vector candidate of the current block can be obtained based on the motion vector of block C3. When the temporal motion vector candidate of the current block cannot be obtained from either block H or block C3 (e.g., when blocks H and C3 are intra-coded), the temporal motion vector candidate of the current block may be obtained from a block at a position different from those of blocks H and C3.
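The fallback from block H to block C3 can be expressed as a short sketch. The dictionary layout and the function name are hypothetical; a value of None stands for a block from which no motion vector can be obtained (for example, an intra-coded block). Any scaling of the selected vector is left out.

```python
def select_col_block(col_frame_blocks):
    """Pick the co-located block: try H first, then fall back to C3.

    col_frame_blocks maps a position name ("H", "C3") to a motion vector,
    or to None when no motion vector can be obtained from that block.
    Returns (position name, motion vector), or None if neither is usable.
    """
    for name in ("H", "C3"):
        mv = col_frame_blocks.get(name)
        if mv is not None:
            return name, mv
    return None


from_h = select_col_block({"H": (5, 1), "C3": (2, 2)})
from_c3 = select_col_block({"H": None, "C3": (2, 2)})
neither = select_col_block({"H": None, "C3": None})
```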
[0242] As another example, the temporal motion vector candidates for the current block can be obtained from multiple blocks in the co-located frame. For example, multiple temporal motion vector candidates for the current block can be obtained from blocks H and C3.
[0243] In Figure 15, the temporal motion vector candidate for the current block can be obtained from the block adjacent to the lower-left corner of the co-located block or from the block that includes the center point of the co-located block. However, the location of the block used to obtain the temporal motion vector candidate for the current block is not limited to the example shown in Figure 15. For example, the temporal motion vector candidate for the current block may be obtained from a block adjacent to the top / bottom boundary, left / right boundary, or corner of the co-located block, or it may be obtained from a block that includes a specific location within the co-located block (e.g., a block adjacent to the corner boundary of the co-located block).
[0244] The temporal motion vector candidate for the current block can be determined based on the reference frame lists (or prediction directions) of the current block and of the blocks located inside or outside the co-located block.
[0245] For example, when the reference frame list available for the current block is L0 (i.e., the inter-frame prediction indicator indicates PRED_L0), the motion vectors of blocks located inside or outside the co-located block that use L0 as their reference frame list can be obtained as temporal motion vector candidates for the current block. In other words, when the reference frame list available for the current block is LX (where X is an integer such as 0, 1, 2, or 3 indicating the index of the reference frame list), the motion vectors of blocks located inside or outside the co-located block that use LX as their reference frame list can be obtained as temporal motion vector candidates for the current block.
[0246] Even if the current block uses multiple reference frame lists, the temporal motion vector candidate for the current block can be determined based on whether the current block has the same reference frame list as the blocks located inside or outside the co-located block.
[0247] For example, when performing bidirectional prediction in the current block (i.e., when the inter-frame prediction indicator is PRED_BI), the motion vectors of blocks located inside or outside the co-located block that use L0 and L1 as reference frame lists can be obtained as temporal motion vector candidates for the current block. When performing tridirectional prediction in the current block (i.e., when the inter-frame prediction indicator is PRED_TRI), the motion vectors of blocks located inside or outside the co-located block that use L0, L1, and L2 as reference frame lists can be obtained as temporal motion vector candidates for the current block. When performing quadridirectional prediction in the current block (i.e., when the inter-frame prediction indicator is PRED_QUAD), the motion vectors of blocks located inside or outside the co-located block that use L0, L1, L2, and L3 as reference frame lists can be obtained as temporal motion vector candidates for the current block.
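The relationship between the inter-frame prediction indicator and the reference frame lists consulted in the co-located block can be sketched as a lookup. PRED_L0, PRED_BI, PRED_TRI, and PRED_QUAD appear in the description; the name PRED_L1, the dictionary layout, and the function name are assumptions made for illustration.

```python
# Reference frame lists implied by each inter-frame prediction indicator.
PRED_LISTS = {
    "PRED_L0": ["L0"],
    "PRED_L1": ["L1"],  # name assumed by analogy with PRED_L0
    "PRED_BI": ["L0", "L1"],
    "PRED_TRI": ["L0", "L1", "L2"],
    "PRED_QUAD": ["L0", "L1", "L2", "L3"],
}


def temporal_candidates(indicator, col_block_mvs):
    """Collect the co-located block's motion vectors per matching list.

    col_block_mvs maps a list name ("L0".."L3") to the co-located block's
    motion vector for that list, or to None if it has none for that list.
    """
    return {lx: col_block_mvs[lx]
            for lx in PRED_LISTS[indicator]
            if col_block_mvs.get(lx) is not None}


col_mvs = {"L0": (1, 2), "L1": None, "L2": (3, 4)}
tri = temporal_candidates("PRED_TRI", col_mvs)
```

Only lists that the current block actually uses, and for which the co-located block holds a motion vector, contribute a candidate.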
[0248] Alternatively, when the current block is set to perform multi-directional prediction for a single reference frame list, the temporal motion vector candidate for the current block can be determined based on whether the blocks located inside or outside the co-located block have the same reference frame list and the same prediction direction as the current block.
[0249] For example, when the current block performs bidirectional prediction against the reference frame list L0 (i.e., when the inter-frame prediction indicator for list L0 is PRED_BI), the motion vectors of blocks located inside or outside the co-located block that perform bidirectional prediction against L0 using L0 as the reference frame can be obtained as candidates for the temporal motion vectors of the current block.
[0250] Furthermore, temporal motion vector candidates can be obtained based on at least one encoding parameter.
[0251] When the number of spatial motion vector candidates obtained is less than the maximum number of motion vector candidates, temporal motion vector candidates can be obtained in advance. Therefore, when the number of spatial motion vector candidates obtained is equal to the maximum number of motion vector candidates, the operation of obtaining temporal motion vector candidates can be omitted.
[0252] For example, when the maximum number of motion vector candidates is 2 and the two obtained spatial motion vector candidates have different values, the operation of obtaining temporal motion vector candidates can be omitted.
[0253] As another example, the temporal motion vector candidates for the current block can be obtained based on the maximum number of temporal motion vector candidates. Here, the maximum number of temporal motion vector candidates can be preset to have the same value in both the encoding and decoding devices. Alternatively, information indicating the maximum number of temporal motion vector candidates for the current block can be encoded and sent to the decoding device via a bitstream. For example, the encoding device can encode "maxNumTemporalMVPCand" indicating the maximum number of temporal motion vector candidates for the current block and send "maxNumTemporalMVPCand" to the decoding device via a bitstream. Here, "maxNumTemporalMVPCand" can be set to a positive integer including zero. For example, "maxNumTemporalMVPCand" can be set to 1.
[0254] When the distance between the current frame (which includes the current block) and the reference frame of the current block differs from the distance between the co-located frame (which includes the co-located block) and the reference frame of the co-located block, the temporal motion vector candidate of the current block can be obtained by scaling the motion vector of the co-located block.
[0255] Figure 16 is a diagram illustrating an example of scaling the motion vector of a co-located block to obtain a temporal motion vector candidate for the current block.
[0256] The motion vector of the co-located block can be scaled based on at least one of the difference (td) between the POC (Picture Order Count, which indicates display order) of the co-located frame and the POC of the reference frame of the co-located block, and the difference (tb) between the POC of the current frame and the POC of the reference frame of the current block.
[0257] Before scaling, td or tb can be adjusted to exist within a predetermined range. For example, when the predetermined range indicates -128 to 127 and td or tb is less than -128, td or tb can be adjusted to -128. When td or tb is greater than 127, td or tb can be adjusted to 127. When td or tb is within the range of -128 to 127, td or tb is not adjusted.
[0258] The scaling factor DistScaleFactor can be calculated based on td or tb. Here, the scaling factor can be calculated based on the following formula 1.
[0259] [Formula 1]
[0260] DistScaleFactor=(tb*tx+32)>>6
[0261] tx = (16384 + Abs(td / 2)) / td
[0262] In Formula 1, the absolute value function is represented as Abs(), and the output value of this function is the absolute value of the input value.
[0263] The value of the scaling factor DistScaleFactor calculated based on Formula 1 can be adjusted to a predetermined range. For example, DistScaleFactor can be adjusted to exist in the range of -1024 to 1023.
[0264] The temporal motion vector candidate for the current block can be determined by scaling the motion vector of the co-located block using the scaling factor. For example, the temporal motion vector candidate for the current block can be determined using the following Formula 2.
[0265] [Formula 2]
[0266] Sign(DistScaleFactor*mvCol)*((Abs(DistScaleFactor*mvCol)+127)>>8)
[0267] In Formula 2, Sign() is a function that outputs the sign of the value in parentheses. For example, Sign(-1) outputs the negative sign (-). In Formula 2, mvCol indicates the motion vector of the co-located block, i.e., the temporal motion vector predictor before scaling.
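The scaling steps of Formulas 1 and 2 can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, td is assumed to be non-zero, and the "/" operator in Formula 1 is assumed to truncate toward zero (as in C), which this description does not state explicitly.

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return max(low, min(high, value))


def div_trunc(a, b):
    """Integer division that truncates toward zero.

    Assumption: "/" in Formula 1 behaves this way. Python's "//" floors
    instead, so the sign is handled explicitly here.
    """
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def dist_scale_factor(tb, td):
    """Formula 1, with td and tb first limited to -128..127."""
    tb = clamp(tb, -128, 127)
    td = clamp(td, -128, 127)  # td is assumed to be non-zero
    tx = div_trunc(16384 + abs(div_trunc(td, 2)), td)
    return clamp((tb * tx + 32) >> 6, -1024, 1023)


def sign(value):
    """Return -1 for negative input and 1 otherwise."""
    return -1 if value < 0 else 1


def scale_component(factor, mv_col):
    """Formula 2 applied to one motion vector component."""
    product = factor * mv_col
    return sign(product) * ((abs(product) + 127) >> 8)


def scale_temporal_mv(mv_col, tb, td):
    """Scale both components of the co-located motion vector mvCol."""
    factor = dist_scale_factor(tb, td)
    return tuple(scale_component(factor, c) for c in mv_col)


# tb / td = 1 / 2, so the co-located vector is roughly halved.
print(scale_temporal_mv((4, 6), tb=1, td=2))  # (2, 3)
```

When tb equals td the scaling factor is 256, so the shift by 8 in Formula 2 returns the co-located motion vector unchanged.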
[0268] Next, operations S1202 and S1303 of generating a motion vector candidate list based on the obtained motion vector candidates will be described.
[0269] The operations for generating a motion vector candidate list may include adding motion vector candidates to the motion vector candidate list or removing motion vector candidates from the motion vector candidate list, as well as adding combined motion vector candidates to the motion vector candidate list.
[0270] When adding obtained motion vector candidates to, or removing them from, the motion vector candidate list, the encoding and decoding devices may add the obtained motion vector candidates to the motion vector candidate list in the order in which they were obtained.
[0271] The generated list of motion vector candidates can be determined based on the inter-frame prediction direction of the current block. For example, a motion vector candidate list can be generated for each list of reference frames, and a motion vector candidate list can be generated for each reference frame. Multiple lists of reference frames or multiple reference frames can share a single motion vector candidate list.
[0272] In the following embodiments, it is assumed that the motion vector candidate list mvpListLX means the motion vector candidate list corresponding to the reference frame lists L0, L1, L2, and L3. For example, the motion vector candidate list corresponding to the reference frame list L0 may be referred to as mvpListL0.
[0273] The number of motion vector candidates included in the motion vector candidate list can be set to the same preset value in both the encoding and decoding devices. Alternatively, the maximum number of motion vector candidates included in the motion vector candidate list can be transmitted to the decoding device via a bitstream after being encoded in the encoding device.
[0274] For example, the maximum number of motion vector candidates that can be included in the motion vector candidate list `mvpListLX`, `maxNumMVPCandList`, can be a positive integer including zero. For example, `maxNumMVPCandList` can be an integer such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16. When `maxNumMVPCandList` is 2, this means that `mvpListLX` can include a maximum of two motion vector candidates. Therefore, the index value of the motion vector candidate first added to `mvpListLX` can be set to zero, and the index value of the motion vector candidate subsequently added can be set to 1. The maximum number of motion vector candidates can be defined for each motion vector candidate list, and can be defined the same for all motion vector candidate lists. For example, the maximum number of motion vector candidates for `mvpListL0` and `mvpListL1` can have different values, and can also have the same value.
[0275] Figure 17 is a diagram illustrating an example of generating a motion vector candidate list.
[0276] Assume that the unscaled spatial motion vector candidate (1,0) is obtained from the block at position A1 shown in Figure 17(a), and the scaled temporal motion vector candidate (2,3) is obtained from the block at position H shown in Figure 17(b). In this case, as shown in Figure 17(c), the spatial motion vector candidate obtained from the block at position A1 and the temporal motion vector candidate obtained from the block at position H can be added to the motion vector candidate list in sequence.
[0277] The obtained motion vector candidates can be added to the motion vector candidate list in a predetermined order. For example, after adding spatial motion vector candidates to the motion vector candidate list, if the number of motion vector candidates included in the list is less than the maximum number of motion vector candidates, temporal motion vector candidates can be added to the motion vector candidate list. Conversely, temporal motion vector candidates can be added to the motion vector candidate list by having a higher priority than spatial motion vector candidates. In this case, spatial motion vector candidates can be optionally added to the motion vector candidate list depending on whether they are the same as temporal motion vector candidates.
[0278] Furthermore, the encoding and decoding devices can assign indices to identify each motion vector candidate in the order they are added to the motion vector candidate list. In Figure 17(c), the index value of the motion vector candidate obtained from the block located at position A1 is set to 0, and the index value of the motion vector candidate obtained from the block located at position H is set to 1.
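The insertion order and index assignment described for Figure 17 can be sketched as follows. The function name and the tuple representation of a motion vector are assumptions of this sketch; it implements only the "spatial first, then temporal" ordering, not the alternative in which temporal candidates take priority.

```python
def build_mvp_list(spatial, temporal, max_num):
    """Add spatial candidates first, then temporal candidates, stopping
    once the list holds max_num entries. The position in the returned
    list is the motion vector candidate index."""
    mvp_list = []
    for cand in list(spatial) + list(temporal):
        if len(mvp_list) >= max_num:
            break
        mvp_list.append(cand)
    return mvp_list


# Values from the Figure 17 example: A1 gives (1,0), H gives (2,3).
mvp_list_l0 = build_mvp_list([(1, 0)], [(2, 3)], max_num=2)
for index, cand in enumerate(mvp_list_l0):
    print(index, cand)
```

Because the list position doubles as the index, the candidate from A1 receives index 0 and the candidate from H receives index 1, as in Figure 17(c).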
[0279] In addition to spatial motion vector candidates and temporal motion vector candidates, motion vectors with predetermined values can be added to the motion vector candidate list. For example, when the number of motion vector candidates included in the motion vector candidate list is less than the maximum number of motion vector candidates, motion vectors with a value of 0 can be added to the motion vector candidate list.
[0280] Figure 18 is a diagram illustrating an example of adding a motion vector with a predetermined value to a motion vector candidate list.
[0281] In the example shown in Figure 18, "numMVPCandLX" indicates the number of motion vector candidates included in the motion vector candidate list mvpListLX. For example, numMVPCandL0 could indicate the number of motion vector candidates included in the motion vector candidate list mvpListL0.
[0282] In addition, maxNumMVPCand indicates the maximum number of motion vector candidates that can be included in the motion vector candidate list mvpListLX. numMVPCandLX and maxNumMVPCand can have integer values including zero.
[0283] When numMVPCandLX is less than maxNumMVPCand, a motion vector with a predetermined value can be added to the motion vector candidate list, and the value of numMVPCandLX can be incremented by 1. Here, the motion vector added to the motion vector candidate list can have a fixed value and can be added to the end of the motion vector candidate list. For example, a motion vector with a predetermined value added to the motion vector candidate list can be a zero motion vector candidate with the value (0,0).
[0284] For example, as shown in Figure 18(a), when numMVPCandLX is 1 and maxNumMVPCand is 2, a zero motion vector candidate with the value (0,0) can be added to the motion vector candidate list, and the value of numMVPCandLX can be increased by 1.
[0285] When the difference between maxNumMVPCand and numMVPCandLX is equal to or greater than 2, a motion vector with a predetermined value can be repeatedly added to the motion vector candidate list according to the difference.
[0286] For example, when maxNumMVPCand is 2 and numMVPCandLX is 0, motion vector candidates with predetermined values can be repeatedly added to the motion vector candidate list until numMVPCandLX becomes equal to maxNumMVPCand. In Figure 18(b), two zero motion vector candidates with the value (0,0) are added to the motion vector candidate list.
[0287] As another example, a motion vector with a predetermined value may be added to the motion vector candidate list only if a motion vector candidate equal to the motion vector with the predetermined value is not included in the motion vector candidate list.
[0288] For example, when numMVPCandLX is less than maxNumMVPCand and the motion vector with the value (0,0) is not included in the motion vector candidate list, as shown in Figure 18(c), a motion vector with the value (0,0) can be added to the motion vector candidate list, and numMVPCandLX can be incremented by 1.
[0289] In Figure 18, the predefined value of the motion vector added to the motion vector candidate list is (0,0), but the predefined values of motion vectors added to the motion vector candidate list are not limited to this. Furthermore, when multiple predefined motion vector candidates are added as shown in Figure 18(b), the multiple predefined motion vectors added to the motion vector candidate list can have different values.
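The padding behaviour of Figure 18 can be sketched as below. The function name, the `only_if_absent` flag, and the non-zero sample candidate are hypothetical; the sketch pads with a single fixed value and does not cover the variant in which several different predefined values are used.

```python
def pad_candidates(mvp_list, max_num, fill=(0, 0), only_if_absent=False):
    """Append the predefined motion vector `fill` to the end of the list
    until it holds max_num entries.

    With only_if_absent=True the value is added only when an equal
    candidate is not already in the list, as in Figure 18(c).
    """
    padded = list(mvp_list)
    while len(padded) < max_num:
        if only_if_absent and fill in padded:
            break
        padded.append(fill)
    return padded


# Figure 18(a): one candidate present, maximum of two.
print(pad_candidates([(1, 0)], 2))
# Figure 18(b): empty list, maximum of two.
print(pad_candidates([], 2))
# Figure 18(c): (0,0) is added only because it is not yet in the list.
print(pad_candidates([(1, 0)], 2, only_if_absent=True))
```

Here `len(padded)` plays the role of numMVPCandLX and `max_num` plays the role of maxNumMVPCand.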
[0290] The encoding and decoding devices can adjust the size of the motion vector candidate list by removing motion vector candidates that are included in the motion vector candidate list.
[0291] For example, the encoding and decoding devices can identify whether identical motion vector candidates exist in the motion vector candidate list. When identical motion vector candidates exist in the motion vector candidate list, the remaining motion vector candidates, excluding the one with the smallest motion vector candidate index among the identical candidates, can be removed from the motion vector candidate list.
[0292] The operation of determining whether motion vector candidates are the same can be applied only among spatial motion vector candidates or only among temporal motion vector candidates, or it can be applied between spatial and temporal motion vector candidates.
[0293] When the number of motion vector candidates included in the motion vector candidate list is greater than the maximum number of motion vector candidates that can be included in the motion vector candidate list, motion vector candidates can be removed from the motion vector candidate list according to the difference between the number of motion vector candidates included in the motion vector candidate list and the maximum number of motion vector candidates.
[0294] Figure 19 is a diagram illustrating an example of removing motion vector candidates from the motion vector candidate list.
[0295] When numMVPCandLX is equal to or greater than maxNumMVPCand, motion vector candidates with an index value greater than maxNumMVPCand-1 can be removed from the motion vector candidate list.
[0296] For example, in the example shown in Figure 19, when numMVPCandLX is 3 and maxNumMVPCand is 2, the motion vector candidate (4, -3), which is assigned index 2 (greater than maxNumMVPCand-1), is removed from the motion vector candidate list.
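The two removal operations described above (dropping duplicates while keeping the smallest index, and dropping candidates whose index exceeds maxNumMVPCand-1) can be sketched as follows. The function names are hypothetical, and the first two candidates in the example are made-up values; only the removed candidate (4, -3) is taken from the Figure 19 example.

```python
def remove_duplicates(mvp_list):
    """Keep the first (smallest-index) occurrence of each candidate."""
    unique = []
    for cand in mvp_list:
        if cand not in unique:
            unique.append(cand)
    return unique


def truncate(mvp_list, max_num):
    """Remove candidates whose index is greater than max_num - 1."""
    return mvp_list[:max_num]


# numMVPCandLX is 3 and maxNumMVPCand is 2: index 2 is removed.
print(truncate([(1, 0), (2, 3), (4, -3)], 2))
# A repeated candidate is removed, the smallest index survives.
print(remove_duplicates([(1, 0), (2, 3), (1, 0)]))
```

Running the duplicate check before truncation leaves room for more distinct candidates within the same maximum list size.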
[0297] Next, the operation of adding combined motion vector candidates to the motion vector candidate list will be described.
[0298] When the number of motion vector candidates included in the motion vector candidate list is less than the maximum number of motion vector candidates, a combined motion vector can be added to the motion vector candidate list by using at least one of the motion vector candidates included in the list. For example, a combined motion vector candidate is generated by using at least one of the following: a spatial motion vector candidate, a temporal motion vector candidate, and a zero motion vector candidate included in the motion vector candidate list, and the resulting combined motion vector candidate can be included in the motion vector candidate list.
[0299] Alternatively, combined motion vector candidates can be generated by using motion vector candidates that are not included in the motion vector candidate list. For example, combined motion vector candidates can be generated by using motion vector candidates obtained from a block that can be used to obtain at least one of spatial motion vector candidates or temporal motion vector candidates, which are not included in the motion vector candidate list, or by using motion vector candidates with predefined values (e.g., zero motion vectors) that are not included in the motion vector candidate list.
[0300] Alternatively, a combined motion vector candidate can be generated based on at least one of the encoding parameters, or a combined motion vector candidate can be added to the motion vector candidate list based on at least one of the encoding parameters.
[0301] After adding at least one of spatial motion vector candidates, temporal motion vector candidates, or motion vector candidates with preset values, the maximum number of motion vector candidates that can be included in the motion vector candidate list can be increased by the number of combined motion vectors, or increased by a smaller number. For example, maxNumMVPCandList has a first value for spatial motion vector candidates or temporal motion vector candidates, and after adding spatial motion vector candidates or temporal motion vector candidates, maxNumMVPCandList can be increased to a second value greater than the first value in order to add combined motion vector candidates.
[0302] Figure 20 is a diagram showing an example of a motion vector candidate list.
[0303] Motion compensation for the current block can be performed using motion vector candidates included in the motion vector candidate list. Motion compensation for the current block can be performed using one motion vector for a list of reference frames or by using multiple motion vectors for a list of reference frames. For example, when the inter-frame prediction direction of the current block is bidirectional, motion compensation for the current block can be performed by obtaining one motion vector for each of the reference frame lists L0 and L1, or by obtaining two motion vectors for reference frame list L0.
[0304] The motion vector candidate list may include at least one of a spatial motion vector candidate, a temporal motion vector candidate, a zero motion vector candidate, and a combined motion vector candidate generated by combining at least two of the spatial motion vector candidates, temporal motion vector candidates, and zero motion vector candidates. Each motion vector candidate can be identified by a motion vector candidate index.
[0305] Based on the inter-frame prediction direction of the current block, a motion vector candidate set including multiple motion vector candidates can be identified using a motion vector candidate index. Here, the motion vector candidate set can include N motion vector candidates depending on the number N of inter-frame prediction directions of the current block. For example, the motion vector candidate set can include multiple motion vector candidates, such as a first motion vector candidate, a second motion vector candidate, a third motion vector candidate, and a fourth motion vector candidate, etc.
[0306] A motion vector candidate set can be generated by combining at least two of the following: spatial motion vector candidates, temporal motion vector candidates, and zero motion vector candidates. For example, in Figure 20, motion vector candidate sets that each include two motion vector candidates are assigned motion vector candidate indices 4 to 13. Furthermore, each motion vector candidate set can be generated by combining spatial motion vector candidates (mxLXA, mxLXB), temporal motion vector candidates (mxLCol), and the zero motion vector (mvZero).
[0307] Based on the prediction direction for the reference frame list LX, at least one motion vector can be obtained from the reference frame list. For example, when performing unidirectional prediction on the reference frame list LX, the motion vector of the current block can be obtained by using one of a plurality of motion vector candidates assigned to motion vector indices 0 to 3. Conversely, when performing bidirectional prediction on the reference frame list LX, the motion vector of the current block can be obtained by using a plurality of motion vector candidates assigned to motion vector indices 4 to 13. That is, in the encoding / decoding process, at least one motion vector can be obtained based on motion vector candidates included in the motion vector candidate list.
[0308] The motion vector of the current block can be obtained by adding the motion vector difference to the motion vector candidate. For example, Figure 20 shows that when a motion vector candidate with one of the motion vector candidate indices 0 to 3 is selected, the motion vector difference (MVD) is added to the selected motion vector candidate to obtain the motion vector.
[0309] When a motion vector candidate set including multiple motion vector candidates is selected, multiple motion vectors for the current block can be obtained based on the multiple motion vector candidates included in the motion vector candidate set. Here, the motion vector difference for each of the multiple motion vector candidates included in the motion vector candidate set can be encoded / decoded. In this case, for the current block, multiple motion vectors can be obtained by adding the motion vector difference corresponding to the motion vector candidate.
[0310] As another example, motion vector differences can be encoded / decoded for only a subset of the motion vector candidates included in a motion vector candidate set. For instance, a single motion vector difference can be encoded / decoded for a motion vector candidate set that includes multiple motion vector candidates. In this case, the current block can use both the motion vector obtained by adding the motion vector difference to one motion vector candidate in the set and the motion vector taken directly from another motion vector candidate. Figure 20 illustrates a motion vector candidate set comprising two motion vector candidates, where the first or second motion vector is obtained by adding the motion vector difference to one of the motion vector candidates, and the remaining motion vector is the same as the other motion vector candidate.
[0311] As another example, multiple motion vector candidates included in the motion vector candidate set can share the same motion vector difference.
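The three ways of applying motion vector differences to a motion vector candidate set described above can be sketched in one helper. The function names, the use of `None` to mark a candidate without a signalled difference, and all sample values are assumptions of this sketch.

```python
def add_mv(a, b):
    """Component-wise sum of two motion vectors."""
    return (a[0] + b[0], a[1] + b[1])


def reconstruct_set(candidates, mvds):
    """Derive one motion vector per candidate in a candidate set.

    mvds[i] is the motion vector difference for candidates[i], or None
    when no difference is coded for that candidate, in which case the
    candidate itself is used as the motion vector.
    """
    if len(candidates) != len(mvds):
        raise ValueError("one mvd entry (or None) is needed per candidate")
    return [c if d is None else add_mv(c, d)
            for c, d in zip(candidates, mvds)]


cand_set = [(1, 0), (2, 3)]  # a set with two candidates (made-up values)

# A separate difference for each candidate.
print(reconstruct_set(cand_set, [(1, 1), (-1, 0)]))
# A difference for only one candidate; the other is used as-is.
print(reconstruct_set(cand_set, [(1, 1), None]))
# One difference shared by every candidate in the set.
shared = (0, -2)
print(reconstruct_set(cand_set, [shared] * len(cand_set)))
```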
[0312] Inter-frame prediction indicators can indicate unidirectional or multidirectional prediction for a predetermined list of reference frames. For example, an inter-frame prediction indicator can be specified as PRED_LX indicating unidirectional prediction for a list of reference frames LX, and can be specified as PRED_BI_LX indicating bidirectional prediction for a list of reference frames LX. Here, the index of the list of reference frames can be indicated as X, where X is an integer including 0 (such as 0, 1, 2, 3, etc.).
[0313] For example, when performing unidirectional prediction for reference frame list L0, the inter-frame prediction indicator can be set to PRED_L0. Similarly, when performing unidirectional prediction for reference frame list L1, the inter-frame prediction indicator can be set to PRED_L1.
[0314] Conversely, when performing bidirectional prediction for reference frame list L1, the inter-frame prediction indicator can be set to PRED_BI_L1. When the inter-frame prediction indicator for reference frame list L1 is PRED_BI_L1, the current block uses a motion vector candidate list to obtain two motion vectors, and inter-frame prediction can be performed by obtaining two prediction blocks from reference frames included in reference frame list L1. Here, two prediction blocks can be obtained from two different reference frames included in reference frame list L1, or from one reference frame included in reference frame list L1.
[0315] Inter-frame prediction indicators can be encoded / decoded to indicate the number of prediction directions for the current block, and can be encoded / decoded to indicate the number of prediction directions for each reference frame list.
[0316] For example, the inter-frame prediction indicator (PRED_L0) indicating unidirectional prediction for reference frame list L0 and the inter-frame prediction indicator (PRED_BI_L1) indicating bidirectional prediction for reference frame list L1 can be encoded for the current block. Alternatively, when performing unidirectional prediction for reference frame list L0 and bidirectional prediction for reference frame list L1, the inter-frame prediction indicator for the current block can indicate PRED_TRI.
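One possible way to derive the indicators named above from the number of prediction directions per reference frame list is sketched below. The input layout and function names are hypothetical, and the rule that the block-level indicator follows the total number of directions is an assumption based on the PRED_TRI example, not a normative definition.

```python
# Block-level indicator by total number of prediction directions.
BLOCK_LEVEL = {2: "PRED_BI", 3: "PRED_TRI", 4: "PRED_QUAD"}


def per_list_indicators(directions):
    """directions maps a reference frame list index X to the number of
    prediction directions (1 or 2) performed for list LX."""
    names = []
    for x in sorted(directions):
        count = directions[x]
        if count == 1:
            names.append(f"PRED_L{x}")
        elif count == 2:
            names.append(f"PRED_BI_L{x}")
        else:
            raise ValueError("this sketch handles 1 or 2 directions per list")
    return names


def block_indicator(directions):
    """Single indicator for the whole block (assumed rule, see lead-in)."""
    total = sum(directions.values())
    if total == 1:
        return per_list_indicators(directions)[0]
    return BLOCK_LEVEL[total]


# Unidirectional for L0 and bidirectional for L1.
usage = {0: 1, 1: 2}
print(per_list_indicators(usage))
print(block_indicator(usage))
```

For the example in the text, the per-list form yields PRED_L0 and PRED_BI_L1, and the block-level form yields PRED_TRI.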
[0317] Figure 20 shows an example of a motion vector candidate list mvpListLX for a specific reference frame list LX. When multiple reference frame lists such as L0, L1, L2, L3, etc. exist, a motion vector candidate list can be generated for each reference frame list. Therefore, at least one prediction block of up to N prediction blocks can be generated for use in inter-frame prediction or motion compensation of the current block. Here, N indicates an integer equal to or greater than 1, such as 2, 3, 4, 5, 6, 7, 8, etc.
[0318] At least one of the motion vector candidates included in the motion vector candidate list can be determined as a predicted motion vector (or motion vector predictor) for the current block. The determined predicted motion vector can be used in the operation of calculating the motion vector of the current block, and the motion vector of the current block can be used in the inter-frame prediction or motion compensation of the current block.
[0319] In the current block, when a motion vector candidate set including multiple motion vector candidates is selected, the multiple motion vector candidates included in the motion vector candidate set, as well as the motion vectors of the current block calculated based on the multiple motion vector candidates, can be stored as information about the motion compensation of the current block. Here, the stored information about the motion compensation of the current block can be used later when generating a list of motion vector candidates or when performing motion compensation in neighboring blocks.
[0320] Figure 20 shows an example of generating a motion vector candidate list for each reference frame list. A motion vector candidate list can also be generated for each reference frame. For example, when performing bidirectional prediction for a reference frame list LX, a first motion vector candidate list can be generated for a first reference frame used in the bidirectional prediction, and a second motion vector candidate list can be generated for a second reference frame used in the bidirectional prediction, among the reference frames included in the reference frame list LX.
[0321] Next, operations S1203 and S1304 of determining a predicted motion vector from the motion vector candidate list will be described.
[0322] Among the motion vector candidates included in the motion vector candidate list, the motion vector candidate indicated by the motion vector candidate index can be determined as the predicted motion vector for the current block.
[0323] Figure 21 is a diagram illustrating an example of obtaining the predicted motion vector for the current block from a motion vector candidate list.
[0324] Figure 21 shows a case where the maximum number of motion vector candidates that can be included in the motion vector candidate list, maxNumMVPCand, is 2, and the number of motion vector candidates included in the motion vector candidate list is also 2. Here, when the motion vector candidate index indicates index 1, the second motion vector candidate with the value (2,3) included in the motion vector candidate list (i.e., the motion vector candidate assigned index 1) can be determined as the predicted motion vector of the current block.
[0325] The encoding device obtains the motion vector difference by calculating the difference between the motion vector and the predicted motion vector. The decoding device obtains the motion vector by adding the predicted motion vector to the motion vector difference.
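The relationship between the motion vector, the predicted motion vector, and the motion vector difference can be sketched as a round trip. The function names, the first list entry, and the encoder-side motion vector are hypothetical; only the selected predictor (2,3) at index 1 comes from the Figure 21 example.

```python
def select_predictor(mvp_list, mvp_idx):
    """Pick the candidate indicated by the motion vector candidate index."""
    return mvp_list[mvp_idx]


def encode_mvd(mv, mvp):
    """Encoder side: difference between motion vector and predictor."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvp, mvd):
    """Decoder side: predictor plus motion vector difference."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


mvp_list = [(1, 0), (2, 3)]          # index 1 holds (2,3) as in Figure 21
mvp = select_predictor(mvp_list, 1)
mv = (3, 1)                          # made-up motion vector found by the encoder
mvd = encode_mvd(mv, mvp)            # value written to the bitstream
print(mvd, decode_mv(mvp, mvd))
```

Only the candidate index and the difference need to be transmitted, since both devices build the same candidate list.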
[0326] Although not shown, the motion vector candidate index indicates a set of motion vector candidates from which multiple motion vectors can be obtained. Here, the motion vector of the current block can be the sum of the motion vector candidates and the motion vector differences, and can have the same value as the motion vector candidates.
[0327] Next, operations S1204 and S1305 of performing motion compensation using motion vectors will be described.
[0328] Encoding and decoding devices can calculate motion vectors using predicted motion vectors and motion vector differences. Once the motion vectors are calculated, inter-frame prediction or motion compensation can be performed using them. Alternatively, as shown in Figure 20, the predicted motion vector itself can be determined as the motion vector.
[0329] Depending on the prediction direction, the current block may have at least one motion vector from at most N motion vectors. The final prediction block of the current block can be obtained by using the motion vectors to generate at least one prediction block from at most N prediction blocks.
[0330] For example, when the current block has a motion vector, the predicted block generated by using the motion vector can be determined as the final predicted block for the current block.
[0331] Conversely, when the current block has multiple motion vectors, multiple prediction blocks can be generated using these multiple motion vectors, and the final prediction block for the current block can be determined based on a weighted sum of the multiple prediction blocks. Reference frames that include multiple prediction blocks indicated by multiple motion vectors can be included in different reference frame lists, or they can be included in the same reference frame list.
[0332] The weight applied to each prediction block can have the same value of 1 / N (where N is the number of prediction blocks generated). For example, when two prediction blocks are generated, the weight applied to each prediction block can be 1 / 2. When three prediction blocks are generated, the weight applied to each prediction block can be 1 / 3. When four prediction blocks are generated, the weight applied to each prediction block can be 1 / 4. Alternatively, the final prediction block for the current block can be determined by assigning different weights to each prediction block.
[0333] The weights do not need to have fixed values for each prediction block; instead, they can have variable values for each prediction block. Here, the weights applied to each prediction block can be the same or different. To apply variable weights, one or more pieces of weight information for the current block can be signaled via a bitstream. Weight information can be transmitted for each prediction block, and weight information can also be transmitted for each reference frame. Multiple prediction blocks can share a single piece of weight information.
[0334] Formulas 3 through 5 described below show examples of generating the final prediction block for the current block when the inter-frame prediction indicator for the current block is PRED_BI, PRED_TRI, and PRED_QUAD, respectively, and the prediction direction for each reference frame list is unidirectional.
[0335] [Formula 3]
[0336] P_BI=(WF_L0*P_L0+OFFSET_L0+WF_L1*P_L1+OFFSET_L1+RF)>>1
[0337] [Formula 4]
[0338] P_TRI=(WF_L0*P_L0+OFFSET_L0+WF_L1*P_L1+OFFSET_L1+
[0339] WF_L2*P_L2+OFFSET_L2+RF) / 3
[0340] [Formula 5]
[0341] P_QUAD=(WF_L0*P_L0+OFFSET_L0+WF_L1*P_L1+OFFSET_L1+
[0342] WF_L2*P_L2+OFFSET_L2+WF_L3*P_L3+OFFSET_L3+RF)>>2
[0343] In Formulas 3 through 5, P_BI, P_TRI, and P_QUAD indicate the final prediction block of the current block, and LX (X = 0, 1, 2, 3) can be interpreted as a reference frame list. WF_LX indicates the weight value of the prediction block generated using LX, and OFFSET_LX indicates the offset value of the prediction block generated using LX. P_LX can be interpreted as the prediction block generated using the motion vector of the current block relative to LX. RF can be interpreted as a rounding factor that can be set to zero, a positive integer, or a negative integer.
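Formulas 3 through 5 can be sketched per sample in Python as follows. The function name and the representation of a prediction block as a flat list of integer samples are assumptions; the "/ 3" in Formula 4 is assumed to be integer division, implemented here with floor division, which this description does not specify.

```python
def final_prediction(pred, wf, offset, rf=0):
    """Combine N = 2, 3, or 4 prediction blocks per Formulas 3 to 5.

    pred:   list of N prediction blocks P_LX, each a flat list of ints
    wf:     list of N weights WF_LX
    offset: list of N offsets OFFSET_LX
    rf:     rounding factor RF
    """
    n = len(pred)
    if n not in (2, 3, 4) or len(wf) != n or len(offset) != n:
        raise ValueError("expected 2, 3, or 4 blocks with matching weights")
    result = []
    for samples in zip(*pred):
        acc = rf + sum(w * p + o for w, p, o in zip(wf, samples, offset))
        if n == 2:
            result.append(acc >> 1)   # Formula 3
        elif n == 3:
            result.append(acc // 3)   # Formula 4 (assumed integer division)
        else:
            result.append(acc >> 2)   # Formula 5
    return result


# Unit weights and zero offsets reduce each formula to a rounded average.
print(final_prediction([[100, 50], [103, 51]], [1, 1], [0, 0], rf=1))
```

With unit weights, each formula divides the sum by the number of prediction blocks, which corresponds to the equal 1 / N weighting described earlier.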
[0344] Even when there are multiple prediction directions for a given reference frame list, the final prediction block for the current block can be obtained as the weighted sum of the prediction blocks. Here, the weights applied to prediction blocks obtained from the same reference frame list can have the same value or different values.
[0345] At least one of the weights (WF_LX) and offsets (OFFSET_LX) for the multiple prediction blocks can be an encoding parameter that is entropy encoded / decoded. As another example, the weights and offsets can be derived from already encoded / decoded neighboring blocks adjacent to the current block. Here, the neighboring blocks adjacent to the current block can include at least one of a block used to obtain spatial motion vector candidates for the current block and a block used to obtain temporal motion vector candidates for the current block.
[0346] As another example, the weights and offsets can be determined based on the display order (picture order count, POC) of the current frame and the reference frame. In this case, the weights or offsets can be set to smaller values when the current frame is far from the reference frame, and to larger values when the current frame is close to the reference frame. For example, when the POC difference between the current frame and the L0 reference frame is 2, the weight applied to the prediction block generated by referring to the L0 reference frame can be set to 1 / 3. Conversely, when the POC difference between the current frame and the L0 reference frame is 1, the weight applied to the prediction block generated by referring to the L0 reference frame can be set to 2 / 3. As this example shows, the weights or offsets can be inversely proportional to the difference in display order between the current frame and the reference frame. As another example, the weights or offsets can be directly proportional to that difference.
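One way to obtain weights that are inversely proportional to the POC difference is sketched below. The normalization so that the weights sum to 1 is an assumption, as is the function name poc_distance_weights; under that assumption, two reference frames at POC differences of 2 and 1 receive the weights 1 / 3 and 2 / 3 given in the example.

```python
from fractions import Fraction


def poc_distance_weights(current_poc, reference_pocs):
    """Return weights inversely proportional to |POC difference|.

    Assumption: the weights are normalized so that they sum to 1.
    """
    distances = [abs(current_poc - poc) for poc in reference_pocs]
    if any(d == 0 for d in distances):
        raise ValueError("a reference frame must differ from the current frame")
    inverse = [Fraction(1, d) for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]
```

For a current frame with POC 4 and reference frames with POC 2 (L0) and POC 5 (L1), the sketch returns 1 / 3 for L0 and 2 / 3 for L1.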
[0347] As another example, at least one of the weights or offset values can be entropy-encoded / decoded based on at least one of the encoding parameters. Furthermore, a weighted sum of the prediction blocks can be calculated based on at least one of the encoding parameters.
[0348] Next, the entropy encoding / decoding processes S1205 and S1301 for motion compensation information will be disclosed in detail.
[0349] Figures 22a and 22b are diagrams illustrating examples of the syntax for information about motion compensation.
[0350] The encoding device can entropy encode information about motion compensation in the bitstream, and the decoding device can entropy decode the information about motion compensation included in the bitstream. Here, the entropy-encoded / decoded information about motion compensation may include at least one of the following: inter-frame prediction indicator (inter_pred_idc), reference frame index (ref_idx_l0, ref_idx_l1, ref_idx_l2, ref_idx_l3), motion vector candidate index (mvp_l0_idx, mvp_l1_idx, mvp_l2_idx, mvp_l3_idx), motion vector difference, weight value (wf_l0, wf_l1, wf_l2, wf_l3), and offset value (offset_l0, offset_l1, offset_l2, offset_l3).
[0351] The inter-frame prediction indicator indicates the direction of inter-frame prediction for the current block when the current block is encoded / decoded via inter-frame prediction. For example, the inter-frame prediction indicator can indicate unidirectional or multi-directional prediction (such as bidirectional, tridirectional, or quadridirectional prediction). The inter-frame prediction indicator can also be interpreted as the number of reference frames used when generating prediction blocks for the current block. Optionally, a single reference frame can be used for more than one prediction direction. In this case, M reference frames can be used to perform N-directional prediction (N > M). The inter-frame prediction indicator can also be interpreted as the number of prediction blocks used when performing inter-frame prediction or motion compensation for the current block.
[0352] As described above, the inter-frame prediction indicator can be used to determine the number of reference frames used when generating the prediction block for the current block, the number of prediction blocks used when performing inter-frame prediction or motion compensation for the current block, or the number of reference frame lists available for the current block. Here, the number N of reference frame lists is a positive integer, such as 1, 2, 3, 4, or a larger value. For example, the reference frame lists may include L0, L1, L2, L3, and so on. Motion compensation can be performed on the current block by using at least one reference frame list.
[0353] For example, the current block can generate at least one prediction block by using at least one list of reference frames, thereby enabling motion compensation for the current block. For instance, one or more prediction blocks can be generated using reference frame list L0 to perform motion compensation, or one or more prediction blocks can be generated using reference frame lists L0 and L1. Alternatively, one or more prediction blocks, or up to N prediction blocks (where N is a positive integer equal to or greater than 2 or 3), can be generated using reference frame lists L0, L1, and L2 to perform motion compensation. Alternatively, one or more prediction blocks, or up to N prediction blocks (where N is a positive integer equal to or greater than 2 or 4), can be generated using reference frame lists L0, L1, L2, and L3 to perform motion compensation for the current block.
[0354] The inter-frame prediction indicator can indicate unidirectional (PRED_LX), bidirectional (PRED_BI), tridirectional (PRED_TRI), quadridirectional (PRED_QUAD), or higher-order prediction, depending on the number of prediction directions for the current block.
[0355] For example, when performing unidirectional prediction for each reference frame list, the inter-frame prediction indicator PRED_LX can mean generating one prediction block using the reference frame list LX (where X is an integer, such as 0, 1, 2, or 3), and performing inter-frame prediction or motion compensation using the generated prediction block. The inter-frame prediction indicator PRED_BI can mean generating two prediction blocks using reference frame lists L0 and L1, and performing inter-frame prediction or motion compensation using the generated two prediction blocks. The inter-frame prediction indicator PRED_TRI can mean generating three prediction blocks using reference frame lists L0, L1, and L2, and performing inter-frame prediction or motion compensation using the generated three prediction blocks. The inter-frame prediction indicator PRED_QUAD can mean generating four prediction blocks using reference frame lists L0, L1, L2, and L3, and performing inter-frame prediction or motion compensation using the generated four prediction blocks. In other words, the total number of prediction blocks used when performing inter-frame prediction for the current block can be set as the inter-frame prediction indicator.
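For the case in which each reference frame list is predicted unidirectionally, the correspondence between the indicator and the reference frame lists can be sketched as a lookup table. The string labels and the names REFERENCE_LISTS_BY_INDICATOR, reference_lists_for, and num_prediction_blocks are hypothetical; the coded values of inter_pred_idc are defined by the syntax and are not reproduced here.

```python
# Hypothetical string labels standing in for the coded values of inter_pred_idc.
REFERENCE_LISTS_BY_INDICATOR = {
    "PRED_L0": ("L0",),
    "PRED_L1": ("L1",),
    "PRED_L2": ("L2",),
    "PRED_L3": ("L3",),
    "PRED_BI": ("L0", "L1"),
    "PRED_TRI": ("L0", "L1", "L2"),
    "PRED_QUAD": ("L0", "L1", "L2", "L3"),
}


def reference_lists_for(indicator):
    """Reference frame lists used when each list is predicted unidirectionally."""
    return REFERENCE_LISTS_BY_INDICATOR[indicator]


def num_prediction_blocks(indicator):
    """One prediction block per reference frame list in this case."""
    return len(reference_lists_for(indicator))
```

Under this mapping, PRED_TRI uses L0, L1, and L2 and yields three prediction blocks.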
[0356] When performing multi-directional prediction for a reference frame list, the inter-frame prediction indicator PRED_BI can mean performing bi-directional prediction for reference frame list L0. The inter-frame prediction indicator PRED_TRI can mean: performing tri-directional prediction for reference frame list L0; performing uni-directional prediction for reference frame list L0 and bi-directional prediction for reference frame list L1; or performing bi-directional prediction for reference frame list L0 and uni-directional prediction for reference frame list L1.
[0357] As described above, the inter-frame prediction indicator can mean that at least one and at most N prediction blocks are generated from at least one reference frame list to perform motion compensation (here, N is the number of prediction directions indicated by the inter-frame prediction indicator). Alternatively, the inter-frame prediction indicator can mean that at least one and at most N prediction blocks are generated from N reference frames and that motion compensation for the current block is performed using the generated prediction blocks.
[0358] For example, the inter-frame prediction indicator PRED_TRI can mean generating three prediction blocks using at least one of the reference frame lists L0, L1, L2, and L3 to perform inter-frame prediction or motion compensation for the current block. Alternatively, the inter-frame prediction indicator PRED_TRI can mean generating three prediction blocks using at least three of the reference frame lists L0, L1, L2, and L3 to perform inter-frame prediction or motion compensation for the current block. Furthermore, the inter-frame prediction indicator PRED_QUAD can mean generating four prediction blocks using at least one of the reference frame lists L0, L1, L2, and L3 to perform inter-frame prediction or motion compensation for the current block. Alternatively, the inter-frame prediction indicator PRED_QUAD can mean generating four prediction blocks using at least four of the reference frame lists L0, L1, L2, and L3 to perform inter-frame prediction or motion compensation for the current block.
[0359] Available inter-frame prediction directions can be determined based on inter-frame prediction indicators, and all or some of the available inter-frame prediction directions can be selectively used based on the size and / or shape of the current block.
[0360] The number of reference frames included in each reference frame list can be predefined, or it can be sent to the decoding device by entropy encoding in the encoding device. For example, the syntax element "num_ref_idx_lX_active_minus1" (where X indicates the index of the reference frame list, such as 0, 1, 2, 3, etc.) can indicate the number of reference frames for a reference frame list (such as L0, L1, L2, or L3).
[0361] A reference frame index specifies the reference frame referenced by the current block in each reference frame list. At least one reference frame index can be entropy-encoded / decoded for each reference frame list. Motion compensation can be performed on the current block using at least one reference frame index.
[0362] When N reference frames are selected via N reference frame indices, motion compensation for the current block can be performed by generating at least one to N (or more than N) prediction blocks.
[0363] The motion vector candidate index indicates a motion vector candidate for the current block from the motion vector candidate list generated for each reference frame list or for each reference frame index. At least one motion vector candidate index for each motion vector candidate list can be entropy encoded / decoded. Motion compensation can be performed on the current block using at least one motion vector candidate index.
[0364] For example, based on N motion vector candidate indices, motion compensation for the current block can be performed by generating at least one to N (or more than N) prediction blocks.
[0365] Motion vector difference indicates the difference between a motion vector and a predicted motion vector. For each list of motion vector candidates generated for the current block, or for each list of reference frames or each reference frame index, at least one motion vector difference can be entropy encoded / decoded. Motion compensation can be performed on the current block using at least one motion vector difference.
[0366] For example, motion compensation can be performed on the current block by generating at least one to at most N (or more than N) prediction blocks via N motion vector differences.
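The relationship between the motion vector candidate index, the motion vector difference, and the resulting motion vector can be sketched as follows. The function name reconstruct_motion_vector and the representation of a motion vector as an (x, y) tuple are assumptions made for illustration.

```python
def reconstruct_motion_vector(candidates, mvp_idx, mvd):
    """Add the motion vector difference to the selected candidate.

    candidates: motion vector candidate list for one reference frame list.
    mvp_idx: motion vector candidate index selecting the predicted vector.
    mvd: motion vector difference (motion vector minus predicted vector).
    """
    mvp_x, mvp_y = candidates[mvp_idx]
    return (mvp_x + mvd[0], mvp_y + mvd[1])
```

For example, with the candidate list [(4, -2), (0, 0)], index 0, and difference (1, 3), the reconstructed motion vector is (5, 1). Repeating this for N reference frame lists gives the N motion vectors from which the prediction blocks are generated.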
[0367] When two or more prediction blocks are generated during motion compensation for the current block, the final prediction block for the current block can be generated as a weighted sum of the prediction blocks. When calculating the weighted sum, at least one of a weight and an offset can be applied to each prediction block. The weighted sum factors (such as weights or offsets) used in calculating the weighted sum can be encoded / decoded for at least one of a reference frame list, a reference frame, a motion vector candidate index, a motion vector difference, or a motion vector.
[0368] The weighted sum factor can be derived from index information that specifies one entry of a set predefined in the encoding device and the decoding device. In this case, the index information used to specify at least one of the weight and the offset can be entropy encoded / decoded.
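Selecting a pair of weights from a predefined set by index can be sketched as below. The contents of WEIGHT_SET are purely hypothetical values chosen for illustration; the embodiment does not state which weight pairs the predefined set contains. The function name weights_from_index is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical predefined set shared by the encoding and decoding devices.
# Each entry is (weight for the first block, weight for the second block).
WEIGHT_SET = (
    (Fraction(1, 2), Fraction(1, 2)),
    (Fraction(3, 4), Fraction(1, 4)),
    (Fraction(1, 4), Fraction(3, 4)),
)


def weights_from_index(index, weight_set=WEIGHT_SET):
    """Return the weight pair specified by the decoded index information."""
    if not 0 <= index < len(weight_set):
        raise ValueError("index information is outside the predefined set")
    return weight_set[index]
```

Because only the index is signaled, a single decoded value determines both weights at once.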
[0369] Information related to the weighted sum factor can be entropy encoded / decoded in block units or at a higher level. For example, weights or offsets can be entropy encoded / decoded in block units (such as CTU, CU, or PU) or at a higher level (such as a video parameter set, sequence parameter set, picture parameter set, adaptation parameter set, or slice header).
[0370] The weighted sum factor can be entropy-encoded / decoded based on the weighted sum factor difference, which indicates the difference between the weighted sum factor and the weighted sum factor prediction. For example, the weight prediction and the weight difference can be entropy-encoded / decoded, or the offset prediction and the offset difference can be entropy-encoded / decoded. Here, the weight difference can indicate the difference between the weight and the weight prediction, and the offset difference can indicate the difference between the offset and the offset prediction.
[0371] Here, the weighted sum factor difference can be entropy encoded / decoded in block units, and the weighted sum factor prediction can be entropy encoded / decoded at a higher level. When the weighted sum factor prediction (such as a weight prediction or an offset prediction) is entropy encoded / decoded in frame or slice units, the blocks included in the frame or slice can use a common weighted sum factor prediction.
[0372] The weighted sum factor prediction can be derived from a specific region within an image, slice, or tile, or from a specific region within a CTU or CU. For example, the weight or offset of a specific region within an image, slice, tile, CTU, or CU can be used as the weight prediction or offset prediction. In this case, entropy encoding / decoding of the weighted sum factor prediction can be omitted, and only the weighted sum factor difference is entropy encoded / decoded.
[0373] Alternatively, the weighted sum factor prediction can be derived from already encoded / decoded neighboring blocks adjacent to the current block. For example, the weight or offset of an already encoded / decoded neighboring block adjacent to the current block can be set as the weight prediction or offset prediction of the current block. Here, the neighboring blocks of the current block may include at least one of the following: blocks used when obtaining spatial motion vector candidates and blocks used when obtaining temporal motion vector candidates.
[0374] When a weight prediction and a weight difference are used, the decoding device can calculate the weight for the prediction block by adding the weight prediction and the weight difference. Similarly, when an offset prediction and an offset difference are used, the decoding device can calculate the offset for the prediction block by adding the offset prediction and the offset difference.
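The prediction-plus-difference relationship can be sketched for both sides as follows. The function names factor_difference and reconstruct_factor are assumptions; the same two operations apply to weights and to offsets.

```python
def factor_difference(value, prediction):
    """Encoder side: difference between a weight (or offset) and its prediction."""
    return value - prediction


def reconstruct_factor(prediction, difference):
    """Decoder side: weight (or offset) = prediction + difference."""
    return prediction + difference
```

For example, if the weight prediction is 4 and the actual weight is 5, the difference 1 is signaled and the decoding device recovers 4 + 1 = 5.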
[0375] Instead of entropy encoding / decoding information about the weighted sum factor of the current block, the weighted sum factor of an already encoded / decoded neighboring block adjacent to the current block can be used as the weighted sum factor of the current block. For example, the weight or offset of the current block can be set to the same value as the weight or offset of an already encoded / decoded neighboring block adjacent to the current block.
[0376] At least one piece of information about motion compensation can be entropy encoded / decoded via the bitstream using encoding parameters, or can be derived using at least one encoding parameter.
[0377] When entropy encoding / decoding information about motion compensation, a binarization method can be used, such as truncated Rice binarization, K-th order Exp-Golomb binarization, limited K-th order Exp-Golomb binarization, fixed-length binarization, unary binarization, or truncated unary binarization.
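Several of the listed binarization methods can be sketched as functions that map a non-negative integer to a bit string. The sketch covers unary, truncated unary, fixed-length, and K-th order Exp-Golomb binarization in their textbook forms; truncated Rice and limited Exp-Golomb binarization are omitted. The function names and the bit polarity (ones followed by a terminating zero for unary codes, a zero prefix for Exp-Golomb) are assumptions, since a particular codec may use the opposite polarity.

```python
def unary(value):
    """Unary code: `value` ones followed by a terminating zero."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return "1" * value + "0"


def truncated_unary(value, c_max):
    """Unary code whose terminating zero is dropped when value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must be in [0, c_max]")
    return "1" * value + ("" if value == c_max else "0")


def fixed_length(value, num_bits):
    """Plain binary representation padded to exactly num_bits bits."""
    if num_bits < 1 or not 0 <= value < (1 << num_bits):
        raise ValueError("value does not fit in num_bits")
    return format(value, "b").zfill(num_bits)


def exp_golomb(value, k=0):
    """K-th order Exp-Golomb code: zero prefix, then binary of value + 2**k."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    shifted = value + (1 << k)
    prefix_len = shifted.bit_length() - k - 1
    return "0" * prefix_len + format(shifted, "b")
```

For instance, the value 3 is binarized as 1110 in unary, as 111 in truncated unary with a maximum of 3, and as 00100 in 0-th order Exp-Golomb.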
[0378] When entropy encoding / decoding information about motion compensation, the context model can be determined by using at least one of the following: information about the motion compensation of neighboring blocks adjacent to the current block, information about previously encoded / decoded motion compensation, information about the depth of the current block, and information about the size of the current block.
[0379] Furthermore, when entropy encoding / decoding information about motion compensation, entropy encoding / decoding can be performed by using at least one of the following: information about motion compensation of neighboring blocks, information about previously encoded / decoded motion compensation, information about the depth of the current block, and information about the size of the current block, as a prediction value for the motion compensation information about the current block.
[0380] Inter-frame coding / decoding processing can be performed for each of the luma and chroma signals. For example, in inter-frame coding / decoding processing, at least one method of obtaining an inter-frame prediction indicator, generating a list of motion vector candidates, obtaining motion vectors, and performing motion compensation can be applied differently to the luma and chroma signals.
[0381] Inter-frame coding / decoding processing can also be performed identically for the luma and chroma signals. For example, at least one of the inter-frame prediction indicator, motion vector candidate list, motion vector candidate, motion vector, and reference frame applied to the luma signal in inter-frame coding / decoding can be applied equally to the chroma signal.
[0382] The methods described can be executed in the encoder and decoder in the same manner. For example, in inter-frame coding / decoding processing, at least one method for obtaining inter-frame prediction indicators, obtaining a list of motion vector candidates, obtaining motion vectors, and performing motion compensation can be applied equivalently in the encoder and decoder. Furthermore, the order in which the methods are applied may differ in the encoder and decoder.
[0383] Embodiments of the present invention can be applied according to the size of at least one of the coding block, prediction block, block, and unit. Here, the size can be defined as a minimum size and / or a maximum size for applying the embodiment, and can be defined as a fixed size for applying the embodiment. Furthermore, a first embodiment can be applied according to a first size, and a second embodiment can be applied according to a second size. That is, the embodiment can be applied multiple times according to the size. In addition, embodiments of the present invention can be applied only when the size is equal to or greater than the minimum size and equal to or less than the maximum size. That is, the embodiment can be applied only when the block size is within a predetermined range.
[0384] For example, the embodiment can be applied only when the size of the encoded / decoded target block is equal to or greater than 8×8. For example, the embodiment can be applied only when the size of the encoded / decoded target block is equal to or greater than 16×16. For example, the embodiment can be applied only when the size of the encoded / decoded target block is equal to or greater than 32×32. For example, the embodiment can be applied only when the size of the encoded / decoded target block is equal to or greater than 64×64. For example, the embodiment can be applied only when the size of the encoded / decoded target block is equal to or greater than 128×128. For example, the embodiment can be applied only when the size of the encoded / decoded target block is 4×4. For example, the embodiment can be applied only when the size of the encoded / decoded target block is equal to or less than 8×8. For example, the embodiment can be applied only when the size of the encoded / decoded target block is equal to or greater than 16×16. For example, the embodiment can be applied only when the size of the encoded / decoded target block is equal to or greater than 8×8 and equal to or less than 16×16. For example, the embodiment can only be applied if the size of the encoded / decoded target block is equal to or greater than 16×16 and equal to or less than 64×64.
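The size conditions above amount to a range check on the target block. The following sketch assumes that both the width and the height are compared against the bound, which matters only for non-square blocks and is not stated in the embodiment; the function name embodiment_applies and its parameters are hypothetical.

```python
def embodiment_applies(width, height, min_size=None, max_size=None):
    """Return True when the block size lies within the permitted range.

    min_size / max_size: side length of the square bound (e.g. 8 for 8x8);
    None means that the corresponding bound is not used.
    Assumption: both width and height must satisfy each bound.
    """
    if min_size is not None and (width < min_size or height < min_size):
        return False
    if max_size is not None and (width > max_size or height > max_size):
        return False
    return True
```

For example, with a minimum of 8×8 and a maximum of 16×16, a 16×16 block qualifies while a 32×32 block does not. A fixed size such as 4×4 corresponds to setting both bounds to 4.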
[0385] Embodiments of the present invention can be applied according to temporal layers. An identifier indicating the temporal layer to which an embodiment can be applied can be signaled, and the embodiment can be applied to the temporal layer indicated by the identifier. Here, the identifier can be defined as indicating the minimum and / or maximum layer to which the embodiment can be applied, or as indicating a specific layer to which the embodiment can be applied.
[0386] For example, the embodiment can be applied only when the temporal layer of the current frame is the lowest layer. For example, the embodiment can be applied only when the temporal layer identifier of the current frame is 0. For example, the embodiment can be applied only when the temporal layer identifier of the current frame is equal to or greater than 1. For example, the embodiment can be applied only when the temporal layer of the current frame is the highest layer.
[0387] As described in the embodiments of the present invention, the reference frame set used in reference frame list construction and reference frame list modification may use at least one of the reference frame lists L0, L1, L2, and L3.
[0388] According to an embodiment of the present invention, when the deblocking filter calculates the boundary strength, at least one to at most N motion vectors of the encoded / decoded target block can be used. Here, N indicates a positive integer equal to or greater than 1, such as 2, 3, 4, etc.
[0389] In motion vector prediction, embodiments of the present invention can be applied when the motion vector has at least one of the following units: 16-pixel (16-pel) unit, 8-pixel (8-pel) unit, 4-pixel (4-pel) unit, integer-pixel (integer-pel) unit, 1 / 2-pixel (1 / 2-pel) unit, 1 / 4-pixel (1 / 4-pel) unit, 1 / 8-pixel (1 / 8-pel) unit, 1 / 16-pixel (1 / 16-pel) unit, 1 / 32-pixel (1 / 32-pel) unit, and 1 / 64-pixel (1 / 64-pel) unit. Furthermore, when performing motion vector prediction, the motion vector can be used selectively for each pixel unit.
[0390] The slice type to which embodiments of the present invention are applied can be defined, and embodiments of the present invention can be applied according to the slice type.
[0391] For example, when the slice type is T (tridirectional prediction)-slice, prediction blocks can be generated using at least three motion vectors, and the weighted sum of at least three prediction blocks can be used as the final prediction block for the encoding / decoding target block. Similarly, when the slice type is Q (quadridirectional prediction)-slice, prediction blocks can be generated using at least four motion vectors, and the weighted sum of at least four prediction blocks can be used as the final prediction block for the encoding / decoding target block.
[0392] The embodiments of the present invention can be applied to inter-frame prediction and motion compensation methods that use motion vector prediction, as well as inter-frame prediction and motion compensation methods that use skip mode, merge mode, etc.
[0393] The shape of the block to which the present invention is applied can be square or non-square.
[0394] In the above embodiments, the method is described based on a flowchart having a series of steps or units. However, the present invention is not limited to the order of the steps; rather, some steps may be performed simultaneously with other steps, or may be performed with other steps in a different order. Furthermore, those skilled in the art should understand that the steps in the flowchart are not mutually exclusive, and other steps may be added to the flowchart, or some steps may be deleted from the flowchart, without affecting the scope of the present invention.
[0395] The embodiments include various aspects of the examples. All possible combinations of these aspects may not be described, but those skilled in the art will recognize the different combinations. Therefore, the invention may include all substitutions, modifications, and alterations within the scope of the claims.
[0396] Embodiments of the present invention can be implemented in the form of program instructions, which can be executed by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, etc., individually or in combination. The program instructions recorded in the computer-readable recording medium may be specially designed and constructed for the present invention, or may be known to those skilled in the art of computer software. Examples of computer-readable recording media include: magnetic recording media (such as hard disks, floppy disks, and magnetic tapes); optical data storage media (such as CD-ROMs or DVD-ROMs); magneto-optical media (such as floptical disks); and hardware devices (such as read-only memory (ROM), random access memory (RAM), flash memory, etc.) specially constructed for storing and executing program instructions. Examples of program instructions include not only machine language code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, or vice versa.
[0397] Although the invention has been described with reference to specific terminology (such as detailed elements) and limited embodiments and drawings, these are provided only to aid in a more general understanding of the invention, and the invention is not limited to the embodiments described above. Those skilled in the art will understand that various modifications and changes can be made from the above description.
[0398] Therefore, the spirit of the present invention should not be limited to the above embodiments, and the full scope of the appended claims and their equivalents shall fall within the scope and spirit of the present invention.
[0399] Industrial Applicability
[0400] This invention can be used in devices for encoding / decoding images.
Claims
1. A method for decoding an image, the method comprising: determining whether inter-frame prediction for the current block is performed unidirectionally or bidirectionally; in response to inter-frame prediction for the current block being performed bidirectionally, deriving a first motion vector and a second motion vector for the current block; obtaining a first prediction block for the current block based on the first motion vector; obtaining a second prediction block for the current block based on the second motion vector; obtaining a final prediction block for the current block by a weighted sum of the first prediction block and the second prediction block; and reconstructing the current block based on the final prediction block, wherein the weighted sum is performed by applying a first weight to the first prediction block and a second weight to the second prediction block, wherein the set of the first weight and the second weight is determined based on a single piece of index information explicitly signaled via a bitstream, and wherein the index information is explicitly signaled via the bitstream only when inter-frame prediction for the current block is performed bidirectionally.
2. The method of claim 1, wherein, based on the index information, the first weight and the second weight are determined differently.
3. The method of claim 1, wherein the index information specifies one of the weight candidates included in a predefined weight set.
4. The method of claim 3, wherein the index information is decoded from the bitstream only when the size of the current block is equal to or greater than a predetermined value.
5. The method of claim 1, wherein the index information is binarized using a truncated Rice binarization method.
6. A method for encoding an image, the method comprising: obtaining a first prediction block for the current block based on a first motion vector; obtaining a second prediction block for the current block based on a second motion vector; obtaining a final prediction block for the current block by a weighted sum of the first prediction block and the second prediction block; obtaining a residual block of the current block by subtracting the final prediction block from the original block; and encoding an inter-frame prediction indicator that indicates whether inter-frame prediction for the current block is performed unidirectionally or bidirectionally, wherein the weighted sum is performed by applying a first weight to the first prediction block and a second weight to the second prediction block, wherein a single piece of index information specifying the set of the first weight and the second weight is explicitly encoded into the bitstream, and wherein the index information is explicitly encoded into the bitstream only when the inter-frame prediction indicator is encoded with a value that indicates bidirectional inter-frame prediction.
7. The method of claim 6, wherein the first weight and the second weight are determined differently.
8. The method of claim 6, wherein the index information specifies one of the weight candidates included in a predefined weight set.
9. The method of claim 6, wherein the index information is encoded using a truncated Rice binarization method.
10. A method for storing a bitstream, characterized in that: the bitstream is generated by performing the image encoding method according to claim 6; and the bitstream is stored in a computer-readable recording medium.