Decoding apparatus, encoding apparatus, and data transmitting apparatus

By using a candidate list of motion information inherited from affine candidates for inter-frame prediction in video coding, the problem of low video coding efficiency for high-resolution and high-quality images is solved, and coding efficiency and prediction performance are improved.

CN116708819B · Active · Publication Date: 2026-06-16 · TCL KING ELECTRICAL APPLIANCES HUIZHOU

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TCL KING ELECTRICAL APPLIANCES HUIZHOU
Filing Date
2019-04-24
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Video encoding of high-resolution and high-quality images is inefficient, leading to increased transmission and storage costs.

Method used

A motion information candidate list that includes inherited affine candidates is generated and used for inter-frame prediction. The inherited affine candidates are derived from spatially neighboring blocks of the current block.

Benefits of technology

It improves the overall efficiency of video coding and inter-frame prediction performance.



Abstract

The present application relates to a decoding apparatus, an encoding apparatus and a data transmitting apparatus. The present application relates to a video decoding method performed by a decoding apparatus, comprising the steps of: generating a motion information candidate list of a current block; selecting one of the candidates included in the motion information candidate list; deriving a control point motion vector (CPMV) of the current block based on the selected candidate; deriving a sub-block unit motion vector or a sample unit motion vector of the current block based on the CPMV; deriving a prediction block based on the sub-block unit motion vector or the sample unit motion vector; and reconstructing a current picture based on the prediction block, wherein the motion information candidate list comprises an inherited affine candidate, the inherited affine candidate is derived based on a candidate block encoded by affine prediction among spatial neighboring blocks of the current block, and the inherited affine candidate is generated up to a predetermined maximum number.

Description

[0001] This application is a divisional application of the original invention patent application No. 201980036473.8 (International Application No.: PCT / KR2019 / 004957, Application Date: April 24, 2019, Invention Title: Method and Apparatus for Inter-Frame Prediction in Video Coding System).

Technical Field

[0002] This disclosure relates to video coding techniques, and more specifically, to inter-frame prediction methods and apparatus using inherited affine candidates in video processing systems.

Background Technology

[0003] The demand for high-resolution, high-quality images, such as high-definition (HD) and ultra-high-definition (UHD) images, is increasing across various sectors. Because image data is high-resolution and high-quality, the amount of information or bits that needs to be transmitted increases compared to traditional image data. Therefore, transmission and storage costs increase when using media such as traditional wired / wireless broadband lines to transmit image data or when storing image data using existing storage media.

[0004] Therefore, there is a need for an efficient image compression technology for effectively sending, storing, and reproducing information from high-resolution and high-quality images.

Summary of the Invention

[0005] Technical issues

[0006] One object of this disclosure is to provide a method and apparatus for enhancing video coding efficiency.

[0007] Another object of this disclosure is to provide an inter-frame prediction method and apparatus in a video coding system.

[0008] Another object of this disclosure is to provide a method and apparatus for deriving a candidate list of motion information including inherited affine candidates.

[0009] Another object of this disclosure is to provide a method and apparatus for deriving inherited affine candidates based on spatially neighboring blocks.

[0010] Another object of this disclosure is to provide a method and apparatus for grouping spatially adjacent blocks.

[0011] Another object of this disclosure is to provide a method and apparatus for deriving affine candidates based on groups.

[0012] Technical solution

[0013] An exemplary embodiment of this disclosure provides a video decoding method performed by a decoding device. The decoding method includes: generating a motion information candidate list for a current block; selecting one of the candidates included in the motion information candidate list; deriving a control point motion vector (CPMV) for the current block based on the selected candidate; deriving sub-block unit motion vectors or sample unit motion vectors for the current block based on the CPMV; deriving a prediction block based on the sub-block unit motion vectors or sample unit motion vectors; and reconstructing the current image based on the prediction block. The motion information candidate list includes inherited affine candidates; inherited affine candidates are derived based on candidate blocks encoded by affine prediction among the spatially neighboring blocks of the current block; and inherited affine candidates are generated up to a predetermined maximum number.
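The list-construction step of the decoding method above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed procedure: the `Neighbor` record, the scan order of the neighbors, and the value `MAX_INHERITED = 2` are hypothetical, and the sketch simply carries over a neighbor's CPMVs as a placeholder where a real codec would extrapolate the neighbor's affine model to the control points of the current block.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MV = Tuple[int, int]

@dataclass
class Neighbor:
    name: str                          # hypothetical label of the neighbor position
    is_affine: bool                    # True if the block was coded with affine prediction
    cpmvs: Optional[List[MV]] = None   # control point motion vectors of that block

# Assumed value; the text only says "a predetermined maximum number".
MAX_INHERITED = 2

def inherited_affine_candidates(neighbors, max_count=MAX_INHERITED):
    """Collect inherited affine candidates from spatial neighbors, up to max_count."""
    candidates = []
    for nb in neighbors:
        if len(candidates) >= max_count:
            break  # stop once the predetermined maximum is reached
        if nb.is_affine and nb.cpmvs is not None:
            # Placeholder: reuse the neighbor's CPMVs as the candidate.
            candidates.append(nb.cpmvs)
    return candidates
```

Neighbors that were not coded with affine prediction are skipped, and scanning stops as soon as the cap is reached, so later affine neighbors contribute nothing.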

[0014] Another exemplary embodiment of this disclosure provides a video encoding method performed by an encoding apparatus. The encoding method includes: generating a motion information candidate list for a current block; selecting one of the candidates included in the motion information candidate list; deriving a control point motion vector (CPMV) for the current block based on the selected candidate; deriving a sub-block unit motion vector or a sample unit motion vector for the current block based on the CPMV; deriving a prediction block based on the sub-block unit motion vector or the sample unit motion vector; generating a residual block for the current block based on the prediction block; and outputting a bitstream by encoding image information including information about the residual block, wherein the motion information candidate list includes inherited affine candidates; inherited affine candidates are derived based on candidate blocks encoded by affine prediction among spatially neighboring blocks of the current block; and inherited affine candidates are generated up to a predetermined maximum number.

[0015] Another exemplary embodiment of this disclosure provides a decoding apparatus for performing video decoding. The decoding apparatus includes: a predictor configured to generate a motion information candidate list for a current block, select one of the candidates included in the motion information candidate list, derive a control point motion vector (CPMV) for the current block based on the selected candidate, derive sub-block unit motion vectors or sample unit motion vectors for the current block based on the CPMV, and derive a predicted block based on the motion vectors; and a reconstructor configured to reconstruct the current image based on the predicted block, wherein the motion information candidate list includes inherited affine candidates, inherited affine candidates are derived based on candidate blocks encoded by affine prediction among the spatially neighboring blocks of the current block, and inherited affine candidates are generated up to a predetermined maximum number.
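To illustrate how sub-block unit motion vectors can follow from CPMVs, the sketch below evaluates the commonly used 4-parameter affine model at the center of each sub-block. The equations of this patent are not part of this section, so the model, the 4×4 sub-block size, the use of floating-point arithmetic, and the function name `subblock_mvs_4param` are all assumptions made for illustration.

```python
def subblock_mvs_4param(v0, v1, width, height, sb=4):
    """Motion vector field of a width x height block from two CPMVs.

    v0: CPMV at the top-left corner, v1: CPMV at the top-right corner.
    Uses the widely known 4-parameter (rotation + zoom + translation) model:
        mv_x = a * x - b * y + v0_x
        mv_y = b * x + a * y + v0_y
    with a = (v1_x - v0_x) / width and b = (v1_y - v0_y) / width.
    """
    a = (v1[0] - v0[0]) / width
    b = (v1[1] - v0[1]) / width
    field = []
    for y in range(sb // 2, height, sb):      # sub-block center rows
        row = []
        for x in range(sb // 2, width, sb):   # sub-block center columns
            mvx = a * x - b * y + v0[0]
            mvy = b * x + a * y + v0[1]
            row.append((mvx, mvy))
        field.append(row)
    return field
```

When both CPMVs are equal, the model degenerates to pure translation and every sub-block receives the same motion vector.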

[0016] Another exemplary embodiment of this disclosure provides an encoding apparatus for performing video encoding. The encoding apparatus includes: a predictor for generating a motion information candidate list for a current block, selecting one of the candidates included in the motion information candidate list, deriving a control point motion vector (CPMV) for the current block based on the selected candidate, deriving sub-block unit motion vectors or sample unit motion vectors for the current block based on the CPMV, and deriving a prediction block based on the sub-block unit motion vectors or sample unit motion vectors; a subtractor for generating a residual block for the current block based on the prediction block; and an entropy encoder for outputting a bitstream by encoding image information including information about the residual block, wherein the motion information candidate list includes inherited affine candidates, inherited affine candidates are derived based on candidate blocks encoded by affine prediction in spatially neighboring blocks of the current block, and inherited affine candidates are generated up to a predetermined maximum number.

[0017] Technical effect

[0018] This disclosure can improve overall coding efficiency by performing inter-frame prediction through inheritance of affine candidates.

[0019] This disclosure allows for the configuration of a candidate list of motion information that includes inherited affine candidates, thereby improving the performance and efficiency of inter-frame prediction.

Attached Figure Description

[0020] Figure 1 is a block diagram schematically illustrating a video encoding apparatus according to an exemplary embodiment of the present disclosure.

[0021] Figure 2 is a block diagram schematically illustrating a video decoding apparatus according to an exemplary embodiment of the present disclosure.

[0022] Figure 3 exemplarily illustrates a content streaming system according to an exemplary embodiment of the present disclosure.

[0023] Figure 4 illustrates an affine motion model according to an exemplary embodiment of the present disclosure.

[0024] Figure 5a and Figure 5b exemplarily illustrate a 4-parameter affine model and a 6-parameter affine model according to exemplary embodiments of the present disclosure.

[0025] Figure 6 exemplarily illustrates a case in which an affine motion vector field according to an exemplary embodiment of the present disclosure is determined on a sub-block basis.

[0026] Figure 7 is an exemplary flowchart of an affine motion prediction method according to an exemplary embodiment of the present disclosure.

[0027] Figure 8 exemplarily illustrates the positions of neighboring blocks used for checking neighboring affine blocks according to an exemplary embodiment of the present disclosure.

[0028] Figure 9 exemplarily illustrates the use of two groups to check neighboring affine blocks according to an exemplary embodiment of this disclosure.

[0029] Figure 10 exemplarily illustrates the use of three groups to check neighboring affine blocks according to an exemplary embodiment of this disclosure.

[0030] Figure 11 schematically illustrates a video encoding method performed by an encoding apparatus according to an exemplary embodiment of the present disclosure.

[0031] Figure 12 schematically illustrates a video decoding method performed by a decoding apparatus according to an exemplary embodiment of the present disclosure.

Detailed Implementation

[0032] Because this disclosure can be modified in various forms and can have various exemplary embodiments, specific exemplary embodiments will be described in detail and illustrated in the accompanying drawings. However, this is not intended to limit this disclosure to the specific exemplary embodiments. The terminology used in this specification is used only to describe specific exemplary embodiments and is not intended to limit the technical spirit of this disclosure. Unless the context clearly indicates otherwise, singular expressions include plural expressions. In this specification, terms such as “comprising” and “having” are intended to indicate the presence of features, numbers, steps, operations, elements, components, or combinations thereof used in this specification, and therefore it should be understood that the possibility of having or adding one or more different features, numbers, steps, operations, elements, components, or combinations thereof is not excluded.

[0033] Furthermore, for ease of explanation of the different specific functions in the video encoding / decoding apparatus, the various elements in the accompanying drawings described in this disclosure are drawn independently, but this does not imply that each element is implemented by independent hardware or independent software. For example, two or more corresponding elements may be combined to form a single element, and a single element may be divided into multiple elements. Embodiments of combining and / or dividing the various elements without departing from the concept of this disclosure also fall within the scope of this disclosure.

[0034] In this disclosure, the terms “ / ” and “,” should be interpreted as indicating “and / or”. For example, the expression “A / B” can mean “A and / or B”, while “A, B” can mean “A and / or B”. Furthermore, “A / B / C” can mean “at least one of A, B and / or C”. Additionally, “A, B, C” can mean “at least one of A, B and / or C”.

[0035] Furthermore, in this disclosure, the term "or" should be interpreted as indicating "and / or". For example, the expression "A or B" can include 1) only A, 2) only B, and / or 3) both A and B. In other words, the term "or" in this document can be interpreted as indicating "additionally or alternatively".

[0036] This disclosure can be modified in various forms, and specific embodiments thereof will be described and illustrated in the accompanying drawings. However, these embodiments are not intended to limit this disclosure. The terminology used in the following description is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. Singular expressions include plural expressions unless they are clearly read differently. Terms such as “comprising” and “having” are intended to indicate the presence of the features, numbers, steps, operations, elements, components or combinations thereof used in the following description, and therefore it should be understood that the possibility of having or adding one or more different features, numbers, steps, operations, elements, components or combinations thereof is not excluded.

[0037] Furthermore, for ease of explanation of different specific functions, the elements in the accompanying drawings described in this disclosure are drawn independently, but this does not imply that these elements are implemented by independent hardware or independent software. For example, two or more elements may be combined to form a single element, or a single element may be divided into multiple elements. Embodiments of combining and / or dividing elements are also part of this disclosure without departing from its concept.

[0038] The following description can be applied to the technical field of processing video, images, or pictures. For example, the methods or exemplary implementations disclosed in the following description can be associated with the Versatile Video Coding (VVC) standard (ITU-T Recommendation H.266), next-generation video / image coding standards after VVC, or standards before VVC (e.g., the High Efficiency Video Coding (HEVC) standard (ITU-T Recommendation H.265), etc.).

[0039] Hereinafter, examples of this embodiment will be described in detail with reference to the accompanying drawings. Furthermore, throughout the drawings, similar reference numerals are used to indicate similar elements, and identical descriptions of similar elements will be omitted.

[0040] In this disclosure, video can refer to a sequence of images over time. Generally, a picture refers to a unit representing one image at a specific time, and a slice is a unit that constitutes part of a picture. A picture can be composed of multiple slices, and the terms picture and slice can be used interchangeably as needed.

[0041] A pixel, or pel, can refer to the smallest unit that makes up a picture (or image). Additionally, the term "sample" can be used as a term corresponding to a pixel. A sample can typically represent a pixel or a pixel value; it can represent only the pixel / pixel value of the luminance component or only the pixel / pixel value of the chrominance component.

[0042] A unit refers to a basic unit of image processing. A unit may include at least one of a specific region and information associated with that region. In some cases, the term unit may be used interchangeably with terms such as block or region. Typically, an M×N block may represent a set of samples or transform coefficients arranged in M columns and N rows.

[0043] Figure 1 is a block diagram that schematically illustrates the structure of an encoding apparatus according to an embodiment of the present disclosure. In the following, the encoding / decoding apparatus may include a video encoding / decoding apparatus and / or an image encoding / decoding apparatus, and a video encoding / decoding apparatus may be used as a concept that includes an image encoding / decoding apparatus, or an image encoding / decoding apparatus may be used as a concept that includes a video encoding / decoding apparatus.

[0044] Referring to Figure 1, the video encoding apparatus 100 may include an image segmenter 105, a predictor 110, a residual processor 120, an entropy encoder 130, an adder 140, a filter 150, and a memory 160. The residual processor 120 may include a subtractor 121, a transformer 122, a quantizer 123, a rearranger 124, an inverse quantizer 125, and an inverse transformer 126.

[0045] Image segmenter 105 can separate an input image into at least one processing unit.

[0046] In one example, the processing unit may be referred to as a coding unit (CU). In this case, coding units can be recursively separated from the largest coding unit (LCU) according to a quadtree-binary tree (QTBT) structure. For example, a coding unit can be separated into multiple coding units of deeper depth based on a quadtree structure, a binary tree structure, and / or a ternary tree structure. In this case, for example, a quadtree structure can be applied first, and a binary tree structure and a ternary tree structure can be applied later. Alternatively, a binary tree structure / ternary tree structure can be applied first. The coding process according to this embodiment can be performed based on the final coding unit that is not separated any further. In this case, the largest coding unit can be used directly as the final coding unit based on coding efficiency according to image characteristics, or the coding unit can be recursively separated into coding units of deeper depth as needed, and the coding unit with the optimal size can be used as the final coding unit. Here, the coding process may include processes such as prediction, transformation, and reconstruction, which will be described later.

[0047] In another example, the processing unit may include a coding unit (CU), a prediction unit (PU), or a transform unit (TU). The coding unit can be separated from the largest coding unit (LCU) into coding units of deeper depth according to a quadtree structure. In this case, the largest coding unit can be used directly as the final coding unit based on coding efficiency according to image characteristics, or the coding unit can be recursively separated into coding units of deeper depth as needed, and the coding unit with the optimal size can be used as the final coding unit. When a smallest coding unit (SCU) is set, the coding unit cannot be separated into coding units smaller than the smallest coding unit. Here, the final coding unit refers to the coding unit that serves as the basis for segmentation or separation into prediction units or transform units. The prediction unit is a unit segmented from the coding unit and can be a unit for sample prediction. Here, the prediction unit can be divided into sub-blocks. The transform unit can be partitioned from the coding unit according to a quadtree structure, and the transform unit can be a unit that derives the transform coefficients and / or a unit that derives the residual signal from the transform coefficients. In the following text, the coding unit may be referred to as a coding block (CB), the prediction unit may be referred to as a prediction block (PB), and the transform unit may be referred to as a transform block (TB). A prediction block or prediction unit can refer to a specific region in the form of a block in an image and include an array of prediction samples. Similarly, a transform block or transform unit can refer to a specific region in the form of a block in an image and include an array of transform coefficients or residual samples.
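The recursive quadtree separation of a largest coding unit, bounded below by a smallest coding unit, can be sketched as follows. This is an illustrative sketch only: the function name `quadtree_split` and the `should_split` callback (standing in for the encoder's decision, for example a rate-distortion check) are hypothetical, and binary and ternary splits are left out.

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Return the final coding units as (x, y, size) tuples.

    should_split(x, y, size) is a caller-supplied decision function;
    min_size plays the role of the smallest coding unit (SCU).
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in raster order: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    quadtree_split(x + dx, y + dy, half, min_size, should_split)
                )
        return leaves
    # Not split any further: this block is a final coding unit.
    return [(x, y, size)]
```

If `should_split` always returns False, the largest coding unit itself is returned as the single final coding unit, which matches the case where the largest coding unit is used directly.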

[0048] Predictor 110 can perform prediction on a block to be processed (hereinafter, this can mean the current block or a residual block) and can generate a prediction block that includes prediction samples for the current block. The unit of prediction performed in predictor 110 can be a coding block, a transform block, or a prediction block.

[0049] Predictor 110 can determine whether to apply intra-frame prediction or inter-frame prediction to the current block. For example, predictor 110 can determine whether to apply intra-frame prediction or inter-frame prediction on a CU-by-CU basis.

[0050] In the case of intra-frame prediction, predictor 110 can derive a prediction sample for the current block based on reference samples outside the current block in the image to which the current block belongs (hereinafter, the current image). In this case, predictor 110 can derive the prediction sample based on the average or interpolation of the neighboring reference samples of the current block (case (i)), or it can derive the prediction sample based on a reference sample among the neighboring reference samples of the current block that exists in a specific (prediction) direction relative to the prediction sample (case (ii)). Case (i) can be referred to as a non-directional mode or a non-angular mode, and case (ii) can be referred to as a directional mode or an angular mode. In intra-frame prediction, the prediction modes can include 33 directional modes and at least two non-directional modes, as an example. Non-directional modes can include DC mode and planar mode. Predictor 110 can determine the prediction mode to be applied to the current block by using the prediction modes applied to neighboring blocks.
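Case (i) above, prediction from the average of the neighboring reference samples, corresponds to what is usually called DC mode. A minimal sketch follows; the choice of the top row and left column as reference samples, the integer rounding, and the function name `dc_predict` are assumptions for illustration.

```python
def dc_predict(top, left, width, height):
    """Fill a width x height block with the rounded mean of the reference samples.

    top:  reconstructed samples directly above the block (at least `width` values)
    left: reconstructed samples directly left of the block (at least `height` values)
    """
    refs = list(top[:width]) + list(left[:height])
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # integer mean with rounding
    return [[dc] * width for _ in range(height)]
```

For example, with four top samples of 100 and four left samples of 104, every predicted sample is 102.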

[0051] In the case of inter-frame prediction, predictor 110 can derive predicted samples for the current block based on samples specified by motion vectors on a reference image. Predictor 110 can derive predicted samples for the current block by applying any of the skip mode, merge mode, and motion vector prediction (MVP) mode. In the skip mode and merge mode, predictor 110 can use motion information of neighboring blocks as motion information for the current block. In the skip mode, unlike the merge mode, the difference (residual) between the predicted sample and the original sample is not sent. In the MVP mode, the motion vectors of neighboring blocks are used as a motion vector predictor to derive the motion vector of the current block.
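The difference between the modes above can be sketched in a few lines. In skip and merge modes the selected neighbor's motion vector is reused as is; in MVP mode it serves only as a predictor. The addition of a signaled motion vector difference (`mvd`) in MVP mode is common codec practice rather than something stated in this paragraph, and the function and parameter names are hypothetical.

```python
def derive_motion_vector(mode, candidates, index, mvd=(0, 0)):
    """Derive the current block's motion vector from neighbor candidates.

    candidates: list of (x, y) motion vectors taken from neighboring blocks
    index:      which candidate was selected
    mvd:        motion vector difference (assumed to be signaled in MVP mode)
    """
    cand = candidates[index]
    if mode in ("skip", "merge"):
        # Neighbor motion information is used directly.
        return cand
    if mode == "mvp":
        # Neighbor motion vector acts as a predictor and is refined by the difference.
        return (cand[0] + mvd[0], cand[1] + mvd[1])
    raise ValueError(f"unknown inter prediction mode: {mode}")
```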

[0052] In the case of inter-frame prediction, neighboring blocks can include spatially neighboring blocks existing in the current image and temporally neighboring blocks existing in a reference image. The reference image that includes temporally neighboring blocks can also be referred to as a collocated picture (colPic). Motion information can include motion vectors and reference image indices. Information such as prediction mode information and motion information can be (entropy) encoded and then output in the form of a bitstream.

[0053] When motion information from temporally neighboring blocks is used in skip and merge modes, the topmost image in the reference image list can be used as the reference image. Reference images included in the reference image list can be ordered based on the difference in picture order count (POC) between the current image and the corresponding reference image. The POC corresponds to the display order and can be distinguished from the coding order.
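Ordering a reference list by POC difference can be sketched as below. The text only says the ordering is based on the POC difference; sorting by the absolute difference, nearest first, and keeping the original order for ties are assumptions of this sketch.

```python
def order_reference_list(current_poc, reference_pocs):
    """Order reference images by |POC difference| to the current image, nearest first.

    Python's sort is stable, so references at equal distance keep their input order.
    """
    return sorted(reference_pocs, key=lambda poc: abs(current_poc - poc))
```

With a current POC of 8 and references at POC 0, 4, 16 and 7, the ordered list is 7, 4, 0, 16, and the first entry is the one that would be used for temporal candidates in skip and merge modes.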

[0054] Subtractor 121 generates a residual sample, which is the difference between the original sample and the predicted sample. If the skip mode is applied, residual samples may not be generated as described above.

[0055] Transformer 122 transforms residual samples on a block-by-block basis to generate transform coefficients. Transformer 122 can perform the transform based on the size of the corresponding transform block and the prediction mode applied to the prediction block or coding block that spatially overlaps with the transform block. For example, if intra-frame prediction is applied to the prediction block or coding block that overlaps with the transform block and the transform block is a 4×4 residual array, a Discrete Sine Transform (DST) kernel can be used to transform the residual samples, and in other cases, a Discrete Cosine Transform (DCT) kernel is used to transform the residual samples.
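The kernel selection rule in the example above reduces to a single condition. The function name and the string return values are hypothetical; the rule itself is the one stated in the paragraph.

```python
def select_transform_kernel(is_intra, tb_width, tb_height):
    """Pick the transform kernel for a residual block.

    DST is chosen only for a 4x4 residual array whose overlapping
    prediction or coding block uses intra-frame prediction; DCT otherwise.
    """
    if is_intra and tb_width == 4 and tb_height == 4:
        return "DST"
    return "DCT"
```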

[0056] Quantizer 123 can quantize the transform coefficients to generate quantized transform coefficients.

[0057] Rearranger 124 rearranges the quantized transform coefficients. Rearranger 124 can rearrange the quantized transform coefficients in block form into a one-dimensional vector using a coefficient scanning method. Although rearranger 124 is described as a separate component, it can be part of quantizer 123.
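Rearranging a two-dimensional block of quantized coefficients into a one-dimensional vector, and the matching inverse used on the decoder side, can be sketched as follows. The text does not fix a scan order, so the anti-diagonal order used here is an assumption, as are the function names.

```python
def _diagonal_positions(width, height):
    """Yield (y, x) positions along anti-diagonals, each walked bottom-left to top-right."""
    for s in range(width + height - 1):
        for y in range(s, -1, -1):
            x = s - y
            if y < height and x < width:
                yield y, x

def diagonal_scan(block):
    """Flatten a 2-D coefficient block into a 1-D list."""
    height, width = len(block), len(block[0])
    return [block[y][x] for y, x in _diagonal_positions(width, height)]

def inverse_diagonal_scan(vector, width, height):
    """Rebuild the 2-D block from the 1-D list produced by diagonal_scan."""
    block = [[0] * width for _ in range(height)]
    for value, (y, x) in zip(vector, _diagonal_positions(width, height)):
        block[y][x] = value
    return block
```

Because both directions share the same position generator, the inverse is guaranteed to mirror the forward scan.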

[0058] The entropy encoder 130 can perform entropy coding on quantized transform coefficients. Entropy coding can include coding methods such as exponential Golomb coding, context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), etc. In addition to the quantized transform coefficients, the entropy encoder 130 can also encode information required for video reconstruction (e.g., syntax element values, etc.) either together or separately according to entropy coding or according to a pre-configured method. The entropy-coded information can be transmitted or stored in units of network abstraction layer (NAL) units in the form of a bitstream. The bitstream can be transmitted via a network or stored in a digital storage medium. Here, the network can include a broadcast network or a communication network, and the digital storage medium can include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc.

[0059] Dequantizer 125 dequantizes the values (transform coefficients) quantized by quantizer 123, and inverse transformer 126 performs an inverse transformation on the values dequantized by dequantizer 125 to generate residual samples.
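The relationship between quantizer 123 and dequantizer 125 can be illustrated with plain uniform scalar quantization. This is a simplified sketch: a single step size for all coefficients, rounding half away from zero, and the function names are assumptions, and real codecs derive the step from a quantization parameter.

```python
def quantize(coeffs, step):
    """Map transform coefficients to integer levels (round half away from zero)."""
    levels = []
    for c in coeffs:
        sign = 1 if c >= 0 else -1
        levels.append(sign * int(abs(c) / step + 0.5))
    return levels

def dequantize(levels, step):
    """Scale integer levels back to approximate transform coefficients."""
    return [level * step for level in levels]
```

Quantizing `[100, -37, 4, 0]` with a step of 10 gives the levels `[10, -4, 0, 0]`, and dequantizing gives `[100, -40, 0, 0]`. The difference from the input is the quantization distortion that the in-loop filters described later try to reduce.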

[0060] Adder 140 adds residual samples to the prediction samples to reconstruct the image. Residual samples can be added to the prediction samples in blocks to generate reconstructed blocks. Although adder 140 is described as a separate component, adder 140 can be part of predictor 110. Additionally, adder 140 can be referred to as a reconstructor or a reconstructed block generator.
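The roles of subtractor 121 and adder 140 can be shown together as a round trip on one block. Clipping the reconstructed samples to the valid range for an assumed 8-bit depth is an addition of this sketch, as are the function names.

```python
def residual_block(original, predicted):
    """Subtractor: residual sample = original sample - predicted sample."""
    return [
        [o - p for o, p in zip(orow, prow)]
        for orow, prow in zip(original, predicted)
    ]

def reconstruct_block(predicted, residual, bit_depth=8):
    """Adder: reconstructed sample = predicted + residual, clipped to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
        for prow, rrow in zip(predicted, residual)
    ]
```

Without quantization in between, adding the residual back to the prediction reproduces the original block exactly; with quantization, the reconstructed block only approximates it.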

[0061] Filter 150 can apply deblocking filtering and / or sample adaptive offset (SAO) to the reconstructed image. Deblocking filtering and / or sample adaptive offset can correct artifacts at block boundaries or distortions introduced by quantization in the reconstructed image. After deblocking filtering is complete, sample adaptive offset can be applied on a sample-by-sample basis. Filter 150 can also apply an adaptive loop filter (ALF) to the reconstructed image. An ALF can be applied to a reconstructed image that has already undergone deblocking filtering and / or sample adaptive offset.

[0062] Memory 160 can store reconstructed images (decoded images) or information required for encoding / decoding. Here, the reconstructed image can be a reconstructed image filtered by filter 150. The stored reconstructed image can be used as a reference image for (inter-frame) prediction of other images. For example, memory 160 can store (reference) images for inter-frame prediction. Here, the images used for inter-frame prediction can be specified according to a set of reference images or a list of reference images.

[0063] Figure 2 is a block diagram that briefly illustrates a video / image decoding apparatus according to an embodiment of the present disclosure.

[0064] In the following text, a video decoding device may include an image decoding device.

[0065] Referring to Figure 2, the video decoding device 200 may include an entropy decoder 210, a residual processor 220, a predictor 230, an adder 240, a filter 250, and a memory 260. The residual processor 220 may include a rearranger 221, an inverse quantizer 222, and an inverse transformer 223.

[0066] Additionally, although not depicted, the video decoding apparatus 200 may include a receiver for receiving a bitstream including video information. The receiver may be configured as a separate module or may be included within the entropy decoder 210.

[0067] When a bitstream containing video / image information is input, the video decoding device 200 can reconstruct the video / image / picture in correspondence with the process by which the video information was processed in the video encoding device.

[0068] For example, the video decoding apparatus 200 can perform video decoding using the processing units applied in the video encoding apparatus. Therefore, the processing unit block for video decoding can be, for example, a coding unit, and in another example, a coding unit, a prediction unit, or a transform unit. Coding units can be separated from the largest coding unit according to a quadtree structure and / or a binary tree structure and / or a ternary tree structure.

[0069] In some cases, prediction units and transform units can be further used, and in this case, the prediction unit is a block derived or segmented from the coding unit, and can be a unit for sample prediction. Here, the prediction unit can be divided into sub-blocks. The transform unit can be separated from the coding unit according to a quadtree structure, and can be a unit that derives the transform coefficients or a unit that derives the residual signal from the transform coefficients.

[0070] The entropy decoder 210 can parse a bitstream to output the information needed for video reconstruction or image reconstruction. For example, the entropy decoder 210 can decode information in the bitstream based on coding methods such as exponential Golomb coding, CAVLC, CABAC, etc., and can output the values of the syntax elements needed for video reconstruction and the quantized values of the transform coefficients with respect to the residuals.
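Of the coding methods named above, exponential Golomb coding is the simplest to show. The sketch below implements the order-0 unsigned variant on bit strings; working on strings of "0" and "1" characters rather than a packed bitstream, and the function names, are simplifications made for illustration.

```python
def exp_golomb_encode(value):
    """Order-0 unsigned Exp-Golomb code of a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]              # binary representation of value + 1
    return "0" * (len(code) - 1) + code    # prefix of leading zeros, then the code

def exp_golomb_decode(bits):
    """Decode one order-0 unsigned Exp-Golomb codeword from the start of a bit string."""
    zeros = 0
    while bits[zeros] == "0":              # count the leading zeros
        zeros += 1
    # The codeword body is the next zeros + 1 bits, starting at the first "1".
    return int(bits[zeros:2 * zeros + 1], 2) - 1
```

Small values get short codewords: 0 maps to `1`, 1 to `010`, 2 to `011`, and 3 to `00100`.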

[0071] More specifically, the CABAC entropy decoding method can receive a bin corresponding to each syntax element in the bitstream, determine a context model using information about the syntax element to be decoded, decoding information of neighboring blocks and the block to be decoded, or information about symbols / bins decoded in previous steps, predict the probability of occurrence of the bin according to the determined context model, and perform arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element. Here, after determining the context model, the CABAC entropy decoding method can update the context model using the information about the decoded symbol / bin for the context model of the next symbol / bin.
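The idea of a context model that predicts a bin probability and is updated after each decoded bin can be illustrated with a simple adaptive estimator. This is not the CABAC state machine of any standard: the class name, the floating-point probability, and the adaptation rate of 1/16 are assumptions, and arithmetic decoding itself is omitted.

```python
class ContextModel:
    """Toy context model: tracks the estimated probability that the next bin is 1."""

    def __init__(self, p_one=0.5, rate=1 / 16):
        self.p_one = p_one   # current probability estimate for bin value 1
        self.rate = rate     # adaptation speed (assumed value)

    def update(self, bin_value):
        """Move the estimate toward the bin value that was just decoded."""
        target = 1.0 if bin_value else 0.0
        self.p_one += self.rate * (target - self.p_one)
```

Each decoded bin nudges the estimate toward the observed value, so a context that keeps seeing ones predicts ones with increasing confidence, which is what lets the arithmetic coder spend fewer bits on them.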

[0072] Information about the prediction from the information decoded in the entropy decoder 210 can be provided to the predictor 230, and the residual value (i.e., the quantized transform coefficient) that has been entropy decoded by the entropy decoder 210 can be input to the rearranger 221.

[0073] Rearranger 221 can rearrange the quantized transform coefficients into a two-dimensional block form. Rearranger 221 can perform a rearrangement corresponding to the coefficient scan performed by the encoding device. Although rearranger 221 is described as a separate component, rearranger 221 can be part of dequantizer 222.

[0074] The dequantizer 222 can dequantize the quantized transform coefficients based on the (de)quantization parameters to output the transform coefficients. In this case, the encoding device can signal information for deriving the quantization parameters.

[0075] The inverse transformer 223 can perform an inverse transformation on the transformation coefficients to derive the residual samples.

[0076] Predictor 230 can perform prediction on the current block and generate a prediction block that includes prediction samples for the current block. The unit of prediction performed in predictor 230 can be a coding block, a transform block, or a prediction block.

[0077] Predictor 230 can determine whether to apply intra-frame prediction or inter-frame prediction based on information about the prediction. In this case, the unit used to determine which of intra-frame and inter-frame prediction to apply can be different from the unit used to generate prediction samples. Furthermore, the unit used to generate prediction samples can also differ between inter-frame and intra-frame prediction. For example, whether to apply inter-frame prediction or intra-frame prediction can be determined on a coding unit (CU) basis. Additionally, for example, in inter-frame prediction, the prediction mode can be determined and the prediction samples generated on a prediction unit (PU) basis, and in intra-frame prediction, the prediction mode can be determined on a PU basis and the prediction samples generated on a transform unit (TU) basis.

[0078] In the case of intra-frame prediction, predictor 230 can derive prediction samples for the current block based on neighboring reference samples in the current image. Predictor 230 can derive prediction samples for the current block by applying either a directional or non-directional mode based on the neighboring reference samples of the current block. In this case, the prediction mode to be applied to the current block can be determined by using the intra-frame prediction modes of neighboring blocks.

[0079] In the case of inter-frame prediction, predictor 230 can derive prediction samples for the current block based on samples specified in the reference image according to motion vectors. Predictor 230 can derive prediction samples for the current block using one of skip mode, merge mode, and MVP mode. Here, the motion information (e.g., motion vectors and information about the reference image index) required for inter-frame prediction of the current block provided by the video coding apparatus can be obtained or derived based on information about the prediction.

[0080] In skip and merge modes, motion information from neighboring blocks can be used as motion information for the current block. Here, neighboring blocks can include spatially neighboring blocks and temporally neighboring blocks.

[0081] Predictor 230 can construct a merge candidate list using motion information of available neighboring blocks, and use the information indicated by the merge index on the merge candidate list as the motion vector of the current block. The merge index can be signaled by the encoding device. The motion information can include motion vectors and reference pictures. In skip mode and merge mode, when using motion information of temporally neighboring blocks, the first sorted picture in the reference picture list can be used as the reference picture.

[0082] In the skip mode, unlike the merge mode, the difference (residual) between the predicted sample and the original sample is not sent.

[0083] In MVP mode, the motion vectors of neighboring blocks can be used as motion vector predictors to derive the motion vector of the current block. Here, neighboring blocks can include spatially neighboring blocks and temporally neighboring blocks.

[0084] When the merge mode is applied, for example, a merge candidate list can be generated using the motion vectors of reconstructed spatially neighboring blocks and / or the motion vector corresponding to a Col block, which is a temporally neighboring block. The motion vector of the candidate block selected from the merge candidate list is used as the motion vector of the current block in the merge mode. The aforementioned information about the prediction may include a merge index that indicates the candidate block with the best motion vector selected from the candidate blocks included in the merge candidate list. Here, predictor 230 can use the merge index to derive the motion vector of the current block.
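
The merge mode described above can be sketched as follows. This is a minimal illustration only: the `MotionInfo` type, the duplicate check, and the `max_candidates` limit are assumptions introduced for the example and are not specified in this paragraph.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class MotionInfo(NamedTuple):
    mv: Tuple[int, int]  # (horizontal, vertical) motion vector
    ref_idx: int         # reference picture index


def build_merge_list(spatial: Sequence[Optional[MotionInfo]],
                     temporal: Optional[MotionInfo],
                     max_candidates: int = 5) -> List[MotionInfo]:
    """Collect available spatial candidates, then the temporal (Col) candidate."""
    candidates: List[MotionInfo] = []
    for cand in list(spatial) + [temporal]:
        if len(candidates) >= max_candidates:
            break
        if cand is None:          # neighboring block is unavailable
            continue
        if cand in candidates:    # assumed duplicate check
            continue
        candidates.append(cand)
    return candidates


def merge_motion(candidates: Sequence[MotionInfo], merge_index: int) -> MotionInfo:
    """Decoder side: the signaled merge index selects the motion information."""
    return candidates[merge_index]
```

Because the encoder and the decoder build the same list, only the index needs to be transmitted.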

[0085] As another example, when the motion vector prediction (MVP) mode is applied, a motion vector predictor candidate list can be generated using the motion vectors of the reconstructed spatially neighboring blocks and / or the motion vector corresponding to the Col block, which is a temporally neighboring block. That is, the motion vectors of the reconstructed spatially neighboring blocks and / or the motion vector corresponding to the Col block can be used as motion vector candidates. The aforementioned prediction information may include a motion vector predictor index indicating the best motion vector selected from the motion vector candidates included in the list. Here, predictor 230 can use the motion vector predictor index to select the motion vector predictor of the current block from the motion vector candidates included in the motion vector candidate list. The predictor of the encoding device can obtain the motion vector difference (MVD) between the motion vector of the current block and the motion vector predictor, encode the MVD, and output the encoded MVD as a bitstream. That is, the MVD can be obtained by subtracting the motion vector predictor from the motion vector of the current block. Here, predictor 230 can obtain the motion vector difference included in the prediction information and derive the motion vector of the current block by adding the motion vector difference to the motion vector predictor. Additionally, the predictor can obtain or derive a reference image index indicating a reference image from the aforementioned prediction information.
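
The relationship between the motion vector, its predictor, and the MVD can be sketched as follows. The function names and the encoder's selection rule (smallest absolute MVD) are assumptions for illustration; this paragraph only states that the best candidate is selected.

```python
def encode_mvd(mv, mvp_candidates):
    """Encoder side: pick a predictor and return (predictor index, MVD).

    The selection rule here (smallest sum of absolute MVD components)
    is an assumed stand-in for the encoder's actual cost measure.
    """
    def cost(i):
        return abs(mv[0] - mvp_candidates[i][0]) + abs(mv[1] - mvp_candidates[i][1])

    idx = min(range(len(mvp_candidates)), key=cost)
    mvp = mvp_candidates[idx]
    # MVD = MV - MVP
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvp_idx, mvd, mvp_candidates):
    """Decoder side: MV = MVP + MVD."""
    mvp = mvp_candidates[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

For example, with candidates `[(10, 4), (-3, 7)]` and an actual motion vector `(12, 3)`, the encoder would signal index 0 and the MVD `(2, -1)`, from which the decoder recovers `(12, 3)`.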

[0086] Adder 240 can add residual samples to the prediction samples to reconstruct the current block or the current image. Adder 240 can reconstruct the current image by adding residual samples to the prediction samples on a block-by-block basis. When a skip mode is applied, no residuals are sent, and therefore the prediction samples can become the reconstructed samples. Although adder 240 is described as a separate component, adder 240 can be part of predictor 230. Additionally, adder 240 can be referred to as a reconstructor or a reconstructed block generator.
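
The reconstruction performed by the adder can be sketched as follows. Clipping to the valid sample range and the `bit_depth` parameter are assumptions added for the example; this paragraph does not mention them.

```python
def reconstruct_block(pred, resid=None, bit_depth=8):
    """Add residual samples to prediction samples, sample by sample.

    When `resid` is None (for example, in skip mode), the prediction
    samples are used directly as the reconstructed samples.
    """
    if resid is None:
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1  # assumed clipping range [0, 2^bit_depth - 1]
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]
```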

[0087] Filter 250 can apply deblocking filtering, sample adaptive offset (SAO), and / or an adaptive loop filter (ALF) to the reconstructed image. Here, the sample adaptive offset can be applied on a sample-by-sample basis after deblocking filtering. The ALF can be applied after deblocking filtering and / or after applying the sample adaptive offset.

[0088] Memory 260 can store reconstructed images (decoded images) or information required for decoding. Here, the reconstructed image can be a reconstructed image filtered by filter 250. For example, memory 260 can store images used for inter-frame prediction. Here, the images used for inter-frame prediction can be specified according to a set of reference images or a list of reference images. The reconstructed image can be used as a reference image for other images. Memory 260 can output the reconstructed images in the output order.

[0089] Furthermore, as mentioned above, prediction is performed during video encoding to improve compression efficiency. Therefore, a prediction block can be generated that includes prediction samples for the current block, which is the target block for encoding. Here, the prediction block includes prediction samples in the spatial domain (or pixel domain). The prediction block is derived in the same manner in both the encoding and decoding devices, and the encoding device can signal information about the residual between the original block and the prediction block (residual information), rather than the original sample values of the original block, to the decoding device, thereby improving image encoding efficiency. The decoding device can derive a residual block including residual samples based on the residual information, add the residual block to the prediction block to generate a reconstructed block including reconstructed samples, and generate a reconstructed image including the reconstructed block.

[0090] Residual information can be generated through transformation and quantization processes. For example, the encoding device can derive a residual block between the original block and the prediction block, perform a transformation process on the residual samples (residual sample array) included in the residual block to derive transform coefficients, perform a quantization process on the transform coefficients to derive quantized transform coefficients, and signal the relevant residual information to the decoding device (via bitstream). Here, the residual information can include the value information, position information, transform technique, transform kernel, and quantization parameters of the quantized transform coefficients. The decoding device can perform inverse quantization / inverse transform processes based on the residual information and derive residual samples (or residual blocks). The decoding device can generate a reconstructed image based on the prediction block and the residual block. In addition, for inter-frame prediction of the reference image later, the encoding device can also perform inverse quantization / inverse transform on the quantized transform coefficients to derive residual blocks and generate a reconstructed image based on the residual blocks.
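
The residual path described above can be sketched on a row of four samples. A 4-point Hadamard transform and a uniform quantizer with step size `step` are used here purely as simple stand-ins; the actual transform kernel and quantization scheme are not specified in this paragraph, and all names are hypothetical.

```python
# 4-point Hadamard matrix (symmetric, H * H = 4 * I), an assumed stand-in transform.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def hadamard4(v):
    return [sum(h * x for h, x in zip(row, v)) for row in H4]


def encode_residual(orig, pred, step):
    """Encoder: residual -> transform -> quantization."""
    resid = [o - p for o, p in zip(orig, pred)]
    coeffs = hadamard4(resid)
    return [round(c / step) for c in coeffs]


def decode_block(levels, pred, step):
    """Decoder: inverse quantization -> inverse transform -> add prediction."""
    coeffs = [level * step for level in levels]
    resid = [c / 4 for c in hadamard4(coeffs)]  # the inverse of H4 is H4 / 4
    return [p + round(r) for p, r in zip(pred, resid)]
```

With `step = 1` the round trip is lossless in this sketch; larger steps reduce the magnitude of the signaled levels at the cost of reconstruction error.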

[0091] Figure 3 exemplarily illustrates a content streaming system according to an embodiment of the present disclosure.

[0092] Referring to Figure 3, the embodiments described in this disclosure can be embodied and executed on a processor, microprocessor, controller, or chip. For example, the functional units shown in each figure can be embodied and executed on a computer, processor, microprocessor, controller, or chip. In this case, information for the implementation (e.g., information about instructions) or algorithms can be stored in a digital storage medium.

[0093] Furthermore, the decoding and encoding devices used in this disclosure can include multimedia broadcast transceivers, mobile communication terminals, home theater video equipment, digital cinema video equipment, surveillance cameras, video chat devices, real-time communication devices such as video communication, mobile streaming devices, storage media, portable cameras, video-on-demand (VoD) service providers, over-the-top (OTT) video devices, internet streaming service providers, three-dimensional (3D) video devices, video telephony devices, and medical video devices, and can be used to process video signals or data signals. For example, over-the-top (OTT) video devices can include game consoles, Blu-ray players, internet access televisions, home theater systems, smartphones, tablets, digital video recorders (DVRs), and so on.

[0094] Furthermore, the processing methods applied in this disclosure can be generated in the form of a program executed by a computer and stored in a computer-readable recording medium. Multimedia data having the data structure according to this disclosure can also be stored in a computer-readable recording medium. Computer-readable recording media include various storage devices and distributed storage devices in which computer-readable data is stored. Such computer-readable recording media can include, for example, Blu-ray discs (BD), Universal Serial Bus (USB), ROM, PROM, EPROM, EEPROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. In addition, computer-readable recording media include media embodied in carrier wave form (e.g., transmission via the Internet). Furthermore, bitstreams generated by encoding methods can be stored in a computer-readable recording medium or transmitted via wired or wireless communication networks.

[0095] Furthermore, the exemplary embodiments of this disclosure can be embodied in program code as a computer program product, and the program code can be executed by a computer according to the exemplary embodiments of this disclosure. The program code can be stored on a computer-readable medium.

[0096] The content streaming system used in this disclosure may mainly include encoding servers, streaming servers, web servers, media storage devices, user devices, and multimedia input devices.

[0097] An encoding server is used to compress content input from multimedia input devices (such as smartphones, cameras, camcorders, etc.) into digital data to generate a bitstream, which is then sent to a streaming server. As another example, in cases where multimedia input devices such as smartphones, cameras, camcorders, etc., directly generate bitstreams, the encoding server can be omitted.

[0098] Bitstreams can be generated using the encoding methods or bitstream generation methods applied in this disclosure. Furthermore, the streaming server can temporarily store the bitstream during the sending or receiving process.

[0099] A streaming server sends multimedia data to a user's device via a web server based on a user's request. The web server acts as a tool to notify the user of available services. When a user requests a desired service, the web server transmits the request to the streaming server, which then sends the multimedia data to the user. In this respect, a content streaming system may include a separate control server, which in this case acts as a command / response controller between the respective devices within the content streaming system.

[0100] A streaming server can receive content from media storage devices and / or encoding servers. For example, when receiving content from an encoding server, the content can be received in real time. In this case, the streaming server can store the bitstream over a predetermined period of time to smoothly provide streaming services.

[0101] For example, user equipment may include mobile phones, smartphones, laptops, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, tablet PCs, tablet computers, ultrabooks, wearable devices (e.g., smartwatches, smart glasses, head-mounted displays (HMDs)), digital televisions, desktop computers, digital signage, etc.

[0102] Each server in a content streaming system can operate as a distributed server, and in this case, the data received by each server can be processed in a distributed manner.

[0103] The inter-frame prediction method described above with reference to Figure 1 and Figure 2 will be described in detail below.

[0104] Various inter-frame prediction modes can be used to predict the current block within an image. For example, modes such as merge mode, skip mode, motion vector prediction (MVP) mode, affine mode, and history motion vector prediction (HMVP) mode can be used. Decoder-side motion vector refinement (DMVR) mode, adaptive motion vector resolution (AMVR) mode, etc., can be further used as additional modes. Affine mode can also be referred to as affine motion prediction mode. MVP mode can also be referred to as advanced motion vector prediction (AMVP). In this document, some modes and / or motion information candidates derived from some modes can also be included as one of the motion information-related candidates in other modes.

[0105] Prediction mode information, indicating the inter-frame prediction mode of the current block, can be signaled from the encoding device to the decoding device. The prediction mode information can be included in the bitstream and received by the decoding device. The prediction mode information may include index information indicating one of several candidate modes. Alternatively, the prediction mode information may also indicate the inter-frame prediction mode via hierarchical signaling of flag information. In this case, the prediction mode information may include one or more flags. For example, the prediction mode information may indicate whether to apply a skip mode by signaling a skip flag, whether to apply a merge mode by signaling a merge flag when a skip mode is not applied, whether to apply an MVP mode when a merge mode is not applied, or further signal flags for additional identification. Affine modes may be signaled as independent modes or as modes dependent on merge modes, MVP modes, etc. For example, affine modes may include affine merge modes and affine MVP modes.
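
One possible form of the hierarchical flag signaling described above can be sketched as follows. The flag names (`skip_flag`, `merge_flag`, `affine_flag`) and the exact parsing order are assumptions for illustration; the paragraph only states that such flags can be signaled hierarchically and that the affine mode can depend on the merge mode or the MVP mode.

```python
def make_flag_reader(bits):
    """Return a reader that yields successive flags from a list of 0/1 values."""
    it = iter(bits)

    def read_flag(name):
        # `name` is only documentation here; a real parser would use it
        # to choose how the flag is entropy decoded.
        return next(it) == 1

    return read_flag


def parse_inter_mode(read_flag):
    """Derive the inter-frame prediction mode from hierarchically signaled flags."""
    if read_flag("skip_flag"):
        return "skip"
    if read_flag("merge_flag"):
        # Affine mode signaled as dependent on the merge mode.
        return "affine_merge" if read_flag("affine_flag") else "merge"
    # Merge mode not applied: MVP mode, optionally in its affine variant.
    return "affine_mvp" if read_flag("affine_flag") else "mvp"
```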

[0106] Motion information of the current block can be used to perform inter-frame prediction. The encoding device can derive the optimal motion information for the current block through a motion estimation process. For example, the encoding device can use the original block within the original image of the current block to search, in fractional pixel units, for a highly correlated similar reference block within a predetermined search range in the reference image, thereby deriving motion information. Block similarity can be derived based on the differences between phase-based sample values. For example, block similarity can be calculated based on the sum of absolute differences (SAD) between the current block (or its template) and a reference block (or its template). In this case, motion information can be derived based on the reference block with the minimum SAD within the search area. The derived motion information can be signaled to the decoding device using various methods, depending on the inter-frame prediction mode.
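
The SAD-based search described above can be sketched as follows. For brevity this sketch searches integer sample positions only, whereas the paragraph refers to fractional pixel units; the function names and the full-search strategy are assumptions for illustration.

```python
def sad(cur, ref, rx, ry):
    """Sum of absolute differences between `cur` and the block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[y][x] - ref[ry + y][rx + x])
        for y in range(len(cur))
        for x in range(len(cur[0]))
    )


def full_search(cur, ref, x0, y0, search_range):
    """Return ((dx, dy), cost) of the minimum-SAD block around position (x0, y0)."""
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x0 + dx, y0 + dy
            # Skip candidate blocks that fall outside the reference image.
            if rx < 0 or ry < 0 or rx + w > len(ref[0]) or ry + h > len(ref):
                continue
            cost = sad(cur, ref, rx, ry)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    if best is None:
        raise ValueError("no candidate block lies inside the reference image")
    return best
```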

[0107] Figure 4 exemplarily illustrates an affine motion model according to an exemplary embodiment of the present disclosure.

[0108] Typical video coding systems use a single motion vector to represent the motion of a coded block. However, while this method can represent optimal motion at the block level, it may not actually represent optimal motion for every single pixel. Therefore, to further improve coding efficiency, affine mode or affine motion prediction mode can be used. This mode uses an affine motion model capable of determining the optimal motion vector at the pixel level to perform coding. Here, affine mode can also determine the optimal motion vector within the sub-block units of the current block, further improving coding efficiency. Affine motion prediction mode can use two, three, or four motion vectors to represent the motion vector in each pixel unit of a block.

[0109] Referring to Figure 4, the affine motion model can express four motions, but these are exemplary motions and the scope of this disclosure is not limited thereto. The four motions mentioned above may include translation, scaling, rotation, and shearing.

[0110] Figure 5a and Figure 5b exemplarily illustrate a 4-parameter affine model and a 6-parameter affine model according to exemplary embodiments of the present disclosure.

[0111] Referring to Figure 5a and Figure 5b, affine motion prediction can define control points (CPs) in order to use an affine motion model, and can use two or more control point motion vectors (CPMVs) to determine the motion vectors of the pixel positions or sub-blocks included in a block. Here, the set of motion vectors of the pixel positions or sub-blocks included in a block can be called an affine motion vector field (affine MVF).

[0112] Referring to Figure 5a, a 4-parameter affine model can refer to a model that uses two CPMVs to determine the motion vectors of pixel positions or sub-blocks, and the motion vectors or the affine motion vector field of the pixel positions or sub-blocks can be derived as expressed in Equation 1.

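[0113] [Equation 1]

[0114] $$\begin{cases} mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x - \dfrac{mv_{1y} - mv_{0y}}{W}\,y + mv_{0x} \\[2ex] mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{1x} - mv_{0x}}{W}\,y + mv_{0y} \end{cases}$$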

[0115] Referring to Figure 5b, a 6-parameter affine model can refer to a model that uses three CPMVs to determine the motion vectors of pixel positions or sub-blocks, and the motion vectors or the affine motion vector field of the pixel positions or sub-blocks can be derived as expressed in Equation 2.

[0116] [Equation 2]

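[0117] $$\begin{cases} mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x + \dfrac{mv_{2x} - mv_{0x}}{H}\,y + mv_{0x} \\[2ex] mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{2y} - mv_{0y}}{H}\,y + mv_{0y} \end{cases}$$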

[0118] In Equations 1 and 2, mv_0x and mv_0y can refer to the CPMV of the top-left corner of the current block, mv_1x and mv_1y can refer to the CPMV of the top-right corner of the current block, and mv_2x and mv_2y can refer to the CPMV of the bottom-left corner of the current block. Additionally, W can refer to the width of the current block, and H can refer to the height of the current block. mv_x and mv_y can refer to the motion vector of a pixel at position (x, y) or of a sub-block that includes position (x, y).
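
The two models can be sketched in code as follows, assuming the commonly used form of the 4-parameter and 6-parameter affine models with control points at the top-left, top-right, and bottom-left corners as defined above. The function name `affine_mv` and the floating-point arithmetic are choices made for this example; practical codecs use fixed-point arithmetic.

```python
def affine_mv(x, y, cpmvs, width, height=None):
    """Motion vector at position (x, y) relative to the block's top-left corner.

    `cpmvs` holds two CPMVs (4-parameter model) or three CPMVs
    (6-parameter model), each given as (mv_x, mv_y).
    """
    (mv0x, mv0y), (mv1x, mv1y) = cpmvs[0], cpmvs[1]
    a = (mv1x - mv0x) / width
    b = (mv1y - mv0y) / width
    if len(cpmvs) == 2:
        # 4-parameter model: the vertical gradient is tied to the horizontal one.
        return (a * x - b * y + mv0x, b * x + a * y + mv0y)
    if height is None:
        raise ValueError("the 6-parameter model needs the block height")
    mv2x, mv2y = cpmvs[2]
    c = (mv2x - mv0x) / height
    d = (mv2y - mv0y) / height
    # 6-parameter model: independent horizontal and vertical gradients.
    return (a * x + c * y + mv0x, b * x + d * y + mv0y)
```

Evaluating either model at a control point position returns the corresponding CPMV, which is a quick consistency check on the model.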

[0119] In other words, the exemplary embodiments of this disclosure can provide an affine motion prediction method.

[0120] Typically, in video coding, motion estimation (ME) and motion compensation (MC) are performed based on translational motion models that effectively represent simple motions. However, this model may not be able to effectively represent complex motions in natural videos, such as scaling, rotation, and other irregular motions. Therefore, affine motion prediction can be proposed based on affine motion models to overcome the limitations of translational motion models.

[0121] If a 4-parameter affine motion model is used, the affine motion vector field (MVF) can be represented by two motion vectors. Referring to Figure 5a, the top-left and top-right points can be represented as the 0th control point (CP0) and the first control point (CP1), and their corresponding motion vectors can be represented as the 0th control point motion vector (CPMV0) and the first control point motion vector (CPMV1). In Figure 5a, mv0 can refer to CPMV0, and mv1 can refer to CPMV1.

[0122] Figure 6 exemplarily illustrates a case where an affine motion vector field according to an exemplary embodiment of the present disclosure is determined on a sub-block basis.

[0123] In affine motion compensation, the affine MVF can be determined at the sub-block level to reduce complexity. If a 4-parameter affine motion model is used, the motion vector at the center position of each sub-block can be calculated as expressed in Equation 1. For example, Figure 6 can be an example in which the affine MVF is determined at the 4×4 sub-block level, but the affine MVF can also be determined at a sub-block level of a different size or on a sample basis, and the scope of this disclosure is not limited thereto.

[0124] Figure 7 exemplarily illustrates a flowchart of an affine motion prediction method according to an exemplary embodiment of the present disclosure.
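
A sub-block motion vector field for the 4-parameter model can be sketched as follows, assuming the commonly used form of that model. Taking the center of each sub-block as its top-left position plus half the sub-block size is an assumption of this sketch, as are the function and parameter names.

```python
def subblock_mvf(cpmv0, cpmv1, width, height, sub=4):
    """Return a 2-D list of motion vectors, one per sub x sub sub-block.

    Each motion vector is evaluated at the sub-block center using the
    4-parameter affine model defined by CPMV0 (top-left) and CPMV1 (top-right).
    """
    a = (cpmv1[0] - cpmv0[0]) / width
    b = (cpmv1[1] - cpmv0[1]) / width
    field = []
    for y0 in range(0, height, sub):
        row = []
        for x0 in range(0, width, sub):
            cx, cy = x0 + sub / 2, y0 + sub / 2  # assumed center position
            row.append((a * cx - b * cy + cpmv0[0], b * cx + a * cy + cpmv0[1]))
        field.append(row)
    return field
```

An 8×8 block with 4×4 sub-blocks yields a 2×2 field, so only four motion vectors are computed instead of 64 per-sample vectors.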

[0125] Referring to Figure 7, the affine motion prediction method can be roughly represented as follows.

[0126] When the affine motion prediction method begins, a CPMV pair can first be obtained (S700). Here, if a 4-parameter affine model is used, the CPMV pair can include CPMV0 and CPMV1.

[0127] After that, affine motion compensation can be performed based on the CPMV pair (S710), and the affine motion prediction can be terminated.

[0128] To determine CPMV0 and CPMV1, two affine prediction modes can exist. These modes can include an affine inter-frame mode and an affine merging mode. The affine inter-frame mode can explicitly determine CPMV0 and CPMV1 by signaling motion vector difference (MVD) information for CPMV0 and CPMV1. Conversely, the affine merging mode can derive the CPMV pair without signaling MVD information.

[0129] In other words, the affine merging mode can derive the CPMV of the current block using the CPMV of neighboring blocks encoded in the affine mode, and if the motion vector is determined on a sub-block basis, the affine merging mode can also be called the sub-block merging mode.

[0130] In affine merging mode, the encoding device can signal to the decoding device the index of a neighboring block encoded in affine mode that is used to derive the CPMV of the current block, and can further signal the difference between the CPMV of the neighboring block and the CPMV of the current block. Here, the affine merging mode can configure an affine merging candidate list based on neighboring blocks, and the index of the neighboring block can indicate the neighboring block in the affine merging candidate list that is to be referenced to derive the CPMV of the current block. The affine merging candidate list can also be called a sub-block merging candidate list.

[0131] Affine inter-frame mode can also be called affine MVP mode. Affine MVP mode derives the CPMV of the current block based on the Control Point Motion Vector Predictor (CPMVP) and the Control Point Motion Vector Difference (CPMVD). That is, the encoding device can determine the CPMVP for the CPMV of the current block and derive the CPMVD as the difference between the CPMVP and CPMV of the current block, signaling the information about the CPMVP and the CPMVD to the decoding device. Here, affine MVP mode can configure an affine MVP candidate list based on neighboring blocks, and the information about the CPMVP can represent the neighboring blocks referenced in order to derive the CPMVP of the current block's CPMV from the affine MVP candidate list. The affine MVP candidate list can also be called the Control Point Motion Vector Predictor candidate list.

[0132] Figure 8 exemplarily illustrates the locations of neighboring blocks that are checked for neighboring affine blocks according to an exemplary embodiment of the present disclosure.

[0133] Exemplary embodiments of this disclosure can provide inherited affine candidates for the affine merge mode. That is, inherited affine candidates can be considered as candidates for the affine merge mode.

[0134] Here, the method for using inherited affine candidates can be as follows: if a neighboring block is a block encoded via affine prediction (hereinafter referred to as a neighboring affine block), the affine motion model of the neighboring affine block is used to derive motion information (motion vector and reference image index) for the current block, and the derived motion information is used to encode / decode the coding block. Therefore, inherited affine candidates are valid only when neighboring affine blocks exist, and up to a predetermined maximum of n inherited affine merging candidates can be generated. Here, n can be 0 or a natural number.
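
One way to derive motion information for the current block from the affine motion model of a neighboring affine block can be sketched as follows. The sketch assumes a 4-parameter model in its commonly used form and simply evaluates the neighboring block's model at the top-left and top-right corners of the current block in picture coordinates. All names are hypothetical, and the exact derivation used by the embodiments is not specified in this paragraph.

```python
def inherit_cpmvs(nb_pos, nb_width, nb_cpmv0, nb_cpmv1, cur_pos, cur_width):
    """Derive (CPMV0, CPMV1) of the current block from a neighboring affine block.

    `nb_pos` and `cur_pos` are the (x, y) picture positions of the top-left
    corners of the neighboring block and the current block, respectively.
    """
    a = (nb_cpmv1[0] - nb_cpmv0[0]) / nb_width
    b = (nb_cpmv1[1] - nb_cpmv0[1]) / nb_width

    def neighbor_model(px, py):
        # Position relative to the neighboring block's top-left corner.
        x, y = px - nb_pos[0], py - nb_pos[1]
        return (a * x - b * y + nb_cpmv0[0], b * x + a * y + nb_cpmv0[1])

    cx, cy = cur_pos
    return neighbor_model(cx, cy), neighbor_model(cx + cur_width, cy)
```

The reference image index of the neighboring affine block would be carried over alongside the derived CPMVs.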

[0135] Assuming n is 1, if the number of neighboring affine blocks is 1, one affine merge candidate can be generated. If the number of neighboring affine blocks is two or more, one of the neighboring affine blocks can be selected to generate the affine merge candidate, and any of the following methods can be used as the selection method.

[0136] (1) The neighboring affine block first identified by checking neighboring blocks according to a predetermined order can be used in the affine merge mode. The neighboring blocks may include blocks A, B, C, D, E, F, and G shown in Figure 8, or some of them. Here, various checking orders can be considered. (2) The neighboring affine block with the smallest reference index, or with the reference frame closest to the current block, can be used in the affine merge mode. (3) A block determined according to a predetermined priority from among the neighboring affine blocks having the most frequently occurring reference index can be used. Here, the most frequently occurring reference index may refer to the most common reference index counted over the reference indices of all neighboring blocks or over the reference indices of the neighboring affine blocks. (4) The block with the largest block size among the neighboring affine blocks can be used. Here, if there are two or more blocks with the largest block size, the block can be determined according to a predetermined order.
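
Selection methods (1), (2), and (4) for n = 1 can be sketched as follows. The `Neighbor` type, the alphabetical checking order, and the use of the checking order as the tie-breaking priority are assumptions made for this example.

```python
from typing import List, NamedTuple, Optional, Sequence


class Neighbor(NamedTuple):
    name: str        # block label from Figure 8, e.g. "A"
    is_affine: bool  # True if the block was coded by affine prediction
    ref_idx: int
    width: int
    height: int


CHECK_ORDER = ["A", "B", "C", "D", "E", "F", "G"]  # assumed predetermined order


def affine_in_order(neighbors: Sequence[Neighbor],
                    order: Sequence[str] = CHECK_ORDER) -> List[Neighbor]:
    """Neighboring affine blocks, listed in the checking order."""
    by_name = {n.name: n for n in neighbors}
    return [by_name[k] for k in order if k in by_name and by_name[k].is_affine]


def select_first(neighbors, order=CHECK_ORDER) -> Optional[Neighbor]:
    """Method (1): the first neighboring affine block found in the checking order."""
    found = affine_in_order(neighbors, order)
    return found[0] if found else None


def select_smallest_ref_idx(neighbors, order=CHECK_ORDER) -> Optional[Neighbor]:
    """Method (2): smallest reference index; ties fall back to the checking order."""
    found = affine_in_order(neighbors, order)
    return min(found, key=lambda n: n.ref_idx) if found else None


def select_largest(neighbors, order=CHECK_ORDER) -> Optional[Neighbor]:
    """Method (4): largest block size; ties fall back to the checking order."""
    found = affine_in_order(neighbors, order)
    return max(found, key=lambda n: n.width * n.height) if found else None
```

Python's `min` and `max` return the first of several equal items, which is what makes the checking order act as the tie-breaker here.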

[0137] The aforementioned methods have been described under the assumption that n is 1, but they can be extended to the case where n is 2 or greater. For example, assuming n is 2, each method can perform a pruning check and operate as follows. Furthermore, each method can also be extended to the case where n exceeds 2.

[0138] (1) The two neighboring affine blocks first identified by checking neighboring blocks according to a predetermined order can be used in the affine merging mode. The neighboring blocks may include blocks A, B, C, D, E, F, and G shown in Figure 8, or some of them. (2) The neighboring affine block with the smallest reference index, or with the reference frame closest to the current block, can be used in the affine merging mode. If there are three or more neighboring affine blocks with the smallest reference index, two neighboring affine blocks determined according to a predetermined priority can be used in the affine merging mode. (3) Two blocks determined according to a predetermined priority from among the neighboring affine blocks having the most frequently occurring reference index can be used. Here, the most frequently occurring reference index may refer to the most common reference index counted over the reference indices of all neighboring blocks or over the reference indices of the neighboring affine blocks. (4) The block with the largest block size among the neighboring affine blocks can be used. Here, if there are three or more blocks with the largest block size, the blocks can be determined according to a predetermined order.
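
Method (1) with a pruning check can be sketched as follows. The dictionary layout, the key names (`affine`, `cpmvs`, `ref_idx`), and the pruning criterion (identical CPMVs and reference index) are assumptions for illustration; the paragraph does not define what the pruning check compares.

```python
def inherited_candidates(neighbors, order, max_n=2):
    """Return the names of up to `max_n` neighboring affine blocks.

    `neighbors` maps a block name to a dict with the hypothetical keys
    'affine' (bool), 'cpmvs' (tuple of motion vectors), and 'ref_idx' (int).
    """
    selected = []  # list of (name, motion) pairs
    for name in order:
        if len(selected) >= max_n:
            break
        nb = neighbors.get(name)
        if nb is None or not nb["affine"]:
            continue
        motion = (nb["cpmvs"], nb["ref_idx"])
        # Pruning check: skip a candidate whose motion duplicates an earlier one.
        if any(motion == kept for _, kept in selected):
            continue
        selected.append((name, motion))
    return [name for name, _ in selected]
```

Checking the limit at the top of the loop also covers the case n = 0, where no inherited candidate is generated.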

[0139] Exemplary embodiments of this disclosure can provide inherited affine candidates for affine inter-frame modes. That is, inherited affine candidates can be considered as candidates for affine inter-frame modes.

[0140] Here, the method for using inherited affine candidates can be as follows: the motion vector of the current block is derived using an affine motion model, and the derived motion vector is used to encode / decode the coding block. Therefore, inherited affine candidates are valid only when neighboring affine blocks exist, and up to a predetermined maximum of n inherited affine candidates can be generated. Here, n can be 0 or a natural number.

[0141] Assuming n is 1, if the number of neighboring affine blocks is 1, an inherited affine candidate can be generated. Here, if the reference image of the current block is different from the reference images of the neighboring affine blocks, the affine merge candidate can be scaled and used based on the reference image of the current block. This can be called a scaled affine candidate. If the number of neighboring affine blocks is two or more, neighboring affine blocks can be selected to generate an affine merge candidate, and any of the following methods can be used as the selection method.

[0142] (1) The neighboring affine block first identified by checking neighboring blocks according to a predetermined order can be used in the affine merge mode. The neighboring blocks may include blocks A, B, C, D, E, F, and G shown in Figure 8, or some of them. If the reference images of the current block and the neighboring affine block are different, a scaled affine candidate can be used. (2) A neighboring affine block that has the same reference image or index as the current (coding) block can be used as the affine candidate. If there are two or more neighboring affine blocks with the same reference index, the neighboring affine block determined according to a predetermined priority can be used as the affine candidate. If there is no neighboring affine block with the same reference index, a scaled affine candidate of a neighboring affine block taken in a predetermined order can be used. Alternatively, a scaled affine candidate of a neighboring affine block with a reference image close to the current block can be used. Alternatively, inherited affine candidates may not be considered.

[0143] Assuming n is 2, if the number of neighboring affine blocks is 1, an affine merge candidate can be generated. Here, if the reference image of the current block and the reference images of neighboring affine blocks are different, the affine merge candidate can be scaled and used based on the reference image of the current block. This can be called a scaled affine merge candidate. If the number of neighboring affine blocks is two or more, neighboring affine blocks can be selected to generate an affine merge candidate, and any of the following methods can be used as the selection method.

[0144] (1) The two neighboring affine blocks first identified by checking neighboring blocks according to a predetermined order can be used in the affine merge mode. Neighboring blocks may include blocks A, B, C, D, E, F, and G shown in Figure 8, or some of them. Here, if the reference images of the current block and the neighboring affine blocks are different, scaled affine merge candidates can be used. (2) Neighboring affine blocks that have the same reference image or reference index as the current (encoding) block can be used as affine candidates. If there are three or more neighboring affine blocks with the same reference index, the neighboring affine blocks determined according to a predetermined priority can be used as affine candidates. If there are fewer than two neighboring affine blocks with the same reference index, scaled affine candidates of neighboring affine blocks in a predetermined order can be used. Alternatively, scaled affine candidates of neighboring affine blocks whose reference image is close to the current image can be used. Alternatively, scaled affine candidates of neighboring affine blocks whose reference image is close to the reference image of the current block can be used. Alternatively, inherited affine candidates may not be considered.
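The selection methods (1) and (2) described above for n = 1 and n = 2 can be sketched for a general maximum n as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation: the `Neighbor` record and both function names are hypothetical, a candidate is flagged as scaled simply when its reference index differs from that of the current block, and the "predetermined priority" of method (2) is assumed to be the checking order itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Neighbor:
    name: str        # block label, e.g. "A" to "G" as in Figure 8
    is_affine: bool  # True if the block was coded by affine prediction
    ref_idx: int     # reference index used by the block


def select_first_n(neighbors: List[Neighbor], cur_ref_idx: int,
                   n: int) -> List[Tuple[str, bool]]:
    """Method (1): take the first n affine neighbors in checking order.

    Returns (label, needs_scaling) pairs.
    """
    picked = []
    for nb in neighbors:
        if len(picked) == n:
            break
        if nb.is_affine:
            picked.append((nb.name, nb.ref_idx != cur_ref_idx))
    return picked


def select_same_ref_first(neighbors: List[Neighbor], cur_ref_idx: int,
                          n: int) -> List[Tuple[str, bool]]:
    """Method (2): prefer affine neighbors with the same reference index,
    then fill the remaining slots with scaled candidates in checking order.
    """
    affine = [nb for nb in neighbors if nb.is_affine]
    same = [nb for nb in affine if nb.ref_idx == cur_ref_idx]
    other = [nb for nb in affine if nb.ref_idx != cur_ref_idx]
    picked = [(nb.name, False) for nb in same[:n]]
    picked += [(nb.name, True) for nb in other[:n - len(picked)]]
    return picked


# Hypothetical neighborhood, listed in checking order.
neighbors = [
    Neighbor("A", False, 0),
    Neighbor("B", True, 1),
    Neighbor("C", True, 0),
    Neighbor("D", True, 0),
]
```

With this neighborhood and a current reference index of 0, method (1) with n = 2 picks B (scaled) and C, while method (2) picks C and D without scaling.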

[0145] Figure 9 exemplarily illustrates the use of two groups to examine neighboring affine blocks according to an exemplary embodiment of this disclosure.

[0146] Exemplary embodiments of this disclosure may propose a method for considering inherited affine candidates as candidates for the affine inter-frame mode using groups. Two or three groups can be configured; the case of using two groups will be described below with reference to Figure 9, and the case of using three groups will then be described with reference to Figure 10.

[0147] Referring to Figure 9, the blocks to be inspected can be divided into two groups, and one candidate block can be identified within each group. The positions checked for neighboring affine blocks can be blocks A, B, C, D, E, F, and G shown in Figure 9, or some of them, and these can be referred to as neighboring blocks. The two groups can include group A and group B. Group A can include blocks A, D, and G of the neighboring blocks, or some of them, and group B can include blocks B, C, E, and F of the neighboring blocks, or some of them.

[0148] The inspection order of groups can be group A → group B, but is not limited to this. The inspection order of group A can be block A → block D → block G, but the inspection can also be performed in various orders, so it is not limited to this. The inspection order of group B can be block B → block C → block F → block E, but the inspection can also be performed in various orders, so it is not limited to this.

[0149] As a detailed method for determining affine candidates in group A, any of the following methods can be used, and the same methods can also be applied to group B. (1) The first neighboring affine block in the inspection order of group A can be considered as an inherited candidate. Here, if the reference image of the current block is different from the reference image of the neighboring affine block, a scaled inherited candidate can be considered. (2) A neighboring affine block in the inspection order of group A that has the same reference image as the current reference image can be considered as a candidate, and if no such neighboring affine block exists, a scaled candidate can be considered. (3) A neighboring affine block in the inspection order of group A that has the same reference image as the current reference image can be considered as a candidate, and if no such neighboring affine block exists, no neighboring affine block is considered as a candidate.
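The two-group inspection of Figure 9 combined with method (1) can be sketched as follows. This is an illustrative Python sketch only: the group membership and inspection orders are the example orders given in the text, while the dictionary representation of affine neighbors and the function names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Example group membership and inspection order from the text for Figure 9.
GROUPS: Dict[str, List[str]] = {
    "A": ["A", "D", "G"],
    "B": ["B", "C", "F", "E"],
}


def first_affine_in_group(order: List[str], affine_refs: Dict[str, int],
                          cur_ref_idx: int) -> Optional[Tuple[str, bool]]:
    """Method (1): first affine block in the group's inspection order.

    affine_refs maps the label of each affine-coded neighbor to its
    reference index; non-affine neighbors are simply absent.
    Returns (label, needs_scaling), or None if the group has no affine block.
    """
    for label in order:
        if label in affine_refs:
            return label, affine_refs[label] != cur_ref_idx
    return None


def inherited_candidates_two_groups(affine_refs: Dict[str, int],
                                    cur_ref_idx: int) -> List[Tuple[str, bool]]:
    """Inspect group A, then group B, taking at most one candidate each."""
    out = []
    for group in ("A", "B"):
        hit = first_affine_in_group(GROUPS[group], affine_refs, cur_ref_idx)
        if hit is not None:
            out.append(hit)
    return out
```

For example, if only blocks D, G, E, and F are affine-coded, group A yields D and group B yields F, because F precedes E in the example inspection order.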

[0150] Figure 10 exemplarily illustrates the use of three groups to examine neighboring affine blocks according to an exemplary embodiment of this disclosure.

[0151] Referring to Figure 10, the blocks to be inspected can be divided into three groups, and one candidate block can be identified within each group. The positions checked for neighboring affine blocks can be blocks A, B, C, D, E, F, and G shown in Figure 10, or some of them, and these can be referred to as neighboring blocks. The three groups can include group A, group B, and group C. Group A can include blocks A and D of the neighboring blocks, or some of them; group B can include blocks B and C of the neighboring blocks, or some of them; and group C can include blocks E, F, and G of the neighboring blocks, or some of them.

[0152] The inspection order of groups can be group A → group B → group C, but is not limited to this. The inspection order of group A can be block A → block D or block D → block A, the inspection order of group B can be block B → block C or block C → block B, and the inspection order of group C can be block G → block E → block F. However, the inspection can be performed in various orders, so it is not limited to this.

[0153] As a detailed method for determining affine candidates in group A, any of the following methods can be used, and the same methods can also be applied to groups B and C. (1) The first neighboring affine block in the inspection order of group A can be considered as an inherited candidate. Here, if the reference image of the current block is different from the reference image of the neighboring affine block, a scaled inherited candidate can be considered. (2) A neighboring affine block in the inspection order of group A that has the same reference image as the current reference image can be considered as a candidate, and if no such neighboring affine block exists, a scaled candidate can be considered. (3) A neighboring affine block in the inspection order of group A that has the same reference image as the current reference image can be considered as a candidate, and if no such neighboring affine block exists, no neighboring affine block is considered as a candidate.
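The three per-group methods can be compared side by side on the three groups of Figure 10. The following Python sketch is illustrative only; the function names and the dictionary of affine neighbors are hypothetical, and for method (2) it is assumed that the scaled fallback is taken from the first affine block in the group's inspection order, which the text does not state explicitly.

```python
from typing import Dict, List, Optional, Tuple

# Example groups and inspection orders from the text for Figure 10.
GROUPS_3: Dict[str, List[str]] = {
    "A": ["A", "D"],
    "B": ["B", "C"],
    "C": ["G", "E", "F"],
}


def pick_in_group(order: List[str], affine_refs: Dict[str, int],
                  cur_ref_idx: int, method: int) -> Optional[Tuple[str, bool]]:
    """Return (label, needs_scaling) for one group, or None."""
    if method not in (1, 2, 3):
        raise ValueError("method must be 1, 2, or 3")
    affine = [b for b in order if b in affine_refs]
    if not affine:
        return None
    if method == 1:
        # First affine block; scaled if its reference differs.
        return affine[0], affine_refs[affine[0]] != cur_ref_idx
    same = [b for b in affine if affine_refs[b] == cur_ref_idx]
    if same:
        return same[0], False
    # Method 2 falls back to a scaled candidate (assumed: first affine
    # block in order); method 3 yields no candidate for this group.
    return (affine[0], True) if method == 2 else None


def inherited_candidates(affine_refs: Dict[str, int], cur_ref_idx: int,
                         method: int) -> List[Tuple[str, bool]]:
    """Inspect groups A, B, C in that order, at most one candidate each."""
    picks = (pick_in_group(GROUPS_3[g], affine_refs, cur_ref_idx, method)
             for g in ("A", "B", "C"))
    return [p for p in picks if p is not None]


# Hypothetical affine neighbors: label -> reference index.
refs = {"A": 1, "D": 0, "C": 2, "E": 1}
```

With a current reference index of 0, method (1) returns scaled candidates from A, C, and E; method (2) returns D unscaled plus scaled C and E; method (3) returns only D.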

[0154] Figure 11 schematically illustrates a video encoding method performed by an encoding apparatus according to an exemplary embodiment of the present disclosure.

[0155] The method shown in Figure 11 can be performed by the encoding device shown in Figure 1. For example, S1100 to S1140 of Figure 11 can be executed by the predictor of the encoding device, S1150 can be executed by the subtractor of the encoding device, and S1160 can be executed by the entropy encoder of the encoding device.

[0156] The encoding device generates a motion information candidate list for the current block (S1100). Here, the motion information candidate list may include an affine candidate list. Alternatively, the motion information candidate list may include inherited affine candidates. Inherited affine candidates can be derived based on candidate blocks encoded by affine prediction among the spatially neighboring blocks of the current block.

[0157] Candidate blocks can be some of the spatial neighboring blocks of the current block. That is, candidate blocks can be included in the spatial neighboring blocks. Inherited affine candidates can be generated up to a predetermined maximum number. Inherited affine candidates can be candidates according to the affine merge mode, and can also be candidates according to the affine inter-frame mode. Therefore, the motion information candidate list can include a merge candidate list or an affine merge candidate list, or it can include an MVP candidate list or an affine MVP candidate list.

[0158] For example, an inherited affine candidate can be a candidate according to the affine merge mode. If the number of candidate blocks equals the maximum number, inherited affine candidates can be derived one by one for each candidate block. For example, if the number of candidate blocks encoded by affine prediction is 1, one inherited affine candidate can be derived based on that candidate block.

[0159] However, if the number of candidate blocks exceeds the maximum number, inherited affine candidates can be derived based on candidate blocks encoded by affine prediction that are preferentially confirmed by checking spatially neighboring blocks according to a predetermined scan order. Here, the maximum number of candidate blocks can be used, and the predetermined scan order can refer to either a predetermined order or a checking order.

[0160] Alternatively, if the number of candidate blocks is greater than the maximum, inherited affine candidates can be derived based on the candidate block with the smallest reference index or the candidate block with the closest reference image to the current image. Here, the current image can refer to the image that includes the current block.
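The two selection rules in this paragraph can be sketched as follows. This is a hedged Python illustration: the dictionary fields and function names are hypothetical, and closeness between a reference image and the current image is assumed to be measured as the absolute difference of picture order counts (POC), which the text does not specify. Ties are resolved by checking order because Python's `sorted` is stable.

```python
from typing import Dict, List


def pick_smallest_ref_idx(cands: List[Dict], n: int) -> List[Dict]:
    """Keep the n candidate blocks with the smallest reference index."""
    return sorted(cands, key=lambda c: c["ref_idx"])[:n]


def pick_closest_ref(cands: List[Dict], cur_poc: int, n: int) -> List[Dict]:
    """Keep the n candidate blocks whose reference image is closest to the
    current image (assumed metric: absolute POC difference)."""
    return sorted(cands, key=lambda c: abs(cur_poc - c["ref_poc"]))[:n]


# Hypothetical affine candidate blocks, listed in checking order.
cands = [
    {"name": "A", "ref_idx": 2, "ref_poc": 4},
    {"name": "B", "ref_idx": 0, "ref_poc": 0},
    {"name": "C", "ref_idx": 1, "ref_poc": 7},
]
```

With a current POC of 8, the smallest-index rule selects B, while the closest-reference rule selects C first and A second.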

[0161] Alternatively, if the number of candidate blocks exceeds the maximum number, inherited affine candidates can be derived based on the candidate block with the reference index that appears most frequently among the reference indices of spatially neighboring blocks or among the reference indices of candidate blocks. Alternatively, inherited affine candidates can be derived based on the candidate block with the largest block size. This has been described in detail with reference to Figure 8. In the description of Figure 8, the maximum number can refer to n, and the cases of n=1 and n=2 have been described as examples, but the value of n is not limited to these and can be larger.
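The frequency-based and size-based rules above can be sketched in Python as follows. The field names `ref_idx`, `w`, and `h` and both function names are assumptions for illustration; block size is assumed to mean area in samples, and the frequency is counted over the candidate blocks only (the text also allows counting over all spatial neighbors).

```python
from collections import Counter
from typing import Dict, List


def pick_most_frequent_ref(cands: List[Dict], n: int) -> List[Dict]:
    """Prefer candidate blocks whose reference index occurs most often.

    The sort is stable, so checking order breaks ties.
    """
    counts = Counter(c["ref_idx"] for c in cands)
    return sorted(cands, key=lambda c: -counts[c["ref_idx"]])[:n]


def pick_largest(cands: List[Dict], n: int) -> List[Dict]:
    """Prefer candidate blocks with the largest area (assumed size metric)."""
    return sorted(cands, key=lambda c: -(c["w"] * c["h"]))[:n]


# Hypothetical affine candidate blocks, listed in checking order.
cands = [
    {"name": "A", "ref_idx": 1, "w": 16, "h": 16},
    {"name": "B", "ref_idx": 0, "w": 8, "h": 8},
    {"name": "C", "ref_idx": 0, "w": 32, "h": 16},
    {"name": "D", "ref_idx": 0, "w": 8, "h": 4},
]
```

Here reference index 0 occurs three times, so the frequency rule with n = 2 keeps B and C, while the size rule with n = 1 keeps C.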

[0162] For example, inherited affine candidates can be candidates according to the affine inter-frame mode. If the number of candidate blocks equals the maximum number, inherited affine candidates can be derived one by one for each candidate block. For example, if the number of candidate blocks encoded by affine prediction is 1, one inherited affine candidate can be derived based on that candidate block.

[0163] Here, if the reference image of the current block and the reference image of the candidate block are different, an inherited affine candidate can be derived based on the motion vector of the candidate block, and the motion vector of the candidate block can be scaled based on the reference image of the current block. Alternatively, the motion vector of the candidate block can be scaled based on the distance between the current block and its reference image, and the distance between the candidate block and its reference image.
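The distance-based scaling described above can be illustrated with a simple ratio of temporal distances. This Python sketch makes several assumptions that are not in the text: distances are measured in picture order count (POC), the candidate block lies in the current image (it is a spatial neighbor), and a plain rounded ratio is used, whereas practical codecs typically use fixed-point arithmetic with clipping. The function names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def scale_mv(mv: MV, cur_poc: int, cur_ref_poc: int, cand_ref_poc: int) -> MV:
    """Scale a candidate block's motion vector to the current block's
    reference image by the ratio of temporal distances."""
    tb = cur_poc - cur_ref_poc   # current block to its reference image
    td = cur_poc - cand_ref_poc  # candidate block to its reference image
    if td == 0:
        raise ValueError("candidate reference distance must be non-zero")
    if tb == td:
        return mv                # same distance, no scaling needed
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))


def scale_cpmvs(cpmvs: List[MV], cur_poc: int, cur_ref_poc: int,
                cand_ref_poc: int) -> List[MV]:
    """Apply the same scaling to every control point motion vector."""
    return [scale_mv(v, cur_poc, cur_ref_poc, cand_ref_poc) for v in cpmvs]
```

For example, with a current POC of 8, a current reference at POC 4, and a candidate reference at POC 6, the ratio is 4 / 2, so a motion vector (8, -4) becomes (16, -8).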

[0164] However, if the number of candidate blocks is greater than the maximum number, inherited affine candidates can be derived based on candidate blocks with the same reference image or reference index as the current block. Alternatively, if the number of candidate blocks is greater than the maximum number and no candidate block has the same reference image or reference index as the current block, inherited affine candidates can be derived based on the motion vector of a candidate block in a predetermined scan order, the motion vector of a candidate block with the reference image closest to the current image, or the motion vector of a candidate block with the reference image closest to the current block's reference image, and the motion vector of the candidate block can be scaled based on the reference image of the current block. Alternatively, the motion vector of the candidate block can be scaled based on the distance between the current block and its reference image, and the distance between the candidate block and its reference image. Here, the current image can refer to the image including the current block, and the predetermined scan order can refer to a predetermined order or an inspection order. This has been described in detail with reference to Figure 8. In the description of Figure 8, the maximum number can refer to n, and the cases of n=1 and n=2 have been described as examples, but the value of n is not limited to these and can be larger.

[0165] For example, if the inherited affine candidate is a candidate according to the affine inter-frame mode, the spatial neighboring blocks of the current block can be grouped. Alternatively, the spatial neighboring blocks of the current block can be divided into two or more groups. Inherited affine candidates can be derived based on the groups. Alternatively, inherited affine candidates can be derived for each group individually. Alternatively, inherited affine candidates can be derived for each group based on the candidate blocks within that group. Alternatively, a candidate block can be selected for each group individually, and inherited affine candidates can be derived based on the selected candidate blocks.

[0166] For example, these groups can include a first group and a second group. The first group can include the bottom-left neighboring block of the current block and the top-left neighboring block of the current block. Additionally, the first group can also include the bottom-left neighboring block of the current block. The second group can include the top-left neighboring block, the top-right neighboring block of the current block, and the top-left neighboring block of the current block. Additionally, the second group can also include the top-right neighboring block of the current block. This has been described in detail with reference to Figure 9. In Figure 9, group A can refer to the first group, and group B can refer to the second group.

[0167] For example, these groups can include a first group, a second group, and a third group. The first group can include the bottom-left neighboring block of the current block and the top-left neighboring block of the current block; the second group can include the top-right neighboring block of the current block and the top-left neighboring block of the current block; and the third group can include the top-left neighboring block of the current block, the top-right neighboring block of the current block, and the bottom-left neighboring block of the current block. This has been described in detail with reference to Figure 10. In Figure 10, group A can refer to the first group, group B can refer to the second group, and group C can refer to the third group.

[0168] Here, inherited affine candidates can be derived based on candidate blocks encoded by affine prediction, which are preferentially identified by examining blocks within each group according to a predetermined scan order. Alternatively, if the reference image of the current block is different from the reference image of the candidate block, inherited affine candidates can be derived based on the motion vector of the candidate block, and the motion vector of the candidate block can be scaled based on the reference image of the current block. Alternatively, the motion vector of the candidate block can be scaled based on the distance between the current block and its reference image, and the distance between the candidate block and its reference image. Alternatively, inherited affine candidates can be derived based on candidate blocks in each group that have the same reference image as the reference image of the current block. This has been described in detail with reference to Figure 9 and Figure 10. For the cases of two or three groups, the inspection order between groups and within each group has been described with reference to Figure 9 and Figure 10, but this is for ease of explanation, and the inspection order applicable to this disclosure is not limited thereto. Furthermore, the candidate blocks described above can be used interchangeably with neighboring blocks.

[0169] The encoding device selects one of the candidates included in the motion information candidate list (S1110). Here, selection information can be generated. The selection information may include information about the candidate selected from the motion information candidate list, and may also include index information about the candidate selected from the motion information candidate list.

[0170] The encoding device derives the control point motion vector (CPMV) for the current block based on the selected candidate (S1120). The control point motion vector can refer to the motion vector at the control point. Referring to Figure 8, the control points can include control point CP0 located at the top-left sample position of the current block and control point CP1 located at the top-right sample position of the current block, and can also include control point CP2 located at the bottom-left sample position of the current block. This has been described in detail with reference to Figure 5a and Figure 5b.

[0171] The encoding device derives the sub-block unit motion vector or sample unit motion vector of the current block based on the CPMV (S1130). The encoding device can derive an affine motion vector field based on the CPMV. The affine motion vector field can derive the sub-block unit motion vector or sample unit motion vector based on the x and y components of the CPMV. Here, the sub-block unit motion vector can represent the motion vector at the center of the sub-block. The affine motion vector field can be derived according to the number of CPMVs using Equation 1 or Equation 2, but is not limited thereto.
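Equation 1 and Equation 2 are not reproduced in this section. As a rough illustration of how an affine motion vector field follows from the CPMVs, the following Python sketch uses the commonly used 4-parameter model (two CPMVs, at CP0 and CP1) and 6-parameter model (three CPMVs, at CP0, CP1, and CP2); it is an assumption that these correspond to the referenced equations. The function name is hypothetical, and floating-point arithmetic is used for readability instead of the fixed-point arithmetic of real codecs.

```python
from typing import List, Tuple

MV = Tuple[float, float]


def affine_mv(cpmvs: List[MV], w: int, h: int, x: float, y: float) -> MV:
    """Motion vector at sample position (x, y) inside a w-by-h block.

    cpmvs[0] is at the top-left corner (CP0), cpmvs[1] at the top-right
    corner (CP1), and the optional cpmvs[2] at the bottom-left corner (CP2).
    """
    (v0x, v0y), (v1x, v1y) = cpmvs[0], cpmvs[1]
    a = (v1x - v0x) / w  # horizontal gradient of the x component
    b = (v1y - v0y) / w  # horizontal gradient of the y component
    if len(cpmvs) == 2:
        # 4-parameter model: rotation and uniform zoom plus translation.
        return (a * x - b * y + v0x, b * x + a * y + v0y)
    v2x, v2y = cpmvs[2]
    c = (v2x - v0x) / h  # vertical gradient of the x component
    d = (v2y - v0y) / h  # vertical gradient of the y component
    # 6-parameter model: independent horizontal and vertical gradients.
    return (a * x + c * y + v0x, b * x + d * y + v0y)
```

Evaluating the function at a control point position returns that control point's motion vector, which is a quick sanity check on the model: for a 16x16 block with CPMVs (0, 0), (16, 0), and (0, 8), position (0, 16) gives (0, 8).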

[0172] The encoding device derives a prediction block based on the motion vector of the sub-block unit or the motion vector of the sample unit (S1140). Here, the prediction block can refer to a block that is highly correlated with the current block.

[0173] The encoding device generates a residual block of the current block based on the prediction block (S1150). The residual block can be derived based on the prediction block and the current block. Alternatively, the residual block can be derived based on the difference between the prediction block and the current block.
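The difference-based derivation of the residual block can be shown in a few lines. This Python sketch assumes blocks are represented as lists of rows of integer sample values and that the residual is the current block minus the prediction block, sample by sample; the function name is hypothetical.

```python
from typing import List

Block = List[List[int]]


def residual_block(cur: Block, pred: Block) -> Block:
    """Sample-wise difference: current block minus prediction block."""
    if len(cur) != len(pred) or any(len(c) != len(p) for c, p in zip(cur, pred)):
        raise ValueError("blocks must have the same dimensions")
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur, pred)]
```

The closer the prediction block is to the current block, the smaller the residual values, which is why a well-chosen affine candidate reduces the amount of residual information to be encoded.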

[0174] The encoding device outputs a bitstream by encoding image information, including information about the residual block (S1160). The information about the residual block may include the residual block itself and information about it. Here, the image information may also include selection information, and the encoding device may signal the image information that also includes selection information. Alternatively, the encoding device may output a bitstream by encoding image information that also includes selection information. The bitstream can be transmitted to the decoding device via a network or storage medium.

[0175] Figure 12 schematically illustrates a video decoding method performed by a decoding apparatus according to an exemplary embodiment of the present disclosure.

[0176] The method shown in Figure 12 can be performed by the decoding device shown in Figure 2. For example, S1200 to S1240 of Figure 12 can be executed by the predictor of the decoding device, and S1250 can be executed by the reconstructor of the decoding device.

[0177] The decoding device generates a motion information candidate list for the current block (S1200). Here, the motion information candidate list may include an affine candidate list. Alternatively, the motion information candidate list may include inherited affine candidates. Inherited affine candidates can be derived based on candidate blocks encoded by affine prediction in the spatially neighboring blocks of the current block.

[0178] Candidate blocks can be some of the spatial neighboring blocks of the current block. That is, candidate blocks can be included in the spatial neighboring blocks. Inherited affine candidates can be generated up to a predetermined maximum number. Inherited affine candidates can be candidates according to the affine merge mode, and can also be candidates according to the affine inter-frame mode. Therefore, the motion information candidate list can include a merge candidate list or an affine merge candidate list, or it can include an MVP candidate list or an affine MVP candidate list.

[0179] For example, an inherited affine candidate can be a candidate according to the affine merge mode. If the number of candidate blocks equals the maximum number, an inherited affine candidate can be derived for each candidate block individually. For example, if the number of candidate blocks encoded by affine prediction is 1, one inherited affine candidate can be derived based on that candidate block.

[0180] However, if the number of candidate blocks exceeds the maximum number, inherited affine candidates can be derived based on candidate blocks encoded by affine prediction that are preferentially confirmed by checking spatially neighboring blocks according to a predetermined scan order. Here, the maximum number of candidate blocks can be used, and the predetermined scan order can also be referred to as the predetermined order or the checking order.

[0181] Alternatively, if the number of candidate blocks is greater than the maximum, inherited affine candidates can be derived based on the candidate block with the smallest reference index or the candidate block with the closest reference image to the current image. Here, the current image can refer to the image that includes the current block.

[0182] Alternatively, if the number of candidate blocks exceeds the maximum number, inherited affine candidates can be derived based on the candidate block with the reference index that appears most frequently among the reference indices of spatially neighboring blocks or among the reference indices of candidate blocks. Alternatively, inherited affine candidates can be derived based on the candidate block with the largest block size. This has been described in detail with reference to Figure 8. In the description of Figure 8, the maximum number can refer to n, and the cases of n=1 and n=2 have been described as examples, but the value of n is not limited to these and can be larger.

[0183] For example, inherited affine candidates can be candidates according to the affine inter-frame mode. If the number of candidate blocks equals the maximum number, inherited affine candidates can be derived one by one for each candidate block. For example, if the number of candidate blocks encoded by affine prediction is 1, one inherited affine candidate can be derived based on that candidate block.

[0184] Here, if the reference image of the current block and the reference image of the candidate block are different, an inherited affine candidate can be derived based on the motion vector of the candidate block, and the motion vector of the candidate block can be scaled based on the reference image of the current block. Alternatively, the motion vector of the candidate block can be scaled based on the distance between the current block and its reference image, and the distance between the candidate block and its reference image.

[0185] However, if the number of candidate blocks is greater than the maximum number, inherited affine candidates can be derived based on candidate blocks with the same reference image or reference index as the current block. Alternatively, if the number of candidate blocks is greater than the maximum number and no candidate block has the same reference image or reference index as the current block, inherited affine candidates can be derived based on the motion vector of a candidate block in a predetermined scan order, the motion vector of a candidate block with the reference image closest to the current image, or the motion vector of a candidate block with the reference image closest to the current block's reference image, and the motion vector of the candidate block can be scaled based on the reference image of the current block. Alternatively, the motion vector of the candidate block can be scaled based on the distance between the current block and its reference image, and the distance between the candidate block and its reference image. Here, the current image can refer to the image including the current block, and the predetermined scan order can refer to a predetermined order or an inspection order. This has been described in detail with reference to Figure 8. In the description of Figure 8, the maximum number can refer to n, and the cases of n=1 and n=2 have been described as examples, but the value of n is not limited to these and can be larger.

[0186] For example, if the inherited affine candidate is a candidate according to the affine inter-frame mode, the spatial neighboring blocks of the current block can be grouped. Alternatively, the spatial neighboring blocks of the current block can be divided into two or more groups. Inherited affine candidates can be derived based on the groups. Alternatively, inherited affine candidates can be derived for each group individually. Alternatively, inherited affine candidates can be derived for each group based on the candidate blocks within that group. Alternatively, a candidate block can be selected for each group individually, and inherited affine candidates can be derived based on the selected candidate blocks.

[0187] For example, these groups can include a first group and a second group. The first group can include the bottom-left neighboring block of the current block and the top-left neighboring block of the current block. Additionally, the first group can also include the bottom-left neighboring block of the current block. The second group can include the top-left neighboring block, the top-right neighboring block of the current block, and the top-left neighboring block of the current block. Additionally, the second group can also include the top-right neighboring block of the current block. This has been described in detail with reference to Figure 9. In Figure 9, group A can refer to the first group, and group B can refer to the second group.

[0188] For example, these groups can include a first group, a second group, and a third group. The first group can include the bottom-left neighboring block of the current block and the top-left neighboring block of the current block; the second group can include the top-right neighboring block of the current block and the top-left neighboring block of the current block; and the third group can include the top-left neighboring block of the current block, the top-right neighboring block of the current block, and the bottom-left neighboring block of the current block. This has been described in detail with reference to Figure 10. In Figure 10, group A can refer to the first group, group B can refer to the second group, and group C can refer to the third group.

[0189] Here, inherited affine candidates can be derived based on candidate blocks encoded by affine prediction, which are preferentially identified by examining blocks within each group according to a predetermined scan order. Alternatively, if the reference image of the current block is different from the reference image of the candidate block, inherited affine candidates can be derived based on the motion vector of the candidate block, and the motion vector of the candidate block can be scaled based on the reference image of the current block. Alternatively, the motion vector of the candidate block can be scaled based on the distance between the current block and its reference image, and the distance between the candidate block and its reference image. Alternatively, inherited affine candidates can be derived based on candidate blocks in each group that have the same reference image as the reference image of the current block. This has been described in detail with reference to Figure 9 and Figure 10. For the cases of two or three groups, the inspection order between groups and within each group has been described with reference to Figure 9 and Figure 10, but this is for ease of explanation, and the inspection order applicable to this disclosure is not limited thereto. Furthermore, the candidate blocks described above can be used interchangeably with neighboring blocks.

[0190] The decoding device selects one of the candidates included in the motion information candidate list (S1210). Selection information can be used here. The selection information may include information about a candidate selected from the motion information candidate list, and may also include index information about a candidate selected from the motion information candidate list. The selection information may be included in the image information, and the image information including the selection information may be signaled to the decoding device. The decoding device can obtain the selection information by parsing the bitstream of the image information. The bitstream can be transmitted from the encoding device via a network or storage medium.

[0191] The decoding device derives the control point motion vector (CPMV) of the current block based on the selected candidate (S1220). The control point motion vector can refer to the motion vector at the control point. Referring to Figure 8, the control points can include a control point (CP0) located at the top-left sample position of the current block and a control point (CP1) located at the top-right sample position of the current block, and can also include a control point (CP2) located at the bottom-left sample position of the current block. This has been described in detail with reference to Figure 5a and Figure 5b.

[0192] The decoding device derives the sub-block unit motion vector or sample unit motion vector of the current block based on the CPMV (S1230). The decoding device can derive an affine motion vector field based on the CPMV. The affine motion vector field can derive the sub-block unit motion vector or sample unit motion vector based on the x and y components of the CPMV. Here, the sub-block unit motion vector can represent the motion vector at the center of the sub-block. The affine motion vector field can be derived according to the number of CPMVs using Equation 1 or Equation 2, but is not limited thereto.
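The statement that a sub-block unit motion vector represents the motion vector at the center of the sub-block can be illustrated by evaluating the affine model once per sub-block. This Python sketch rests on assumptions not stated in the text: a 4x4 sub-block size, the commonly used 4-parameter and 6-parameter affine models in place of Equation 1 and Equation 2, and floating-point arithmetic. The function name is hypothetical.

```python
from typing import List, Tuple

MV = Tuple[float, float]


def subblock_mvs(cpmvs: List[MV], w: int, h: int, sb: int = 4) -> List[List[MV]]:
    """One motion vector per sb-by-sb sub-block, taken at its center."""
    (v0x, v0y), (v1x, v1y) = cpmvs[0], cpmvs[1]
    a = (v1x - v0x) / w
    b = (v1y - v0y) / w
    if len(cpmvs) == 3:
        # 6-parameter model: vertical gradients come from CP2.
        c = (cpmvs[2][0] - v0x) / h
        d = (cpmvs[2][1] - v0y) / h
    else:
        # 4-parameter model: vertical gradients are tied to horizontal ones.
        c, d = -b, a
    field = []
    for y0 in range(0, h, sb):
        row = []
        for x0 in range(0, w, sb):
            cx, cy = x0 + sb / 2, y0 + sb / 2  # sub-block center
            row.append((a * cx + c * cy + v0x, b * cx + d * cy + v0y))
        field.append(row)
    return field
```

For an 8x8 block with CPMVs (0, 0) at CP0 and (8, 0) at CP1, the four 4x4 sub-blocks receive the vectors (2, 2), (6, 2), (2, 6), and (6, 6), one per sub-block center.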

[0193] The decoding device derives a prediction block based on the motion vector of the sub-block unit or the motion vector of the sample unit (S1240). Here, the prediction block can refer to a block that is highly correlated with the current block.

[0194] The decoding device reconstructs the current image based on the prediction block (S1250). Here, information about the residual block can be used. This information can include the residual block itself and information about it. The residual block can be a block derived from the prediction block and the current block. Alternatively, the residual block can be a block derived from the difference between the prediction block and the current block. The decoding device can reconstruct the current image based on the prediction block and the information about the residual block. The information about the residual block can be included in the image information, and the image information including the information about the residual block can be signaled to the decoding device. The decoding device can obtain the information about the residual block by parsing the bitstream of the image information. The bitstream can be sent from the encoding device via a network or storage medium.
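Reconstruction from the prediction block and the residual block can be sketched as the inverse of the encoder-side difference. In this Python illustration, blocks are lists of rows of integer samples, the function name is hypothetical, and clipping to the valid sample range for an assumed bit depth of 8 is added as a common practice that the text does not mention.

```python
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, resid: Block, bit_depth: int = 8) -> Block:
    """Add the residual to the prediction and clip to the sample range."""
    hi = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), hi) for p, r in zip(pred_row, resid_row)]
            for pred_row, resid_row in zip(pred, resid)]
```

For example, a prediction sample of 250 with a residual of 10 is clipped to 255 at 8 bits, and a prediction sample of 3 with a residual of -5 is clipped to 0.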

[0195] In the above exemplary embodiments, the method is explained based on a flowchart using a series of steps or blocks. However, this disclosure is not limited to the order of the steps, and a step may occur in a different order than described above, or may occur simultaneously with another step. Furthermore, those skilled in the art will understand that the steps shown in the flowchart are not exclusive, and that, without affecting the scope of this disclosure, another step may be included or one or more steps of the flowchart may be removed.

[0196] The method described above according to this disclosure can be implemented in software form, and the encoding and/or decoding apparatus according to this disclosure can be included in an apparatus for image processing, such as a television, computer, smartphone, set-top box, display device, etc.

[0197] When the exemplary embodiments of this disclosure are implemented in software, the methods described above can be implemented as modules (processes, functions, etc.) performing the functions described above. These modules can be stored in memory and can be executed by a processor. The memory can be located inside or outside the processor and can be connected to the processor via various known means. The processor may include application-specific integrated circuits (ASICs), other chipsets, logic circuits, and/or data processing devices. The memory may include read-only memory (ROM), random access memory (RAM), flash memory, memory cards, storage media, and/or other storage devices.

Claims

1. A decoding apparatus for image decoding, the decoding apparatus comprising:
a memory; and
at least one processor connected to the memory and configured to:
obtain image information from a bitstream, the image information including motion vector difference (MVD) information and motion vector predictor (MVP) candidate index information;
generate an affine MVP candidate list including affine MVP candidates for a current block;
select one of the affine MVP candidates included in the affine MVP candidate list based on the MVP candidate index information;
derive a control point motion vector predictor (CPMVP) for the corresponding control point (CP) of the current block based on the selected affine MVP candidate;
derive a control point motion vector difference (CPMVD) of the current block based on the MVD information for the corresponding CP of the current block;
derive a control point motion vector (CPMV) for the corresponding CP of the current block based on the CPMVP and the CPMVD;
derive prediction samples of the current block based on the CPMV of the corresponding CP of the current block; and
reconstruct a current image based on the prediction samples,
wherein the CPs of the current block include a first CP located at the upper left of the current block, a second CP located at the upper right of the current block, and a third CP located at the lower left of the current block,
wherein the affine MVP candidates include a first inherited affine candidate and a second inherited affine candidate,
wherein the CPMVP of the first inherited affine candidate is derived from a first candidate block encoded by affine prediction in a first block group consisting of a lower-left neighboring block of the current block and a left neighboring block adjacent to the top of the lower-left neighboring block,
wherein the first candidate block is determined from the first block group based on a predetermined first scanning order and has the same reference image as the reference image of the current block,
wherein the CPMVP of the second inherited affine candidate is derived from a second candidate block encoded by affine prediction in a second block group consisting of a top-left neighboring block, a top-right neighboring block, and a top neighboring block adjacent to the left side of the top-right neighboring block, and
wherein the second candidate block is determined from the second block group based on a predetermined second scanning order and has the same reference image as the reference image of the current block.

2. An encoding apparatus for image encoding, the encoding apparatus comprising:
a memory; and
at least one processor connected to the memory and configured to:
generate an affine motion vector predictor (MVP) candidate list including affine MVP candidates for a current block;
select one of the affine MVP candidates included in the affine MVP candidate list;
generate MVP candidate index information for the selected affine MVP candidate;
derive a control point motion vector predictor (CPMVP) for the corresponding control point (CP) of the current block based on the selected affine MVP candidate;
derive a control point motion vector (CPMV) for the corresponding CP of the current block;
derive a control point motion vector difference (CPMVD) of the current block based on the CPMVP and the CPMV of the corresponding CP;
derive prediction samples of the current block based on the CPMV of the corresponding CP of the current block;
generate residual samples for the current block based on the prediction samples; and
output a bitstream of image information, the image information including the MVP candidate index information, motion vector difference (MVD) information about the CPMVD, and information about the residual samples,
wherein the CPs of the current block include a first CP located at the upper left of the current block, a second CP located at the upper right of the current block, and a third CP located at the lower left of the current block,
wherein the affine MVP candidates include a first inherited affine candidate and a second inherited affine candidate,
wherein the CPMVP of the first inherited affine candidate is derived from a first candidate block encoded by affine prediction in a first block group consisting of a lower-left neighboring block of the current block and a left neighboring block adjacent to the top of the lower-left neighboring block,
wherein the first candidate block is determined from the first block group based on a predetermined first scanning order and has the same reference image as the reference image of the current block,
wherein the CPMVP of the second inherited affine candidate is derived from a second candidate block encoded by affine prediction in a second block group consisting of a top-left neighboring block, a top-right neighboring block, and a top neighboring block adjacent to the left side of the top-right neighboring block, and
wherein the second candidate block is determined from the second block group based on a predetermined second scanning order and has the same reference image as the reference image of the current block.

3. An apparatus for transmitting data for an image, the apparatus comprising:
at least one processor configured to obtain a bitstream for the image, wherein the bitstream is generated based on the following steps:
generating an affine motion vector predictor (MVP) candidate list including affine MVP candidates for a current block;
selecting one of the affine MVP candidates included in the affine MVP candidate list;
generating MVP candidate index information for the selected affine MVP candidate;
deriving a control point motion vector predictor (CPMVP) for the corresponding control point (CP) of the current block based on the selected affine MVP candidate;
deriving a control point motion vector (CPMV) for the corresponding CP of the current block;
deriving a control point motion vector difference (CPMVD) for the corresponding CP of the current block based on the CPMVP and the CPMV;
deriving a prediction sample for the current block based on the CPMV of the corresponding CP of the current block;
generating a residual sample for the current block based on the prediction sample; and
outputting the bitstream of image information, the image information including the MVP candidate index information, motion vector difference (MVD) information regarding the CPMVD, and information regarding the residual sample; and
a transmitter configured to transmit the data comprising the bitstream,
wherein the CPs of the current block include a first CP located at the upper left of the current block, a second CP located at the upper right of the current block, and a third CP located at the lower left of the current block,
wherein the affine MVP candidates include a first inherited affine candidate and a second inherited affine candidate,
wherein the CPMVP of the first inherited affine candidate is derived from a first candidate block encoded by affine prediction in a first block group consisting of a lower-left neighboring block of the current block and a left neighboring block adjacent to the top of the lower-left neighboring block,
wherein the first candidate block is determined from the first block group based on a predetermined first scanning order and has the same reference image as the reference image of the current block,
wherein the CPMVP of the second inherited affine candidate is derived from a second candidate block encoded by affine prediction in a second block group consisting of a top-left neighboring block, a top-right neighboring block, and a top neighboring block adjacent to the left side of the top-right neighboring block, and
wherein the second candidate block is determined from the second block group based on a predetermined second scanning order and has the same reference image as the reference image of the current block.
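
For illustration only, and without limiting the claims, the selection of inherited affine candidates and the derivation of the CPMV recited above can be sketched as follows. The sketch makes several assumptions that are not taken from the claims: each block group is passed in already arranged in its predetermined scanning order (the claims do not specify that order), the CPMVP that a neighboring block would yield for the current block is assumed to be precomputed and stored in a hypothetical cpmvp field, reference images are compared by a hypothetical ref_idx value, and the CPMV is taken to be the sum of the CPMVP and the CPMVD. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

MV = Tuple[int, int]


@dataclass
class Neighbor:
    is_affine: bool   # True if the block was encoded by affine prediction
    ref_idx: int      # hypothetical identifier of the block's reference image
    cpmvp: List[MV]   # hypothetical: CPMVPs this block yields for the current block


def first_inherited(group: Sequence[Optional[Neighbor]],
                    cur_ref_idx: int) -> Optional[List[MV]]:
    """Scan a block group in the given order; return the first qualifying CPMVPs."""
    for nb in group:
        # A missing neighbor is represented by None and skipped.
        if nb is not None and nb.is_affine and nb.ref_idx == cur_ref_idx:
            return nb.cpmvp
    return None


def build_inherited_candidates(first_group: Sequence[Optional[Neighbor]],
                               second_group: Sequence[Optional[Neighbor]],
                               cur_ref_idx: int) -> List[List[MV]]:
    """Collect at most one inherited affine candidate from each block group."""
    candidates = []
    for group in (first_group, second_group):
        cand = first_inherited(group, cur_ref_idx)
        if cand is not None:
            candidates.append(cand)
    return candidates


def cpmv_from_predictor(cpmvp: Sequence[MV], cpmvd: Sequence[MV]) -> List[MV]:
    """Assumed decoder-side derivation: CPMV = CPMVP + CPMVD for each CP."""
    return [(px + dx, py + dy) for (px, py), (dx, dy) in zip(cpmvp, cpmvd)]
```

Under the same assumption, an encoder would obtain the CPMVD for each CP by subtracting the CPMVP from the CPMV.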