Syntax design method and apparatus for coding using syntax

The syntax design method for image coding using high-level and low-level syntax elements for motion prediction on subblocks and affine models addresses the need for efficient compression of high-resolution and immersive media, improving coding efficiency and reducing costs.

JP2026110645A · Pending · Publication Date: 2026-07-02 · LG ELECTRONICS INC

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
LG ELECTRONICS INC
Filing Date
2026-04-16
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

The increasing demand for high-resolution and high-quality images/videos, as well as immersive media, necessitates a highly efficient image/video compression technology to effectively compress, transmit, and store this data while reducing costs.

Method used

A syntax design method and apparatus for image coding that uses high-level and low-level syntax elements for subblock-based and affine-model-based motion prediction, including determining whether to signal a predetermined merge mode flag based on the affine flag and the subblock TMVP flag.

Benefits of technology

Improves image coding efficiency by enhancing motion prediction techniques, leading to more effective compression and reduced transmission and storage costs.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026110645000001_ABST

Abstract

To provide an image decoding method performed by a decoding device. [Solution] The method includes the steps of: decoding, based on a bitstream, an affine flag indicating whether affine prediction can be applied to the current block and a subblock TMVP flag indicating whether a temporal motion vector predictor based on the subblocks of the current block can be used; determining whether to decode a predetermined merge mode flag indicating whether to apply a predetermined merge mode to the current block, based on the decoded affine flag and the decoded subblock TMVP flag; deriving a prediction sample for the current block based on the decision on whether to decode the predetermined merge mode flag; and generating a reconstructed sample for the current block based on the prediction sample for the current block.

Description

Technical Field

[0001] The present disclosure relates to image coding technology, and more particularly, to a syntax design method and an apparatus for performing coding using syntax in an image coding system.

Background Art

[0002] In recent years, the demand for high-resolution and high-quality images / videos, such as 4K or 8K and higher UHD (Ultra High Definition) images / videos, has been increasing in various fields. As image / video data become higher in resolution and quality, the amount of information or bits to be transmitted increases relative to existing image / video data. Therefore, when image data are transmitted over a medium such as an existing wired or wireless broadband line, or image / video data are stored using an existing storage medium, transmission and storage costs increase.

[0003] In addition, interest in and demand for immersive media such as VR (Virtual Reality) and AR (Augmented Reality) content and holograms have recently been increasing, and broadcasting of images / videos whose characteristics differ from those of real images, such as game images, has also been increasing.

[0004] Thus, there is a need for a highly efficient image / video compression technology to effectively compress, transmit, store, and reproduce the information of high-resolution and high-quality images / videos having various characteristics as described above.

Summary of the Invention

Problems to be Solved by the Invention

[0005] The technical problem of the present disclosure is to provide a method and an apparatus for increasing image coding efficiency.

[0006] Another technical issue addressed by this disclosure is to provide a syntax design method and an apparatus for coding using the syntax.

[0007] A further technical issue addressed by this disclosure is to provide high-level syntax and low-level syntax design methods and apparatus for coding using syntax.

[0008] A further technical issue addressed in this disclosure is to provide a method and apparatus for using high-level and / or low-level syntax elements to perform motion prediction based on subblocks.

[0009] Another technical issue addressed in this disclosure is to provide methods and apparatus for using high-level and / or low-level syntax elements to perform motion prediction based on an affine model.

[0010] A further technical issue of this disclosure is to provide a method and apparatus for determining whether to decode a predetermined merge mode flag, which indicates whether a predetermined merge mode should be applied to the current block, based on an affine flag and a subblock TMVP flag.

Means for Solving the Problem

[0011] According to one embodiment of the present disclosure, an image decoding method performed by a decoding device is provided. The method includes the steps of: decoding an affine flag indicating whether an affine prediction can be applied to the current block and a subblock TMVP flag indicating whether a temporal motion vector predictor based on a subblock of the current block can be used, based on a bitstream; determining whether to decode a predetermined merge mode flag indicating whether a predetermined merge mode should be applied to the current block, based on the decoded affine flag and the decoded subblock TMVP flag; deriving a prediction sample for the current block based on the decision to decode the predetermined merge mode flag; and generating a reconstructed sample for the current block based on the prediction sample for the current block, wherein it is determined to decode the predetermined merge mode flag if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1.
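
The decision rule in [0011] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the disclosure: each flag is modeled as a single raw bit read in a fixed order from a bit iterator, whereas a real decoder would entropy-decode these syntax elements. The names `should_decode_merge_mode_flag`, `parse_flags`, and `ParsedFlags` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ParsedFlags:
    affine_flag: int
    subblock_tmvp_flag: int
    # None means the merge mode flag was not present in the bitstream.
    merge_mode_flag: Optional[int]


def should_decode_merge_mode_flag(affine_flag: int, subblock_tmvp_flag: int) -> bool:
    # Rule stated in [0011]: decode the predetermined merge mode flag
    # if the affine flag is 1 or the subblock TMVP flag is 1.
    return affine_flag == 1 or subblock_tmvp_flag == 1


def parse_flags(bits: Iterator[int]) -> ParsedFlags:
    # Hypothetical toy layout: each flag is one raw bit, read in this order.
    affine_flag = next(bits)
    subblock_tmvp_flag = next(bits)
    merge_mode_flag = None
    if should_decode_merge_mode_flag(affine_flag, subblock_tmvp_flag):
        merge_mode_flag = next(bits)
    return ParsedFlags(affine_flag, subblock_tmvp_flag, merge_mode_flag)
```

For example, `parse_flags(iter([0, 1, 1]))` reads a merge mode flag of 1 because the subblock TMVP flag is 1, while `parse_flags(iter([0, 0]))` stops after two bits and leaves `merge_mode_flag` as `None`. How an absent flag is inferred is not specified in this paragraph.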

[0012] Another embodiment of the present disclosure provides a decoding device for performing image decoding. The decoding device comprises an entropy decoding unit that decodes, based on a bitstream, an affine flag indicating whether an affine prediction can be applied to the current block and a subblock TMVP flag indicating whether a temporal motion vector predictor based on a subblock of the current block can be used, and determines whether to decode a predetermined merge mode flag indicating whether a predetermined merge mode should be applied to the current block based on the decoded affine flag and the decoded subblock TMVP flag; a prediction unit that derives a prediction sample for the current block based on the decision on whether to decode the predetermined merge mode flag; and an addition unit that generates a reconstructed sample for the current block based on the prediction sample for the current block, wherein it is determined to decode the predetermined merge mode flag if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1.

[0013] A further embodiment of the present disclosure provides an image encoding method performed by an encoding device. The method includes the steps of: determining whether an affine prediction can be applied to a current block and whether a temporal motion vector predictor based on subblocks of the current block can be used; determining whether to encode a predetermined merge mode flag indicating whether a predetermined merge mode can be applied to the current block, based on the determinations regarding whether the affine prediction can be applied to the current block and whether the temporal motion vector predictor based on subblocks of the current block can be used; and encoding an affine flag indicating whether the affine prediction can be applied to the current block, a subblock TMVP flag indicating whether the temporal motion vector predictor based on subblocks of the current block can be used, and the predetermined merge mode flag, wherein it is determined to encode the predetermined merge mode flag if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1.
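
On the encoder side, the same condition gates whether the merge mode flag is written. The sketch below is illustrative only and rests on the same assumption as a toy decoder would: each flag is emitted as one raw bit in a fixed order, rather than being entropy-coded. The function name `encode_flags` and its parameters are hypothetical.

```python
from typing import List


def encode_flags(affine_enabled: bool, sbtmvp_enabled: bool, merge_mode_flag: int) -> List[int]:
    """Return the toy bit sequence for the affine, subblock TMVP, and merge mode flags."""
    if merge_mode_flag not in (0, 1):
        raise ValueError("merge_mode_flag must be 0 or 1")
    affine_flag = 1 if affine_enabled else 0
    subblock_tmvp_flag = 1 if sbtmvp_enabled else 0
    bits = [affine_flag, subblock_tmvp_flag]
    # Rule stated in [0013]: the predetermined merge mode flag is encoded
    # only if the affine flag is 1 or the subblock TMVP flag is 1.
    if affine_flag == 1 or subblock_tmvp_flag == 1:
        bits.append(merge_mode_flag)
    return bits
```

When neither tool is available, the merge mode flag is simply not written, which is where the bit saving comes from: `encode_flags(False, False, 1)` yields `[0, 0]`, while `encode_flags(True, False, 1)` yields `[1, 0, 1]`.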

[0014] According to yet another embodiment of the present disclosure, an encoding device for performing image encoding is provided. The encoding device comprises a prediction unit that determines whether an affine prediction can be applied to a current block and whether a temporal motion vector predictor based on a subblock of the current block can be used, and determines whether to encode a predetermined merge mode flag indicating whether a predetermined merge mode should be applied to the current block based on the determination of whether the affine prediction can be applied to the current block and whether the temporal motion vector predictor based on the subblock of the current block can be used, and an entropy encoding unit that encodes an affine flag indicating whether the affine prediction can be applied to the current block, a subblock TMVP flag indicating whether the temporal motion vector predictor based on the subblock of the current block can be used, and the predetermined merge mode flag, wherein it is determined to encode the predetermined merge mode flag if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1.

[0015] According to yet another embodiment of the present disclosure, a decoder-readable storage medium is provided that stores information relating to instructions that cause a video decoding device to perform the decoding method according to at least one of the embodiments.

[0016] According to yet another embodiment of the present disclosure, a decoder-readable storage medium is provided which stores information relating to instructions that cause a video decoding device to perform a decoding method according to one embodiment. The decoding method according to one embodiment includes the steps of: decoding, based on a bitstream, an affine flag indicating whether an affine prediction can be applied to the current block and a subblock TMVP flag indicating whether a temporal motion vector predictor based on a subblock of the current block can be used; determining whether to decode a predetermined merge mode flag indicating whether a predetermined merge mode can be applied to the current block based on the decoded affine flag and the decoded subblock TMVP flag; deriving a prediction sample for the current block based on the decision on whether to decode the predetermined merge mode flag; and generating a reconstructed sample for the current block based on the prediction sample for the current block, wherein it is determined to decode the predetermined merge mode flag if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1.

Effects of the Invention

[0017] According to this disclosure, overall image / video compression efficiency can be improved.

[0018] According to this disclosure, image coding efficiency can be increased through high-level syntax and low-level syntax designs.

[0019] According to this disclosure, image coding efficiency can be improved by using high-level and / or low-level syntax elements for motion prediction based on subblocks.

[0020] According to this disclosure, image coding efficiency can be improved by using high-level and / or low-level syntax elements for motion prediction based on an affine model.

[0021] According to the present disclosure, image coding efficiency can be improved by determining, based on an affine flag and a subblock TMVP flag, whether to decode a predetermined merge mode flag indicating whether to apply a predetermined merge mode to a current block.

Brief Description of Drawings

[0022]
[Figure 1] schematically shows an example of a video / image coding system to which the present disclosure can be applied.
[Figure 2] is a diagram schematically illustrating the configuration of a video / image encoding device to which the present disclosure can be applied.
[Figure 3] is a diagram schematically illustrating the configuration of a video / image decoding device to which the present disclosure can be applied.
[Figure 4] is a flowchart showing the operation of an encoding device according to an embodiment.
[Figure 5] is a block diagram showing the configuration of an encoding device according to an embodiment.
[Figure 6] is a flowchart showing the operation of a decoding device according to an embodiment.
[Figure 7] is a block diagram showing the configuration of a decoding device according to an embodiment.
[Figure 8] shows an example of a content streaming system to which the disclosure of this document can be applied.

Embodiments for Carrying Out the Invention

[0023] According to one embodiment of the present disclosure, an image decoding method performed by a decoding device is provided. The method includes the steps of: decoding an affine flag indicating whether an affine prediction can be applied to the current block and a subblock TMVP flag indicating whether a temporal motion vector predictor based on a subblock of the current block can be used, based on a bitstream; determining whether to decode a predetermined merge mode flag indicating whether a predetermined merge mode should be applied to the current block, based on the decoded affine flag and the decoded subblock TMVP flag; deriving a prediction sample for the current block based on the decision to decode the predetermined merge mode flag; and generating a reconstructed sample for the current block based on the prediction sample for the current block, wherein it is determined to decode the predetermined merge mode flag if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1.

[0024] This disclosure can be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit this disclosure to any specific embodiment. Terms used herein are used solely to describe specific embodiments and are not intended to limit the technical ideas of this disclosure. Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as “includes” or “has” herein are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

[0025] Meanwhile, each configuration shown in the drawings described in this disclosure is illustrated independently for convenience in explaining its distinct characteristic functions, and this does not mean that each configuration is implemented with separate hardware or separate software. For example, two or more of the configurations can be combined to form one configuration, and one configuration can be divided into multiple configurations. Embodiments in which each configuration is integrated and / or separated are also included in the scope of the rights of this disclosure, as long as they do not deviate from the essence of this disclosure.

[0026] Hereinafter, preferred embodiments of the present disclosure will be described in more detail with reference to the attached drawings. Hereafter, the same reference numerals will be used for the same components in the drawings, and redundant descriptions of the same components will be omitted.

[0027] Figure 1 schematically illustrates an example of a video / image coding system to which this disclosure may apply.

[0028] As shown in Figure 1, a video / image coding system may comprise a first device (source device) and a second device (receiving device). The source device can transmit encoded video / image information or data to the receiving device in file or streaming form via a digital storage medium or network.

[0029] The source device may comprise a video source, an encoding device, and a transmitter. The receiving device may comprise a receiver, a decoding device, and a renderer. The encoding device may be called a video / image encoding device, and the decoding device may be called a video / image decoding device. The transmitter may be provided in the encoding device. The receiver may be provided in the decoding device. The renderer may comprise a display unit, which may consist of a separate device or external component.

[0030] A video source can acquire video / images through processes such as video / image capture, synthesis, or generation. A video source may include video / image capture devices and / or video / image generation devices. Video / image capture devices may include, for example, one or more cameras, or a video / image archive containing previously captured video / images. Video / image generation devices may include, for example, computers, tablets, and smartphones, and can generate video / images (electronically). For example, virtual video / images may be generated via a computer, in which case the video / image capture process may be replaced by the process of generating the associated data.

[0031] An encoding device can encode input video / images. For compression and coding efficiency, the encoding device can perform a series of steps including prediction, transformation, and quantization. The encoded data (encoded video / image information) can be output in bitstream format.

[0032] The transmitting unit can transmit encoded video / image information or data output in bitstream format to the receiving unit of a receiving device via a digital storage medium or network in file or streaming format. The digital storage medium can include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The transmitting unit may include elements for generating media files via a predetermined file format and may include elements for transmission via a broadcast / communication network. The receiving unit can receive / extract the bitstream and transmit it to a decoding device.

[0033] A decoding device can decode video / images by performing a series of steps, such as inverse quantization, inverse transformation, and prediction, corresponding to the operation of an encoding device.

[0034] The renderer can render the decoded video / image. The rendered video / image can be displayed via the display unit.

[0035] This document relates to video / image coding. For example, the methods / embodiments disclosed in this document can be applied to methods disclosed in the VVC (versatile video coding) standard, the EVC (essential video coding) standard, the AV1 (AOMedia Video 1) standard, the AVS2 (2nd generation of audio video coding standard), or next-generation video / image coding standards (e.g., H.267 or H.268).

[0036] This document presents various embodiments of video / image coding, and unless otherwise noted, these embodiments may be combined with each other.

[0037] In this document, "video" can mean a collection of images or other data over time. A "picture" generally refers to a unit representing a single image in a specific time period, while a "slice" or "tile" is a unit that constitutes part of a picture in coding. A slice or tile can contain one or more CTUs (coding tree units). A single picture can consist of one or more slices or tiles, and can also consist of one or more tile groups. A tile group can contain one or more tiles. A brick can represent a rectangular region of CTU rows within a tile in a picture. A tile can be partitioned into multiple bricks, each of which consists of one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks may also be referred to as a brick. A brick scan can represent a specific sequential ordering of CTUs partitioning a picture, in which the CTUs are ordered consecutively in a CTU raster scan within a brick, bricks within a tile are ordered consecutively in a raster scan of the bricks of the tile, and tiles in a picture are ordered consecutively in a raster scan of the tiles of the picture. A tile is a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. A tile column is a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements in the picture parameter set. A tile row is a rectangular region of CTUs having a height specified by syntax elements in the picture parameter set and a width equal to the width of the picture. A tile scan can represent a specific sequential ordering of CTUs partitioning a picture, in which the CTUs are ordered consecutively in a CTU raster scan within a tile, and tiles in a picture are ordered consecutively in a raster scan of the tiles of the picture.
A slice may contain an integer number of bricks of a picture, and these bricks may be contained in a single NAL unit. A slice may consist of either a number of complete tiles or only a consecutive sequence of complete bricks of one tile. In this document, the terms tile group and slice may be used interchangeably. For example, in this document, a tile group / tile group header may be referred to as a slice / slice header.
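
The tile scan defined in [0037] can be made concrete with a short sketch that lists picture-raster CTU addresses in tile scan order. It assumes the tile column widths and tile row heights (in CTUs) are already known; in a real bitstream they would come from the picture parameter set syntax, which is not modeled here. Bricks are not modeled either, so this covers the tile scan only. The function name `tile_scan_order` is hypothetical.

```python
from itertools import accumulate
from typing import List


def tile_scan_order(col_widths: List[int], row_heights: List[int]) -> List[int]:
    """Return picture-raster CTU addresses listed in tile scan order.

    col_widths / row_heights give each tile column width and each tile row
    height in CTUs (hypothetical inputs standing in for the PPS syntax).
    """
    pic_width = sum(col_widths)
    col_starts = [0, *accumulate(col_widths)][:-1]
    row_starts = [0, *accumulate(row_heights)][:-1]
    order = []
    # Tiles are visited in raster order over the tile grid ...
    for row_start, height in zip(row_starts, row_heights):
        for col_start, width in zip(col_starts, col_widths):
            # ... and CTUs inside each tile are visited in CTU raster order.
            for y in range(row_start, row_start + height):
                for x in range(col_start, col_start + width):
                    order.append(y * pic_width + x)
    return order
```

For a 3×3-CTU picture with tile column widths `[2, 1]` and tile row heights `[1, 2]`, the result is `[0, 1, 2, 3, 4, 6, 7, 5, 8]`: the two lower tiles are each completed before moving on, so CTU 5 comes after CTUs 6 and 7.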

[0038] A pixel or pel can refer to the smallest unit that makes up a picture (or image). Alternatively, the term "sample" may be used as a counterpart to pixel. A sample can generally represent a pixel or a pixel value, and can represent only the luma component pixel / pixel value, or only the chroma component pixel / pixel value.

[0039] A unit can represent a basic unit of image processing. A unit can contain at least one of a specific region of a picture and information associated with that region. A unit can contain one luma block and two chroma (e.g., Cb, Cr) blocks. The term unit may sometimes be used interchangeably with terms such as block or area. In general, an M×N block can contain a sample (or sample array) or a set (or array) of transform coefficients consisting of M columns and N rows.

[0040] In this document, the terms " / " and "," should be interpreted as "and / or." For example, "A / B" is interpreted as "A and / or B," and "A, B" is interpreted as "A and / or B." Additionally, "A / B / C" means "at least one of A, B, and / or C." Similarly, "A, B, C" also means "at least one of A, B, and / or C."

[0041] In addition, in this document, "or" should be interpreted as "and / or." For example, "A or B" can mean 1) only "A," 2) only "B," or 3) both "A and B." In other words, "or" in this document can mean "additionally or alternatively."

[0042] Figure 2 is a schematic diagram illustrating the configuration of a video / image encoding device to which this disclosure may apply. Hereinafter, the term "video encoding device" may include an image encoding device.

[0043] As shown in Figure 2, the encoding device 200 can be configured to include an image partitioner 210, a predictor 220, a residual processor 230, an entropy encoder 240, an adder 250, a filter 260, and a memory 270. The predictor 220 may include an inter-prediction unit 221 and an intra-prediction unit 222. The residual processor 230 may include a transformer 232, a quantizer 233, a dequantizer 234, and an inverse transformer 235. The residual processor 230 may further include a subtractor 231. The adder 250 may be called a reconstructor or a reconstructed block generator. The image partitioner 210, predictor 220, residual processor 230, entropy encoder 240, adder 250, and filter 260 described above can be configured by one or more hardware components (e.g., an encoder chipset or processor) depending on the embodiment. The memory 270 may also include a DPB (decoded picture buffer) and may be configured by a digital storage medium. The hardware components may further include the memory 270 as an internal / external component.

[0044] The image partitioner 210 can split an input image (or picture, frame) input to the encoding device 200 into one or more processing units. For example, a processing unit may be called a coding unit (CU). In this case, coding units can be recursively split from a coding tree unit (CTU) or a largest coding unit (LCU) according to a QTBTTT (quad-tree binary-tree ternary-tree) structure. For example, one coding unit can be split into multiple coding units of deeper depth based on a quad-tree structure, a binary-tree structure, and / or a ternary-tree structure. In this case, for example, the quad-tree structure may be applied first, followed by the binary-tree structure and / or the ternary-tree structure. Alternatively, the binary-tree structure may be applied first. The coding procedure according to this disclosure may be performed based on a final coding unit that is not further split. In this case, depending on coding efficiency according to image characteristics, the largest coding unit can be used directly as the final coding unit, or, if necessary, the coding unit can be recursively split into coding units of deeper depth so that a coding unit of optimal size is used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transformation, and reconstruction, which will be described later. As another example, the processing unit may further comprise a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit can each be split or partitioned from the final coding unit described above. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving transform coefficients and / or a unit for deriving a residual signal from transform coefficients.
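
The recursive QTBTTT splitting in [0044] can be illustrated with the geometry of a single split step plus a recursion that stops at final coding units. This is a sketch under assumptions not stated in the paragraph: the ternary split uses a 1:2:1 ratio (as in VVC), the mode names are hypothetical, and the caller-supplied `decide` function stands in for the encoder's rate-distortion decision. Ordering constraints between quad-tree and binary / ternary splits are not enforced.

```python
from typing import Callable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in luma samples


def split_block(rect: Rect, mode: str) -> List[Rect]:
    """Return the child blocks produced by one split step."""
    x, y, w, h = rect
    if mode == "quad":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "bin_vert":   # two halves side by side
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "bin_horz":   # two halves stacked vertically
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "tern_vert":  # assumed 1:2:1 split of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "tern_horz":  # assumed 1:2:1 split of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")


def leaf_blocks(rect: Rect, decide: Callable[[Rect], Optional[str]]) -> List[Rect]:
    """Recursively split until decide() returns None; the leaves are final CUs."""
    mode = decide(rect)
    if mode is None:
        return [rect]
    leaves: List[Rect] = []
    for child in split_block(rect, mode):
        leaves.extend(leaf_blocks(child, decide))
    return leaves
```

For example, `leaf_blocks((0, 0, 128, 128), lambda r: "quad" if r[2] > 64 else None)` returns four 64×64 final coding units.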

[0045] The term "unit" can sometimes be used interchangeably with terms such as "block" or "area." Generally, an M×N block can represent a set of samples or transform coefficients consisting of M columns and N rows. A sample can generally represent a pixel or a pixel value, and may represent only the luminance (luma) component pixel / pixel value, or only the chroma component pixel / pixel value. A sample can be used as the term corresponding to a single picture (or image) pixel or pel.

[0046] The encoding device 200 can generate a residual signal (residual block, residual sample array) by subtracting the prediction signal (predicted block, predicted sample array) output from the inter-prediction unit 221 or the intra-prediction unit 222 from the input image signal (original block, original sample array), and the generated residual signal is transmitted to the transformation unit 232. In this case, as shown in the figure, the unit within the encoding device 200 that subtracts the prediction signal (predicted block, predicted sample array) from the input image signal (original block, original sample array) can be called the subtraction unit 231. The prediction unit can perform prediction on a block to be processed (hereinafter referred to as the current block) and generate a predicted block that includes predicted samples for the current block. The prediction unit can determine whether intra-prediction or inter-prediction is applied on a current block or CU basis. As will be described later in the explanation of each prediction mode, the prediction unit can generate various kinds of prediction-related information, such as prediction mode information, and transmit it to the entropy encoding unit 240. The prediction information can be encoded by the entropy encoding unit 240 and output in bitstream format.
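
The subtraction described in [0046], and the matching addition that later rebuilds the block, can be sketched sample by sample. This is a simplified illustration: blocks are plain nested lists of integers, clipping to the sample bit depth is omitted, and the residual is passed through unchanged, whereas in practice it is transformed and quantized, so the reconstruction is generally only an approximation of the original. The function names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def residual_block(original: Block, predicted: Block) -> Block:
    # Subtraction unit: residual = original - prediction, sample by sample.
    return [[o - p for o, p in zip(orow, prow)] for orow, prow in zip(original, predicted)]


def reconstruct_block(predicted: Block, residual: Block) -> Block:
    # Addition unit: reconstruction = prediction + (decoded) residual.
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(predicted, residual)]
```

With `original = [[10, 12], [14, 16]]` and a flat prediction of 11, the residual is `[[-1, 1], [3, 5]]`, and adding it back to the prediction recovers the original exactly because no quantization was applied.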

[0047] The intra-prediction unit 222 can predict the current block by referring to samples in the current picture. Depending on the prediction mode, the referenced samples can be located in the neighborhood of the current block or apart from it. The prediction modes in intra-prediction can include multiple non-directional modes and multiple directional modes. Non-directional modes can include, for example, DC mode and planar mode. Directional modes can include, for example, 33 directional prediction modes or 65 directional prediction modes, depending on the granularity of the prediction direction. However, this is illustrative, and more or fewer directional prediction modes may be used depending on the settings. The intra-prediction unit 222 can also determine the prediction mode to apply to the current block using the prediction modes applied to neighboring blocks.
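
As a concrete instance of a non-directional mode from [0047], DC mode fills the block with the average of its neighboring reference samples. The sketch below is a simplification, not a standard-conformant DC predictor: it averages the top row and left column together with round-half-up integer division, whereas standards such as VVC treat non-square blocks differently and define reference sample availability and substitution rules. The function name `dc_predict` is hypothetical.

```python
from typing import List


def dc_predict(top: List[int], left: List[int], width: int, height: int) -> List[List[int]]:
    """Fill a width x height block with the rounded mean of the neighboring samples."""
    if width <= 0 or height <= 0 or len(top) < width or len(left) < height:
        raise ValueError("need width top samples and height left samples")
    neighbours = top[:width] + left[:height]
    n = len(neighbours)
    dc = (sum(neighbours) + n // 2) // n  # integer mean, rounding halves up
    return [[dc] * width for _ in range(height)]
```

For a 4×4 block with top neighbors `[100, 102, 104, 106]` and left neighbors `[98, 100, 102, 104]`, every predicted sample is 102.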

[0048] The inter-prediction unit 221 can derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. In this case, in order to reduce the amount of motion information transmitted in inter-prediction mode, motion information can be predicted in units of blocks, subblocks, or samples based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter-prediction direction information (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter-prediction, neighboring blocks may include spatial neighboring blocks present in the current picture and temporal neighboring blocks present in a reference picture. The reference picture containing the reference block and the reference picture containing the temporal neighboring block may be the same or different. The temporal neighboring block may be called a collocated reference block, col CU, etc., and the reference picture containing the temporal neighboring block may be called a collocated picture (colPic). For example, the inter-prediction unit 221 can construct a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and / or reference picture index of the current block. Inter-prediction can be performed based on various prediction modes; for example, in skip mode and merge mode, the inter-prediction unit 221 can use the motion information of a neighboring block as the motion information of the current block.
In skip mode, unlike merge mode, a residual signal may not be transmitted. In motion vector prediction (MVP) mode, the motion vector of a neighboring block is used as a motion vector predictor, and a motion vector difference is signaled to indicate the motion vector of the current block.
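
The difference between merge mode and MVP mode described in [0048] can be shown with a small sketch. It assumes the candidate list has already been built from neighboring blocks, and it ignores details such as motion vector scaling to the target reference picture and bi-prediction. `MotionInfo`, `derive_merge`, and `derive_mvp` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

MV = Tuple[int, int]


@dataclass
class MotionInfo:
    mv: MV
    ref_idx: int


def derive_merge(candidates: List[MotionInfo], merge_idx: int) -> MotionInfo:
    # Merge / skip mode: the selected neighbor's motion information is reused as-is.
    return candidates[merge_idx]


def derive_mvp(candidates: List[MotionInfo], mvp_idx: int, mvd: MV, ref_idx: int) -> MotionInfo:
    # MVP mode: the selected neighbor's motion vector is only a predictor;
    # the signaled motion vector difference (MVD) is added to it, and the
    # reference index is signaled separately.
    px, py = candidates[mvp_idx].mv
    return MotionInfo((px + mvd[0], py + mvd[1]), ref_idx)
```

With candidates `[MotionInfo((4, -2), 0), MotionInfo((8, 0), 1)]`, merge index 1 copies `((8, 0), 1)` directly, while MVP index 0 with an MVD of `(1, 3)` gives a motion vector of `(5, 1)`. Merge mode only has to signal an index, which is why it saves motion information bits.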

[0049] The prediction unit 220 can generate prediction signals based on various prediction methods described later. For example, the prediction unit can apply intra-prediction or inter-prediction for prediction of a single block, and can also apply intra-prediction and inter-prediction simultaneously. This can be called combined inter and intra prediction (CIIP). The prediction unit can also perform prediction of a block based on an intra-block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for content image / video coding, for example in SCC (screen content coding). IBC basically performs prediction within the current picture, but can be performed similarly to inter-prediction in that it derives reference blocks within the current picture. That is, IBC can use at least one of the inter-prediction techniques described in this document. Palette mode can be considered an example of intra coding or intra-prediction. When palette mode is applied, sample values within the picture can be signaled based on information about the palette table and palette indices.
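
The palette mode signaling mentioned in [0049] can be illustrated as a palette table plus a per-sample index map. The sketch below is an assumption-laden simplification: the encoder-side mapping picks the nearest palette entry for each sample (lossy), and escape samples, palette prediction, and index run coding are not modeled. The function names are hypothetical.

```python
from typing import List


def palette_indices(block: List[List[int]], palette: List[int]) -> List[List[int]]:
    # Hypothetical encoder-side mapping: choose the nearest palette entry per sample.
    return [[min(range(len(palette)), key=lambda i: abs(palette[i] - s)) for s in row]
            for row in block]


def palette_reconstruct(palette: List[int], indices: List[List[int]]) -> List[List[int]]:
    # Decoder side: each sample is the palette entry selected by its index.
    return [[palette[i] for i in row] for row in indices]
```

With the palette `[16, 128, 235]`, the block `[[16, 16], [130, 235]]` maps to indices `[[0, 0], [1, 2]]` and reconstructs as `[[16, 16], [128, 235]]`; the sample 130 is approximated by 128. Screen content with few distinct colors suits this representation well.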

[0050] The prediction signal generated via the prediction unit (comprising the inter-prediction unit 221 and / or the intra-prediction unit 222) can be used to generate a reconstructed signal or a residual signal. The transformation unit 232 can generate transform coefficients by applying a transform technique to the residual signal. For example, the transform technique may include at least one of DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), KLT (Karhunen-Loeve Transform), GBT (Graph-Based Transform), or CNT (Conditionally Non-linear Transform). Here, GBT refers to a transform obtained from a graph when relationship information between pixels is represented as a graph. CNT refers to a transform obtained based on a prediction signal generated using all previously reconstructed pixels. The transform process can be applied to square pixel blocks of the same size, or to non-square blocks of variable size.
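
The DCT listed in [0050] can be illustrated with a floating-point, orthonormal 2-D DCT-II applied separably to a residual block, which also covers the non-square case. This is a sketch for intuition only: video coding standards use scaled integer approximations of these basis functions, and DST, KLT, GBT, and CNT are not shown. The helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    # Orthonormal DCT-II basis: row k holds the k-th cosine basis vector.
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def dct2d(block: Matrix) -> Matrix:
    # Separable 2-D transform of an H x W block: C_H * X * C_W^T.
    c_v = dct_matrix(len(block))      # vertical (column) transform
    c_h = dct_matrix(len(block[0]))   # horizontal (row) transform
    return matmul(matmul(c_v, block), transpose(c_h))
```

A flat 4×4 residual of value 10 compacts into a single DC coefficient of 40 with all other coefficients (numerically) zero, which is the energy compaction that makes the subsequent quantization and entropy coding effective.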

[0051] The quantization unit 233 quantizes the transformation coefficients and transmits them to the entropy encoding unit 240, which can encode the quantized signal (information about the quantized transformation coefficients) and output it as a bitstream. The information about the quantized transformation coefficients can be called residual information. The quantization unit 233 can rearrange the block-form quantized transformation coefficients into a one-dimensional vector based on the coefficient scan order, and can generate the information about the quantized transformation coefficients based on that one-dimensional vector. The entropy encoding unit 240 can perform various encoding methods, such as exponential Golomb, CAVLC (context-adaptive variable length coding), and CABAC (context-adaptive binary arithmetic coding). In addition to the quantized transformation coefficients, the entropy encoding unit 240 can also encode information necessary for video / image restoration (e.g., the values of syntax elements), either together with or separately from the quantized transformation coefficients. Encoded information (e.g., encoded video / image information) can be transmitted or stored in bitstream form in units of network abstraction layer (NAL) units. The video / image information may further include information about various parameter sets, such as the adaptation parameter set (APS), picture parameter set (PPS), sequence parameter set (SPS), or video parameter set (VPS). The video / image information may also further include general constraint information. In this document, information and / or syntax elements transmitted / signaled from the encoding device to the decoding device may be included in the video / image information. The video / image information may be encoded via the encoding procedure described above and included in the bitstream. The bitstream can be transmitted over a network or stored in a digital storage medium.
Here, the network may include broadcast networks and / or communication networks, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The signal output from the entropy encoding unit 240 can be transmitted by a transmitting unit (not shown) and / or stored by a storage unit (not shown) which are configured as internal / external elements of the encoding device 200, or the transmitting unit may be included in the entropy encoding unit 240.
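The quantization and coefficient-scan steps described above can be sketched as follows. This is a minimal illustration assuming uniform scalar quantization with a fixed step size (rounding half away from zero) and a simplified whole-block up-right diagonal scan; actual codecs derive the step from a quantization parameter and define their own scan orders, and all names here are hypothetical.

```python
from typing import List, Tuple


def quantize(coeffs: List[List[float]], step: float) -> List[List[int]]:
    """Uniform scalar quantization, rounding half away from zero."""
    def q(c: float) -> int:
        mag = int(abs(c) / step + 0.5)
        return mag if c >= 0 else -mag
    return [[q(c) for c in row] for row in coeffs]


def diagonal_scan(h: int, w: int) -> List[Tuple[int, int]]:
    """Up-right diagonal order: anti-diagonals starting at the top-left corner,
    each traversed from bottom-left to top-right. Returns (y, x) positions."""
    order = []
    for d in range(h + w - 1):
        for y in range(min(d, h - 1), -1, -1):
            x = d - y
            if x < w:
                order.append((y, x))
    return order


def to_vector(levels: List[List[int]]) -> List[int]:
    """Rearrange a 2-D block of quantized levels into a 1-D vector."""
    h, w = len(levels), len(levels[0])
    return [levels[y][x] for (y, x) in diagonal_scan(h, w)]
```

With a step size of 4, the coefficients `[[10, -7], [3, 0.4]]` quantize to `[[3, -2], [1, 0]]`, which the scan turns into the vector `[3, 1, -2, 0]` handed to the entropy coder.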

[0052] The quantized transformation coefficients output from the quantization unit 233 can be used to generate a prediction signal. For example, a residual signal (residual block or residual samples) can be reconstructed by applying inverse quantization and inverse transformation to the quantized transformation coefficients via the inverse quantization unit 234 and the inverse transformation unit 235. The adder 250 can generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the reconstructed residual signal to the prediction signal output from the inter-prediction unit 221 or the intra-prediction unit 222. If there is no residual for the block to be processed, such as when skip mode is applied, the predicted block can be used as the reconstructed block. The adder 250 can be called the reconstruction unit or reconstructed block generation unit. The generated reconstructed signal can be used for intra-prediction of the next block to be processed in the current picture or, as described later, for inter-prediction of the next picture after filtering.

[0053] Meanwhile, LMCS (luma mapping with chroma scaling) can also be applied during the picture encoding and / or restoration process.

[0054] The filtering unit 260 can improve subjective / objective image quality by applying filtering to the restored signal. For example, the filtering unit 260 can apply various filtering methods to the restored picture to generate a modified restored picture, and store the modified restored picture in the memory 270, specifically in the DPB of the memory 270. The various filtering methods can include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, and bilateral filter. The filtering unit 260 can generate various filtering-related information and transmit it to the entropy encoding unit 240, as will be described later in the description of each filtering method. The filtering-related information can be encoded by the entropy encoding unit 240 and output in bitstream form.

[0055] The modified restored picture sent to the memory 270 can be used as a reference picture in the inter-prediction unit 221. When inter-prediction is applied in this way, the encoding device can avoid prediction mismatches between the encoding device 200 and the decoding device, and can also improve encoding efficiency.

[0056] The DPB in memory 270 can store the corrected restored picture for use as a reference picture in the inter-prediction unit 221. Memory 270 can store motion information of blocks from which motion information in the current picture has been derived (or encoded) and / or motion information of blocks in the picture that have already been restored. The stored motion information can be transmitted to the inter-prediction unit 221 for use as motion information of spatially surrounding blocks or motion information of temporally surrounding blocks. Memory 270 can store restored samples of restored blocks in the current picture and transmit them to the intra-prediction unit 222.

[0057] Figure 3 is a schematic diagram illustrating the configuration of a video / image decoding device to which this disclosure may apply.

[0058] As shown in Figure 3, the decoding device 300 can be configured to include an entropy decoder 310, a residual processor 320, a predictor 330, an adder 340, a filter 350, and a memory 360. The predictor 330 may include an inter-prediction unit 332 and an intra-prediction unit 331. The residual processor 320 may include a dequantizer 321 and an inverse transformer 322. The entropy decoder 310, residual processor 320, predictor 330, adder 340, and filter 350 described above can be configured by a single hardware component (e.g., a decoder chipset or processor) depending on the embodiment. The memory 360 may include a decoded picture buffer (DPB) and may also be configured by a digital storage medium. The aforementioned hardware component may also further include the memory 360 as an internal / external component.

[0059] When a bitstream containing video / image information is input, the decoding device 300 can reconstruct the image through a process corresponding to the one by which the video / image information was processed in the encoding device shown in Figure 2. For example, the decoding device 300 can derive units / blocks based on block partitioning information obtained from the bitstream. The decoding device 300 can perform decoding using the processing units applied in the encoding device. Thus, the processing unit for decoding can be, for example, a coding unit, which can be split from a coding tree unit or a largest coding unit according to a quad-tree structure, a binary-tree structure, and / or a ternary-tree structure. One or more transformation units can be derived from the coding unit. The reconstructed image signal decoded and output via the decoding device 300 can then be reproduced via a playback device.

[0060] The decoding device 300 can receive the signal output from the encoding device shown in Figure 2 in bitstream form, and the received signal can be decoded via the entropy decoding unit 310. For example, the entropy decoding unit 310 can parse the bitstream to derive information necessary for image restoration (or picture restoration) (e.g., video / image information). The video / image information may further include information about various parameter sets, such as the adaptation parameter set (APS), picture parameter set (PPS), sequence parameter set (SPS), or video parameter set (VPS). The video / image information may also further include general constraint information. The decoding device can further decode the picture based on the parameter set information and / or the general constraint information. The signaled / received information and / or syntax elements described later in this document can be decoded via the decoding procedure and obtained from the bitstream. For example, the entropy decoding unit 310 can decode information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and output the values of syntax elements necessary for image reconstruction and the quantized values of the transformation coefficients related to the residual. More specifically, the CABAC entropy decoding method receives the bins corresponding to each syntax element in the bitstream, determines a context model using the information of the syntax element to be decoded, the decoding information of neighboring blocks and of the block to be decoded, or the symbol / bin information decoded in a previous step, predicts the probability of occurrence of a bin based on the determined context model, performs arithmetic decoding of the bin, and generates a symbol corresponding to the value of each syntax element.
In this case, after determining the context model, the CABAC entropy decoding method can update it using the information of the decoded symbol / bin, for use as the context model of the next symbol / bin. Of the information decoded by the entropy decoding unit 310, information related to prediction is provided to the prediction unit (inter-prediction unit 332 and intra-prediction unit 331), and the residual values entropy-decoded by the entropy decoding unit 310, i.e., the quantized transformation coefficients and related parameter information, can be input to the residual processing unit 320. The residual processing unit 320 can derive residual signals (residual blocks, residual samples, residual sample arrays). In addition, of the information decoded by the entropy decoding unit 310, information related to filtering can be provided to the filtering unit 350. Meanwhile, a receiving unit (not shown) that receives signals output from the encoding device can be further configured as an internal / external element of the decoding device 300, or the receiving unit can be a component of the entropy decoding unit 310. Meanwhile, the decoding device according to this document may be called a video / image / picture decoding device, and the decoding device may also be divided into an information decoder (video / image / picture information decoder) and a sample decoder (video / image / picture sample decoder). The information decoder may include the entropy decoding unit 310, and the sample decoder may include at least one of the inverse quantization unit 321, inverse transformation unit 322, addition unit 340, filtering unit 350, memory 360, inter-prediction unit 332, and intra-prediction unit 331.
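CABAC is too involved for a short example, but exponential Golomb decoding, the simplest of the methods named above, can be sketched compactly. The sketch below decodes 0-th order Exp-Golomb codewords: count the leading zeros n, skip the terminating 1, then read n suffix bits, giving the value 2^n - 1 + suffix. Representing the bitstream as a string of '0'/'1' characters, and the function names, are illustrative assumptions.

```python
from typing import List, Tuple


def read_ue(bits: str, pos: int) -> Tuple[int, int]:
    """Decode one unsigned 0-th order Exp-Golomb codeword starting at bits[pos].
    Returns (value, position just after the codeword)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    pos += zeros + 1                      # skip the leading zeros and the '1'
    suffix = bits[pos:pos + zeros]
    value = (1 << zeros) - 1 + (int(suffix, 2) if zeros else 0)
    return value, pos + zeros


def read_se(bits: str, pos: int) -> Tuple[int, int]:
    """Signed mapping: codeNum k = 0, 1, 2, 3, 4, ... maps to 0, 1, -1, 2, -2, ..."""
    k, pos = read_ue(bits, pos)
    magnitude = (k + 1) // 2
    return (magnitude if k % 2 else -magnitude), pos


def read_all_ue(bits: str) -> List[int]:
    """Decode a concatenation of unsigned codewords."""
    pos, out = 0, []
    while pos < len(bits):
        value, pos = read_ue(bits, pos)
        out.append(value)
    return out
```

For example, the bit string `1 010 011 00100` decodes to the values 0, 1, 2, 3.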

[0061] The inverse quantization unit 321 can inverse quantize the quantized transformation coefficients and output the transformation coefficients. The inverse quantization unit 321 can rearrange the quantized transformation coefficients in a two-dimensional block form. In this case, the rearrangement can be performed based on the coefficient scan order performed by the encoding device. The inverse quantization unit 321 can perform inverse quantization on the quantized transformation coefficients using quantization parameters (e.g., quantization step size information) and obtain the transformation coefficients.
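The decoder-side rearrangement and inverse quantization can be sketched as the mirror image of the encoder's scan and quantization. This minimal sketch assumes the same simplified whole-block up-right diagonal scan on both sides and a uniform step size; real decoders derive scaling from the quantization parameter with integer arithmetic, and the function names are hypothetical.

```python
from typing import List, Tuple


def diagonal_scan(h: int, w: int) -> List[Tuple[int, int]]:
    """Simplified up-right diagonal order; must match the encoder's scan."""
    order = []
    for d in range(h + w - 1):
        for y in range(min(d, h - 1), -1, -1):
            x = d - y
            if x < w:
                order.append((y, x))
    return order


def from_vector(levels: List[int], h: int, w: int) -> List[List[int]]:
    """Inverse scan: place the 1-D levels back into a 2-D h x w block."""
    block = [[0] * w for _ in range(h)]
    for level, (y, x) in zip(levels, diagonal_scan(h, w)):
        block[y][x] = level
    return block


def dequantize(block: List[List[int]], step: float) -> List[List[float]]:
    """Uniform inverse quantization: scale each level by the step size."""
    return [[level * step for level in row] for row in block]
```

The vector `[3, 1, -2, 0]` is restored to the block `[[3, -2], [1, 0]]`; with step 4 this dequantizes to `[[12, -8], [4, 0]]`.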

[0062] The inverse transformation unit 322 inversely transforms the transformation coefficients to obtain a residual signal (residual block, residual sample array).

[0063] The prediction unit can make predictions for the current block and generate a predicted block containing prediction samples for the current block. Based on the prediction information output from the entropy decoding unit 310, the prediction unit can determine whether intra-prediction or inter-prediction is applied to the current block, and can determine a specific intra / inter-prediction mode.

[0064] The prediction unit 330 can generate prediction signals based on various prediction methods described later. For example, the prediction unit can apply intra-prediction or inter-prediction to predict a single block, and can also apply intra-prediction and inter-prediction simultaneously. This can be called combined inter and intra prediction (CIIP). The prediction unit can also predict a block based on an intra block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for coding content images / videos such as games, for example in SCC (screen content coding). IBC basically performs prediction within the current picture, but it is similar to inter-prediction in that it derives a reference block, here located within the current picture. That is, IBC can use at least one of the inter-prediction techniques described in this document. Palette mode can be considered an example of intra-coding or intra-prediction. When palette mode is applied, information about the palette table and palette index can be included in the video / image information and signaled.
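Palette-mode reconstruction, in which each sample is recovered by looking up its signaled index in the palette table, can be sketched as follows. The sketch assumes an escape convention in which an index equal to the palette size means the sample value is coded explicitly and is taken, in raster order, from a separate list; that convention and the function name are stated assumptions here, not details taken from this document.

```python
from typing import Iterator, List, Sequence


def palette_reconstruct(palette: Sequence[int],
                        indices: List[List[int]],
                        escapes: Sequence[int] = ()) -> List[List[int]]:
    """Rebuild a block from a palette table and a per-sample index map.
    An index equal to len(palette) is treated as an escape: its value is
    taken, in raster order, from `escapes` instead of from the table."""
    esc: Iterator[int] = iter(escapes)
    out = []
    for row in indices:
        out_row = []
        for idx in row:
            out_row.append(next(esc) if idx == len(palette) else palette[idx])
        out.append(out_row)
    return out
```

With the palette `[16, 128, 235]`, the index map `[[0, 0], [2, 1]]` reconstructs to `[[16, 16], [235, 128]]`.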

[0065] The intra-prediction unit 331 can predict the current block by referring to a sample in the current picture. The referenced sample can be located in the vicinity (neighbor) of the current block or at a distance from it, depending on the prediction mode. In intra-prediction, the prediction mode can include multiple non-directional modes and multiple directional modes. The intra-prediction unit 331 can also determine the prediction mode to be applied to the current block using the prediction modes applied to the surrounding blocks.

[0066] The inter-prediction unit 332 can derive a predicted block for the current block based on a reference block (reference sample array) identified by a motion vector on the reference picture. In this case, to reduce the amount of motion information transmitted in inter-prediction mode, motion information can be predicted in units of blocks, subblocks, or samples based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter-prediction direction information (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter-prediction, neighboring blocks may include spatial neighboring blocks that exist in the current picture and temporal neighboring blocks that exist in the reference picture. For example, the inter-prediction unit 332 can construct a motion information candidate list based on neighboring blocks and derive the motion vector and / or reference picture index of the current block based on the received candidate selection information. Inter-prediction can be performed based on various prediction modes, and the prediction information may include information indicating the inter-prediction mode for the current block.
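Building a motion information candidate list from neighboring blocks and selecting an entry with the received candidate selection information can be sketched as below. The neighbor order, the duplicate pruning, and the list size are illustrative assumptions; real merge lists also add temporal, history-based, and other candidates. `MotionInfo`, `build_candidate_list`, and `select_motion` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class MotionInfo:
    mv: Tuple[int, int]   # motion vector (x, y)
    ref_idx: int          # reference picture index


def build_candidate_list(neighbors: Sequence[Optional[MotionInfo]],
                         max_cands: int) -> List[MotionInfo]:
    """Collect neighbor motion in a fixed order, skipping unavailable
    neighbors (None) and exact duplicates, up to max_cands entries."""
    cands: List[MotionInfo] = []
    for n in neighbors:
        if n is not None and n not in cands:
            cands.append(n)
        if len(cands) == max_cands:
            break
    return cands


def select_motion(neighbors: Sequence[Optional[MotionInfo]],
                  cand_idx: int, max_cands: int = 5) -> MotionInfo:
    """Decoder side: rebuild the same list and pick the signaled entry."""
    return build_candidate_list(neighbors, max_cands)[cand_idx]
```

Because the encoder and the decoder build the list from the same already-decoded neighbors, only the index has to be transmitted instead of the full motion vector and reference index.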

[0067] The adder 340 can generate a restored signal (restored picture, restored block, restored sample array) by adding the acquired residual signal to the predicted signal (predicted block, predicted sample array) output from the prediction unit (which comprises an inter-prediction unit 332 and / or an intra-prediction unit 331). If there is no residual for the block to be processed, such as when skip mode is applied, the predicted block can be used as the restored block.

[0068] The adder 340 may be called the restoration unit or restoration block generation unit. The generated restoration signal can be used for intra-prediction of the next block to be processed in the current picture, can be output after filtering as described later, or can be used for inter-prediction of the next picture.

[0069] Meanwhile, LMCS (luma mapping with chroma scaling) can also be applied during the picture decoding process.

[0070] The filtering unit 350 can apply filtering to the restored signal to improve subjective / objective image quality. For example, the filtering unit 350 can apply various filtering methods to the restored picture to generate a modified restored picture, and can transmit the modified restored picture to the memory 360, specifically to the DPB of the memory 360. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, and bilateral filter.

[0071] The (modified) restored picture stored in the DPB of memory 360 can be used as a reference picture by the inter-prediction unit 332. Memory 360 can store motion information of blocks from which motion information in the current picture has been derived (or decoded) and / or motion information of blocks in the picture that have already been restored. The stored motion information can be transmitted to the inter-prediction unit 332 for use as motion information of spatially surrounding blocks or motion information of temporally surrounding blocks. Memory 360 can store restored samples of restored blocks in the current picture and transmit them to the intra-prediction unit 331.

[0072] In this specification, embodiments described for the filtering unit 260, inter-prediction unit 221, and intra-prediction unit 222 of the encoding device 200 can also be applied identically or in a corresponding manner to the filtering unit 350, inter-prediction unit 332, and intra-prediction unit 331 of the decoding device 300, respectively.

[0073] As mentioned above, prediction is performed to improve compression efficiency when performing video coding. This allows for the generation of a predicted block containing predicted samples for the current block, which is the block to be coded. Here, the predicted block contains predicted samples in the spatial domain (or pixel domain). The predicted block is derived in the same way by the encoding and decoding devices, and the encoding device can improve image coding efficiency by signaling to the decoding device information about the residual between the original block and the predicted block (residual information), rather than the original sample values of the original block themselves. The decoding device can derive a residual block containing residual samples based on the residual information, and can generate a restored block containing restored samples by adding the residual block and the predicted block, thereby generating a restored picture containing the restored block.

[0074] The residual information can be generated through transformation and quantization procedures. For example, an encoding device can derive a residual block between the original block and the predicted block, perform a transformation procedure on the residual samples (residual sample array) contained in the residual block to derive transformation coefficients, perform a quantization procedure on the transformation coefficients to derive quantized transformation coefficients, and signal the relevant residual information to a decoding device (via a bitstream). Here, the residual information may include information such as the value information and position information of the quantized transformation coefficients, the transformation technique, the transformation kernel, and the quantization parameters. The decoding device can derive a residual sample (or residual block) by performing an inverse quantization / inverse transformation procedure based on the residual information. The decoding device can generate a reconstructed picture based on the predicted block and the residual block. The encoding device can also derive a residual block by inverse quantization / inverse transformation of the quantized transformation coefficients, for use as a reference in the inter-prediction of subsequent pictures, and generate a reconstructed picture based on it.
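The residual round trip described above can be sketched end to end. To keep it short, this sketch assumes an identity transformation (the transform step is omitted), a uniform quantization step with rounding half away from zero, and 8-bit samples; the function names are hypothetical. The point it shows is that the encoder computes its reference reconstruction with the same function the decoder uses, so both sides obtain identical samples even though quantization is lossy.

```python
from typing import List

Block = List[List[int]]


def clip(v: int, bit_depth: int) -> int:
    """Clamp a sample to the valid range [0, 2^bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, v))


def encode_residual(orig: Block, pred: Block, step: int) -> Block:
    """Encoder: residual = original - prediction, then quantize (transform omitted)."""
    def q(r: int) -> int:
        mag = (abs(r) + step // 2) // step
        return mag if r >= 0 else -mag
    return [[q(o - p) for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]


def reconstruct(pred: Block, levels: Block, step: int, bit_depth: int = 8) -> Block:
    """Decoder (and the encoder's reference loop): dequantize and add to the prediction."""
    return [[clip(p + lv * step, bit_depth) for p, lv in zip(rp, rl)]
            for rp, rl in zip(pred, levels)]
```

With `orig = [[100, 104], [98, 255]]`, `pred = [[96, 96], [100, 250]]` and step 4, the levels are `[[1, 2], [-1, 1]]` and the reconstruction is `[[100, 104], [96, 254]]`: not equal to the original, but the same on both sides.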

[0075] In one embodiment, a subblock TMVP flag indicating whether or not a subblock-based temporal motion vector predictor can be used can control subblock-based motion prediction. The subblock TMVP flag can be signaled at the SPS (Sequence Parameter Set) level to turn subblock-based motion prediction on or off. The subblock TMVP flag can be named, for example, sps_sbtmvp_enabled_flag, as shown in Table 1 below.

[0076] Furthermore, to control the affine motion prediction method, an affine flag can be used to indicate whether or not affine prediction can be applied to the current block. This affine flag can be signaled at the SPS level to control whether affine prediction is on or off. This affine flag can be called, for example, sps_affine_enabled_flag, as shown in Table 1 below. If the value of the affine flag is 1, an affine type flag can be additionally signaled to determine whether or not 6-parameter affine prediction is available.
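For readers unfamiliar with affine prediction, the following sketch shows the widely used 4-parameter and 6-parameter affine motion models, which derive a motion vector for each subblock from control-point motion vectors at the block corners. It is a floating-point illustration assuming 4x4 subblocks evaluated at their centers; a standard's fixed-point derivation differs in precision and rounding, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

MV = Tuple[float, float]


def affine_subblock_mvs(cp0: MV, cp1: MV, cp2: Optional[MV],
                        w: int, h: int, sb: int = 4) -> List[List[MV]]:
    """Motion vector at the center of each sb x sb subblock of a w x h block.
    cp0, cp1, cp2: control-point MVs at the top-left, top-right and
    bottom-left corners. cp2=None selects the 4-parameter model."""
    ax = (cp1[0] - cp0[0]) / w          # d(mv_x)/dx
    ay = (cp1[1] - cp0[1]) / w          # d(mv_y)/dx
    if cp2 is None:
        # 4-parameter model (zoom + rotation): vertical gradients follow from horizontal ones.
        bx, by = -ay, ax
    else:
        # 6-parameter model: independent vertical gradients from the bottom-left control point.
        bx = (cp2[0] - cp0[0]) / h
        by = (cp2[1] - cp0[1]) / h
    field = []
    for y0 in range(0, h, sb):
        row = []
        for x0 in range(0, w, sb):
            x, y = x0 + sb / 2, y0 + sb / 2
            row.append((ax * x + bx * y + cp0[0], ay * x + by * y + cp0[1]))
        field.append(row)
    return field
```

With control points (0, 0) and (8, 0) on an 8x8 block, the 4-parameter model describes a zoom in which each subblock's motion vector equals its center position, while the 6-parameter model with a bottom-left control point of (0, 0) describes a purely horizontal stretch.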

[0077] An example of syntax signaled at the SPS level is shown in Table 1 below.

[0078] [Table 1-1]

[0079] [Table 1-2]

[0080] In one embodiment, when the affine flag of the SPS is 1 and the merge_flag of the current block is 1, the low-level coding syntax shown in Table 2 below can signal, depending on the conditions of the current block (e.g., block size, block shape), whether affine merge or normal merge is applied to the current block. The merge affine flag carrying this choice can be represented, for example, as merge_affine_flag. In one example, if the value of the affine flag signaled at the SPS level is 0 and the value of merge_flag signaled at the coding unit level is 1, it can be determined that normal merge is applied to the current block without signaling any additional syntax elements.
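Because Table 2 itself is not reproduced here, the following is only a hypothetical sketch of the parsing decision just described: merge_affine_flag is read when the SPS affine flag is 1, merge_flag is 1, and a block condition holds; otherwise normal merge is inferred. The block condition (width and height of at least 8) and the names `parse_merge_mode` and `read_flag` are assumptions for illustration.

```python
from typing import Callable


def parse_merge_mode(sps_affine_enabled_flag: int, merge_flag: int,
                     cb_width: int, cb_height: int,
                     read_flag: Callable[[], int]) -> str:
    """Return the merge mode of the current block, reading merge_affine_flag
    from the bitstream (via read_flag) only when it is signaled."""
    if not merge_flag:
        return "no_merge"
    # Assumed block condition; the actual Table 2 condition is not reproduced here.
    if sps_affine_enabled_flag and cb_width >= 8 and cb_height >= 8:
        return "affine_merge" if read_flag() else "normal_merge"
    # merge_affine_flag is absent: normal merge is inferred without extra syntax.
    return "normal_merge"
```

Note that under this gating the SPS subblock TMVP flag plays no role in whether the affine merge list, and any ATMVP candidate placed in it, can be reached.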

[0081] An example of syntax signaled at the coding unit level is shown in Table 2 below.

[0082] [Table 2-1]

[0083] [Table 2-2]

[0084] [Table 2-3]

[0085] [Table 2-4]

[0086] However, if the high-level syntax design in Table 1 and the low-level syntax design in Table 2 are applied, and ATMVP is used as an affine merge candidate, design, logical, and conceptual problems may arise. For example, if the value of the affine flag signaled at the SPS level is 0 and the value of the subblock TMVP flag signaled at the SPS level is 1, ATMVP cannot be used as any candidate, even though the SPS signals that ATMVP is to be used. Beyond these design and logical problems, there is also a conceptual problem. ATMVP is a subblock-based (SubPU-based in one example) motion prediction method, and it is used as a candidate for the affine merge mode, in which prediction is performed on a subblock basis, in order to separate subblock-based motion prediction candidates from the non-subblock-based (non-SubPU-based in one example) motion prediction candidates of normal merge. Despite this purpose, the low-level syntax design shown in Table 2 controls the subblock-based ATMVP through whether or not affine merge is enabled.
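The logical problem can be checked by enumerating the four combinations of the two SPS flags. This minimal sketch encodes the rule described above: ATMVP is only an affine merge candidate, and the affine merge list is only reachable when the SPS affine flag is 1. The function name is hypothetical.

```python
from itertools import product


def atmvp_reachable(sps_affine_enabled_flag: int, sps_sbtmvp_enabled_flag: int) -> bool:
    """Under the Table 1 / Table 2 design, ATMVP can be selected only if the
    affine merge list is reachable (SPS affine flag is 1) and ATMVP is enabled."""
    affine_merge_list_reachable = bool(sps_affine_enabled_flag)
    return affine_merge_list_reachable and bool(sps_sbtmvp_enabled_flag)


for a, s in product((0, 1), repeat=2):
    if s and not atmvp_reachable(a, s):
        print(f"affine={a}, sbtmvp={s}: SPS enables ATMVP but no candidate can use it")
```

Exactly one combination, affine flag 0 with subblock TMVP flag 1, is reported: the SPS enables ATMVP but the syntax offers no way to use it.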

[0087] To address the design, logical, and conceptual problems described above, one embodiment can provide high-level and / or low-level syntax designs based on at least one of Tables 3 to 11 below.

[0088] In one embodiment, a flag for controlling subblock-based motion prediction can be signaled at the SPS level. This flag can be represented, for example, as sps_subpumvp_enabled_flag, and can be used to turn subblock-based motion prediction on or off. When the value of sps_subpumvp_enabled_flag is 1, affine_enabled_flag and sbtmvp_enabled_flag can be signaled as shown in Table 3 below.
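A parsing sketch of the Table 3 structure follows. It assumes that the affine flag precedes the subblock TMVP flag, that each flag is one bit, and that absent flags are inferred to be 0; these are assumptions, as is the use of the sps_-prefixed names from Table 1 for the two inner flags. `read_flag` stands in for a hypothetical bitstream reader.

```python
from typing import Callable, Dict


def parse_sps_subblock_flags(read_flag: Callable[[], int]) -> Dict[str, int]:
    """Table 3 style: the affine and sbtmvp flags are present only when
    sps_subpumvp_enabled_flag is 1; otherwise they are inferred to be 0 (assumed)."""
    sps = {"sps_subpumvp_enabled_flag": read_flag()}
    if sps["sps_subpumvp_enabled_flag"]:
        sps["sps_affine_enabled_flag"] = read_flag()
        sps["sps_sbtmvp_enabled_flag"] = read_flag()
    else:
        sps["sps_affine_enabled_flag"] = 0
        sps["sps_sbtmvp_enabled_flag"] = 0
    return sps
```

When the top-level flag is 0, a single bit switches off both subblock-based tools and neither inner flag is transmitted.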

[0089] [Table 3]

[0090] When using the SPS level syntax design in Table 3, the availability of affine prediction and ATMVP can be expressed as shown in Table 4 below. In Table 4 below, 1 indicates that the method is available, and 0 indicates that the method is not available.

[0091] [Table 4]

[0092] In one embodiment, a high-level syntax design may be provided for controlling the availability of both affine prediction and ATMVP based on sps_subpumvp_enabled_flag. In this embodiment, in one example, if the value of sps_subpumvp_enabled_flag is 1, it can be determined that both affine prediction and ATMVP are available. The high-level syntax design according to this embodiment may be as shown in Table 5 below.

[0093] [Table 5]

[0094] In one embodiment, the availability of affine prediction and ATMVP is controlled based on sps_subpumvp_enabled_flag included in the high-level syntax according to Table 5, and a method may additionally be provided that uses slice_subpumvp_enabled_flag in the slice header syntax to control the availability of ATMVP more finely, at the slice level. The slice header syntax according to this embodiment can be, for example, as shown in Table 6 below.
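The combined effect of the Table 5 and Table 6 designs can be sketched as an availability function. It assumes that the SPS flag switches both tools together and that the slice flag can only further disable ATMVP (it has no effect when the SPS flag is 0); these semantics are inferred from the description above rather than taken from the tables, and the function name is hypothetical.

```python
from typing import Dict


def subblock_tool_availability(sps_subpumvp_enabled_flag: int,
                               slice_subpumvp_enabled_flag: int = 1) -> Dict[str, bool]:
    """Table 5: one SPS flag enables both affine prediction and ATMVP.
    Table 6 (assumed semantics): a slice-level flag can turn ATMVP off per slice."""
    sps_on = bool(sps_subpumvp_enabled_flag)
    return {"affine": sps_on,
            "atmvp": sps_on and bool(slice_subpumvp_enabled_flag)}
```

For example, a slice can keep affine prediction while disabling ATMVP (`subblock_tool_availability(1, 0)`), but no slice can enable ATMVP when the SPS flag is 0.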

[0095] [Table 6]

[0096] In one embodiment, if the affine prediction method is not used and sps_sbtmvp_enabled_flag is 1, a method may be provided in which merge_affine_flag is still signaled, but no affine candidates are added to the candidate list and only ATMVP is included as a candidate. An example of low-level syntax representing this embodiment is shown in Table 7 below.

[0097] [Table 7]

[0098] In Table 7 above, if the value of sps_affine_enabled_flag is 1, or if the value of sps_sbtmvp_enabled_flag is 1, it can be determined to decode the merge_affine_flag, which indicates whether merge_affine mode is applicable or not.

[0099] For example, if the value of sps_affine_enabled_flag is 1 or the value of sps_sbtmvp_enabled_flag is 1, it may be decided to decode the merge_subblock_flag, which indicates whether the merge subblock mode is applicable. In the merge subblock mode, merge candidates can be determined on a subblock basis.

[0100] In Table 7 above, if the current block width (cbWidth) and height (cbHeight) are both 8 or greater, and the value of sps_affine_enabled_flag is 1, or the value of sps_sbtmvp_enabled_flag is 1, then it can be decided to decode the merge affine flag merge_affine_flag.

[0101] In one example, if the maximum number of merge candidates for the subblocks of the current block is greater than 0, it may be decided to decode the predetermined merge mode flag.

[0102] In one example, if the value of the affine flag is 1, or the value of the subblock TMVP flag is 1, the maximum number of merge candidates for the subblocks of the current block can be greater than 0.

[0103] In one example, whether or not to decode the predetermined merge mode flag can be determined based on whether the condition if( MaxNumSubblockMergeCand > 0 && cbWidth >= 8 && cbHeight >= 8 ) is satisfied. Here, MaxNumSubblockMergeCand represents the maximum number of merge candidates for the subblock, cbWidth represents the width of the current block, and cbHeight represents the height of the current block.
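The Table 7 parsing condition can be sketched as follows, including the MaxNumSubblockMergeCand form of [0103]. The derivation of MaxNumSubblockMergeCand here is a hypothetical stand-in (a fixed number of affine candidates when affine is enabled, plus one ATMVP candidate when subblock TMVP is enabled); it only preserves the stated property that the value is greater than 0 when either flag is 1. Inferring the flag as 0 when it is absent is also an assumption.

```python
from typing import Callable

# Hypothetical count of affine merge candidates, for illustration only.
NUM_AFFINE_MERGE_CANDS = 4


def max_num_subblock_merge_cand(affine: int, sbtmvp: int) -> int:
    """Illustrative derivation: positive whenever either SPS flag is 1."""
    return (NUM_AFFINE_MERGE_CANDS if affine else 0) + (1 if sbtmvp else 0)


def decode_merge_subblock_flag(sps_affine_enabled_flag: int, sps_sbtmvp_enabled_flag: int,
                               cb_width: int, cb_height: int,
                               read_flag: Callable[[], int]) -> int:
    """Read the predetermined merge mode flag when the Table 7 condition holds;
    otherwise infer 0 (assumed). Unlike the Table 2 design, the flag is still
    parsed when affine is off but subblock TMVP is on."""
    max_cand = max_num_subblock_merge_cand(sps_affine_enabled_flag, sps_sbtmvp_enabled_flag)
    if max_cand > 0 and cb_width >= 8 and cb_height >= 8:
        return read_flag()
    return 0
```

Under this assumed derivation, the flag-based condition of [0100] and the MaxNumSubblockMergeCand-based condition of [0103] select exactly the same blocks.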

[0104] In Table 7, if the value of sps_affine_enabled_flag is 0 and the value of sps_sbtmvp_enabled_flag is 1, merge_affine_idx may not be signaled and may be inferred to 0. According to the embodiment in Table 7, the availability of affine prediction and ATMVP can be expressed as shown in Table 8 below.

[0105] [Table 8]

[0106] In one embodiment, a method may be provided to control whether ATMVP is used as a normal merge candidate when the affine prediction method is not used and the value of sps_sbtmvp_enabled_flag is 1. According to this embodiment, the availability of affine prediction and ATMVP can be expressed as shown in Table 9 below.

[0107] [Table 9]

[0108] In one embodiment, a high-level syntax design may be provided in which sps_sbtmvp_enabled_flag is signaled only when the value of affine_enabled_flag is 1. This design takes into account a low-level coding tool structure in which ATMVP is used as an affine merge candidate and therefore cannot be used when the value of sps_affine_enabled_flag is 0. An example of high-level syntax according to this embodiment is shown in Table 10 below.

[0109] [Table 10]

[0110] When using the SPS level syntax design shown in Table 10, the availability of affine prediction and ATMVP can be expressed as shown in Table 11 below.

[0111] [Table 11]
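A parsing sketch of the Table 10 structure follows, assuming one-bit flags read in the order shown and that an absent sps_sbtmvp_enabled_flag is inferred to be 0. The function name and the `read_flag` reader are hypothetical. It shows why the problematic combination (affine off, subblock TMVP on) can no longer be expressed.

```python
from typing import Callable, Dict


def parse_sps_table10(read_flag: Callable[[], int]) -> Dict[str, int]:
    """Table 10 style: sps_sbtmvp_enabled_flag is present only when
    sps_affine_enabled_flag is 1; when absent it is inferred to be 0 (assumed)."""
    affine = read_flag()
    sbtmvp = read_flag() if affine else 0
    return {"sps_affine_enabled_flag": affine, "sps_sbtmvp_enabled_flag": sbtmvp}
```

When the affine flag is 0, the subblock TMVP flag is neither read nor allowed to be 1, so the bitstream can never enable ATMVP in a configuration where it would be unreachable.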

[0112] Figure 4 is a flowchart showing the operation of an encoding device according to one embodiment, and Figure 5 is a block diagram showing the configuration of an encoding device according to one embodiment.

[0113] The encoding device shown in Figures 4 and 5 can perform operations corresponding to those of the decoding device shown in Figures 6 and 7. Therefore, the operation of the decoding device described later in Figures 6 and 7 can also be applied to the encoding device shown in Figures 4 and 5.

[0114] Each step disclosed in Figure 4 can be performed by the encoding device 200 disclosed in Figure 2. More specifically, steps S400 and S410 can be performed by the prediction unit 220 disclosed in Figure 2, and step S420 can be performed by the entropy encoding unit 240 disclosed in Figure 2. Furthermore, the operations of steps S400 to S420 are based in part on the content described above in Figure 3. Therefore, specific details that overlap with the content described above in Figures 2 and 3 are omitted or simplified in the explanation.

[0115] As shown in Figure 5, an encoding device according to one embodiment may include a prediction unit 220 and an entropy encoding unit 240. However, in some cases, not all of the components shown in Figure 5 are essential components of the encoding device, and the encoding device can be realized with more or fewer components than those shown in Figure 5.

[0116] In one embodiment of the encoding device, the prediction unit 220 and the entropy encoding unit 240 may be implemented on separate chips, or at least two or more components may be implemented via a single chip.

[0117] An encoding device according to one embodiment can determine whether affine prediction can be applied to the current block and whether a temporal motion vector predictor based on the subblocks of the current block can be used (S400). More specifically, the prediction unit 220 of the encoding device can determine whether affine prediction can be applied to the current block and whether a temporal motion vector predictor based on the subblocks of the current block can be used.

[0118] In one embodiment, the encoding device can determine whether or not to encode a predetermined merge mode flag, which indicates whether or not to apply a predetermined merge mode to the current block, based on the determination of whether or not the affine prediction can be applied to the current block and whether or not the temporal motion vector predictor based on the subblocks of the current block can be used (S410). More specifically, the prediction unit 220 of the encoding device can determine whether or not to encode a predetermined merge mode flag, which indicates whether or not to apply a predetermined merge mode to the current block, based on the determination of whether or not the affine prediction can be applied to the current block and whether or not the temporal motion vector predictor based on the subblocks of the current block can be used.

[0119] In one example, the predetermined merge mode may be a merge affine mode or a merge subblock mode, and the predetermined merge mode flag may be a merge affine flag or a merge subblock flag. The merge affine flag may be represented as merge_affine_flag, and the merge subblock flag may be represented as merge_subblock_flag.

[0120] In one embodiment, the encoding device can encode an affine flag indicating whether the affine prediction can be applied to the current block, a subblock TMVP flag indicating whether the temporal motion vector predictor based on the subblock of the current block can be used, and the pre-determined merge mode flag, based on the decision to encode the pre-determined merge mode flag (S420). More specifically, the entropy encoding unit 240 of the encoding device can encode an affine flag indicating whether the affine prediction can be applied to the current block, a subblock TMVP flag indicating whether the temporal motion vector predictor based on the subblock of the current block can be used, and the pre-determined merge mode flag, based on the decision to encode the pre-determined merge mode flag.

[0121] In one embodiment, it can be determined to encode the predetermined merge mode flag if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1.

[0122] In one embodiment, it can be determined to encode the predetermined merge mode flag if the width and the height of the current block are each 8 or greater and at least one of the following is met: a first condition, that the value of the affine flag is 1, or a second condition, that the value of the subblock TMVP flag is 1.

[0123] In one embodiment, whether to encode the predetermined merge mode flag can be determined based on Formula 1 below.

[0124]

Formula 1: ( sps_affine_enabled_flag || sps_sbtmvp_enabled_flag ) && cbWidth >= 8 && cbHeight >= 8

[0125] In Formula 1 above, sps_affine_enabled_flag represents the affine flag, cbWidth represents the width of the current block, cbHeight represents the height of the current block, and sps_sbtmvp_enabled_flag represents the subblock TMVP flag.
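The signaling condition of [0122] and Formula 1 can be expressed as a small predicate. The following is a minimal Python sketch, not a reference implementation; the function name `should_code_merge_mode_flag` is hypothetical, and the block dimensions are assumed to be given in luma samples.

```python
def should_code_merge_mode_flag(sps_affine_enabled_flag: int,
                                sps_sbtmvp_enabled_flag: int,
                                cb_width: int,
                                cb_height: int) -> bool:
    """Hypothetical helper: True when the predetermined merge mode flag
    is coded for the current block, following [0122] / Formula 1."""
    # Both block dimensions must be at least 8 samples.
    size_ok = cb_width >= 8 and cb_height >= 8
    # At least one subblock-based tool must be enabled at the SPS level.
    tool_enabled = sps_affine_enabled_flag == 1 or sps_sbtmvp_enabled_flag == 1
    return size_ok and tool_enabled


if __name__ == "__main__":
    print(should_code_merge_mode_flag(1, 0, 16, 8))   # True
    print(should_code_merge_mode_flag(0, 0, 16, 16))  # False: no subblock tool enabled
    print(should_code_merge_mode_flag(0, 1, 4, 16))   # False: width below 8
```

Computing the size gate and the tool gate separately keeps the grouping explicit: the size requirement applies regardless of which of the two flags is set.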

[0126] In one embodiment, the predetermined merge mode flag may be a merge affine flag indicating whether an affine merge mode is applied to the current block, or a merge subblock flag indicating whether a merge mode is applied in units of subblocks of the current block.

[0127] In one embodiment, it may be determined to encode the predetermined merge mode flag if the maximum number of merge candidates for the subblocks of the current block is greater than 0.

[0128] In one embodiment, if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1, the maximum number of merge candidates for the subblocks of the current block is greater than 0.
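[0128] states a property rather than a derivation. The sketch below assumes one derivation that satisfies it: when affine prediction is enabled, the list size is 5 minus a signaled offset, and otherwise the list can hold only the subblock TMVP candidate. The parameter `five_minus_max_num_subblock_merge_cand` and its assumed range of 0 to 4 are illustrative, loosely modeled on VVC-style signaling; an actual codec may derive the maximum differently (for example, by also depending on picture-level temporal MVP settings).

```python
def derive_max_num_subblock_merge_cand(sps_affine_enabled_flag: int,
                                       sps_sbtmvp_enabled_flag: int,
                                       five_minus_max_num_subblock_merge_cand: int = 0) -> int:
    """Hypothetical derivation of MaxNumSubblockMergeCand consistent with [0128]."""
    if not 0 <= five_minus_max_num_subblock_merge_cand <= 4:
        raise ValueError("offset assumed to lie in 0..4")
    if sps_affine_enabled_flag == 1:
        # Affine candidates available: list size is configurable from 1 to 5.
        return 5 - five_minus_max_num_subblock_merge_cand
    # Affine disabled: at most the single subblock TMVP candidate remains.
    return 1 if sps_sbtmvp_enabled_flag == 1 else 0


if __name__ == "__main__":
    # Exhaustively check the property stated in [0128].
    for affine in (0, 1):
        for sbtmvp in (0, 1):
            for offset in range(5):
                n = derive_max_num_subblock_merge_cand(affine, sbtmvp, offset)
                assert (n > 0) == (affine == 1 or sbtmvp == 1)
    print("property of [0128] holds")
```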

[0129] In one embodiment, whether to encode the predetermined merge mode flag can be determined based on Formula 2 below.

[0130]

Formula 2: MaxNumSubblockMergeCand > 0 && cbWidth >= 8 && cbHeight >= 8

[0131] In Formula 2 above, MaxNumSubblockMergeCand represents the maximum number of merge candidates for the subblocks, cbWidth represents the width of the current block, and cbHeight represents the height of the current block.
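Given the property stated in [0128], the condition based on MaxNumSubblockMergeCand selects exactly the same blocks as the condition based on the two SPS flags. The Python sketch below checks this on a small grid of block sizes; `max_cand_stub` is a hypothetical stand-in for the maximum-candidate derivation, chosen only so that it is nonzero exactly when affine prediction or subblock TMVP is enabled.

```python
from itertools import product


def formula1(affine: int, sbtmvp: int, cb_width: int, cb_height: int) -> bool:
    # Condition based on the SPS flags directly.
    return cb_width >= 8 and cb_height >= 8 and (affine == 1 or sbtmvp == 1)


def formula2(max_num_subblock_merge_cand: int, cb_width: int, cb_height: int) -> bool:
    # Condition based on the derived maximum number of subblock merge candidates.
    return max_num_subblock_merge_cand > 0 and cb_width >= 8 and cb_height >= 8


def max_cand_stub(affine: int, sbtmvp: int) -> int:
    # Hypothetical: nonzero exactly when a subblock-based tool is enabled.
    if affine == 1:
        return 5
    return 1 if sbtmvp == 1 else 0


def formulas_agree(sizes=(4, 8, 16, 32)) -> bool:
    # Compare both conditions for every flag combination and block size pair.
    for affine, sbtmvp, w, h in product((0, 1), (0, 1), sizes, sizes):
        if formula1(affine, sbtmvp, w, h) != formula2(max_cand_stub(affine, sbtmvp), w, h):
            return False
    return True


if __name__ == "__main__":
    print(formulas_agree())  # True
```

The agreement rests entirely on the assumed property: if a codec allowed the maximum to be 0 while one of the flags is 1, the second condition would suppress the flag in cases where the first would still code it.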

[0132] According to the encoding device and its method of operation shown in Figures 4 and 5, the encoding device determines whether affine prediction can be applied to the current block and whether a temporal motion vector predictor based on the subblocks of the current block can be used (S400); determines, based on these determinations, whether to encode a predetermined merge mode flag indicating whether a predetermined merge mode is applied to the current block (S410); and, based on that decision, encodes an affine flag indicating whether affine prediction can be applied to the current block, a subblock TMVP flag indicating whether the subblock-based temporal motion vector predictor can be used, and the predetermined merge mode flag (S420). The predetermined merge mode flag is determined to be encoded if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1. In other words, image coding efficiency can be improved by determining whether to encode the predetermined merge mode flag based on the affine flag and the subblock TMVP flag: when neither subblock-based tool is enabled, the flag is not signaled, so no bits are spent on a flag whose value could only be 0.
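The encoder-side flow S400 to S420 can be illustrated with a toy bit writer. This sketch is hypothetical throughout: `ToyBitWriter` simply stores one list entry per flag, whereas a real encoder would entropy-code the block-level flag (for example, with context-adaptive arithmetic coding), and the SPS here contains only the two flags discussed. The S400 determination is represented by the flag values the caller passes in.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBitWriter:
    """Hypothetical writer that records one list entry per coded flag."""
    bits: List[int] = field(default_factory=list)

    def put_flag(self, value: int) -> None:
        self.bits.append(1 if value else 0)


def encode_sps_flags(writer: ToyBitWriter, affine_enabled: int, sbtmvp_enabled: int) -> None:
    # S420 (sequence level): signal sps_affine_enabled_flag and sps_sbtmvp_enabled_flag.
    writer.put_flag(affine_enabled)
    writer.put_flag(sbtmvp_enabled)


def encode_block_merge_flag(writer: ToyBitWriter, affine_enabled: int, sbtmvp_enabled: int,
                            cb_width: int, cb_height: int, use_subblock_merge: int) -> bool:
    # S410: decide whether the predetermined merge mode flag is coded at all.
    if cb_width >= 8 and cb_height >= 8 and (affine_enabled == 1 or sbtmvp_enabled == 1):
        # S420 (block level): code the flag itself.
        writer.put_flag(use_subblock_merge)
        return True
    # Flag not coded: the encoder must not select the subblock merge mode for this block.
    return False


if __name__ == "__main__":
    w = ToyBitWriter()
    encode_sps_flags(w, affine_enabled=0, sbtmvp_enabled=1)
    encode_block_merge_flag(w, 0, 1, 16, 16, use_subblock_merge=1)  # coded
    encode_block_merge_flag(w, 0, 1, 4, 8, use_subblock_merge=0)    # skipped: too small
    print(w.bits)  # [0, 1, 1]
```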

[0133] Figure 6 is a flowchart showing the operation of a decoding device according to one embodiment, and Figure 7 is a block diagram showing the configuration of a decoding device according to one embodiment.

[0134] Each step disclosed in Figure 6 can be performed by the decoding device 300 disclosed in Figure 3. More specifically, S600 and S610 can be performed by the entropy decoding unit 310 disclosed in Figure 3, S620 can be performed by the prediction unit 330 disclosed in Figure 3, and S630 can be performed by the addition unit 340 disclosed in Figure 3. The operations of S600 to S630 are partly based on the description of Figure 3 given above; therefore, details that overlap with that description are omitted or simplified here.

[0135] As shown in Figure 7, a decoding device according to one embodiment may include an entropy decoding unit 310, a prediction unit 330, and an addition unit 340. However, not all of the components shown in Figure 7 are necessarily essential to the decoding device, and the decoding device may be implemented with more or fewer components than those shown in Figure 7.

[0136] In a decoding device according to one embodiment, the entropy decoding unit 310, the prediction unit 330, and the addition unit 340 may each be implemented on a separate chip, or two or more of them may be implemented on a single chip.

[0137] A decoding device according to one embodiment can decode an affine flag indicating whether or not affine prediction can be applied to the current block and a subblock TMVP flag indicating whether or not a temporal motion vector predictor based on the subblocks of the current block can be used, based on the bitstream (S600). More specifically, the entropy decoding unit 310 of the decoding device can decode an affine flag indicating whether or not affine prediction can be applied to the current block and a subblock TMVP flag indicating whether or not a temporal motion vector predictor based on the subblocks of the current block can be used, based on the bitstream.

[0138] In one example, the affine flag can be represented as sps_affine_enabled_flag, and the subblock TMVP flag can be represented as sps_sbtmvp_enabled_flag. The subblock TMVP flag may also be referred to as the subPU TMVP flag.

[0139] In one example, the affine flag and the subblock TMVP flag can be signaled at the SPS level.
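Because both flags are signaled at the SPS level, they are parsed once per sequence and then consulted for every block. The sketch below is a hypothetical toy: a real SPS carries many other syntax elements around these two flags, and `ToyBitReader` and `parse_toy_sps` are illustrative names only.

```python
from typing import Dict, List


class ToyBitReader:
    """Hypothetical reader over a list of bits (one entry per flag)."""

    def __init__(self, bits: List[int]) -> None:
        self._bits = list(bits)
        self._pos = 0

    def read_flag(self) -> int:
        if self._pos >= len(self._bits):
            raise EOFError("toy bitstream exhausted")
        bit = self._bits[self._pos]
        self._pos += 1
        return bit


def parse_toy_sps(reader: ToyBitReader) -> Dict[str, int]:
    # S600: read the two SPS-level flags in a fixed (assumed) order.
    affine = reader.read_flag()
    sbtmvp = reader.read_flag()
    return {"sps_affine_enabled_flag": affine, "sps_sbtmvp_enabled_flag": sbtmvp}


if __name__ == "__main__":
    print(parse_toy_sps(ToyBitReader([1, 0])))
    # {'sps_affine_enabled_flag': 1, 'sps_sbtmvp_enabled_flag': 0}
```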

[0140] In one embodiment, the decoding device can determine whether or not to decode a predetermined merge mode flag, which indicates whether or not to apply a predetermined merge mode to the current block, based on the decoded affine flag and the decoded subblock TMVP flag (S610). More specifically, the entropy decoding unit 310 of the decoding device can determine whether or not to decode a predetermined merge mode flag, which indicates whether or not to apply a predetermined merge mode to the current block, based on the decoded affine flag and the decoded subblock TMVP flag.
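Applied per block, S610 decides whether the merge mode flag is read from the bitstream at all. In this Python sketch the decision follows the size-and-flags condition of [0149]; the rule that an absent flag is inferred to be 0 is an assumption (a common convention for optional flags, but not stated in this passage), and the function name is hypothetical.

```python
from typing import Iterator


def parse_merge_mode_flag(bits: Iterator[int], sps_affine_enabled_flag: int,
                          sps_sbtmvp_enabled_flag: int, cb_width: int, cb_height: int) -> int:
    """Hypothetical S610 helper: read the predetermined merge mode flag if present."""
    present = (cb_width >= 8 and cb_height >= 8
               and (sps_affine_enabled_flag == 1 or sps_sbtmvp_enabled_flag == 1))
    if present:
        return next(bits)  # flag is in the bitstream: consume one bit
    return 0  # flag absent: inferred to be 0 (assumed inference rule)


if __name__ == "__main__":
    stream = iter([1, 0])
    print(parse_merge_mode_flag(stream, 0, 1, 16, 16))  # 1: read from the stream
    print(parse_merge_mode_flag(stream, 0, 0, 16, 16))  # 0: inferred, nothing consumed
    print(next(stream))                                  # 0: the second bit is still unread
```

Returning without touching the iterator when the flag is absent is the point of the check: the decoder must consume exactly the bits the encoder wrote, or every later syntax element would be misaligned.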

[0141] In one example, the predetermined merge mode may be a merge affine mode or a merge subblock mode, and the predetermined merge mode flag may be a merge affine flag or a merge subblock flag. The merge affine flag may be represented as merge_affine_flag, and the merge subblock flag may be represented as merge_subblock_flag.

[0142] In one embodiment, the decoding device can derive a prediction sample for the current block based on the decision on whether to decode the predetermined merge mode flag (S620). More specifically, the prediction unit 330 of the decoding device can derive the prediction sample for the current block based on that decision.

[0143] A decoding device according to one embodiment can derive a prediction mode to be applied to the current block based on the decision on whether to decode the predetermined merge mode flag, and can derive prediction samples for the current block based on the derived prediction mode.

[0144] In one embodiment, the decoding device can generate a reconstructed sample for the current block based on the prediction sample for the current block (S630). More specifically, the addition unit 340 of the decoding device can generate the reconstructed sample for the current block based on the prediction sample for the current block.
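When a residual is also decoded, as recited in claim 1, the addition unit combines each prediction sample with the corresponding residual sample. The sketch below assumes the sum is clipped to the valid range for the sample bit depth; the clipping step, the default bit depth of 10, and the function name `reconstruct` are assumptions for illustration.

```python
from typing import List


def reconstruct(pred: List[int], resid: List[int], bit_depth: int = 10) -> List[int]:
    """Hypothetical S630: reconstructed = clip(prediction + residual)."""
    if len(pred) != len(resid):
        raise ValueError("prediction and residual must have the same length")
    max_val = (1 << bit_depth) - 1
    # Add sample-wise, then clip to [0, 2^bit_depth - 1].
    return [min(max(p + r, 0), max_val) for p, r in zip(pred, resid)]


if __name__ == "__main__":
    print(reconstruct([512, 1020, 3], [10, 10, -5]))  # [522, 1023, 0]
```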

[0145] In one embodiment, it may be determined to decode the predetermined merge mode flag if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1.

[0146] In one example, it may be decided to decode the predetermined merge mode flag if the value of sps_affine_enabled_flag is 1 or the value of sps_sbtmvp_enabled_flag is 1.

[0147] In another example, if the value of sps_affine_enabled_flag is 1, or if the value of sps_sbtmvp_enabled_flag is 1, it may be decided to decode the merge_affine_flag.

[0148] In yet another example, if the value of sps_affine_enabled_flag is 1 or the value of sps_sbtmvp_enabled_flag is 1, it may be decided to decode the merge_subblock_flag.

[0149] In one embodiment, it may be decided to decode the predetermined merge mode flag if the width and the height of the current block are each 8 or greater and at least one of a first condition (the value of the affine flag is 1) and a second condition (the value of the subblock TMVP flag is 1) is met.
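The two flag conditions are grouped together under the size requirement. When such a condition is written in code, the grouping must be made explicit, because `and` binds more tightly than `or` in Python (as `&&` does over `||` in C). The two hypothetical functions below differ only in their parentheses.

```python
def gate_grouped(cb_width: int, cb_height: int, affine: int, sbtmvp: int) -> bool:
    # Intended reading: size requirement AND (first condition OR second condition).
    return cb_width >= 8 and cb_height >= 8 and (affine == 1 or sbtmvp == 1)


def gate_ungrouped(cb_width: int, cb_height: int, affine: int, sbtmvp: int) -> bool:
    # Without parentheses this parses as
    # (cb_width >= 8 and cb_height >= 8 and affine == 1) or sbtmvp == 1,
    # so the size requirement no longer applies to the second condition.
    return cb_width >= 8 and cb_height >= 8 and affine == 1 or sbtmvp == 1


if __name__ == "__main__":
    # A 4x4 block with only subblock TMVP enabled:
    print(gate_grouped(4, 4, 0, 1))    # False
    print(gate_ungrouped(4, 4, 0, 1))  # True
```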

[0150] In one embodiment, whether to decode the predetermined merge mode flag can be determined based on Formula 3 below.

[0151]

Formula 3: ( sps_affine_enabled_flag || sps_sbtmvp_enabled_flag ) && cbWidth >= 8 && cbHeight >= 8

[0152] In Formula 3 above, sps_affine_enabled_flag represents the affine flag, cbWidth represents the width of the current block, cbHeight represents the height of the current block, and sps_sbtmvp_enabled_flag represents the subblock TMVP flag.

[0153] In one embodiment, it may be determined to decode the predetermined merge mode flag if the maximum number of merge candidates for the subblocks of the current block is greater than 0.

[0154] In one embodiment, if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1, the maximum number of merge candidates for the subblocks of the current block is greater than 0.

[0155] In one embodiment, whether to decode the predetermined merge mode flag can be determined based on Formula 4 below.

[0156]

Formula 4: MaxNumSubblockMergeCand > 0 && cbWidth >= 8 && cbHeight >= 8

[0157] In Formula 4 above, MaxNumSubblockMergeCand represents the maximum number of merge candidates for the subblocks, cbWidth represents the width of the current block, and cbHeight represents the height of the current block.

[0158] According to the decoding device and its method of operation disclosed in Figures 6 and 7, the decoding device decodes, based on the bitstream, an affine flag indicating whether affine prediction can be applied to the current block and a subblock TMVP flag indicating whether a temporal motion vector predictor based on the subblocks of the current block can be used (S600); determines, based on the decoded affine flag and the decoded subblock TMVP flag, whether to decode a predetermined merge mode flag indicating whether a predetermined merge mode is applied to the current block (S610); derives a prediction sample for the current block based on that decision (S620); and generates a reconstructed sample for the current block based on the prediction sample (S630), wherein the predetermined merge mode flag is determined to be decoded if the value of the affine flag is 1 or the value of the subblock TMVP flag is 1. In other words, image coding efficiency can be improved by determining whether to decode the predetermined merge mode flag based on the affine flag and the subblock TMVP flag.

[0159] In the embodiments described above, the methods are explained on the basis of flowcharts as a series of steps or blocks; however, the disclosure is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, other steps described above. Furthermore, those skilled in the art will understand that the steps shown in the flowcharts are not exclusive, and that other steps may be included, or one or more steps of the flowcharts may be deleted, without affecting the scope of the disclosure.

[0160] The methods according to the present disclosure described above can be implemented in software, and the encoding and/or decoding devices according to the present disclosure can be included in image processing devices such as TVs, computers, smartphones, set-top boxes, and display devices.

[0161] When embodiments, etc., in this disclosure are implemented in software, the methods described above can be implemented by modules (processes, functions, etc.) that perform the functions described above. These modules are stored in memory and can be executed by a processor. The memory may be internal or external to the processor and can be connected to the processor by various well-known means. The processor may include an ASIC (application-specific integrated circuit), other chipsets, logic circuits, and/or data processing devices. The memory may include ROM (read-only memory), RAM (random access memory), flash memory, memory cards, storage media, and/or other storage devices. In other words, the embodiments, etc., described in this disclosure can be implemented on a processor, microprocessor, controller, or chip. For example, the functional units illustrated in each drawing can be implemented on a computer, processor, microprocessor, controller, or chip. In this case, information on instructions or algorithms for implementation can be stored on a digital storage medium.

[0162] Furthermore, the decoding and encoding devices to which this disclosure applies may be included in multimedia broadcasting transceivers, mobile communication terminals, home cinema video equipment, digital cinema video equipment, surveillance cameras, video chat equipment, real-time communication equipment such as video communication equipment, mobile streaming equipment, storage media, camcorders, video-on-demand (VoD) service providers, over-the-top (OTT) video equipment, internet streaming service providers, 3D video equipment, virtual reality (VR) equipment, augmented reality (AR) equipment, video telephony equipment, transportation terminals (e.g., vehicle terminals (including autonomous vehicles), airplane terminals, and ship terminals), and medical video equipment, and may be used to process video signals or data signals. For example, OTT video equipment may include game consoles, Blu-ray players, internet-connected TVs, home theater systems, smartphones, tablet PCs, and digital video recorders (DVRs).

[0163] Furthermore, the processing methods to which this disclosure applies can be produced in the form of programs executed on a computer and stored on a computer-readable recording medium. Multimedia data having the data structures according to this disclosure can also be stored on a computer-readable recording medium. The computer-readable recording medium includes all kinds of storage devices and distributed storage devices that store computer-readable data. The computer-readable recording medium can include, for example, Blu-ray discs (BDs), universal serial bus (USB) storage devices, ROMs, PROMs, EPROMs, EEPROMs, RAMs, CD-ROMs, magnetic tapes, floppy disks, and optical data storage devices. The computer-readable recording medium also includes media implemented in the form of carrier waves (e.g., transmission over the Internet). Furthermore, bitstreams generated by encoding methods can be stored on a computer-readable recording medium or transmitted over a wired or wireless network.

[0164] Furthermore, embodiments of the present disclosure can be implemented as a computer program product comprising program code, the program code being performed by a computer according to the embodiments of the present disclosure. The program code can be stored on a computer-readable carrier.

[0165] Figure 8 shows an example of a content streaming system to which the disclosures in this document may apply.

[0166] As shown in Figure 8, the content streaming system to which this disclosure applies may broadly include an encoding server, a streaming server, a web server, a media storage facility, user equipment, and multimedia input devices.

[0167] The encoding server is responsible for compressing content input from multimedia input devices such as smartphones, cameras, and camcorders into digital data to generate a bitstream, and then transmitting this bitstream to the streaming server. In other cases, if a multimedia input device such as a smartphone, camera, or camcorder directly generates the bitstream, the encoding server can be omitted.

[0168] The bitstream can be generated by an encoding method or bitstream generation method to which the present disclosure applies, and the streaming server may temporarily store the bitstream in the process of transmitting or receiving the bitstream.

[0169] The streaming server transmits multimedia data to user devices based on user requests made via the web server, and the web server acts as an intermediary that informs users of the available services. When a user requests a desired service from the web server, the web server forwards the request to the streaming server, and the streaming server transmits multimedia data to the user. The content streaming system may also include a separate control server, in which case the control server controls the commands and responses between the devices within the content streaming system.

[0170] The streaming server can receive content from the media storage and/or the encoding server. For example, when it receives content from the encoding server, it can receive the content in real time. In this case, to provide a smooth streaming service, the streaming server can store the bitstream for a certain period of time.

[0171] Examples of user devices include mobile phones, smartphones, laptop computers, digital broadcasting terminals, PDAs (personal digital assistants), PMPs (portable multimedia players), navigation systems, slate PCs, tablet PCs, ultrabooks, wearable devices (such as smartwatches, smart glasses, and HMDs), digital TVs, desktop computers, and digital signage.

[0172] Each server within the aforementioned content streaming system can be operated as a distributed server, in which case the data received by each server can be processed in a distributed manner.

Claims

1. An image decoding method performed by a decoding device, the method comprising:
receiving image information including affine enable flag information, subblock temporal motion vector prediction enable flag information, and residual information;
receiving specific flag information related to whether a subblock-based specific merge mode is applied to a current block;
determining whether to receive a specific merge index for the subblock-based specific merge mode based on the specific flag information, the affine enable flag information, and the subblock temporal motion vector prediction enable flag information;
deriving a prediction mode of the current block based on the specific flag information;
deriving prediction samples for the current block by applying inter prediction to the current block based on the derived prediction mode;
deriving residual samples for the current block based on the residual information; and
generating reconstructed samples based on the prediction samples and the residual samples,
wherein receiving the specific flag information comprises:
determining whether to receive the specific flag information based on at least one of the affine enable flag information and the subblock temporal motion vector prediction enable flag information; and
receiving the specific flag information based on a result of determining whether to receive the specific flag information, and
wherein it is determined that the specific merge index is not received when the value of the specific flag information is equal to 1, the value of the affine enable flag information is equal to 0, and the value of the subblock temporal motion vector prediction enable flag information is equal to 1.

2. An image encoding method performed by an encoding device, the method comprising:
deriving affine enable flag information and subblock temporal motion vector prediction enable flag information;
deriving prediction samples for a current block by applying inter prediction to the current block;
deriving specific flag information related to whether a subblock-based specific merge mode is applied to the current block;
determining whether to signal a specific merge index for the subblock-based specific merge mode based on the specific flag information, the affine enable flag information, and the subblock temporal motion vector prediction enable flag information;
generating residual information based on the prediction samples; and
encoding image information including the residual information and at least one of the affine enable flag information, the subblock temporal motion vector prediction enable flag information, the specific flag information, or the specific merge index,
wherein deriving the specific flag information comprises:
determining whether to signal the specific flag information based on at least one of the affine enable flag information and the subblock temporal motion vector prediction enable flag information; and
deriving the specific flag information based on a result of determining whether to signal the specific flag information, and
wherein it is determined that the specific merge index is not signaled when the value of the specific flag information is equal to 1, the value of the affine enable flag information is equal to 0, and the value of the subblock temporal motion vector prediction enable flag information is equal to 1.

3. A method for transmitting data for an image, the method comprising:
obtaining a bitstream, wherein the bitstream is generated by:
deriving affine enable flag information and subblock temporal motion vector prediction enable flag information;
deriving prediction samples for a current block by applying inter prediction to the current block;
deriving specific flag information related to whether a subblock-based specific merge mode is applied to the current block;
determining whether to signal a specific merge index for the subblock-based specific merge mode based on the specific flag information, the affine enable flag information, and the subblock temporal motion vector prediction enable flag information;
generating residual information based on the prediction samples; and
encoding image information including the residual information and at least one of the affine enable flag information, the subblock temporal motion vector prediction enable flag information, the specific flag information, or the specific merge index; and
transmitting the data including the bitstream,
wherein deriving the specific flag information comprises:
determining whether to signal the specific flag information based on at least one of the affine enable flag information and the subblock temporal motion vector prediction enable flag information; and
deriving the specific flag information based on a result of determining whether to signal the specific flag information, and
wherein it is determined that the specific merge index is not signaled when the value of the specific flag information is equal to 1, the value of the affine enable flag information is equal to 0, and the value of the subblock temporal motion vector prediction enable flag information is equal to 1.
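The final wherein clause shared by claims 1 to 3 can be illustrated as follows. This Python sketch assumes that the specific merge index is signaled only when the specific flag is 1 and the subblock merge candidate list holds more than one candidate, and it uses an assumed list-size rule (1 to 5 configurable candidates with affine enabled, otherwise only the subblock TMVP candidate). Under those assumptions, the claimed case (specific flag 1, affine 0, subblock TMVP 1) leaves a single candidate, so there is nothing to choose and no index needs to be received. All names are hypothetical.

```python
def subblock_list_size(affine: int, sbtmvp: int, five_minus_max: int = 0) -> int:
    # Assumed rule: affine enabled -> 5 - offset candidates; otherwise SbTMVP only.
    if affine == 1:
        return 5 - five_minus_max
    return 1 if sbtmvp == 1 else 0


def merge_index_is_signaled(specific_flag: int, affine: int, sbtmvp: int,
                            five_minus_max: int = 0) -> bool:
    # An index is only needed when the mode is used and there is a choice to make.
    return specific_flag == 1 and subblock_list_size(affine, sbtmvp, five_minus_max) > 1


if __name__ == "__main__":
    print(merge_index_is_signaled(1, 0, 1))  # False: the case recited in the claims
    print(merge_index_is_signaled(1, 1, 1))  # True: affine candidates make a choice possible
    print(merge_index_is_signaled(0, 1, 1))  # False: subblock merge mode not used
```

When the index is not received, a decoder following this sketch would use the only available candidate (equivalently, an index inferred to be 0); that inference is likewise an assumption of the sketch.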