Encoding and decoding devices, storage media and data transmitting devices
By using history-based motion vector prediction (HMVP) to configure prediction candidates in image/video coding, the problems of low coding efficiency and high complexity are solved, and more efficient image/video compression is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TCL KING ELECTRICAL APPLIANCES HUIZHOU
- Filing Date
- 2019-10-02
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies suffer from low coding efficiency and high complexity in the encoding of high-resolution, high-quality images/videos, especially the increased complexity caused by pruning in inter-frame prediction.
History-based motion vector prediction (HMVP) is used to configure prediction candidates. An AMVP candidate list and an HMVP candidate list are configured for the current block, and selected HMVP candidates are added to the AMVP candidate list without pruning. Motion information derived from the updated list is then used to generate prediction samples.
It improves image/video compression efficiency and avoids the added complexity caused by pruning.
Figure CN116708822B_ABST
Description
[0001] This application is a divisional application of the original invention patent application No. 201980070236.3 (International Application No.: PCT/KR2019/012921, Application Date: October 2, 2019, Invention Title: Method and Apparatus for Constructing Predictive Candidates Based on HMVP).
Technical Field
[0002] This disclosure relates to image coding techniques, and more specifically, to a method and apparatus for configuring prediction candidates in an image coding system based on history-based motion vector prediction (HMVP).
Background Technology
[0003] Recently, there has been a growing demand for high-resolution, high-quality images / videos, such as 4K or 8K Ultra High Definition (UHD) images / videos, across various fields. As image / video resolution or quality increases, relatively more information or bits are transmitted compared to traditional image / video data. Therefore, if image / video data is transmitted via media such as existing wired / wireless broadband lines or stored in traditional storage media, the costs of transmission and storage can easily increase.
[0004] In addition, there is growing interest and demand for virtual reality (VR) and artificial reality (AR) content, as well as immersive media such as holograms; and the broadcasting of images / videos that exhibit characteristics different from actual images / videos (e.g., game images / videos) is also increasing.
[0005] Therefore, highly efficient image / video compression technology is needed to effectively compress and send, store, or play high-resolution, high-quality images / videos that exhibit the various characteristics described above.
Summary of the Invention
[0006] Technical issues
[0007] One technical objective of this disclosure is to provide a method and apparatus for improving image coding efficiency.
[0008] Another technical objective of this disclosure is to provide a method and apparatus for performing encoding using an inter-frame prediction method.
[0009] Another technical objective of this disclosure is to provide a method and apparatus for configuring prediction candidates based on HMVP for inter-frame prediction.
[0010] Another technical objective of this disclosure is to provide a method and apparatus that omits pruning processing when configuring prediction candidates based on HMVP for inter-frame prediction to avoid increased complexity due to pruning processing.
[0011] Technical solution
[0012] According to one embodiment of this disclosure, an image decoding method performed by a decoding device is provided. The method includes the following steps: configuring an AMVP candidate list for a current block, including at least one Advanced Motion Vector Prediction (AMVP) candidate; deriving a history-based motion vector prediction (HMVP) candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block; selecting at least one HMVP candidate from the HMVP candidates in the HMVP candidate list; deriving an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list; deriving motion information for the current block based on the updated AMVP candidate list; deriving a prediction sample for the current block based on the motion information of the current block; and generating a reconstructed sample for the current block based on the prediction sample for the current block.
[0013] According to another embodiment of this disclosure, a decoding apparatus for performing image decoding is provided. The decoding apparatus includes: a predictor that configures an AMVP candidate list including at least one Advanced Motion Vector Prediction (AMVP) candidate for a current block; derives a history-based HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block; selects at least one HMVP candidate from the HMVP candidate list; derives an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list; derives motion information for the current block based on the updated AMVP candidate list; and derives a predicted sample for the current block based on the motion information of the current block; and an adder that generates a reconstructed sample for the current block based on the predicted sample for the current block.
[0014] According to another embodiment of this disclosure, an image encoding method performed by an encoding device is provided. The method includes the following steps: configuring an AMVP candidate list including at least one AMVP candidate for a current block; deriving an HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block; selecting at least one HMVP candidate from the HMVP candidates in the HMVP candidate list; deriving an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list; deriving motion information of the current block based on the updated AMVP candidate list; deriving a prediction sample of the current block based on the motion information of the current block; deriving a residual sample of the current block based on the prediction sample of the current block; and encoding image information including information about the residual sample.
[0015] According to another embodiment of this disclosure, an encoding apparatus for performing image encoding is provided. The encoding apparatus includes: a predictor that configures an AMVP candidate list including at least one AMVP candidate for a current block, derives an HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block, selects at least one HMVP candidate from the HMVP candidates in the HMVP candidate list, derives an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list, derives motion information of the current block based on the updated AMVP candidate list, and derives a prediction sample of the current block based on the motion information of the current block; a residual processor that derives residual samples of the current block based on the prediction samples of the current block; and an entropy encoder that encodes image information including information about the residual samples.
[0016] According to another embodiment of this disclosure, a decoder-readable storage medium is provided that stores information about instructions that cause a video decoding device to perform decoding methods according to some embodiments.
[0017] According to another embodiment of this disclosure, a decoder-readable storage medium is provided that stores information about instructions that cause a video decoding device to perform a decoding method according to one embodiment. The decoding method according to one embodiment includes the following steps: configuring an AMVP candidate list including at least one AMVP candidate for a current block; deriving a history-based motion vector prediction (HMVP) candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block; selecting at least one HMVP candidate from the HMVP candidates in the HMVP candidate list; deriving an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list; deriving motion information for the current block based on the updated AMVP candidate list; deriving a prediction sample for the current block based on the motion information for the current block; and generating a reconstructed sample for the current block based on the prediction sample for the current block.
[0018] Beneficial effects
[0019] According to this disclosure, overall image / video compression efficiency can be improved.
[0020] According to this disclosure, inter-frame prediction methods can be used to improve image coding efficiency.
[0021] According to this disclosure, image coding efficiency can be improved by configuring prediction candidates based on HMVP for inter-frame prediction.
[0022] According to this disclosure, when configuring prediction candidates based on HMVP for inter-frame prediction, the increased complexity due to pruning can be avoided by omitting the pruning process.
Attached Figure Description
[0023] Figure 1 shows an example of a video / image coding system to which this disclosure can be applied.
[0024] Figure 2 illustrates the configuration of a video / image encoding device to which this disclosure can be applied.
[0025] Figure 3 illustrates the configuration of a video / image decoding device to which this disclosure can be applied.
[0026] Figure 4 shows an example of the decoding process based on HMVP candidates.
[0027] Figure 5a and Figure 5b illustrate the process of updating the HMVP buffer according to one embodiment.
[0028] Figures 6 to 13 show HMVP methods according to some embodiments.
[0029] Figure 14 is a flowchart illustrating the operation of an encoding device according to one embodiment.
[0030] Figure 15 is a block diagram illustrating the structure of an encoding device according to one embodiment.
[0031] Figure 16 is a flowchart illustrating the operation of a decoding device according to one embodiment.
[0032] Figure 17 is a block diagram illustrating the structure of a decoding device according to one embodiment.
[0033] Figure 18 shows an example of a content streaming system to which this disclosure can be applied.
Detailed Implementation
[0034] According to one embodiment of this disclosure, an image decoding method performed by a decoding device is provided. The method includes: configuring an AMVP candidate list including at least one AMVP candidate for a current block; deriving an HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block; selecting at least one HMVP candidate from the HMVP candidates in the HMVP candidate list; deriving an updated AMVP candidate list by adding at least one HMVP candidate to the AMVP candidate list; deriving motion information of the current block based on the updated AMVP candidate list; deriving a predicted sample of the current block based on the motion information of the current block; and generating a reconstructed sample of the current block based on the predicted sample of the current block.
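The list-update step of the method above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `MotionCandidate` type, the maximum list size of 2, and the choice to try the most recently stored HMVP entries first are all assumptions that this passage does not specify. The only behavior taken from the text is that HMVP candidates are appended to the AMVP list without a pruning (duplicate-check) step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MotionCandidate:
    """Hypothetical candidate record: a motion vector and a reference index."""
    mv: Tuple[int, int]
    ref_idx: int


# Assumed list size for illustration only; the passage above does not state one.
MAX_AMVP_CANDIDATES = 2


def update_amvp_list(
    amvp: List[MotionCandidate],
    hmvp: List[MotionCandidate],
    max_size: int = MAX_AMVP_CANDIDATES,
) -> List[MotionCandidate]:
    """Append HMVP candidates to the AMVP list until it is full.

    No pruning is performed: a candidate is added even if an identical
    entry is already present, so no pairwise comparisons are needed.
    """
    updated = list(amvp)
    # Assumption: the newest HMVP entry sits at the end of the buffer
    # and is tried first.
    for cand in reversed(hmvp):
        if len(updated) >= max_size:
            break
        updated.append(cand)
    return updated
```

With pruning omitted, a duplicate of an existing AMVP candidate may enter the list; the trade-off is that the per-candidate comparison cost disappears.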
[0035] Embodiments of the present invention
[0036] This disclosure may be modified in various forms, and specific embodiments thereof are described and illustrated in the accompanying drawings. However, these embodiments are not intended to limit this disclosure. The terminology used in the following description is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as “comprising” and “having” are intended to indicate the presence of the features, quantities, steps, operations, elements, components or combinations thereof used in the following description, and therefore it should be understood that the possibility of having or adding one or more different features, quantities, steps, operations, elements, components or combinations thereof is not excluded.
[0037] Furthermore, the various components shown independently in the accompanying drawings described in this disclosure are for ease of depiction of different functionalities and do not imply that these components are implemented in separate hardware or separate software. For example, two or more of the various configurations may be combined to form a single configuration, or a single configuration may be divided into multiple configurations. Implementations in which the various configurations are integrated and / or separated are also included within the scope of this disclosure without departing from its spirit.
[0038] Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. Hereinafter, the same reference numerals will be used for the same components in the drawings, and redundant descriptions of the same components may be omitted.
[0039] Figure 1 schematically illustrates a video / image coding system to which this disclosure can be applied.
[0040] Referring to Figure 1, a video / image coding system may include a first device (source device) and a second device (receiving device). The source device may transmit encoded video / image information or data to the receiving device in the form of a file or stream via a digital storage medium or network.
[0041] The source device may include a video source, an encoding device, and a transmitter. The receiving device may include a receiver, a decoding device, and a renderer. The encoding device may be referred to as a video / image encoding device, and the decoding device may be referred to as a video / image decoding device. The transmitter may be included in the encoding device. The receiver may be included in the decoding device. The renderer may include a display, and the display may be configured as a separate device or an external component.
[0042] Video sources can acquire video / images through processes that capture, synthesize, or generate video / images. Video sources may include video / image capture devices and / or video / image generation devices. For example, a video / image capture device may include one or more cameras, a video / image archive containing previously captured video / images, etc. For example, a video / image generation device may include a computer, tablet computer, and smartphone, and may generate video / images (electronically). For example, virtual video / images may be generated via a computer, etc. In this case, the video / image capture process may be replaced by a process that generates related data.
[0043] Encoding devices can encode input video / images. For compression and encoding efficiency, encoding devices can perform a series of processes such as prediction, transformation, and quantization. The encoded data (encoded video / image information) can be output as a bitstream.
[0044] The transmitter can send encoded video / image information or data, output as a bitstream, to a receiver of a receiving device in the form of a file or stream via a digital storage medium or network. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The transmitter may include elements for generating media files according to a predetermined file format and may include elements for transmission over a broadcast / communication network. The receiver can receive / extract the bitstream and send the received bitstream to a decoding device.
[0045] Decoding devices can decode video / images by performing a series of processes such as dequantization, inverse transform, and prediction, which correspond to the operations of encoding devices.
[0046] The renderer can render decoded video / images. The rendered video / images can be displayed on a monitor.
[0047] This document relates to video / image coding. For example, the methods / implementations disclosed in this document can be applied to methods disclosed in the Versatile Video Coding (VVC) standard, the Essential Video Coding (EVC) standard, the AOMedia Video 1 (AV1) standard, the second-generation Audio Video coding Standard (AVS2), or next-generation video / image coding standards (e.g., H.267 or H.268).
[0048] This document presents various implementations of video / image coding, and unless otherwise stated, these implementations can be combined with each other.
[0049] In this document, video can refer to a series of images over time. A frame (picture) typically refers to a unit representing one image at a specific point in time, and a slice / tile is a unit that constitutes a portion of the frame during coding. A slice / tile may include one or more Coding Tree Units (CTUs). A frame may consist of one or more slices / tiles. A frame may consist of one or more tile groups. A tile group may include one or more tiles. A brick can represent a rectangular area of CTU rows within a tile in a frame. A tile can be divided into multiple bricks, each brick consisting of one or more CTU rows within the tile. A tile that is not divided into multiple bricks can also be referred to as a brick. A brick scan is a specific ordering of the CTUs that partition the frame, wherein CTUs are ordered consecutively in the CTU raster scan of a brick, bricks within a tile are ordered consecutively in the raster scan of the bricks of the tile, and tiles in the frame are ordered consecutively in the raster scan of the tiles of the frame. A tile is a rectangular area of CTUs within a specific tile column and a specific tile row in a frame. A tile column is a rectangular region of CTUs whose height is equal to the height of the frame and whose width is specified by a syntax element in the picture parameter set. A tile row is a rectangular region of CTUs whose height is specified by a syntax element in the picture parameter set and whose width is equal to the width of the frame. A tile scan is a specific ordering of the CTUs that partition the frame, wherein CTUs are ordered consecutively in the CTU raster scan of a tile, and tiles in the frame are ordered consecutively in the raster scan of the tiles of the frame. A slice comprises an integer number of bricks in the frame that can be exclusively contained in a single NAL unit. A slice can consist of multiple complete tiles or a consecutive sequence of complete bricks of a single tile. In this document, tile groups and slices are used interchangeably. For example, in this document, a tile group / tile group header may be referred to as a slice / slice header.
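The tile scan described above (CTUs in raster order inside each tile, tiles in raster order across the frame) can be sketched as below. The function name and its parameters are hypothetical; tile column widths and tile row heights are given in CTUs, standing in for the values that the text says are signaled by syntax elements.

```python
from typing import List


def tile_scan_order(
    frame_width_ctus: int,
    frame_height_ctus: int,
    tile_col_widths: List[int],
    tile_row_heights: List[int],
) -> List[int]:
    """Return CTU raster-scan addresses listed in tile-scan order."""
    if sum(tile_col_widths) != frame_width_ctus:
        raise ValueError("tile column widths must cover the frame width")
    if sum(tile_row_heights) != frame_height_ctus:
        raise ValueError("tile row heights must cover the frame height")

    order = []
    y0 = 0
    for tile_h in tile_row_heights:          # tiles in raster order: row by row
        x0 = 0
        for tile_w in tile_col_widths:       # then column by column
            for y in range(y0, y0 + tile_h):  # CTUs in raster order inside the tile
                for x in range(x0, x0 + tile_w):
                    order.append(y * frame_width_ctus + x)
            x0 += tile_w
        y0 += tile_h
    return order
```

For a frame of 4×2 CTUs split into two tile columns of width 2, the left tile's CTUs are all visited before any CTU of the right tile.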
[0050] A pixel or pel (picture element) can refer to the smallest unit that makes up a picture (or image). Additionally, the term "sample" can be used as the counterpart to a pixel. A sample can typically represent a pixel or pixel value, and can represent only the pixel / pixel value of the luminance component or only the pixel / pixel value of the chrominance component.
[0051] A unit can represent a basic unit of image processing. A unit may include at least one of a specific region of the image and information associated with that region. A unit may include a luminance block and two chrominance (e.g., cb, cr) blocks. In some cases, the term "unit" may be used interchangeably with terms such as "block" or "region". In general, an M×N block may include a set (or array) of samples (or sample arrays) or transform coefficients in M columns and N rows.
[0052] In this document, the terms “ / ” and “,” should be interpreted as indicating “and / or”. For example, the expression “A / B” can mean “A and / or B”. Furthermore, “A, B” can mean “A and / or B”. Additionally, “A / B / C” can mean “at least one of A, B, and / or C”. Also, “A, B, C” can mean “at least one of A, B, and / or C”.
[0053] Furthermore, in this document, the term "or" should be interpreted as indicating "and / or". For example, expressing "A or B" can include 1) only A, 2) only B, and / or 3) both A and B. In other words, the term "or" in this document should be interpreted as indicating "additionally or alternatively".
[0054] Figure 2 is a schematic diagram illustrating the configuration of a video / image encoding apparatus to which embodiments of the present disclosure can be applied. Hereinafter, the video encoding apparatus may include an image encoding apparatus.
[0055] Referring to Figure 2, the encoding device 200 includes an image segmenter 210, a predictor 220, a residual processor 230, an entropy encoder 240, an adder 250, a filter 260, and a memory 270. The predictor 220 may include an inter-frame predictor 221 and an intra-frame predictor 222. The residual processor 230 may include a transformer 232, a quantizer 233, a dequantizer 234, and an inverse transformer 235. The residual processor 230 may also include a subtractor 231. The adder 250 may be referred to as a reconstructor or a reconstruction block generator. According to embodiments, the image segmenter 210, predictor 220, residual processor 230, entropy encoder 240, adder 250, and filter 260 may be configured by at least one hardware component (e.g., an encoder chipset or processor). Additionally, the memory 270 may include a decoded picture buffer (DPB) or may be configured by a digital storage medium. The hardware component may also include the memory 270 as an internal / external component.
[0056] Image segmenter 210 can segment an input image (or picture or frame) input to encoding device 200 into one or more processing units. For example, a processing unit may be referred to as a coding unit (CU). In this case, the coding unit may be recursively segmented from a coding tree unit (CTU) or a largest coding unit (LCU) according to a quad-tree binary-tree ternary-tree (QTBTTT) structure. For example, a coding unit may be segmented into multiple deeper coding units based on a quadtree structure, a binary tree structure, and / or a ternary tree structure. In this case, for example, a quadtree structure may be applied first, followed by a binary tree structure and / or a ternary tree structure. Alternatively, a binary tree structure may be applied first. The encoding process according to this disclosure may be performed based on the final coding unit that is no longer segmented. In this case, based on the image characteristics and encoding efficiency, the largest coding unit may be used as the final coding unit, or if necessary, the coding unit may be recursively segmented into deeper coding units, and the coding unit with the optimal size may be used as the final coding unit. Here, the encoding process may include prediction, transformation, and reconstruction processes (described later). As another example, the processing unit may also include a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit may be split or separated from the final coding unit described above. The prediction unit may be a unit for predicting samples, and the transform unit may be a unit for deriving transform coefficients and / or a unit for deriving residual signals from transform coefficients.
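The recursive segmentation described above can be illustrated with a quadtree-only sketch. Binary-tree and ternary-tree splits are deliberately left out to keep the example short, and `should_split` is a hypothetical stand-in for whatever decision rule an encoder applies (for example a rate-distortion comparison); neither the function names nor the minimum size come from the text.

```python
from typing import Callable, List, Tuple

# A block is (x, y, width, height) in samples.
Block = Tuple[int, int, int, int]


def quadtree_partition(
    block: Block,
    min_size: int,
    should_split: Callable[[Block], bool],
) -> List[Block]:
    """Recursively split a block into four quadrants until a leaf is reached.

    The returned leaves play the role of the "final coding units" that are
    no longer segmented.
    """
    x, y, w, h = block
    if w <= min_size or h <= min_size or not should_split(block):
        return [block]

    half_w, half_h = w // 2, h // 2
    leaves: List[Block] = []
    for cx, cy in ((x, y), (x + half_w, y), (x, y + half_h), (x + half_w, y + half_h)):
        leaves.extend(quadtree_partition((cx, cy, half_w, half_h), min_size, should_split))
    return leaves
```

A usage example: starting from a 64×64 CTU and splitting only blocks that touch the top-left corner while they are wider than 16 samples yields three 32×32 leaves and four 16×16 leaves.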
[0057] In some cases, a unit can be used interchangeably with terms such as a block or region. Generally, an M×N block can represent a set of samples or transform coefficients consisting of M columns and N rows. Samples can typically represent pixels or pixel values, and may represent pixel / pixel values of only the luminance component or only the chrominance component. A sample can be used as a term corresponding to a pixel or pel of one frame (or image).
[0058] In the encoding device 200, a residual signal (residual block, residual sample array) is generated by subtracting the prediction signal (prediction block, prediction sample array) output from the inter-frame predictor 221 or the intra-frame predictor 222 from the input image signal (original block, original sample array), and the generated residual signal is sent to the transformer 232. In this case, as shown, the unit in the encoding device 200 that subtracts the prediction signal (prediction block, prediction sample array) from the input image signal (original block, original sample array) may be called the subtractor 231. The predictor can perform prediction on the block to be processed (hereinafter referred to as the current block) and generate a prediction block including the prediction samples of the current block. The predictor can determine whether to apply intra-frame prediction or inter-frame prediction on a per-block or per-CU basis. As described later in the description of the various prediction modes, the predictor can generate various types of information related to the prediction (e.g., prediction mode information) and send the generated information to the entropy encoder 240. The information about the prediction can be encoded in the entropy encoder 240 and output in the form of a bitstream.
[0059] Intra-predictor 222 can refer to samples in the current frame to predict the current block. Depending on the prediction mode, the referenced samples may be located near or separated from the current block. In intra-prediction, the prediction mode may include multiple non-directional modes and multiple directional modes. For example, non-directional modes may include DC mode and planar mode. For example, depending on the level of detail in the prediction direction, the directional modes may include 33 or 65 directional prediction modes. However, this is just an example, and more or fewer directional prediction modes may be used depending on the settings. Intra-predictor 222 can use the prediction modes applied to neighboring blocks to determine the prediction mode applied to the current block.
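As an illustration of a non-directional mode, the sketch below fills a block with the rounded average of the neighboring reference samples, which is the basic idea behind DC prediction. It is a simplified assumption-laden example: the handling of unavailable neighbors and of non-square blocks that real codecs define is omitted, and the function name and argument layout are hypothetical.

```python
from typing import List, Sequence


def dc_predict(
    top: Sequence[int],
    left: Sequence[int],
    width: int,
    height: int,
) -> List[List[int]]:
    """Predict a width x height block as the mean of its top and left neighbors.

    `top` holds reconstructed samples directly above the block and `left`
    holds those directly to its left, both taken from the current frame.
    """
    neighbors = list(top[:width]) + list(left[:height])
    if not neighbors:
        raise ValueError("at least one reference sample is required")
    # Integer average with rounding to nearest.
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * width for _ in range(height)]
```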
[0060] Inter-frame predictor 221 can derive the prediction block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference frame. Here, to reduce the amount of motion information transmitted in inter-frame prediction mode, motion information can be predicted on a block, sub-block, or sample basis based on the correlation of motion information between neighboring blocks and the current block. Motion information may include motion vectors and reference frame indices. Motion information may also include inter-frame prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter-frame prediction, neighboring blocks may include spatially neighboring blocks existing in the current frame and temporally neighboring blocks existing in the reference frame. The reference frame including the reference block and the reference frame including the temporally neighboring block may be the same or different. The temporally neighboring block may be referred to as a collocated reference block, a collocated CU (colCU), etc., and the reference frame including the temporally neighboring block may be referred to as a collocated picture (colPic). For example, inter-frame predictor 221 can configure a motion information candidate list based on neighboring blocks and generate information indicating which candidate is used to derive the motion vector and / or reference frame index of the current block. Inter-frame prediction can be performed based on various prediction modes. For example, in skip mode and merge mode, the inter-frame predictor 221 can use motion information from neighboring blocks as motion information for the current block. In skip mode, unlike merge mode, residual signals may not be sent. In motion vector prediction (MVP) mode, motion vectors from neighboring blocks can be used as motion vector predictors, and the motion vector of the current block can be indicated by signaling the motion vector difference.
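The MVP mode relationship (motion vector = predictor + signaled difference) can be sketched as follows. The encoder-side rule for choosing a predictor, minimizing the sum of absolute difference components, is an assumption made for illustration; the function names are hypothetical.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]


def encode_motion_vector(
    mv: MotionVector, candidates: List[MotionVector]
) -> Tuple[int, MotionVector]:
    """Pick a predictor and return (predictor index, motion vector difference).

    Assumed selection rule: choose the candidate that gives the smallest
    |dx| + |dy|, as a rough proxy for signaling cost.
    """
    def cost(i: int) -> int:
        return abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1])

    idx = min(range(len(candidates)), key=cost)
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_motion_vector(
    idx: int, mvd: MotionVector, candidates: List[MotionVector]
) -> MotionVector:
    """Rebuild the motion vector from the predictor index and the difference."""
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Because both sides build the same candidate list, only the index and the (typically small) difference need to be signaled.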
[0061] Predictor 220 can generate a prediction signal based on various prediction methods described below. For example, the predictor can apply not only intra-frame prediction or inter-frame prediction to predict a block, but also both intra-frame prediction and inter-frame prediction simultaneously. This can be referred to as combined intra-frame and inter-frame prediction (CIIP). Alternatively, the predictor can predict blocks based on an intra-block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for content image / video coding such as screen content coding (SCC) in games, etc. IBC essentially performs prediction in the current frame, but can be performed similarly to inter-frame prediction, such that a reference block is derived in the current frame. That is, IBC can use at least one inter-frame prediction technique described in this document. The palette mode can be considered as an example of intra-frame coding or intra-frame prediction. When the palette mode is applied, the sample values in the frame can be signaled based on information about the palette table and palette index.
[0062] The predicted signal generated by the predictor (including inter-frame predictor 221 and / or intra-frame predictor 222) can be used to generate a reconstructed signal or a residual signal. Transformer 232 can generate transform coefficients by applying transform techniques to the residual signal. For example, the transform technique may include at least one of Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Karhunen–Loève Transform (KLT), Graph-Based Transform (GBT), or Conditionally Non-linear Transform (CNT). Here, GBT refers to a transform obtained from a graph when the relationship information between pixels is represented as a graph. CNT refers to a transform generated based on the predicted signal generated using all previously reconstructed pixels. Furthermore, the transform processing can be applied to square pixel blocks of the same size or to blocks of variable size other than square.
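To make the transform step concrete, the following sketch applies a separable, orthonormal floating-point DCT-II to a residual block. It is only an illustration of the DCT named above: practical codecs use integer approximations of the transform, and the function names here are hypothetical.

```python
import math
from typing import List, Sequence


def dct_1d(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out


def dct_2d(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    # Transpose back so that the result is indexed [row][column].
    return [list(r) for r in zip(*cols)]
```

For a flat residual block, all of the energy lands in the single top-left (DC) coefficient, which is what makes the subsequent quantization and entropy coding effective.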
[0063] Quantizer 233 quantizes the transform coefficients and sends them to entropy encoder 240, which encodes the quantized signal (information about the quantized transform coefficients) and outputs a bitstream. This information about the quantized transform coefficients can be referred to as residual information. Quantizer 233 can rearrange the block-type quantized transform coefficients into a one-dimensional vector based on the coefficient scan order, and generate information about the quantized transform coefficients based on this one-dimensional vector. Entropy encoder 240 can perform various encoding methods such as exponential Golomb, context-adaptive variable-length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). Entropy encoder 240 can encode information required for video / image reconstruction other than the quantized transform coefficients (e.g., values of syntax elements, etc.) together or separately. The encoded information (e.g., encoded video / image information) can be sent or stored in NAL (Network Abstraction Layer) units as a bitstream. The video / image information may also include information about various parameter sets, such as Adaptation Parameter Set (APS), Picture Parameter Set (PPS), Sequence Parameter Set (SPS), or Video Parameter Set (VPS). Additionally, the video / image information may include general constraint information. In this document, information and / or syntax elements transmitted / signaled from the encoding device to the decoding device may be included in the video / image information. The video / image information may be encoded by the above-described encoding process and included in the bitstream. The bitstream may be transmitted via a network or stored in a digital storage medium. The network may include broadcast networks and / or communication networks, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmitter (not shown) that transmits the signal output from the entropy encoder 240 and / or a storage unit (not shown) that stores the signal may be included as an internal / external element of the encoding device 200, and alternatively, the transmitter may be included in the entropy encoder 240.
[0064] The quantized transform coefficients output from quantizer 233 can be used to generate a prediction signal. For example, the residual signal (residual block or residual sample) can be reconstructed by applying dequantization and inverse transform to the quantized transform coefficients via dequantizer 234 and inverse transformer 235. Adder 250 adds the reconstructed residual signal to the prediction signal output from inter-frame predictor 221 or intra-frame predictor 222 to generate a reconstructed signal (reconstructed frame, reconstructed block, reconstructed sample array). If the block to be processed has no residual (e.g., in the case of applying skip mode), the prediction block can be used as a reconstructed block. Adder 250 may be referred to as a reconstructor or reconstructed block generator. As described below, the generated reconstructed signal can be used for intra-frame prediction of the next block to be processed in the current frame and can be filtered for inter-frame prediction of the next frame.
[0065] In addition, luma mapping with chroma scaling (LMCS) can be applied during picture encoding and / or reconstruction.
[0066] Filter 260 can improve subjective / objective image quality by applying filtering to the reconstructed signal. For example, filter 260 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image and store the modified reconstructed image in memory 270 (specifically, the DPB of memory 270). Various filtering methods may include deblocking filtering, sample adaptive offset, adaptive loop filtering, bilateral filtering, etc. Filter 260 can generate various types of filtering-related information and send the generated information to entropy encoder 240, as described later in the description of each filtering method. The filtering-related information can be encoded by entropy encoder 240 and output as a bitstream.
[0067] The modified reconstructed frame sent to memory 270 can be used as a reference frame in inter-frame predictor 221. When inter-frame prediction is applied by the encoding device, prediction mismatch between the encoding device 200 and the decoding device can be avoided and encoding efficiency can be improved.
[0068] The DPB of memory 270 can store modified reconstructed frames for use as reference frames in inter-frame predictor 221. Memory 270 can store motion information of blocks in the current frame from which motion information is derived (or encoded) and / or motion information of already reconstructed blocks in the frame. The stored motion information can be sent to inter-frame predictor 221 and used as motion information of spatially or temporally neighboring blocks. Memory 270 can store reconstructed samples of reconstructed blocks in the current frame and can transmit the reconstructed samples to intra-frame predictor 222.
[0069] Figure 3 is a schematic diagram illustrating the configuration of a video / image decoding device to which embodiments of this disclosure can be applied.
[0070] Referring to Figure 3, the decoding device 300 may include an entropy decoder 310, a residual processor 320, a predictor 330, an adder 340, a filter 350, and a memory 360. The predictor 330 may include an inter-frame predictor 332 and an intra-frame predictor 331. The residual processor 320 may include a dequantizer 321 and an inverse transformer 322. According to embodiments, the entropy decoder 310, residual processor 320, predictor 330, adder 340, and filter 350 may be configured by hardware components (e.g., a decoder chipset or processor). Additionally, the memory 360 may include a decoded picture buffer (DPB) or may be configured by a digital storage medium. The hardware components may also include the memory 360 as an internal / external component.
[0071] When a bitstream containing video / image information is input, the decoding device 300 can reconstruct an image in correspondence with the process by which the video / image information was processed in the encoding device of Figure 2. For example, the decoding device 300 can derive units / blocks based on block partitioning information obtained from the bitstream. The decoding device 300 can perform decoding using the processing unit applied in the encoding device. Therefore, for example, the processing unit for decoding can be a coding unit, and the coding unit can be partitioned from a coding tree unit or a largest coding unit according to a quadtree structure, binary tree structure, and / or ternary tree structure. One or more transform units can be derived from the coding unit. The reconstructed image signal decoded and output by the decoding device 300 can be reproduced by a reproduction device.
[0072] The decoding device 300 can receive the signal output from the encoding device of Figure 2 in the form of a bitstream, and the received signal can be decoded by the entropy decoder 310. For example, the entropy decoder 310 can parse the bitstream to derive information (e.g., video / image information) required for image reconstruction (or picture reconstruction). The video / image information may also include information about various parameter sets, such as an adaptive parameter set (APS), picture parameter set (PPS), sequence parameter set (SPS), or video parameter set (VPS). In addition, the video / image information may also include general constraint information. The decoding device can also decode the picture based on the information about the parameter sets and / or the general constraint information. The signaled / received information and / or syntax elements described later in this document can be decoded through the decoding process and obtained from the bitstream. For example, the entropy decoder 310 decodes the information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and outputs the values of syntax elements required for image reconstruction and the quantized values of the transform coefficients of the residual. More specifically, the CABAC entropy decoding method receives a bin corresponding to each syntax element in the bitstream, determines a context model using information about the target syntax element, decoding information of the target block, or information about symbols / bins decoded in a previous stage, predicts the probability of occurrence of the bin according to the determined context model, and performs arithmetic decoding on the bin to generate a symbol corresponding to the value of each syntax element.
In this case, the CABAC entropy decoding method can update the context model after determining the context model by using the information of the decoded symbols / bins for the context model of the next symbol / bin. Information related to prediction from the information decoded by the entropy decoder 310 can be provided to the predictors (inter-frame predictor 332 and intra-frame predictor 331), and the residual values (i.e., quantized transform coefficients and related parameter information) from the entropy decoder 310 can be input to the residual processor 320. The residual processor 320 can derive residual signals (residual blocks, residual samples, residual sample arrays). Additionally, information about filtering from the information decoded by the entropy decoder 310 can be provided to the filter 350. Furthermore, a receiver (not shown) for receiving signals output from the encoding device may be configured as an internal / external element of the decoding device 300, or the receiver may be a component of the entropy decoder 310. Additionally, the decoding device according to this document may be referred to as a video / image / picture decoding device, and the decoding device may be classified as an information decoder (video / image / picture information decoder) and a sample decoder (video / image / picture sample decoder). The information decoder may include the entropy decoder 310, and the sample decoder may include at least one of a dequantizer 321, an inverse transformer 322, an adder 340, a filter 350, a memory 360, an inter-frame predictor 332, and an intra-frame predictor 331.
[0073] Dequantizer 321 can dequantize the quantized transform coefficients and output the transform coefficients. Dequantizer 321 can rearrange the quantized transform coefficients in a two-dimensional block format. In this case, the rearrangement can be performed based on the coefficient scan order performed in the encoding device. Dequantizer 321 can use quantization parameters (e.g., quantization step size information) to perform dequantization on the quantized transform coefficients and obtain the transform coefficients.
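The dequantization step described above can be illustrated with a minimal sketch. The function name `dequantize`, the raster (row-by-row) scan, and the single scalar step size are assumptions made for illustration only; the disclosure states only that the rearrangement follows the coefficient scan order used by the encoding device and that quantization parameters such as step size information are used.

```python
def dequantize(levels_1d, width, height, step):
    """Rearrange a 1-D list of quantized levels into a 2-D block and scale it."""
    if len(levels_1d) != width * height:
        raise ValueError("level count must equal width * height")
    # Assumption: raster scan. A real codec would invert the encoder's scan order.
    block = [levels_1d[r * width:(r + 1) * width] for r in range(height)]
    # Assumption: one scalar step size for the whole block.
    return [[level * step for level in row] for row in block]


coeffs = dequantize([4, -1, 0, 2], width=2, height=2, step=8)
```

Here `coeffs` holds the 2x2 block of transform coefficients that would be passed on to the inverse transformer.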
[0074] The inverse transformer 322 performs inverse transformation on the transformation coefficients to obtain the residual signal (residual block, residual sample array).
[0075] The predictor can perform prediction on the current block and generate a prediction block that includes prediction samples of the current block. The predictor can determine whether to apply intra-frame prediction or inter-frame prediction to the current block based on information about the prediction output from the entropy decoder 310, and can determine a specific intra-frame / inter-frame prediction mode.
[0076] Predictor 330 can generate prediction signals based on various prediction methods. For example, the predictor can not only apply intra-frame prediction or inter-frame prediction to predict a block, but also apply both intra-frame prediction and inter-frame prediction simultaneously. This can be referred to as combined inter-frame and intra-frame prediction (CIIP). Alternatively, the predictor can predict blocks based on an intra-block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode can be used for image / video coding of content such as games, for example screen content coding (SCC). IBC essentially performs prediction in the current frame, but can be performed similarly to inter-frame prediction in that a reference block is derived in the current frame. That is, IBC can use at least one of the inter-frame prediction techniques described in this document. Palette mode can be considered as an example of intra-frame coding or intra-frame prediction. When palette mode is applied, sample values within the picture can be signaled based on information about the palette table and palette index.
[0077] Intra-predictor 331 can refer to samples in the current frame to predict the current block. Depending on the prediction mode, the referenced samples may be located near or separated from the current block. In intra-prediction, the prediction mode may include multiple non-directional modes and multiple directional modes. Intra-predictor 331 can use prediction modes applied to neighboring blocks to determine the prediction mode applied to the current block.
[0078] Inter-frame predictor 332 can deduce the predicted block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference frame. In this case, to reduce the amount of motion information transmitted in inter-frame prediction mode, motion information can be predicted on a block, sub-block, or sample basis based on the correlation between motion information between neighboring blocks and the current block. Motion information may include motion vectors and reference frame indices. Motion information may also include inter-frame prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter-frame prediction, neighboring blocks may include spatially neighboring blocks existing in the current frame and temporally neighboring blocks existing in the reference frame. For example, inter-frame predictor 332 can configure a motion information candidate list based on neighboring blocks and deduce the motion vector and / or reference frame index of the current block based on the received candidate selection information. Inter-frame prediction can be performed based on various prediction modes, and the information about the prediction may include information indicating the inter-frame prediction mode of the current block.
[0079] Adder 340 generates a reconstruction signal (reconstructed frame, reconstruction block, reconstruction sample array) by adding the obtained residual signal to the prediction signal (prediction block, prediction sample array) output from the predictor (including inter-frame predictor 332 and / or intra-frame predictor 331). If the block to be processed has no residual, such as when a skip mode is applied, the prediction block can be used as a reconstruction block.
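The adder operation can be sketched as follows. The function name `reconstruct` is hypothetical, and clipping the result to the valid sample range for an assumed bit depth is an addition made for illustration; the disclosure itself only describes adding the residual to the prediction and using the prediction block directly when there is no residual.

```python
def reconstruct(pred, resid=None, bit_depth=8):
    """Add a residual block to a prediction block, sample by sample."""
    if resid is None:
        # No residual (e.g., skip mode): the prediction block is the reconstruction.
        return [row[:] for row in pred]
    max_val = (1 << bit_depth) - 1
    # Assumption: clip each sample to [0, max_val]; not stated in the disclosure.
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


recon = reconstruct([[250, 10]], [[10, -20]])
```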
[0080] Adder 340 may be referred to as a reconstructor or reconstruction block generator. The generated reconstructed signal can be used for intra-frame prediction of the next block to be processed in the current frame, and can be output through filtering as described below, or it can be used for inter-frame prediction of the next frame.
[0081] In addition, Luminance Mapping with Chroma Scaling (LMCS) can be applied in the image decoding process.
[0082] Filter 350 can improve subjective / objective image quality by applying filtering to the reconstructed signal. For example, filter 350 can generate a modified reconstructed image by applying various filtering methods to the reconstructed image and store the modified reconstructed image in memory 360 (specifically, the DPB of memory 360). For example, the various filtering methods may include deblocking filtering, sample adaptive offset, adaptive loop filtering, bilateral filtering, etc.
[0083] The (modified) reconstructed frame stored in the DPB of memory 360 can be used as a reference frame in inter-frame predictor 332. Memory 360 can store motion information of blocks in the current frame from which motion information is derived (or decoded) and / or motion information of already reconstructed blocks in the frame. The stored motion information can be sent to inter-frame predictor 332 as motion information of spatially or temporally neighboring blocks. Memory 360 can store reconstructed samples of reconstructed blocks in the current frame and transmit the reconstructed samples to intra-frame predictor 331.
[0084] In this disclosure, the embodiments described for the filter 260, inter-frame predictor 221, and intra-frame predictor 222 of the encoding device 200 can be applied identically or correspondingly to the filter 350, inter-frame predictor 332, and intra-frame predictor 331 of the decoding device 300, respectively.
[0085] As described above, during video encoding, prediction is performed to enhance compression efficiency. A prediction block, which includes prediction samples of the current block (i.e., the target coding block), can be generated through prediction. In this case, the prediction block includes prediction samples in the spatial domain (or pixel domain). The prediction block is derived identically in both the encoding and decoding devices. The encoding device can enhance image coding efficiency by signaling to the decoding device not the original sample values of the original block themselves, but information (residual information) about the residual between the original block and the prediction block. The decoding device can derive a residual block including residual samples based on the residual information, generate a reconstructed block including reconstructed samples by adding the residual block and the prediction block, and generate a reconstructed image including the reconstructed block.
[0086] Residual information can be generated through transform and quantization processes. For example, an encoding device can derive a residual block between the original block and the prediction block, derive transform coefficients by performing a transform process on the residual samples (residual sample array) included in the residual block, derive quantized transform coefficients by performing a quantization process on the transform coefficients, and signal the relevant residual information (via a bitstream) to the decoding device. In this case, the residual information may include information such as the values of the quantized transform coefficients, their positions, the transform scheme, the transform kernel, and the quantization parameters. The decoding device can perform dequantization / inverse transform processes based on the residual information and derive residual samples (or residual blocks). The decoding device can generate a reconstructed frame based on the prediction block and the residual block. Furthermore, so that the result can serve as a reference for inter-frame prediction of subsequent frames, the encoding device can also derive a residual block by performing dequantization / inverse transform on the quantized transform coefficients and can generate a reconstructed frame based on it.
[0087] Figure 4 shows an example of a decoding process based on HMVP candidates. Here, the decoding process based on HMVP candidates may include an inter-frame prediction process based on HMVP candidates.
[0088] Referring to Figure 4, the decoding device loads an HMVP table including HMVP candidates and decodes blocks based on at least one HMVP candidate. For example, the decoding device can derive motion information for the current block based on at least one HMVP candidate, perform inter-frame prediction on the current block based on the motion information, and derive a prediction block (including prediction samples). As described above, a reconstructed block can be generated based on the prediction block. The table can be updated with the motion information derived for the current block. In this case, the motion information can be added as the last entry of the table as a new HMVP candidate. If the number of existing HMVP candidates in the table is equal to the size of the table, the candidate that was added to the table first is deleted, and the derived motion information can be added as the last entry of the table as a new HMVP candidate.
[0089] Figure 5a and Figure 5b illustrate the process of updating the HMVP buffer according to one implementation. More specifically, Figure 5a shows how the HMVP table is updated according to the FIFO rule, and Figure 5b shows how the HMVP table is updated according to the constrained FIFO rule.
[0090] The First-In-First-Out (FIFO) rule can be applied to the table shown in Figure 5a. For example, a table size S of 16 indicates that the table can include 16 HMVP candidates. If there are more than 16 HMVP candidates from previously encoded blocks, the FIFO rule can be applied so that the table contains up to the 16 most recently encoded motion information candidates. In this case, as shown in Figure 5a, the FIFO rule is applied so that the oldest HMVP candidate is removed and a new HMVP candidate is added.
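The FIFO update of the HMVP table can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `update_hmvp_fifo` and the representation of motion information as a `(mv_x, mv_y, ref_idx)` tuple are assumptions; the table size of 16 follows the example in the text.

```python
from collections import deque


def update_hmvp_fifo(table, motion_info, table_size=16):
    """Append motion_info as the last entry, evicting the oldest entry if full."""
    if len(table) == table_size:
        table.popleft()        # remove the candidate that was added first
    table.append(motion_info)  # the new candidate becomes the last entry
    return table


# Hypothetical motion information: (mv_x, mv_y, ref_idx) for 17 coded blocks.
table = deque()
for i in range(17):
    update_hmvp_fifo(table, (i, -i, 0))
```

After the seventeenth update the table still holds 16 candidates, and the candidate from the first block has been dropped.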
[0091] In addition, to further improve coding efficiency, a constrained FIFO rule can be applied, as in the example shown in Figure 5b. Referring to Figure 5b, when an HMVP candidate is inserted into the table, a redundancy check is applied first. This check determines whether an HMVP candidate with the same motion information already exists in the table. If the table contains an HMVP candidate with the same motion information, that candidate is removed from the table, the HMVP candidates after it are each shifted by one position (i.e., each index decreases by 1), and the new HMVP candidate is then inserted.
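The table update with a redundancy check before insertion can be sketched as below. The function name `update_hmvp_with_redundancy_check` and the tuple representation of motion information are assumptions for illustration; falling back to plain FIFO eviction when the table is full and no duplicate exists follows the FIFO rule described earlier.

```python
def update_hmvp_with_redundancy_check(table, motion_info, table_size=16):
    """Insert motion_info as the last entry after removing an identical entry."""
    if motion_info in table:
        # Redundancy check hit: drop the identical candidate. Later entries
        # move down by one index because the list closes the gap.
        table.remove(motion_info)
    elif len(table) == table_size:
        table.pop(0)  # table full and no duplicate: evict the oldest candidate
    table.append(motion_info)
    return table


# Hypothetical candidates as (mv_x, mv_y, ref_idx) tuples.
table = [(1, 0, 0), (2, 0, 0), (3, 0, 0)]
update_hmvp_with_redundancy_check(table, (2, 0, 0))
```

The duplicate is moved to the most recent position rather than stored twice, so the table never holds two identical candidates.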
[0092] As described above, HMVP candidates can be used during the configuration of the merge candidate list. For example, all HMVP candidates from the last entry to the first entry of the table can be inserted after the spatial merge candidates and the temporal merge candidate. In this case, a pruning check can be applied to the HMVP candidates. The maximum allowed number of merge candidates can be signaled, and the process of configuring the merge candidate list can terminate when the total number of available merge candidates reaches the maximum allowed number.
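The merge list configuration described above can be sketched as follows. The function name `build_merge_list`, the tuple representation of candidates, and the use of an exact-equality comparison as the pruning check are assumptions made for illustration.

```python
def build_merge_list(spatial, temporal, hmvp_table, max_merge_cand):
    """Spatial and temporal candidates first, then HMVP candidates, newest first."""
    merge_list = list(spatial + temporal)[:max_merge_cand]
    # Walk the HMVP table from the last (most recent) entry to the first.
    for cand in reversed(hmvp_table):
        if len(merge_list) >= max_merge_cand:
            break  # stop once the maximum allowed number is reached
        if cand not in merge_list:  # pruning check (assumed: exact equality)
            merge_list.append(cand)
    return merge_list


merge_list = build_merge_list(
    spatial=[(1, 0, 0)],
    temporal=[(2, 0, 0)],
    hmvp_table=[(1, 0, 0), (3, 0, 0), (4, 0, 0)],
    max_merge_cand=4,
)
```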
[0093] Similarly, HMVP candidates can also be used during the configuration of the (A)MVP candidate list. In this case, the motion vectors of the last k HMVP candidates in the HMVP table can be added after the TMVP candidate in the MVP candidate list. For example, an HMVP candidate with the same reference frame as the MVP target reference frame can be used to configure the MVP candidate list. Here, the MVP target reference frame can refer to the reference frame used for inter-frame prediction of the current block in MVP mode. In this case, a pruning check can be applied to the HMVP candidates. For example, k can be set to 4. However, the specific value of k is only an example, and k can have various values such as 1, 2, 3, and 4.
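Extending an MVP candidate list with the last k HMVP candidates can be sketched as below. The function name `extend_mvp_list`, the `(mv, ref_idx)` entry layout, the scan from the most recent entry backwards, and the list size limit `max_cand=2` are all assumptions for illustration; the text fixes only that the last k candidates with a matching reference frame are considered and that a pruning check can be applied.

```python
def extend_mvp_list(mvp_list, hmvp_table, target_ref_idx, k=4, max_cand=2, prune=True):
    """Append motion vectors from the last k HMVP entries that match the target reference."""
    result = list(mvp_list)
    recent = hmvp_table[max(len(hmvp_table) - k, 0):]  # last k entries
    for mv, ref_idx in reversed(recent):               # assumed: newest first
        if len(result) >= max_cand:
            break
        if ref_idx != target_ref_idx:
            continue  # different reference frame than the MVP target
        if prune and mv in result:
            continue  # pruning check against candidates already in the list
        result.append(mv)
    return result


# Hypothetical HMVP entries: ((mv_x, mv_y), ref_idx), oldest first.
hmvp = [((9, 9), 0), ((1, 1), 1), ((2, 2), 0), ((3, 3), 0), ((4, 4), 1)]
mvp = extend_mvp_list([(3, 3)], hmvp, target_ref_idx=0)
```

Passing `prune=False` skips the comparison, which corresponds to the pruning-free variants discussed later in the description.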
[0094] Furthermore, when the total number of merge candidates is equal to or greater than 15, the truncated unary plus fixed-length (3-bit) binarization method can be applied to merge index coding, as shown in Table 1.
[0095] [Table 1]
[0096] The table above assumes N_mrg = 15, where N_mrg represents the total number of merge candidates.
[0097] Figures 6 to 13 show HMVP methods according to some implementations.
[0098] Some implementations may provide a method for deriving motion information from an HMVP buffer as prediction candidates in a process that uses motion information from a buffer as prediction candidates. The HMVP may perform a process in which motion information of the current block is pushed to the motion buffer and prediction candidates for the current block are derived by popping the most recent motion information from the motion buffer during the process of configuring motion candidates for the next block. At this time, a pruning process may be performed to check the similarity between the popped prediction candidates and existing prediction candidates (e.g., motion information of neighboring blocks or temporal motion information).
[0099] Without pruning, the prediction candidate list might be configured using the same motion information, as the most recently pushed motion information stored in the HMVP buffer is very likely to be the same as the motion information of already configured neighboring blocks. However, since pruning always requires a comparison operation with previously configured prediction candidates, it not only increases coding complexity but may also be less efficient in compression compared to other methods of configuring prediction candidates that do not perform pruning but exhibit the same / similar level of computational complexity. Therefore, in one implementation, pruning can be removed, and a method for handling popped prediction candidates can be proposed to improve prediction performance.
[0100] In one implementation according to Figure 6, a method is proposed to pop the oldest prediction candidate from the HMVP buffer first, rather than the most recent prediction candidate.
[0101] When the buffer is configured as shown in Figure 6, the motion information of the current block can be decoded and stored at the last buffer index among the empty buffer entries. The motion information at the first buffer index (i.e., the lowest buffer index) can be used to decode the motion information of the block following the current block. As in the example shown in Figure 7, this method can provide not only the efficiency of deriving motion information that differs from the neighboring motion information, but also the effect of decoding the current block using motion information from a spatially distant location.
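The difference between taking the oldest entries first and taking the most recent entries first can be sketched as follows. The function names `store_motion` and `select_candidates` and the string placeholders for motion information are hypothetical; the buffer is modeled as a list whose index 0 holds the oldest entry.

```python
def store_motion(buffer, motion_info):
    """Store newly decoded motion information at the last buffer index."""
    buffer.append(motion_info)


def select_candidates(buffer, count, oldest_first=True):
    """Return up to `count` candidates, starting from the lowest or highest index."""
    ordered = buffer if oldest_first else buffer[::-1]
    return ordered[:count]


buffer = []
for mv in ["MV_0", "MV_1", "MV_2", "MV_3"]:  # hypothetical decoding order
    store_motion(buffer, mv)

oldest = select_candidates(buffer, 2)                      # proposed order
newest = select_candidates(buffer, 2, oldest_first=False)  # conventional order
```

Because the oldest entries typically come from blocks farther from the current block, they are less likely to duplicate the neighboring candidates, which is the motivation given in the text for omitting pruning.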
[0102] In one implementation according to Figure 8, a method is proposed for determining the prediction candidates to be popped from the HMVP buffer based on undersampling. More specifically, referring to Figure 8, the motion information stored in the buffer can be undersampled to be used as prediction candidates.
[0103] In one example, the oldest motion information (or the motion information with the smallest HMVP buffer index) can be used first after undersampling. Referring to Figure 9, during the encoding of the current block, the undersampled motion information 0, 2, and 4 (or the motion information at HMVP buffer indices 0, 2, and 4) can be used as prediction candidates.
[0104] In another example, the most recent motion information can be used first after undersampling. Referring to Figure 10, undersampling is performed, and the most recent motion information among the undersampled motion information can be used as a prediction candidate.
[0105] In another example, if not all prediction candidates for the current block have been configured even after the undersampled prediction candidates have been popped sequentially, the remaining prediction candidates can be configured by sequentially selecting, through a further undersampling pass, the candidates that were not selected before. Referring to Figure 11, if motion information is configured using undersampling but not all prediction candidates for the current block are configured, undersampling is performed again. Motion information different from that derived through the first undersampling can then be used as prediction candidates.
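The three undersampling (subsampling) variants above can be sketched in one function. The name `select_undersampled` and the step of 2 are assumptions; the step is inferred from the example indices 0, 2, and 4, and the text does not fix it. Entries skipped in the first pass are picked up in a later pass when more candidates are still needed.

```python
def select_undersampled(buffer, needed, step=2, newest_first=False):
    """Pick candidates at every `step`-th index, then revisit the skipped ones."""
    order = []
    for offset in range(step):  # pass 0: indices 0, 2, 4...; pass 1: 1, 3, 5...
        indices = list(range(offset, len(buffer), step))
        if newest_first:
            indices.reverse()   # most recent undersampled entry first
        order.extend(indices)
    return [buffer[i] for i in order[:needed]]


buffer = ["MV_0", "MV_1", "MV_2", "MV_3", "MV_4", "MV_5"]  # index 0 is oldest
first_pass = select_undersampled(buffer, 3)
recent_first = select_undersampled(buffer, 2, newest_first=True)
with_second_pass = select_undersampled(buffer, 5)
```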
[0106] In one implementation according to Figure 12, the prediction candidates popped from the HMVP buffer are not used immediately; instead, the average motion information of the popped prediction candidates can be calculated, and the calculated motion information can be used as a prediction candidate.
[0107] Referring to Figure 12, the stored motion information is not used directly; instead, the motion information to be used is obtained by averaging it with other motion information. Although Figure 12 illustrates popping the oldest motion information from the HMVP buffer first, the most recent motion information can also be popped first. In this case, since the motion information used as a prediction candidate is not the most recently stored motion information itself, the probability of producing the same motion information as the neighboring motion information can be lower even when no pruning is performed.
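Averaging popped candidates can be sketched as follows. The function name `averaged_candidates`, the pairing of adjacent buffer entries (oldest first), and the floor-division rounding are assumptions for illustration; the text states only that an average of popped candidates is used.

```python
def averaged_candidates(buffer, count):
    """Average adjacent buffer entries, oldest pair first, to form candidates."""
    out = []
    for i in range(len(buffer) - 1):
        if len(out) == count:
            break
        (ax, ay), (bx, by) = buffer[i], buffer[i + 1]
        # Assumption: integer average rounded toward negative infinity.
        out.append(((ax + bx) // 2, (ay + by) // 2))
    return out


# Hypothetical motion vectors (mv_x, mv_y), index 0 is the oldest entry.
candidates = averaged_candidates([(4, -2), (2, 6), (0, 0)], count=2)
```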
[0108] Referring to Figure 13, instead of immediately using the prediction candidates popped from the HMVP buffer, an offset is applied to the motion vector, and the motion information with the offset applied is used as a prediction candidate. The example shown in Figure 13 uses the popped motion information MV_0 as a prediction candidate after an offset is added to it.
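Applying an offset to a popped motion vector can be sketched as below. The function name `apply_offset` and the offset value `(1, 0)` are purely illustrative assumptions; the text does not specify the offset magnitude or direction.

```python
def apply_offset(mv, offset=(1, 0)):
    """Return the motion vector shifted by the given (dx, dy) offset."""
    return (mv[0] + offset[0], mv[1] + offset[1])


mv_0 = (3, -2)                   # hypothetical popped motion vector MV_0
candidate = apply_offset(mv_0)   # used as the prediction candidate
```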
[0109] Figure 14 is a flowchart illustrating the operation of an encoding device according to one embodiment. Figure 15 is a block diagram illustrating the structure of an encoding device according to one embodiment.
[0110] The encoding device according to Figure 14 and Figure 15 can perform operations corresponding to those of the decoding device according to Figure 16 and Figure 17. Therefore, the operations of the decoding device described below with reference to Figure 16 and Figure 17 can be applied in the same manner to the encoding device according to Figure 14 and Figure 15.
[0111] The steps shown in Figure 14 can be performed by the encoding device 200 shown in Figure 2. More specifically, steps S1400 to S1450 can be performed by the predictor 220 disclosed in Figure 2, step S1460 can be performed by the residual processor 230 disclosed in Figure 2, and step S1470 can be performed by the entropy encoder 240 disclosed in Figure 2. Furthermore, the operations according to steps S1400 to S1420 are based on part of the description given with reference to Figure 3. Therefore, specific details that repeat those described with reference to Figure 2 and Figure 3 will be omitted or simplified.
[0112] As shown in Figure 15, an encoding device according to one embodiment may include a predictor 220, a residual processor 230, and an entropy encoder 240. However, depending on the circumstances, not all the components shown in Figure 15 may be essential components of the encoding device, and the encoding device can be implemented with more or fewer components than those shown in Figure 15.
[0113] In an encoding device according to one embodiment, the predictor 220, the residual processor 230, and the entropy encoder 240 may be implemented by corresponding chips, or at least two or more constituent elements may be implemented using a single chip.
[0114] According to one embodiment, an encoding device may configure an AMVP candidate list that includes at least one AMVP candidate for the current block (S1400). More specifically, the predictor 220 of the encoding device may configure an AMVP candidate list that includes at least one AMVP candidate for the current block.
[0115] According to one embodiment, an encoding device can derive an HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block (S1410). More specifically, the predictor 220 of the encoding device can derive an HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block.
[0116] According to one embodiment, the encoding device may select at least one HMVP candidate from the HMVP candidates in the HMVP candidate list (S1420). More specifically, the predictor 220 of the encoding device may select at least one HMVP candidate from the HMVP candidates in the HMVP candidate list. In one example, selecting at least one HMVP candidate may be regarded as popping at least one HMVP candidate.
[0117] According to one embodiment, the encoding device can derive an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list (S1430). More specifically, the predictor 220 of the encoding device can derive an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list.
[0118] According to one embodiment, the encoding device can derive motion information of the current block based on the updated AMVP candidate list (S1440). More specifically, the predictor 220 of the encoding device can derive motion information of the current block based on the updated AMVP candidate list.
[0119] According to one embodiment, the encoding device can derive prediction samples for the current block based on the motion information of the current block (S1450). More specifically, the predictor 220 of the encoding device can derive prediction samples for the current block based on the motion information of the current block.
[0120] According to one embodiment, the encoding device can derive residual samples of the current block based on the prediction samples of the current block (S1460). More specifically, the residual processor 230 of the encoding device can derive residual samples of the current block based on the prediction samples of the current block.
[0121] According to one embodiment, the encoding device can encode image information including information about residual samples (S1470). More specifically, the entropy encoder 240 of the encoding device can encode image information including information about residual samples.
[0122] In one implementation, selecting at least one HMVP candidate may include applying a pruning process to one of the HMVP candidates based on at least one AMVP candidate in the AMVP candidate list and determining whether to add the HMVP candidate to the AMVP candidate list based on the pruning process.
[0123] In one implementation, selecting at least one HMVP candidate may include selecting at least one HMVP candidate from the HMVP candidates based on an index of the HMVP candidate list.
[0124] In one implementation, pruning based on the at least one AMVP candidate may not be applied to each of the at least one HMVP candidate.
[0125] In one implementation, at least one HMVP candidate may include the HMVP candidate with the smallest HMVP candidate list index among the HMVP candidates.
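The index-ordered selection without pruning described in paragraphs [0123] to [0125] can be sketched as follows. This is a minimal illustration, assuming that candidates are plain (mv_x, mv_y) tuples, that a lower HMVP candidate list index means higher priority, and that the AMVP list holds at most 2 entries. The function name `add_without_pruning` and the list size are hypothetical.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # hypothetical (mv_x, mv_y) representation


def add_without_pruning(
    amvp_list: List[MotionVector],
    hmvp_list: List[MotionVector],
    max_candidates: int = 2,  # assumed list size, not taken from the disclosure
) -> List[MotionVector]:
    """Fill the AMVP list from the HMVP list in ascending index order.

    No comparison against existing AMVP candidates is made, so a duplicate
    may be added; that is the trade-off for skipping the pruning check.
    """
    free_slots = max(0, max_candidates - len(amvp_list))
    # Slicing from index 0 takes the smallest-index HMVP candidates first.
    return list(amvp_list) + list(hmvp_list[:free_slots])
```

With one AMVP candidate already present, only the HMVP candidate at index 0 is appended, even if it equals the existing candidate.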
[0126] An encoding device according to one embodiment can derive sampled HMVP candidates by applying sampling to HMVP candidates. At least one HMVP candidate can be selected from the sampled HMVP candidates based on an index of the HMVP candidate list of sampled HMVP candidates.
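One way to picture the sampling step is the sketch below. The sampling pattern is not specified in this paragraph, so the sketch assumes a fixed stride that keeps every second candidate starting at index 0. The names `sample_hmvp` and `select_from_sampled` and the `step` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

MotionVector = Tuple[int, int]  # hypothetical (mv_x, mv_y) representation


def sample_hmvp(hmvp_list: Sequence[MotionVector], step: int = 2) -> List[MotionVector]:
    """Keep every `step`-th HMVP candidate, starting at index 0 (assumed pattern)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(hmvp_list[::step])


def select_from_sampled(
    hmvp_list: Sequence[MotionVector], count: int, step: int = 2
) -> List[MotionVector]:
    """Select the first `count` sampled candidates by their index in the sampled list."""
    sampled = sample_hmvp(hmvp_list, step)
    return sampled[:count]
```

Sampling shortens the list that has to be scanned, so fewer candidates are examined per block.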
[0127] According to one embodiment, the encoding device can update the HMVP candidate list based on the motion information of the current block. The updated HMVP candidate list can be used to derive the motion information of blocks encoded after the current block.
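The list update can be illustrated with a bounded first-in-first-out table. The sketch below rests on several assumptions that are not stated in this paragraph: the table holds at most 3 entries, the newest motion vector is placed at index 0, and an identical stored entry is moved rather than duplicated. The name `update_hmvp` is hypothetical.

```python
from collections import deque
from typing import Deque, Tuple

MotionVector = Tuple[int, int]  # hypothetical (mv_x, mv_y) representation


def update_hmvp(table: Deque[MotionVector], new_mv: MotionVector) -> None:
    """Insert the current block's motion vector at index 0 (assumed ordering).

    An identical stored entry is moved to the front instead of being
    duplicated. When the table is full, the deque's maxlen drops the
    oldest entry, which sits at the highest index.
    """
    if new_mv in table:
        table.remove(new_mv)
    table.appendleft(new_mv)


# Usage: a table of assumed size 3, updated after each coded block.
table: Deque[MotionVector] = deque(maxlen=3)
for mv in [(1, 0), (2, 0), (3, 0), (4, 0), (2, 0)]:
    update_hmvp(table, mv)
# table now holds (2, 0), (4, 0), (3, 0) in index order
```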
[0128] According to the encoding device and the method of operating the encoding device described with reference to Figure 14 and Figure 15, the encoding device can configure an AMVP candidate list including at least one AMVP candidate for the current block (S1400); derive an HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block (S1410); select at least one HMVP candidate from the HMVP candidates in the HMVP candidate list (S1420); derive an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list (S1430); derive motion information for the current block based on the updated AMVP candidate list (S1440); derive prediction samples for the current block based on the motion information of the current block (S1450); derive residual samples for the current block based on the prediction samples of the current block (S1460); and encode image information including information about the residual samples (S1470). In other words, when prediction candidates are configured based on HMVP for inter-frame prediction, the increase in complexity caused by pruning can be avoided by omitting the pruning process.
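The sequence S1400 to S1460 can be sketched end to end as below. This is a simplified illustration under stated assumptions, not the encoder of this disclosure: motion compensation is integer-pel with no boundary handling, the candidate is chosen by lowest sum of absolute differences (SAD), the chosen candidate is used directly as the motion vector (no motion vector difference), and entropy coding (S1470) is not modeled. All function names and the list size of 2 are hypothetical.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # hypothetical (mv_x, mv_y) representation
Block = List[List[int]]


def fetch_block(ref: Block, x: int, y: int, size: int, mv: MotionVector) -> Block:
    """Integer-pel motion compensation; the caller must keep the block in bounds."""
    dx, dy = mv
    return [[ref[y + dy + r][x + dx + c] for c in range(size)] for r in range(size)]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode_block(cur, ref, x, y, size, amvp_list, hmvp_list, max_candidates=2):
    # S1400 to S1430: extend the AMVP list with HMVP candidates, no pruning check.
    free_slots = max(0, max_candidates - len(amvp_list))
    candidates = list(amvp_list) + list(hmvp_list[:free_slots])
    target = [row[x:x + size] for row in cur[y:y + size]]
    # S1440: choose the candidate with the lowest SAD (assumed cost measure).
    best_idx = min(
        range(len(candidates)),
        key=lambda i: sad(target, fetch_block(ref, x, y, size, candidates[i])),
    )
    # S1450: prediction samples from the chosen motion vector.
    pred = fetch_block(ref, x, y, size, candidates[best_idx])
    # S1460: residual samples = original samples minus prediction samples.
    residual = [[t - p for t, p in zip(tr, pr)] for tr, pr in zip(target, pred)]
    # S1470 (entropy coding) is not modeled; return what would be signalled.
    return best_idx, residual
```

In this sketch the encoder returns a candidate index and a residual block, which is the information a decoder would need to repeat the same list construction and reverse the process.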
[0129] Figure 16 is a block diagram illustrating the operation of a decoding device according to one embodiment. Figure 17 is a block diagram illustrating the structure of a decoding device according to one embodiment.
[0130] Each step shown in Figure 16 can be performed by the decoding device 300 shown in Figure 3. More specifically, steps S1600 to S1650 can be performed by the predictor 330 disclosed in Figure 3, and step S1660 can be performed by the adder 340 disclosed in Figure 3. Furthermore, the operations according to steps S1600 to S1660 are based on part of the description given with reference to Figures 4 to 13. Therefore, specific details that repeat those described with reference to Figures 3 to 13 will be omitted or simplified.
[0131] As shown in Figure 17, a decoding device according to one embodiment may include an entropy decoder 310, a predictor 330, and an adder 340. However, depending on the circumstances, not all of the components shown in Figure 17 may be essential components of the decoding device, and the decoding device may be implemented using more or fewer components than those shown in Figure 17.
[0132] In a decoding device according to one embodiment, the predictor 330 and the adder 340 may be implemented by respective chips, or at least two or more constituent elements may be implemented using a single chip.
[0133] According to one embodiment, a decoding device may configure an AMVP candidate list including at least one AMVP candidate for the current block (S1600). More specifically, the predictor 330 of the decoding device may configure the AMVP candidate list including at least one AMVP candidate for the current block.
[0134] According to one embodiment, the decoding device can derive an HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block (S1610). More specifically, the predictor 330 of the decoding device can derive the HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block.
[0135] According to one embodiment, the decoding device may select at least one HMVP candidate from the HMVP candidates in the HMVP candidate list (S1620). More specifically, the predictor 330 of the decoding device may select at least one HMVP candidate from the HMVP candidates in the HMVP candidate list. In one example, selecting at least one HMVP candidate may be regarded as popping at least one HMVP candidate.
[0136] According to one embodiment, the decoding device can derive an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list (S1630). More specifically, the predictor 330 of the decoding device can derive the updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list.
[0137] According to one embodiment, the decoding device can derive the motion information of the current block based on the updated AMVP candidate list (S1640). More specifically, the predictor 330 of the decoding device can derive the motion information of the current block based on the updated AMVP candidate list.
[0138] According to one embodiment, the decoding device can derive a prediction sample of the current block based on the motion information of the current block (S1650). More specifically, the predictor 330 of the decoding device can derive the prediction sample of the current block based on the motion information of the current block.
[0139] According to one embodiment, the decoding device can generate a reconstructed sample of the current block based on the prediction sample of the current block (S1660). More specifically, the adder 340 of the decoding device can generate the reconstructed sample of the current block based on the prediction sample of the current block.
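The role of the adder in S1660 can be illustrated with a short sketch: each reconstructed sample is the prediction sample plus the corresponding residual sample. Clipping the sum to the valid sample range for an assumed bit depth of 8 is common codec practice but is an assumption here, as is the function name `reconstruct`.

```python
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, residual: Block, bit_depth: int = 8) -> Block:
    """Add residual samples to prediction samples and clip to the sample range.

    The bit depth of 8 and the clipping step are assumptions for illustration.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred, residual)
    ]
```

When no residual is signalled for a block, the residual can be treated as all zeros, in which case the reconstructed samples equal the prediction samples.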
[0140] In one implementation, selecting at least one HMVP candidate may include applying a pruning process to one of the HMVP candidates based on at least one AMVP candidate in the AMVP candidate list and determining whether to add the HMVP candidate to the AMVP candidate list based on the pruning process.
[0141] In one implementation, selecting at least one HMVP candidate may include selecting at least one HMVP candidate from the HMVP candidates based on an index of the HMVP candidate list.
[0142] In one implementation, pruning based on the at least one AMVP candidate may not be applied to each of the at least one HMVP candidate.
[0143] In one implementation, at least one HMVP candidate may include the HMVP candidate with the smallest HMVP candidate list index.
[0144] According to one embodiment, a decoding device can derive sampled HMVP candidates by applying sampling to the HMVP candidates. At least one HMVP candidate can be selected from the sampled HMVP candidates based on an index of the HMVP candidate list of sampled HMVP candidates.
[0145] According to one embodiment, a decoding device can update the HMVP candidate list based on the motion information of the current block. The updated HMVP candidate list can be used to derive the motion information of blocks decoded after the current block.
[0146] According to the decoding device and the method of operating the decoding device described with reference to Figure 16 and Figure 17, the decoding device can configure an AMVP candidate list including at least one AMVP candidate for the current block (S1600); derive an HMVP candidate list for the current block, the HMVP candidate list including HMVP candidates for the current block (S1610); select at least one HMVP candidate from the HMVP candidates in the HMVP candidate list (S1620); derive an updated AMVP candidate list by adding the at least one HMVP candidate to the AMVP candidate list (S1630); derive motion information for the current block based on the updated AMVP candidate list (S1640); derive prediction samples for the current block based on the motion information of the current block (S1650); and generate reconstructed samples for the current block based on the prediction samples of the current block (S1660). In other words, when prediction candidates are configured based on HMVP for inter-frame prediction, the increase in complexity caused by pruning can be avoided by omitting the pruning process.
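The decoder-side motion derivation (S1600 to S1640) can be sketched as follows. The sketch assumes, as is usual for AMVP but not stated in this paragraph, that the bitstream signals a candidate index and a motion vector difference, and that the motion vector is the selected predictor plus that difference. The names `derive_motion_vector`, `mvp_idx`, and `mvd`, and the list size of 2, are hypothetical.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # hypothetical (mv_x, mv_y) representation


def derive_motion_vector(
    amvp_list: List[MotionVector],
    hmvp_list: List[MotionVector],
    mvp_idx: int,          # assumed to be parsed from the bitstream
    mvd: MotionVector,     # assumed motion vector difference from the bitstream
    max_candidates: int = 2,  # assumed list size, not taken from the disclosure
) -> MotionVector:
    """Rebuild the updated AMVP list and derive the block's motion vector."""
    # S1600 to S1630: same pruning-free list construction as on the encoder side.
    free_slots = max(0, max_candidates - len(amvp_list))
    candidates = list(amvp_list) + list(hmvp_list[:free_slots])
    # S1640: selected predictor plus the signalled difference.
    mvp = candidates[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Because no pruning comparison is involved, the decoder builds the list with a fixed amount of work per block, and the result matches the encoder's list as long as both sides hold the same HMVP candidate list.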
[0147] In the above embodiments, the method is described based on a flowchart having a series of steps or blocks. However, this disclosure is not limited to the order of the steps or blocks described above. As mentioned above, some steps may occur simultaneously with other steps or in a different order than other steps. Furthermore, those skilled in the art will understand that the steps shown in the above flowchart are not exclusive and may include other steps, or one or more steps in the flowchart may be deleted without affecting the scope of this disclosure.
[0148] The method described above can be implemented in software. The encoding and / or decoding apparatus according to the present disclosure can be included in an apparatus for performing image processing, such as a TV, computer, smartphone, set-top box, or display device.
[0149] When the embodiments of this disclosure are implemented in software, the methods described above can be implemented by modules (processes, functions, etc.) that perform the functions described above. These modules can be stored in memory and executed by a processor. The memory can be internal or external to the processor, and the memory can be connected to the processor using various well-known means. The processor may include application-specific integrated circuits (ASICs), other chipsets, logic circuits, and / or data processing devices. The memory may include ROM (read-only memory), RAM (random access memory), flash memory, memory cards, storage media, and / or other storage devices. That is, the embodiments described in this disclosure can be implemented and executed on a processor, microprocessor, controller, or chip. For example, the functional units shown in the various figures can be implemented and executed on a computer, processor, microprocessor, controller, or chip. In this case, the information used for implementation (e.g., information about instructions) or algorithms can be stored in a digital storage medium.
[0150] Furthermore, the decoding and encoding devices to which this disclosure is applied can be used in multimedia communication devices such as multimedia broadcasting transmitters and receivers, mobile communication terminals, home theater video devices, digital cinema video devices, surveillance cameras, video chat devices, (3D) video devices, video telephony devices, and medical video devices. They can also be included in, for example, storage media, camcorders, video-on-demand (VoD) service providing devices, over-the-top (OTT) video devices, internet streaming service providing devices, 3D video devices, virtual reality (VR) devices, augmented reality (AR) devices, video calling devices, and transportation terminals (e.g., vehicle (including autonomous vehicle) terminals, aircraft terminals, ship terminals, etc.), and can be used to process video or data signals. For example, OTT video devices may include game consoles, Blu-ray players, Internet-connected TVs, home theater systems, smartphones, tablet PCs, and digital video recorders (DVRs).
[0151] Furthermore, the processing method applied to this disclosure can be generated in the form of a computer-executable program and can be stored in a computer-readable recording medium. Multimedia data having the data structure according to this disclosure can also be stored in a computer-readable recording medium. Computer-readable recording media include all types of storage devices and distributed storage devices for storing computer-readable data. For example, computer-readable recording media can be Blu-ray discs (BD), Universal Serial Bus (USB), ROM, PROM, EPROM, EEPROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. Additionally, computer-readable recording media include media implemented in the form of carrier waves (e.g., transmission via the Internet). Furthermore, the bitstream generated by this encoding method can be stored in a computer-readable recording medium or transmitted via wired and wireless communication networks.
[0152] Furthermore, the embodiments of this disclosure can be implemented as computer program products using program code, and the program code can be executed in a computer according to the embodiments of this disclosure. The program code can be stored on a carrier that can be read by a computer.
[0153] Figure 18 is a diagram showing the structure of a content streaming system.
[0154] Referring to Figure 18, the content streaming system to which this disclosure is applied may mainly include an encoding server, a streaming server, a web server, a media storage device, a user device, and a multimedia input device.
[0155] An encoding server is used to compress content input from multimedia input devices (e.g., smartphones, cameras, and camcorders) into digital data to generate a bitstream, and then sends the bitstream to a streaming server. As another example, if the multimedia input device (e.g., smartphone, camera, and camcorder) generates the bitstream directly, the encoding server can be omitted.
[0156] Bitstreams can be generated by applying the encoding method or bitstream generation method disclosed herein, and the streaming server can temporarily store bitstreams while sending or receiving them.
[0157] A streaming server is used to send multimedia data to a user device based on a user request via a web server, and the web server acts as a medium to inform the user which services are available. When a user requests a desired service from the web server, the web server forwards the user's request to the streaming server, and the streaming server sends the multimedia data to the user. In this case, the content streaming system may include a separate control server, which controls the commands / responses between devices within the content streaming system.
[0158] A streaming server can receive content from media storage devices and / or encoding servers. For example, when content is received from an encoding server, the streaming server can receive the content in real time. In this case, to provide a smooth streaming service, the streaming server can store the bitstream for a predetermined period of time.
[0159] Examples of user devices include portable telephones, smartphones, laptops, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, slate PCs, tablet PCs, ultrabooks, wearable devices (e.g., smartwatches, smart glasses, head-mounted displays (HMDs)), digital TVs, desktop computers, digital signage, etc.
[0160] Each server within the content streaming system can be operated as a distributed server, in which case the data received by each server can be processed in a distributed manner.
Claims
1. A decoding device for image decoding, the decoding device comprising: a memory; and at least one processor connected to the memory, the at least one processor being configured to: receive residual information from a bitstream; configure, for a current block in a current frame, a motion vector prediction (MVP) candidate list including at least one MVP candidate; derive a history-based motion vector prediction (HMVP) buffer for the current block, the HMVP buffer including HMVP candidates for the current block; derive an updated MVP candidate list by adding at least one additional candidate, wherein the at least one additional candidate is derived by using at least one HMVP candidate selected from the HMVP candidates in the HMVP buffer; derive motion information of the current block based on the updated MVP candidate list; derive a prediction sample of the current block based on the motion information of the current block; derive a residual sample of the current block based on the residual information; and generate a reconstructed sample of the current block based on the prediction sample and the residual sample of the current block, wherein the at least one HMVP candidate is selected from the HMVP candidates based on an HMVP buffer index, wherein the at least one HMVP candidate is selected based on a priority order, wherein the priority order is based on the HMVP buffer index, wherein HMVP candidates with relatively lower HMVP buffer indices have higher priority than HMVP candidates with relatively higher HMVP buffer indices, and wherein the reference frame of the at least one HMVP candidate is equal to the reference frame of the current block, and the at least one HMVP candidate is used to derive the at least one additional candidate without checking whether the motion information of the at least one HMVP candidate is equal to the motion information of the at least one MVP candidate included in the MVP candidate list.
2. An encoding device for image encoding, the encoding device comprising: a memory; and at least one processor connected to the memory, the at least one processor being configured to: configure, for a current block in a current frame, a motion vector prediction (MVP) candidate list including at least one MVP candidate; derive a history-based motion vector prediction (HMVP) buffer for the current block, the HMVP buffer including HMVP candidates for the current block; derive an updated MVP candidate list by adding at least one additional candidate, wherein the at least one additional candidate is derived by using at least one HMVP candidate selected from the HMVP candidates in the HMVP buffer; derive motion information of the current block based on the updated MVP candidate list; derive a prediction sample of the current block based on the motion information of the current block; derive a residual sample of the current block based on the prediction sample of the current block; generate residual information based on the residual sample of the current block; and encode image information including the residual information, wherein the at least one HMVP candidate is selected from the HMVP candidates based on an HMVP buffer index, wherein the at least one HMVP candidate is selected based on a priority order, wherein the priority order is based on the HMVP buffer index, wherein HMVP candidates with relatively lower HMVP buffer indices have higher priority than HMVP candidates with relatively higher HMVP buffer indices, and wherein the reference frame of the at least one HMVP candidate is equal to the reference frame of the current block, and the at least one HMVP candidate is used to derive the at least one additional candidate without checking whether the motion information of the at least one HMVP candidate is equal to the motion information of the at least one MVP candidate included in the MVP candidate list.
3. A computer-readable storage medium for storing a bitstream generated by an encoding device for image encoding according to claim 2.
4. An apparatus for transmitting data for an image, the apparatus comprising: at least one processor configured to obtain a bitstream for the image, wherein the bitstream is generated based on the following operations: configuring, for a current block in a current frame, a motion vector prediction (MVP) candidate list including at least one MVP candidate; deriving a history-based motion vector prediction (HMVP) buffer for the current block, the HMVP buffer including HMVP candidates for the current block; deriving an updated MVP candidate list by adding at least one additional candidate, wherein the at least one additional candidate is derived by using at least one HMVP candidate selected from the HMVP candidates in the HMVP buffer; deriving motion information of the current block based on the updated MVP candidate list; deriving a prediction sample of the current block based on the motion information of the current block; deriving a residual sample of the current block based on the prediction sample of the current block; generating residual information based on the residual sample of the current block; and encoding image information including the residual information; and a transmitter configured to send the data including the bitstream, wherein the at least one HMVP candidate is selected from the HMVP candidates based on an HMVP buffer index, wherein the at least one HMVP candidate is selected based on a priority order, wherein the priority order is based on the HMVP buffer index, wherein HMVP candidates with relatively lower HMVP buffer indices have higher priority than HMVP candidates with relatively higher HMVP buffer indices, and wherein the reference frame of the at least one HMVP candidate is equal to the reference frame of the current block, and the at least one HMVP candidate is used to derive the at least one additional candidate without checking whether the motion information of the at least one HMVP candidate is equal to the motion information of the at least one MVP candidate included in the MVP candidate list.