Video encoding method, video decoding method, and method of transmitting a bitstream

By calculating texture complexity in bidirectional optical flow motion compensation and selectively applying BIO processing, the problem of high computational complexity is solved, and image quality stability is achieved.

CN116744018B · Active · Publication Date: 2026-06-19 · SK TELECOM CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SK TELECOM CO LTD
Filing Date
2018-08-29
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing bidirectional optical flow (BIO) motion compensation techniques have high computational complexity, and reducing that complexity risks degrading image quality.

Method used

Motion vectors are generated by referencing the first and second reference images. The texture complexity of the current block is calculated, and BIO processing is selectively applied or skipped based on the texture complexity to generate the predicted block for the current block.

Benefits of technology

This reduces the computational complexity of bidirectional optical flow (BIO) while minimizing image quality degradation.

✦ Generated by Eureka AI based on patent content.


Abstract

Video encoding methods, video decoding methods, and methods for transmitting bitstreams are disclosed. Specifically, a method for adaptive bidirectional optical flow (BIO) estimation for refining inter-frame prediction during video encoding is disclosed. The method includes the steps of: generating a first reference block using a first motion vector that refers to a first reference image, and generating a second reference block using a second motion vector that refers to a second reference image; calculating the texture complexity of the current block using the first and second reference blocks; and generating a prediction block for the current block from the first and second reference blocks, either by applying BIO processing or without applying BIO processing, as selected based on the texture complexity. This invention can reduce the complexity and/or cost of pixel-level or sub-block-level bidirectional optical flow.

Description

[0001] This application is a divisional application of the original invention patent application No. 201880056369.0 (International Application No.: PCT/KR2018/009940, Application Date: August 29, 2018, Invention Title: Motion Compensation Method and Device Using Bidirectional Optical Flow).

Technical Field

[0002] This disclosure relates to image encoding or decoding. More specifically, this disclosure relates to bidirectional optical flow for motion compensation.

Background Technology

[0003] The statements in this section are provided only as background information in connection with this disclosure and may not constitute prior art.

[0004] In video coding, compression is performed by exploiting data redundancy in both the spatial and temporal dimensions. Spatial redundancy is significantly reduced through transform coding, while temporal redundancy is reduced through predictive coding. Temporal correlation is observed to be maximized along motion trajectories; therefore, motion-compensated prediction is used. In this context, the primary objective of motion estimation is not to find the "real" motion in the scene, but to maximize compression efficiency. In other words, the motion vectors must provide accurate predictions of the signal. Furthermore, since motion information must be transmitted as overhead in the compressed bitstream, it must allow a compact representation. Effective motion estimation is crucial for achieving high compression in video coding.

[0005] Motion is a crucial source of information in video sequences. Motion occurs not only due to the movement of objects but also due to camera movement. Visual motion (also known as optical flow) captures the spatiotemporal changes in pixel intensity within an image sequence.

[0006] Bidirectional optical flow (BIO) is a motion estimation/compensation technique based on the assumptions of optical flow and steady motion, disclosed in JCTVC-C204 and VCEG-AZ05. The bidirectional optical flow estimation method currently under discussion has the advantage of allowing fine-grained correction of motion vector information; however, achieving that fine-grained correction requires higher computational complexity than conventional bidirectional prediction.

[0007] Non-Patent Document 1: JCTVC-C204 (E. Alshina et al., Bidirectional Optical Flow, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 3rd Meeting: Guangzhou, China, October 7-15, 2010)

[0008] Non-Patent Document 2: VCEG-AZ05 (E. Alshina et al., Performance Study of Known Tools for Next-Generation Video Coding, ITU-T SG 16 Question 6, Video Coding Experts Group (VCEG), 52nd Meeting: June 19-26, 2015, Warsaw, Poland)

Summary of the Invention

[0009] Technical issues

[0010] The purpose of this disclosure is to reduce the computational complexity of bidirectional optical flow (BIO) while minimizing image quality degradation.

[0011] Technical solution

[0012] According to one aspect of this disclosure, a method for motion compensation using bidirectional optical flow (BIO) during video encoding or decoding is provided, the method comprising the steps of: generating a first reference block by referencing a first motion vector of a first reference image, and generating a second reference block by referencing a second motion vector of a second reference image; calculating the texture complexity of a current block using the first and second reference blocks; and generating a predicted block of the current block based on the first and second reference blocks by selectively applying or skipping BIO processing based on the texture complexity.
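The three steps of this aspect can be illustrated with a minimal Python sketch. It is not the claimed implementation. The texture measure (sum of absolute sample differences of the averaged reference blocks), the rule that BIO is skipped when the measure falls below a threshold, and the names `texture_complexity`, `predict_block`, and `bio_refine` are all assumptions made for illustration. The BIO refinement itself is passed in as a caller-supplied function.

```python
from typing import Callable, List, Tuple

Block = List[List[int]]


def average_blocks(ref0: Block, ref1: Block) -> Block:
    """Ordinary bidirectional prediction: rounded average of the two reference blocks."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(ref0, ref1)]


def texture_complexity(ref0: Block, ref1: Block) -> int:
    """Hypothetical measure (an assumption, not the patent's formula): sum of
    absolute horizontal and vertical differences of the averaged block."""
    avg = average_blocks(ref0, ref1)
    h, w = len(avg), len(avg[0])
    horiz = sum(abs(avg[y][x + 1] - avg[y][x]) for y in range(h) for x in range(w - 1))
    vert = sum(abs(avg[y + 1][x] - avg[y][x]) for y in range(h - 1) for x in range(w))
    return horiz + vert


def predict_block(
    ref0: Block,
    ref1: Block,
    threshold: int,
    bio_refine: Callable[[Block, Block], Block],
) -> Tuple[Block, bool]:
    """Return (prediction, bio_applied). BIO is skipped for low-texture blocks
    (assumed skip direction) and applied otherwise."""
    if texture_complexity(ref0, ref1) < threshold:
        return average_blocks(ref0, ref1), False
    return bio_refine(ref0, ref1), True
```

In this sketch a flat block takes the cheap averaging path, while a strongly textured block is handed to the BIO refinement function.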

[0013] According to another aspect of this disclosure, an apparatus is provided for performing motion compensation using bidirectional optical flow (BIO) during video encoding or decoding. The apparatus includes: a reference block generator configured to generate a first reference block by referencing a first motion vector of a first reference image and a second reference block by referencing a second motion vector of a second reference image; a skip determiner configured to calculate the texture complexity of a current block using the first and second reference blocks and to determine whether to skip BIO processing by comparing the texture complexity with a threshold; and a prediction block generator configured to selectively apply or skip BIO processing based on the determination of the skip determiner, generating a prediction block of the current block based on the first and second reference blocks.

Attached Figure Description

[0014] Figure 1 is an exemplary block diagram of a video encoding apparatus according to embodiments of the present disclosure;

[0015] Figure 2 is a diagram showing the adjacent blocks of the current block;

[0016] Figure 3 is an exemplary block diagram of a video decoding apparatus according to embodiments of the present disclosure;

[0017] Figure 4 is a reference diagram used to explain the basic concepts of BIO;

[0018] Figure 5 is a schematic diagram of the shape of a mask centered on the current pixel in pixel-based BIO;

[0019] Figure 6 is a schematic diagram used to explain how brightness values and gradients are set by padding for pixels within a mask that lie outside the reference block;

[0020] Figure 7 is a schematic diagram of the shape of a mask centered on a sub-block in sub-block-based BIO;

[0021] Figure 8 is a schematic diagram used to explain the application of masks pixel by pixel in sub-block-based BIO;

[0022] Figure 9 is a schematic diagram of the shape of another mask centered on a sub-block in sub-block-based BIO;

[0023] Figure 10 is a block diagram illustrating the configuration of a device according to an embodiment of the present disclosure, configured to perform motion compensation by selectively applying BIO processing;

[0024] Figure 11 is a schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the texture complexity of the current block according to an embodiment of the present disclosure;

[0025] Figure 12 is another schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the texture complexity of the current block according to an embodiment of the present disclosure;

[0026] Figure 13 is yet another schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the texture complexity of the current block according to an embodiment of the present disclosure;

[0027] Figure 14 is a schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the current block size and the encoding mode of the motion vector according to an embodiment of the present disclosure;

[0028] Figure 15 is a schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on CVC and BCC conditions according to an embodiment of the present disclosure; and

[0029] Figure 16 is a schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the motion vector variance of adjacent blocks according to an embodiment of the present disclosure.

Detailed Implementation

[0030] In the following description, some embodiments of the invention will be described in detail with reference to the accompanying drawings. It should be noted that, in assigning reference numerals to the elements in the drawings, the same reference numerals denote the same elements even when the elements are shown in different drawings. Furthermore, in the following description of the invention, detailed descriptions of known functions and configurations incorporated herein will be omitted where such detailed descriptions might obscure the subject matter of the invention.

[0031] The techniques disclosed herein generally relate to reducing the complexity and/or cost of bidirectional optical flow (BIO) techniques. BIO can be applied during motion compensation. Typically, BIO is used to calculate motion vectors for each pixel or sub-block in the current block via optical flow, and to update the predicted values for the corresponding pixel or sub-block based on the motion vector values calculated for each pixel or sub-block.

[0032] Figure 1 is an exemplary block diagram of a video encoding apparatus capable of implementing the technology disclosed herein.

[0033] The video encoding apparatus includes a block divider 110, a predictor 120, a subtractor 130, a transformer 140, a quantizer 145, an encoder 150, an inverse quantizer 160, an inverse transformer 165, an adder 170, a filter unit 180, and a memory 190. Each element of the video encoding apparatus can be implemented as a hardware chip or as software, in which case one or more microprocessors can be configured to execute the software functions corresponding to the respective elements.

[0034] The block divider 110 divides each image that makes up the video into multiple coding tree units (CTUs), and then recursively divides each CTU using a tree structure. The leaf nodes of the tree structure are coding units (CUs), which are the basic units of coding. The tree structure can be a quadtree (QT) structure, in which a node (or parent node) is divided into four sub-nodes (or child nodes) of the same size, or a quadtree plus binary tree (QTBT) structure, which combines the QT structure with a binary tree (BT) structure in which a node is divided into two child nodes. That is, a CTU can be divided into multiple CUs using a QTBT.

[0035] In a Quadtree Plus Binary Tree (QTBT) structure, the CTU can be first partitioned according to the QT structure. Quadtree partitioning can be repeated until the size of the partitioned blocks reaches the minimum block size (MinQTSize) allowed for leaf nodes in the QT. If the leaf nodes of the QT are not larger than the maximum block size (MaxBTSize) allowed for root nodes in the BT, they can be further partitioned into a BT structure. BT can have various partitioning types. For example, in some examples, there can be two partitioning types: one that horizontally partitions a node's block into two blocks of the same size (i.e., symmetric horizontal partitioning) and another that vertically partitions a node's block into two blocks of the same size (i.e., symmetric vertical partitioning). Furthermore, there may be partitioning types that asymmetrically partition a node's block into two blocks. Asymmetric partitioning can include partitioning a node's block into two rectangular blocks with a 1:3 size ratio, or partitioning a node's block diagonally.
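The partitioning rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the encoder's actual procedure: only the two symmetric BT split types are modeled, the `MaxBTDepth` and `MinBTSize` limits are taken from the block-partitioning parameters listed later in this description, and the `want_split` callback (a stand-in for the encoder's split decision), the `QtbtLimits` container, and the split-kind labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Leaf = Tuple[int, int, int, int]  # (x, y, width, height) of a leaf CU


@dataclass(frozen=True)
class QtbtLimits:
    min_qt_size: int   # MinQTSize: smallest block a QT leaf may have
    max_bt_size: int   # MaxBTSize: largest block allowed as a BT root
    min_bt_size: int   # MinBTSize: smallest side a BT split may produce from
    max_bt_depth: int  # MaxBTDepth: maximum number of nested BT splits


def qtbt_partition(
    x: int, y: int, w: int, h: int,
    limits: QtbtLimits,
    want_split: Callable[[str, int, int, int, int], bool],
    bt_depth: int = 0,
    in_bt: bool = False,
) -> List[Leaf]:
    """Recursively partition a block and return its leaf CUs."""
    # Quadtree stage: allowed only before any BT split and above MinQTSize.
    if not in_bt and w > limits.min_qt_size and want_split("QT", x, y, w, h):
        half = w // 2
        leaves: List[Leaf] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += qtbt_partition(x + dx, y + dy, half, half, limits, want_split)
        return leaves
    # Binary-tree stage: the block must not exceed MaxBTSize.
    if max(w, h) <= limits.max_bt_size and bt_depth < limits.max_bt_depth:
        if w > limits.min_bt_size and want_split("BT_VER", x, y, w, h):
            half = w // 2  # symmetric vertical split: left and right halves
            return (qtbt_partition(x, y, half, h, limits, want_split, bt_depth + 1, True)
                    + qtbt_partition(x + half, y, half, h, limits, want_split, bt_depth + 1, True))
        if h > limits.min_bt_size and want_split("BT_HOR", x, y, w, h):
            half = h // 2  # symmetric horizontal split: top and bottom halves
            return (qtbt_partition(x, y, w, half, limits, want_split, bt_depth + 1, True)
                    + qtbt_partition(x, y + half, w, half, limits, want_split, bt_depth + 1, True))
    return [(x, y, w, h)]
```

A real encoder would typically drive `want_split` from a rate-distortion search. Here it is left as a callback so the structural constraints stay visible.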

[0036] The partition information generated by the block divider 110 by dividing the CTU according to the QTBT structure is encoded by the encoder 150 and sent to the video decoding device.

[0037] A CU can have various sizes depending on the QTBT partitioning of the CTU. Hereinafter, the block corresponding to the CU to be encoded or decoded (i.e., a leaf node of the QTBT) is called the "current block".

[0038] Predictor 120 generates a prediction block by predicting the current block. Predictor 120 includes an intra-frame predictor 122 and an inter-frame predictor 124.

[0039] Typically, each current block within an image can be predicted individually. This prediction can usually be accomplished using intra-frame prediction or inter-frame prediction techniques. Intra-frame prediction uses data from the image containing the current block, while inter-frame prediction uses data from images encoded before the image containing the current block. Inter-frame prediction includes unidirectional prediction and bidirectional prediction.

[0040] For each inter-frame prediction block, a set of motion information is available. This set of motion information can include motion information about the forward and backward prediction directions. Here, the forward and backward prediction directions are two prediction directions in a bidirectional prediction mode, and the terms "forward" and "backward" do not necessarily have geometric meaning. Rather, they typically correspond to whether a reference image is displayed before ("backward direction") or after ("forward direction") the current image. In some examples, the "forward" and "backward" prediction directions may correspond to reference image list 0 (RefPicList0) and reference image list 1 (RefPicList1) for the current image.

[0041] For each prediction direction, motion information includes a reference index and a motion vector. The reference index is used to identify reference images in the current list of reference images (RefPicList0 or RefPicList1). The motion vector has a horizontal component (x) and a vertical component (y). Typically, the horizontal component represents the horizontal displacement in the reference image relative to the current block's position in the current image, which is needed to locate the reference block's x-coordinate. The vertical component represents the vertical displacement in the reference image relative to the current block's position, which is needed to locate the reference block's y-coordinate.

[0042] Inter-frame predictor 124 generates a prediction block for the current block through a motion compensation process. Inter-frame predictor 124 searches a reference image that was encoded and decoded earlier than the current image for the block most similar to the current block, and uses the block found to generate a prediction block for the current block. Then, the inter-frame predictor generates a motion vector corresponding to the displacement between the current block in the current image and the prediction block in the reference image. Typically, motion estimation is performed on the luma component, and the motion vector calculated based on the luma component is used for both the luma and chroma components. Motion information, including information about the reference image and the motion vector used to predict the current block, is encoded by encoder 150 and sent to the video decoding device.
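The search for the most similar block can be sketched as an exhaustive integer-pixel search. The following Python is a minimal illustration. The sum of absolute differences (SAD) as the similarity measure, the square search window, and the names `sad` and `motion_search` are assumptions, since the description does not fix a particular search strategy or cost.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples indexed as frame[y][x]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(size)
        for i in range(size)
    )


def motion_search(
    cur: Frame, ref: Frame, bx: int, by: int, size: int, search_range: int
) -> Tuple[Tuple[int, int], Optional[int]]:
    """Full search over integer displacements within +/- search_range.
    Returns the motion vector (dx, dy) with the lowest SAD, and that SAD."""
    height, width = len(ref), len(ref[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates whose reference block falls outside the frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The returned vector is the displacement from the current block's position to the matched block's position in the reference image, which is the quantity the paragraph above calls the motion vector.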

[0043] In the case of bidirectional prediction, the inter-frame predictor 124 selects a first reference image and a second reference image from reference image list 0 and reference image list 1, respectively, and searches for blocks similar to the current block in each reference image to generate a first reference block and a second reference block. Then, the inter-frame predictor 124 generates a predicted block for the current block by averaging or weighted averaging the first and second reference blocks. The inter-frame predictor then sends motion information, including information about the two reference images and information about two motion vectors used to predict the current block, to the encoder 150. Here, the two motion vectors represent a first motion vector (i.e., the motion vector referring to the first reference image) corresponding to the displacement between the position of the current block in the current image and the position of the first reference block in the first reference image, and a second motion vector (i.e., the motion vector referring to the second reference image) corresponding to the displacement between the position of the current block in the current image and the position of the second reference block in the second reference image.
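As a small worked illustration of "averaging or weighted averaging", the Python sketch below combines two reference blocks sample by sample. The integer rounding rule and the weight parameters `w0` and `w1` are assumptions chosen for the example.

```python
from typing import List

Block = List[List[int]]


def bi_predict(ref0: Block, ref1: Block, w0: int = 1, w1: int = 1) -> Block:
    """Weighted average of two reference blocks with round-to-nearest.
    With the default weights this is the plain average (a + b + 1) // 2."""
    total = w0 + w1
    return [
        [(w0 * a + w1 * b + total // 2) // total for a, b in zip(row0, row1)]
        for row0, row1 in zip(ref0, ref1)
    ]
```

For example, samples 10 and 20 average to 15 with equal weights, and to 13 when the first reference is weighted three times as heavily as the second.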

[0044] Furthermore, the inter-frame predictor 124 can perform the bidirectional optical flow (BIO) processing of this disclosure to generate a prediction block for the current block through bidirectional prediction. In other words, after determining the bidirectional motion vectors for the current block, the inter-frame predictor 124 can generate a prediction block for the current block based on motion compensation according to the BIO processing, either per image pixel or per sub-block. In other examples, one or more other units of the encoding apparatus may additionally participate in performing the BIO processing of this disclosure. Additionally, since the BIO processing is performed by applying an explicit equation to previously decoded information shared between the encoding and decoding apparatuses, it is not necessary to signal additional information for the BIO processing.

[0045] In motion compensation via bidirectional prediction, whether to apply BIO processing can be determined in various ways. The details of BIO processing, and of how it is decided whether BIO processing is applied during motion compensation, are described with reference to Figure 4 and the subsequent figures.

[0046] Various methods can be used to minimize the number of bits required to encode motion information.

[0047] For example, when the motion vector and reference image of the current block are the same as those of a neighboring block, the motion information about the current block can be conveyed to the decoding device by encoding information that identifies that neighboring block. This method is called "merge mode".

[0048] In merge mode, the inter-frame predictor 124 selects a predetermined number of merge candidate blocks (hereinafter referred to as "merge candidates") from the neighboring blocks of the current block.

[0049] As shown in Figure 2, the neighboring blocks used to derive merge candidates can be all or some of the left block L, top block A, top right block AR, bottom left block BL, and top left block AL adjacent to the current block in the current image. Additionally, blocks located in a reference image (which may be the same as or different from the reference image used to predict the current block), rather than in the current image containing the current block, can be used as merge candidates. For example, a co-located block in the reference image at the same position as the current block, or blocks adjacent to that co-located block, can also be used as merge candidates.

[0050] The inter-frame predictor 124 uses such neighboring blocks to configure a merge list including a predetermined number of merge candidates. From the merge candidates included in the merge list, merge candidates that will be used as motion information related to the current block are selected, and merge index information for identifying the selected candidates is generated. The generated merge index information is encoded by the encoder 150 and sent to the decoding device.
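The merge-list procedure can be sketched in Python as follows. The scan order of the neighbors, the removal of duplicate candidates, and the names `MotionInfo`, `build_merge_list`, and `choose_merge_index` are assumptions for illustration. The description only requires that encoder and decoder build the same list.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class MotionInfo(NamedTuple):
    mv: Tuple[int, int]  # motion vector (x, y)
    ref_idx: int         # index of the reference image in its list


def build_merge_list(
    neighbors: Dict[str, Optional[MotionInfo]], max_candidates: int
) -> List[MotionInfo]:
    """Collect up to max_candidates merge candidates from the spatial
    neighbors L, A, AR, BL, AL (assumed scan order)."""
    merge_list: List[MotionInfo] = []
    for name in ("L", "A", "AR", "BL", "AL"):
        cand = neighbors.get(name)
        # Skip unavailable neighbors and duplicates (pruning is an assumption).
        if cand is None or cand in merge_list:
            continue
        merge_list.append(cand)
        if len(merge_list) == max_candidates:
            break
    return merge_list


def choose_merge_index(merge_list: List[MotionInfo], current: MotionInfo) -> Optional[int]:
    """Encoder side: return the merge index to signal, or None if no
    candidate matches the current block's motion information."""
    return merge_list.index(current) if current in merge_list else None
```

Because the decoder can rebuild the identical list from already decoded neighbors, it recovers the motion information as `merge_list[index]` from the signaled index alone.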

[0051] Another method for encoding motion information is to encode motion vector differences.

[0052] In this method, the inter-frame predictor 124 uses neighboring blocks of the current block to derive predicted motion vector candidates for the motion vector of the current block. As shown in Figure 2, the neighboring blocks used to derive the predicted motion vector candidates can be all or some of the left block L, top block A, top right block AR, bottom left block BL, and top left block AL adjacent to the current block in the current image. Additionally, blocks located in a reference image (which may be the same as or different from the reference image used to predict the current block), rather than in the current image containing the current block, can be used as neighboring blocks for deriving the predicted motion vector candidates. For example, a co-located block in the reference image at the same position as the current block, or a block adjacent to that co-located block, can be used.

[0053] The inter-frame predictor 124 uses motion vectors from neighboring blocks to derive candidate predicted motion vectors, and uses these candidate predicted motion vectors to determine the predicted motion vector for the current block. The motion vector difference is then calculated by subtracting the predicted motion vector from the motion vector of the current block.

[0054] Predicted motion vectors can be obtained by applying a predetermined function (e.g., a function used to calculate the median, mean, etc.) to the predicted motion vector candidates. In this case, the video decoding device also knows the predetermined function. Since the neighboring blocks used to derive the predicted motion vector candidates have already been encoded and decoded, the video decoding device also knows the motion vectors of the neighboring blocks. Therefore, the video encoding device does not need to encode the information used to identify the predicted motion vector candidates. Thus, in this case, information related to the motion vector difference and information related to the reference image used to predict the current block are encoded.
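The case in which a predetermined function produces the predicted motion vector can be sketched in Python as follows. The component-wise median is used here as one example of such a function, and the names `predict_mv`, `encode_mvd`, and `decode_mv` are hypothetical.

```python
from statistics import median_low
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) components


def predict_mv(candidates: List[MV]) -> MV:
    """Predetermined function known to both encoder and decoder:
    here, the component-wise median of the candidate motion vectors."""
    return (
        median_low(c[0] for c in candidates),
        median_low(c[1] for c in candidates),
    )


def encode_mvd(mv: MV, candidates: List[MV]) -> MV:
    """Encoder side: motion vector difference = motion vector - predictor."""
    pmv = predict_mv(candidates)
    return (mv[0] - pmv[0], mv[1] - pmv[1])


def decode_mv(mvd: MV, candidates: List[MV]) -> MV:
    """Decoder side: rebuild the motion vector from the signaled difference."""
    pmv = predict_mv(candidates)
    return (mvd[0] + pmv[0], mvd[1] + pmv[1])
```

Since both sides evaluate the same function on the same neighboring motion vectors, only the difference needs to be transmitted, and no candidate identifier is required in this variant.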

[0055] The predicted motion vector can be determined by selecting any one of the predicted motion vector candidates. In this case, the information used to identify the selected predicted motion vector candidate is further encoded along with information about the motion vector difference and information about the reference image used to predict the current block.

[0056] Intra-predictor 122 uses pixels (reference pixels) located around the current block, in the current image that contains the current block, to predict pixels in the current block. Multiple intra-prediction modes exist depending on the prediction direction, and the reference pixels and equations to be used are defined differently for each prediction mode. Specifically, intra-predictor 122 can determine the intra-prediction mode to be used when encoding the current block. In some examples, intra-predictor 122 can encode the current block using several intra-prediction modes and select a suitable intra-prediction mode from the tested modes for use. For example, intra-predictor 122 can calculate rate-distortion values using rate-distortion analysis of several tested intra-prediction modes and can select the intra-prediction mode with the best rate-distortion characteristics from the tested modes.
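The rate-distortion selection can be illustrated with a short Python sketch. The Lagrangian cost J = D + λ·R is a common way to express rate-distortion trade-offs and is an assumption here, as the description does not state the cost function. The `ModeTrial` record and the mode names in the usage note are likewise hypothetical.

```python
from typing import List, NamedTuple


class ModeTrial(NamedTuple):
    mode: str          # label of the tested intra-prediction mode
    distortion: float  # e.g. sum of squared errors after reconstruction
    bits: int          # bits needed to code the block in this mode


def rd_cost(trial: ModeTrial, lam: float) -> float:
    """Assumed Lagrangian cost: distortion plus lambda times rate."""
    return trial.distortion + lam * trial.bits


def select_mode(trials: List[ModeTrial], lam: float) -> str:
    """Return the tested mode with the lowest rate-distortion cost."""
    return min(trials, key=lambda t: rd_cost(t, lam)).mode
```

With trials `("DC", 1200, 10)`, `("Planar", 1000, 30)`, and `("Angular", 900, 60)`, a multiplier of 5 gives costs 1250, 1150, and 1200, so `"Planar"` is selected. A larger multiplier shifts the choice toward the cheaper-to-signal mode.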

[0057] Intra-predictor 122 selects one intra-prediction mode from a plurality of intra-prediction modes and uses neighboring pixels (reference pixels) and an equation determined according to the selected intra-prediction mode to predict the current block. Information about the selected intra-prediction mode is encoded by encoder 150 and sent to video decoding device.

[0058] Subtractor 130 subtracts the prediction block generated by intra-predictor 122 or inter-predictor 124 from the current block to generate a residual block.

[0059] Transformer 140 transforms the residual signal in the residual block having pixel values in the spatial domain into transform coefficients in the frequency domain. Transformer 140 can transform the residual signal in the residual block by using the size of the current block as the transform unit, or it can divide the residual block into multiple smaller sub-blocks and transform the residual signal in transform units corresponding to the size of the sub-blocks. Various methods may exist for dividing the residual block into smaller sub-blocks. For example, the residual block can be divided into sub-blocks of the same predetermined size, or it can be divided in the form of a quadtree (QT) with the residual block as the root node.

[0060] The quantizer 145 quantizes the transform coefficients output from the transformer 140 and outputs the quantized transform coefficients to the encoder 150.

[0061] Encoder 150 uses an encoding scheme such as CABAC to encode the quantized transform coefficients to generate a bitstream. Encoder 150 encodes information associated with block partitioning (such as CTU size, MinQTSize, MaxBTSize, MaxBTDepth, MinBTSize, QT split flag, BT split flag, and split type) so that the video decoding device partitions the blocks in the same way as the video encoding device.

[0062] Encoder 150 encodes information related to the prediction type, which indicates whether the current block is coded by intra-frame prediction or inter-frame prediction, and encodes the intra-frame prediction information or inter-frame prediction information according to the prediction type.

[0063] When the current block is intra-predicted, the syntax elements for the intra-prediction mode are encoded as intra-prediction information. When the current block is inter-predicted, encoder 150 encodes the syntax elements for the inter-prediction information. The syntax elements for the inter-prediction information include the following information.

[0064] (1) Mode information, which indicates whether motion information about the current block is encoded in merge mode or in a mode that encodes motion vector differences.

[0065] (2) Syntax elements used for motion information

[0066] When encoding motion information in merge mode, encoder 150 can encode, as a syntax element for motion information, merge index information indicating which of the merge candidates is selected as the candidate from which motion information about the current block is derived.

[0067] On the other hand, when motion information is encoded in a mode used to encode motion vector differences, information related to the motion vector differences and information related to the reference image are encoded as syntax elements for motion information. When a predicted motion vector is determined by selecting one of a plurality of predicted motion vector candidates, the syntax elements for motion information also include predicted motion vector identification information for identifying the selected candidate.

[0068] Inverse quantizer 160 inverse quantizes the quantized transform coefficients output from quantizer 145 to generate transform coefficients. Inverse transformer 165 transforms the transform coefficients output from inverse quantizer 160 from the frequency domain to the spatial domain and reconstructs the residual block.

[0069] Adder 170 adds the reconstructed residual block to the prediction block generated by predictor 120 to reconstruct the current block. Pixels in the reconstructed current block are used as reference samples when performing intra-frame prediction for the next block in sequence.
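The path from residual to reconstructed block (subtractor 130, transformer 140, quantizer 145, inverse quantizer 160, inverse transformer 165, adder 170) can be sketched end to end in Python. The orthonormal 2-D DCT-II, the uniform scalar quantizer with a single step size, and all function names are assumptions for illustration, since the description does not fix the transform or quantizer design.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    return [
        [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
         * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        for k in range(n)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def forward_transform(residual: Matrix) -> Matrix:
    c = dct_matrix(len(residual))
    return matmul(matmul(c, residual), transpose(c))  # C * X * C^T


def inverse_transform(coeffs: Matrix) -> Matrix:
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)    # C^T * Y * C


def reconstruct(current: List[List[int]], pred: List[List[int]], step: float) -> List[List[int]]:
    """Run one square block through the encoder's reconstruction loop."""
    n = len(current)
    residual = [[float(current[y][x] - pred[y][x]) for x in range(n)] for y in range(n)]
    levels = [[round(v / step) for v in row] for row in forward_transform(residual)]  # quantize
    dequant = [[lv * step for lv in row] for row in levels]                           # inverse quantize
    rec_res = inverse_transform(dequant)
    # Adder: prediction plus reconstructed residual.
    return [[pred[y][x] + round(rec_res[y][x]) for x in range(n)] for y in range(n)]
```

With a small step the reconstruction stays close to the original block, while a large step quantizes the residual away and the reconstruction collapses to the prediction. A practical codec would also clip the result to the valid sample range, which is omitted here.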

[0070] Filter unit 180 performs deblocking filtering on the boundaries between reconstructed blocks to eliminate block artifacts caused by block-by-block encoding/decoding, and stores the blocks in memory 190. Once all blocks in an image have been reconstructed, the reconstructed image is used as a reference image for inter-frame prediction of blocks in subsequent images to be encoded.

[0071] The video decoding device will be described below.

[0072] Figure 3 is an exemplary block diagram of a video decoding apparatus capable of implementing the technology disclosed herein.

[0073] The video decoding device includes a decoder 310, an inverse quantizer 320, an inverse transformer 330, a predictor 340, an adder 350, a filter unit 360, and a memory 370. As with the video encoding apparatus of Figure 1, each element of the video decoding device can be implemented as a hardware chip or as software, in which case a microprocessor can be configured to execute the software functions corresponding to each element.

[0074] Decoder 310 decodes the bitstream received from the video encoding device, extracts information related to block partitioning to determine the current block to be decoded, and extracts prediction information and information related to the residual signal required to reconstruct the current block.

[0075] Decoder 310 extracts information about the CTU size from the Sequence Parameter Set (SPS) or Picture Parameter Set (PPS), determines the CTU size, and partitions the image into CTUs of the determined size. Then, the decoder identifies the CTU as the top level (i.e., the root node) of the tree structure and extracts partition information about the CTU to partition it using the tree structure. For example, when partitioning the CTU using a QTBT structure, a first flag (QT_split_flag) related to the QT partitioning is extracted to partition each node into four nodes of the lower layer. For nodes corresponding to leaf nodes of the QT, a second flag (BT_split_flag) and partition type information related to the BT partitioning are extracted to partition the leaf node according to the BT structure.

[0076] Once the current block to be decoded has been determined through tree structure partitioning, decoder 310 extracts information related to the prediction type indicating whether the current block is predicted intra-frame or inter-frame.

[0077] When the prediction type information indicates intra-prediction, the decoder 310 extracts syntax elements for intra-prediction information (intra-prediction mode) related to the current block.

[0078] When the prediction type information indicates inter-frame prediction, the decoder 310 extracts syntax elements for inter-frame prediction information. First, the decoder extracts mode information indicating the coding mode in which motion information related to the current block is encoded in multiple coding modes. These multiple coding modes include merging mode and motion vector difference coding mode. When the mode information indicates merging mode, the decoder 310 extracts merging index information from the merging candidates, indicating the merging candidates from which the motion vectors of the current block will be derived, as syntax elements for motion information. On the other hand, when the mode information indicates motion vector difference coding mode, the decoder 310 extracts information related to the motion vector difference and information related to the reference picture for the motion vector reference of the current block, as syntax elements for motion vectors. When the video encoding apparatus uses one of the multiple predicted motion vector candidates as the predicted motion vector for the current block, predicted motion vector identification information is included in the bitstream. Therefore, in this case, not only information related to the motion vector difference and the reference picture is extracted, but also predicted motion vector identification information is extracted as syntax elements for motion vectors.

[0079] Decoder 310 extracts information related to the quantized transform coefficients of the current block as information related to the residual signal.

[0080] Inverse quantizer 320 performs inverse quantization on the quantized transform coefficients. Inverse transformer 330 inversely transforms the inverse quantized transform coefficients from the frequency domain to the spatial domain to reconstruct the residual signal, thereby generating a residual block for the current block.

[0081] Predictor 340 includes intra-predictor 342 and inter-predictor 344. Intra-predictor 342 is activated when the prediction type of the current block is intra-prediction, and inter-predictor 344 is activated when the prediction type of the current block is inter-prediction.

[0082] The intra predictor 342 uses syntax elements for intra prediction modes extracted from the decoder 310 to determine the intra prediction mode of the current block among multiple intra prediction modes, and uses reference pixels around the current block to predict the current block according to the intra prediction mode.

[0083] The inter-frame predictor 344 uses syntax elements for inter-frame prediction information extracted from the decoder 310 to determine motion information about the current block, and uses the determined motion information to predict the current block.

[0084] First, the inter-frame predictor 344 examines the mode information extracted from the decoder 310 for inter-frame prediction. When the mode information indicates a merging mode, the inter-frame predictor 344 configures a merging list including a predetermined number of merging candidates using the neighboring blocks of the current block. The inter-frame predictor 344 configures the merging list in the same manner as the inter-frame predictor 124 of the video encoding apparatus. Then, a merging candidate is selected from the merging candidates in the merging list using merging index information sent from the decoder 310. Motion information associated with the selected merging candidate (i.e., the motion vector and reference picture of the merging candidate) is set as the motion vector and reference picture of the current block.

[0085] On the other hand, when the mode information indicates a motion vector difference coding mode, the inter-frame predictor 344 uses the motion vectors of neighboring blocks of the current block to derive predicted motion vector candidates, and uses these candidates to determine the predicted motion vector for the current block. The inter-frame predictor 344 derives the predicted motion vector candidates in the same manner as the inter-frame predictor 124 of the video coding apparatus. When the video coding apparatus uses one of multiple predicted motion vector candidates as the predicted motion vector for the current block, the syntax elements for motion information include predicted motion vector identification information. Therefore, in this case, the inter-frame predictor 344 can select the candidate indicated by the predicted motion vector identification information as the predicted motion vector. However, when the video coding apparatus determines the predicted motion vector by applying a predefined function to multiple predicted motion vector candidates, the inter-frame predictor 344 can use the same function used by the video coding apparatus to determine the predicted motion vector. Once the predicted motion vector for the current block is determined, the inter-frame predictor 344 adds the predicted motion vector to the motion vector difference sent from the decoder 310 to determine the motion vector for the current block. The reference image referenced by the motion vector of the current block is determined using information related to the reference image passed from the decoder 310.
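The motion vector reconstruction in the motion vector difference coding mode can be sketched as follows. This is a minimal illustration, not the codec's actual implementation: the function name `reconstruct_mv`, the tuple representation of vectors, and the index-based candidate selection are assumptions made for the example.

```python
def reconstruct_mv(candidates, mvp_idx, mvd):
    """Rebuild a motion vector from a predictor candidate and a signalled difference.

    candidates: list of (x, y) predicted motion vector candidates (hypothetical layout)
    mvp_idx:    index selected by the predicted motion vector identification information
    mvd:        (x, y) motion vector difference parsed from the bitstream
    """
    mvp = candidates[mvp_idx]
    # The motion vector is the predictor plus the transmitted difference.
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


mv = reconstruct_mv([(4, -2), (6, 0)], 1, (-1, 3))
```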

[0086] When the motion vector and reference image of the current block are determined in merge mode or motion vector difference coding mode, the inter-frame predictor 344 uses the block at the position indicated by the motion vector in the reference image to generate a predicted block for the current block.

[0087] In the case of bidirectional prediction, the inter-frame predictor 344 selects a first reference image and a second reference image from reference image list 0 and reference image list 1, respectively, using syntax elements for inter-frame prediction information, and determines a first motion vector and a second motion vector for each reference image. Then, a first reference block is generated using the first motion vector of the first reference image, and a second reference block is generated using the second motion vector of the second reference image. A prediction block for the current block is generated by averaging or weighted averaging the first and second reference blocks.
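The averaging step of typical bidirectional prediction can be sketched as below. The function name `bi_predict`, the list-of-rows block layout, and the default weights are assumptions for illustration; rounding and clipping to the pixel range, which a real codec would perform, are omitted.

```python
def bi_predict(ref0, ref1, w0=0.5, w1=0.5):
    """Blend two equally sized reference blocks (lists of rows) into a prediction block."""
    if len(ref0) != len(ref1) or any(len(a) != len(b) for a, b in zip(ref0, ref1)):
        raise ValueError("reference blocks must have the same shape")
    # Plain averaging uses w0 = w1 = 0.5; other weights give a weighted average.
    return [[w0 * a + w1 * b for a, b in zip(row0, row1)]
            for row0, row1 in zip(ref0, ref1)]


pred = bi_predict([[100, 102], [104, 106]], [[110, 108], [100, 106]])
```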

[0088] Furthermore, the inter-frame predictor 344 can perform the bidirectional optical flow (BIO) processing of this disclosure to generate a prediction block for the current block through bidirectional prediction. In other words, after determining the bidirectional motion vectors for the current block, the inter-frame predictor 344 can generate a prediction block for the current block based on motion compensation according to the BIO processing on a per-pixel or per-sub-block basis.

[0089] In motion compensation via bidirectional prediction, whether to apply BIO processing can be determined in various ways. The details of BIO processing, and of determining whether BIO processing is applied during motion compensation, are described with reference to Figure 4 and the subsequent figures.

[0090] Adder 350 reconstructs the current block by adding the residual block output from the inverse transformer to the prediction block output from the inter-frame predictor or intra-frame predictor. Pixels in the reconstructed current block are used as reference samples for intra-frame prediction of blocks to be decoded later.

[0091] Filter unit 360 performs deblocking filtering on the boundaries between reconstructed blocks to remove block artifacts caused by block-by-block decoding, and stores the deblocking-filtered blocks in memory 370. When all blocks in an image have been reconstructed, the reconstructed image is used as a reference image for inter-frame prediction of blocks in subsequent images to be decoded.

[0092] In inter-frame prediction, the encoding unit performs motion estimation and compensation at the coding unit (CU) level, and then sends the resulting motion vector (MV) values to the decoding unit. The encoding and decoding units can further use bidirectional optical flow (BIO) to refine the MV values in units of pixels or of sub-blocks (i.e., sub-CUs) smaller than the CU. That is, BIO can accurately compensate for the motion of the coding block CU in units ranging from 1×1 blocks (i.e., one pixel) to n×n blocks. Furthermore, since BIO processing is performed by applying explicit equations to previously decoded information shared between the encoding and decoding units, there is no need to signal additional information for BIO processing from the encoding unit to the decoding unit.

[0093] Figure 4 is a reference diagram used to explain the basic concept of BIO.

[0094] BIO used for video encoding and decoding is based on the following assumptions: motion vector information should be bidirectional predictive information, and the pixels that make up the image move at a constant speed with almost no change in pixel values.

[0095] First, it is assumed that bidirectional motion vectors MV0 and MV1 have been determined through (regular) bidirectional motion prediction of the current block to be encoded in the current image. Bidirectional motion vectors MV0 and MV1 point to the corresponding regions (i.e., reference blocks) in reference images Ref0 and Ref1 that are most similar to the current block. The two bidirectional motion vectors have values representing the motion of the current block. That is, the bidirectional motion vectors are values obtained by setting the current block as a unit and estimating the motion of the entire unit.

[0096] In the example of Figure 4, the pixel in reference image Ref0 that is indicated by motion vector MV0 and corresponds to pixel P in the current block is denoted P0, and the pixel in reference image Ref1 that is indicated by motion vector MV1 and corresponds to pixel P in the current block is denoted P1. Furthermore, it is assumed in Figure 4 that the motion of pixel P in the current block is slightly different from the overall motion of the current block. For example, when an object located at pixel A in Ref0 of Figure 4 moves to pixel B in Ref1 via pixel P in the current block of the current image, pixels A and B have very similar values. In this case, the point in Ref0 most similar to pixel P in the current block is not P0 indicated by motion vector MV0, but pixel A, which is offset from P0 by a displacement vector $(v_x\tau_0, v_y\tau_0)$. Similarly, the point in Ref1 most similar to pixel P in the current block is not P1 indicated by motion vector MV1, but pixel B, which is offset from P1 by a displacement vector $(-v_x\tau_1, -v_y\tau_1)$. τ0 and τ1 represent the temporal distances of Ref0 and Ref1 relative to the current image, respectively, and are calculated based on the Picture Order Count (POC). In the following text, for simplicity, $(v_x, v_y)$ is called the "optical flow" or "BIO motion vector".

[0097] Therefore, when predicting the value of pixel P in the current block of the current image, using the values of the two reference pixels A and B allows for a more accurate prediction than using reference pixels P0 and P1 indicated by bidirectional motion vectors MV0 and MV1. The concept of changing the reference pixels used to predict a pixel in the current block by considering the pixel-level motion specified by the optical flow $(v_x, v_y)$, as described above, can be extended to sub-block-level motion by dividing the current block into sub-blocks and considering the motion on a per-sub-block basis.

[0098] The following section describes a theoretical method for generating predicted values for pixels in the current block based on the BIO technique. For simplicity, it is assumed that BIO-based bidirectional motion compensation is performed on a pixel-by-pixel basis.

[0099] Assume that bidirectional motion vectors MV0 and MV1, pointing to the corresponding regions (i.e., the reference blocks) in reference images Ref0 and Ref1 that are most similar to the current block encoded in the current image, have been determined through (conventional) bidirectional motion prediction for the current block. The decoding device can determine the bidirectional motion vectors MV0 and MV1 based on the motion vector information included in the bitstream. Furthermore, the brightness value of the pixel in reference image Ref0 that is indicated by motion vector MV0 and corresponds to pixel (i,j) in the current block is defined as $I^{(0)}(i,j)$, and the brightness value of the pixel in reference image Ref1 that is indicated by motion vector MV1 and corresponds to pixel (i,j) in the current block is defined as $I^{(1)}(i,j)$.

[0100] The brightness value of pixel A in reference image Ref0, which the BIO motion vector $(v_x, v_y)$ indicates as corresponding to the pixel in the current block, can be defined as $I^{(0)}(i + v_x\tau_0, j + v_y\tau_0)$, and the brightness value of pixel B in reference image Ref1 can be defined as $I^{(1)}(i - v_x\tau_1, j - v_y\tau_1)$. Here, when a linear approximation using only the first-order terms of the Taylor series is applied, A and B can be expressed as Equation 1.

[0101] [Equation 1]

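[0102]
$$
\begin{aligned}
A &= I^{(0)}(i + v_x\tau_0,\, j + v_y\tau_0) \approx I^{(0)}(i,j) + v_x\tau_0 I_x^{(0)}(i,j) + v_y\tau_0 I_y^{(0)}(i,j) \\
B &= I^{(1)}(i - v_x\tau_1,\, j - v_y\tau_1) \approx I^{(1)}(i,j) - v_x\tau_1 I_x^{(1)}(i,j) - v_y\tau_1 I_y^{(1)}(i,j)
\end{aligned}
$$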

[0103] Here, $I_x^{(k)}$ and $I_y^{(k)}$ (k = 0, 1) are the gradient values in the horizontal and vertical directions at position (i, j) of Ref0 and Ref1, respectively. τ0 and τ1 represent the temporal distances of Ref0 and Ref1 relative to the current image, and are calculated based on POC: τ0 = POC(current) - POC(Ref0), τ1 = POC(Ref1) - POC(current).
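As a small illustration of these definitions, the sketch below computes τ0 and τ1 from POC values and evaluates the first-order Taylor estimates of pixels A and B. The function names and argument order are assumptions for this example, and the linearization is written out from the description of the first-order approximation rather than copied from the equation image.

```python
def temporal_distances(poc_current, poc_ref0, poc_ref1):
    """Return (tau0, tau1) as defined in the text, from Picture Order Counts."""
    tau0 = poc_current - poc_ref0
    tau1 = poc_ref1 - poc_current
    return tau0, tau1


def approximate_a_b(i0, i1, gx0, gy0, gx1, gy1, vx, vy, tau0, tau1):
    """First-order Taylor estimates of pixel A (in Ref0) and pixel B (in Ref1).

    i0, i1:   brightness at (i, j) in Ref0 and Ref1
    gx*, gy*: horizontal and vertical gradients at (i, j)
    """
    a = i0 + vx * tau0 * gx0 + vy * tau0 * gy0
    b = i1 - vx * tau1 * gx1 - vy * tau1 * gy1
    return a, b


tau0, tau1 = temporal_distances(8, 4, 16)
a, b = approximate_a_b(100, 104, 2, 0, 2, 0, 0.5, 0, 1, 1)
```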

[0104] The bidirectional optical flow $(v_x, v_y)$ of each pixel in the block is determined as the solution that minimizes Δ, which is defined as the difference between pixel A and pixel B. Using the linear approximations of A and B derived from Equation 1, Δ can be defined as Equation 2.

[0105] [Equation 2]

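[0106]
$$
\Delta = A - B = \left(I^{(0)} - I^{(1)}\right) + v_x\left(\tau_0 I_x^{(0)} + \tau_1 I_x^{(1)}\right) + v_y\left(\tau_0 I_y^{(0)} + \tau_1 I_y^{(1)}\right)
$$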

[0107] For simplicity, the pixel positions (i,j) have been omitted from each term of Equation 2 above.

[0108] To achieve more robust optical flow estimation, it is assumed that the motion is locally consistent with that of neighboring pixels. For the BIO motion vector of the pixel (i,j) to be predicted, the difference Δ of Equation 2 is considered for all pixels (i',j') in a mask Ω of size (2M+1)×(2M+1) centered at the pixel (i,j). That is, the optical flow for the current pixel (i,j) can be determined as the vector that minimizes the objective function $\Phi(v_x, v_y)$, which is the sum of the squares of the differences Δ[i',j'] obtained for each pixel in the mask Ω, as shown in Equation 3.

[0109] [Equation 3]

[0110]
$$
\Phi(v_x, v_y) = \sum_{[i',j'] \in \Omega} \Delta^2[i',j']
$$

[0111] Here, (i',j') represents the position of a pixel within the mask Ω. For example, when M=2, the mask has the shape shown in Figure 5. The pixel in the shaded area at the center of the mask is the current pixel (i,j), and the pixels in the mask Ω are represented by (i',j').

[0112] To estimate the optical flow $(v_x, v_y)$ of each pixel (i,j) in the block, the solution that minimizes the objective function $\Phi(v_x, v_y)$ is calculated analytically. Setting the partial derivatives of the objective function with respect to $v_x$ and $v_y$ to zero, i.e., $\partial\Phi/\partial v_x = 0$ and $\partial\Phi/\partial v_y = 0$, and arranging the two resulting equations as simultaneous equations yields Equation 4.

[0113] [Equation 4]

[0114] $s_1 v_x(i,j) + s_2 v_y(i,j) = -s_3$

[0115] $s_4 v_x(i,j) + s_5 v_y(i,j) = -s_6$

[0116] s1, s2, s3, s4, s5 and s6 in Equation 4 are given by Equation 5.

[0117] [Equation 5]

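[0118]
$$
\begin{aligned}
s_1 &= \sum_{[i',j'] \in \Omega} \left(\tau_0 I_x^{(0)} + \tau_1 I_x^{(1)}\right)^2 \\
s_2 = s_4 &= \sum_{[i',j'] \in \Omega} \left(\tau_0 I_x^{(0)} + \tau_1 I_x^{(1)}\right)\left(\tau_0 I_y^{(0)} + \tau_1 I_y^{(1)}\right) \\
s_3 &= \sum_{[i',j'] \in \Omega} \left(I^{(0)} - I^{(1)}\right)\left(\tau_0 I_x^{(0)} + \tau_1 I_x^{(1)}\right) \\
s_5 &= \sum_{[i',j'] \in \Omega} \left(\tau_0 I_y^{(0)} + \tau_1 I_y^{(1)}\right)^2 \\
s_6 &= \sum_{[i',j'] \in \Omega} \left(I^{(0)} - I^{(1)}\right)\left(\tau_0 I_y^{(0)} + \tau_1 I_y^{(1)}\right)
\end{aligned}
$$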

[0119] Here, since s2 = s4, s4 is replaced by s2.
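The accumulation of s1 to s6 over a mask can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `accumulate_s`, the per-pixel tuple layout, and the sign convention for s3 and s6 (brightness difference taken as Ref0 minus Ref1, chosen so that minimizing the sum of squared Δ yields the right-hand sides -s3 and -s6 of Equation 4) are not quoted from the source.

```python
def accumulate_s(mask_pixels, tau0, tau1):
    """Sum the per-pixel products over all pixels of a mask.

    mask_pixels: iterable of (i0, i1, gx0, gx1, gy0, gy1) tuples (hypothetical layout),
                 i.e. brightness and gradients of Ref0 and Ref1 at each mask position.
    Returns (s1, s2, s3, s5, s6); s4 equals s2 and is therefore not returned.
    """
    s1 = s2 = s3 = s5 = s6 = 0
    for i0, i1, gx0, gx1, gy0, gy1 in mask_pixels:
        gx = tau0 * gx0 + tau1 * gx1   # combined horizontal gradient
        gy = tau0 * gy0 + tau1 * gy1   # combined vertical gradient
        diff = i0 - i1                 # assumed sign convention, see lead-in
        s1 += gx * gx
        s2 += gx * gy
        s3 += diff * gx
        s5 += gy * gy
        s6 += diff * gy
    return s1, s2, s3, s5, s6


sums = accumulate_s([(10, 8, 1, 1, 2, 0), (12, 12, 0, 2, 1, 1)], 1, 1)
```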

[0120] By solving the simultaneous equations of Equation 4, $v_x$ and $v_y$ can be estimated. For example, using Cramer's rule, $v_x$ and $v_y$ can be derived as Equation 6.

[0121] [Equation 6]

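[0122]
$$
v_x(i,j) = \frac{s_2 s_6 - s_3 s_5}{s_1 s_5 - s_2 s_4}
$$

[0123]
$$
v_y(i,j) = \frac{s_3 s_4 - s_1 s_6}{s_1 s_5 - s_2 s_4}
$$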
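Applying Cramer's rule to the two linear equations of Equation 4 can be sketched as below. The function name `solve_flow_cramer` is hypothetical, s4 is replaced by s2 as stated in the text, and returning a zero flow when the determinant vanishes is an assumption added only to keep the example well defined.

```python
def solve_flow_cramer(s1, s2, s3, s5, s6):
    """Solve  s1*vx + s2*vy = -s3  and  s2*vx + s5*vy = -s6  by Cramer's rule."""
    det = s1 * s5 - s2 * s2
    if det == 0:
        # Degenerate system; zero flow is an assumed fallback, not from the source.
        return 0.0, 0.0
    vx = (s2 * s6 - s3 * s5) / det
    vy = (s2 * s3 - s1 * s6) / det
    return vx, vy


# Coefficients constructed so that the exact solution is vx = 0.5, vy = -1.
vx, vy = solve_flow_cramer(4, 1, -1, 3, 2.5)
```

Substituting the result back into both equations is a quick way to check the signs: 4·0.5 + 1·(-1) = 1 = -s3 and 1·0.5 + 3·(-1) = -2.5 = -s6.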

[0124] As another example, a simplified method can be used in which an approximation of $v_x$ is calculated by substituting $v_y = 0$ into the first equation of Equation 4, and an approximation of $v_y$ is calculated by substituting the calculated value of $v_x$ into the second equation. In this case, $v_x$ and $v_y$ are expressed as Equation 7.

[0125] [Equation 7]

[0126]

[0127]

[0128] Here, r and m are normalization parameters introduced to avoid division by zero or by a very small value. In Equation 7, $v_x(i,j) = 0$ is set when $s_1 + r > m$ is not satisfied, and $v_y(i,j) = 0$ is set when $s_5 + r > m$ is not satisfied.

[0129] As another example, an approximation of $v_x$ can be calculated by substituting $v_y = 0$ into the first equation of Equation 4, and an approximation of $v_y$ can be calculated by substituting $v_x = 0$ into the second equation. With this method, $v_x$ and $v_y$ can be calculated independently of each other and can be expressed as Equation 8.

[0130] [Equation 8]

[0131]

[0132]

[0133] As another example, an approximation of $v_x$ can be calculated by substituting $v_y = 0$ into the first equation of Equation 4, and $v_y$ can be calculated as the average of a first approximation, obtained by substituting the approximation of $v_x$ into the second equation, and a second approximation, obtained by substituting $v_x = 0$ into the second equation. With this method, $v_x$ and $v_y$ are obtained as shown in Equation 9.

[0134] [Equation 9]

[0135]

[0136]

[0137] The normalization parameters r and m used in equations 7 to 9 can be defined as in equation 10.

[0138] [Equation 10]

[0139] $r = 500 \cdot 4^{d-8}$

[0140] $m = 700 \cdot 4^{d-8}$

[0141] Here, d represents the bit depth of the image pixels.
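The three simplified solutions described for Equations 7 to 9, together with the normalization parameters of Equation 10, can be sketched as follows. The equation images are not available here, so this is one reading of the prose and not the patent's exact formulas: adding r to the denominators (s1 + r and s5 + r) is an assumption inferred from the stated conditions s1 + r > m and s5 + r > m, and all function names are hypothetical.

```python
def normalization_params(bit_depth):
    """Return (r, m) = (500 * 4**(d-8), 700 * 4**(d-8)) for bit depth d."""
    scale = 4 ** (bit_depth - 8)
    return 500 * scale, 700 * scale


def _vx(s1, s3, r, m):
    # First equation of Equation 4 with vy = 0; zero when s1 + r > m fails.
    return -s3 / (s1 + r) if s1 + r > m else 0.0


def flow_sequential(s1, s2, s3, s5, s6, r, m):
    """Equation 7 as described: vx first, then vx substituted into the second equation."""
    vx = _vx(s1, s3, r, m)
    vy = -(s6 + vx * s2) / (s5 + r) if s5 + r > m else 0.0
    return vx, vy


def flow_independent(s1, s2, s3, s5, s6, r, m):
    """Equation 8 as described: vx and vy each computed with the other set to 0."""
    vx = _vx(s1, s3, r, m)
    vy = -s6 / (s5 + r) if s5 + r > m else 0.0
    return vx, vy


def flow_averaged(s1, s2, s3, s5, s6, r, m):
    """Equation 9 as described: vy is the mean of the two vy approximations above."""
    vx = _vx(s1, s3, r, m)
    vy = -(s6 + 0.5 * vx * s2) / (s5 + r) if s5 + r > m else 0.0
    return vx, vy


r, m = normalization_params(8)
print(flow_sequential(1500, 100, -1000, 500, 200, r, m))
```

The averaged variant follows directly from the other two: the mean of -(s6 + vx·s2)/(s5 + r) and -s6/(s5 + r) is -(s6 + vx·s2/2)/(s5 + r).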

[0142] The optical flow $v_x$ and $v_y$ of each pixel in the block is obtained by applying one of Equations 6 to 9 to that pixel.

[0143] Once the optical flow $(v_x, v_y)$ of the current pixel is determined, the BIO-based bidirectional prediction value $\mathrm{pred}_{BIO}$ of the current pixel (i,j) can be calculated using Equation 11.

[0144] [Equation 11]

[0145] or

[0146]

[0147] In Equation 11, $(I^{(0)} + I^{(1)})/2$ corresponds to typical block-based bidirectional motion compensation, so the remaining terms can be referred to as the BIO offset.
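To make the split between the conventional term and the BIO offset concrete, the following worked derivation shows one plausible form of the prediction. It is an assumption derived here, not a transcription of Equation 11: it takes the prediction to be the mean of the first-order Taylor estimates of pixels A and B, written out in full so that the block is self-contained.

```latex
\begin{aligned}
A &\approx I^{(0)} + v_x \tau_0 I_x^{(0)} + v_y \tau_0 I_y^{(0)}, \qquad
B \approx I^{(1)} - v_x \tau_1 I_x^{(1)} - v_y \tau_1 I_y^{(1)} \\
\frac{A + B}{2} &\approx
\underbrace{\frac{I^{(0)} + I^{(1)}}{2}}_{\text{typical bidirectional prediction}}
+ \underbrace{\frac{1}{2}\left[ v_x\left(\tau_0 I_x^{(0)} - \tau_1 I_x^{(1)}\right)
+ v_y\left(\tau_0 I_y^{(0)} - \tau_1 I_y^{(1)}\right) \right]}_{\text{BIO offset}}
\end{aligned}
```

Under this reading, the offset vanishes when the optical flow is zero, in which case the prediction reduces to the plain average of the two reference pixels.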

[0148] In typical bidirectional motion compensation, pixels in the reference blocks are used to generate the prediction block for the current block. To use a mask, on the other hand, access to pixels outside the reference block must be allowed. For example, as shown in (a) of Figure 6, the mask for the pixel at the top-left position (position (0,0)) of the reference block includes pixels located outside the reference block. To maintain the same memory access as in typical bidirectional motion compensation and to reduce the computational complexity of BIO, the values $I^{(k)}$, $I_x^{(k)}$, and $I_y^{(k)}$ of pixels located in the mask but outside the reference block can be padded with the corresponding values of the nearest pixel in the reference block. For example, as shown in (b) of Figure 6, when the mask size is 5×5, $I^{(k)}$, $I_x^{(k)}$, and $I_y^{(k)}$ of the outer pixels located above the reference block can be padded with those of the pixels in the top row of the reference block, and $I^{(k)}$, $I_x^{(k)}$, and $I_y^{(k)}$ of the outer pixels on the left side of the reference block can be padded with those of the pixels in the leftmost column of the reference block.
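Padding with the nearest pixel in the reference block amounts to clamping the coordinates to the block boundary, which can be sketched as below. The function name `padded_value` and the list-of-rows layout are assumptions; the same lookup would apply equally to a brightness array or to either gradient array.

```python
def padded_value(block, row, col):
    """Read block[row][col], substituting the nearest in-block pixel when outside."""
    height, width = len(block), len(block[0])
    r = min(max(row, 0), height - 1)   # clamp row into [0, height - 1]
    c = min(max(col, 0), width - 1)    # clamp column into [0, width - 1]
    return block[r][c]


block = [[1, 2],
         [3, 4]]
above = padded_value(block, -2, 0)     # two rows above the block: uses the top row
```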

[0149] BIO processing performed on a per-pixel basis in the current block has been described above. However, to reduce computational complexity, BIO processing can be performed on a per-block basis (e.g., on a 4×4 block basis). When BIO is performed on a per-sub-block basis in the current block, the optical flow $v_x$ and $v_y$ can be obtained for each sub-block in the current block using Equations 6 to 9. Apart from the range of the mask, sub-block-based BIO is based on the same principles as pixel-based BIO.

[0150] As an example, the range of the mask Ω can be extended to include the range of the sub-block. When the size of the sub-block is N×N, the size of the mask Ω is (2M+N)×(2M+N). For example, when M=2 and the size of the sub-block is 4×4, the mask has the shape shown in Figure 7. The Δ values of Equation 2 can be calculated for all pixels in the mask Ω, including the sub-block, to obtain the objective function of Equation 3 for the sub-block. Furthermore, the optical flow $(v_x, v_y)$ can be calculated on a per-sub-block basis by applying Equations 4 to 9.

[0151] As another example, the Δ values of Equation 2 can be calculated by applying a mask to each pixel in the sub-block on a pixel-by-pixel basis, and the objective function of Equation 3 for the sub-block can be obtained as the sum of the squares of these Δ values. The optical flow $(v_x, v_y)$ for the sub-block can then be calculated so as to minimize the objective function. For example, referring to Figure 8, the Δ values of Equation 2 can be calculated for all pixels in the 5×5 mask 810a by applying mask 810a to the pixel at position (0,0) of the 4×4 sub-block 820 in the current block. Then, the Δ values of Equation 2 can be calculated for all pixels in the 5×5 mask 810b by applying mask 810b to the pixel at position (0,1). Through this process, the objective function of Equation 3 can be obtained by summing the squares of the Δ values calculated for all pixels in the sub-block. Then, the optical flow $(v_x, v_y)$ that minimizes the objective function can be calculated. In this example, the objective function is expressed as Equation 12.

[0152] [Equation 12]

[0153]
$$
\Phi(v_x, v_y) = \sum_{(x,y) \in b_k} \; \sum_{[i',j'] \in \Omega(x,y)} \Delta^2[i',j']
$$

[0154] Here, $b_k$ represents the k-th sub-block in the current block, and Ω(x,y) represents the mask for the pixel with coordinates (x,y) in the k-th sub-block. s1 to s6, which are used to calculate the optical flow $(v_x, v_y)$, are modified according to Equation 13.

[0155] [Equation 13]

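[0156]
$$
\begin{aligned}
s_{1,b_k} &= \sum_{(x,y) \in b_k} \sum_{[i',j'] \in \Omega(x,y)} \left(\tau_0 \frac{\partial I^{(0)}}{\partial x} + \tau_1 \frac{\partial I^{(1)}}{\partial x}\right)^2 \\
s_{2,b_k} = s_{4,b_k} &= \sum_{(x,y) \in b_k} \sum_{[i',j'] \in \Omega(x,y)} \left(\tau_0 \frac{\partial I^{(0)}}{\partial x} + \tau_1 \frac{\partial I^{(1)}}{\partial x}\right)\left(\tau_0 \frac{\partial I^{(0)}}{\partial y} + \tau_1 \frac{\partial I^{(1)}}{\partial y}\right) \\
s_{3,b_k} &= \sum_{(x,y) \in b_k} \sum_{[i',j'] \in \Omega(x,y)} \left(I^{(0)} - I^{(1)}\right)\left(\tau_0 \frac{\partial I^{(0)}}{\partial x} + \tau_1 \frac{\partial I^{(1)}}{\partial x}\right) \\
s_{5,b_k} &= \sum_{(x,y) \in b_k} \sum_{[i',j'] \in \Omega(x,y)} \left(\tau_0 \frac{\partial I^{(0)}}{\partial y} + \tau_1 \frac{\partial I^{(1)}}{\partial y}\right)^2 \\
s_{6,b_k} &= \sum_{(x,y) \in b_k} \sum_{[i',j'] \in \Omega(x,y)} \left(I^{(0)} - I^{(1)}\right)\left(\tau_0 \frac{\partial I^{(0)}}{\partial y} + \tau_1 \frac{\partial I^{(1)}}{\partial y}\right)
\end{aligned}
$$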

[0157] In the above equations, $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$ represent $I_x^{(k)}$ and $I_y^{(k)}$, that is, the horizontal gradient and the vertical gradient, respectively.

[0158] As another example, a sub-block-based mask such as that shown in Figure 7 can be used, with a weight applied at each position of the mask. Higher weights are applied at positions closer to the center of the sub-block. For example, referring to Figure 8, when a mask is applied pixel by pixel within a sub-block, Δ values are calculated redundantly for the same position. Most pixels located within mask 810a centered at position (0,0) of sub-block 820 are also located within mask 810b centered at position (1,0) of sub-block 820. Instead of repeatedly calculating the overlapping Δ values, a weight can be assigned to each position in the mask according to the number of overlaps. For example, when M=2 and the sub-block size is 4×4, a weighted mask such as that shown in Figure 9 can be used. This simplifies the operations of Equations 12 and 13, thereby reducing computational complexity.
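The overlap-based weights can be derived by counting, for each position of the (2M+N)×(2M+N) window, how many of the per-pixel (2M+1)×(2M+1) masks cover it. The sketch below does this count; the function name `overlap_weights` is hypothetical, and whether the resulting values match the weighted mask of Figure 9 exactly is an assumption, since the figure is not reproduced here.

```python
def overlap_weights(m, n):
    """Count how many per-pixel masks of an n x n sub-block cover each window position.

    m: mask half-size M (each per-pixel mask is (2M+1) x (2M+1))
    n: sub-block size N
    Returns a (2M+N) x (2M+N) list of rows of integer weights.
    """
    # Along one axis, position p is covered by the mask of sub-block pixel x
    # whenever |x - p| <= m. The 2D count is the product of the two 1D counts.
    counts = [sum(1 for x in range(n) if abs(x - p) <= m)
              for p in range(-m, n + m)]
    return [[cr * cc for cc in counts] for cr in counts]


weights = overlap_weights(2, 4)
print(weights[0])   # first row of the 8 x 8 weight window
```

The weights sum to N² × (2M+1)², the total number of Δ evaluations in the pixel-by-pixel approach, so weighting each position once is equivalent to the redundant summation.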

[0159] The aforementioned pixel-based or sub-block-based BIO requires significant computation. Therefore, a method is needed in video encoding or decoding to reduce the computational load based on BIO. To this end, this disclosure proposes skipping BIO processing during motion compensation when certain conditions are met.

[0160] Figure 10 is a block diagram illustrating the configuration of a device according to an embodiment of the present disclosure, configured to perform motion compensation by selectively applying BIO processing.

[0161] The motion compensation device 1000 described in this embodiment, which can be implemented in the inter-frame predictor 124 of the video encoding apparatus and / or the inter-frame predictor 344 of the video decoding apparatus, may include a reference block generator 1010, a skip determiner 1020, and a prediction block generator 1030. Each of these components may be implemented as a hardware chip or as software, and one or more microprocessors may be implemented to perform the software functions corresponding to the respective components.

[0162] The reference block generator 1010 uses the first motion vector of the first reference image in the reference image list 0 to generate the first reference block, and uses the second motion vector of the second reference image in the reference image list 1 to generate the second reference block.

[0163] Skip Determiner 1020 determines whether to apply BIO processing during motion compensation.

[0164] When the skip determiner 1020 determines to skip BIO processing, the prediction block generator 1030 generates a prediction block for the current block using typical motion compensation. That is, the prediction block for the current block is generated by averaging or weighted averaging the first reference block and the second reference block. Conversely, when the skip determiner 1020 determines to apply BIO processing, the prediction block generator 1030 generates a prediction block for the current block using the first and second reference blocks according to the BIO processing. That is, the prediction block for the current block can be generated by applying Equation 11.

[0165] Skip Determiner 1020 can determine whether to apply BIO processing based on one or more of the following conditions:

[0166] - Texture complexity of the current block;

[0167] - The size of the current block and / or pattern information indicating the motion information encoding pattern;

[0168] - Whether the bidirectional motion vectors (the first motion vector and the second motion vector) satisfy the constant velocity constraint (CVC) and / or the brightness constancy constraint (BCC);

[0169] - The degree of change in the motion vectors of adjacent blocks.

[0170] The following section describes a detailed method for determining whether to apply BIO processing using each condition.

[0171] Implementation method 1: Skipping BIO based on texture complexity

[0172] Optical flow estimation often produces unreliable results in smooth regions that have few local features such as edges or corners. Furthermore, regions with such smooth textures are likely already adequately predicted by conventional block-based motion estimation. Therefore, in this implementation, the texture complexity of the current block is calculated, and BIO processing is skipped based on the texture complexity.

[0173] To allow the encoding and decoding units to compute texture complexity without additional signaling, a first reference block and a second reference block shared between the encoding and decoding units can be used to compute the texture complexity of the current block. That is, a skip determiner implemented in each of the encoding and decoding units determines whether to skip BIO processing by computing the texture complexity of the current block.

[0174] To calculate the texture complexity, a local feature detector with a low computational cost (e.g., differences with neighboring pixels, gradients, or the Moravec operator) can be used. In this implementation, gradients are used to compute the texture complexity. The gradients of the reference blocks are values that are also used in the BIO process. Therefore, the advantage of this implementation is that the gradient values computed for the texture complexity can be reused directly in the BIO process.

[0175] The motion compensation device 1000 according to this embodiment calculates texture complexity using the horizontal and vertical gradients of each pixel in the first and second reference blocks. As an example, the motion compensation device 1000 calculates horizontal complexity using the horizontal gradients of each pixel in the first and second reference blocks, and calculates vertical complexity using the vertical gradients of each pixel in the first and second reference blocks. For example, the horizontal and vertical complexity can be calculated using Equation 14.

[0176] [Equation 14]

[0177]
$$
D_1 = \sum_{[i,j] \in CU} d_1(i,j), \qquad D_5 = \sum_{[i,j] \in CU} d_5(i,j)
$$

[0178] Here, D1 and D5 represent the horizontal and vertical complexity, respectively, and CU represents a set of pixel positions in the first and second reference blocks corresponding to the positions of each pixel in the current block. [i,j] represents the position in the first and second reference blocks corresponding to each pixel in the current block. d1(i,j) and d5(i,j) can be calculated by Equation 15.

[0179] [Equation 15]

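[0180]
$$
\begin{aligned}
d_1 &= \left(\tau_0 I_x^{(0)}(i,j) + \tau_1 I_x^{(1)}(i,j)\right)^2 \\
d_2 &= \left(\tau_0 I_x^{(0)}(i,j) + \tau_1 I_x^{(1)}(i,j)\right)\left(\tau_0 I_y^{(0)}(i,j) + \tau_1 I_y^{(1)}(i,j)\right) \\
d_3 &= \left(I^{(0)}(i,j) - I^{(1)}(i,j)\right)\left(\tau_0 I_x^{(0)}(i,j) + \tau_1 I_x^{(1)}(i,j)\right) \\
d_5 &= \left(\tau_0 I_y^{(0)}(i,j) + \tau_1 I_y^{(1)}(i,j)\right)^2 \\
d_6 &= \left(I^{(0)}(i,j) - I^{(1)}(i,j)\right)\left(\tau_0 I_y^{(0)}(i,j) + \tau_1 I_y^{(1)}(i,j)\right)
\end{aligned}
$$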

[0181] Using d1 and d5 of Equation 15, the horizontal and vertical complexities of Equation 14 can be calculated. That is, the horizontal complexity D1 is calculated by summing, for each pixel position, the horizontal gradients $\tau_0 I_x^{(0)}(i,j)$ and $\tau_1 I_x^{(1)}(i,j)$ of the pixels at corresponding positions in the first and second reference blocks, weighted by the temporal distances (τ0, τ1), and then summing the squares of these sums over all pixel positions. Likewise, the vertical complexity D5 is calculated by summing, for each pixel position, the vertical gradients $\tau_0 I_y^{(0)}(i,j)$ and $\tau_1 I_y^{(1)}(i,j)$ of the pixels at corresponding positions in the first and second reference blocks, weighted by the temporal distances, and then summing the squares of these sums over all pixel positions.

[0182] In Equation 15, d4 is omitted because it has the same value as d2. It can be seen that d1 to d6 in Equation 15 are related to s1 to s6 in Equation 5. d1 to d6 represent the values at a single pixel location, and s1 to s6 represent the sum of each of d1 to d6 calculated at all pixel locations in a mask centered on a single pixel. That is, using Equation 15, Equation 5 can be expressed as Equation 16 below. In Equation 16, s4 is omitted because it has the same value as s2.

[0183] [Equation 16]

[0184]
$$
s_n = \sum_{[i',j'] \in \Omega} d_n(i',j'), \qquad n = 1, 2, 3, 5, 6
$$

[0185] The texture complexity for the current block can be set to any one of the minimum (Min(D1,D5)), maximum (Max(D1,D5)), or average (Ave(D1,D5)) of the horizontal and vertical complexities. The motion compensation device 1000 skips BIO processing when the texture complexity is less than a threshold T, and applies BIO processing when the texture complexity is greater than or equal to the threshold T. When BIO processing is applied, d1 to d6 calculated in Equation 15 can be used to calculate s1 to s6. That is, according to this embodiment, the texture complexity of the current block is obtained using values that have to be calculated anyway during BIO processing, and whether to skip BIO processing is determined based on this texture complexity. Therefore, the additional calculations needed to determine whether to skip BIO processing are reduced.
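The texture complexity computation and the skip decision can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the `mode` parameter, and the list-of-rows gradient layout are hypothetical, and the gradients are taken as already computed for every pixel position of the two reference blocks.

```python
def texture_complexity(gx0, gy0, gx1, gy1, tau0, tau1, mode="min"):
    """Combine horizontal complexity D1 and vertical complexity D5 of a block.

    gx0, gy0: horizontal / vertical gradients of the first reference block
    gx1, gy1: horizontal / vertical gradients of the second reference block
    mode:     "min", "max" or "avg" (hypothetical selector for Min/Max/Ave)
    """
    d1_sum = 0  # D1: sum of squared combined horizontal gradients
    d5_sum = 0  # D5: sum of squared combined vertical gradients
    for r in range(len(gx0)):
        for c in range(len(gx0[0])):
            d1_sum += (tau0 * gx0[r][c] + tau1 * gx1[r][c]) ** 2
            d5_sum += (tau0 * gy0[r][c] + tau1 * gy1[r][c]) ** 2
    if mode == "min":
        return min(d1_sum, d5_sum)
    if mode == "max":
        return max(d1_sum, d5_sum)
    if mode == "avg":
        return (d1_sum + d5_sum) / 2
    raise ValueError("mode must be 'min', 'max' or 'avg'")


def should_skip_bio(complexity, threshold):
    """BIO is skipped only when the complexity is strictly below the threshold."""
    return complexity < threshold


gx0, gx1 = [[1, 2], [0, 1]], [[1, 0], [2, 1]]
gy0, gy1 = [[0, 0], [0, 0]], [[1, 1], [1, 1]]
skip = should_skip_bio(texture_complexity(gx0, gy0, gx1, gy1, 1, 1), 5)
```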

[0186] The threshold T can be set by scaling the normalization parameters used in Equations 7 to 9. From the conditions s1 + r > m and s5 + r > m, the normalization parameters r and m are related to s1 and s5 by s1 > m - r and s5 > m - r. When s1 <= m - r, $v_x$ is 0 even if BIO is performed. Likewise, when s5 <= m - r, $v_y$ is 0 even if BIO is performed.

[0187] Therefore, when the threshold T is set based on this relationship between the normalization parameters, BIO can be skipped by determining in advance, on a per-CU basis, the regions in which the optical flow would be set to 0 even if BIO were performed. D1 is the sum of d1 for all pixel positions in the CU, and s1 is the sum of d1 in the mask Ω. Therefore, when the size of the CU is W×H and the size of the mask Ω is (2M+1)×(2M+1), the threshold T can be set as in Equation 17.

[0188] [Equation 17]

[0189]
$$
T = (m - r) \times \frac{W \times H}{(2M+1)^2}
$$
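One reading of the scaling argument is that the per-mask bound m - r is rescaled by the ratio of the CU area to the mask area, since D1 sums d1 over W×H positions while s1 sums it over (2M+1)×(2M+1) positions. The sketch below computes a threshold on that assumption; the function name `skip_threshold` and its parameters are hypothetical, and the formula is inferred from the prose rather than taken from the equation image.

```python
def skip_threshold(bit_depth, width, height, mask_half_size):
    """Scale the per-mask bound (m - r) from the mask area to the CU area.

    bit_depth:      pixel bit depth d, used for r = 500 * 4**(d-8), m = 700 * 4**(d-8)
    width, height:  CU size W x H
    mask_half_size: M, so that the mask is (2M+1) x (2M+1)
    """
    scale = 4 ** (bit_depth - 8)
    r, m = 500 * scale, 700 * scale
    mask_area = (2 * mask_half_size + 1) ** 2
    return (m - r) * width * height / mask_area


t = skip_threshold(8, 16, 16, 2)   # 16 x 16 CU, 5 x 5 mask, 8-bit video
```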

[0190] Figure 11 is a schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the texture complexity of the current block, according to an embodiment of the present disclosure.

[0191] The motion compensation device 1000 calculates the horizontal gradient $I_x^{(k)}$ and the vertical gradient $I_y^{(k)}$ for each pixel in the first and second reference blocks (S1102). Then, d1 to d6 are calculated using Equation 15, and the horizontal complexity D1 and vertical complexity D5 are calculated using d1 and d5 according to Equation 14 (S1104). It is determined whether the texture complexity of the current block (which is the minimum of the horizontal complexity D1 and the vertical complexity D5) is less than the threshold T (S1106). Although the texture complexity of the current block is described as the minimum of the horizontal complexity D1 and the vertical complexity D5 in this example, the texture complexity can instead be set to the maximum or the average value.

[0192] When the texture complexity of the current block is less than the threshold T, the BIO processing is skipped, and the predicted block of the current block is generated by typical motion compensation (S1108). That is, the predicted block of the current block is generated by averaging or weighted averaging the first reference block and the second reference block.

[0193] When the texture complexity of the current block is greater than or equal to the threshold T, a prediction block for the current block is generated using the first and second reference blocks according to BIO processing. First, s1 to s6 are calculated. Since the horizontal and vertical gradients of the pixels in the reference blocks have already been calculated in S1102, it is only necessary to calculate the horizontal and vertical gradients for the pixels in the mask that lie outside the reference blocks to obtain s1 to s6. Alternatively, as described above, when the horizontal and vertical gradients of pixels outside the reference blocks are padded with the corresponding values of the nearest pixels in the reference blocks, s1 to s6 can be obtained using only the already calculated horizontal and vertical gradients of the pixels in the reference blocks.

[0194] Alternatively, since d1 to d6 are related to s1 to s6 (see Equation 16), the already calculated values of d1 to d6 can be used when calculating s1 to s6.

[0195] Once s1 to s6 are calculated, one of Equations 6 to 9 is used to determine the pixel-based or sub-block-based optical flow $(v_x, v_y)$ (S1112). Then, the prediction block of the current block is generated according to Equation 11 by applying the optical flow $(v_x, v_y)$ to the corresponding pixel or sub-block in the current block (S1114).

[0196] Figure 12 is another schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the texture complexity of the current block, according to an embodiment of the present disclosure.

[0197] The example disclosed in Figure 12 differs from the example of Figure 11 only in the order in which d1 to d6 are calculated. That is, only d1 and d5 among d1 to d6 are needed to calculate the texture complexity of the current block. Therefore, d1 and d5 are obtained first, as in S1204. When the texture complexity is greater than or equal to the threshold, d2, d3, d4 (equal to d2), and d6 are calculated and BIO processing is performed (S1210). The other operations are basically the same as those in Figure 11.

[0198] The following shows experimental results comparing motion compensation based on BIO processing with motion compensation performed by selectively applying BIO processing based on texture complexity, according to this embodiment.

[0199] [Table 1]

[0200]

[0201] The experiments used four sequences for Class A1 (4K), five for Class B (FHD), four for Class C (832×480), and four for Class D (416×240), and all frames of the corresponding videos were used. The experimental environment was a random access (RA) configuration, and BD rates were compared by setting QP to 22, 27, 32, and 37.

[0202] According to this implementation, BIO is skipped by an average of approximately 19%, and in the computationally most demanding A1 class (4K), BIO is skipped by 32%. Experiments show that the skipping rate increases with increasing image resolution. These experimental results can be considered significant because increasing resolution substantially increases the computational burden.

[0203] Furthermore, although the Y BD rate increases by an average of 0.02%, a BD rate difference of 0.1% or less is generally considered negligible. Therefore, it can be seen that even if BIO is selectively skipped according to this example, the compression efficiency remains almost the same.

[0204] The example above relates to determining whether to skip the entire BIO process. Instead of skipping the entire BIO process, the horizontal optical flow v_x and the vertical optical flow v_y can be skipped independently. That is, when the horizontal complexity D1 is less than the threshold T, BIO processing in the horizontal direction is skipped by setting v_x = 0, and when the vertical complexity D5 is less than the threshold T, BIO processing in the vertical direction is skipped by setting v_y = 0.
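The independent, per-direction skipping described above can be sketched as follows. The function name and the two callables are hypothetical; they stand in for whichever of Equations 7 to 9 is used to derive each flow component, so that the costly computation runs only for a direction that is not skipped.

```python
def gated_optical_flow(d1_horizontal, d5_vertical, threshold,
                       compute_vx, compute_vy):
    """Return (v_x, v_y), skipping each direction independently.

    compute_vx / compute_vy are zero-argument callables (assumed helpers)
    that are invoked only when the corresponding complexity is at least
    the threshold T.
    """
    v_x = 0.0 if d1_horizontal < threshold else compute_vx()
    v_y = 0.0 if d5_vertical < threshold else compute_vy()
    return v_x, v_y
```

Passing callables rather than precomputed values mirrors the intent of the text: when a direction is skipped, the intermediate terms for that direction never need to be evaluated.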

[0205] Figure 13 is yet another schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the texture complexity of the current block, according to an embodiment of the present disclosure.

[0206] The motion compensation device 1000 calculates the horizontal gradient I_x^(k) and the vertical gradient I_y^(k) for each pixel in the first and second reference blocks (S1310). Then, d1 and d5 are calculated using Equation 15, the horizontal complexity D1 is calculated from d1, and the vertical complexity D5 is calculated from d5 (S1320).

[0207] Once the horizontal complexity D1 and vertical complexity D5 have been calculated in S1320, operations to determine whether to skip the horizontal optical flow (S1330) and whether to skip the vertical optical flow (S1340) are performed. Although Figure 13 shows the determination for the horizontal optical flow being made first, the determination for the vertical optical flow may be made first instead.

[0208] In S1330, the motion compensation device 1000 determines whether the horizontal complexity D1 is less than the threshold T (S1331). When the horizontal complexity D1 is less than the threshold T, the horizontal optical flow v_x is set to 0 (S1332). This means that horizontal optical flow is not applied. When the horizontal complexity D1 is greater than or equal to the threshold T, d3 is calculated (S1333), and s1 and s3 are calculated using d1 and d3 (S1334). Referring to Equations 7 to 9, only s1 and s3 are needed to calculate the horizontal optical flow v_x. Since d1 has already been calculated in S1320, only d3 needs to be calculated in S1333 before s1 and s3 are obtained from d1 and d3 in S1334. Then, the horizontal optical flow v_x is calculated from s1 and s3 according to any one of Equations 7 to 9 (S1335).

[0209] Then, the process proceeds to S1340 to determine whether to skip the vertical optical flow. It is determined whether the vertical complexity D5 is less than the threshold T (S1341). When the vertical complexity D5 is less than the threshold T, the vertical optical flow v_y is set to 0 (S1342). This means that vertical optical flow is not applied. When the vertical complexity D5 is greater than or equal to the threshold T, d2 and d6 are calculated (S1343), and s2, s5, and s6 are calculated using d2, d5, and d6 (S1344). When the vertical optical flow v_y is calculated using Equation 7 or 9, only s2, s5, and s6 are needed. Since d5 has already been calculated in S1320, only d2 and d6 need to be calculated in S1343 before s2, s5, and s6 are obtained from d2, d5, and d6 in S1344. Then, the vertical optical flow v_y is calculated from s2, s5, and s6 according to Equation 7 or 9 (S1345).

[0210] When Equation 8 is used to calculate the vertical optical flow v_y, only s5 and s6 are needed. Therefore, in this case, the calculations of d2 and s2 can be omitted in S1343 and S1344.

[0211] The horizontal optical flow v_x and vertical optical flow v_y calculated in this way are substituted into Equation 11 to generate the prediction block for the current block. When the horizontal optical flow is skipped, v_x = 0 in Equation 11, so the horizontal optical flow v_x does not contribute to generating the prediction block. Similarly, when the vertical optical flow is skipped, v_y = 0, so the vertical optical flow v_y does not contribute to generating the prediction block. When both the horizontal and vertical optical flows are skipped, v_x = 0 and v_y = 0, so the prediction block is generated by averaging the first reference block and the second reference block. That is, the prediction block is generated through typical motion compensation.
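The reduction to plain averaging can be checked with a small per-pixel sketch. Equation 11 is not reproduced in this part of the text, so the correction term below is an assumption: it follows the form commonly used for BIO prediction in the JEM reference software, and the function and parameter names are hypothetical.

```python
def bio_predict_pixel(i0, i1, gx0, gx1, gy0, gy1, v_x, v_y,
                      tau0=1.0, tau1=1.0):
    """Predict one pixel from two reference samples and their gradients.

    ASSUMED form of the BIO prediction (not taken from this text):
      0.5 * (i0 + i1
             + v_x / 2 * (tau1 * gx1 - tau0 * gx0)
             + v_y / 2 * (tau1 * gy1 - tau0 * gy0))
    i0, i1  : co-located samples in the first and second reference blocks
    gx*, gy*: horizontal and vertical gradients at those samples
    tau0/1  : temporal distances to the two reference images
    """
    correction = (v_x / 2.0) * (tau1 * gx1 - tau0 * gx0) \
               + (v_y / 2.0) * (tau1 * gy1 - tau0 * gy0)
    return 0.5 * (i0 + i1 + correction)
```

Under this assumed form, setting v_x = 0 removes the horizontal term, setting v_y = 0 removes the vertical term, and setting both to 0 leaves the average of the two reference samples, which matches the behavior described above.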

[0212] In Embodiment 1 described above, the texture complexity of the current block is estimated using pixels in a reference block. However, the texture complexity of the current block can be calculated using actual pixels in the current block. For example, the encoding device can calculate the horizontal and vertical complexity using the horizontal and vertical gradients of the pixels in the current block. That is, the horizontal complexity is calculated using the sum of the squares of the horizontal gradients of each pixel in the current block, and the vertical complexity is calculated using the sum of the squares of the vertical gradients. The horizontal and vertical complexity are then used to determine whether to skip the BIO process. In this case, unlike the encoding device, the decoding device does not know the pixels in the current block. Therefore, the decoding device cannot calculate the texture complexity in the same way as the encoding device. Therefore, the encoding device should separately signal the decoding device with information indicating whether to skip the BIO. That is, the skip determiner implemented in the decoding device decodes the information indicating whether to skip the BIO received from the encoding device and selectively skips the BIO process indicated by the information.
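An encoder-side version of this calculation might look like the sketch below. The sum-of-squared-gradients definition comes from the paragraph above; the central-difference gradient with clamped borders and the function names are assumptions made only for illustration.

```python
def gradient_complexities(block):
    """Return (horizontal, vertical) complexity of a block of pixels.

    Each complexity is the sum of squared gradients over all pixels.
    Gradients use a central difference with border clamping, which is an
    assumed filter; the specification may define gradients differently.
    """
    height, width = len(block), len(block[0])
    horizontal = 0
    vertical = 0
    for y in range(height):
        for x in range(width):
            gx = block[y][min(x + 1, width - 1)] - block[y][max(x - 1, 0)]
            gy = block[min(y + 1, height - 1)][x] - block[max(y - 1, 0)][x]
            horizontal += gx * gx
            vertical += gy * gy
    return horizontal, vertical


def encoder_skip_flag(block, threshold):
    """Flag the encoder would signal: True means the decoder skips BIO."""
    horizontal, vertical = gradient_complexities(block)
    return min(horizontal, vertical) < threshold
```

Because the decoder cannot reproduce this calculation, the returned flag is the kind of information that would have to be signaled in the bitstream.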

[0213] Embodiment 2: BIO skipping based on the size of the current block and/or the encoding mode of motion information

[0214] As mentioned above, based on the tree structure segmented from the CTU, the CU (i.e., the current block) corresponding to the leaf nodes of the tree structure can have various sizes.

[0215] When the size of the current block is sufficiently small, the motion vector of the current block may have a value substantially similar to the motion obtained by pixel-based or sub-block-based BIO, so the compensation effect obtained by performing BIO may be small. In this case, the complexity reduction gained by skipping BIO may outweigh the accuracy loss caused by skipping it.

[0216] As described above, the motion vector of the current block can be encoded either in a merge mode or in a mode in which a motion vector difference is encoded. When the motion vector of the current block is encoded in the merge mode, it is merged with the motion vector of an adjacent block. That is, the motion vector of the current block is set equal to the motion vector of the adjacent block. In this case, an additional compensation effect can be obtained through BIO.

[0217] Therefore, in this embodiment, BIO processing is skipped based on at least one of the size of the current block or mode information indicating the encoding mode of the motion vector.

[0218] Figure 14 is a schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the size of the current block and the encoding mode of the motion vector, according to an embodiment of the present disclosure. Although Figure 14 shows whether to skip BIO being determined using both the size of the current block and the encoding mode of the motion vector, using only one of them is also within the scope of this disclosure.

[0219] The motion compensation device 1000 first determines whether the size of the current block CU, which is the block to be encoded, is less than or equal to a threshold size (S1402). When the size of the current block CU is greater than the threshold size, a prediction block for the current block is generated according to the BIO process (S1408).

[0220] On the other hand, when the size of the current block CU is less than or equal to the threshold size, it is determined whether the motion vector MV of the current block CU is encoded in the merge mode (S1404). When the motion vector is not encoded in the merge mode, the BIO process is skipped, and the prediction block of the current block is generated using typical motion compensation (S1406). When the motion vector is encoded in the merge mode, the prediction block of the current block is generated according to the BIO process (S1408).

[0221] For example, when the threshold size w_t × h_t is defined as 8×8, the BIO process is skipped if the motion vector of a current block having a size of 8×8, 8×4, 4×8, or 4×4 (that is, less than or equal to 8×8) is not encoded in the merge mode.
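The decision flow of Figure 14 (S1402 to S1408) can be sketched as a single predicate. The function name is hypothetical, and interpreting "less than or equal to the threshold size" as both dimensions being within w_t and h_t is an assumption consistent with the 8×8 example.

```python
def skip_bio_by_size_and_mode(width, height, is_merge_mode,
                              threshold_w=8, threshold_h=8):
    """Return True when BIO should be skipped for the current block.

    BIO is skipped only for a small block (S1402) whose motion vector is
    NOT encoded in the merge mode (S1404); all other blocks use BIO (S1408).
    """
    is_small = width <= threshold_w and height <= threshold_h
    return is_small and not is_merge_mode
```

For instance, an 8×4 block coded with an explicit motion vector difference is skipped, while the same block coded in the merge mode, or a 16×16 block in either mode, still goes through BIO.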

[0222] When a prediction block is generated according to the BIO process in S1408, whether to skip BIO can be further determined according to Embodiment 1 (i.e., based on the texture complexity of the current block).

[0223] Embodiment 3: BIO skipping based on CVC and/or BCC

[0224] BIO is based on the assumptions that objects in the video move at a constant velocity and that pixel values hardly change. These assumptions are defined as the constant velocity constraint (CVC) and the brightness constancy constraint (BCC), respectively.

[0225] When the bidirectional motion vectors (MVx0, MVy0) and (MVx1, MVy1) estimated for the current block satisfy both the CVC and BCC conditions, BIO, which operates on the same assumptions, is likely to yield motion similar to the bidirectional motion vectors of the current block.

[0226] The bidirectional motion vectors (MVx0, MVy0) and (MVx1, MVy1) of the current block satisfying the CVC condition means that the two motion vectors have opposite signs and the same displacement per unit time.

[0227] The bidirectional motion vectors of the current block satisfying the BCC condition means that the difference between the first reference block, located in the first reference image Ref0 and indicated by (MVx0, MVy0), and the second reference block, located in the second reference image Ref1 and indicated by (MVx1, MVy1), is 0. The difference between the two reference blocks can be calculated using the sum of absolute differences (SAD), the sum of squared errors (SSE), or the like.

[0228] As an example, CVC and BCC conditions can be represented as follows.

[0229] [Equation 18]

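[0230] $$\left|\frac{MV_{x0}}{\tau_0}+\frac{MV_{x1}}{\tau_1}\right| < T_{CVC} \quad \& \quad \left|\frac{MV_{y0}}{\tau_0}+\frac{MV_{y1}}{\tau_1}\right| < T_{CVC}$$

[0231] $$\sum_{(i,j)} \left| I^{(0)}(i+MV_{x0},\, j+MV_{y0}) - I^{(1)}(i+MV_{x1},\, j+MV_{y1}) \right| < T_{BCC}$$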

[0232] Here, T_CVC and T_BCC are the thresholds for the CVC and BCC conditions, respectively.

[0233] Referring to Figure 4, BIO assumes that the optical flow (+v_x, +v_y) for the first reference image Ref0 and the optical flow (-v_x, -v_y) for the second reference image Ref1 have the same magnitude but opposite signs. Therefore, for the bidirectional motion vectors (MVx0, MVy0) and (MVx1, MVy1) to satisfy the BIO assumption, the x-components MVx0 and MVx1 should have opposite signs, and the y-components MVy0 and MVy1 should also have opposite signs. Furthermore, to satisfy the CVC condition, the absolute value of MVx0 divided by τ0 (the temporal distance between the current image and the first reference image) should be equal to the absolute value of MVx1 divided by τ1 (the temporal distance between the current image and the second reference image). Similarly, the absolute values of MVy0 divided by τ0 and MVy1 divided by τ1 should be equal to each other. Relaxing these equalities with a threshold yields the CVC condition of Equation 18.

[0234] The BCC condition is satisfied when the SAD between the reference blocks referenced by the bidirectional motion vectors (MVx0, MVy0) and (MVx1, MVy1) is less than the threshold T_BCC. Of course, other measures of the difference between the two reference blocks, such as SSE, can be used instead of SAD.

[0235] Figure 15 is a schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the CVC and BCC conditions, according to an embodiment of the present disclosure.

[0236] The motion compensation device 1000 determines whether the bidirectional motion vectors (MVx0, MVy0) and (MVx1, MVy1) of the current block satisfy the CVC condition and the BCC condition (S1502). When both conditions are satisfied, the BIO process is skipped, and a prediction block is generated based on typical motion compensation (S1504).

[0237] On the other hand, when at least one of the two conditions is not satisfied, a prediction block for the current block is generated according to the BIO process (S1506).

[0238] Although Figure 15 illustrates skipping BIO processing when both the CVC and BCC conditions are satisfied, this is merely an example. Whether to skip BIO may be determined based on only one of the CVC condition and the BCC condition.
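The CVC and BCC checks of Equation 18 and the skip decision of Figure 15 can be sketched as follows. The function names and the `mode` parameter are hypothetical; SAD is used as the block difference, and motion vectors are passed as (x, y) tuples.

```python
def satisfies_cvc(mv0, mv1, tau0, tau1, t_cvc):
    """CVC: time-normalized motion vectors cancel out in both components."""
    return (abs(mv0[0] / tau0 + mv1[0] / tau1) < t_cvc and
            abs(mv0[1] / tau0 + mv1[1] / tau1) < t_cvc)


def sad(block0, block1):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row0, row1 in zip(block0, block1)
               for a, b in zip(row0, row1))


def satisfies_bcc(ref_block0, ref_block1, t_bcc):
    """BCC: the two motion-compensated reference blocks are nearly equal."""
    return sad(ref_block0, ref_block1) < t_bcc


def skip_bio_by_constraints(cvc_ok, bcc_ok, mode="both"):
    """Skip decision; `mode` picks both conditions or only one of them."""
    if mode == "both":
        return cvc_ok and bcc_ok
    if mode == "cvc":
        return cvc_ok
    if mode == "bcc":
        return bcc_ok
    raise ValueError("mode must be 'both', 'cvc', or 'bcc'")
```

The reference blocks passed to `satisfies_bcc` are assumed to be already fetched at the positions indicated by the two motion vectors, so the sketch leaves interpolation and block addressing out of scope.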

[0239] Embodiment 4: BIO skipping based on the variation in motion vectors of adjacent blocks

[0240] When bidirectional motion vectors estimated on a per-block basis in neighboring blocks of the current block have similar values, optical flow estimated on a per-pixel basis or per-sub-block basis in the current block may also have similar values.

[0241] Therefore, whether to skip BIO for the current block can be determined based on the degree of variation in the motion vectors of neighboring blocks (e.g., variance or standard deviation). As an extreme example, when the variance of the motion vectors of neighboring blocks is 0, the optical flow per pixel or per sub-block in the current block is likely to have the same value as the motion vector of the current block, and thus BIO is skipped.

[0242] As an example, the variance of the motion vectors of adjacent blocks can be expressed as Equation 19.

[0243] [Equation 19]

[0244] $$VAR_{MV} = VAR_x + VAR_y$$

[0245]

[0246]

[0247] Here, L is the set of adjacent blocks, l is the total number of adjacent blocks, (m, n) is the index of an adjacent block, and t ∈ {0, 1}.

[0248] Figure 16 is a schematic diagram illustrating a process of performing motion compensation by selectively applying BIO processing based on the variance of the motion vectors of adjacent blocks, according to an embodiment of the present disclosure.

[0249] The motion compensation device 1000 compares the variance of the motion vectors of adjacent blocks with a predetermined threshold (S1602). When the variance of the motion vectors of adjacent blocks is less than the threshold, the BIO process is skipped, and a prediction block is generated based on typical motion compensation (S1604). On the other hand, when the variance of the motion vectors of adjacent blocks is greater than or equal to the threshold, a prediction block for the current block is generated according to the BIO process (S1606).
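The variance test of Figure 16 can be sketched as follows. The component formulas of Equation 19 are not reproduced in this text, so the sketch assumes a plain population variance of the x and y components taken over all supplied neighbor motion vectors (for example, the vectors of both reference directions pooled together); the function names are hypothetical.

```python
def mv_variance(neighbor_mvs):
    """Return VAR_MV = VAR_x + VAR_y for a list of (mv_x, mv_y) tuples.

    ASSUMPTION: each component variance is the population variance over
    the supplied vectors; the exact normalization in Equation 19 may differ.
    """
    count = len(neighbor_mvs)
    mean_x = sum(mv[0] for mv in neighbor_mvs) / count
    mean_y = sum(mv[1] for mv in neighbor_mvs) / count
    var_x = sum((mv[0] - mean_x) ** 2 for mv in neighbor_mvs) / count
    var_y = sum((mv[1] - mean_y) ** 2 for mv in neighbor_mvs) / count
    return var_x + var_y


def skip_bio_by_mv_variance(neighbor_mvs, threshold):
    """Skip BIO when neighboring motion is nearly uniform (S1602 to S1604)."""
    return mv_variance(neighbor_mvs) < threshold
```

Identical neighbor vectors give a variance of 0, which corresponds to the extreme case mentioned earlier in which BIO is always skipped for any positive threshold.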

[0250] In embodiments 1 to 4, it has been described how to determine whether to skip a BIO using various conditions individually. However, this disclosure is not limited to using any single condition to determine whether to skip a BIO. Selectively combining multiple conditions described in this disclosure to determine whether to skip a BIO should also be interpreted as being within the scope of this disclosure. For example, selectively combining various methods described in this disclosure, such as determining whether to skip a BIO based on the size and texture complexity of the current block, determining whether to skip a BIO based on the size of the current block, the CVC condition, and / or the BCC condition, and determining whether to skip a BIO based on one or more of the CVC condition, the BCC condition, and the texture complexity of the current block, should be interpreted as being within the scope of this disclosure.

[0251] Although exemplary embodiments have been described for illustrative purposes, those skilled in the art will understand that various modifications and changes can be made without departing from the spirit and scope of the embodiments. For simplicity and clarity, exemplary embodiments have been described. Therefore, those skilled in the art will understand that the scope of the embodiments is not limited to those explicitly described above, but includes the claims and their equivalents.

[0252] Cross-references to related applications

[0253] This application claims priority to Korean Patent Application No. 10-2017-0109632, filed on August 29, 2017, and Korean Patent Application No. 10-2017-0175587, filed on December 19, 2017, the entire contents of which are incorporated herein by reference.

Claims

1. A video encoding method for predicting a current block using bidirectional optical flow, the video encoding method comprising the following steps: generating a first reference block using a first motion vector referencing a first reference image, and generating a second reference block using a second motion vector referencing a second reference image; deriving a difference between the first reference block and the second reference block, wherein the difference is a sum of absolute differences or a sum of squared errors; generating, based on the difference, a prediction block for the current block by applying or skipping the bidirectional optical flow; and generating a residual block based on the prediction block and encoding the residual block, wherein, when the difference is less than a predetermined threshold, the bidirectional optical flow is skipped, and when the difference is greater than the predetermined threshold, the bidirectional optical flow is applied.

2. A video decoding method for predicting a current block using bidirectional optical flow, the video decoding method comprising the following steps: generating a first reference block using a first motion vector referencing a first reference image, and generating a second reference block using a second motion vector referencing a second reference image; deriving a difference between the first reference block and the second reference block, wherein the difference is a sum of absolute differences or a sum of squared errors; generating, based on the difference, a prediction block for the current block by applying or skipping the bidirectional optical flow; and reconstructing the current block based on the prediction block, wherein, when the difference is less than a predetermined threshold, the bidirectional optical flow is skipped, and when the difference is greater than the predetermined threshold, the bidirectional optical flow is applied.

3. A method for transmitting a bitstream associated with video data, the method comprising the following steps: generating a bitstream containing encoded data of a current block; and transmitting the bitstream, wherein the step of generating the bitstream includes: generating a first reference block using a first motion vector referencing a first reference image, and generating a second reference block using a second motion vector referencing a second reference image; deriving a difference between the first reference block and the second reference block, wherein the difference is a sum of absolute differences or a sum of squared errors; generating, based on the difference, a prediction block for the current block by applying or skipping bidirectional optical flow; generating a residual signal by subtracting the prediction block from the current block; and generating the bitstream by encoding the residual signal, wherein, when the difference is less than a predetermined threshold, the bidirectional optical flow is skipped, and when the difference is greater than the predetermined threshold, the bidirectional optical flow is applied in units of N×N sub-blocks divided from the current block, where N is a positive integer.

Citation Information

Patent Citations

  • Method and apparatus for inter prediction encoding / decoding an image using sub-pixel motion estimation

    CN101816183A

  • Method and apparatus for video encoding and method and apparatus for video decoding

    CN102934444A