Apparatus for coding video data, method of storing encoded video data bitstream
By applying bidirectional optical flow processing on a sub-block or pixel-by-pixel basis to generate the prediction block for the current block, the problem of high computational complexity of bidirectional optical flow is solved, achieving more efficient video encoding and decoding.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SK TELECOM CO LTD
- Filing Date
- 2018-03-15
- Publication Date
- 2026-06-19
AI Technical Summary
Existing bidirectional optical flow (BIO) estimation methods have high computational complexity and cost, and there is a need to reduce their complexity and/or cost.
By applying bidirectional optical flow (BIO) processing on a sub-block or pixel-by-pixel basis, a prediction block for the current block is generated, and the BIO motion vector is determined to generate the prediction value, reducing search operations and signaling.
It reduces the computational complexity and cost of bidirectional optical flow (BIO), while improving prediction accuracy and reducing the computational burden in the encoding process.
Figure CN116708830B_ABST
Description
[0001] This application is a divisional application of invention patent application No. 201880034013.7 (international application No. PCT/KR2018/003044, international filing date March 15, 2018, title of invention "Method and apparatus for estimating optical flow for motion compensation").
Technical Field
[0002] This disclosure relates to video encoding or decoding. More specifically, this disclosure relates to a method for adaptive bidirectional optical flow estimation for inter-frame prediction compensation during video encoding.
Background Technology
[0003] The statements in this section are provided only as background information in connection with this disclosure and may not constitute prior art.
[0004] In video coding, compression is performed using data redundancy in both spatial and temporal dimensions. Transform coding significantly reduces spatial redundancy, while predictive coding reduces temporal redundancy. Motion-compensated prediction is used to maximize temporal correlation along motion trajectories for this purpose. In this context, the primary objective of motion estimation is not to find “real” motion in the scene, but to maximize compression efficiency. In other words, the motion vector must provide an accurate prediction of the signal. Furthermore, since motion information must be transmitted as overhead in the compressed bitstream, it must enable compressed representation. Effective motion estimation in video coding is crucial for achieving high compression.
[0005] Motion is a crucial source of information in video sequences. Motion occurs not only due to the movement of objects but also due to camera movement. Apparent motion (also known as optical flow) captures the spatiotemporal changes in pixel intensity within an image sequence.
[0006] Bidirectional optical flow (BIO) is a motion estimation/compensation technique disclosed in JCTVC-C204 and VCEG-AZ05. This technique derives sample-level motion refinement based on the assumptions of optical flow and steady motion. The advantage of the BIO estimation methods currently under discussion is that they can finely refine motion vector information. Their disadvantage is that this fine correction requires higher computational complexity than traditional bidirectional prediction.
[0007] Non-patent document 1: JCTVC-C204 (E. Alshina, et al., Bi-directional optical flow, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 3rd Meeting: Guangzhou, CN, 7-15 October, 2010)
[0008] Non-patent document 2: VCEG-AZ05 (E. Alshina, et al., Known tools performance investigation for next generation video coding, ITU-T SG 16 Question 6, Video Coding Experts Group (VCEG), 52nd Meeting: 19-26 June 2015, Warsaw, Poland)
Summary of the Invention
[0009] Technical issues
[0010] The purpose of this disclosure is to reduce the complexity and / or cost of bidirectional optical flow (BIO).
[0011] Technical solution
[0012] According to one aspect of this disclosure, a method for encoding or decoding video data is provided, the method comprising the steps of: determining a first motion vector indicating a first corresponding region in a first reference image most similar to the current block, and a second motion vector indicating a second corresponding region in a second reference image most similar to the current block; generating a prediction block of the current block by applying bidirectional optical flow (BIO) processing on a sub-block basis; and reconstructing the current block using the generated prediction block. Here, generating the prediction block includes: determining a BIO motion vector for each sub-block constituting the current block; and generating predicted values for pixels constituting the corresponding sub-blocks based on the determined BIO motion vectors.
[0013] According to another aspect of the present invention, an apparatus for decoding video data is provided, the apparatus including a memory; and one or more processors, wherein the one or more processors are configured to perform the following operations: determining a first motion vector indicating a first corresponding region in a first reference image most similar to the current block, and a second motion vector indicating a second corresponding region in a second reference image most similar to the current block; generating a prediction block of the current block by applying bidirectional optical flow (BIO) processing on a sub-block basis; and reconstructing the pixels of the current block using the generated prediction block. Here, the operation of generating the prediction block includes: determining a BIO motion vector for each sub-block constituting the current block; and generating a prediction value for the pixels constituting the corresponding sub-block based on the determined BIO motion vectors.
[0014] The BIO motion vector (v_x, v_y) can be defined as the vector that minimizes the sum of squares of the flow differences of the individual pixels within the search region, which is defined by a predetermined masking window centered on each pixel in the sub-block. Alternatively, the BIO motion vector (v_x, v_y) can be defined as the vector that minimizes the sum of squares of the flow differences of all pixels located in the search region, which is defined by a predetermined masking window centered on only some of the pixels in the sub-block. For example, the positions of pixels with and without masking windows can form a grid pattern, a horizontal stripe pattern, or a vertical stripe pattern.
[0015] In some implementations, instead of repeatedly calculating the flow difference, the repeated differences can be weighted according to the number of times the difference is repeated. In some examples, when determining the BIO motion vector of a sub-block located at the edge of the current block, the flow difference of pixels in regions outside the current block can be ignored.
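The weighting idea can be illustrated with a minimal sketch, assuming a 4×4 sub-block and a 5×5 masking window centered on every pixel of the sub-block (the sizes named for the third embodiment in the figure descriptions). The function name `overlap_weights` and its parameters are hypothetical and chosen only for illustration. The sketch counts how many windows cover each position of the search region, so that each flow difference can be computed once and weighted by that count instead of being recomputed.

```python
def overlap_weights(sub_size=4, win_size=5):
    """Count how many masking windows cover each position of the search region.

    Assumption: one window is centered on every pixel of the sub-block.
    """
    half = win_size // 2
    region = sub_size + 2 * half          # side length of the covered region
    weights = [[0] * region for _ in range(region)]
    for cy in range(sub_size):            # window centers inside the sub-block
        for cx in range(sub_size):
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    # offset by `half` so the region starts at index 0
                    weights[cy + half + dy][cx + half + dx] += 1
    return weights


weights = overlap_weights()
for row in weights:
    print(row)
```

Under these assumptions the region is 8×8 and the weights are largest in the middle of the sub-block and fall off toward the border of the region. Ignoring pixels outside the current block, as described for edge sub-blocks, would correspond to setting the weights of those positions to zero.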
[0016] In some implementations, a masking window may not be used. For example, the BIO motion vector (v_x, v_y) can be determined as the vector that minimizes the sum of squares of the flow differences of the individual pixels in the sub-block.
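The minimization described above can be sketched as a small least-squares problem. This passage does not give the expression for the flow difference, so the sketch assumes that the flow difference at sample i is linear in the unknowns, Δ_i = c_i + a_i·v_x + b_i·v_y, where a_i, b_i and c_i are hypothetical per-sample terms built from the gradients and the difference of the two reference samples. The function name `solve_bio_vector`, the optional weight w_i, and the fallback for a degenerate system are likewise assumptions. A practical codec would typically use a simplified, clipped solution rather than this plain 2×2 solve.

```python
def solve_bio_vector(samples, eps=1e-9):
    """Minimize sum_i w_i * (c_i + a_i*vx + b_i*vy)**2 over (vx, vy).

    samples: iterable of (a, b, c, w) tuples (hypothetical representation).
    """
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for a, b, c, w in samples:
        s_aa += w * a * a
        s_ab += w * a * b
        s_bb += w * b * b
        s_ac += w * a * c
        s_bc += w * b * c
    # Normal equations: [[s_aa, s_ab], [s_ab, s_bb]] @ [vx, vy] = [-s_ac, -s_bc]
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < eps:
        return 0.0, 0.0   # assumption: no refinement when the system is degenerate
    vx = (-s_ac * s_bb + s_bc * s_ab) / det
    vy = (-s_bc * s_aa + s_ac * s_ab) / det
    return vx, vy


# Samples constructed so that (vx, vy) = (0.5, -0.25) gives zero flow difference.
samples = [(1.0, 0.0, -0.5, 1), (0.0, 1.0, 0.25, 1), (1.0, 1.0, -0.25, 2)]
print(solve_bio_vector(samples))
```

With no masking window, the samples are simply the pixels of the sub-block with weight 1. With overlapping windows, the weight can carry the number of times a difference is repeated.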
[0017] According to another aspect of the present invention, a method for decoding video data is provided, the method comprising the steps of: determining a first motion vector indicating a first corresponding region in a first reference image most similar to the current block and a second motion vector indicating a second corresponding region in a second reference image most similar to the current block; generating a prediction block of the current block by applying bidirectional optical flow (BIO) processing on a pixel-by-pixel basis; and reconstructing the pixels of the current block using the generated prediction block, wherein the step of generating the prediction block includes determining a BIO motion vector for each pixel constituting the current block, wherein the BIO motion vector is determined to be the vector that minimizes the sum of squares of the flow differences obtained for all masked pixels located in a plus-shaped or diamond-shaped masking window centered on the corresponding pixel; and generating a prediction value for the corresponding pixel based on the determined BIO motion vector.
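To make the two window shapes concrete, the following minimal sketch lists the pixel offsets covered by a plus-shaped and a diamond-shaped masking window. The radius of 2 is an assumption chosen so that both shapes fit inside a 5×5 square, and the function names are hypothetical. The point of the sketch is only that both shapes cover fewer positions than the full square, which reduces the number of flow differences to evaluate per pixel.

```python
def plus_mask(radius=2):
    """Offsets (dx, dy) on the horizontal and vertical axes through the center."""
    horizontal = {(dx, 0) for dx in range(-radius, radius + 1)}
    vertical = {(0, dy) for dy in range(-radius, radius + 1)}
    return sorted(horizontal | vertical)


def diamond_mask(radius=2):
    """Offsets (dx, dy) whose city-block distance from the center is <= radius."""
    return sorted(
        (dx, dy)
        for dx in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
        if abs(dx) + abs(dy) <= radius
    )


print(len(plus_mask()), len(diamond_mask()), 5 * 5)
```

Under the assumed radius, the plus shape covers 9 positions and the diamond covers 13, compared with 25 for a full 5×5 window.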
[0018] According to another aspect of the present invention, an apparatus for decoding video data is provided, the apparatus including a memory; and one or more processors, wherein the one or more processors are configured to perform the following operations: determining a first motion vector indicating a first corresponding region in a first reference image most similar to the current block and a second motion vector indicating a second corresponding region in a second reference image most similar to the current block; generating a prediction block of the current block by applying bidirectional optical flow (BIO) processing on a pixel-by-pixel basis; and reconstructing the pixels of the current block using the generated prediction block. Here, the operation of generating the prediction block includes: determining a BIO motion vector for each pixel constituting the current block, wherein the BIO motion vector is determined to be the vector that minimizes the sum of squares of the flow differences obtained for all masked pixels located in a plus-shaped or diamond-shaped masking window centered on the corresponding pixel; and generating a prediction value for the corresponding pixel based on the determined BIO motion vector.
Brief Description of the Drawings
[0019] Figure 1 is an exemplary block diagram of a video encoding apparatus capable of implementing the technology disclosed herein.
[0020] Figure 2 is an example diagram of the neighboring blocks of the current block.
[0021] Figure 3 is an exemplary block diagram of a video decoding apparatus capable of implementing the technology disclosed herein.
[0022] Figure 4 is a reference diagram used to illustrate the basic concept of BIO.
[0023] Figure 5a is a flowchart illustrating a method for bidirectional motion compensation performed based on pixel-level BIO according to an embodiment of the present disclosure.
[0024] Figure 5b is a flowchart illustrating a method for bidirectional motion compensation performed based on sub-block-level BIO according to an embodiment of the present disclosure.
[0025] Figure 6 is a diagram illustrating a 5×5 masking window and a 1×1 block of the current block for BIO-based motion compensation according to the first embodiment.
[0026] Figure 7 is a diagram illustrating a non-rectangular masking window according to the second embodiment, which can be used to determine pixel-level BIO motion vectors.
[0027] Figure 8 is a diagram illustrating a diamond-shaped masking window for determining pixel-level BIO motion vectors and a 1×1 block of the current block according to the second embodiment.
[0028] Figure 9 is a diagram illustrating a 5×5 masking window and a 4×4 sub-block for determining the sub-block-level BIO motion vector according to the third embodiment.
[0029] Figure 10a is a diagram illustrating that the differences used in determining the sub-block-level BIO motion vector are calculated in an overlapping manner.
[0030] Figure 10b is an exemplary diagram showing the weights, at the individual pixel positions, of the differences used in determining the sub-block-level BIO motion vector.
[0031] Figure 11 is a diagram illustrating a diamond-shaped masking window and a 4×4 sub-block for determining the sub-block-level BIO motion vector according to the fourth embodiment.
[0032] Figure 12 is a diagram illustrating three types of positions of pixels in a sub-block to which a masking window is applied, according to the fifth embodiment.
[0033] Figure 13 is a diagram illustrating a 5×5 masking window used in determining the sub-block-level BIO motion vector according to the fifth embodiment, and a 4×4 sub-block in a grid pattern obtained by sampling the pixels to which the masking window is applied.
[0034] Figure 14 is a diagram illustrating a diamond-shaped masking window and predicted pixels in a 4×4 sub-block for BIO-based motion compensation according to the sixth embodiment.
[0035] Figure 15 is a diagram illustrating an example of weighted averages for each pixel in a sub-block according to the seventh embodiment.
[0036] Figure 16a shows an example in which a sub-block is located at the edge of a 16×16 current block that includes 16 4×4 sub-blocks.
[0037] Figure 16b is an example diagram showing the weights, at the individual pixel positions, of the differences used for the BIO motion vector of the 4×4 sub-block located at the top left corner of the 16×16 current block.
Detailed Description
[0038] In the following description, some embodiments of the invention will be described in detail with reference to the accompanying drawings. It should be noted that when adding reference numerals to constituent elements in the various drawings, similar reference numerals refer to similar elements even though the elements are shown in different drawings. Furthermore, in the following description of the invention, detailed descriptions of known functions and configurations incorporated herein will be omitted where such descriptions might make the subject matter of the invention relatively unclear.
[0039] The techniques disclosed herein generally relate to reducing the complexity and / or cost of bidirectional optical flow (BIO) techniques. BIO can be applied during motion compensation. Typically, BIO is used to calculate a motion vector for each pixel in the current block via optical flow and to update the prediction value at the corresponding pixel based on the motion vector value calculated for each pixel.
[0040] Figure 1 is an exemplary block diagram of a video encoding apparatus capable of implementing the technology disclosed herein.
[0041] The video encoding apparatus includes a block segmenter 110, a predictor 120, a subtractor 130, a transformer 140, a quantizer 145, an encoder 150, an inverse quantizer 160, an inverse transformer 165, an adder 170, a filter unit 180, and a memory 190. Each element of the video encoding apparatus can be implemented as a hardware chip or as software, and a microprocessor can be implemented to execute the software functions corresponding to each element.
[0042] Block segmenter 110 divides each image constituting the video into multiple coding tree units (CTUs), and then uses a tree structure to recursively segment the CTUs. The leaf nodes in the tree structure are coding units (CUs), which are the basic units of encoding. The tree structure can be a quadtree (QT) structure, in which a node (or parent node) is divided into four child nodes (or sub-nodes) of the same size, or a quadtree plus binary tree (QTBT) structure, which combines the QT structure with a binary tree (BT) structure in which a node is divided into two child nodes. That is, a CTU can be divided into multiple CUs using a QTBT.
[0043] In a quadtree plus binary tree (QTBT) structure, the CTU can be partitioned first according to the QT structure. Quadtree partitioning can be repeated until the size of the partitioned blocks reaches the minimum block size (MinQTSize) allowed for leaf nodes in the QT. If the leaf nodes of the QT are not larger than the maximum block size (MaxBTSize) allowed for root nodes in the BT, they can be further partitioned into a BT structure. BT can have various partitioning types. In some examples, there can be two partitioning types: one that horizontally partitions a node's block into two blocks of the same size (i.e., symmetrical horizontal partitioning), and another that vertically partitions a node's block into two blocks of the same size (i.e., symmetrical vertical partitioning). Furthermore, there can be partitioning types that asymmetrically partition a node's block into two blocks. Asymmetrical partitioning can include partitioning a node's block into two rectangular blocks with a 1:3 size ratio, or partitioning a node's block diagonally.
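The partitioning constraints above can be sketched as a function that lists which splits are permitted for a node. This is a minimal sketch under stated assumptions: only the two symmetrical BT types are modeled, the limits MaxBTDepth and MinBTSize are applied in the way their names suggest, "horizontal" is taken to mean halving the height, and the function name and return labels are hypothetical.

```python
def allowed_splits(width, height, bt_depth, in_bt,
                   min_qt_size, max_bt_size, max_bt_depth, min_bt_size):
    """Return the split types permitted for a node (illustrative labels)."""
    splits = []
    # QT splitting continues only for square nodes that have not entered the
    # BT part and are still larger than MinQTSize.
    if not in_bt and width == height and width > min_qt_size:
        splits.append("QT")
    # BT splitting requires a node no larger than MaxBTSize and remaining depth.
    if max(width, height) <= max_bt_size and bt_depth < max_bt_depth:
        if height // 2 >= min_bt_size:
            splits.append("BT_HOR")   # two blocks of size width x height/2
        if width // 2 >= min_bt_size:
            splits.append("BT_VER")   # two blocks of size width/2 x height
    return splits


limits = dict(min_qt_size=16, max_bt_size=64, max_bt_depth=4, min_bt_size=4)
print(allowed_splits(128, 128, 0, False, **limits))
print(allowed_splits(64, 64, 0, False, **limits))
print(allowed_splits(8, 4, 1, True, **limits))
```

The numeric limits in the usage lines are example values, not values taken from the disclosure.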
[0044] The segmentation information generated by the block segmenter 110 by segmenting the CTU according to the QTBT structure is encoded by the encoder 150 and sent to the video decoding device.
[0045] The block corresponding to the CU (i.e., the leaf node of QTBT) to be encoded or decoded is called the "current block".
[0046] Predictor 120 generates a prediction block by predicting the current block. Predictor 120 includes an intra-frame predictor 122 and an inter-frame predictor 124.
[0047] Typically, each current block within an image can be predicted individually. This prediction can usually be accomplished using intra-frame prediction techniques, which use data from the image containing the current block, or inter-frame prediction techniques, which use data from images encoded before the image containing the current block. Inter-frame prediction includes unidirectional prediction and bidirectional prediction.
[0048] For each inter-frame prediction block, a set of motion information is available. This set of motion information can include motion information about the forward and backward prediction directions. Here, the forward and backward prediction directions are two prediction directions in a bidirectional prediction mode, and the terms "forward" and "backward" do not necessarily have a geometric meaning. Rather, they generally correspond to whether a reference image is displayed before ("backward direction") or after ("forward direction") the current image. In some examples, the "forward" and "backward" prediction directions may correspond to reference image list 0 (RefPicList0) and reference image list 1 (RefPicList1) for the current image.
[0049] For each prediction direction, motion information includes a reference index and a motion vector. The reference index can be used to identify reference images in the current list of reference images (RefPicList0 or RefPicList1). The motion vector has a horizontal component (x) and a vertical component (y). Typically, the horizontal component represents the horizontal displacement in the reference image relative to the current block's position in the current image, which is needed to locate the reference block's x-coordinate. The vertical component represents the vertical displacement in the reference image relative to the current block's position, which is needed to locate the reference block's y-coordinate.
[0050] Inter-frame predictor 124 searches a reference image that was encoded and decoded earlier than the current image for the block most similar to the current block, and uses the found block to generate a prediction block for the current block. The inter-frame predictor then generates a motion vector corresponding to the displacement between the current block in the current image and the prediction block in the reference image. Typically, motion estimation is performed on the luma component, and the motion vector calculated based on the luma component is used for both the luma and chroma components. Motion information, including information about the reference image and the motion vector used to predict the current block, is encoded by encoder 150 and sent to the video decoding device.
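The search for the most similar block can be sketched as an exhaustive block-matching search. The similarity measure (sum of absolute differences), the square search range, integer-pixel accuracy, and all names below are assumptions made for illustration; the disclosure does not prescribe a particular search strategy.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between a current block and a reference block."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, bx, by, size, search_range):
    """Return (mv_x, mv_y, cost) for the best match within +/- search_range."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # skip candidates that fall outside the reference image
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy data: the current image equals the reference shifted by (1, 2).
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[(y + 2) * 8 + (x + 1) for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, size=2, search_range=2))
```

The returned displacement plays the role of the motion vector: its horizontal and vertical components locate the reference block relative to the position of the current block.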
[0051] The examples in this disclosure generally relate to bidirectional optical flow (BIO) technology. Some of the techniques in this disclosure can be performed by inter-frame predictor 124. For example, inter-frame predictor 124 can implement the techniques of this disclosure described below with reference to Figures 4 to 13. In other words, after determining the bidirectional motion vectors of the current block, the inter-frame predictor 124 can generate a prediction block for the current block on a pixel basis or a sub-block basis using motion compensation according to BIO techniques. In other examples, one or more other components of the encoding apparatus may additionally participate in implementing the techniques of this disclosure. Furthermore, since there is an explicit equation for calculating the BIO motion vectors, search operations for acquiring this motion information and signaling for transmitting it are not required.
[0052] Various methods can be used to minimize the number of bits required to encode motion information.
[0053] For example, when the reference image and motion vector of the current block are the same as those of a neighboring block, motion information about the current block can be sent to the decoding device by encoding information that identifies the neighboring block. This method is called "merge mode".
[0054] In merge mode, the inter-frame predictor 124 selects a predetermined number of merge candidate blocks (hereinafter referred to as "merge candidates") from the neighboring blocks of the current block.
[0055] As shown in Figure 2, the neighboring blocks used to derive merge candidates can be all or some of the left block L, top block A, top right block AR, bottom left block BL, and top left block AL adjacent to the current block in the current image. Additionally, blocks located in a reference image other than the current image in which the current block is located (this reference image may be the same as or different from the reference image used to predict the current block) can be used as merge candidates. For example, a co-located block in the reference image that is at the same position as the current block, or a block adjacent to that co-located block, can also be used as a merge candidate.
[0056] The inter-frame predictor 124 uses such neighboring blocks to configure a merge list including a predetermined number of merge candidates. It selects merge candidates from the merge list to be used as motion information about the current block and generates merge index information to identify the selected candidate. The generated merge index information is encoded by the encoder 150 and sent to the decoding device.
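The merge list and merge index can be sketched as follows. The candidate order (L, A, AR, BL, AL), the removal of duplicate candidates, the maximum list length, and the representation of motion information as a `(ref_idx, (mv_x, mv_y))` tuple are assumptions for illustration only.

```python
def build_merge_list(neighbors, max_candidates=5):
    """Collect available, distinct motion information from neighboring blocks.

    neighbors: motion info tuples in priority order, or None when a neighbor
    is unavailable or not inter-predicted.
    """
    merge_list = []
    for info in neighbors:
        if info is None or info in merge_list:
            continue                      # skip unavailable or duplicate candidates
        merge_list.append(info)
        if len(merge_list) == max_candidates:
            break
    return merge_list


def choose_merge_index(merge_list, current_info):
    """Encoder side: index to signal, or None if merge mode cannot be used."""
    return merge_list.index(current_info) if current_info in merge_list else None


# Neighbors in the assumed order L, A, AR, BL, AL.
neighbors = [(0, (1, 2)), None, (0, (1, 2)), (1, (0, 0)), (0, (3, -1))]
merge_list = build_merge_list(neighbors)
index = choose_merge_index(merge_list, (1, (0, 0)))
print(merge_list, index, merge_list[index])
```

Because the decoder can build the same list from already decoded blocks, signaling only the index is enough for it to recover the motion information of the current block.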
[0057] Another way to encode motion information is to encode the difference in motion vectors.
[0058] In this method, the inter-frame predictor 124 uses neighboring blocks of the current block to derive predicted motion vector candidates for the current block's motion vector. As the neighboring blocks used to derive the predicted motion vector candidates, all or some of the left block L, top block A, top right block AR, bottom left block BL, and top left block AL adjacent to the current block in the current image, as shown in Figure 2, can be used. Additionally, blocks located in a reference image (which may be the same as or different from the reference image used to predict the current block) other than the current image in which the current block is located can be used as neighboring blocks for deriving predicted motion vector candidates. For example, a co-located block in the reference image that is at the same position as the current block, or a block adjacent to that co-located block, can also be used.
[0059] The inter-frame predictor 124 uses motion vectors from neighboring blocks to derive candidate predicted motion vectors, and uses these candidate predicted motion vectors to determine the predicted motion vector for the current block. The motion vector difference is then calculated by subtracting the predicted motion vector from the current block's motion vector.
[0060] Predicted motion vectors can be obtained by applying a predetermined function (e.g., a function used to calculate the median, mean, etc.) to the predicted motion vector candidates. In this case, the video decoding device also knows the predetermined function. Furthermore, since the neighboring blocks used to derive the predicted motion vector candidates have already been encoded and decoded, the video decoding device also knows the motion vectors of the neighboring blocks. Therefore, the video encoding device does not need to encode the information used to identify the predicted motion vector candidates. Thus, in this case, information about the motion vector difference and information about the reference image used to predict the current block are encoded.
[0061] The predicted motion vector can be determined by selecting any one of the predicted motion vector candidates. In this case, the information used to identify the selected predicted motion vector candidate is further encoded along with information about the motion vector difference and information about the reference image used to predict the current block.
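The motion vector difference scheme can be sketched with a component-wise median as the predetermined function. The choice of the median, the handling of an even number of candidates (upper middle element), and the function names are assumptions; the text only requires that encoder and decoder apply the same function.

```python
def median_predictor(candidates):
    """Component-wise median of the candidate motion vectors."""
    xs = sorted(mv[0] for mv in candidates)
    ys = sorted(mv[1] for mv in candidates)
    mid = len(candidates) // 2            # upper middle for an even count
    return xs[mid], ys[mid]


def encode_mvd(mv, mvp):
    """Encoder: difference between the actual and the predicted motion vector."""
    return mv[0] - mvp[0], mv[1] - mvp[1]


def decode_mv(mvd, mvp):
    """Decoder: add the received difference back to the same prediction."""
    return mvd[0] + mvp[0], mvd[1] + mvp[1]


candidates = [(4, -2), (6, 0), (5, 3)]    # motion vectors of neighboring blocks
mvp = median_predictor(candidates)
mvd = encode_mvd((7, -1), mvp)
print(mvp, mvd, decode_mv(mvd, mvp))
```

When the predictor is instead selected from the candidates, the index of the selected candidate would be signaled in addition to the difference.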
[0062] Intra-predictor 122 uses pixels (reference pixels) located around the current block in the current image that contains the current block to predict pixels in the current block. Multiple intra-prediction modes exist depending on the prediction direction, and the surrounding pixels and equations to be used are defined differently for each prediction mode. Specifically, intra-predictor 122 can determine the intra-prediction mode to use in encoding the current block. In some examples, intra-predictor 122 can encode the current block using several intra-prediction modes and select a suitable intra-prediction mode from the tested modes. For example, intra-predictor 122 can use rate-distortion analysis of several tested intra-prediction modes to calculate rate-distortion values and can select the intra-prediction mode with the best rate-distortion characteristics among the tested modes.
[0063] Intra-predictor 122 selects one intra-prediction mode from multiple intra-prediction modes and uses neighboring pixels (reference pixels) and equations determined according to the selected intra-prediction mode to predict the current block. Information about the selected intra-prediction mode is encoded by encoder 150 and sent to video decoding device.
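Mode selection by rate-distortion analysis can be sketched as choosing the mode with the smallest combined cost. The Lagrangian form J = D + λ·R is a common way to combine distortion and rate and is an assumption here, since the text does not state the cost function. The mode labels and the numbers are illustrative only.

```python
def select_mode(candidates, lam):
    """Return the mode with the smallest cost D + lam * R.

    candidates: dict mapping a mode label to (distortion, bits).
    """
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])


tested = {
    "DC": (100, 10),        # (distortion, bits), illustrative values
    "PLANAR": (80, 30),
    "ANGULAR": (60, 50),
}
print(select_mode(tested, lam=0.5))
print(select_mode(tested, lam=2.0))
```

A small λ favors the mode with the lowest distortion, while a large λ favors the mode that costs the fewest bits.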
[0064] Subtractor 130 subtracts the prediction block generated by intra-predictor 122 or inter-predictor 124 from the current block to generate a residual block.
[0065] Transformer 140 transforms the residual signal in the residual block with pixel values in the spatial domain into transform coefficients in the frequency domain. Transformer 140 can transform the residual signal in the residual block by using the size of the current block as the transform unit, or it can divide the residual block into multiple smaller sub-blocks and transform the residual signal in transform units corresponding to the sub-block sizes. Various methods can be used to divide the residual block into smaller sub-blocks. For example, the residual block can be divided into sub-blocks of the same predefined size, or it can be divided using a quadtree (QT) with the residual block as the root node.
[0066] Quantizer 145 quantizes the transform coefficients output from transformer 140 and outputs the quantized transform coefficients to encoder 150.
[0067] Encoder 150 uses an encoding scheme such as CABAC to encode the quantized transform coefficients to generate a bitstream. Encoder 150 encodes information associated with block segmentation, such as CTU size, MinQTSize, MaxBTSize, MaxBTDepth, MinBTSize, QT segmentation flag, BT segmentation flag, and segmentation type, so that the video decoding device segments the blocks in the same way as the video encoding device.
[0068] Encoder 150 encodes information about the prediction type, indicating whether the current block is encoded by intra-frame prediction or inter-frame prediction, and encodes the intra-frame prediction information or inter-frame prediction information according to the prediction type.
[0069] When performing intra-frame prediction on the current block, the syntax elements for the intra-frame prediction mode are encoded as intra-frame prediction information. When performing inter-frame prediction on the current block, encoder 150 encodes the syntax elements for the inter-frame prediction information. The syntax elements for the inter-frame prediction information include the following information:
[0070] (1) Mode information, which indicates whether the motion information of the current block is encoded in merge mode or in a mode used to encode motion vector differences.
[0071] (2) Syntax elements related to motion information
[0072] When encoding motion information in merge mode, encoder 150 can encode merge index information as a syntax element of the motion information. The merge index information indicates which of the merge candidates is selected as the candidate from which motion information about the current block is derived.
[0073] On the other hand, when motion information is encoded in a mode used to encode motion vector differences, information about the motion vector differences and information about the reference image are encoded as syntax elements of the motion information. When a predicted motion vector is determined by selecting one of multiple predicted motion vector candidates, the syntax elements of the motion information also include predicted motion vector identification information for identifying the selected candidate.
[0074] Inverse quantizer 160 inverse quantizes the quantized transform coefficients output from quantizer 145 to generate transform coefficients. Inverse transformer 165 transforms the transform coefficients output from inverse quantizer 160 from the frequency domain to the spatial domain and reconstructs the residual block.
[0075] Adder 170 adds the reconstructed residual block to the prediction block generated by predictor 120 to reconstruct the current block. Pixels in the reconstructed current block are used as reference samples when performing intra-frame prediction for the next block.
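The path from subtractor 130 through adder 170 can be sketched for a single block. This is a minimal sketch under stated assumptions: an orthonormal 2-D DCT-II stands in for the transform, quantization is uniform with a single step size `qstep`, entropy coding is omitted, and all function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th basis function."""
    return [
        [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
         * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        for k in range(n)
    ]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def encode_and_reconstruct(cur, pred, qstep):
    n = len(cur)
    c = dct_matrix(n)
    ct = transpose(c)
    # subtractor 130: residual block
    residual = [[cur[y][x] - pred[y][x] for x in range(n)] for y in range(n)]
    # transformer 140: spatial domain -> frequency domain
    coeffs = matmul(matmul(c, residual), ct)
    # quantizer 145: uniform quantization (assumed)
    levels = [[round(v / qstep) for v in row] for row in coeffs]
    # inverse quantizer 160 and inverse transformer 165
    dequant = [[lv * qstep for lv in row] for row in levels]
    rec_residual = matmul(matmul(ct, dequant), c)
    # adder 170: prediction + reconstructed residual
    recon = [[pred[y][x] + rec_residual[y][x] for x in range(n)] for y in range(n)]
    return levels, recon


pred = [[100] * 4 for _ in range(4)]
cur = [[103] * 4 for _ in range(4)]
levels, recon = encode_and_reconstruct(cur, pred, qstep=4)
print(levels[0][0], round(recon[0][0], 6))
```

The reconstruction uses only the quantized levels and the prediction, which is the same information available to the decoder, so encoder and decoder obtain the same reference samples.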
[0076] Filter unit 180 performs deblocking filtering on the boundaries between reconstructed blocks to eliminate block artifacts caused by block-by-block encoding / decoding, and stores the blocks in memory 190. When all blocks in an image have been reconstructed, the reconstructed image is used as a reference image for inter-frame prediction of blocks in subsequent images to be encoded.
[0077] The following will describe the video decoding device.
[0078] Figure 3 This is an exemplary block diagram of a video decoding apparatus capable of implementing the technology disclosed herein.
[0079] The video decoding device includes a decoder 310, an inverse quantizer 320, an inverse transformer 330, a predictor 340, an adder 350, a filter unit 360, and a memory 370. As with the video encoding device shown in Figure 1, each element of the video decoding device can be implemented as a hardware chip or as software, and a microprocessor can be implemented to execute the software functions corresponding to each element.
[0080] Decoder 310 decodes the bitstream received from the video encoding device, extracts information related to block segmentation to determine the current block to be decoded, and extracts prediction information and information about the residual signal required to reconstruct the current block.
[0081] Decoder 310 extracts information about the CTU size from the Sequence Parameter Set (SPS) or Picture Parameter Set (PPS), determines the CTU size, and segments the image into CTUs of the determined size. Then, the decoder identifies the CTU as the top level (i.e., the root node) of the tree structure and extracts segmentation information about the CTU to segment it using the tree structure. For example, when segmenting the CTU using a QTBT structure, a first flag (QT_split_flag) related to the QT segmentation is extracted to segment each node into four nodes of the lower layer. For nodes corresponding to leaf nodes of the QT, a second flag (BT_split_flag) and segmentation type information related to the BT segmentation are extracted to segment the leaf node into a BT structure.
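The flag-driven segmentation can be sketched as a recursive parser that turns a sequence of split flags into leaf blocks. The flag layout is an assumption: flags arrive as a flat list of 0/1 values, a QT flag is read only while a node is larger than MinQTSize, a BT flag is read only when at least one BT split is possible, and a type bit (0 for horizontal, 1 for vertical) is read only when both BT splits are possible. The MaxBTSize and MaxBTDepth checks are omitted for brevity, and the function name is hypothetical.

```python
def parse_ctu(flags, ctu_size, min_qt_size, min_bt_size):
    """Return the leaf blocks (x, y, width, height) described by the flags."""
    it = iter(flags)
    leaves = []

    def parse(x, y, w, h, in_bt):
        # QT_split_flag: only in the QT part and above MinQTSize
        if not in_bt and w > min_qt_size and next(it) == 1:
            half = w // 2
            for oy in (0, half):
                for ox in (0, half):
                    parse(x + ox, y + oy, half, half, False)
            return
        can_hor = h // 2 >= min_bt_size
        can_ver = w // 2 >= min_bt_size
        # BT_split_flag: only when some BT split is possible
        if (can_hor or can_ver) and next(it) == 1:
            # the type bit is read only when both directions are possible
            vertical = (next(it) == 1) if (can_hor and can_ver) else can_ver
            if vertical:
                parse(x, y, w // 2, h, True)
                parse(x + w // 2, y, w // 2, h, True)
            else:
                parse(x, y, w, h // 2, True)
                parse(x, y + h // 2, w, h // 2, True)
            return
        leaves.append((x, y, w, h))

    parse(0, 0, ctu_size, ctu_size, False)
    return leaves


# One QT split of a 16x16 CTU, then a vertical BT split of the first 8x8 node.
print(parse_ctu([1, 1, 1, 0, 0, 0, 0, 0], ctu_size=16, min_qt_size=8, min_bt_size=4))
```

Each leaf produced this way corresponds to a CU, that is, a candidate "current block" for decoding.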
[0082] When determining the current block to be decoded through tree structure segmentation, decoder 310 extracts information about the prediction type indicating whether the current block is an intra-frame prediction or an inter-frame prediction.
[0083] When the prediction type information indicates intra-prediction, the decoder 310 extracts the syntax elements (intra-prediction mode) of the intra-prediction information about the current block.
[0084] When the prediction type information indicates inter-frame prediction, the decoder 310 extracts syntax elements for the inter-frame prediction information. First, the decoder extracts mode information indicating the coding mode in which motion information about the current block is encoded among multiple coding modes. Here, the multiple coding modes include a merge mode including a skip mode and a motion vector difference coding mode. When the mode information indicates a merge mode, the decoder 310 extracts merge index information indicating the merge candidates from which the motion vector of the current block will be derived, as a syntax element for motion. On the other hand, when the mode information indicates a motion vector difference coding mode, the decoder 310 extracts information about the motion vector difference and information about the reference picture referenced by the motion vector of the current block, as syntax elements for motion vector. When the video encoding apparatus uses one of multiple predicted motion vector candidates as the predicted motion vector of the current block, predicted motion vector identification information is included in the bitstream. Therefore, in this case, not only information about the motion vector difference and the reference picture is extracted, but also the predicted motion vector identification information is extracted as a syntax element for motion vector.
[0085] Decoder 310 extracts information about the quantized transform coefficients of the current block as information about the residual signal.
[0086] Inverse quantizer 320 performs inverse quantization on the quantized transform coefficients. Inverse transformer 330 inverse transforms the inverse-quantized transform coefficients from the frequency domain to the spatial domain to reconstruct the residual signal, thereby generating the residual block of the current block.
[0087] Predictor 340 includes intra-frame predictor 342 and inter-frame predictor 344. Intra-frame predictor 342 is activated when the prediction type of the current block is intra-frame prediction, and inter-frame predictor 344 is activated when the prediction type of the current block is inter-frame prediction.
[0088] The intra predictor 342 determines the intra prediction mode of the current block from multiple intra prediction modes based on the syntax elements about the intra prediction mode extracted from the decoder 310, and predicts the current block using reference pixels around the current block based on the intra prediction mode.
[0089] Inter-frame predictor 344 uses the syntax elements of the inter-frame prediction information extracted by decoder 310 to determine motion information about the current block, and uses the determined motion information to predict the current block.
[0090] First, the inter-frame predictor 344 examines the inter-frame prediction mode information extracted from the decoder 310. When the mode information indicates a merging mode, the inter-frame predictor 344 configures a merging list including a predetermined number of merging candidates using the neighboring blocks of the current block. The inter-frame predictor 344 configures the merging list in the same manner as the inter-frame predictor 124 of the video encoding apparatus. Then, it selects a merging candidate from the merging candidates in the merging list using merging index information sent from the decoder 310. Motion information about the selected merging candidate (i.e., the motion vector and reference picture of the merging candidate) is set to the motion vector and reference picture of the current block.
[0091] On the other hand, when the mode information indicates a motion vector difference coding mode, the inter-frame predictor 344 uses the motion vectors of neighboring blocks of the current block to derive predicted motion vector candidates, and uses these candidates to determine the predicted motion vector for the current block. The inter-frame predictor 344 derives the predicted motion vector candidates in the same manner as the inter-frame predictor 124 of the video coding apparatus. When the video coding apparatus uses one of multiple predicted motion vector candidates as the predicted motion vector for the current block, the syntax elements of the motion information include predicted motion vector identification information. Therefore, in this case, the inter-frame predictor 344 can select the candidate indicated by the predicted motion vector identification information from among the predicted motion vector candidates as the predicted motion vector. However, when the video coding apparatus uses a predefined function for multiple predicted motion vector candidates to determine the predicted motion vector, the inter-frame predictor can use the same function used by the video coding apparatus to determine the predicted motion vector. Once the predicted motion vector for the current block is determined, the inter-frame predictor 344 adds the predicted motion vector to the motion vector difference sent from the decoder 310 to determine the motion vector for the current block. The reference image referenced by the motion vector of the current block is determined using the information about the reference image transmitted from the decoder 310.
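The motion vector reconstruction described above, in which the predicted motion vector is added to the signaled motion vector difference, can be sketched as follows. The names `reconstruct_mv`, `candidates`, `mvp_idx`, and `mvd`, as well as the sample values, are hypothetical and used only for illustration.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) components in arbitrary units


def reconstruct_mv(candidates: List[MV], mvp_idx: int, mvd: MV) -> MV:
    """Select the predicted MV by its identification index and add the MV difference."""
    mvp = candidates[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


# Two hypothetical candidates derived from neighboring blocks.
candidates = [(4, -2), (6, 0)]
mv = reconstruct_mv(candidates, mvp_idx=1, mvd=(-1, 3))
```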
[0092] When the motion vector and reference image of the current block are determined in merge mode or motion vector difference coding mode, the inter-frame predictor 344 uses the block in the reference image at the position indicated by the motion vector to generate the predicted block of the current block.
[0093] Examples of this disclosure generally relate to bidirectional optical flow (BIO) technology. The predetermined techniques of this disclosure can be implemented by inter-frame predictor 344. For example, inter-frame predictor 344 can implement the techniques of this disclosure described below with reference to Figures 4 to 13. In other words, the inter-frame predictor 344 can generate a predicted block for the current block on a per-pixel or per-sub-block basis, using motion compensation according to BIO techniques. In other examples, one or more other components of the decoding apparatus may additionally participate in implementing the techniques of this disclosure.
[0094] Adder 350 reconstructs the current block by adding the residual block output from the inverse transform to the prediction block output from the inter-frame predictor or intra-frame predictor. Pixels in the reconstructed current block are used as reference samples for intra-frame prediction of blocks to be decoded later.
[0095] Filter unit 360 performs deblocking filtering on the boundaries between reconstructed blocks to eliminate block artifacts caused by block-by-block decoding, and stores the deblocked blocks in memory 370. When all blocks in an image are reconstructed, the reconstructed image is used as a reference image for inter-frame prediction of blocks in subsequent images to be decoded.
[0096] This disclosure relates to using bidirectional optical flow (BIO) estimation techniques to refine motion vector information obtained through inter-frame prediction. The encoding device performs motion estimation and compensation at the coding unit (CU) level during the inter-frame prediction operation, and then sends the resulting motion vector (MV) values to the decoding device. The encoding and decoding devices can further refine the MV values using BIO at the pixel level or at the level of sub-blocks smaller than the CU (i.e., sub-CUs). That is, BIO can precisely compensate for the motion of the coding block (CU) in units ranging from 1×1 blocks (i.e., pixels) to n×n blocks. Furthermore, since there is an explicit equation for calculating the motion vectors, search operations for acquiring motion information and signaling for transmitting motion information are not required.
[0097] Figure 4 is a reference diagram used to illustrate the basic concepts of BIO.
[0098] BIO used for video encoding and decoding is based on the following assumptions: motion vector information should be bidirectional (bi-predictive) information, and the motion is a steady motion that proceeds sequentially along the time axis. Figure 4 shows the current image (a B-picture) referencing two reference images, Ref0 and Ref1.
[0099] First, assume that bidirectional motion vectors MV0 and MV1 have been determined by (normal) bidirectional motion prediction for the current block to be encoded in the current image, where MV0 and MV1 indicate the corresponding regions (i.e., reference blocks) in reference images Ref0 and Ref1 that are most similar to the current block. The two bidirectional motion vectors have values representing the motion of the current block. That is, these values are obtained by treating the current block as a unit and estimating and compensating for the motion of that unit as a whole.
[0100] In the example of Figure 4, P0 is a pixel in reference image Ref0 indicated by motion vector MV0, corresponding to pixel P in the current block, and P1 is a pixel in reference image Ref1 indicated by motion vector MV1, corresponding to pixel P in the current block. Furthermore, it is assumed that the motion of pixel P in the current block of Figure 4 is slightly different from the overall motion of the current block. For example, when an object located at pixel A in Ref0 of Figure 4 moves to pixel B in Ref1 via pixel P in the current block of the current image, pixels A and B can have values that are very similar to each other. In this case, the point in Ref0 most similar to pixel P in the current block is not P0 indicated by the motion vector MV0, but pixel A, obtained by moving P0 by a predetermined displacement vector (v_x τ0, v_y τ0). Likewise, the point in Ref1 most similar to pixel P in the current block is not P1 indicated by motion vector MV1, but pixel B, obtained by moving P1 by a predetermined displacement vector (-v_x τ1, -v_y τ1). In the following text, for simplicity, (v_x, v_y) is called the "BIO motion vector".
[0101] Therefore, when predicting the value of pixel P in the current block of the current image, using the values of the two reference pixels A and B allows for a more accurate prediction than using reference pixels P0 and P1 indicated by bidirectional motion vectors MV0 and MV1. As mentioned above, the concept of changing the reference pixels used to predict a pixel in the current block in consideration of the BIO motion vector (v_x, v_y), which specifies pixel-level motion within the current block, can be extended to sub-blocks within the current block.
[0102] The following section describes a theoretical approach for generating predicted values for pixels in the current block based on the BIO technique. For simplicity, it is assumed that BIO-based bidirectional motion compensation is performed on a pixel-by-pixel basis.
[0103] Suppose that bidirectional motion vectors MV0 and MV1 are determined for the current block to be encoded in the current image using (normal) bidirectional motion prediction, where MV0 and MV1 indicate the corresponding regions (i.e., reference blocks) in reference images Ref0 and Ref1 that are most similar to the current block. The decoding device can generate bidirectional motion vectors MV0 and MV1 based on the motion vector information included in the bitstream. Furthermore, the brightness value of the pixel in reference image Ref0 indicated by motion vector MV0, corresponding to pixel (i,j) in the current block, is defined as I^(0)(i,j), and the brightness value of the pixel in reference image Ref1 indicated by motion vector MV1, corresponding to pixel (i,j) in the current block, is defined as I^(1)(i,j).
[0104] The brightness value of pixel A in reference image Ref0, indicated by the BIO motion vector (v_x, v_y) corresponding to a pixel in the current block, can be defined as I^(0)(i + v_x τ0, j + v_y τ0), and the brightness value of pixel B in reference image Ref1 can be defined as I^(1)(i - v_x τ1, j - v_y τ1). Therefore, the flow difference Δ between pixel A and pixel B is usually defined as Equation 1 below.
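[0105] [Equation 1]
[0106] $$\Delta = I^{(0)} - I^{(1)} + v_x\left(\tau_1 \frac{\partial I^{(1)}}{\partial x} + \tau_0 \frac{\partial I^{(0)}}{\partial x}\right) + v_y\left(\tau_1 \frac{\partial I^{(1)}}{\partial y} + \tau_0 \frac{\partial I^{(0)}}{\partial y}\right)$$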
[0107] Here, I^(k) (k = 0, 1) represents the brightness of the pixels in reference images Ref0 and Ref1, indicated by motion vectors MV0 and MV1, corresponding to the pixel to be predicted in the current block. (v_x, v_y) is the BIO motion vector to be calculated. For simplicity, the positions (i,j) of pixels in reference images Ref0 and Ref1 are omitted from the terms of Equation 1 above. ∂I^(k)/∂x and ∂I^(k)/∂y represent the horizontal and vertical components of the gradient of I^(k), respectively. τ0 and τ1 represent the temporal distances between the current image and the two reference images Ref0 and Ref1. τ0 and τ1 can be calculated based on the Picture Order Count (POC). For example, τ0 = POC(current) - POC(Ref0), and τ1 = POC(Ref1) - POC(current). Here, POC(current), POC(Ref0), and POC(Ref1) represent the POCs of the current image, reference image Ref0, and reference image Ref1, respectively.
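As a hedged illustration of the quantities just defined, the sketch below computes τ0 and τ1 from POC values and evaluates a flow difference Δ for one pixel. It assumes that Equation 1 takes the first-order form Δ = I^(0) - I^(1) + v_x(τ1 ∂I^(1)/∂x + τ0 ∂I^(0)/∂x) + v_y(τ1 ∂I^(1)/∂y + τ0 ∂I^(0)/∂y); the function names, parameter names, and sample values are hypothetical.

```python
def temporal_distances(poc_cur: int, poc_ref0: int, poc_ref1: int):
    """tau0 = POC(current) - POC(Ref0), tau1 = POC(Ref1) - POC(current)."""
    return poc_cur - poc_ref0, poc_ref1 - poc_cur


def flow_difference(i0, i1, gx0, gy0, gx1, gy1, tau0, tau1, vx, vy):
    """Flow difference for one pixel under the assumed first-order form of Equation 1.

    i0, i1   : brightness in Ref0 / Ref1 at the positions indicated by MV0 / MV1
    gx*, gy* : horizontal / vertical gradients of I^(0) and I^(1)
    """
    return ((i0 - i1)
            + vx * (tau1 * gx1 + tau0 * gx0)
            + vy * (tau1 * gy1 + tau0 * gy0))


tau0, tau1 = temporal_distances(poc_cur=5, poc_ref0=4, poc_ref1=6)
delta = flow_difference(i0=100, i1=104, gx0=2, gy0=0, gx1=2, gy1=0,
                        tau0=tau0, tau1=tau1, vx=1, vy=0)
```

With these sample values a horizontal BIO motion of one unit fully explains the brightness gap between the two reference pixels, so the flow difference vanishes.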
[0108] Based on the assumption that the motion is locally consistent with the surrounding pixels, the BIO motion vector of the current pixel (i,j) to be predicted is determined by considering the differences Δ of Equation 1 for all pixels (i',j') in a certain region Ω surrounding the current pixel (i,j). That is, the BIO motion vector of the current pixel (i,j) can be determined as the vector that minimizes the sum of squares of the differences Δ[i',j'] obtained for each pixel in the region Ω, as shown in Equation 2.
[0109] [Equation 2]
[0110] $$(v_x, v_y) = \arg\min_{v_x,\, v_y} \sum_{[i',j'] \in \Omega} \Delta^2[i',j']$$
[0111] Here, (i',j') represents all pixels located within the search region Ω. Since the BIO motion vector (v_x, v_y) of the current pixel can be determined by calculating an explicit equation such as Equation 2, which minimizes the objective function (the sum of Δ²) at the current pixel position, there is no need for search operations to obtain detailed motion information or for signaling to send motion information.
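One way to see why no search is needed is that minimizing a sum of squared differences that are linear in (v_x, v_y) reduces to a 2×2 linear system. The following is a minimal sketch of a plain least-squares solve, assuming each difference has the form Δ = d + v_x·gx + v_y·gy, where d, gx, and gy are hypothetical names for the brightness gap and the combined, τ-weighted gradients. The zero-vector fallback for a degenerate system is also an assumption of this sketch, not something stated in this disclosure.

```python
from typing import Iterable, Tuple


def solve_bio_mv(samples: Iterable[Tuple[float, float, float]],
                 eps: float = 1e-9) -> Tuple[float, float]:
    """Return (vx, vy) minimizing sum((d + vx*gx + vy*gy)**2) over the samples.

    Each sample is (d, gx, gy) for one pixel of the search region.
    """
    s_xx = s_xy = s_yy = s_xd = s_yd = 0.0
    for d, gx, gy in samples:
        s_xx += gx * gx
        s_xy += gx * gy
        s_yy += gy * gy
        s_xd += gx * d
        s_yd += gy * d
    # Normal equations: [s_xx s_xy; s_xy s_yy] [vx; vy] = -[s_xd; s_yd]
    det = s_xx * s_yy - s_xy * s_xy
    if abs(det) < eps:
        return 0.0, 0.0  # assumed fallback when the gradients carry no usable information
    vx = (s_xy * s_yd - s_yy * s_xd) / det
    vy = (s_xy * s_xd - s_xx * s_yd) / det
    return vx, vy


# Synthetic samples generated from a known motion (0.5, -0.25).
gradients = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
samples = [(-(0.5 * gx - 0.25 * gy), gx, gy) for gx, gy in gradients]
vx, vy = solve_bio_mv(samples)
```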
[0112] Typically, the search region Ω can be defined as a masking window of size (2M+1)×(2N+1) centered at the current pixel (i,j). The structure and size of the masking window greatly influence the complexity and accuracy of the algorithm for determining the BIO motion vector (v_x, v_y). Therefore, the selection of the masking window is very important for the algorithm that determines the BIO motion vector (v_x, v_y).
[0113] Once the BIO motion vector (v_x, v_y) of the current pixel is determined, the bidirectional prediction value pred_BIO of the current pixel (i,j) based on the BIO motion vector can be calculated by Equation 3 below.
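[0114] [Equation 3]
[0115] $$\mathrm{pred}_{BIO} = \frac{1}{2}\left(I^{(0)} + I^{(1)} + \frac{v_x}{2}\left(\tau_1 \frac{\partial I^{(1)}}{\partial x} - \tau_0 \frac{\partial I^{(0)}}{\partial x}\right) + \frac{v_y}{2}\left(\tau_1 \frac{\partial I^{(1)}}{\partial y} - \tau_0 \frac{\partial I^{(0)}}{\partial y}\right)\right)$$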
[0116] In Equation 3, (I^(0) + I^(1)) / 2 is the typical bidirectional prediction compensation, so the remaining terms can be called the BIO offset.
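The split between the typical bidirectional prediction and the BIO offset can be sketched as follows. This sketch assumes the form of Equation 3 commonly given for BIO, pred_BIO = 1/2·(I^(0) + I^(1) + v_x/2·(τ1 ∂I^(1)/∂x - τ0 ∂I^(0)/∂x) + v_y/2·(τ1 ∂I^(1)/∂y - τ0 ∂I^(0)/∂y)); the function name, argument names, and sample values are hypothetical.

```python
def bio_prediction(i0, i1, gx0, gy0, gx1, gy1, tau0, tau1, vx, vy):
    """Return (base, offset, pred) under the assumed form of Equation 3."""
    base = (i0 + i1) / 2                          # typical bidirectional prediction
    offset = (vx * (tau1 * gx1 - tau0 * gx0)
              + vy * (tau1 * gy1 - tau0 * gy0)) / 4   # BIO offset
    return base, offset, base + offset


base, offset, pred = bio_prediction(i0=100, i1=104, gx0=2, gy0=0, gx1=6, gy1=0,
                                    tau0=1, tau1=1, vx=1, vy=0)
```

When the BIO motion vector is zero, the offset vanishes and the result falls back to the ordinary average of the two reference pixels.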
[0117] In the following, a BIO-based bidirectional motion compensation method is described with reference to Figure 5a and Figure 5b. The method described below is used in both video encoding and decoding devices. Although not shown in Figure 5a and Figure 5b, it is assumed that the encoding device has encoded and decoded the image to be used as a reference image and stored the image in memory. It is also assumed that the decoding device has decoded the image to be used as a reference image and stored the image in memory.
[0118] Figure 5a is a flowchart illustrating a method for bidirectional motion compensation performed based on pixel-level BIO according to an embodiment of the present disclosure.
[0119] First, the encoding device and the decoding device determine a first motion vector indicating a first corresponding region in the first reference image that is most similar to the current block, and determine a second motion vector indicating a second corresponding region in the second reference image that is most similar to the current block (S510).
[0120] The encoding and decoding devices determine the individual BIO motion vectors (v_x, v_y) corresponding to each object pixel in the current block by applying BIO processing on a pixel-by-pixel basis (S520).
[0121] The BIO motion vector (v_x, v_y) can be determined as the vector that minimizes the sum of squares of the flow differences of the pixels (i',j') in the search region (i.e., Equation 2), where the search region is defined by a predefined masking window centered on the corresponding object pixel (i,j).
[0122] In some examples, when determining the BIO motion vector of a pixel located at the edge of the current block, the flow difference of pixels in the region outside the current block can be ignored.
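A minimal sketch of this edge handling follows, assuming a square (2M+1)×(2M+1) window and block-relative coordinates in which i is the horizontal position and j the vertical position. The function name and the coordinate convention are assumptions of this sketch.

```python
from typing import List, Tuple


def window_positions(i: int, j: int, m: int,
                     block_w: int, block_h: int) -> List[Tuple[int, int]]:
    """Positions of a (2m+1)x(2m+1) window centered at (i, j) that lie inside the block.

    Positions outside the current block are dropped, so their flow
    differences are ignored when the BIO motion vector is computed.
    """
    return [(i + di, j + dj)
            for dj in range(-m, m + 1)
            for di in range(-m, m + 1)
            if 0 <= i + di < block_w and 0 <= j + dj < block_h]


interior = window_positions(4, 4, 2, 16, 16)   # window fully inside the block
corner = window_positions(0, 0, 2, 8, 8)       # only a 3x3 part remains
edge = window_positions(0, 4, 2, 8, 8)         # a 3x5 part remains
```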
[0123] In some examples, a rectangular masking window with dimensions of (2M+1)×(2N+1) can be used. Preferably, for example, a square masking window with dimensions of 5×5 can be used. In some other examples, a non-square masking window with a shape such as a plus sign or a rhombus can be used.
[0124] The encoding and decoding devices generate the prediction block for the current block using bidirectional prediction based on the BIO motion vectors (v_x, v_y) calculated on a pixel basis (S530). That is, the encoding and decoding devices use the respective BIO motion vectors to generate bidirectional prediction values for the object pixels based on Equation 3.
[0125] Finally, the encoding and decoding devices use the generated prediction blocks to encode or decode the current block (S540).
[0126] Figure 5b is a flowchart illustrating a method for bidirectional motion compensation performed based on sub-block level BIO according to an embodiment of the present disclosure.
[0127] First, the encoding device and the decoding device determine a first motion vector indicating a first corresponding region in the first reference image that is most similar to the current block, and determine a second motion vector indicating a second corresponding region in the second reference image that is most similar to the current block (S560).
[0128] The encoding and decoding devices determine the individual BIO motion vectors (v_x, v_y) corresponding to each sub-block within the current block by applying BIO processing on a sub-block basis (S570).
[0129] The BIO motion vector (v_x, v_y) can be determined as the vector that minimizes the sum of squares of the flow differences of the pixels (i',j') located in each search region (i.e., Equation 2), where each search region is defined by a predefined masking window centered on each pixel (i,j) within the sub-block. Alternatively, the BIO motion vector (v_x, v_y) can be determined as the vector that minimizes the sum of squares of the flow differences of the pixels (i',j') located in each search region, where each search region is defined by a predetermined masking window centered on some of the pixels (i,j) within the sub-block. For example, the positions of pixels with masking windows applied and the positions of pixels without masking windows applied can form a grid pattern, a horizontal stripe pattern, or a vertical stripe pattern.
[0130] In some implementations, instead of repeatedly calculating the flow difference, the repeated differences can be weighted according to the number of times the difference is repeated. In some examples, when determining the BIO motion vector of a sub-block located at the edge of the current block, the flow difference of pixels in regions outside the current block can be ignored.
[0131] In some implementations, a rectangular masking window with dimensions of (2M+1)×(2N+1) can be used. In some implementations, the masking window can have a square shape (e.g., 5×5 size). In some other implementations, a non-square shape such as a plus sign or a rhombus can be used. In some implementations, a masking window may not be used. For example, the BIO motion vector (v_x, v_y) can be determined as the vector that minimizes the sum of the squares of the flow differences of each pixel in the sub-block.
[0132] The encoding and decoding devices generate the prediction block for the current block using bidirectional prediction based on the BIO motion vectors (v_x, v_y) calculated on a sub-block basis (S580). All pixels in a sub-block share the BIO motion vector (v_x, v_y) calculated for that sub-block. That is, the BIO-based prediction values of all pixels in the object sub-block are calculated by applying the single BIO motion vector (v_x, v_y) determined for the object sub-block to Equation 3.
[0133] Finally, the encoding and decoding devices use the generated prediction blocks to encode or decode the current block (S590).
[0134] In some embodiments of this disclosure, BIO is applied on a pixel-level basis. In other embodiments, BIO is applied on a block-level basis. Hereinafter, embodiments of pixel-level BIO processing will be described first, followed by embodiments of block-level BIO processing.
[0135] In the first and second embodiments described below, BIO is applied on a pixel-level basis. The masking window used in the BIO processing can have a size of (2M+1)×(2N+1) and is centered at the current pixel (i,j). For simplicity, in the following description, it is assumed that the width and height of the masking window are equal (i.e., M=N). When generating the prediction block for the current block, pixel-level BIO obtains pixel-level BIO motion vectors and generates pixel-level bidirectional prediction values based on the obtained BIO motion vectors.
[0136] First Implementation Method
[0137] In this embodiment, a rectangular masking window is used to calculate pixel-level BIO motion vectors. The total number of differences Δ required to determine the BIO motion vector of the pixel to be predicted is described with reference to Figure 6. Figure 6 illustrates a 5×5 masking window 610 and a pixel 621 to be predicted in the current block. The pixel 621 to be predicted is the center of the masking window 610, indicated by hatching in Figure 6, and the total number of pixels within the masking window 610, including the pixel 621, is 25. Therefore, the number of differences Δ required to determine the BIO motion vector (v_x, v_y) of the pixel 621 is 25. The BIO motion vector (v_x, v_y) of the pixel to be predicted is then estimated by substituting the 25 differences Δ into Equation 2. Once the BIO motion vector (v_x, v_y) is determined based on optical flow, the bidirectional prediction value of the object pixel in the current block is calculated according to Equation 3. This process is repeated for each pixel in the current block to produce the prediction values of all pixels constituting the prediction block of the current block.
[0138] However, when determining the BIO motion vector of a pixel located at the edge of the current block, the flow differences of pixels in the region outside the current block can be ignored, even if those pixels are included in the masking window.
[0139] Second Implementation Method
[0140] Figure 7 is a diagram illustrating non-rectangular masking windows used for motion compensation based on BIO according to the second embodiment.
[0141] Unlike the first embodiment, which uses a square masking window, this embodiment employs masking windows of various shapes. Figure 7 presents two types of masking windows (i.e., plus-sign-shaped and diamond-shaped masking windows), but this disclosure does not preclude the use of masking windows of any shape other than rectangular. Using such masking windows reduces complexity compared with processing all pixels in the square masking window used in the first embodiment. As shown in Figure 7, the size of the plus-sign-shaped and diamond-shaped masking windows can be scaled according to the value of parameter M.
[0142] In this embodiment, the total number of differences Δ required to determine the BIO motion vector of the pixel to be predicted is described with reference to Figure 8.
[0143] Figure 8 shows an example of a diamond-shaped masking window 810 with M=2 and a pixel 821 to be predicted in the current block. The pixel 821 to be predicted is the center of the masking window 810, indicated by hatching in Figure 8, and the number of pixels within the masking window 810, including the pixel 821, is 13. Therefore, the number of differences Δ required to determine the BIO motion vector (v_x, v_y) of the pixel 821 is 13. The BIO motion vector (v_x, v_y) of the pixel 821 to be predicted is then estimated by substituting the 13 differences Δ into Equation 2. In this embodiment, these processes are performed for each pixel in the current block to calculate the BIO motion vector corresponding to each pixel.
[0144] However, when determining the BIO motion vector of a pixel located at the edge of the current block, the flow differences of pixels in the region outside the current block can be ignored, even if those pixels are included in the masking window.
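The pixel counts of the window shapes can be checked with a short sketch that enumerates the offsets of each shape for a given M. The exact geometry of the plus-sign window (arms of length M along both axes) and of the diamond window (all offsets with |dx| + |dy| ≤ M) is assumed here, since the figures are not reproduced in the text; the function names are hypothetical.

```python
from typing import List, Tuple

Offset = Tuple[int, int]


def square_window(m: int) -> List[Offset]:
    """All offsets of a (2m+1)x(2m+1) square window."""
    return [(dx, dy) for dy in range(-m, m + 1) for dx in range(-m, m + 1)]


def plus_window(m: int) -> List[Offset]:
    """Offsets on the horizontal or vertical axis through the center (assumed shape)."""
    return [(dx, dy) for dx, dy in square_window(m) if dx == 0 or dy == 0]


def diamond_window(m: int) -> List[Offset]:
    """Offsets within city-block distance m of the center (assumed shape)."""
    return [(dx, dy) for dx, dy in square_window(m) if abs(dx) + abs(dy) <= m]


counts = {
    "square": len(square_window(2)),
    "plus": len(plus_window(2)),
    "diamond": len(diamond_window(2)),
}
```

For M=2 the square and diamond counts agree with the 25 and 13 differences stated for the first and second embodiments.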
[0145] In the third to eighth embodiments described below, BIO-based motion compensation is applied at the block level. In sub-block-level BIO motion compensation, the sub-block size can be M×N (where M and N are integers). All pixels in the M×N sub-block share the BIO motion vector (v_x, v_y) calculated at the sub-block level. That is, the optical-flow-based bidirectional prediction values of all pixels in the M×N sub-block are calculated according to Equation 3 using the calculated BIO motion vector (v_x, v_y). Although the method of this disclosure does not limit the size of the sub-block, for simplicity the BIO processing is described based on 4×4 sub-blocks in the following embodiments.
[0146] Third Implementation Method
[0147] In this implementation, to determine a BIO motion vector for a sub-block, a rectangular masking window centered on each pixel in the sub-block is applied to each pixel, and the difference Δ in Equation 1 is estimated for each pixel located within the masking window. Finally, these differences are substituted into Equation 2 to estimate the BIO motion vector corresponding to the sub-block.
[0148] Figure 9 shows an example of a 5×5 masking window 910 and a 4×4 sub-block 920 according to the scheme proposed in this embodiment. The masking window 910 shown in Figure 9 has a square shape with M=2. The current pixel (i,j) 921 in sub-block 920 is the center of the masking window 910, indicated by hatching in Figure 9. For one pixel (i,j) of the sub-block, the total number of pixels in the masking window 910 is 25 (=(2M+1)×(2M+1)=5×5). Therefore, based on the size of the sub-block and the size of the masking window, the total number of differences required to determine the BIO motion vector of the 4×4 sub-block is 400 (=16×25). The BIO motion vector of the sub-block is determined to be the vector that minimizes the sum of the squares of these differences.
[0149] It should be noted that only 64 of the 400 differences mentioned above are distinct; the remaining differences are repetitions of those 64. For example, as shown in Figure 10a, most pixels in masking window 1010a, centered on the pixel at position (0, 0) in sub-block 1020, are also located within masking window 1010b, centered on the pixel at position (1, 0) in sub-block 1020. Therefore, instead of repeatedly calculating the overlapping differences, the calculation of Equation 2 can be simplified by assigning weights to the overlapping differences based on the number of overlaps. For example, when a 5×5 masking window is applied to a 4×4 sub-block, a total of 64 distinct differences are calculated, and then a corresponding weight can be assigned to each difference. The BIO motion vector (v_x, v_y) can then be determined so as to minimize the sum of squares of the weighted differences. In Figure 10b, the numbers marked on the pixels are the weights based on the number of overlaps. Here, the highlighted 4×4 block indicates the position of the sub-block.
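The overlap counts can be reproduced with a short sketch that slides a 5×5 window over every pixel of a 4×4 sub-block and counts how often each position is covered. The function name and the sub-block-relative coordinates are assumptions of this sketch; the resulting weights are offered as an illustration of the counting and have not been checked against Figure 10b.

```python
from collections import Counter
from typing import Tuple


def overlap_weights(sub_w: int = 4, sub_h: int = 4, m: int = 2) -> Counter:
    """Count how many (2m+1)x(2m+1) windows cover each position.

    One window is centered on every pixel of the sub-block; positions are
    given relative to the top-left pixel of the sub-block.
    """
    weights: Counter = Counter()
    for cy in range(sub_h):
        for cx in range(sub_w):
            for dy in range(-m, m + 1):
                for dx in range(-m, m + 1):
                    weights[(cx + dx, cy + dy)] += 1
    return weights


weights = overlap_weights()
distinct = len(weights)           # number of distinct differences to compute
total = sum(weights.values())     # number of differences without reuse
```

Each distinct difference is then computed once and enters the sum of squares with its weight, rather than being recomputed for every window that contains it.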
[0150] Fourth Implementation Method
[0151] Unlike the third embodiment, which uses a rectangular masking window, this embodiment employs masking windows with various patterns (such as those shown in Figure 7). Using such a masking window reduces complexity compared with processing all pixels within a rectangular masking window.
[0152] Figure 11 shows an example of a diamond-shaped masking window 1110 and a 4×4 sub-block 1120. As shown in Figure 11, when using a diamond-shaped masking window 1110 with M=2, the total number of pixels in the masking window 1110 is 13. Therefore, the total number of differences Δ required to determine the BIO motion vector (v_x, v_y) of the sub-block is 208 (=16×13). The BIO motion vector corresponding to the 4×4 block is then estimated by substituting the 208 differences into Equation 2. As in the third embodiment, weights corresponding to the number of overlaps can be assigned to the differences, and the weighted differences can be substituted into Equation 2 to estimate the BIO motion vector of the 4×4 sub-block.
[0153] Fifth Implementation Method
[0154] In the third and fourth embodiments, the masking window is applied to all pixels in the sub-block. Conversely, in this embodiment, the masking window is applied to some pixels in the sub-block.
[0155] Figure 12 illustrates three types of pixel positions at which a masking window is applied within a sub-block. In one type, the positions of pixels with and without masking windows form a grid pattern (see (a) of Figure 12). In the other two types, the pixels form horizontal and vertical stripe patterns, respectively (see (b) and (c) of Figure 12). In addition to the types shown in Figure 12, this disclosure does not exclude the use of any type that samples and processes only some pixels in a sub-block. Therefore, this embodiment reduces the computational complexity of calculating, for every pixel in the sub-block, as many differences as there are pixels in the masking window.
[0156] In this embodiment, the total number of differences Δ required to determine the BIO motion vector of a sub-block is described with reference to Figure 13. Figure 13 illustrates a 5×5 square masking window 1310 and the pixels of a 4×4 sub-block 1320 sampled in a grid pattern. The total number of pixels in the 5×5 square masking window 1310 is 25. The 25 differences Δ of Equation 1 should be estimated by applying the masking window to each of the eight pixels in the sub-block indicated by hatching. Therefore, the total number of differences Δ required to determine the BIO motion vector (v_x, v_y) of the 4×4 sub-block is 200 (=8×25). The 200 differences are then substituted into Equation 2 to estimate the BIO motion vector corresponding to the 4×4 block. As in the third embodiment, weights corresponding to the number of overlaps can be assigned to the differences, and the weighted differences can be substituted into Equation 2 to estimate the BIO motion vector of the 4×4 sub-block.
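The sampling patterns and the resulting difference count can be sketched as follows. Which pixels of each pattern are actually selected in Figures 12 and 13 is not recoverable from the text, so the selection rules below (a checkerboard for the grid pattern and every other row or column for the stripes) and the names used are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical selection rules; each keeps half of the sub-block pixels.
RULES: Dict[str, Callable[[int, int], bool]] = {
    "grid": lambda x, y: (x + y) % 2 == 0,      # checkerboard
    "horizontal": lambda x, y: y % 2 == 0,      # every other row
    "vertical": lambda x, y: x % 2 == 0,        # every other column
}


def sampled_pixels(pattern: str, size: int = 4) -> List[Tuple[int, int]]:
    """Pixels of a size x size sub-block to which a masking window is applied."""
    keep = RULES[pattern]
    return [(x, y) for y in range(size) for x in range(size) if keep(x, y)]


WINDOW_PIXELS = 25  # 5x5 square masking window
totals = {name: len(sampled_pixels(name)) * WINDOW_PIXELS for name in RULES}
```

Under these assumed rules every pattern selects eight pixels of the 4×4 sub-block, which gives the 200 differences stated above instead of the 400 needed when all sixteen pixels are used.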
[0157] Sixth Implementation Method
[0158] This embodiment combines the schemes presented in the fourth and fifth embodiments. That is, this embodiment uses masking windows with various patterns other than rectangular shapes (as in the fourth embodiment) and applies the masking window only to some sampled pixels in the sub-block (as in the fifth embodiment). Therefore, the computational complexity of this embodiment is lower than that of the fourth and fifth embodiments.
[0159] Figure 14 shows an example of the scheme proposed in this embodiment, with a diamond-shaped masking window 1410 and sampled pixels in a 4×4 sub-block 1420. In the case of Figure 14, the total number of differences Δ required to determine the BIO motion vector (v_x, v_y) of the sub-block is 104 (= 8×13). Finally, the 104 differences are substituted into Equation 2 to estimate the BIO motion vector (v_x, v_y) corresponding to the 4×4 sub-block. As in the third embodiment, weights corresponding to the number of overlaps can be assigned to the differences, and the weighted differences can be substituted into Equation 2 to estimate the BIO motion vector of the 4×4 sub-block.
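The arithmetic behind these totals can be sketched as follows. The diamond is modelled here as all offsets with |dy| + |dx| ≤ 2, which yields the 13 positions implied by 8×13; whether this matches the exact shape drawn in Figure 14 is an assumption, and the helper names are hypothetical. The 208 entry is derived by the same arithmetic rather than quoted from the text.

```python
def square_window(radius: int) -> list[tuple[int, int]]:
    """Offsets (dy, dx) of a (2*radius+1) x (2*radius+1) square masking window."""
    span = range(-radius, radius + 1)
    return [(dy, dx) for dy in span for dx in span]

def diamond_window(radius: int) -> list[tuple[int, int]]:
    """Offsets of an assumed diamond window: Manhattan distance <= radius."""
    return [(dy, dx) for dy, dx in square_window(radius)
            if abs(dy) + abs(dx) <= radius]

def total_differences(num_pixels: int, window: list[tuple[int, int]]) -> int:
    """One difference per window position, for every pixel the window is applied to."""
    return num_pixels * len(window)

counts = {
    "square 5x5, all 16 pixels": total_differences(16, square_window(2)),
    "square 5x5, 8 sampled pixels": total_differences(8, square_window(2)),
    "diamond, all 16 pixels": total_differences(16, diamond_window(2)),
    "diamond, 8 sampled pixels": total_differences(8, diamond_window(2)),
}
for label, n in counts.items():
    print(f"{label}: {n}")
```

Combining the smaller window with pixel sampling gives the lowest count of the four configurations, which is the complexity reduction this embodiment relies on.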
[0160] Seventh Implementation Method
[0161] In the previous embodiments, as many differences Δ as the size of the masking window are calculated for each of (all or some of) the pixels in a sub-block. For example, in the third embodiment, the total number of differences required to determine the BIO motion vector of a 4×4 sub-block using a 5×5 masking window is 400 (= 16×25). In contrast, this embodiment does not employ a masking window; equivalently, it can be regarded as using a 1×1 masking window. That is, for each pixel in the sub-block, only one difference Δ of Equation 1 is calculated. For example, the total number of differences Δ considered in estimating the BIO motion vector of a 4×4 sub-block is 16. Finally, only these 16 differences Δ are substituted into Equation 2 to estimate the BIO motion vector of the 4×4 sub-block. That is, the BIO motion vector is calculated so as to minimize the sum of squares of the 16 differences.
[0162] Alternatively, the BIO motion vector of the 4×4 sub-block can be estimated by assigning different weights to the 16 differences and substituting the weighted differences into Equation 2. Here, higher weights can be assigned to pixels in the inner region of the sub-block, and lower weights can be assigned to pixels in the edge region of the sub-block. Figure 15 shows an example of assigning weights to each pixel of a sub-block.
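The minimization over 16 (optionally weighted) differences can be sketched as a small least-squares problem. This sketch assumes that each difference is linear in the motion vector, Δ_i = d_i + v_x·gx_i + v_y·gy_i, as in common BIO formulations. The names `d`, `gx`, `gy`, and `solve_bio_mv` and the weight values are hypothetical and are not taken from Equation 1, Equation 2, or Figure 15. Any clipping or regularization that a real codec would apply is not modelled.

```python
import numpy as np

def solve_bio_mv(d, gx, gy, w=None):
    """Return (vx, vy) minimising sum_i w_i * (d_i + vx*gx_i + vy*gy_i)**2.

    d, gx, gy: per-pixel terms of the assumed linear flow difference.
    w: optional non-negative per-pixel weights (defaults to all ones).
    """
    d, gx, gy = (np.asarray(a, dtype=float).ravel() for a in (d, gx, gy))
    w = np.ones_like(d) if w is None else np.asarray(w, dtype=float).ravel()
    sw = np.sqrt(w)                        # weighted LS: scale each row by sqrt(w_i)
    A = np.stack([gx, gy], axis=1) * sw[:, None]
    b = -d * sw
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(v[0]), float(v[1])

# Synthetic 4x4 sub-block: 16 pixels, one difference per pixel (1x1 window).
rng = np.random.default_rng(0)
gx = rng.normal(size=(4, 4))
gy = rng.normal(size=(4, 4))
true_vx, true_vy = 0.5, -0.25
d = -(true_vx * gx + true_vy * gy)         # differences vanish at the true vector

# Hypothetical weights: inner 2x2 pixels weighted higher than edge pixels.
weights = np.ones((4, 4))
weights[1:3, 1:3] = 2.0

print(solve_bio_mv(d, gx, gy))             # unweighted, 16 differences
print(solve_bio_mv(d, gx, gy, weights))    # weighted variant
```

With noise-free synthetic data both calls recover the same vector; on real data the weights shift the estimate toward the pixels with higher weight.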
[0163] Eighth Implementation Method
[0164] In this embodiment, when determining the BIO motion vector of a sub-block located at the edge of the current block, a constraint is imposed that the difference Δ is not calculated in the region outside the current block. For example, assume that the current block size is 16×16 and that a BIO motion vector is calculated for each 4×4 sub-block, as shown in Figure 16a. When determining the BIO motion vectors of the 12 sub-blocks located at the edge of the current block, out of the 16 4×4 sub-blocks, the differences Δ of masking pixels in the region outside the current block are not considered. Here, the masking pixels in the region outside the current block can vary depending on the size of the sub-block and the size and position of the masking window. Therefore, in this embodiment, the number of differences Δ to be calculated to determine the BIO motion vector of a sub-block can depend on the position of that sub-block in the current block.
[0165] When this scheme is combined with the scheme of the third embodiment for assigning weights to overlapping differences, the weights for the respective masking pixels are given as shown in Figure 16b. That is, in Figure 16b, pixels marked with 0 are pixels located outside the current block, and their differences are not calculated. According to this scheme, the number of differences to be calculated is smaller than in the third embodiment. Therefore, the computational load is reduced, and memory is saved because the values of pixels located outside the current block are not referenced.
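How the number of differences depends on the sub-block position can be sketched by direct counting. The sketch assumes a 16×16 current block, 4×4 sub-blocks, and a 5×5 square masking window applied to every pixel of the sub-block (as in the third embodiment); the function name `count_differences` is hypothetical.

```python
def count_differences(block: int = 16, sub: int = 4, radius: int = 2) -> list[list[int]]:
    """Count, per sub-block, the window positions that fall inside the current block."""
    n = block // sub
    counts = [[0] * n for _ in range(n)]
    for sy in range(n):
        for sx in range(n):
            total = 0
            for py in range(sy * sub, (sy + 1) * sub):        # pixels of the sub-block
                for px in range(sx * sub, (sx + 1) * sub):
                    for dy in range(-radius, radius + 1):     # masking-window offsets
                        for dx in range(-radius, radius + 1):
                            inside = (0 <= py + dy < block) and (0 <= px + dx < block)
                            if inside:                        # skip pixels outside the block
                                total += 1
            counts[sy][sx] = total
    return counts

counts = count_differences()
for row in counts:
    print(row)
```

Under these assumptions, only the four interior sub-blocks need the full 400 differences; the 12 sub-blocks on the edge of the current block need fewer, with the corner sub-blocks needing the fewest.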
[0166] This scheme is not limited to square masking windows; it can also be applied to masking windows of various shapes, including diamond and plus-sign shapes.
[0167] Although exemplary embodiments have been described for illustrative purposes, those skilled in the art will understand that various modifications and variations may be made without departing from the concept and scope of the embodiments. For the sake of brevity and clarity, exemplary embodiments have been described. Therefore, those skilled in the art will understand that the scope of the embodiments is not limited to those described above, but includes the claims and their equivalents.
[0168] Cross-reference to related applications
[0169] This application claims priority to Korean Patent Application No. 10-2017-0052290, filed on April 24, 2017, and Korean Patent Application No. 10-2017-0077246, filed on June 19, 2017, the entire contents of which are incorporated herein by reference.
Claims
1. An apparatus for decoding video data, the apparatus comprising: a memory; and one or more processors, wherein the one or more processors are configured to perform the following operations: determining a first motion vector indicating a first region in a first reference image corresponding to a current block, and a second motion vector indicating a second region in a second reference image corresponding to the current block; generating a prediction block for the current block by applying bidirectional optical flow (BIO) processing on a sub-block basis; and reconstructing the current block using the generated prediction block, wherein generating the prediction block includes the following steps: determining a BIO motion vector of each sub-block constituting the current block; and generating, based on the determined BIO motion vector, prediction values for each of the pixels constituting the corresponding sub-block, wherein the BIO motion vector is determined based on flow differences obtained for pixels within a block surrounding the corresponding sub-block, wherein the flow difference for a given pixel within the block is calculated based on a first point on the first reference image corresponding to the given pixel within the block and a second point on the second reference image corresponding to the given pixel within the block, and wherein the size of the sub-block is 4×4.
2. The apparatus of claim 1, wherein the BIO motion vector is determined to be the vector that minimizes the sum of squares or the weighted sum of squares of the flow differences obtained for each pixel within the block surrounding the corresponding sub-block.
3. The apparatus of claim 2, wherein the flow differences obtained for pixels located in an inner region of the block surrounding the corresponding sub-block are assigned a higher weight, and the flow differences obtained for pixels located in an edge region of the block surrounding the corresponding sub-block are assigned a lower weight.
4. An apparatus for encoding video data, the apparatus comprising: a memory; and one or more processors, wherein the one or more processors are configured to perform the following operations: determining a first motion vector indicating a first region in a first reference image corresponding to a current block, and a second motion vector indicating a second region in a second reference image corresponding to the current block; generating a prediction block for the current block by applying bidirectional optical flow (BIO) processing on a sub-block basis; determining a residual block of the current block using the prediction block; and encoding the first motion vector, the second motion vector, and the residual block of the current block into a bitstream, wherein generating the prediction block includes the following steps: determining a BIO motion vector of each sub-block constituting the current block; and generating, based on the determined BIO motion vector, prediction values for each of the pixels constituting the corresponding sub-block, wherein the BIO motion vector is determined based on flow differences obtained for pixels within a block surrounding the corresponding sub-block, wherein the flow difference for a given pixel within the block is calculated based on a first point on the first reference image corresponding to the given pixel within the block and a second point on the second reference image corresponding to the given pixel within the block, and wherein the size of the sub-block is 4×4.
5. The apparatus of claim 4, wherein the BIO motion vector is determined to be the vector that minimizes the sum of squares or the weighted sum of squares of the flow differences obtained for each pixel within the block surrounding the corresponding sub-block.
6. The apparatus of claim 5, wherein the flow differences obtained for pixels located in an inner region of the block surrounding the corresponding sub-block are assigned a higher weight, and the flow differences obtained for pixels located in an edge region of the block surrounding the corresponding sub-block are assigned a lower weight.
7. A method for storing a bitstream of encoded video data, the method comprising the following steps: encoding video data into a bitstream by performing an encoding process; and storing the bitstream in a memory, wherein the encoding process includes the following operations: determining a first motion vector indicating a first region in a first reference image corresponding to a current block, and a second motion vector indicating a second region in a second reference image corresponding to the current block; generating a prediction block for the current block by applying bidirectional optical flow (BIO) processing on a sub-block basis; determining a residual block of the current block using the prediction block; and encoding the first motion vector, the second motion vector, and the residual block of the current block into the bitstream, wherein generating the prediction block includes the following steps: determining a BIO motion vector of each sub-block constituting the current block; and generating, based on the determined BIO motion vector, prediction values for each of the pixels constituting the corresponding sub-block, wherein the BIO motion vector is determined based on flow differences obtained for pixels within a block surrounding the corresponding sub-block, wherein the flow difference for a given pixel within the block is calculated based on a first point on the first reference image corresponding to the given pixel within the block and a second point on the second reference image corresponding to the given pixel within the block, and wherein the size of the sub-block is 4×4.
8. The method of claim 7, wherein the BIO motion vector is determined to be the vector that minimizes the sum of squares or the weighted sum of squares of the flow differences obtained for each pixel within the block surrounding the corresponding sub-block.
9. The method of claim 8, wherein the flow differences obtained for pixels located in an inner region of the block surrounding the corresponding sub-block are assigned a higher weight, and the flow differences obtained for pixels located in an edge region of the block surrounding the corresponding sub-block are assigned a lower weight.