Extended motion information comparison
By accessing motion information of neighboring blocks and taking into account interpolation filter information during video encoding and decoding, the problem of low efficiency in comparing motion information in existing technologies is solved, and more efficient video encoding and decoding is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INTERDIGITAL CE PATENT HOLDINGS SAS
- Filing Date
- 2020-09-23
- Publication Date
- 2026-06-23
AI Technical Summary
Existing video encoding and decoding technologies struggle to effectively utilize interpolation filter information for motion information comparison when taking advantage of spatial and temporal redundancy in video content, resulting in low encoding efficiency.
By accessing the motion information of neighboring blocks and comparing it with the motion information in the motion candidate list, taking into account interpolation filter information, neighboring blocks are added as candidates to the motion candidate list for encoding or decoding.
It improves the efficiency of video encoding and decoding by reducing redundancy through more accurate motion information comparison, thereby enhancing encoding efficiency and image quality.
Figure CN114503562B_ABST
Abstract
Description
Technical Field
[0001] The present embodiments generally relate to comparing motion information in video encoding and decoding.
Background Technology
[0002] To achieve high compression efficiency, image and video coding schemes typically employ prediction and transform to leverage spatial and temporal redundancy in video content. Generally, intra-frame or inter-frame prediction is used to exploit intra- or inter-frame correlation, and the differences between the original block and the predicted block (typically referred to as prediction errors or prediction residuals) are then transformed, quantized, and entropy coded. To reconstruct the video, the compressed data is decoded by inverse processes corresponding to the entropy coding, quantization, transform, and prediction.
Summary of the Invention
[0003] According to one embodiment, a method for video encoding or decoding is provided, comprising: accessing a motion candidate list of a block; accessing motion information of neighboring blocks of the block; comparing the motion information of the neighboring blocks with the motion information of motion candidates in the motion candidate list, wherein the comparison takes into account interpolation filter information; adding the neighboring block as a candidate to the motion candidate list in response to the motion information of the neighboring block being different from the motion information of the motion candidates in the motion candidate list; and encoding or decoding the motion information of the block based on the motion candidate list.
[0004] According to another embodiment, an apparatus for video encoding or decoding is provided, including one or more processors, wherein the one or more processors are configured to: access a motion candidate list of a block; access motion information of neighboring blocks of the block; compare the motion information of the neighboring blocks with motion information of motion candidates in the motion candidate list, wherein the comparison takes into account interpolation filter information; add the neighboring block as a candidate to the motion candidate list in response to the motion information of the neighboring block being different from the motion information of the motion candidates in the motion candidate list; and encode or decode the motion information of the block based on the motion candidate list.
[0005] One or more embodiments also provide a computer program including instructions that, when executed by one or more processors, cause the one or more processors to perform an encoding or decoding method according to any of the above embodiments. One or more embodiments of this invention also provide a computer-readable storage medium storing instructions for encoding or decoding video data according to the above methods. One or more embodiments also provide a computer-readable storage medium storing a bitstream generated according to the above methods. One or more embodiments also provide a method and apparatus for transmitting or receiving a bitstream generated according to the above methods.
Attached Figure Description
[0006] Figure 1 shows a block diagram of a system in which various aspects of the present embodiments can be implemented.
[0007] Figure 2 shows a block diagram of an embodiment of a video encoder.
[0008] Figure 3 shows a block diagram of an embodiment of a video decoder.
[0009] Figure 4 shows interpolation filters with different smoothing properties.
[0010] Figure 5 shows the derivation of the interpolation filter based on the values of IFindex and MV.
[0011] Figure 6 shows the locations of the spatial and temporal predictors.
[0012] Figure 7 shows the process of generating the merge list.
Detailed Implementation
[0013] Figure 1 shows a block diagram of an example of a system in which various aspects and embodiments can be implemented. System 100 can be implemented as a device including the various components described below and configured to perform one or more of the aspects described in this application. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, digital multimedia set-top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers. Elements of system 100 can be implemented individually or in combination in a single integrated circuit (IC), multiple ICs, and / or discrete components. For example, in at least one embodiment, the processing and encoder / decoder elements of system 100 are distributed across multiple ICs and / or discrete components. In various embodiments, system 100 is communicatively coupled to one or more other systems or other electronic devices via, for example, a communication bus or through dedicated input and / or output ports. In various embodiments, system 100 is configured to implement one or more of the aspects described in this application.
[0014] System 100 includes at least one processor 110 configured to execute instructions loaded therein for implementing various aspects described herein, such as those in the present application. Processor 110 may include embedded memory, input / output interfaces, and various other circuitry known in the art. System 100 includes at least one memory 120 (e.g., volatile and / or non-volatile memory devices). System 100 includes a storage device 140, which may include non-volatile and / or volatile memory, including but not limited to EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash memory, disk drives, and / or optical disk drives. As a non-limiting example, storage device 140 may include internal storage devices, additional storage devices, and / or network-accessible storage devices.
[0015] System 100 includes an encoder / decoder module 130 configured to, for example, process data to provide encoded or decoded video, and the encoder / decoder module 130 may include its own processor and memory. The encoder / decoder module 130 represents multiple modules that can be included in a device to perform encoding and / or decoding functions. As is known, a device may include one or both encoding and decoding modules. Furthermore, the encoder / decoder module 130 may be implemented as a separate element of system 100, or may be incorporated within processor 110 as a combination of hardware and software known to those skilled in the art.
[0016] Program code to be loaded onto processor 110 or encoder / decoder 130 to execute the various aspects described in this application may be stored in storage device 140 and subsequently loaded onto memory 120 for execution by processor 110. According to various embodiments, one or more of processor 110, memory 120, storage device 140, and encoder / decoder module 130 may store one or more of various items during the execution of the processing described in this application. Such stored items may include, but are not limited to, input video, decoded video or portions of decoded video, bitstreams, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
[0017] In several embodiments, memory inside processor 110 and / or encoder / decoder module 130 is used to store instructions and to provide working memory for the processing needed during encoding or decoding. In other embodiments, however, memory external to the processing device (where the processing device can be, for example, processor 110 or encoder / decoder module 130) is used for one or more of these functions. The external memory may be memory 120 and / or storage device 140, for example, volatile memory and / or non-volatile flash memory. In several embodiments, external non-volatile flash memory is used to store the operating system of a television. In at least one embodiment, fast external volatile memory, such as RAM, is used as working memory for video encoding / decoding operations, such as for MPEG-2, HEVC, or VVC.
[0018] As shown in block 105, input to the components of system 100 can be provided through various input devices. Such input devices include, but are not limited to: (i) an RF section that receives RF signals transmitted over the air, for example by a broadcaster, (ii) a composite input terminal, (iii) a USB input terminal, and / or (iv) an HDMI input terminal.
[0019] In various embodiments, the input devices of block 105 have associated input processing elements known in the art. For example, the RF section may be associated with elements suitable for: (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down-converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select, for example, a signal band (which may be referred to as a channel in some embodiments), (iv) demodulating the down-converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select a desired stream of data packets. The RF section in various embodiments includes one or more elements performing these functions, such as frequency selectors, signal selectors, band limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers. The RF section may include a tuner that performs several of these functions, including, for example, down-converting the received signal to a lower frequency (e.g., an intermediate frequency or a near-baseband frequency) or to baseband. In one set-top box embodiment, the RF section and its associated input processing elements receive RF signals transmitted over a wired (e.g., cable) medium and perform frequency selection by filtering, down-converting, and filtering again to a desired frequency band. Various embodiments rearrange the order of the above (and other) elements, remove some of these elements, and / or add other elements that perform similar or different functions. Adding elements may include inserting elements between existing elements, such as inserting amplifiers and analog-to-digital converters. In various embodiments, the RF section includes an antenna.
[0020] Furthermore, USB and / or HDMI terminals may include corresponding interface processors for connecting system 100 to other electronic devices via USB and / or HDMI connections. It should be understood that various aspects of input processing, such as Reed-Solomon error correction, can be implemented as needed, for example, within a separate input processing IC or processor 110. Similarly, various aspects of USB or HDMI interface processing can be implemented as needed, within a separate interface IC or processor 110. The demodulated, error-corrected, and demultiplexed stream is provided to various processing elements, including, for example, processor 110 and encoder / decoder 130, which operate in combination with memory and storage elements to process the data stream as needed for presentation on an output device.
[0021] Various components of system 100 can be provided within an integrated housing. Within the integrated housing, various components can be interconnected and transmit data therebetween using a suitable connection arrangement 115 (e.g., internal buses known in the art, including I2C buses, wiring, and printed circuit boards).
[0022] System 100 includes a communication interface 150 that enables communication with other devices via a communication channel 190. The communication interface 150 may include, but is not limited to, a transceiver configured to send and receive data via the communication channel 190. The communication interface 150 may include, but is not limited to, a modem or network interface card (NIC), and the communication channel 190 may be implemented, for example, within a wired and / or wireless medium.
[0023] In various embodiments, data is streamed to system 100 using a wireless network such as IEEE 802.11. In these embodiments, Wi-Fi signals are received via a communication channel 190 and a communication interface 150 suitable for Wi-Fi communication. The communication channel 190 in these embodiments is typically connected to an access point or router that provides access to external networks, including the Internet, to allow streaming applications and other over-the-top communication. Other embodiments use a set-top box to provide streaming data to system 100, with the set-top box transmitting data via an HDMI connection to input block 105. Still other embodiments use an RF connection to input block 105 to provide streaming data to system 100.
[0024] System 100 can provide output signals to various output devices, including a display 165, a speaker 175, and other peripheral devices 185. In various examples of embodiments, the other peripheral devices 185 include one or more of a standalone DVR, a disc player, a stereo system, a lighting system, and other devices that provide functionality based on the output of system 100. In various embodiments, control signals are communicated between system 100 and the display 165, speaker 175, or other peripheral devices 185 using signaling such as AV.Link, CEC, or other communication protocols that enable device-to-device control with or without user intervention. Output devices can be communicatively coupled to system 100 via dedicated connections through the corresponding interfaces 160, 170, and 180. Alternatively, output devices can be connected to system 100 via communication interface 150 using communication channel 190. In an electronic device (e.g., a television), the display 165 and speaker 175 can be integrated into a single unit with the other components of system 100. In various embodiments, display interface 160 includes a display driver, such as a timing controller (TCon) chip.
[0025] Display 165 and speaker 175 may alternatively be separated from one or more other components, for example, if the RF portion of input 105 is part of a separate set-top box. In various embodiments where display 165 and speaker 175 are external components, output signals may be provided via dedicated output connections including, for example, HDMI ports, USB ports, or COMP outputs.
[0026] Figure 2 shows an example video encoder 200, such as a High Efficiency Video Coding (HEVC) encoder. Figure 2 can also illustrate an encoder in which improvements are made to the HEVC standard, or an encoder employing technologies similar to HEVC, such as the VVC (Versatile Video Coding) encoder developed by JVET (Joint Video Exploration Team).
[0027] In this application, the terms "reconstruction" and "decoding" are used interchangeably, as are the terms "encoded" and "coded," and the terms "image," "picture," and "frame" are used interchangeably. Typically, but not necessarily, the term "reconstruction" is used on the encoder side, while "decoding" is used on the decoder side.
[0028] Before being encoded, the video sequence may undergo pre-encoding processing (201), such as applying a color transform to the input color picture (e.g., converting from RGB 4:4:4 to YCbCr 4:2:0), or remapping the input picture components to obtain a signal distribution that is more resilient to compression (e.g., using histogram equalization of one of the color components). Metadata may be associated with the pre-processing and attached to the bitstream.
[0029] To encode a video sequence having one or more frames, each frame is partitioned (202) into, for example, one or more slices, where each slice may include one or more slice segments. In HEVC, slice segments are organized into coding units, prediction units, and transform units. The HEVC specification distinguishes between “blocks” and “units,” where a “block” addresses a specific area in a sample array (e.g., luma Y), and a “unit” comprises the co-located blocks of all encoded color components (Y, Cb, Cr, or monochrome), together with the syntax elements and prediction data (e.g., motion vectors) associated with the blocks.
[0030] For coding according to HEVC, a picture is partitioned into square coding tree blocks (CTBs) of configurable size (typically 64×64, 128×128, or 256×256 pixels), and a consecutive set of CTBs is grouped into a slice. A coding tree unit (CTU) contains the CTBs of the encoded color components. A CTB is the root of a quadtree partitioning into coding blocks (CBs); a coding block may be partitioned into one or more prediction blocks (PBs) and forms the root of a quadtree partitioning into transform blocks (TBs). A transform block larger than 4×4 is divided into 4×4 sub-blocks of quantized coefficients called coefficient groups (CGs). Corresponding to the coding block, prediction block, and transform block, a coding unit (CU) includes the prediction units (PUs) and the tree-structured set of transform units (TUs). A PU includes the prediction information for all color components, and a TU includes the residual coding syntax structure for each color component. The sizes of a CB, PB, and TB of the luma component apply to the corresponding CU, PU, and TU. In this application, the term "block" may be used to refer to, for example, any one of CTU, CU, PU, TU, CG, CB, PB, and TB. Furthermore, the term "block" may also refer to macroblocks and partitions as specified in H.264 / AVC or other video coding standards, and more generally, to arrays of data of various sizes.
[0031] In encoder 200, a frame is encoded by the encoder elements as described below. The frame to be encoded is processed in units of, for example, CUs. Each coding unit is encoded using either an intra-frame or an inter-frame mode. When a coding unit is encoded in intra-frame mode, intra-frame prediction (260) is performed. In inter-frame mode, motion estimation (275) and motion compensation (270) are performed. The merge list (regular merge, MMVD (merge with MVD), triangle, CIIP (combined intra / inter-frame prediction), IBC (intra-block copy)) is constructed by inheritance from neighboring CUs (spatial), co-located CUs (temporal), or the HMVP (history-based motion vector prediction) list, or by construction (pairwise and zero candidates). The encoder decides (205) which of the intra-frame or inter-frame modes to use for encoding the coding unit and indicates the intra / inter-frame decision by a prediction mode flag. The prediction residual is calculated by subtracting (210) the predicted block from the original image block.
[0032] The prediction residuals are then transformed (225) and quantized (230). The quantized transform coefficients, along with the motion vectors and other syntax elements, are entropy coded (245) to output a bitstream. As a non-limiting example, context-based adaptive binary arithmetic coding (CABAC) can be used to encode syntax elements into the bitstream.
[0033] The encoder can also skip the transform and apply quantization directly to the untransformed residual signal, for example, on a 4×4 TU basis. The encoder can also bypass both transform and quantization, i.e., the residual is coded directly without applying the transform or quantization process. In direct PCM coding, no prediction is applied, and the coding unit samples are coded directly into the bitstream.
[0034] The encoder decodes the encoded block to provide a reference for further prediction. The quantized transform coefficients are dequantized (240) and inverse transformed (250) to decode the prediction residual. The decoded prediction residual and the predicted block are combined (255) to reconstruct the image block. An in-loop filter (265) is applied to the reconstructed image, for example, to perform deblocking / SAO (sample adaptive offset) filtering to reduce coding artifacts. The filtered image is stored in the reference image buffer (280).
[0035] Figure 3 shows a block diagram of an example video decoder 300, such as an HEVC decoder. In decoder 300, the bitstream is decoded by the decoder elements as described below. Video decoder 300 generally performs a decoding pass that is the reverse of the encoding pass described for Figure 2; the encoder also performs video decoding as part of encoding video data. Figure 3 can also illustrate a decoder in which improvements are made to the HEVC standard, or a decoder employing technologies similar to HEVC, such as a VVC decoder.
[0036] Specifically, the input to the decoder includes a video bitstream, which can be generated by the video encoder 200. The bitstream is first entropy decoded (330) to obtain transform coefficients, motion vectors, picture partitioning information, and other coded information. If CABAC is used for entropy coding, the context models are initialized in the same manner as the encoder's context models, and syntax elements are decoded from the bitstream based on the context models. The merge list (regular merge, MMVD, triangle, CIIP, IBC) is constructed (325) by inheritance from neighboring CUs (spatial), co-located CUs (temporal), or the HMVP list, or by construction (pairwise and zero candidates).
[0037] The picture partitioning information indicates how the picture is partitioned (e.g., the size of the CTUs) and how each CTU is split into CUs (and, where applicable, into PUs). The decoder can therefore partition (335) the picture into, for example, CTUs, and each CTU into CUs, according to the decoded picture partitioning information. The transform coefficients are dequantized (340) and inverse transformed (350) to decode the prediction residuals.
[0038] The decoded prediction residual and the predicted block are combined (355) to reconstruct an image block. The predicted block can be obtained (370) based on intra-frame prediction (360) or motion-compensated prediction (i.e., inter-frame prediction) (375). An in-loop filter (365) is applied to the reconstructed image. The filtered image is stored at a reference frame buffer (380).
[0039] The decoded image can undergo further post-decoding processing (385), such as inverse color transformation (e.g., conversion from YCbCr4:2:0 to RGB 4:4:4) or inverse remapping, which is the opposite of the remapping process performed in pre-encoding processing (201). Post-decoding processing can use metadata derived in pre-encoding processing and signaled in the bitstream.
[0040] Switchable interpolation filter (IF)
[0041] The principle of switchable interpolation filters (IFs) is to improve motion-compensated prediction by selecting the IF used for the prediction of each block. The IFs may differ in their smoothing characteristics, typically as shown in Figure 4, which gives an example of the interpolation filters proposed in JVET-O0057 (see A. Henkel et al., “CE4: Switchable interpolation filter,” document JVET-O0057, 15th meeting, Gothenburg, Sweden, July 3-12, 2019).
[0042] For example, in VVC Draft 6 (see “Text of Versatile Video Coding (Draft 6)” in document JVET-O2001, presented at the 15th meeting in Gothenburg, Sweden, July 3-12, 2019) and JVET-O0057, the IF index can be selected per coding unit (CU) and can be derived from the coded “imv” index, which indicates the resolution of the coded motion vector difference (MVD). Specifically, if the MVD resolution is HALF-PEL, then IFindex = 1 is selected; otherwise, IFindex = 0 is selected. When the coding unit is coded in merge mode, the IF index is not explicitly coded but is derived from the merge candidate(s).
[0043] In VVC Draft 6 and JVET-O0057, IFindex can select one of two filters (IF-0 or IF-1), but IF-1 can only be used for HALF-PEL motion vector values. Consequently, if IFindex is not equal to zero and the horizontal (or vertical) MV component is not HALF-PEL, IF-0 is used, as shown in Figure 5, which depicts how the motion compensation filter is derived from the values of "IFindex" and "MV".
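The derivation rule described above can be sketched as follows. This is a minimal illustration, not the draft text: it assumes motion vector components are stored in 1/16-sample units, so that a HALF-PEL component has a fractional part of 8, and the names `is_half_pel` and `select_filter` are hypothetical.

```python
# Assumption of this sketch: MV components are integers in 1/16-sample units.
MV_FRAC_BITS = 4
MV_FRAC_MASK = (1 << MV_FRAC_BITS) - 1   # 15
HALF_PEL_FRAC = 1 << (MV_FRAC_BITS - 1)  # 8

def is_half_pel(mv_component: int) -> bool:
    """True when the fractional part of the component is exactly one half sample."""
    return (mv_component & MV_FRAC_MASK) == HALF_PEL_FRAC

def select_filter(if_index: int, mv_component: int) -> str:
    """Pick the filter for one MV component (horizontal or vertical).

    IF-1 is only used when IFindex is non-zero and the component is HALF-PEL;
    in every other case the default filter IF-0 is used.
    """
    if if_index != 0 and is_half_pel(mv_component):
        return "IF-1"
    return "IF-0"

# Horizontal component is half-pel, vertical component is quarter-pel.
mv = (24, 4)
filters = tuple(select_filter(1, c) for c in mv)
print(filters)
```

The filter is chosen per component, so a single block can use IF-1 horizontally and IF-0 vertically even though it carries one IFindex value.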
[0044] For simplicity, we will consider N = 2 IF filters (IF-0 and IF-1) in the following text. However, this principle can be applied to the case where N > 2 (IF-0, ..., IF-(N-1)). In this case, we can distinguish between IFindex = 0 and IFindex ≠ 0, which correspond to IF-0 and IF-1 below. Furthermore, filter IF-0 (IF-default) is considered the "default filter". Note that this principle can also be applied when interpolation filters are applied to motion vectors that are not HALF-PEL.
[0045] Bidirectional prediction with CU-level weighting (also known as GBI, BPWA, or BCW)
[0046] In conventional bidirectional prediction, the prediction sample biPred[x] is built by averaging two motion-compensated unidirectional prediction samples ref_i[x + mv_i] (i = 0, 1) with equal weights (w0 = 1, w1 = 1), as follows:
[0047] biPred[x] = (w0 * ref0[x + mv0] + w1 * ref1[x + mv1] + 1) / 2
[0048] In the case of generalized bidirectional prediction (also known as GBI, BPWA, or BCW), the weights (w0, w1) are not necessarily equal and are signaled in the bitstream (or inherited in merge mode) by an index (GBiIdx) into a lookup table.
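As an illustration of the averaging formula, the sketch below blends two lists of already motion-compensated samples. With w0 = w1 = 1 it reproduces the formula of paragraph [0047] exactly. For unequal weights it normalizes by w0 + w1 with a matching rounding offset, which is an assumption of this sketch (the text does not give the normalization or the lookup table contents); the function name `bi_pred` is hypothetical.

```python
from typing import List

def bi_pred(ref0: List[int], ref1: List[int], w0: int = 1, w1: int = 1) -> List[int]:
    """Weighted bidirectional prediction of motion-compensated samples.

    ref0 and ref1 hold the samples ref0[x + mv0] and ref1[x + mv1].
    Assumption: the weights sum to a positive value and the result is
    normalized by that sum with round-half-up behavior.
    """
    total = w0 + w1
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    offset = total // 2
    return [(w0 * a + w1 * b + offset) // total for a, b in zip(ref0, ref1)]

equal = bi_pred([100, 50], [101, 60])         # conventional averaging
weighted = bi_pred([100], [200], w0=3, w1=5)  # unequal weights
print(equal, weighted)
```

Two blocks with identical motion vectors and reference pictures but different weights yield different predictions, which is why the weight index matters when deciding whether two candidates are redundant.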
[0049] In several coding modes whose candidates are inherited from neighbors (spatial or temporal) or constructed (e.g., merge modes such as regular merge, MMVD, triangle, CIIP, IBC), some pruning is performed to limit redundancy in the constructed list. This pruning is performed by comparing sets of motion information to identify the sets with different motion.
[0050] However, with the newly adopted switchable interpolation filter, "IFindex" becomes part of the motion information (because it can modify the resulting prediction), but it is not considered in the current pruning. Furthermore, GBiIdx can also be part of the motion information and can be considered in the pruning.
[0051] Motion Information
[0052] In VVC Draft 6, for the purpose of inheritance, the set of motion information stored at a 4x4 spatial granularity includes:
[0053] Table 1: Motion information with associated sizes
[0054]
[0055]
[0056] In all coding modes except for HMVP updates and loads, the GBi index is stored only at the CU level, not in the motion information at the 4x4 level. In VTM-6.0 (VVC test model 6.0), the GBiIdx stored in this motion information is only used by the HMVP list (updates and loads); other modes use the GBiIdx stored at the CU level.
[0057] HMVP uses a FIFO list of previously used predictors, for which it needs the complete motion information (including GBiIdx). Note that GBiIdx, like the motion information (MI), is CU-level information, but the MI is stored on a 4x4 basis (i.e., a 16x16 CU holds one GBiIdx and 16 MIs), and the GBiIdx in the MI is only used by HMVP, which stores MI rather than CU information. Each time a CU is inter-coded, its motion information is added to the HMVP list. Each time a merge list is built, some candidates are selected from the HMVP list (from the end to the beginning, i.e., from oldest to newest). In the HMVP list, an MI is decoupled from its corresponding CU, so the MI needs to contain GBiIdx. In all other inter-coded modes, the MI is always linked to the corresponding CU, which holds GBiIdx (so GBiIdx is not needed in the MI).
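The reason an HMVP entry must carry its own GBiIdx can be sketched with a small FIFO. All class names, the simplified single-list motion fields, and the list size of 5 are assumptions made for illustration; the order in which candidates are later read out for the merge list is deliberately not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class MotionInfo:
    mv: Tuple[int, int]  # simplified: one motion vector
    ref_idx: int         # simplified: one reference index

@dataclass(frozen=True)
class CodingUnit:
    motion: MotionInfo
    gbi_idx: int         # weight index held at CU level

@dataclass(frozen=True)
class HmvpEntry:
    motion: MotionInfo
    gbi_idx: int         # copied in, because the entry outlives its link to the CU

class HmvpList:
    def __init__(self, max_size: int = 5):  # max_size is a hypothetical parameter
        # A bounded deque drops the entry at the opposite end once it is full.
        self.entries = deque(maxlen=max_size)

    def add_inter_cu(self, cu: CodingUnit) -> None:
        """Called each time a CU is inter-coded."""
        self.entries.append(HmvpEntry(cu.motion, cu.gbi_idx))

hmvp = HmvpList(max_size=2)
for gbi in (1, 2, 3):
    hmvp.add_inter_cu(CodingUnit(MotionInfo((gbi * 16, 0), 0), gbi))
print([entry.gbi_idx for entry in hmvp.entries])
```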
[0058] Comparison of motion information
[0059] In VVC, motion information comparison is performed in two different ways: one when the motion information is fully available, and the other when only some of the motion information is accessible.
[0060] The comparison of the complete motion information of spatial neighbors (stored in memory at the 4x4 level) is currently used for: (i) all merge modes (regular merge, MMVD, triangle, IBC, CIIP), (ii) HMVP list updates, and (iii) motion compensation of sub-block CUs.
[0061] The purpose of this comparison is to check whether two MIs have the same motion vectors and associated reference frames, as described in the VVC draft text. This requires each MI to be an inter-frame MI (isInter) from the same slice (sliceIdx) in the same mode (isIBC). Below is an excerpt from the draft that describes the derivation of spatial candidates for the regular merge list. It focuses on the top neighbor (B1) after the left neighbor (A1) has been derived:
[0062] Excerpt from VVC Draft 6: 8.5.2.3 The process for deriving spatial merging candidates.
[0063] (Comparisons of complete motion information are indicated by underlining)
[0064] The variables availableFlagB1, refIdxLXB1, predFlagLXB1, and mvLXB1 are derived as follows:
[0065] – If one or more of the following conditions are true, then availableFlagB1 is set to 0, both components of mvLXB1 are set to 0, refIdxLXB1 is set to -1, and predFlagLXB1 is set to 0, where X is 0 or 1, and bcwIdxB1 is set to 0:
[0066] –availableB1 is false.
[0067] – availableA1 is equal to TRUE, and the luma locations (xNbA1, yNbA1) and (xNbB1, yNbB1) have the same motion vectors and the same reference indices.
[0068] – Otherwise, availableFlagB1 is set equal to 1, and the following...
[0069] The comparison of complete motion information is shown in Table 2.
[0070] Table 2: Parameters compared during the comparison of complete motion information in VVC Draft 6
[0071]
| Parameter name | Parameter size | Compared |
| --- | --- | --- |
| isInter | 1b | yes |
| isIBC | 1b | yes |
| sliceIdx | 16b | yes |
| interDir | 8b | yes |
| IFindex | 1b | no |
| GBiIdx | 8b | no |
| Mv[2] | 32 * 2 (directions) * 2 (lists) = 128b | yes |
| refIdx[2] | 16 * 2 (lists) = 32b | yes |
[0072] The comparison of partial motion information, which is used only when adding HMVP candidates to the regular merge list, compares only a portion of the motion information, as shown in Table 3.
[0073] Table 3: Parameters compared during the comparison of partial motion information in VVC Draft 6
[0074]
| Parameter name | Parameter size | Compared |
| --- | --- | --- |
| interDir | 8b | yes |
| IFindex | 1b | no |
| GBiIdx | 8b | no |
| Mv[2] | 32 * 2 (directions) * 2 (lists) = 128b | yes |
| refIdx[2] | 16 * 2 (lists) = 32b | yes |
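The two VVC Draft 6 comparisons summarized in Tables 2 and 3 can be sketched as field-by-field checks. The structure layout, the snake_case field names, and the function names are assumptions for illustration; only the set of compared parameters is taken from the tables.

```python
from dataclasses import dataclass, replace
from typing import Tuple

MotionVector = Tuple[int, int]

@dataclass(frozen=True)
class MotionInfo:
    is_inter: bool
    is_ibc: bool
    slice_idx: int
    inter_dir: int
    if_index: int
    gbi_idx: int
    mv: Tuple[MotionVector, MotionVector]  # one vector per reference list
    ref_idx: Tuple[int, int]               # one index per reference list

def same_partial_draft6(a: MotionInfo, b: MotionInfo) -> bool:
    """Table 3: interDir, Mv and refIdx only; IFindex and GBiIdx are ignored."""
    return a.inter_dir == b.inter_dir and a.mv == b.mv and a.ref_idx == b.ref_idx

def same_full_draft6(a: MotionInfo, b: MotionInfo) -> bool:
    """Table 2: adds isInter, isIBC and sliceIdx; IFindex and GBiIdx still ignored."""
    return (a.is_inter == b.is_inter
            and a.is_ibc == b.is_ibc
            and a.slice_idx == b.slice_idx
            and same_partial_draft6(a, b))

base = MotionInfo(True, False, 0, 1, 0, 0, ((8, 0), (0, 0)), (0, -1))
other_filter = replace(base, if_index=1)  # differs only in the interpolation filter
other_slice = replace(base, slice_idx=1)  # differs only in the slice

print(same_full_draft6(base, other_filter))   # treated as identical motion
print(same_full_draft6(base, other_slice))
print(same_partial_draft6(base, other_slice))
```

The first check shows the gap at issue: two candidates that would produce different predictions because of their filter index are still reported as the same motion.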
[0075] This embodiment proposes extending the comparison of motion information, for example, to check whether the same prediction will be generated (not just the motion vector and reference frame). To this end, some parameters can be added to the motion information comparison. This may make it more robust and potentially simpler in terms of implementation, especially hardware implementation.
[0076] First Embodiment
[0077] In this embodiment, a method is proposed to compare all available parameters so that only a simple memory comparison is needed, i.e., a single comparison of the complete structure, rather than a large number of structure accesses and comparisons.
[0078] "IFindex" and "GBiIdx" are motion information that can modify the resulting prediction even if the motion vectors and the reference frames are the same. Therefore, these parameters can be added to the comparison.
[0079] In this case, Table 2 and Table 3 become Table 4 and Table 5, respectively.
[0080] Table 4: All parameters compared during the comparison of complete motion information
[0081]
Parameter name | Parameter size | Compare
isInter | 1b | yes
isIBC | 1b | yes
sliceIdx | 16b | yes
interDir | 8b | yes
IFindex | 1b | yes
GBiIdx | 8b | yes
Mv[2] | 32 * 2 (directions) * 2 (list) = 128b | yes
refIdx[2] | 16 * 2 (list) = 32b | yes
[0082] Table 5: All parameters compared during the comparison of partial motion information
[0083]
Parameter name | Parameter size | Compare
interDir | 8b | yes
IFindex | 1b | yes
GBiIdx | 8b | yes
Mv[2] | 32 * 2 (directions) * 2 (list) = 128b | yes
refIdx[2] | 16 * 2 (list) = 32b | yes
[0084] Therefore, the draft text becomes:
[0085] Excerpt from the VVC draft text: 8.5.2.3 The process for deriving spatial merging candidates.
[0086] The variables availableFlagB1, refIdxLXB1, predFlagLXB1, and mvLXB1 are derived as follows:
[0087] – If one or more of the following conditions are true, then availableFlagB1 is set to 0, both components of mvLXB1 are set to 0, refIdxLXB1 is set to -1, and predFlagLXB1 is set to 0, where X is 0 or 1, and bcwIdxB1 is set to 0:
[0088] –availableB1 is false.
[0089] – availableA1 is equal to true, and the luma locations (xNbA1, yNbA1) and (xNbB1, yNbB1) have the same motion vectors, the same reference indices, the same half-sample interpolation filter index, and the same bidirectional prediction weight index.
[0090] Otherwise, availableFlagB1 is set equal to 1, and the following...
[0091] According to the modified VVC draft described above, instead of just comparing motion vectors and reference frames, all motion information is compared, requiring inter-frame MIs (isInter) from the same slice (sliceIdx) within the same mode (isIBC). The MI from the top candidate B1 is compared with the left candidate A1. If the MI from A1 equals the MI from B1, then B1 is set to be unusable for merge list construction; that is, B1 is considered redundant relative to A1 and is pruned from the merge list.
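The pruning of B1 against A1 under the extended comparison can be sketched as follows. The `MotionInfo` class and the `add_a1_b1` function are hypothetical names; the dataclass-generated equality compares every field, which stands in for the comparison of Table 4.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MotionInfo:
    # Hypothetical record; the generated __eq__ compares all fields (as in Table 4).
    is_inter: bool
    is_ibc: bool
    slice_idx: int
    inter_dir: int
    if_index: int
    gbi_idx: int
    mv: Tuple[Tuple[int, int], Tuple[int, int]]
    ref_idx: Tuple[int, int]


def add_a1_b1(mi_a1: Optional[MotionInfo],
              mi_b1: Optional[MotionInfo]) -> List[MotionInfo]:
    """Add A1, then add B1 unless it is unavailable or identical to A1."""
    merge_list: List[MotionInfo] = []
    if mi_a1 is not None:
        merge_list.append(mi_a1)
    b1_redundant = mi_a1 is not None and mi_a1 == mi_b1
    if mi_b1 is not None and not b1_redundant:
        merge_list.append(mi_b1)
    return merge_list


base = dict(is_inter=True, is_ibc=False, slice_idx=0, inter_dir=1,
            gbi_idx=0, mv=((4, -2), (0, 0)), ref_idx=(0, -1))
mi_a1 = MotionInfo(if_index=0, **base)
mi_b1_same = MotionInfo(if_index=0, **base)
mi_b1_other_filter = MotionInfo(if_index=1, **base)
```

With these inputs, `add_a1_b1(mi_a1, mi_b1_same)` keeps only A1, while `add_a1_b1(mi_a1, mi_b1_other_filter)` keeps both candidates because they would produce different predictions.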
[0092] In a variant, GBiIdx can be stored in the motion information at the 4x4 level instead of the CU level for all inter-frame coding / decoding modes (not just for the HMVP list) to improve the performance of this comparison. As mentioned above, in VVC Draft 6, GBiIdx is stored at the CU level instead of the MI (4x4) level. By storing it at the MI level, it becomes part of the information that can be retrieved by spatial prediction in merge mode (because merge mode retrieves the MI).
[0093] In another variant, this embodiment can be extended to any new parameter that must be added to the motion information because it modifies the prediction resulting from that motion information and must be inherited by subsequent CUs.
[0094] Second Embodiment
[0095] In this embodiment, it is proposed to add only IFindex to the comparison because GBiIdx is not used everywhere.
[0096] Then, Tables 2 and 3 become Tables 6 and 7, respectively.
[0097] Table 6: Parameters compared using complete motion information with added IFindex
[0098]
[0099] Table 7: Parameters for comparison using partial motion information with added IFindex
[0100]
[0101] Therefore, the draft text becomes:
[0102] Excerpt from the VVC draft text: 8.5.2.3 The process for deriving spatial merging candidates.
[0103] The variables availableFlagB1, refIdxLXB1, predFlagLXB1, and mvLXB1 are derived as follows:
[0104] – If one or more of the following conditions are true, then availableFlagB1 is set to 0, both components of mvLXB1 are set to 0, refIdxLXB1 is set to -1, and predFlagLXB1 is set to 0, where X is 0 or 1, and bcwIdxB1 is set to 0:
[0105] –availableB1 is false.
[0106] – availableA1 is equal to true, and the luma locations (xNbA1, yNbA1) and (xNbB1, yNbB1) have the same motion vectors, the same reference indices, and the same half-sample interpolation filter index.
[0107] Otherwise, availableFlagB1 is set equal to 1, and the following...
[0108] Based on the revised text above, the IFindex is compared in addition to the motion vectors and reference frames (which require inter-frame MIs (isInter) from the same slice (sliceIdx) within the same mode (isIBC)). The MI of the top candidate B1 is compared with the MI of the left candidate A1. If the MI of A1 is the same as the MI of B1, then B1 is set as unavailable for merge list construction.
[0109] Since the GBiIdx in the motion information is used only by the HMVP list (updating and loading), these indices are set to default values in all other cases (inter-frame codec modes) and therefore do not affect the result of a full comparison. A full comparison affects only the HMVP list, and that impact is very limited.
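The reasoning above can be illustrated with a small sketch: when GBiIdx holds its default value in both records, a comparison that ignores GBiIdx and a full comparison give the same answer, and they can only disagree for records whose GBiIdx was actually set. The class, the function names, and the default value `GBI_DEFAULT = 0` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

GBI_DEFAULT = 0  # hypothetical default value, for illustration only


@dataclass(frozen=True)
class MotionInfo:
    inter_dir: int
    if_index: int
    mv: Tuple[int, int, int, int]
    ref_idx: Tuple[int, int]
    gbi_idx: int = GBI_DEFAULT


def same_with_ifindex(a: MotionInfo, b: MotionInfo) -> bool:
    """Comparison that adds IFindex but leaves GBiIdx out."""
    return ((a.inter_dir, a.if_index, a.mv, a.ref_idx)
            == (b.inter_dir, b.if_index, b.mv, b.ref_idx))


def same_full(a: MotionInfo, b: MotionInfo) -> bool:
    """Comparison over every stored field, including GBiIdx."""
    return a == b


p = MotionInfo(inter_dir=3, if_index=0, mv=(4, -2, 1, 1), ref_idx=(0, 0))
q = MotionInfo(inter_dir=3, if_index=1, mv=(4, -2, 1, 1), ref_idx=(0, 0))
# A record whose GBiIdx was set, as could happen for an HMVP entry.
h = MotionInfo(inter_dir=3, if_index=0, mv=(4, -2, 1, 1), ref_idx=(0, 0), gbi_idx=2)
```

Here `p` and `q` are judged different by both functions, whereas `p` and `h` are judged identical by `same_with_ifindex` and different by `same_full`.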
[0110] In a variant, a single full comparison can be used to compare complete motion information, even though the HMVP list update is slightly affected.
[0111] In another variant, it is possible to store and load GBiIdx from the HMVP list in a manner other than through motion information, thereby completely removing GBiIdx from the motion information and thus enabling full comparison.
[0112] Third Embodiment
[0113] In this embodiment, all combinations of parameter comparisons are proposed to be considered for each process, as shown in Table 8. When using motion information comparisons (such as when generating an HMVP list), it is possible to use only the full motion information comparison method.
[0114] Table 8: Comparison of parameters added in each comparison
[0115]
[0116] In VTM-6.0, the merge list (motion candidate list) is constructed by including the following types of predictors (also known as candidates):
[0117] 1) Spatial MVP (Motion Vector Predictor) from spatial neighbor CUs;
[0118] 2) Temporal MVP from co-located CUs;
[0119] 3) History-based MVP from a FIFO table;
[0120] 4) Pairwise average MVP; and
[0121] 5) Zero MVP.
[0122] For each CU encoded and decoded in merge mode, the index of the selected predictor is encoded. The generation process for each category of merge candidates is shown in Figure 7 and described below.
[0123] Spatial candidate derivation
[0124] The derivation of spatial merge candidates in VVC Draft 6 is the same as in HEVC. A maximum of four merge candidates are selected from among the candidates located at the positions shown in Figure 6. The order of derivation is A1, B1, B0, A0, and B2. After the candidate at position A1 is added (710, 715), the addition of the remaining candidates (720, 727, 730, 737, 740, 747, 755, 765) is subject to redundancy checks (725, 735, 745, 760), which ensure that candidates with the same motion information are excluded from the list, thereby improving encoding and decoding efficiency. To reduce computational complexity, the redundancy checks do not consider all possible candidate pairs. Instead, only some pairs (725, 735, 745, 760) are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. For example, in step 725, the motion information of B1 (MI_B1) is compared with the motion information of A1 (MI_A1), and the candidate at position B1 is added to the merge list only if it differs from the candidate at position A1. Position B2 (755, 760, 765) is considered when there are fewer than four spatial candidates (750) in the list.
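The spatial derivation described above can be sketched as a loop over the positions with a limited set of redundancy checks. The text only spells out the B1 versus A1 check; the remaining pairs in `CHECK_AGAINST` are an assumption based on the HEVC design that the text says is reused. Motion information is represented by any value supporting `==`, which stands in for whichever comparison method is chosen.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# Order of derivation given in the text.
ORDER = ["A1", "B1", "B0", "A0", "B2"]

# ASSUMPTION: checked pairs as in HEVC; only B1 vs. A1 is stated explicitly.
CHECK_AGAINST = {"A1": [], "B1": ["A1"], "B0": ["B1"],
                 "A0": ["A1"], "B2": ["A1", "B1"]}

MAX_SPATIAL = 4


def spatial_candidates(
        neighbours: Dict[str, Optional[Hashable]]) -> List[Tuple[str, Hashable]]:
    """Return (position, motion info) pairs; None marks an unavailable neighbor."""
    out: List[Tuple[str, Hashable]] = []
    for pos in ORDER:
        mi = neighbours.get(pos)
        if mi is None:
            continue  # neighbor not available
        if pos == "B2" and len(out) >= MAX_SPATIAL:
            continue  # B2 is considered only when fewer than four were found
        redundant = any(neighbours.get(ref) is not None and neighbours[ref] == mi
                        for ref in CHECK_AGAINST[pos])
        if not redundant:
            out.append((pos, mi))
    return out


example = {"A1": "m0", "B1": "m0", "B0": "m1", "A0": "m0", "B2": "m2"}
```

For `example`, B1 and A0 are pruned because they match A1, so the result contains the candidates at A1, B0, and B2.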
[0125] Temporal candidate derivation ("TMVP C0 / C1")
[0126] Only one temporal candidate is added (770) to the merge list. Specifically, in deriving this temporal merge candidate, a scaled motion vector is derived based on the co-located CU belonging to the co-located reference frame. The position of the temporal candidate is selected between candidates C0 and C1, as shown in Figure 6. If the CU at position C0 is unavailable, is intra-frame encoded, or is outside the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
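The choice between C0 and C1 reduces to a three-way condition, sketched below. The function name and its boolean parameters are hypothetical; motion vector scaling is not shown.

```python
def select_temporal_position(c0_available: bool,
                             c0_is_intra: bool,
                             c0_in_current_ctu_row: bool) -> str:
    """Pick the co-located position used for the temporal merge candidate."""
    if (not c0_available) or c0_is_intra or (not c0_in_current_ctu_row):
        return "C1"  # fall back whenever C0 cannot be used
    return "C0"
```

For instance, `select_temporal_position(True, False, True)` selects C0, while any failing condition selects C1.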
[0127] History-based merge candidate derivation
[0128] Following the spatial MVP and TMVP, history-based MVP (HMVP) merge candidates are added (780) to the merge list. In HMVP, the motion information of previously encoded / decoded blocks is stored in a table and used as MVPs for the current CU. A table with multiple HMVP candidates is maintained during the encoding / decoding process. The table is reset (cleared) when a new CTU row is encountered. Whenever there is a non-sub-block inter-frame encoded / decoded CU, the associated motion information is added as a new HMVP candidate to the last entry of the table. If the table already contains identical motion information, that redundant entry is removed; otherwise, when the table is full, the first (oldest) entry in the table is removed.
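The table maintenance described above can be sketched as a constrained FIFO. The table size of 5 is an illustrative value, and the function names are hypothetical. Entries are compared with `==`, which stands in for the chosen motion information comparison.

```python
from typing import Hashable, List

MAX_HMVP = 5  # illustrative table size, not taken from the text


def hmvp_update(table: List[Hashable], new_mi: Hashable,
                max_size: int = MAX_HMVP) -> None:
    """Append new_mi as the most recent entry, keeping the table duplicate-free."""
    if new_mi in table:
        table.remove(new_mi)   # drop the redundant entry; later entries move forward
    elif len(table) == max_size:
        table.pop(0)           # table full: drop the oldest entry
    table.append(new_mi)


def hmvp_reset(table: List[Hashable]) -> None:
    """Clear the table, e.g. at the start of a new CTU row."""
    table.clear()


table: List[Hashable] = []
for mi in [1, 2, 3, 4, 5]:
    hmvp_update(table, mi)
hmvp_update(table, 3)   # table becomes [1, 2, 4, 5, 3]
hmvp_update(table, 6)   # table becomes [2, 4, 5, 3, 6]
```

The result of the redundancy test in `hmvp_update` depends directly on which parameters the comparison includes, which is why the text notes that the HMVP list is the one place affected by a full comparison.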
[0129] HMVP candidates are used in the candidate list construction process. The latest HMVP candidates in the table are checked in order and inserted into the candidate list after the TMVP candidate. Redundancy checks are applied to the HMVP candidates against the spatial merge candidates. Each of the last two HMVP candidates (the latest candidates) in the FIFO is compared with spatial candidates A1 and B1 and is added to the merge list only if it does not have the same motion information. Once the total number of available merge candidates reaches the maximum allowed number of merge candidates minus one, the construction of merge candidates from HMVP is terminated.
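A minimal sketch of this insertion step follows. The function name and parameters are hypothetical; the table is assumed to be ordered oldest first, and `==` again stands in for the chosen motion information comparison.

```python
from typing import Hashable, List, Optional


def add_hmvp_candidates(merge_list: List[Hashable],
                        hmvp_table: List[Hashable],
                        mi_a1: Optional[Hashable],
                        mi_b1: Optional[Hashable],
                        max_merge_cand: int) -> List[Hashable]:
    """Append HMVP candidates, most recent first, to a copy of merge_list."""
    out = list(merge_list)
    for n, cand in enumerate(reversed(hmvp_table)):
        if len(out) >= max_merge_cand - 1:
            break  # leave room for the remaining candidate types
        # Only the two most recent entries are checked against A1 and B1.
        if n < 2 and (cand == mi_a1 or cand == mi_b1):
            continue
        out.append(cand)
    return out


result = add_hmvp_candidates(["a", "b"], ["x", "y", "a", "z"], "a", "b", 6)
# result == ["a", "b", "z", "y", "x"]: the entry "a" is pruned against A1
```

Limiting the check to the two latest entries keeps the number of comparisons fixed, at the cost of possibly admitting an older duplicate.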
[0130] Pairwise average merge candidate derivation
[0131] The pairwise average candidate is generated by averaging the first candidate pair in the existing merge candidate list. When the merge list is not full after the pairwise average merge candidate is added (790), the zero MVP is inserted to the end (795) until the maximum number of merge candidates is reached.
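The last two steps can be sketched for the simplified case of one motion vector per candidate. The function name, the single-vector representation, and the floor-division rounding are assumptions for illustration; the actual rounding rule and per-list handling are not given in the text.

```python
from typing import List, Tuple

Mv = Tuple[int, int]


def finish_merge_list(merge_list: List[Mv], max_merge_cand: int) -> List[Mv]:
    """Add one pairwise average candidate if possible, then pad with zero MVPs."""
    out = list(merge_list)
    if 2 <= len(out) < max_merge_cand:
        (x0, y0), (x1, y1) = out[0], out[1]
        # ASSUMPTION: floor division as the rounding rule.
        out.append(((x0 + x1) // 2, (y0 + y1) // 2))
    while len(out) < max_merge_cand:
        out.append((0, 0))  # zero MVP
    return out


filled = finish_merge_list([(4, 2), (2, 6)], 5)
# filled == [(4, 2), (2, 6), (3, 4), (0, 0), (0, 0)]
```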
[0132] Different motion comparison methods as described above can be used during redundancy checks (e.g., 725, 735, 745, 760). Motion comparison methods can also be applied when different types of motion candidates are used to construct the merge list, or when motion candidates are added to the merge list in a different order.
[0133] On the decoder side, the same process is used to construct the merge list. Based on the index of the selected predictor decoded from the bitstream, the corresponding motion vector predictor in the merge list is selected as the motion vector predictor for the current CU.
[0134] This document describes various methods, and each method includes one or more steps or actions for implementing the method. Unless the correct operation of the method requires a specific order of steps or actions, the order and / or use of specific steps and / or actions can be modified or combined. Furthermore, terms such as "first," "second," etc., can be used in various embodiments to modify elements, components, steps, operations, etc., e.g., "first decoding" and "second decoding." Unless specifically required, the use of these terms does not imply an ordering of the modified operations. Therefore, in this example, the first decoding does not need to be performed before the second decoding and can occur, for example, before, during, or within a time period overlapping with the second decoding.
[0135] The various methods and other aspects described in this application can be used to modify modules of the video encoder 200 and decoder 300 shown in Figure 2 and Figure 3, for example the modules for deriving encoding / decoding parameters (203, 325) and the motion compensation modules (270, 375). Furthermore, these aspects are not limited to VVC or HEVC and can be applied, for example, to other standards and recommendations, and to any extensions of such standards and recommendations. Unless otherwise indicated or technically excluded, the aspects described in this application can be used individually or in combination.
[0136] Various numerical values are used in this application. These specific values are for illustrative purposes, and the aspects described are not limited to these specific values.
[0137] One embodiment provides a computer program including instructions that, when executed by one or more processors, cause the one or more processors to perform an encoding or decoding method according to any of the above embodiments. One or more embodiments also provide a computer-readable storage medium storing instructions for encoding or decoding video data according to the above methods. One or more embodiments also provide a computer-readable storage medium storing a bitstream generated according to the above methods. One or more embodiments also provide a method and apparatus for transmitting or receiving a bitstream generated according to the above methods.
[0138] Various implementations involve decoding. As used herein, "decoding" can encompass, for example, all or part of the processing performed on a received encoded sequence to produce a final output suitable for display. In various embodiments, such processing includes one or more of the processing typically performed by a decoder, such as entropy decoding, inverse quantization, inverse transform, and differential decoding. It will be clear, and is considered fully understood by those skilled in the art, that the phrase "decoding processing" is intended to specifically refer to a subset of operations or to refer generally to broader decoding processing, based on the context of the specific description.
[0139] Various implementations involve encoding. In a manner similar to the discussion above regarding "decoding," the "encoding" used in this application may include, for example, all or part of the processing performed on the input video sequence to produce an encoded bitstream.
[0140] Note that the syntax elements used herein are descriptive terms. Therefore, the use of other syntax element names is not excluded.
[0141] The implementations and aspects described herein can be implemented, for example, as methods or processes, apparatuses, software programs, data streams, or signals. Even if discussed only in the context of a single form of implementation (e.g., discussed only as a method), the features under discussion can also be implemented in other forms (e.g., apparatuses or programs). Apparatuses can be implemented, for example, with appropriate hardware, software, and firmware. The methods can be implemented, for example, in an apparatus or processor, which generally refers to a processing device, including, for example, a computer, microprocessor, integrated circuit, or programmable logic device. Processors also include communication devices, such as computers, mobile phones, portable / personal digital assistants (“PDAs”), and other devices that facilitate information communication between end users.
[0142] References to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, and other variations thereof, mean that a particular feature, structure, characteristic, etc., described in connection with the embodiment is included in at least one embodiment. Therefore, the phrases “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation” appearing throughout this application, and any other variations thereof, do not necessarily all refer to the same embodiment.
[0143] Furthermore, this application may refer to "determining" various pieces of information. Determining information may include, for example, one or more of the following: estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
[0144] Furthermore, this application may refer to "accessing" various pieces of information. Accessing information may include, for example, one or more of the following: receiving information, retrieving information (e.g., from memory), storing information, moving information, copying information, calculating information, determining information, predicting information, or estimating information.
[0145] Furthermore, this application may refer to "receiving" various pieces of information. Like "accessing," "receiving" is intended as a broad term. Receiving information may include, for example, one or more of accessing the information or retrieving the information (e.g., from memory). Moreover, "receiving" is typically involved, in one way or another, during operations such as storing information, processing information, sending information, moving information, copying information, erasing information, calculating information, determining information, predicting information, or estimating information.
[0146] It should be understood that, for example, in the cases of “A / B,” “A and / or B,” and “at least one of A and B,” the use of any of “ / ,” “and / or,” and “at least one of…” is intended to include selecting only the first listed option (A), or only the second listed option (B), or both options (A and B). As another example, in the cases of “A, B, and / or C” and “at least one of A, B, and C,” this wording is intended to include selecting only the first listed option (A), or only the second listed option (B), or only the third listed option (C), or only the first and second listed options (A and B), or only the first and third listed options (A and C), or only the second and third listed options (B and C), or all three options (A, B, and C). As will be apparent to those skilled in this and related fields, this can be extended to as many items as are listed.
[0147] It will be apparent to those skilled in the art that various implementations can generate a variety of signals that are formatted to carry information that can, for example, be stored or transmitted. This information may include, for example, instructions for performing a method, or data generated by one of the implementations. For example, the signal may be formatted to carry a bitstream of the embodiments. Such a signal may be formatted as, for example, an electromagnetic wave (e.g., using a radio frequency portion of the spectrum) or a baseband signal. Formatting may include, for example, encoding the data stream and modulating a carrier wave with the encoded data stream. The information carried by the signal may be, for example, analog or digital information. As is known, the signal can be transmitted via a variety of different wired or wireless links. The signal may be stored on a processor-readable medium.
Claims
1. A method for video encoding or decoding, comprising: accessing a motion candidate list of a block; accessing motion information of a neighboring block of the block, wherein the motion information includes at least interpolation filter information and information indicating the weights used in bidirectional prediction, and wherein the information indicating the weights used in bidirectional prediction is stored at a 4×4 level for inter-frame encoding / decoding modes; comparing the motion information of the neighboring block with the motion information of a motion candidate in the motion candidate list, wherein the comparison takes into account the interpolation filter information; in response to the motion information of the neighboring block being different from the motion information of the motion candidate in the motion candidate list, adding the neighboring block as a candidate to the motion candidate list; and encoding or decoding the motion information of the block based on the motion candidate list.
2. The method according to claim 1, wherein the comparison takes into account signals indicating the weights used in bidirectional prediction.
3. The method according to claim 1, wherein, during the comparison, all available motion information of the neighboring block is considered.
4. The method according to claim 1, wherein, during the comparison, a single memory comparison using the complete motion information structure is employed.
5. The method according to claim 1, wherein, when the predictions corresponding to the motion information of the neighboring block and the motion candidate are the same, the neighboring block is considered redundant with the motion candidate.
6. The method according to claim 1, wherein the motion candidate list is used in merge mode to signal the motion information of the block.
7. The method according to claim 1, wherein the motion information of the block is signaled via an index in the motion candidate list.
8. The method according to claim 1, wherein the motion candidate list is used by all merge modes.
9. An apparatus for video encoding or decoding, comprising one or more processors, wherein the one or more processors are configured to: access a motion candidate list of a block; access motion information of a neighboring block of the block, wherein the motion information includes at least interpolation filter information and information indicating the weights used in bidirectional prediction, and wherein the information indicating the weights used in bidirectional prediction is stored at a 4×4 level for inter-frame encoding / decoding modes; compare the motion information of the neighboring block with the motion information of a motion candidate in the motion candidate list, wherein the comparison takes into account the interpolation filter information; in response to the motion information of the neighboring block being different from the motion information of the motion candidate in the motion candidate list, add the neighboring block as a candidate to the motion candidate list; and encode or decode the motion information of the block based on the motion candidate list.
10. The apparatus according to claim 9, wherein the comparison takes into account signals indicating the weights used in bidirectional prediction.
11. The apparatus according to claim 9, wherein, during the comparison, all available motion information of the neighboring block is considered.
12. The apparatus according to claim 9, wherein, during the comparison, a single memory comparison using the complete motion information structure is employed.
13. The apparatus according to claim 9, wherein, when the predictions corresponding to the motion information of the neighboring block and the motion candidate are the same, the neighboring block is considered redundant with the motion candidate.
14. The apparatus according to claim 9, wherein the motion candidate list is used in merge mode to signal the motion information of the block.
15. The apparatus according to claim 9, wherein the motion information of the block is signaled via an index in the motion candidate list.
16. The apparatus according to claim 9, wherein the motion candidate list is used by all merge modes.
17. A non-transitory computer-readable storage medium storing instructions, said instructions implementing an encoding or decoding method when executed, said encoding or decoding method comprising: accessing a motion candidate list of a block; accessing motion information of a neighboring block of the block, wherein the motion information includes at least interpolation filter information and information indicating the weights used in bidirectional prediction, and wherein the information indicating the weights used in bidirectional prediction is stored at a 4×4 level for inter-frame encoding / decoding modes; comparing the motion information of the neighboring block with the motion information of a motion candidate in the motion candidate list, wherein the comparison takes into account the interpolation filter information; in response to the motion information of the neighboring block being different from the motion information of the motion candidate in the motion candidate list, adding the neighboring block as a candidate to the motion candidate list; and encoding or decoding the motion information of the block based on the motion candidate list.
18. The medium according to claim 17, wherein the comparison takes into account signals indicating the weights used in bidirectional prediction.
19. The medium according to claim 17, wherein, during the comparison, all available motion information of the neighboring block is considered.
20. The medium according to claim 17, wherein, during the comparison, a single memory comparison using the complete motion information structure is employed.