Encoding method and apparatus, decoding method and apparatus, and storage medium
Generating a list of motion vector prediction sub-candidates that includes an ATMVP candidate, and coding the motion vector prediction sub-index with CABAC bypass coding or a shared context, addresses the increased coding complexity that arises when affine motion modes and ATMVP are used, and improves video coding efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CANON KK
- Filing Date
- 2019-10-18
- Publication Date
- 2026-06-16
AI Technical Summary
Tools that are absent from the existing HEVC video coding standard, such as affine motion modes and alternative temporal motion vector prediction (ATMVP), increase coding complexity and can reduce coding efficiency.
The encoding process is simplified by generating a list of motion vector prediction sub-candidates for ATMVP candidates and encoding the motion vector prediction sub-indexes using CABAC encoding bypass or shared context.
It reduces coding complexity and improves coding efficiency, especially when using affine motion mode and ATMVP, reducing bit overhead and improving coding performance.
Figure CN116723322B_ABST
Abstract
Description
[0001] (This application is a divisional application of the application filed on October 18, 2019, with application number 2019800679973 and invention title "Video Encoding and Decoding".)
Technical Field
[0002] This invention relates to video encoding and decoding.
Background Technology
[0003] Recently, the Joint Video Experts Team (JVET), a collaborative team formed by MPEG and VCEG of ITU-T Study Group 16, began work on a new video coding standard called Versatile Video Coding (VVC). VVC aims to provide a significant improvement in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and is scheduled for completion in 2020. Key target applications and services include, but are not limited to, 360-degree and High Dynamic Range (HDR) video. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test laboratories. Some proposals demonstrated compression efficiency gains of typically 40% or more compared to HEVC. Particular effectiveness was shown on Ultra High Definition (UHD) video test material. Therefore, compression efficiency gains well beyond the targeted 50% can be expected for the final standard.
[0004] The JVET Exploration Model (JEM) uses all the HEVC tools. A further tool, not present in HEVC, is the use of an "affine motion mode" when applying motion compensation. Motion compensation in HEVC is limited to translation, but in reality many types of motion exist, such as zooming in / out, rotation, perspective motion, and other irregular motions. When the affine motion mode is used, a more complex transform is applied to the block to attempt to predict these forms of motion more accurately. It is therefore desirable for the affine motion mode to be usable while achieving good coding efficiency with lower complexity.
[0005] Another tool not present in HEVC is Alternative Temporal Motion Vector Prediction (ATMVP). ATMVP is a particular form of motion compensation. Instead of considering only one piece of motion information for the current block from a temporal reference frame, it considers the motion information of each collocated block. ATMVP therefore yields a segmentation of the current block, with associated motion information for each sub-block. In the current VTM (VVC Test Model) reference software, ATMVP is signaled as a merge candidate inserted into the merge candidate list. When ATMVP is enabled at the SPS level, the maximum number of merge candidates increases by one, so six candidates are considered instead of the five used when this mode is disabled.
[0006] These tools, and others described later, raise issues of coding efficiency and complexity for the coding of the index (e.g., a merge index) used to signal which candidate is selected from a candidate list (e.g., from the merge candidate list used with merge mode coding).
Summary of the Invention
[0007] Therefore, a solution is desired for at least one of the problems mentioned above.
[0008] According to a first aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0009] Generate a list of motion vector prediction sub-candidates that include ATMVP candidates;
[0010] Select one of the motion vector prediction sub-candidates from the list; and
[0011] A motion vector prediction sub-index (merge index) is generated for the selected motion vector prediction sub-candidate using CABAC encoding, wherein one or more bits of the motion vector prediction sub-index are bypass CABAC encoded.
[0012] In one embodiment, all bits of the motion vector prediction sub-index except the first bit are bypass CABAC encoded.
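As a minimal sketch of the embodiment in which only the first bit is context coded, the following assumes a truncated unary binarization of the index (the binarization HEVC uses for its merge index). It does not implement an arithmetic coder; it only labels each bit with the coding mode it would receive. The function names are hypothetical.

```python
from typing import List, Tuple


def binarize_truncated_unary(index: int, max_candidates: int) -> List[int]:
    """Truncated unary bins: `index` ones, then a terminating zero.

    The zero is omitted when `index` addresses the last list position,
    because the decoder can infer the end from the list size.
    """
    if not 0 <= index < max_candidates:
        raise ValueError("index out of range for the candidate list")
    bins = [1] * index
    if index < max_candidates - 1:
        bins.append(0)
    return bins


def assign_coding_modes(bins: List[int]) -> List[Tuple[int, str]]:
    """Label the first bin as context coded and every later bin as bypass coded."""
    return [(b, "context" if pos == 0 else "bypass") for pos, b in enumerate(bins)]


# Index 3 in a six-entry list (five candidates plus ATMVP).
print(assign_coding_modes(binarize_truncated_unary(3, 6)))
```

Bypass bits skip the context lookup and probability update, which is where the complexity saving comes from.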
[0013] According to a second aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0014] Generate a list of motion vector prediction sub-candidates that include ATMVP candidates;
[0015] The motion vector prediction sub-index is decoded using CABAC decoding, wherein one or more bits of the motion vector prediction sub-index are bypass CABAC decoded; and
[0016] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
[0017] In one embodiment, all bits of the motion vector prediction sub-index except the first bit are bypass CABAC decoded.
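The decoding side can be sketched in the same spirit. The example below assumes the bits have already been recovered by the CABAC decoder (context decoding for the first bit, bypass decoding for the rest) and that the index uses a truncated unary binarization; both the candidate labels and the function names are hypothetical.

```python
from typing import Iterator, List


def decode_index(bins: Iterator[int], max_candidates: int) -> int:
    """Count leading 1-bins until a 0-bin is read or the list size bounds the index."""
    index = 0
    while index < max_candidates - 1 and next(bins) == 1:
        index += 1
    return index


def pick_candidate(candidates: List[str], bins: List[int]) -> str:
    """Use the decoded index to identify one candidate in the list."""
    return candidates[decode_index(iter(bins), len(candidates))]


# Hypothetical six-entry list; the position of the ATMVP entry is arbitrary here.
candidates = ["cand0", "cand1", "cand2", "cand3", "ATMVP", "cand5"]
print(pick_candidate(candidates, [1, 1, 1, 1, 0]))
```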
[0018] According to a third aspect of the present invention, an apparatus is provided for encoding a motion vector prediction sub-index, the apparatus comprising:
[0019] A component used to generate a list of motion vector prediction sub-candidates that include ATMVP candidates;
[0020] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0021] A component for generating a motion vector prediction sub-index (merge index) for the selected motion vector prediction sub-candidate using CABAC encoding, wherein one or more bits of the motion vector prediction sub-index are bypass CABAC encoded.
[0022] According to a fourth aspect of the present invention, an apparatus is provided for decoding a motion vector prediction sub-index, the apparatus comprising:
[0023] A component used to generate a list of motion vector prediction sub-candidates that include ATMVP candidates;
[0024] A component for decoding the motion vector prediction sub-index using CABAC decoding, wherein one or more bits of the motion vector prediction sub-index are bypass CABAC decoded; and
[0025] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0026] According to a fifth aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0027] Generate a list of motion vector prediction sub-candidates;
[0028] Select one of the motion vector prediction sub-candidates from the list; and
[0029] CABAC encoding is used to generate a motion vector prediction sub-index for the selected motion vector prediction sub-candidate, wherein two or more bits of the motion vector prediction sub-index share the same context.
[0030] In one embodiment, all bits of the motion vector prediction sub-index share the same context.
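The shared-context embodiment can be illustrated by how bit positions map to context indices. This is a sketch only: the per-bit layout is shown purely for contrast, and the function names are hypothetical.

```python
def context_index(bit_position: int, shared: bool = True) -> int:
    """Context index used for a bit of the index.

    With shared=True every bit maps to context 0. With shared=False each
    bit position gets its own context (shown only for comparison).
    """
    return 0 if shared else bit_position


def contexts_needed(num_bits: int, shared: bool = True) -> int:
    """Number of distinct context states that must be stored and updated."""
    return len({context_index(pos, shared) for pos in range(num_bits)})


print(contexts_needed(5, shared=True))   # one state for all five bits
print(contexts_needed(5, shared=False))  # five states, one per bit
```

Sharing one context keeps the number of stored probability states independent of the list length.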
[0031] According to a sixth aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0032] Generate a list of motion vector prediction sub-candidates;
[0033] The motion vector prediction sub-index is decoded using CABAC decoding, wherein two or more bits of the motion vector prediction sub-index share the same context; and
[0034] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
[0035] In one embodiment, all bits of the motion vector prediction sub-index share the same context.
[0036] According to a seventh aspect of the present invention, an apparatus is provided for encoding a motion vector prediction sub-index, the apparatus comprising:
[0037] A component used to generate a list of motion vector prediction sub-candidates;
[0038] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0039] A component for using CABAC encoding to generate a motion vector prediction sub-index for a selected motion vector prediction sub-candidate, wherein two or more bits of the motion vector prediction sub-index share the same context.
[0040] According to an eighth aspect of the present invention, an apparatus for decoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0041] A component used to generate a list of motion vector prediction sub-candidates;
[0042] Components for decoding the motion vector prediction sub-index using CABAC decoding, wherein two or more bits of the motion vector prediction sub-index share the same context; and
[0043] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0044] According to a ninth aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0045] Generate a list of motion vector prediction sub-candidates;
[0046] Select one of the motion vector prediction sub-candidates from the list; and
[0047] CABAC encoding is used to generate motion vector prediction sub-indexes for selected motion vector prediction sub-candidates, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the motion vector prediction sub-indexes of at least one neighboring block of the current block.
[0048] In one embodiment, the context variable of at least one bit of the motion vector prediction sub-index depends on the corresponding motion vector prediction sub-indexes of at least two adjacent blocks.
[0049] In another embodiment, the context variable of at least one bit of the motion vector prediction sub-index depends on the motion vector prediction sub-index of a left adjacent block to the left of the current block and the motion vector prediction sub-index of an upper adjacent block above the current block.
[0050] In another embodiment, the left adjacent block is A2 and the upper adjacent block is B3.
[0051] In another embodiment, the left adjacent block is A1 and the upper adjacent block is B1.
[0052] In another embodiment, the context variable has three different possible values.
[0053] Another embodiment includes: comparing the motion vector prediction sub-index of at least one adjacent block with the index value of the motion vector prediction sub-index of the current block, and setting the context variable based on the comparison result.
[0054] Another embodiment includes: comparing the motion vector prediction sub-index of at least one adjacent block with a parameter representing the bit position of the bit, or of one of the bits, in the motion vector prediction sub-index of the current block; and setting the context variable based on the comparison result.
[0055] Another embodiment includes: performing a first comparison by comparing the motion vector prediction sub-index of a first adjacent block with a parameter representing the bit position of the bit, or of one of the bits, in the motion vector prediction sub-index of the current block; performing a second comparison by comparing the motion vector prediction sub-index of a second adjacent block with the parameter; and setting the context variable based on the results of the first and second comparisons.
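The two-comparison embodiment, together with the embodiment in which the context variable has three possible values, might be sketched as follows. The text above does not fix the comparison operator or how the two results are combined, so the "greater than" test, the summation, and the handling of unavailable neighbours are all assumptions made for illustration.

```python
from typing import Optional


def neighbour_context(
    bit_position: int,
    left_index: Optional[int],
    above_index: Optional[int],
) -> int:
    """Context variable in {0, 1, 2} for one bit of the current block's index.

    Each adjacent block's index is compared with the bit position
    (assumed test: index > bit_position) and the two results are summed.
    None marks an unavailable neighbour, treated as a failed comparison.
    """
    first = left_index is not None and left_index > bit_position
    second = above_index is not None and above_index > bit_position
    return int(first) + int(second)


print(neighbour_context(0, left_index=2, above_index=0))
print(neighbour_context(1, left_index=3, above_index=2))
```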
[0056] According to a tenth aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0057] Generate a list of motion vector prediction sub-candidates;
[0058] The motion vector prediction sub-index is decoded using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the motion vector prediction sub-index of at least one neighboring block of the current block; and
[0059] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
[0060] In one embodiment, the context variable of at least one bit of the motion vector prediction sub-index depends on the corresponding motion vector prediction sub-indexes of at least two adjacent blocks.
[0061] In another embodiment, the context variable of at least one bit of the motion vector prediction sub-index depends on the motion vector prediction sub-index of a left adjacent block to the left of the current block and the motion vector prediction sub-index of an upper adjacent block above the current block.
[0062] In another embodiment, the left adjacent block is A2 and the upper adjacent block is B3.
[0063] In another embodiment, the left adjacent block is A1 and the upper adjacent block is B1.
[0064] In another embodiment, the context variable has three different possible values.
[0065] Another embodiment includes: comparing the motion vector prediction sub-index of at least one adjacent block with the index value of the motion vector prediction sub-index of the current block, and setting the context variable based on the comparison result.
[0066] Another embodiment includes: comparing the motion vector prediction sub-index of at least one adjacent block with a parameter representing the bit position of the bit, or of one of the bits, in the motion vector prediction sub-index of the current block; and setting the context variable based on the comparison result.
[0067] Another embodiment includes: performing a first comparison by comparing the motion vector prediction sub-index of a first adjacent block with a parameter representing the bit position of the bit, or of one of the bits, in the motion vector prediction sub-index of the current block; performing a second comparison by comparing the motion vector prediction sub-index of a second adjacent block with the parameter; and setting the context variable based on the results of the first and second comparisons.
[0068] According to an eleventh aspect of the present invention, an apparatus for encoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0069] A component used to generate a list of motion vector prediction sub-candidates;
[0070] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0071] A component for generating a motion vector prediction sub-index for a selected motion vector prediction sub-candidate using CABAC encoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the motion vector prediction sub-index of at least one neighboring block of the current block.
[0072] According to a twelfth aspect of the present invention, an apparatus for decoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0073] A component used to generate a list of motion vector prediction sub-candidates;
[0074] Components for decoding the motion vector prediction sub-index using CABAC decoding, wherein a context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the motion vector prediction sub-index of at least one neighboring block of the current block; and
[0075] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0076] According to a thirteenth aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0077] Generate a list of motion vector prediction sub-candidates;
[0078] Select one of the motion vector prediction sub-candidates from the list; and
[0079] CABAC encoding is used to generate the motion vector prediction sub-index of the selected motion vector prediction sub-candidate, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the skip flag of the current block.
[0080] According to a fourteenth aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0081] Generate a list of motion vector prediction sub-candidates;
[0082] Select one of the motion vector prediction sub-candidates from the list; and
[0083] CABAC encoding is used to generate a motion vector prediction sub-index for the selected motion vector prediction sub-candidate, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on another parameter or syntax element of the current block that is available before decoding the motion vector prediction sub-index.
[0084] According to a fifteenth aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0085] Generate a list of motion vector prediction sub-candidates;
[0086] Select one of the motion vector prediction sub-candidates from the list; and
[0087] CABAC encoding is used to generate a motion vector prediction sub-index for the selected motion vector prediction sub-candidate, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on another parameter or syntax element of the current block that is an indicator of motion complexity in the current block.
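To illustrate why selecting the context from a flag of the current block can help, the sketch below keeps one toy probability estimate per skip-flag value. The `Context` class is a simple counter, not the CABAC state machine, and the sample sequence is made-up illustrative input rather than measured data.

```python
class Context:
    """Toy adaptive estimate of P(bit == 1); a stand-in for a CABAC context."""

    def __init__(self) -> None:
        self.ones = 1
        self.total = 2

    def p_one(self) -> float:
        return self.ones / self.total

    def update(self, bit: int) -> None:
        self.ones += bit
        self.total += 1


def select_context(skip_flag: bool) -> int:
    """Context variable taken directly from the skip flag (assumed mapping)."""
    return 1 if skip_flag else 0


contexts = [Context(), Context()]
# (skip flag, first bit of the index) pairs, invented for illustration only.
samples = [(True, 0), (True, 0), (False, 1), (True, 0), (False, 1)]
for skip_flag, first_bit in samples:
    contexts[select_context(skip_flag)].update(first_bit)

print(contexts[0].p_one(), contexts[1].p_one())
```

Because the skip flag is known before the index is parsed, the decoder can make the same context selection as the encoder.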
[0088] According to a sixteenth aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0089] Generate a list of motion vector prediction sub-candidates;
[0090] The motion vector prediction sub-index is decoded using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the skip flag of the current block; and
[0091] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
[0092] According to a seventeenth aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0093] Generate a list of motion vector prediction sub-candidates;
[0094] The motion vector prediction sub-index is decoded using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on another parameter or syntax element of the current block that is available before decoding the motion vector prediction sub-index; and
[0095] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
[0096] According to an eighteenth aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0097] Generate a list of motion vector prediction sub-candidates;
[0098] The motion vector prediction sub-index is decoded using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on another parameter or syntax element of the current block that is an indicator of motion complexity in the current block; and
[0099] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
[0100] According to a nineteenth aspect of the present invention, an apparatus for encoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0101] A component used to generate a list of motion vector prediction sub-candidates;
[0102] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0103] A component for generating a motion vector prediction sub-index for a selected motion vector prediction sub-candidate using CABAC encoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the skip flag of the current block.
[0104] According to a twentieth aspect of the present invention, an apparatus is provided for encoding a motion vector prediction sub-index, the apparatus comprising:
[0105] A component used to generate a list of motion vector prediction sub-candidates;
[0106] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0107] A component for generating a motion vector prediction sub-index for a selected motion vector prediction sub-candidate using CABAC encoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on another parameter or syntax element of the current block that is available before the motion vector prediction sub-index is decoded.
[0108] According to a twenty-first aspect of the present invention, an apparatus for encoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0109] A component used to generate a list of motion vector prediction sub-candidates;
[0110] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0111] A component for generating a motion vector prediction sub-index for a selected motion vector prediction sub-candidate using CABAC encoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on another parameter or syntax element of the current block that is an indicator of motion complexity in the current block.
[0112] According to a twenty-second aspect of the present invention, an apparatus is provided for decoding a motion vector prediction sub-index, the apparatus comprising:
[0113] A component used to generate a list of motion vector prediction sub-candidates;
[0114] A component for decoding the motion vector prediction sub-index using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the skip flag of the current block; and
[0115] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0116] According to a twenty-third aspect of the present invention, an apparatus is provided for decoding a motion vector prediction sub-index, the apparatus comprising:
[0117] A component used to generate a list of motion vector prediction sub-candidates;
[0118] A component for decoding the motion vector prediction sub-index using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on another parameter or syntax element of the current block that is available before decoding the motion vector prediction sub-index; and
[0119] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0120] According to a twenty-fourth aspect of the present invention, an apparatus for decoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0121] A component used to generate a list of motion vector prediction sub-candidates;
[0122] A component for decoding the motion vector prediction sub-index using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on another parameter or syntax element of the current block that is an indicator of motion complexity in the current block; and
[0123] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0124] According to a twenty-fifth aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0125] Generate a list of motion vector prediction sub-candidates;
[0126] Select one of the motion vector prediction sub-candidates from the list; and
[0127] CABAC encoding is used to generate a motion vector prediction sub-index for the selected motion vector prediction sub-candidate, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the affine motion vector prediction sub-candidates in the list (if they exist).
[0128] In one embodiment, the context variable depends on the position of the first affine motion vector prediction sub-candidate in the list.
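One way to picture a context variable that depends on the position of the first affine candidate is sketched below. The text above states only that such a dependency exists; the three-way mapping used here (no affine candidate, bit before the first affine position, bit at or after it) is an assumption chosen for illustration, and the function names are hypothetical.

```python
from typing import List, Optional


def first_affine_position(is_affine: List[bool]) -> Optional[int]:
    """Position of the first affine candidate in the list, or None if there is none."""
    for position, flag in enumerate(is_affine):
        if flag:
            return position
    return None


def context_for_bit(bit_position: int, is_affine: List[bool]) -> int:
    """Assumed mapping: 0 = no affine candidate, 1 = bit before it, 2 = bit at or after it."""
    first = first_affine_position(is_affine)
    if first is None:
        return 0
    return 1 if bit_position < first else 2


# Hypothetical list in which the third candidate is the first affine one.
flags = [False, False, True, False, True]
print([context_for_bit(pos, flags) for pos in range(4)])
```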
[0129] According to a twenty-sixth aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0130] Generate a list of motion vector prediction sub-candidates;
[0131] The motion vector prediction sub-index is decoded using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the affine motion vector prediction sub-candidates in the list (if they exist); and
[0132] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
[0133] In one embodiment, the context variable depends on the position of the first affine motion vector prediction sub-candidate in the list.
[0134] According to a twenty-seventh aspect of the present invention, an apparatus is provided for encoding a motion vector prediction sub-index, the apparatus comprising:
[0135] A component used to generate a list of motion vector prediction sub-candidates;
[0136] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0137] A component for generating a motion vector prediction sub-index for a selected motion vector prediction sub-candidate using CABAC encoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the affine motion vector prediction sub-candidates in the list (if they exist).
[0138] According to a twenty-eighth aspect of the present invention, an apparatus is provided for decoding a motion vector prediction sub-index, the apparatus comprising:
[0139] A component used to generate a list of motion vector prediction sub-candidates;
[0140] A component for decoding the motion vector prediction sub-index using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the affine motion vector prediction sub-candidates in the list (if they exist); and
[0141] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0142] According to a twenty-ninth aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0143] Generate a list of motion vector prediction sub-candidates that includes affine motion vector prediction sub-candidates;
[0144] Select one of the motion vector prediction sub-candidates from the list; and
[0145] CABAC encoding is used to generate a motion vector prediction sub-index for the selected motion vector prediction sub-candidate, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the affine flag of the current block and / or of at least one neighboring block of the current block.
[0146] According to a thirtieth aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0147] Generate a list of motion vector prediction sub-candidates that includes affine motion vector prediction sub-candidates;
[0148] The motion vector prediction sub-index is decoded using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the affine flag of the current block and / or of at least one neighboring block of the current block; and
[0149] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
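A context variable derived from affine flags could take a form such as the following. The aspect above leaves open which flags are consulted and how they are combined, so the priority given to the current block's flag and the use of the left and upper neighbours are assumptions made for illustration.

```python
def affine_flag_context(current_affine: bool, left_affine: bool, above_affine: bool) -> int:
    """Assumed mapping to a context variable in {0, 1, 2}.

    2: the current block's affine flag is set.
    1: only an adjacent block (left or above) has its affine flag set.
    0: no affine flag is set.
    """
    if current_affine:
        return 2
    return 1 if (left_affine or above_affine) else 0


print(affine_flag_context(False, True, False))
print(affine_flag_context(True, False, False))
```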
[0150] According to a thirty-first aspect of the present invention, an apparatus for encoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0151] A component for generating a list of motion vector prediction sub-candidates, including affine motion vector prediction sub-candidates;
[0152] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0153] A component for generating a motion vector prediction sub-index for a selected motion vector prediction sub-candidate using CABAC encoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the affine flag of the current block and / or of at least one neighboring block of the current block.
[0154] According to a thirty-second aspect of the present invention, an apparatus for decoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0155] A component for generating a list of motion vector prediction sub-candidates, including affine motion vector prediction sub-candidates;
[0156] A component for decoding the motion vector prediction sub-index using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block depends on the affine flag of the current block and / or of at least one neighboring block of the current block; and
[0157] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0158] According to a thirty-third aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0159] Generate a list of motion vector prediction sub-candidates;
[0160] Select one of the motion vector prediction sub-candidates from the list; and
[0161] CABAC encoding is used to generate a motion vector prediction sub-index for the selected motion vector prediction sub-candidate, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block is derived from the context variable of at least one of the skip flag and the affine flag of the current block.
[0162] According to a thirty-fourth aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0163] Generate a list of motion vector prediction sub-candidates;
[0164] The motion vector prediction sub-index is decoded using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block is derived from the context variable of at least one of the skip flag and the affine flag of the current block; and
[0165] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
[0166] According to a thirty-fifth aspect of the present invention, an apparatus for encoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0167] A component used to generate a list of motion vector prediction sub-candidates;
[0168] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0169] A component for generating a motion vector prediction sub-index for a selected motion vector prediction sub-candidate using CABAC encoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block is derived from the context variable of at least one of the skip flag and the affine flag of the current block.
[0170] According to a thirty-sixth aspect of the present invention, an apparatus for decoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0171] A component for generating a list of motion vector prediction sub-candidates;
[0172] A component for decoding the motion vector prediction sub-index using CABAC decoding, wherein a context variable of at least one bit of the motion vector prediction sub-index of the current block is derived from a context variable of at least one of the skip flag and affine flag of the current block; and
[0173] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0174] According to a thirty-seventh aspect of the present invention, a method for encoding a motion vector prediction sub-index is provided, the method comprising:
[0175] Generate a list of motion vector prediction sub-candidates;
[0176] Select one of the motion vector prediction sub-candidates from the list; and
[0177] Generate, using CABAC encoding, a motion vector prediction sub-index for the selected motion vector prediction sub-candidate, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block has only two distinct possible values.
[0178] According to a thirty-eighth aspect of the present invention, a method for decoding a motion vector prediction sub-index is provided, the method comprising:
[0179] Generate a list of motion vector prediction sub-candidates;
[0180] Decode the motion vector prediction sub-index using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block has only two distinct possible values; and
[0181] Use the decoded motion vector prediction sub-index to identify one of the motion vector prediction sub-candidates in the list.
[0182] According to a thirty-ninth aspect of the present invention, an apparatus for encoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0183] A component for generating a list of motion vector prediction sub-candidates;
[0184] A component for selecting one of the motion vector prediction sub-candidates in the list; and
[0185] A component for generating a motion vector prediction sub-index for the selected motion vector prediction sub-candidate using CABAC encoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block has only two distinct possible values.
[0186] According to a fortieth aspect of the present invention, an apparatus for decoding a motion vector prediction sub-index is provided, the apparatus comprising:
[0187] A component for generating a list of motion vector prediction sub-candidates;
[0188] A component for decoding the motion vector prediction sub-index using CABAC decoding, wherein the context variable of at least one bit of the motion vector prediction sub-index of the current block has only two distinct possible values; and
[0189] A component used to identify one of the motion vector prediction sub-candidates in the list using the decoded motion vector prediction sub-index.
[0190] Another aspect of the invention relates to programs that, when executed by a computer or processor, cause the computer or processor to perform any of the methods described above. The programs may be provided separately or may be carried on, by, or within a carrier medium. The carrier medium may be non-transitory, such as a storage medium, specifically a computer-readable storage medium. The carrier medium may also be transient, such as a signal or other transmission medium. Signals may be transmitted via any suitable network, including the Internet.
[0191] Another aspect of the invention relates to a camera comprising any of the foregoing device aspects. In one embodiment, the camera further includes a zoom component.
[0192] According to a forty-first aspect of the present invention, a method for encoding a motion information prediction sub-index is provided, the method comprising: generating a list of motion information prediction sub-candidates; selecting one of the motion information prediction sub-candidates in the list as an affine merging mode predictor when using an affine merging mode; selecting one of the motion information prediction sub-candidates in the list as a non-affine merging mode predictor when using a non-affine merging mode; and generating a motion information prediction sub-index for the selected motion information prediction sub-candidate using CABAC encoding, wherein one or more bits of the motion information prediction sub-index are bypass CABAC encoded.
[0193] Appropriately, CABAC encoding includes using the same context variable for at least one bit of the motion information prediction sub-index for the current block in both affine merging and non-affine merging modes. Alternatively, CABAC encoding includes using a first context variable for at least one bit of the motion information prediction sub-index for the current block in affine merging mode or a second context variable in non-affine merging mode; and the method further includes including data indicating the use of affine merging mode in the bit stream when using affine merging mode.
[0194] Appropriately, the method further includes: including, in the bit stream, data used to determine the maximum number of motion information prediction sub-candidates that can be included in the generated list of motion information prediction sub-candidates. Appropriately, all bits of the motion information prediction sub-index except the first bit are bypass CABAC encoded. Appropriately, the first bit is CABAC encoded. Appropriately, the same syntax element is used to encode the motion information prediction sub-index for the selected motion information prediction sub-candidate in both the affine merging mode and the non-affine merging mode.
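The bin-level behaviour described in this aspect can be sketched as follows. The sketch assumes a truncated unary binarization of the index, as HEVC uses for its merge index; the paragraphs above do not fix the binarization. The function names and the "ctx" / "bypass" labels are illustrative only. No arithmetic coding is performed: the code merely shows which bins would be CABAC encoded with a context variable and which would be bypass encoded, for a list whose maximum size is signalled in the bit stream.

```python
from typing import List, Tuple

Bin = Tuple[int, str]  # (bin value, coding mode label)


def binarize_index(index: int, max_cand: int) -> List[Bin]:
    """Truncated unary binarization (an assumption) of a predictor index.

    The first bin is labelled "ctx" (regular CABAC coding with a context
    variable); every later bin is labelled "bypass".
    """
    if not 0 <= index < max_cand:
        raise ValueError("index out of range for the candidate list")
    bins: List[Bin] = []
    for pos in range(max_cand - 1):
        value = 1 if index > pos else 0
        bins.append((value, "ctx" if pos == 0 else "bypass"))
        if value == 0:  # a 0 bin terminates the unary code
            break
    return bins


def debinarize_index(bins: List[Bin], max_cand: int) -> int:
    """Inverse of binarize_index: count leading 1 bins."""
    index = 0
    for value, _mode in bins[: max_cand - 1]:
        if value == 0:
            break
        index += 1
    return index
```

With a maximum of six candidates, index 0 costs a single context-coded bin, while index 3 costs one context-coded bin followed by three bypass bins. Because the labelling does not depend on the merging mode, the same routine could serve both the affine and the non-affine case, which is in the spirit of using one syntax element for both.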
[0195] According to a forty-second aspect of the present invention, a method for decoding a motion information prediction sub-index is provided, the method comprising: generating a list of motion information prediction sub-candidates; decoding the motion information prediction sub-index using CABAC decoding, wherein one or more bits of the motion information prediction sub-index are bypass CABAC decoded; identifying one of the motion information prediction sub-candidates in the list as an affine merging mode predictor using the decoded motion information prediction sub-index when using an affine merging mode; and identifying one of the motion information prediction sub-candidates in the list as a non-affine merging mode predictor using the decoded motion information prediction sub-index when using a non-affine merging mode.
[0196] Appropriately, CABAC decoding includes using the same context variable for at least one bit of the motion information prediction sub-index for the current block, both when using an affine merging mode and when using a non-affine merging mode. Optionally, the method further includes obtaining data from a bitstream indicating the use of an affine merging mode, and CABAC decoding includes using a first context variable for at least one bit of the motion information prediction sub-index for the current block when the obtained data indicates the use of an affine merging mode; and using a second context variable when the obtained data indicates the use of a non-affine merging mode.
[0197] Suitably, the method further includes obtaining, from the bit stream, data indicating the use of an affine merging mode, wherein, when the obtained data indicates the use of an affine merging mode, the generated list of motion information prediction sub-candidates includes affine merging mode prediction sub-candidates; and when the obtained data indicates the use of a non-affine merging mode, the generated list of motion information prediction sub-candidates includes non-affine merging mode prediction sub-candidates.
[0198] Appropriately, the method further includes obtaining, from the bit stream, data for determining the maximum number of motion information prediction sub-candidates that can be included in the generated list of motion information prediction sub-candidates. Appropriately, all bits of the motion information prediction sub-index except the first bit are bypass CABAC decoded. Appropriately, the first bit is CABAC decoded. Appropriately, decoding the motion information prediction sub-index includes parsing the same syntax element from the bit stream in both the affine merging mode and the non-affine merging mode. Appropriately, the motion information prediction sub-candidates include information for obtaining motion vectors. Appropriately, the generated list of motion information prediction sub-candidates includes ATMVP candidates. Appropriately, the generated list of motion information prediction sub-candidates has the same maximum number of motion information prediction sub-candidates in both the affine merging mode and the non-affine merging mode.
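The two context-selection alternatives described for this aspect, together with the mode-dependent choice of candidate list, can be contrasted in a short sketch. The context identifiers (`CTX_SHARED`, `CTX_AFFINE`, `CTX_NON_AFFINE`), the `shared` switch, and both function names are hypothetical and are not syntax taken from any standard; the candidate lists are plain Python sequences standing in for the generated lists.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

# Hypothetical context identifiers for the first bin of the index.
CTX_SHARED = 0      # one context used in both merging modes
CTX_AFFINE = 1      # first context variable (affine merging mode)
CTX_NON_AFFINE = 2  # second context variable (non-affine merging mode)


def first_bin_context(affine_merge: bool, shared: bool) -> int:
    """Pick the context for the first bin of the predictor index.

    shared=True models the alternative in which the same context variable
    is used in both modes; shared=False models the alternative in which
    the context depends on the affine merging mode indication.
    """
    if shared:
        return CTX_SHARED
    return CTX_AFFINE if affine_merge else CTX_NON_AFFINE


def identify_predictor(index: int, affine_merge: bool,
                       affine_cands: Sequence[T],
                       regular_cands: Sequence[T]) -> T:
    """Use the decoded index in the list that matches the signalled mode."""
    cands = affine_cands if affine_merge else regular_cands
    return cands[index]
```

In the shared alternative the decoder does not need the affine indication in order to select a context, so only the choice of candidate list depends on the mode.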
[0199] According to a forty-third aspect of the present invention, an apparatus is provided for encoding a motion information prediction sub-index, the apparatus comprising: means for generating a list of motion information prediction sub-candidates; means for selecting one of the motion information prediction sub-candidates in the list as an affine merging mode predictor when using an affine merging mode; means for selecting one of the motion information prediction sub-candidates in the list as a non-affine merging mode predictor when using a non-affine merging mode; and means for generating a motion information prediction sub-index for the selected motion information prediction sub-candidate using CABAC encoding, wherein one or more bits of the motion information prediction sub-index are bypass CABAC encoded. Suitably, the apparatus includes means for performing a method for encoding the motion information prediction sub-index according to the forty-first aspect.
[0200] According to a forty-fourth aspect of the present invention, an apparatus is provided for decoding a motion information prediction sub-index, the apparatus comprising: components for generating a list of motion information prediction sub-candidates; components for decoding the motion information prediction sub-index using CABAC decoding, wherein one or more bits of the motion information prediction sub-index are bypass CABAC decoded; components for identifying one of the motion information prediction sub-candidates in the list as an affine merging mode predictor using the decoded motion information prediction sub-index when using an affine merging mode; and components for identifying one of the motion information prediction sub-candidates in the list as a non-affine merging mode predictor using the decoded motion information prediction sub-index when using a non-affine merging mode. Suitably, the apparatus includes components for performing a method for decoding a motion information prediction sub-index according to the forty-second aspect.
[0201] According to a forty-fifth aspect of the present invention, a method for encoding a motion information prediction sub-index of an affine merging mode is provided, the method comprising: generating a list of motion information prediction sub-candidates; selecting one of the motion information prediction sub-candidates in the list as an affine merging mode predictor; and generating a motion information prediction sub-index for the selected motion information prediction sub-candidate using CABAC encoding, wherein one or more bits of the motion information prediction sub-index are bypass CABAC encoded.
[0202] Suitably, when using a non-affine merging mode, the method further includes selecting one of the motion information prediction sub-candidates in the list as the non-affine merging mode predictor. Suitably, CABAC encoding includes: using, for at least one bit of the motion information prediction sub-index of the current block, a first context variable when using an affine merging mode or a second context variable when using a non-affine merging mode; and the method further includes: when using an affine merging mode, including data indicating the use of an affine merging mode in the bit stream. Optionally, CABAC encoding includes: using the same context variable for at least one bit of the motion information prediction sub-index of the current block when using an affine merging mode and when using a non-affine merging mode.
[0203] Appropriately, the method further includes including in the bit stream data used to determine the maximum number of motion information prediction sub-candidates that can be included in the list of generated motion information prediction sub-candidates.
[0204] Appropriately, all bits of the motion information prediction sub-index except the first bit are bypass CABAC encoded. Appropriately, the first bit is CABAC encoded. Appropriately, the same syntax element is used to encode the motion information prediction sub-index for the selected motion information prediction sub-candidate in both the affine merging mode and the non-affine merging mode.
[0205] According to a forty-sixth aspect of the present invention, a method is provided for decoding a motion information prediction sub-index of an affine merging mode, the method comprising: generating a list of motion information prediction sub-candidates; decoding the motion information prediction sub-index using CABAC decoding, wherein one or more bits of the motion information prediction sub-index are bypass CABAC decoded; and, when using an affine merging mode, using the decoded motion information prediction sub-index to identify one of the motion information prediction sub-candidates in the list as an affine merging mode predictor.
[0206] Suitably, when using a non-affine merging mode, the method further includes: using the decoded motion information prediction sub-index to identify one of the motion information prediction sub-candidates in the list as a non-affine merging mode predictor. Suitably, the method further includes: obtaining, from the bit stream, data indicating the use of an affine merging mode, and CABAC decoding includes, for at least one bit of the motion information prediction sub-index of the current block: using a first context variable when the obtained data indicates the use of an affine merging mode; and using a second context variable when the obtained data indicates the use of a non-affine merging mode. Optionally, CABAC decoding includes: using the same context variable for at least one bit of the motion information prediction sub-index of the current block in both the affine merging mode and the non-affine merging mode.
[0207] Suitably, the method further includes: obtaining, from the bit stream, data indicating the use of an affine merging mode, wherein, when the obtained data indicates the use of an affine merging mode, the generated list of motion information prediction sub-candidates includes affine merging mode prediction sub-candidates, and when the obtained data indicates the use of a non-affine merging mode, the generated list of motion information prediction sub-candidates includes non-affine merging mode prediction sub-candidates.
[0208] Appropriately, decoding the motion information prediction sub-index includes parsing the same syntax element from the bit stream in both the affine merging mode and the non-affine merging mode. Appropriately, the method further includes obtaining, from the bit stream, data for determining the maximum number of motion information prediction sub-candidates that can be included in the generated list of motion information prediction sub-candidates. Appropriately, all bits of the motion information prediction sub-index except the first bit are bypass CABAC decoded. Appropriately, the first bit is CABAC decoded. Appropriately, the motion information prediction sub-candidates include information for obtaining motion vectors. Appropriately, the generated list of motion information prediction sub-candidates includes ATMVP candidates. Appropriately, the generated list of motion information prediction sub-candidates has the same maximum number of motion information prediction sub-candidates in both the affine merging mode and the non-affine merging mode.
[0209] According to a forty-seventh aspect of the present invention, an apparatus is provided for encoding a motion information prediction sub-index of an affine merging mode, the apparatus comprising: means for generating a list of motion information prediction sub-candidates; means for selecting one of the motion information prediction sub-candidates in the list as an affine merging mode predictor; and means for generating a motion information prediction sub-index for the selected motion information prediction sub-candidate using CABAC encoding, wherein one or more bits of the motion information prediction sub-index are bypass CABAC encoded. Suitably, the apparatus includes means for performing a method for encoding a motion information prediction sub-index according to the forty-fifth aspect.
[0210] According to a forty-eighth aspect of the present invention, an apparatus is provided for decoding a motion information prediction sub-index of an affine merging mode, the apparatus comprising: components for generating a list of motion information prediction sub-candidates; components for decoding the motion information prediction sub-index using CABAC decoding, wherein one or more bits of the motion information prediction sub-index are bypass CABAC decoded; and components for identifying one of the motion information prediction sub-candidates in the list as an affine merging mode predictor using the decoded motion information prediction sub-index when using an affine merging mode. Suitably, the apparatus includes components for performing a method for decoding a motion information prediction sub-index according to the forty-sixth aspect.
[0211] In one embodiment, the camera is adapted to indicate when the zoom component is operable and to signal an affine mode based on the indication that the zoom component is operable.
[0212] In another embodiment, the camera also includes a panning component.
[0213] In another embodiment, the camera is adapted to indicate when the panning component is operable and to signal the affine mode based on the indication that the panning component is operable.
[0214] According to another aspect of the present invention, a mobile device is provided that includes a camera embodying any of the above-described camera aspects.
[0215] In one embodiment, the mobile device further includes at least one position sensor adapted to sense changes in the orientation of the mobile device.
[0216] In one embodiment, the mobile device is adapted to signal an affine mode based on a sensed change in the orientation of the mobile device.
[0217] Other features of the invention are characterized by the other independent and dependent claims.
[0218] Any feature in one aspect of the invention may be applied to other aspects of the invention in any suitable combination. In particular, a method aspect may be applied to an apparatus aspect, and vice versa.
[0219] Furthermore, features implemented in hardware can be implemented in software, and vice versa. Any references to software and hardware features here should be interpreted accordingly.
[0220] Any device feature described herein can also be provided as a method feature, and vice versa. As used herein, device plus functional features can be alternatively expressed in terms of their corresponding structural aspects, such as a properly programmed processor and associated memory.
[0221] It should also be understood that specific combinations of the various features described and defined in any aspect of the invention may be implemented, provided, and / or used independently.
Brief Description of the Drawings
[0222] Reference will now be made, by way of example, to the accompanying drawings, in which:
[0223] Figure 1 is a diagram illustrating the coding structure used in HEVC;
[0224] Figure 2 is a block diagram illustrating a data communication system in which one or more embodiments of the present invention can be implemented;
[0225] Figure 3 is a block diagram illustrating components of a processing device in which one or more embodiments of the present invention can be implemented;
[0226] Figure 4 is a flowchart illustrating the steps of an encoding method according to an embodiment of the present invention;
[0227] Figure 5 is a flowchart illustrating the steps of a decoding method according to an embodiment of the present invention;
[0228] Figures 6a and 6b show spatial and temporal blocks that can be used to generate motion vector predictors;
[0229] Figure 7 illustrates simplified steps of the process for deriving the AMVP predictor set;
[0230] Figure 8 is a schematic diagram of the motion vector derivation process in the merge mode;
[0231] Figure 9 illustrates segmentation of the current block and temporal motion vector prediction;
[0232] Figure 10 (a) illustrates the coding of the merge index in HEVC, or when ATMVP is not enabled at the SPS level;
[0233] Figure 10 (b) illustrates the coding of the merge index when ATMVP is enabled at the SPS level;
[0234] Figure 11 (a) illustrates a simple affine motion field;
[0235] Figure 11 (b) illustrates a more complex affine motion field;
[0236] Figure 12 is a flowchart of a partial decoding process for some syntax elements related to the coding mode;
[0237] Figure 13 is a flowchart illustrating merge candidate derivation;
[0238] Figure 14 is a flowchart illustrating a first embodiment of the present invention;
[0239] Figure 15 is a flowchart of a partial decoding process for some syntax elements related to the coding mode in the twelfth embodiment of the present invention;
[0240] Figure 16 is a flowchart illustrating the generation of a merge candidate list in the twelfth embodiment of the present invention;
[0241] Figure 17 is a block diagram illustrating a CABAC encoder suitable for use in embodiments of the present invention;
[0242] Figure 18 is a schematic block diagram of a communication system for implementing one or more embodiments of the present invention;
[0243] Figure 19 is a schematic block diagram of a computing device;
[0244] Figure 20 is a diagram illustrating a webcam system;
[0245] Figure 21 is a diagram illustrating a smartphone;
[0246] Figure 22 is a flowchart of a partial decoding process for some syntax elements related to the coding mode according to the sixteenth embodiment;
[0247] Figure 23 is a flowchart illustrating a single index signaling scheme for both the merge mode and the affine merge mode; and
[0248] Figure 24 is a flowchart illustrating the affine merge candidate derivation process used in the affine merge mode.
Detailed Description
[0249] The embodiments of the invention described below relate to improving the encoding and decoding of indexes using CABAC. It should be understood that, according to alternative embodiments of the invention, implementations of other context-based arithmetic coding schemes functionally similar to CABAC are also possible. Before describing the embodiments, video encoding and decoding techniques, as well as associated encoders and decoders, will be described.
[0250] Figure 1 relates to the coding structure used in the High Efficiency Video Coding (HEVC) video standard. Video sequence 1 consists of a series of digital images i. Each such digital image is represented by one or more matrices, whose coefficients represent pixels.
[0251] Each image 2 of the sequence can be divided into slices 3. In some cases, a slice can constitute the entire image. These slices are divided into non-overlapping coding tree units (CTUs). A coding tree unit (CTU) is the basic processing unit of the High Efficiency Video Coding (HEVC) video standard and conceptually corresponds in structure to the macroblock unit used in several previous video standards. A CTU is sometimes also called a largest coding unit (LCU). A CTU has luma and chroma component parts, each component part being called a coding tree block (CTB). These different color components are not shown in Figure 1.
[0252] For HEVC, the CTU is typically 64 pixels × 64 pixels, but for VVC, this size can be 128 pixels × 128 pixels. Quadtree decomposition can be used to iteratively divide each CTU into smaller variable-size coding units (CUs).
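The quadtree decomposition of a CTU into CUs can be illustrated with a minimal sketch. It is written recursively for brevity. The `should_split` callback is a hypothetical stand-in for whatever decides each split (an encoder's own mode decision, or split flags parsed by a decoder), and the minimum CU size is passed in as an assumed parameter rather than taken from any particular profile.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) in luma samples


def quadtree_split(x: int, y: int, size: int, min_size: int,
                   should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Return the leaf CUs obtained by recursively quartering a square block."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus: List[Block] = []
        # Visit the four quadrants row by row: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(quadtree_split(x + dx, y + dy, half,
                                          min_size, should_split))
        return cus
    return [(x, y, size)]  # leaf: this block is a CU


# Example: split only the 64x64 CTU itself, giving four 32x32 CUs.
cus = quadtree_split(0, 0, 64, 8, lambda x, y, s: s == 64)
```

Whatever the split decisions are, the leaf CUs tile the CTU exactly, so their areas always sum to the CTU area.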
[0253] The coding unit is the basic coding element and consists of two kinds of sub-unit called the prediction unit (PU) and the transform unit (TU). The maximum size of a PU or TU is equal to the size of the CU. A prediction unit corresponds to a partition of the CU used for predicting pixel values. A CU can be partitioned into PUs in various ways, as shown at 6, including a partition into 4 square PUs and two different partitions into 2 rectangular PUs. The transform unit is the basic unit that is subjected to spatial transformation using the DCT. A CU can be partitioned into TUs based on a quadtree representation 7.
[0254] Each slice is embedded in a Network Abstraction Layer (NAL) unit. Additionally, the encoding parameters of the video sequence are stored in dedicated NAL units called parameter sets. In HEVC and H.264 / AVC, two types of parameter set NAL units are used: First, the Sequence Parameter Set (SPS) NAL unit, which collects all parameters that remain unchanged throughout the entire video sequence. Typically, it handles the encoding profile, video frame size, and other parameters. Second, the Picture Parameter Set (PPS) NAL unit, which includes parameters that can be changed from one picture (or frame) in the sequence to other pictures (or frames). HEVC also includes the Video Parameter Set (VPS) NAL unit, which contains parameters describing the overall structure of the bitstream. VPS is a new type of parameter set defined in HEVC and is applied to all layers of the bitstream. A layer can contain multiple temporal sublayers, and all version 1 bitstreams are confined to a single layer. HEVC has certain layer extensions for scalability and multiple views, and these extensions will allow multiple layers with a backward-compatible version 1 base layer.
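As an illustration of how a receiver might tell the parameter-set NAL units apart, the sketch below parses the two-byte NAL unit header defined by HEVC (one forbidden zero bit, a 6-bit unit type, a 6-bit layer identifier, and a 3-bit temporal identifier plus one), in which unit types 32, 33, and 34 denote VPS, SPS, and PPS respectively. The paragraph above does not describe this header layout, and the class and function names are hypothetical.

```python
from typing import NamedTuple

# HEVC NAL unit types of the three parameter sets.
PARAMETER_SET_TYPES = {32: "VPS", 33: "SPS", 34: "PPS"}


class NalHeader(NamedTuple):
    nal_unit_type: int
    nuh_layer_id: int
    temporal_id: int


def parse_hevc_nal_header(data: bytes) -> NalHeader:
    """Parse the two-byte HEVC NAL unit header at the start of `data`."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    if data[0] & 0x80:
        raise ValueError("forbidden_zero_bit must be 0")
    nal_unit_type = (data[0] >> 1) & 0x3F
    nuh_layer_id = ((data[0] & 0x01) << 5) | (data[1] >> 3)
    temporal_id_plus1 = data[1] & 0x07
    if temporal_id_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalHeader(nal_unit_type, nuh_layer_id, temporal_id_plus1 - 1)


def parameter_set_kind(data: bytes) -> str:
    """Return "VPS", "SPS", or "PPS", or "other" for any other NAL unit."""
    return PARAMETER_SET_TYPES.get(parse_hevc_nal_header(data).nal_unit_type,
                                   "other")
```

In a single-layer (version 1) bit stream the layer identifier is 0 for every NAL unit, which is consistent with the statement that such streams are confined to one layer.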
[0255] Figure 2 and Figure 18 illustrate an example of a data communication system in which one or more embodiments of the present invention can be implemented. The data communication system includes a transmission device 191 (e.g., server 201) operable to transmit data packets of a data stream (e.g., bit stream 101) to a receiving device 195 (e.g., client terminal 202) via a data communication network 200. The data communication network 200 can be a wide area network (WAN) or a local area network (LAN). Such a network can be, for example, a wireless network (Wifi / 802.11a or b or g), an Ethernet network, an Internet network, or a hybrid network consisting of several different networks. In a particular embodiment of the invention, the data communication system can be a digital television broadcasting system, wherein server 201 sends the same data content to multiple clients.
[0256] The data stream 204 (or bitstream 101) provided by server 201 may consist of multimedia data representing video (e.g., image sequence 151) and audio data. In some embodiments of the invention, the audio and video data streams may be captured by server 201 using a microphone and a camera, respectively. In some embodiments, the data streams may be stored on server 201 or received by server 201 from other data providers, or generated at server 201. Server 201 is provided with an encoder 150 for encoding the video and audio streams, particularly for providing a compressed bitstream 101 for transmission, which is a more compact representation of the data presented as input to the encoder.
[0257] To achieve a better ratio of data quality to data volume, video data can be compressed, for example, according to HEVC, H.264 / AVC, or VVC formats.
[0258] Client 202 receives the transmitted bit stream 101, and its decoder 100 decodes the reconstructed bit stream to reproduce video images (e.g., video signal 109) on a display device and reproduce audio data using a speaker.
[0259] Although the examples of Figure 2 and Figure 18 consider a streaming scenario, it will be appreciated that in some embodiments of the invention, media storage devices such as optical discs can be used for data communication between the encoder and decoder.
[0260] In one or more embodiments of the invention, the video image is transmitted together with data representing compensation offsets of the reconstructed pixels to be applied to the image, so as to provide filtered pixels in the final image.
[0261] Figure 3 schematically illustrates a processing device 300 configured to implement at least one embodiment of the present invention. The processing device 300 may be a device such as a microcomputer, workstation, or lightweight portable device. The device 300 includes a communication bus 313 connected to:
[0262] - A central processing unit 311, denoted as CPU, such as a microprocessor;
[0263] - A read-only memory 307, denoted as ROM, for storing computer programs that implement the present invention;
[0264] - A random access memory 312, represented as RAM, for storing executable code of the methods according to embodiments of the present invention, and registers suitable for recording variables and parameters required for implementing the methods for encoding digital image sequences and / or decoding bit streams according to embodiments of the present invention; and
[0265] - A communication interface 302 connected to the communication network 303, through which digital data to be processed is transmitted or received.
[0266] Optionally, the device 300 may also include the following components:
[0267] - A data storage component 304, such as a hard disk, is used to store a computer program for implementing one or more embodiments of the present invention, as well as data used or generated during the implementation of one or more embodiments of the present invention;
[0268] - A disk drive 305 for disk 306, the disk drive being adapted to read data from disk 306 or write data to said disk; and
[0269] - A screen 309 for displaying data and / or serving as a graphical interface for user interaction by means of a keyboard 310 or any other pointing device.
[0270] Device 300 can be connected to various peripheral devices such as digital camera 320 or microphone 308, each of which is connected to an input / output card (not shown) to provide multimedia data to device 300.
[0271] The communication bus 313 provides communication and interoperability between various elements included in or connected to the device 300. The representation of the bus is not limiting, and in particular, the central processing unit is operable to communicate instructions directly or by means of other elements of the device 300 to any element of the device 300.
[0272] Disk 306 may be replaced by any information medium such as a rewritable or non-rewritable compact disc (CD-ROM), ZIP disc, or memory card, and generally by an information storage component that can be read by a microcomputer or microprocessor. Disk 306 may be integrated into or not integrated into the device, may be portable, and is adapted to store one or more programs that execute to enable the implementation of the method for encoding digital image sequences and / or the method for decoding bit streams according to the present invention.
[0273] Executable code can be stored in read-only memory 307, hard disk 304, or removable digital media (such as, for example, disk 306 as described above). According to a variation, the executable code of the program can be received via interface 302 through communication network 303 to be stored in one of the storage components of device 300 (such as hard disk 304) before execution.
[0274] The central processing unit 311 is adapted to control and direct the execution of the instructions or portions of software code of the one or more programs according to the invention, which instructions are stored in one of the aforementioned storage components. Upon power-up, the one or more programs stored in non-volatile memory (e.g., on hard disk 304 or in read-only memory 307) are transferred to random access memory 312, which then contains the executable code of the one or more programs, as well as registers for storing the variables and parameters necessary for implementing the invention.
[0275] In this embodiment, the device is a programmable device that implements the invention using software. However, alternatively, the invention can be implemented in hardware (e.g., in the form of an application-specific integrated circuit or ASIC).
[0276] Figure 4 shows a block diagram of an encoder according to at least one embodiment of the present invention. The encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions executed by the CPU 311 of the device 300, at least one corresponding step of a method for encoding an image of an image sequence according to one or more embodiments of the present invention.
[0277] Encoder 400 receives an original sequence 401 of digital images i0 to in as input. Each digital image is represented by a set of samples, sometimes also called pixels (hereinafter referred to as pixels).
[0278] After performing the encoding process, the encoder 400 outputs a bit stream 410. The bit stream 410 includes multiple encoding units or slices, each slice including a slice header for transmitting the encoded values of the encoding parameters used for slice encoding, and a slice body including the encoded video data.
[0279] Module 402 divides the input digital images i0 to in 401 into blocks of pixels. These blocks correspond to image portions and can have variable sizes (e.g., 4×4, 8×8, 16×16, 32×32, 64×64, or 128×128 pixels; several rectangular block sizes can also be considered). An encoding mode is selected for each input block. Two families of encoding modes are provided: encoding modes based on spatial prediction (intra-frame prediction) and encoding modes based on temporal prediction (inter-frame coding, merge, skip). The possible encoding modes are tested.
[0280] Module 403 implements intra-frame prediction processing, wherein the block to be encoded is predicted by means of predictors calculated based on the neighboring pixels of the given block. If intra-frame coding is selected, the selected intra-frame predictors and an indication of the difference between the given block and its predictors are encoded to provide residuals.
[0281] Temporal prediction is implemented by motion estimation module 404 and motion compensation module 405. First, a reference image is selected from reference image set 416, and motion estimation module 404 selects a portion of the reference image (also referred to as a reference region or image portion) that is closest to the given block to be encoded (closest in terms of pixel value similarity). Then, motion compensation module 405 uses the selected region to predict the block to be encoded. Motion compensation module 405 calculates the difference between the selected reference region and the given block (also referred to as the residual block). The selected reference region is indicated using motion vectors.
[0282] Therefore, in both cases (spatial and temporal predictions), the residuals are calculated by subtracting the predictors from the original blocks.
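The temporal prediction path described above can be illustrated with a minimal sketch. It assumes an exhaustive integer-pel search and a sum of absolute differences (SAD) as the similarity measure; neither choice is specified in the description, and the function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def crop(frame, x, y, size):
    """Extract a size x size block whose top left sample is at (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def motion_search(cur_block, ref_frame, x0, y0, size, search_range):
    """Full search over integer displacements; returns (motion vector, residual)."""
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if 0 <= x <= width - size and 0 <= y <= height - size:
                cost = sad(cur_block, crop(ref_frame, x, y, size))
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    mv = best[1]
    pred = crop(ref_frame, x0 + mv[0], y0 + mv[1], size)
    # Residual = original block minus predictor, as in both prediction families.
    residual = [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur_block, pred)]
    return mv, residual


# Synthetic reference frame; the current block is a copy of the region at (3, 2).
ref = [[(3 * x + 7 * y) % 31 for x in range(8)] for y in range(8)]
cur = crop(ref, 3, 2, 2)
mv, residual = motion_search(cur, ref, x0=2, y0=2, size=2, search_range=2)
```

Here the block located at (2, 2) in the current image is found one sample to the right in the reference image, so the residual is zero everywhere. A practical encoder would typically use faster search patterns and sub-pixel refinement.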
[0283] In the intra-frame prediction implemented by module 403, the prediction direction is encoded. In the inter-frame prediction implemented by modules 404, 405, 416, 418, and 417, at least one motion vector, or data used to identify such a motion vector, is encoded for the temporal prediction.
[0284] If inter-frame prediction is selected, information related to motion vectors and residual blocks is encoded. To further reduce the bit rate, assuming the motion is homogeneous, the motion vectors are encoded by the difference relative to the motion vector predictors. Motion vector predictors are obtained from the set of motion information predictor candidates by the motion vector prediction and encoding module 417 from the motion vector field 418.
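As an illustration of coding a motion vector as a difference relative to a predictor, the following sketch picks the candidate that minimizes the magnitude of the difference. The cost measure (sum of absolute components) and the function names are assumptions made for the example; an actual encoder would use a rate-distortion cost.

```python
def choose_predictor(mv, candidates):
    """Return (index, mvd) for the candidate giving the smallest difference."""
    def cost(c):
        return abs(mv[0] - c[0]) + abs(mv[1] - c[1])

    idx = min(range(len(candidates)), key=lambda i: cost(candidates[i]))
    pred = candidates[idx]
    return idx, (mv[0] - pred[0], mv[1] - pred[1])


def reconstruct_mv(idx, mvd, candidates):
    """Decoder side: predictor selected by the index plus the decoded difference."""
    pred = candidates[idx]
    return (pred[0] + mvd[0], pred[1] + mvd[1])


candidates = [(4, -2), (12, 3)]          # hypothetical predictor candidates
idx, mvd = choose_predictor((13, 2), candidates)
```

Only the index and the small difference need to be written to the bitstream; the decoder derives the same candidate set and adds the difference back.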
[0285] The encoder 400 also includes a selection module 406, which selects the encoding mode by applying an encoding cost criterion (such as a rate-distortion criterion). To further reduce redundancy, a transform module 407 applies a transform (such as DCT) to the residual block, and then the obtained transform data is quantized by a quantization module 408 and entropy encoded by an entropy encoding module 409. Finally, the encoded residual block of the current block being encoded is inserted into the bit stream 410.
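The mode selection performed by module 406 can be sketched as follows, assuming the commonly used Lagrangian form J = D + λ·R for the rate-distortion criterion. The description only names the criterion, so the cost form, the mode names, and the numbers below are illustrative assumptions.

```python
def select_mode(candidates, lam):
    """Pick the mode with the smallest Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda m: m["distortion"] + lam * m["bits"])


# Hypothetical per-mode measurements for one block.
modes = [
    {"name": "intra", "distortion": 900.0, "bits": 40},
    {"name": "inter", "distortion": 400.0, "bits": 95},
    {"name": "merge", "distortion": 520.0, "bits": 12},
    {"name": "skip", "distortion": 1500.0, "bits": 2},
]
best = select_mode(modes, lam=10.0)
```

A small λ favors low distortion (here the inter mode), while a large λ favors cheap signaling (here the skip mode).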
[0286] Encoder 400 also decodes the encoded image to produce a reference image (e.g., one of the reference images 416) for motion estimation of subsequent images. This allows the encoder and the decoder receiving the bitstream to have the same reference frames (using the reconstructed image or a portion of it). Inverse quantization (“dequantization”) module 411 performs inverse quantization of the quantized data, followed by an inverse transform by inverse transform module 412. Intra-frame prediction module 413 uses the prediction information to determine which predictor to use for a given block, and motion compensation module 414 adds the residual obtained by module 412 to the reference region obtained from reference image set 416.
[0287] Then, module 415 applies post-filtering to filter the reconstructed pixel frames (image or image portion). In embodiments of the invention, a SAO loop filter is used, wherein a compensation offset is added to the pixel values of the reconstructed pixels in the reconstructed image. It should be understood that post-filtering is not always necessary. Furthermore, any other type of post-filtering may be performed in addition to or instead of SAO loop filtering.
[0288] Figure 5 shows a block diagram of a decoder 60 according to an embodiment of the present invention, which can be used to receive data from an encoder. The decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions executed by the CPU 311 of the device 300, a corresponding step of the method implemented by the decoder 60.
[0289] Decoder 60 receives a bitstream 61 comprising coding units (e.g., data corresponding to blocks or coding units), each coding unit consisting of a header containing information on the encoding parameters and a body containing the encoded video data. As explained with respect to Figure 4, the encoded video data is entropy encoded, and the index of the motion vector predictor is encoded, for a given block, on a predetermined number of bits. The received encoded video data is entropy decoded by module 62. The residual data is then dequantized by module 63, and an inverse transform is subsequently applied by module 64 to obtain the pixel values.
[0290] The mode data indicating the coding mode is also entropy decoded, and based on this mode, intra-frame type decoding or inter-frame type decoding is performed on the encoded blocks (units / sets / groups) of image data.
[0291] In intra-frame mode, the intra-frame inverse prediction module 65 determines the intra-frame predictor based on the intra-frame prediction mode specified in the bitstream.
[0292] If the mode is inter-frame, motion prediction information is extracted from the bitstream to locate (identify) the reference region used by the encoder. The motion prediction information includes the reference frame index and motion vector residuals. Motion vector predictors are added to the motion vector residuals by the motion vector decoding module 70 to obtain the motion vectors.
[0293] Motion vector decoding module 70 applies motion vector decoding to each current block encoded by motion prediction. Once the index of the motion vector predictor for the current block has been obtained, the actual value of the motion vector associated with the current block can be decoded, and this actual value is used to apply motion compensation via module 66. A portion of the reference image indicated by the decoded motion vectors is extracted from the reference image 68 to apply motion compensation 66. The motion vector field data 71 is updated using the decoded motion vectors for subsequent motion vector prediction.
[0294] Finally, the decoded block is obtained. Where appropriate, post-filtering is applied by post-filtering module 67. Decoder 60 finally provides the decoded video signal 69.
[0295] CABAC
[0296] HEVC uses several types of entropy coding, such as Context-Based Adaptive Binary Arithmetic Coding (CABAC), Golomb-Rice codes, or a simple binary representation called fixed-length coding. Most of the time, a binarization process is performed to represent the different syntax elements. This binarization process is also very specific and depends on the syntax element concerned. Arithmetic coding represents syntax elements according to their current probabilities. CABAC is an extension of arithmetic coding that separates the probabilities of a syntax element according to a “context” defined by a context variable. This corresponds to a conditional probability. The context variable can be derived from the values of the current syntax element for the already decoded top left block (A2 in Figure 6b, as described in more detail below) and upper left block (B3 in Figure 6b).
[0297] CABAC has been adopted as a normative part of the H.264 / AVC and H.265 / HEVC standards. In H.264 / AVC, it is one of two alternative entropy coding methods. The other method specified in H.264 / AVC is a low-complexity entropy coding technique based on context-adaptively switched sets of variable-length codes, known as Context-Adaptive Variable-Length Coding (CAVLC). Compared to CABAC, CAVLC offers reduced implementation costs at the price of lower compression efficiency. For TV signals in standard or high-definition resolution, CABAC typically provides bit rate savings of 10%–20% relative to CAVLC at the same objective video quality. In HEVC, CABAC is one of the entropy coding methods used. Many bits are also bypass CABAC coded. In addition, some syntax elements are coded with unary codes or Golomb codes, which are other types of entropy codes.
[0298] Figure 17 shows the main blocks of a CABAC encoder.
[0299] Non-binary input syntax elements are binarized by binarizer 1701. CABAC's encoding strategy is based on the finding that highly efficient encoding of syntax element values (such as components of motion vector differences or transform coefficient levels) in a hybrid block-based video encoder can be achieved by employing a binarization scheme as a preprocessing unit for subsequent stages of context modeling and binary arithmetic coding. Generally, the binarization scheme defines a unique mapping from syntax element values to binary decision sequences (so-called bins), which can also be interpreted in terms of binary coding trees. The binarization scheme in CABAC is designed based on several basic prototypes whose structure enables simple online computation and is suitable for some appropriate model probability distributions.
[0300] The individual bins can be processed in one of two basic ways depending on the setting of switch 1702. When the switch is in the "regular" setting, the bins are fed to the context modeler 1703 and the regular encoding engine 1704. When the switch is in the "bypass" setting, the context modeler is bypassed and the bins are fed to the bypass encoding engine 1705. Another switch 1706 has "regular" and "bypass" settings similar to those of switch 1702, so that the bins encoded by the applicable one of the encoding engines 1704 and 1705 form a bitstream as the output of the CABAC encoder.
[0301] It should be understood that another switch 1706 can be used with the storage device to group some bins (e.g., bins used to encode blocks or coding units) encoded by encoding engine 1705 to provide bypass encoded data blocks in the bit stream, and to group some bins (e.g., bins used to encode blocks or coding units) encoded by encoding engine 1704 to provide another "regular" (or arithmetically) encoded data block in the bit stream. This separate grouping of bypass encoded and regular encoded data can result in improved throughput during decoding processing.
[0302] By decomposing each syntax element value into a sequence of bins, further processing of each bin value in CABAC depends on the associated coding mode decision, which can be either the regular or the bypass mode. The latter is chosen for bins related to sign information or for less significant bins, which are assumed to be uniformly distributed and for which the whole regular binary arithmetic encoding process is therefore simply bypassed. In the regular coding mode, each bin value is encoded using the regular binary arithmetic coding engine, where the associated probability model is either determined by a fixed choice, without any context modeling, or adaptively chosen depending on the related context model. As an important design decision, the latter case is generally applied to the most frequently observed bins only, whereas the other, usually less frequently observed, bins are treated using a joint, typically zero-order, probability model. In this way, CABAC enables selective context modeling at the sub-symbol level and hence provides an efficient instrument for exploiting inter-symbol redundancies at significantly reduced overall modeling or learning costs. For the specific choice of context models, four basic design types are employed in CABAC, two of which are applied to the coding of transform coefficient levels only. The design of these four prototypes is based on a priori knowledge of the typical characteristics of the source data to be modeled and reflects the aim of finding a good compromise between the conflicting objectives of avoiding unnecessary modeling costs and exploiting the statistical dependencies to a large extent.
[0303] At the lowest level of processing in CABAC, each bin value enters the binary arithmetic encoder in either the regular or the bypass coding mode. For the latter, a fast branch of the coding engine with considerably reduced complexity is used, while for the former, the encoding of a given bin value depends on the actual state of the associated adaptive probability model that is passed along with the bin value to the M-coder (the term chosen for the table-based adaptive binary arithmetic coding engine in CABAC).
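The difference between the regular and bypass paths can be illustrated by estimating ideal code lengths. The sketch below is a toy model only: it replaces the table-based state machine of the M-coder with a floating-point probability estimate updated at an assumed rate of 1/16, and it charges exactly one bit per bypass bin. The class and function names are hypothetical.

```python
import math


class ContextModel:
    """Toy adaptive estimate of P(bin == 1); not the table-driven CABAC state machine."""

    def __init__(self, p_one=0.5, rate=1 / 16):
        self.p_one = p_one
        self.rate = rate

    def cost_and_update(self, bin_value):
        """Return the ideal cost in bits of this bin, then adapt the estimate."""
        p = self.p_one if bin_value else 1.0 - self.p_one
        bits = -math.log2(p)
        target = 1.0 if bin_value else 0.0
        self.p_one += self.rate * (target - self.p_one)
        return bits


def estimate_bits(bins, use_bypass, context=None):
    """Bypass bins cost one bit each; regular bins cost -log2 of their modeled probability."""
    if use_bypass:
        return float(len(bins))
    return sum(context.cost_and_update(b) for b in bins)


skewed = [1] * 90 + [0] * 10             # strongly non-uniform bin sequence
regular_bits = estimate_bits(skewed, use_bypass=False, context=ContextModel())
bypass_bits = estimate_bits(skewed, use_bypass=True)
```

For a skewed sequence the adaptive model spends far fewer bits than the bypass path, whereas for bins that are close to uniformly distributed the regular path brings little or no gain and only adds modeling cost.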
[0304] Inter-frame coding
[0305] HEVC uses three different inter-frame modes: inter-frame mode (Advanced Motion Vector Prediction (AMVP)), "classic" merge mode (i.e., "non-affine merge mode", also called "regular" merge mode), and "classic" merge skip mode (i.e., "non-affine merge skip mode", also called "regular" merge skip mode). The main difference between these modes is the data signaled in the bitstream. For motion vector coding, the current HEVC standard includes a competition-based scheme for motion vector prediction, which was not present in earlier versions of the standard. This means that several candidates compete on the encoder side, under a rate-distortion criterion, to find the best motion vector predictor or the best motion information for the inter-frame mode or a merge mode (i.e., "classic / regular" merge mode or "classic / regular" merge skip mode), respectively. An index corresponding to the best predictor or best motion information candidate is then inserted into the bitstream. The decoder can derive the same set of predictors or candidates and uses the best one according to the decoded index. In the screen content extension of HEVC, a new coding tool called Intra Block Copy (IBC) is signaled as any one of these three inter-frame modes; IBC is distinguished from the equivalent inter-frame mode by checking whether the reference frame is the current frame. This can be done, for example, by checking the reference index in list L0 and inferring Intra Block Copy if it is the last frame in that list. Another approach is to compare the picture order counts of the current frame and the reference frame: if they are equal, it is an Intra Block Copy.
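The two ways of recognizing Intra Block Copy mentioned above can be sketched as follows. The function names and the representation of list L0 as a plain list of picture order counts are assumptions made for illustration.

```python
def is_ibc_by_poc(current_poc, reference_poc):
    """IBC is inferred when the reference picture is the current picture."""
    return current_poc == reference_poc


def is_ibc_by_ref_index(ref_idx, list_l0):
    """IBC is inferred when the reference index points to the last entry of list L0."""
    return ref_idx == len(list_l0) - 1


list_l0 = [8, 4, 16]                      # hypothetical POCs; the current picture (POC 16) is last
uses_ibc = is_ibc_by_poc(16, list_l0[2]) and is_ibc_by_ref_index(2, list_l0)
```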
[0306] The design of predictor and candidate derivations is important in achieving optimal coding efficiency without disproportionately impacting complexity. In HEVC, two motion vector derivations are used: one for inter-frame mode (Advanced Motion Vector Prediction (AMVP)) and one for merge mode (merge derivation processing for classic merge mode and classic merge skip mode). These processes are described below.
[0307] Figures 6a and 6b illustrate spatial and temporal blocks that can be used to generate motion vector predictors in the Advanced Motion Vector Prediction (AMVP) and merge modes of HEVC encoding and decoding systems, and Figure 7 shows simplified steps of the process for deriving the AMVP predictor set.
[0308] As shown in Figure 6a, two spatial predictors (i.e., two spatial motion vectors for AMVP mode) are selected from the motion vectors of the top blocks (indicated by the letter "B") and the left blocks (indicated by the letter "A"), which include the top corner block (block B2) and the left corner block (block A0), and one temporal predictor is selected from the motion vectors of the bottom right block (H) and the center block of the collocated block.
[0309] Table 1 below outlines the nomenclature used when referring to blocks relative to the current block, as depicted in Figures 6a and 6b. This nomenclature is used as shorthand, but it should be understood that other labeling systems may be used, particularly in future versions of the standard.
[0310]
[0311]
[0312] Table 1
[0313] It should be noted that the “current block” can be of variable size, such as 4×4, 16×16, 32×32, 64×64, 128×128, or any size in between. The dimensions of a block are preferably powers of 2 (i.e., 2^n × 2^m, where n and m are positive integers), as this results in more efficient use of bits when using binary encoding. The current block does not need to be square, although this is often a preferred embodiment for reasons of encoding complexity.
[0314] Turning to Figure 7, a first step aims at selecting a first spatial predictor (Cand 1, 706) from among the bottom left blocks A0 and A1, whose spatial positions are shown in Figure 6a. To this end, these blocks are selected (700, 702) one after another in a given order and, for each selected block, the following conditions are evaluated (704) in the given order, the first block for which a condition is satisfied being set as the predictor:
[0315] - Motion vectors from the same reference list and the same reference image;
[0316] - Motion vectors from other reference lists and the same reference images;
[0317] - Scaled motion vectors from the same reference list and different reference images; or
[0318] - Scaled motion vectors from other reference lists and different reference images.
[0319] If no value is found, the left-hand predictor is considered unavailable. In this case, it indicates that the relevant blocks were intra-coded or that these blocks do not exist.
[0320] A following step aims at selecting a second spatial predictor (Cand 2, 716) from among the upper right block B0, the above block B1, and the upper left block B2, whose spatial positions are shown in Figure 6a. To this end, these blocks are selected (708, 710, 712) one after another in a given order and, for each selected block, the above conditions are evaluated (714) in the given order, the first block that satisfies the conditions being set as the predictor.
[0321] Similarly, if no value is found, the top predictor is considered unavailable. In this case, it indicates that the relevant blocks were intra-coded or that these blocks do not exist.
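The selection of a spatial predictor described above can be sketched as follows. The data layout (a dictionary per neighboring block, keyed by reference list), the use of picture order counts to identify reference images, and the linear scaling by temporal distance are all assumptions made for illustration; the exact scaling arithmetic is not given in the description.

```python
def scale_mv(mv, cur_poc, target_ref_poc, cand_ref_poc):
    """Linear scaling by temporal distance (illustrative; assumes cand_ref_poc != cur_poc)."""
    num = cur_poc - target_ref_poc
    den = cur_poc - cand_ref_poc
    return (round(mv[0] * num / den), round(mv[1] * num / den))


def select_spatial_predictor(blocks, cur_poc, target_list, target_ref_poc):
    """Visit blocks in the given order; None marks a missing or intra-coded block."""
    other_list = "L1" if target_list == "L0" else "L0"
    for block in blocks:
        if block is None:
            continue
        same = block.get(target_list)
        other = block.get(other_list)
        # 1) Same reference list, same reference image.
        if same and same["ref_poc"] == target_ref_poc:
            return same["mv"]
        # 2) Other reference list, same reference image.
        if other and other["ref_poc"] == target_ref_poc:
            return other["mv"]
        # 3) Same reference list, different reference image (scaled).
        if same:
            return scale_mv(same["mv"], cur_poc, target_ref_poc, same["ref_poc"])
        # 4) Other reference list, different reference image (scaled).
        if other:
            return scale_mv(other["mv"], cur_poc, target_ref_poc, other["ref_poc"])
    return None  # predictor considered unavailable


# Left candidates in order A0, A1: A0 is unavailable, A1 points to another reference image.
left_blocks = [None, {"L0": {"mv": (6, -4), "ref_poc": 0}}]
cand1 = select_spatial_predictor(left_blocks, cur_poc=8, target_list="L0", target_ref_poc=4)
```

The same function can be called with the blocks B0, B1, and B2 to obtain the second spatial predictor.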
[0322] In the next step (718), if both predictors are available, they are compared with each other, and one of them is removed if they are equal (i.e., same motion vector values, same reference list, same reference index, and same direction type). If only one spatial predictor is available, the algorithm searches for a temporal predictor in the next step.
[0323] The temporal motion predictor (Cand 3, 726) is derived as follows: the bottom right (H, 720) position of the collocated block in the previous / reference frame is first considered in the availability check module 722. If it does not exist or if the motion vector predictor is unavailable, the center (Centre, 724) of the collocated block is selected for checking. These temporal positions (Centre and H) are depicted in Figure 6a. In any case, scaling (723) is applied to these candidates to match the temporal distance between the current frame and the first frame in the reference list.
[0324] The motion predictor value is then added to the set of predictors. Next, the number of predictors (Nb_Cand) is compared with the maximum number of predictors (Max_Cand) (728). As mentioned above, in the current version of the HEVC standard, the maximum number (Max_Cand) of motion vector predictors that the AMVP derivation process needs to generate is two.
[0325] If the maximum number is reached, the final list or set of AMVP predictors is constructed (732). Otherwise, a zero predictor is added to the list (730). A zero predictor is a motion vector equal to (0, 0).
[0326] As shown in Figure 7, the final list or set of AMVP predictors (732) is constructed from a subset of spatial motion predictor candidates (700 to 712) and a subset of temporal motion predictor candidates (720, 724).
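The construction of the final AMVP predictor list can be sketched as follows. For brevity, a predictor is represented only by its motion vector, so the equality test ignores the reference list, reference index, and direction type mentioned above; this simplification and the function name are assumptions.

```python
MAX_CAND = 2  # maximum number of AMVP predictors


def build_amvp_list(left, above, temporal):
    """Each argument is a motion vector tuple, or None if the predictor is unavailable."""
    cands = []
    for spatial in (left, above):
        # Step 718: keep a spatial predictor only if it is available and not a duplicate.
        if spatial is not None and spatial not in cands:
            cands.append(spatial)
    # The temporal predictor is only needed when fewer than two spatial predictors remain.
    if len(cands) < MAX_CAND and temporal is not None:
        cands.append(temporal)
    # Step 730: pad with zero predictors until the maximum number is reached.
    while len(cands) < MAX_CAND:
        cands.append((0, 0))
    return cands[:MAX_CAND]


amvp = build_amvp_list(left=(1, 2), above=(1, 2), temporal=(5, 5))
```

In this example the duplicate spatial predictor is removed, so the temporal predictor fills the second position.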
[0327] As mentioned above, a motion predictor candidate in the classic merge mode or classic merge skip mode represents all the necessary motion information: direction, list, reference frame index, and motion vectors. An indexed list of several candidates is generated by the merge derivation process. In the current HEVC design, the maximum number of candidates for these two merge modes (i.e., classic merge mode and classic merge skip mode) is equal to five (four spatial candidates and one temporal candidate).
[0328] Figure 8 is a schematic diagram of the motion vector derivation process for the merge modes (classic merge mode and classic merge skip mode). In the first step of the derivation process, five block positions (800 to 808) are considered. These positions are the spatial positions depicted in Figure 6a with the references A1, B1, B0, A0, and B2. In the next step, the availability of the spatial motion vectors is checked, and at most five motion vectors are selected / obtained for consideration (810). A predictor is considered available if it exists and if the block is not intra-coded. Therefore, the motion vectors corresponding to the five blocks are selected as candidates according to the following conditions:
[0329] If the "left" A1 motion vector (800) is available (810), that is, if it exists and if the block is not intra-coded, the motion vector of the "left" block is selected and used as the first candidate in the candidate list (814);
[0330] If the "above" B1 motion vector (802) is available (810), then the candidate "above" block motion vector is compared with the "left" A1 motion vector (if it exists) (812). If the B1 motion vector is equal to the A1 motion vector, then B1 is not added to the list of spatial candidates (814). Conversely, if the B1 motion vector is not equal to the A1 motion vector, then B1 is added to the list of spatial candidates (814).
[0331] If the "upper right" B0 motion vector (804) is available (810), then the motion vector of the "upper right" is compared with the B1 motion vector (812). If the B0 motion vector is equal to the B1 motion vector, then the B0 motion vector is not added to the list of spatial candidates (814). Conversely, if the B0 motion vector is not equal to the B1 motion vector, then the B0 motion vector is added to the list of spatial candidates (814).
[0332] If the "bottom left" A0 motion vector (806) is available (810), then the "bottom left" motion vector is compared with the A1 motion vector (812). If the A0 motion vector equals the A1 motion vector, then the A0 motion vector is not added to the list of spatial candidates (814). Conversely, if the A0 motion vector is not equal to the A1 motion vector, then the A0 motion vector is added to the list of spatial candidates (814); and
[0333] If the list of spatial candidates does not contain four candidates, the availability of the "upper left" motion vector B2 (808) is checked (810). If available, it is compared with motion vectors A1 and B1. If motion vector B2 is equal to either motion vector A1 or B1, it is not added to the list of spatial candidates (814). Conversely, if motion vector B2 is not equal to either motion vector A1 or B1, it is added to the list of spatial candidates (814).
[0334] At the end of this stage, the list of spatial candidates includes up to four candidates.
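The conditions above can be sketched as follows. Each neighboring block is represented by a hashable tuple standing for its complete motion information, with None marking a block that does not exist or is intra-coded; this representation and the function name are assumptions made for illustration.

```python
def spatial_merge_candidates(a1, b1, b0, a0, b2):
    """Build the list of spatial merge candidates with the pairwise comparisons described above."""
    cands = []
    if a1 is not None:                       # "left" block
        cands.append(a1)
    if b1 is not None and b1 != a1:          # "above" block, compared with A1
        cands.append(b1)
    if b0 is not None and b0 != b1:          # "upper right" block, compared with B1
        cands.append(b0)
    if a0 is not None and a0 != a1:          # "bottom left" block, compared with A1
        cands.append(a0)
    # B2 is only considered when fewer than four candidates have been found.
    if len(cands) < 4 and b2 is not None and b2 != a1 and b2 != b1:
        cands.append(b2)
    return cands


spatial = spatial_merge_candidates((1, 0), (1, 0), (2, 0), (3, 0), (4, 0))
```

In this example B1 duplicates A1 and is dropped, which leaves room for B2 as the fourth candidate.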
[0335] For the temporal candidate, two positions can be used: the bottom right position of the collocated block (816, denoted H in Figure 6a) and the center of the collocated block (818). These positions are depicted in Figure 6a.
[0336] As described in relation to Figure 7 for the temporal motion predictor of the AMVP motion vector derivation process, a first step aims at checking (820) the availability of the block at position H. Next, if that block is unavailable, the availability of the block at the center position is checked (820). If at least one motion vector at these positions is available, the temporal motion vector can be scaled, if needed, to the reference frame with index 0, for both lists L0 and L1 (822), to create a temporal candidate (824) that is added to the list of merge motion vector predictor candidates. This temporal candidate is placed after the spatial candidates in the list. Lists L0 and L1 are two reference frame lists containing zero, one, or more reference frames.
[0337] If the number of candidates (Nb_Cand) is strictly less than the maximum number of candidates (Max_Cand, a value signaled in the bitstream header and equal to five in the current HEVC design) (826), and if the current frame is of type B, combined candidates are generated (828). Combined candidates are generated based on the available candidates in the list of merge motion vector predictor candidates. This mainly consists in combining (pairing) the motion information of one candidate for list L0 with the motion information of one candidate for list L1.
[0338] If the number of candidates (Nb_Cand) remains strictly less than the maximum number of candidates (Max_Cand) (830), zero motion candidates are generated (832) until the number of candidates in the list of merge motion vector predictor candidates reaches the maximum number of candidates.
[0339] At the end of this process, the list or set of merge motion vector predictor candidates (i.e., the list or set of candidates for the merge modes, namely classic merge mode and classic merge skip mode) is constructed (834). As shown in Figure 8, the list or set of merge motion vector predictor candidates is constructed (834) from a subset of spatial candidates (800 to 808) and a subset of temporal candidates (816, 818).
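The completion of the merge candidate list can be sketched as follows. Each candidate is a dictionary holding the motion vector for list L0 and for list L1 (None when a list is unused). The order in which pairs are combined and the form of the zero candidate are simplifying assumptions; reference indexes are omitted.

```python
def complete_merge_list(spatial, temporal, max_cand=5, is_b_frame=True):
    """Append the temporal candidate, then combined candidates, then zero candidates."""
    cands = list(spatial)
    if temporal is not None:
        cands.append(temporal)               # placed after the spatial candidates

    if is_b_frame and len(cands) < max_cand:
        originals = list(cands)
        pairs = [(c0, c1) for c0 in originals for c1 in originals if c0 is not c1]
        for c0, c1 in pairs:
            if len(cands) >= max_cand:
                break
            # Pair the L0 motion of one candidate with the L1 motion of another.
            if c0["L0"] is not None and c1["L1"] is not None:
                combined = {"L0": c0["L0"], "L1": c1["L1"]}
                if combined not in cands:
                    cands.append(combined)

    zero = {"L0": (0, 0), "L1": (0, 0) if is_b_frame else None}
    while len(cands) < max_cand:
        cands.append(dict(zero))             # zero motion candidates fill the list
    return cands


spatial = [{"L0": (1, 1), "L1": None}, {"L0": None, "L1": (2, 2)}]
merge_list = complete_merge_list(spatial, temporal=None)
```

With two uni-directional spatial candidates and no temporal candidate, one combined bi-directional candidate is created and the remaining two positions are filled with zero candidates.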
[0340] Alternative Temporal Motion Vector Prediction (ATMVP)
[0341] Alternative Temporal Motion Vector Prediction (ATMVP) is a special type of motion compensation. Instead of considering only one item of motion information for the current block from a temporal reference frame, the motion information of each collocated block is considered. This temporal motion vector prediction therefore gives a segmentation of the current block with the related motion information of each sub-block, as depicted in Figure 9.
[0342] In the current VTM reference software, ATMVP is signaled as a merge candidate inserted into the list of merge candidates (i.e., the list or set of candidates for the merge modes (classic merge mode and classic merge skip mode)). When ATMVP is enabled at the SPS level, the maximum number of merge candidates is increased by one. Six candidates are therefore considered, instead of five when this ATMVP mode is disabled.
[0343] In addition, when this prediction is enabled at the SPS level, all bins of the merge index (i.e., the identifier or index used to identify a candidate from the list of merge candidates) are context coded by CABAC. In HEVC, or when ATMVP is not enabled at the SPS level in JEM, only the first bin is context coded and the remaining bins are bypass coded (i.e., bypass CABAC coded). Figure 10(a) shows the coding of the merge index in HEVC, or when ATMVP is not enabled at the SPS level in JEM. This corresponds to a unary max code. In addition, the first bit is CABAC coded and the other bits are bypass CABAC coded.
[0344] Figure 10(b) shows the coding of the merge index when ATMVP is enabled at the SPS level. In this case, all bits are CABAC coded (bits 1 through 5). It should be noted that each bit used to code the index has its own context; in other words, their probabilities are separate.
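The two merge index coding schemes can be sketched as follows. The bin polarity (a run of ones terminated by a zero) and the context labels are assumptions made for illustration; the figures themselves are not reproduced here.

```python
def unary_max_bins(index, max_index):
    """Unary max (truncated unary) binarization: 'index' ones, then a zero unless index == max_index."""
    bins = [1] * index
    if index < max_index:
        bins.append(0)
    return bins


def merge_index_bin_modes(index, num_candidates, atmvp_enabled):
    """Pair each bin with the way it is coded: a context label or 'bypass'."""
    bins = unary_max_bins(index, num_candidates - 1)
    modes = []
    for position, bin_value in enumerate(bins):
        if atmvp_enabled:
            modes.append((bin_value, f"ctx{position}"))   # one separate context per bin
        elif position == 0:
            modes.append((bin_value, "ctx0"))             # only the first bin is context coded
        else:
            modes.append((bin_value, "bypass"))
    return modes


hevc_style = merge_index_bin_modes(2, num_candidates=5, atmvp_enabled=False)
atmvp_style = merge_index_bin_modes(5, num_candidates=6, atmvp_enabled=True)
```

With six candidates the longest codeword has five bins, which corresponds to the five context coded bits mentioned for the ATMVP-enabled case.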
[0345] Affine mode
[0346] In HEVC, only the translational motion model is applied to motion compensation prediction (MCP). However, in the real world, there are many types of motion, such as zooming in / out, rotation, perspective motion, and other irregular motions.
[0347] In JEM, a simplified affine transform motion compensation prediction is applied. The general principle of the affine mode is described below, based on an extract of document JVET-G1001 presented at the JVET meeting in Torino, July 13-21, 2017. This document is incorporated herein by reference in its entirety insofar as it describes other algorithms used in JEM.
[0348] As shown in Figure 11(a), the affine motion field of the block is described by two control point motion vectors.
[0349] The motion vector field (MVF) of the block is described by the following equation:
[0350]
[0351] Where (v0x, v0y) is the motion vector of the top left corner control point, (v1x, v1y) is the motion vector of the top right corner control point, and w is the width of block Cur (the current block).
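For reference, the four-parameter affine motion field of JVET-G1001, written with the symbols defined above, can be sketched as follows. It is given here as an assumption about the form of Equation 1; (x, y) denotes the position of a sample relative to the top left sample of the block, and (v_x, v_y) the motion vector at that position.

```latex
\begin{cases}
v_x = \dfrac{v_{1x} - v_{0x}}{w}\,x \;-\; \dfrac{v_{1y} - v_{0y}}{w}\,y \;+\; v_{0x} \\[2ex]
v_y = \dfrac{v_{1y} - v_{0y}}{w}\,x \;+\; \dfrac{v_{1x} - v_{0x}}{w}\,y \;+\; v_{0y}
\end{cases}
```

At (x, y) = (0, 0) this form returns (v0x, v0y), and at (x, y) = (w, 0) it returns (v1x, v1y), which is consistent with the two control points.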
[0352] To further simplify the motion compensation prediction, sub-block based affine transform prediction is applied. The sub-block size M×N is derived as in Equation 2, where MvPre is the motion vector fractional accuracy (1 / 16 in JEM) and (v2x, v2y) is the motion vector of the bottom left control point, calculated according to Equation 1.
[0353]
[0354] After being derived by Equation 2, M and N are adjusted downwards if necessary so that they are divisors of w and h, respectively, where h is the height of the current block Cur.
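The downward adjustment can be sketched as follows. The function name is hypothetical, and the search simply decrements the size until it divides the block dimension, which is one straightforward way of performing the adjustment.

```python
def adjust_to_divisor(size, extent):
    """Reduce 'size' until it divides 'extent' (the block width w or height h)."""
    while size > 1 and extent % size != 0:
        size -= 1
    return size


m = adjust_to_divisor(6, 16)   # a width of 6 does not divide 16, so it is reduced
n = adjust_to_divisor(8, 16)   # 8 already divides 16 and is kept
```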
[0355] To derive the motion vector of each M×N sub-block, the motion vector of the center sample of each sub-block, as shown in Figure 6a, is calculated according to Equation 1 and rounded to 1 / 16 fractional accuracy. Motion compensation interpolation filters are then applied to generate the prediction of each sub-block with the derived motion vector.
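The derivation of one motion vector per sub-block can be sketched as follows. The sketch assumes the four-parameter affine model of JVET-G1001 for Equation 1 (restated in the code comment), takes the center of a sub-block as its top left position plus half its size, and expresses motion vectors in pixel units; these are illustrative assumptions.

```python
def affine_mv(x, y, v0, v1, w):
    """Assumed form of Equation 1:
    vx = (v1x - v0x) / w * x - (v1y - v0y) / w * y + v0x
    vy = (v1y - v0y) / w * x + (v1x - v0x) / w * y + v0y
    """
    a = (v1[0] - v0[0]) / w
    b = (v1[1] - v0[1]) / w
    return (a * x - b * y + v0[0], b * x + a * y + v0[1])


def round_to_sixteenth(value):
    """Round to 1/16 fractional accuracy."""
    return round(value * 16) / 16


def subblock_mvs(v0, v1, w, h, m, n):
    """Return {(x0, y0): (vx, vy)} with one motion vector per M x N sub-block."""
    field = {}
    for y0 in range(0, h, n):
        for x0 in range(0, w, m):
            center_x, center_y = x0 + m / 2, y0 + n / 2
            vx, vy = affine_mv(center_x, center_y, v0, v1, w)
            field[(x0, y0)] = (round_to_sixteenth(vx), round_to_sixteenth(vy))
    return field


# 16x16 block split into 4x4 sub-blocks; the control points describe a pure zoom.
field = subblock_mvs(v0=(0.0, 0.0), v1=(4.0, 0.0), w=16, h=16, m=4, n=4)
```

The resulting field holds 16 motion vectors whose magnitude grows with the distance from the top left corner, as expected for a zoom.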
[0356] Affine mode is a motion compensation mode similar to the inter modes (AMVP, "classic" merge, "classic" merge skip). Its principle is to generate motion information for each pixel based on two or three pieces of adjacent motion information. In JEM, as depicted in Figure 11(a)/(b), the affine mode derives one piece of motion information for each 4×4 block (each square is a 4×4 block, and the entire block in Figure 11(a)/(b) is a 16×16 block divided into 16 such 4×4 squares, each with an associated motion vector). It should be understood that, in embodiments of the invention, the affine mode can derive motion information for blocks of different sizes or shapes, as long as that motion information can be derived. The affine mode is enabled by means of a flag, which is available for the AMVP mode and the merge modes (i.e., the classic merge mode, also known as "non-affine merge mode", and the classic merge skip mode, also known as "non-affine merge skip mode"). This flag is CABAC coded. In embodiments, the context depends on the sum of the affine flags of the left block (position A2 in Figure 6b) and the above-left block (position B3 in Figure 6b).
[0357] Therefore, three context variables (0, 1, or 2) are possible for the affine flag in JEM, given by the following formula:
[0358] Ctx=IsAffine(A2)+IsAffine(B3)
[0359] The IsAffine(block) function returns 0 if the block is not an affine block and 1 if the block is an affine block.
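The context derivation for the affine flag can be sketched as below. Representing a block as a dictionary with an `"affine"` key, and treating an unavailable block (`None`) as non-affine, are assumptions made for illustration only.

```python
def is_affine(block):
    """Return 1 if the block is coded in affine mode, 0 otherwise.

    Assumption: an unavailable neighbor is passed as None and counts as 0.
    """
    return 1 if block is not None and block.get("affine", False) else 0


def affine_flag_ctx(a2, b3):
    """Ctx = IsAffine(A2) + IsAffine(B3), giving 0, 1, or 2."""
    return is_affine(a2) + is_affine(b3)
```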
[0360] Affine Merge Candidate Derivation
[0361] In JEM, the affine merge mode (or affine merge skip mode) derives the motion information of the current block from the first affine adjacent block (i.e., the first adjacent block coded using the affine mode) among the blocks at positions A1, B1, B0, A0, and B2. These positions are depicted in Figures 6a and 6b. However, how the affine parameters are derived is not fully defined, and the present invention aims to improve at least this aspect, for example by defining the affine parameters of the affine merge mode so that a wider selection of affine merge candidates becomes possible (i.e., not only the first affine adjacent block but also at least one other candidate can be selected using an identifier such as an index).
[0362] For example, according to some embodiments of the invention, a block is encoded or decoded using an affine merge mode that has its own list of affine merge candidates (candidates for deriving/obtaining the motion information of the affine mode) and an affine merge index (for identifying one affine merge candidate from the list of affine merge candidates).
[0363] Signaling of Affine Merge
[0364] Figure 12 is a flowchart of a partial decoding process for some syntax elements related to the coding mode, used to signal the use of the affine merge mode. In this figure, the skip flag (1201), the prediction mode (1211), the merge flag (1203), the merge index (1208), and the affine flag (1206) can be decoded.
[0365] For every CU in an inter slice, the skip flag is decoded (1201). If the CU is not skipped (1202), the pred mode (prediction mode) is decoded (1211). This syntax element indicates whether the current CU is encoded (and is to be decoded) in inter mode or intra mode. Note that if the CU is skipped (1202), its current mode is the inter mode. If the CU is not skipped (1202: no), the CU is encoded in AMVP or merge mode. If the CU is in inter mode (1212), the merge flag is decoded (1203). If the CU is merged (1204) or skipped (1202: yes), it is checked (1205) whether the affine flag (1206) needs to be decoded, i.e., it is determined at (1205) whether the current CU might have been encoded in affine mode. The flag is decoded if the current CU is a 2N×2N CU, which in the current VVC means that the height and width of the CU are equal. Furthermore, at least one adjacent CU among A1, B1, B0, A0, and B2 must be encoded in an affine mode (affine merge mode or AMVP mode with affine mode enabled). Finally, the current CU must not be a 4×4 CU, although 4×4 CUs are disabled by default in the VTM reference software. If condition (1205) is false, the current CU is determined to be encoded in the classic merge mode (or classic merge skip mode) as specified in HEVC, and the merge index is decoded (1208). If the affine flag (1206) is equal to 1 (1207), the CU is a merge affine CU (i.e., a CU encoded in affine merge mode) or a merge skip affine CU (i.e., a CU encoded in affine merge skip mode), and the merge index (1208) does not need to be decoded (because the affine merge mode is used, i.e., the CU is decoded in affine mode using the first affine adjacent block). Otherwise, the current CU is a classic (basic) merge or merge skip CU (i.e., a CU encoded in classic merge or merge skip mode), and the merge index is decoded (1208).
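The parsing order of Figure 12 can be summarized by the following sketch. It is not a CABAC decoder: `values` is a hypothetical iterator that yields already entropy-decoded syntax values in bitstream order, and the function and argument names are illustrative assumptions. The numbers in the comments refer to the steps of Figure 12.

```python
def parse_cu(values, is_square, has_affine_neighbor, is_4x4=False):
    """Decode the mode-related syntax elements of one CU in an inter slice."""
    cu = {}
    cu["skip"] = bool(next(values))                    # 1201
    if cu["skip"]:                                     # 1202: yes
        cu["pred_mode"] = "inter"                      # skipped CUs are inter
    else:
        cu["pred_mode"] = next(values)                 # 1211
        if cu["pred_mode"] != "inter":                 # 1212
            return cu
    # A skipped CU is implicitly merged; otherwise read the merge flag.
    cu["merge"] = True if cu["skip"] else bool(next(values))   # 1203
    if not cu["merge"]:                                # 1204
        return cu                                      # AMVP path, not shown
    # 1205: square CU, at least one affine neighbor, and not a 4x4 CU.
    affine_allowed = is_square and has_affine_neighbor and not is_4x4
    cu["affine"] = bool(next(values)) if affine_allowed else False  # 1206
    if not cu["affine"]:                               # 1207
        cu["merge_index"] = next(values)               # 1208
    return cu
```

For instance, `parse_cu(iter([1, 1]), True, True)` models a skipped CU whose affine flag is 1, so no merge index is read.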
[0366] In this specification, "signaling" may refer to inserting into (providing/including in), or extracting/obtaining from, a bitstream one or more syntax elements that represent the enabling or disabling of a mode, or other information.
[0367] Merge Candidate Derivation
[0368] Figure 13 is a flowchart illustrating the derivation of merge candidates (i.e., candidates for the classic merge mode or the classic merge skip mode). This derivation builds on the motion vector derivation process of the merge mode shown in Figure 8 (i.e., the merge candidate list derivation of HEVC). The main changes compared to HEVC are the addition of the ATMVP candidate (1319, 1321, 1323), a full redundancy check of candidates (1325), and a new candidate order. The ATMVP prediction is set as a special candidate because it represents several pieces of motion information of the current CU. The value of the first sub-block (top left) is compared with the temporal candidate, and if they are equal, the temporal candidate is not added to the merge list (1320). The ATMVP candidate is not compared with the other spatial candidates. By contrast, the temporal candidate is compared with each spatial candidate already in the list (1325) and is not added to the merge candidate list if it is a redundant candidate.
[0369] When a spatial candidate is added to the list, it is compared with the other spatial candidates already in the list (1312), which is not the case in the final version of HEVC.
[0370] In the current VTM version, the list of merge candidates is set in the following order because it has been determined to provide the best results under coding test conditions:
[0371] ·A1
[0372] ·B1
[0373] ·B0
[0374] ·A0
[0375] ·ATMVP
[0376] ·B2
[0377] ·Temporal
[0378] ·Combined
[0379] ·Zero_MV
[0380] It is important to note that the spatial candidate B2 is set after the ATMVP candidate.
[0381] Additionally, when ATMVP is enabled at the slice level, the maximum number of candidates in the candidate list is 6 instead of 5 for HEVC.
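The candidate order and the redundancy checks described above can be illustrated with the simplified sketch below. Several assumptions are made: each candidate is reduced to a single comparable motion vector tuple, the ATMVP candidate is represented by the motion of its first (top-left) sub-block, combined candidates are omitted, and the list is padded with zero motion vectors. It is not the VTM derivation process.

```python
ORDER = ["A1", "B1", "B0", "A0", "ATMVP", "B2", "Temporal"]


def build_merge_list(cands, atmvp_enabled):
    """cands maps a candidate name to its motion vector, or omits it if unavailable."""
    max_cands = 6 if atmvp_enabled else 5
    out = []
    for name in ORDER:
        mv = cands.get(name)
        if mv is None:
            continue
        if name == "ATMVP":
            if not atmvp_enabled:
                continue  # ATMVP is added without comparison to spatial candidates
        elif name == "Temporal":
            # Compared with everything already listed, including ATMVP's first sub-block.
            if any(mv == other for _, other in out):
                continue
        else:
            # Spatial candidates are compared with the other spatial candidates only.
            if any(mv == other for n, other in out if n != "ATMVP"):
                continue
        out.append((name, mv))
        if len(out) == max_cands:
            break
    while len(out) < max_cands:
        out.append(("Zero_MV", (0, 0)))
    return out
```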
[0382] Exemplary embodiments of the present invention will now be described with reference to Figures 12 to 16 and Figures 22 to 24. It should be noted that, unless explicitly stated otherwise, embodiments may be combined; for example, certain combinations of embodiments may improve coding efficiency while increasing complexity, but this may be acceptable in certain use cases.
[0383] First Embodiment
[0384] As described above, in the current VTM reference software, ATMVP is signaled as a merge candidate inserted into the merge candidate list. ATMVP can be enabled or disabled for the entire sequence (at the SPS level). When ATMVP is disabled, the maximum number of merge candidates is 5. When ATMVP is enabled, the maximum number of merge candidates increases by 1 from 5 to 6.
[0385] In the encoder, a list of merge candidates is generated using the method of Figure 13. A merge candidate is then selected from the list, for example based on a rate-distortion criterion. The selected merge candidate is signaled to the decoder in the bitstream using a syntax element called the merge index.
[0386] In the current VTM reference software, the way the merge index is coded differs depending on whether ATMVP is enabled or disabled.
[0387] Figure 10(a) shows an example of the coding of the merge index when ATMVP is not enabled at the SPS level. The five merge candidates, Cand0, Cand1, Cand2, Cand3, and Cand4, are coded as 0, 10, 110, 1110, and 1111, respectively. This corresponds to unary max coding. In addition, the first bit is CABAC coded using a single context, and the other bits are bypass coded.
[0388] Figure 10(b) shows an example of the coding of the merge index when ATMVP is enabled. The six merge candidates, Cand0, Cand1, Cand2, Cand3, Cand4, and Cand5, are coded as 0, 10, 110, 1110, 11110, and 11111, respectively. In this case, all bits of the merge index (from the 1st to the 5th bit) are context coded using CABAC. Each bit has its own context, and there are separate probability models for the different bits.
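The unary max binarization of the two examples above can be reproduced with a short sketch. The function name is hypothetical; the code strings match those listed for Figure 10(a) and Figure 10(b).

```python
def unary_max(index, num_candidates):
    """Unary max (truncated unary) code of a merge index as a bit string.

    'index' ones are followed by a terminating zero, except for the last
    candidate, whose terminating zero is omitted.
    """
    max_index = num_candidates - 1
    return "1" * index + ("0" if index < max_index else "")


codes_without_atmvp = [unary_max(i, 5) for i in range(5)]
codes_with_atmvp = [unary_max(i, 6) for i in range(6)]
```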
[0389] In the first embodiment of the present invention, as shown in Figure 14, when ATMVP is included in the merge candidate list as a merge candidate (e.g., when ATMVP is enabled at the SPS level), the coding of the merge index is modified so that only the first bit of the merge index is CABAC coded using a single context. This context is set in the same way as in the current VTM reference software for the case where ATMVP is not enabled at the SPS level. The other bits (from the 2nd to the 5th bit) are bypass coded. When ATMVP is not included in the merge candidate list as a merge candidate (e.g., when ATMVP is disabled at the SPS level), there are 5 merge candidates. Only the first bit of the merge index is CABAC coded using a single context, which is again set in the same way as in the current VTM reference software for the case where ATMVP is not enabled at the SPS level. The other bits (from the 2nd to the 4th bit) are bypass coded.
[0390] The decoder generates the same list of merge candidates as the encoder. This can be accomplished using the method of Figure 13. When ATMVP is not included in the merge candidate list as a merge candidate (e.g., when ATMVP is disabled at the SPS level), there are 5 merge candidates. Only the first bit of the merge index is decoded by CABAC using a single context. The other bits (from the 2nd to the 4th bit) are bypass decoded. In contrast to the current reference software, when ATMVP is included in the merge candidate list as a merge candidate (e.g., when ATMVP is enabled at the SPS level), only the first bit of the merge index is decoded by CABAC using a single context. The other bits (from the 2nd to the 5th bit) are bypass decoded. The decoded merge index is used to identify, in the merge candidate list, the merge candidate selected by the encoder.
[0391] Compared to the VTM 2.0 reference software, the advantage of this embodiment is that it reduces the complexity of merge index decoding and of the decoder design (as well as the encoder design) without affecting coding efficiency. Indeed, with this embodiment only one CABAC state is needed for the merge index, instead of the five required for the current VTM merge index coding/decoding. Furthermore, the worst-case complexity is reduced because the other bits are CABAC bypass coded, which reduces the number of operations compared to coding all bits with CABAC.
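The difference in the number of CABAC states can be made concrete with the sketch below, which labels each bit of the longest merge index codeword either with a context name or with `"bypass"`. The scheme names `"vtm"` and `"first_embodiment"` and the labels are illustrative assumptions.

```python
def merge_index_bin_modes(num_candidates, scheme, atmvp_enabled=True):
    """Coding mode of each bit of the longest merge index codeword."""
    num_bins = num_candidates - 1
    if scheme == "vtm" and atmvp_enabled:
        # Reference behavior described above: one context per bit.
        return [f"ctx{i}" for i in range(num_bins)]
    # First embodiment (and the reference without ATMVP): a single context
    # for the first bit, all remaining bits bypass coded.
    return ["ctx0"] + ["bypass"] * (num_bins - 1)


def count_contexts(modes):
    """Number of distinct CABAC contexts (states) needed."""
    return len({m for m in modes if m != "bypass"})
```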
[0392] Second Embodiment
[0393] In the second embodiment, all bits of the merge index are CABAC encoded, but they all share the same context. A single context can exist as in the first embodiment, in which case this single context is shared between the bits. Therefore, when ATMVP is included as a merge candidate in the merge candidate list (e.g., when ATMVP is enabled at the SPS level), only one context is used compared to 5 in the VTM2.0 reference software. The advantage of this embodiment compared to the VTM2.0 reference software is that it reduces the complexity of merge index decoding and decoder design (and encoder design) without affecting encoding efficiency.
[0394] Alternatively, as described below in conjunction with the third to fifteenth embodiments, a context variable can be shared among the bits, so that two or more contexts are available but the current context is shared by the bits.
[0395] When ATMVP is disabled, the same context is still used for all bits.
[0396] This embodiment and all subsequent embodiments can be applied even if ATMVP is not available or is disabled.
[0397] In a variant of the second embodiment, any two or more bits of the merge index are CABAC coded and share the same context. The remaining bits of the merge index are bypass coded. For example, the first N bits of the merge index can be CABAC coded, where N is two or more.
[0398] Third Embodiment
[0399] In the first embodiment, the first bit of the merge index is CABAC coded using a single context.
[0400] In the third embodiment, the context variable for a bit of the merge index depends on the value of the merge index of an adjacent block. This allows more than one context for the target bit, where each context corresponds to a different value of the context variable.
[0401] The adjacent block can be any already decoded block, so that its merge index is available to the decoder when the current block is decoded. For example, the adjacent block may be any one of the blocks A0, A1, A2, B0, B1, B2, and B3 shown in Figure 6b.
[0402] In the first variant, only the first bit is CABAC coded using the context variable.
[0403] In the second variant, the first N bits of the merge index (where N is two or more) are CABAC encoded, and context variables are shared among these N bits.
[0404] In the third variant, any N bits of the merge index (where N is two or more) are CABAC encoded, and context variables are shared among these N bits.
[0405] In the fourth variant, the first N bits of the merge index (where N is two or more) are CABAC encoded, and N context variables are used for these N bits. Assuming the context variables have K values, then K×N CABAC states are used. For example, in this embodiment, for an adjacent block, the context variables can conveniently have two values, such as 0 and 1. In other words, 2N CABAC states are used.
[0406] In the fifth variant, any N bits of the merge index (where N is two or more) are adaptively PM encoded, and N context variables are used for these N bits.
[0407] The same variations apply to the fourth through sixteenth embodiments described below.
[0408] Fourth embodiment
[0409] In the fourth embodiment, the context variable for the bits used for the merge index depends on the corresponding values of the merge indexes of two or more adjacent blocks. For example, the first adjacent block could be the left block A0, A1, or A2, and the second adjacent block could be the upper block B0, B1, B2, or B3. There are no particular limitations on how two or more merge index values are combined. Examples are given below.
[0410] Because there are two adjacent blocks in this case, the context variable can conveniently have three distinct values, such as 0, 1, and 2. If the fourth variant described in conjunction with the third embodiment is applied to this embodiment with three distinct values, then K is 3 instead of 2. In other words, 3N CABAC states are used.
[0411] Fifth embodiment
[0412] In the fifth embodiment, the context variable for the bit used for the merge index depends on the corresponding value of the merge index of adjacent blocks A2 and B3.
[0413] Sixth Embodiment
[0414] In the sixth embodiment, the context variable for the merge index depends on the corresponding values of the merge indices of adjacent blocks A1 and B1. This variation has the advantage of alignment with merge candidate derivations. As a result, reduced memory access can be achieved in some decoder and encoder implementations.
[0415] Seventh Embodiment
[0416] In the seventh embodiment, the context variable for the bit at position idx_num in the merge index of the current block is obtained according to the following formula:
[0417] ctxIdx=(Merge_index_left==idx_num)+(Merge_index_up==idx_num)
[0418] Here, Merge_index_left is the merge index of the left block, Merge_index_up is the merge index of the upper block, and the symbol == is the equality symbol.
[0419] For example, when there are 6 merge candidates, 0 <= idx_num <= 5.
[0420] The left block can be block A1, and the top block can be block B1 (as in the sixth embodiment). Alternatively, the left block can be block A2, and the top block can be block B3 (as in the fifth embodiment).
[0421] If the merge index of the left block is equal to idx_num, then the formula (Merge_index_left == idx_num) equals 1. The following table shows the result of the formula (Merge_index_left == idx_num):
[0422]
[0423] Of course, the table for the formula (Merge_index_up == idx_num) is the same.
[0424] The table below shows the unary max code and the relative bit position of each bit for each merge index value. This table corresponds to Figure 10(b).
[0425]
[0426] If the left block is not a merge block or an affine merge block (i.e., coded using the affine merge mode), the left block is considered unavailable. The same condition applies to the above block.
[0427] For example, when only the first bit is CABAC coded, the context variable ctxIdx is set:
[0428] equal to 0 if neither the left block nor the above block has a merge index, or if the left block merge index is not the first index (i.e., not 0) and the above block merge index is not the first index (i.e., not 0);
[0429] equal to 1 if one of the left and above blocks has a merge index equal to the first index and the other does not; and
[0430] equal to 2 if, for both the left block and the above block, the merge index is equal to the first index.
[0431] More generally, for a CABAC-coded target bit at position idx_num, the context variable ctxIdx is set:
[0432] equal to 0 if neither the left block nor the above block has a merge index, or if the left block merge index is not the i-th index (where i = idx_num) and the above block merge index is not the i-th index;
[0433] equal to 1 if one of the left and above blocks has a merge index equal to the i-th index and the other does not; and
[0434] equal to 2 if, for both the left block and the above block, the merge index is equal to the i-th index. Here, the i-th index denotes the first index when i = 0, the second index when i = 1, and so on.
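The rule above follows directly from the formula of the seventh embodiment and can be sketched as follows. Passing `None` for a block that is unavailable or has no merge index is an assumption of this sketch.

```python
def ctx_idx_equal(idx_num, merge_index_left, merge_index_up):
    """ctxIdx = (Merge_index_left == idx_num) + (Merge_index_up == idx_num).

    A neighbor without a merge index is passed as None and contributes 0.
    """
    return int(merge_index_left == idx_num) + int(merge_index_up == idx_num)
```

For the first bit (idx_num = 0), neighbors with merge indices 0 and 3 give a context of 1, and two neighbors with merge index 0 give a context of 2.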
[0435] Eighth embodiment
[0436] In the eighth embodiment, the context variable for the bit at position idx_num in the merge index of the current block is obtained according to the following formula:
[0437] Ctx = (Merge_index_left > idx_num) + (Merge_index_up > idx_num), where Merge_index_left is the merge index of the left block, Merge_index_up is the merge index of the upper block, and the symbol > means "greater than".
[0438] For example, when there are 6 merge candidates, 0 <= idx_num <= 5.
[0439] The left block can be block A1, and the above block can be block B1 (as in the sixth embodiment). Alternatively, the left block can be block A2, and the above block can be block B3 (as in the fifth embodiment).
[0440] If the merge index of the left block is greater than idx_num, then the formula (Merge_index_left>idx_num) equals 1. If the left block is not a merge block or an affine merge block (i.e., coded using the affine merge mode), the left block is considered unavailable. The same condition applies to the above block.
[0441] The following table shows the results of the formula (Merge_index_left>idx_num):
[0442]
[0443]
[0444] For example, when only the first bit is CABAC coded, the context variable ctxIdx is set:
[0445] equal to 0 if neither the left block nor the above block has a merge index, or if the left block merge index is less than or equal to the first index (i.e., equal to 0) and the above block merge index is less than or equal to the first index (i.e., equal to 0);
[0446] equal to 1 if one of the left and above blocks has a merge index greater than the first index and the other does not; and
[0447] equal to 2 if, for both the left block and the above block, the merge index is greater than the first index.
[0448] More generally, for a CABAC-coded target bit at position idx_num, the context variable ctxIdx is set:
[0449] equal to 0 if neither the left block nor the above block has a merge index, or if the left block merge index is less than or equal to the i-th index (where i = idx_num) and the above block merge index is less than or equal to the i-th index;
[0450] equal to 1 if one of the left and above blocks has a merge index greater than the i-th index and the other does not; and
[0451] equal to 2 if, for both the left block and the above block, the merge index is greater than the i-th index.
[0452] Compared to the seventh embodiment, the eighth embodiment provides a further increase in coding efficiency.
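The context derivation of the eighth embodiment can be sketched in the same way, replacing the equality test with a greater-than test. As before, passing `None` for a neighbor without a merge index is an assumption of this sketch.

```python
def ctx_idx_greater(idx_num, merge_index_left, merge_index_up):
    """Ctx = (Merge_index_left > idx_num) + (Merge_index_up > idx_num).

    A neighbor without a merge index is passed as None and contributes 0.
    """
    neighbors = (merge_index_left, merge_index_up)
    return sum(1 for m in neighbors if m is not None and m > idx_num)
```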
[0453] Ninth Embodiment
[0454] In the fourth to eighth embodiments, the context variable of the bit used for the merge index of the current block depends on the corresponding value of the merge index of two or more adjacent blocks.
[0455] In the ninth embodiment, the context variable for a bit of the merge index of the current block depends on the corresponding merge flags of two or more adjacent blocks. For example, the first adjacent block may be the left block A0, A1, or A2, and the second adjacent block may be the above block B0, B1, B2, or B3.
[0456] The merge flag is set to 1 when a block is encoded using the merge mode, and to 0 when another mode such as the skip mode or the affine merge mode is used. Note that in VTM 2.0, affine merge is a different mode from the basic or "classic" merge mode. A dedicated affine flag can be used to signal the affine merge mode. Alternatively, the list of merge candidates can include an affine merge candidate, in which case the affine merge mode can be selected and signaled using the merge index.
[0457] The context variable is then set:
[0458] equal to 0 if neither the left adjacent block nor the above adjacent block has its merge flag set to 1;
[0459] equal to 1 if one of the left and above adjacent blocks has its merge flag set to 1 and the other does not; and
[0460] equal to 2 if both the left adjacent block and the above adjacent block have their merge flag set to 1.
[0461] Compared to VTM2.0, this simple approach achieves improved coding efficiency. Another advantage compared to the seventh and eighth embodiments is the lower complexity, as only the merge flags of adjacent blocks need to be checked instead of the merge index.
[0462] In a variant, the context variable for a bit of the merge index of the current block depends on the merge flag of a single adjacent block.
[0463] Tenth Embodiment
[0464] In the third to ninth embodiments, the context variable of the bit used for the merge index of the current block depends on the merge index value or merge flag of one or more adjacent blocks.
[0465] In the tenth embodiment, the context variable for the merge index of the current block depends on the value of the skip flag of the current block (current coding unit or CU). The skip flag is equal to 1 when the current block uses merge skip mode, and equal to 0 otherwise.
[0466] The skip flag is the first example of other variables or syntactic elements that have been decoded or parsed in the current block. These other variables or syntactic elements are preferably indicators of the complexity of the motion information in the current block. Since the occurrence of a merge index value depends on the complexity of the motion information, variables or syntactic elements such as the skip flag are typically associated with merge index values.
[0467] More specifically, merge skip modes are typically chosen for static scenes or scenes involving constant motion. As a result, the merge index value for merge skip modes is generally lower than that for classic merge modes used to encode inter-frame predictions containing block residuals. This typically occurs for more complex motions. However, the choice between these modes is often also related to quantization and / or RD criteria.
[0468] Compared to VTM2.0, this simple measure provides an increase in coding efficiency. It is also very easy to implement because it does not involve adjacent blocks or checking merge index values.
[0469] In the first variant, the context variable for a bit of the merge index of the current block is simply set equal to the skip flag of the current block. This bit may be only the first bit. The other bits are bypass coded as in the first embodiment.
[0470] In the second variant, all bits of the merge index are CABAC encoded, and each has its own context variable depending on the merge flag. This requires 10 probabilistic states when there are 5 CABAC encoded bits in the merge index (corresponding to 6 merge candidates).
[0471] In the third variant, to limit the number of states, only N bits of the merge index are CABAC coded, where N is two or more, for example the first N bits. This requires 2N states. For example, when the first 2 bits are CABAC coded, 4 states are needed.
[0472] More generally, instead of the skip flag, any other variable or syntax element that has already been decoded or parsed for the current block and is an indicator of the complexity of the motion information in the current block can be used.
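The state counts quoted for the tenth embodiment can be checked with a small sketch. The context numbering `2 * bin_pos + skip_flag` is an assumed layout chosen for illustration; the text only requires that each CABAC-coded bit has one context per value of the skip flag.

```python
def merge_bin_context(bin_pos, skip_flag, n_coded=1):
    """Context index for a merge index bit, or None if the bit is bypass coded.

    Only the first n_coded bits are CABAC coded; each has two contexts,
    selected by the skip flag of the current block.
    """
    if bin_pos >= n_coded:
        return None
    return 2 * bin_pos + int(bool(skip_flag))


def num_states(n_coded):
    """Number of distinct probability states when n_coded bits are CABAC coded."""
    return len({merge_bin_context(b, s, n_coded)
                for b in range(n_coded) for s in (0, 1)})
```

With five CABAC-coded bits (six merge candidates) this gives 10 states, and with only the first two bits coded it gives 4, matching the figures stated above.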
[0473] Eleventh Embodiment
[0474] The eleventh embodiment relates to affine merge signaling as previously described with reference to Figures 11(a), 11(b), and 12.
[0475] In the eleventh embodiment, the context variable for a CABAC-coded bit of the merge index of the current block (current CU) depends on the affine merge candidates, if any, in the merge candidate list. This bit can be only the first bit of the merge index, or the first N bits, where N is two or more, or any N bits. The other bits are bypass coded.
[0476] Affine prediction is designed to compensate for complex motion. Therefore, for complex motion, the merge index typically has a higher value than for less complex motion. If the first affine merge candidate is far down the list, or if there is no affine merge candidate at all, the merge index of the current CU may have a small value.
[0477] Therefore, in effect, the context variable depends on the presence and / or position of at least one affine merge candidate in the list.
[0478] For example, the context variable can be set:
[0479] equal to 1 if A1 is affine;
[0480] equal to 2 if B1 is affine;
[0481] equal to 3 if B0 is affine;
[0482] equal to 4 if A0 is affine;
[0483] equal to 5 if B2 is affine; and
[0484] equal to 0 if no adjacent block is affine.
[0485] When the merge index of the current block is decoded or parsed, the affine flags of the merge candidates at these locations have already been checked. As a result, no further memory accesses are required to derive the context for the merge index of the current block.
[0486] Compared to VTM2.0, this embodiment offers increased coding efficiency. No additional memory access is required because step 1205 already involves checking the affine modes of the adjacent CUs.
[0487] In the first variant, to limit the number of states, the context variable can be set:
[0488] equal to 0 if no adjacent block is affine, or if A1 or B1 is affine; and
[0489] equal to 1 if B0, A0, or B2 is affine.
[0490] In the second variant, to limit the number of states, the context variable can be set:
[0491] equal to 0 if no adjacent block is affine;
[0492] equal to 1 if A1 or B1 is affine; and
[0493] equal to 2 if B0, A0, or B2 is affine.
[0494] In the third variant, the context variable can be set:
[0495] equal to 1 if A1 is affine;
[0496] equal to 2 if B1 is affine;
[0497] equal to 3 if B0 is affine;
[0498] equal to 4 if A0 or B2 is affine; and
[0499] equal to 0 if no adjacent block is affine.
[0500] Note that these positions are already checked when the merge index is decoded or parsed, because affine flag decoding depends on these positions. As a result, no additional memory access is required to derive the merge index context encoded after the affine flags.
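The context assignments of the eleventh embodiment and its three variants can be expressed as lookup tables, as sketched below. The text does not state which rule wins when several adjacent blocks are affine; this sketch assumes the first affine block in the order A1, B1, B0, A0, B2 decides, in line with the affine merge candidate derivation. The table and function names are hypothetical.

```python
SCAN_ORDER = ("A1", "B1", "B0", "A0", "B2")

CTX_TABLES = {
    "base":     {"A1": 1, "B1": 2, "B0": 3, "A0": 4, "B2": 5},
    "variant1": {"A1": 0, "B1": 0, "B0": 1, "A0": 1, "B2": 1},
    "variant2": {"A1": 1, "B1": 1, "B0": 2, "A0": 2, "B2": 2},
    "variant3": {"A1": 1, "B1": 2, "B0": 3, "A0": 4, "B2": 4},
}


def ctx_from_affine_neighbors(affine_positions, table="base"):
    """Context variable from the set of adjacent positions that are affine.

    Assumption: the first affine position in SCAN_ORDER determines the value;
    0 is returned when no adjacent block is affine.
    """
    for pos in SCAN_ORDER:
        if pos in affine_positions:
            return CTX_TABLES[table][pos]
    return 0
```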
[0501] Twelfth Embodiment
[0502] In the twelfth embodiment, signaling the affine mode comprises inserting the affine mode as a candidate motion predictor.
[0503] In one example of the twelfth embodiment, affine merge (and affine merge skip) is signaled as a merge candidate (i.e., as one of the merge candidates used with the classic merge mode or the classic merge skip mode). In this case, modules 1205, 1206, and 1207 of Figure 12 are removed. In addition, so as not to affect the coding efficiency of the merge mode, the maximum possible number of merge candidates is incremented. For example, in the current VTM version this value is set to 6; therefore, if this embodiment is applied to the current version of VTM, the value would be 7.
[0504] The advantage is that the design of the syntax elements of the merge mode is simplified because fewer syntax elements need to be decoded. In some cases, an improvement/change in coding efficiency can be observed.
[0505] Two possible ways of implementing this example will now be described:
[0506] Regardless of the values of the other merge MVs, the merge index of the affine merge candidate always has the same position in the list. The position of a candidate motion predictor indicates its likelihood of being selected: a motion vector predictor placed higher in the list (lower index value) is more likely to be selected.
[0507] In the first example, the merge index of the affine merge candidate always has the same position within the merge candidate list. This means that it has a fixed "merge idx" value. For example, this value could be set to 5 because the affine merge mode should represent complex motion that is not the most probable content. An additional advantage of this embodiment is that the current block can be set as an affine block when it is parsed (only decoding/reading the syntax elements without decoding the data itself). As a result, this value can be used to determine the CABAC context for the affine flag used in AMVP. Therefore, the conditional probability should be improved for this affine flag, and the coding efficiency should be better.
[0508] In the second example, the affine merge candidate is derived together with the other merge candidates. In this example, a new affine merge candidate is added to the merge candidate list (for the classic merge mode or the classic merge skip mode). Figure 16 illustrates this example. Compared with Figure 13, the affine merge candidate is the first affine adjacent block among A1, B1, B0, A0, and B2 (1917). If the same condition as 1205 of Figure 12 is valid (1927), the motion vector field generated with the affine parameters is produced to obtain the affine merge candidate (1929). Depending on the use of the ATMVP, temporal, and affine merge candidates, the initial list of merge candidates can have 4, 5, 6, or 7 candidates.
[0509] The order among all these candidates is important because the more likely candidates should be processed first to ensure that they make it into the list of motion vector candidates. A preferred order is as follows:
[0510] A1
[0511] B1
[0512] B0
[0513] A0
[0514] Affine merge
[0515] ATMVP
[0516] B2
[0517] Temporal
[0518] Combined
[0519] Zero_MV
[0520] It is important to note that the affine merge candidate is positioned before the ATMVP candidate but after the four main adjacent blocks. Placing the affine merge candidate before the ATMVP candidate, rather than after the ATMVP and temporal predictor candidates, increases coding efficiency. The size of this gain depends on the GOP (Group of Pictures) structure and on the quantization parameter (QP) setting of each picture within the GOP, but for the most commonly used GOP and QP settings this order yields an increase in coding efficiency.
[0521] Another advantage of this solution is a clean design of the classic merge and classic merge skip modes (i.e., merge modes with additional candidates such as the ATMVP or affine merge candidate) for both syntax and derivation processing. Furthermore, the merge index of the affine merge candidate can change according to the availability or value (redundancy check) of the preceding candidates in the merge candidate list. Efficient signaling is therefore achieved.
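The dependence of the affine merge index on the availability of earlier candidates can be illustrated with a minimal sketch. The function names and string labels are hypothetical, the Combined and Zero_MV fill-in candidates are left out, and the redundancy check is not modeled; only the preferred order listed above is taken from the description.

```python
# Preferred candidate order from the description (Combined and Zero_MV omitted).
PREFERRED_ORDER = ["A1", "B1", "B0", "A0", "AFFINE_MERGE", "ATMVP", "B2", "TEMPORAL"]


def build_merge_list(available):
    """Keep only the available candidates, in the preferred order."""
    return [name for name in PREFERRED_ORDER if name in available]


def affine_merge_index(available):
    """Merge index of the affine merge candidate, or None if it is absent."""
    merge_list = build_merge_list(available)
    if "AFFINE_MERGE" not in merge_list:
        return None
    return merge_list.index("AFFINE_MERGE")
```

With all four main neighbors available the affine merge candidate sits at index 4; when some of them are missing it moves toward the front of the list, which is the behavior the paragraph above describes.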
[0522] In another example, the merge index of an affine merge candidate can vary based on one or more conditions.
[0523] For example, the merge index or position associated with an affine merge candidate in the list changes according to a criterion. The principle is that when an affine merge candidate has a high probability of being selected, a low value is set for the merge index corresponding to the affine merge candidate (and a higher value is set when there is a low probability of being selected).
[0524] In the twelfth embodiment, the affine merge candidate has a merge index value. To improve the coding efficiency of the merge index, the context variable used for a bit of the merge index is made dependent on the affine flags of the adjacent blocks and/or of the current block.
[0525] For example, context variables can be determined using the following formula:
[0526] ctxIdx=IsAffine(A1)+IsAffine(B1)+IsAffine(B0)+IsAffine(A0)+IsAffine(B2)
[0527] The resulting context value can have values of 0, 1, 2, 3, 4, or 5.
[0528] Taking the affine flags into account increases coding efficiency.
[0529] In the first variant, to involve fewer adjacent blocks, ctxIdx = IsAffine(A1) + IsAffine(B1). The resulting context value can have a value of 0, 1, or 2.
[0530] In the second variation, fewer adjacent blocks are also involved, ctxIdx = IsAffine(A2) + IsAffine(B3). Again, the resulting context value can have a value of 0, 1, or 2.
[0531] In the third variant, adjacent blocks are not involved, and ctxIdx = IsAffine(current block). The resulting context value can have a value of 0 or 1.
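The formula and its three variants can be written out as a short sketch. The dictionary representation of the neighbor flags and the function names are assumptions made for illustration, as is the choice to count a missing neighbor as non-affine.

```python
def ctx_idx_from_neighbors(is_affine, positions=("A1", "B1", "B0", "A0", "B2")):
    """Sum of the affine flags at the given neighbor positions.

    `is_affine` maps a position label to True/False; a position that is
    missing from the mapping is counted as non-affine (an assumption).
    """
    return sum(int(is_affine.get(pos, False)) for pos in positions)


def ctx_idx_from_current_block(current_is_affine):
    """Third variant: the context depends only on the current block's flag."""
    return int(current_is_affine)
```

For example, with affine neighbors at A1 and B0 only, the full formula gives 2, the first variant (A1 and B1) gives 1, and the second variant (A2 and B3) gives 0.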
[0532] Figure 15 is a flowchart of the partial decoding process for some syntax elements related to the coding mode, using the third variant. In this flowchart, the skip flag (1601), the prediction mode (1611), the merge flag (1603), the merge index (1608), and the affine flag (1606) can be decoded. This flowchart is similar to the flowchart of Figure 12 described above, so a detailed description is omitted. The difference is that the merge index decoding process takes the affine flag into account: the affine flag, which is decoded before the merge index, can be used when obtaining the context variable for the merge index, which is not the case in VTM2.0. In VTM2.0, the affine flag of the current block cannot be used to obtain the context variable for the merge index because it always has the same value "0".
[0533] Thirteenth Embodiment
[0534] In the tenth embodiment, the context variable of the merge index bit of the current block depends on the value of the skip flag for the current block (current coding unit or CU).
[0535] In the thirteenth embodiment, instead of directly using the skip flag value to derive the context variable for the target bit of the merge index, the context value for the target bit is derived from the context variable used to encode the skip flag for the current CU. This is possible because the skip flag itself is CABAC encoded and therefore has a context variable.
[0536] Preferably, the context variable of the target bit of the merge index of the current CU is set to be equal to the context variable used to encode the skip flag of the current CU (the context variable of the target bit of the merge index of the current CU is copied from the context variable used to encode the skip flag of the current CU).
[0537] The target bit can be only the first bit. The other bits can be bypass encoded, as in the first embodiment.
[0538] The context variable for the skip flag of the current CU is derived in the manner specified in VTM2.0. Compared with the VTM2.0 reference software, this embodiment has the advantage of reducing the complexity of merge index decoding and of the decoder design (and encoder design) without affecting coding efficiency. In fact, with this embodiment, at a minimum only one CABAC state is needed for coding the merge index, instead of the five CABAC states needed for the current VTM merge index coding (encoding/decoding). Furthermore, this reduces worst-case complexity, because the other bits are CABAC bypass coded, which reduces the number of operations compared with coding all bits using CABAC.
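A minimal sketch of this scheme on the encoder side is given below. It assumes a truncated-unary binarization of the merge index (a "1" bin meaning "continue", no terminating "0" for the last index), and the function name and the tuple representation of a bin are hypothetical; no arithmetic coding is performed.

```python
def binarize_merge_index(merge_idx, max_cand, skip_flag_ctx):
    """Return the bins of merge_idx, each tagged with how it would be coded.

    The first bin is context coded using the context variable copied from
    the skip flag of the current CU; every later bin is bypass coded.
    """
    if not 0 <= merge_idx < max_cand:
        raise ValueError("merge_idx out of range")
    bins = []
    for i in range(max_cand - 1):
        bin_val = 1 if i < merge_idx else 0
        mode = ("context", skip_flag_ctx) if i == 0 else ("bypass", None)
        bins.append((bin_val, mode))
        if bin_val == 0:
            break  # a zero bin terminates the truncated-unary code
    return bins
```

In this sketch a single context-coded bin appears per merge index regardless of its value, which corresponds to the single CABAC state mentioned above.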
[0539] Fourteenth Embodiment
[0540] In the thirteenth embodiment, the context value of the target bit is derived from the context variable of the skip flag of the current CU.
[0541] In the fourteenth embodiment, the context value of the target bit is derived from the context variable of the affine flag of the current CU.
[0542] This is possible because the affine flag itself is CABAC encoded and therefore has a context variable.
[0543] Preferably, the context variable of the target bit of the merge index of the current CU is set to be equal to the context variable of the affine flag of the current CU (the context variable of the target bit of the merge index of the current CU is copied from the context variable of the affine flag of the current CU).
[0544] The target bit can be only the first bit. The other bits are bypass encoded, as in the first embodiment.
[0545] The context variable of the affine flag of the current CU is derived in the manner specified in VTM2.0.
[0546] Compared with the VTM2.0 reference software, this embodiment has the advantage of reducing the complexity of merge index decoding and of the decoder design (and encoder design) without compromising coding efficiency. In fact, with this embodiment, at a minimum only one CABAC state is needed for the merge index, instead of the five CABAC states needed for the current VTM merge index coding (encoding/decoding). Furthermore, this reduces worst-case complexity, because the other bits are CABAC bypass coded, which reduces the number of operations compared with coding all bits using CABAC.
[0547] Fifteenth Embodiment
[0548] In some of the foregoing embodiments, the context variable has more than two values, such as three values: 0, 1, and 2. However, to reduce complexity and the number of states to be processed, the number of permitted context variable values can be limited to two, such as 0 and 1. This can be achieved, for example, by changing any initial context variable with a value of 2 to 1. In practice, this simplification has no or only a limited impact on coding efficiency.
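The reduction to two context values amounts to a simple clamp, sketched here with a hypothetical helper name:

```python
def limit_context(ctx_idx, max_value=1):
    """Map any context value above max_value down to max_value (e.g., 2 -> 1)."""
    return min(ctx_idx, max_value)
```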
[0549] Combination of embodiments with other embodiments
[0550] Any two or more of the above embodiments can be combined.
[0551] The preceding description focuses on the encoding and decoding of merge indexes. For example, a first embodiment involves: generating a list of merge candidates including an ATMVP candidate (for classic merge mode or classic merge skip mode, i.e., non-affine merge mode or non-affine merge skip mode); selecting one of the merge candidates in the list; and generating a merge index for the selected merge candidate using CABAC encoding, wherein one or more bits of the merge index are bypass CABAC encoded. In principle, the invention can be applied to modes other than the merge modes (e.g., the affine merge mode) that involve: generating a list of motion information predictor candidates (e.g., a list of affine merge candidates or a list of motion vector predictor (MVP) candidates); selecting one of the motion information predictor candidates (e.g., an MVP candidate) in the list; and generating an identifier or index for the selected motion information predictor candidate in the list (e.g., a selected affine merge candidate or a selected MVP candidate for predicting the motion vector of the current block). Thus, the invention is not limited to the merge modes (i.e., classic merge mode and classic merge skip mode), and the index to be encoded or decoded is not limited to a merge index. For example, in the development of VVC, it is conceivable that the techniques of the foregoing embodiments could be applied (or extended) to modes other than the merge modes, such as the AMVP mode of HEVC or its equivalent in VVC, or the affine merge mode. The appended claims should be interpreted accordingly.
[0552] As discussed, in the foregoing embodiments, one or more motion information candidates (e.g., motion vectors) and/or one or more affine parameters for an affine merge mode (affine merge or affine merge skip mode) are obtained from the first neighboring block that is affine coded among the spatially adjacent blocks (e.g., at positions A1, B1, B0, A0, B2) or temporally associated blocks (e.g., a "center" block with a collocated block, or a spatial neighbor thereof such as "H"). These positions are depicted in Figure 6a and Figure 6b. To enable this obtaining (e.g., deriving, sharing, or "merging") of motion information and/or affine parameters between the current block (or the group of sample/pixel values currently being encoded/decoded, such as the current CU) and a neighboring block (spatially adjacent to or temporally associated with the current block), one or more affine merge candidates are added to the list of merge candidates (i.e., classic merge mode candidates), such that when the selected merge candidate (which is then signaled using a merge index, for example using a syntax element such as "merge_idx" in HEVC or a functionally equivalent syntax element) is an affine merge candidate, the current CU/block is encoded/decoded using the affine merge mode with that affine merge candidate.
[0553] As described above, the one or more affine merge candidates used to obtain (e.g., derive or share) motion information and/or affine parameters for an affine merge mode can also be signaled using a separate list (or set) of affine merge candidates (which may be the same as or different from the list of merge candidates used for the classic merge mode).
[0554] According to embodiments of the present invention, when the techniques of the above embodiments are applied to the affine merge mode, the list of affine merge candidates can be generated using the same technique as the motion vector derivation process for the classic merge mode shown in and described with reference to Figure 8, or the merge candidate derivation process shown in and described with reference to Figure 13. The advantage of sharing the same technique to generate/compile the list of affine merge candidates (for affine merge mode or affine merge skip mode) and the list of merge candidates (for classic merge mode or classic merge skip mode), compared with having separate techniques, is reduced complexity in the encoding/decoding process.
[0555] According to another embodiment, a separate technique, shown in Figure 24 and described below, can be used to generate/compile the list of affine merge candidates.
[0556] Figure 24 is a flowchart illustrating the affine merge candidate derivation process for the affine merge modes (affine merge mode and affine merge skip mode). In the first step of the derivation process, five block positions (2401 to 2405) are considered to obtain/derive the spatial affine merge candidates (2413). These positions are the spatial positions depicted in Figure 6a (and Figure 6b), labeled A1, B1, B0, A0, and B2. In the next step, the availability of the spatial motion vectors is checked, and it is determined whether each inter-mode coded block associated with positions A1, B1, B0, A0, and B2 is coded using an affine mode (e.g., any one of affine merge, affine merge skip, or affine AMVP mode) (2410). Up to five motion vectors (i.e., spatial affine merge candidates) are selected/obtained/derived. A predictor is considered available if it exists (e.g., information for obtaining/deriving the motion vector associated with that position exists), the block is not intra coded, and the block is affine (i.e., coded using an affine mode).
[0557] Then, affine motion information is derived/obtained (2411) for each available block position (2410). This derivation for the current block is based on the affine model of the block position (and, for example, its affine model parameters as discussed with reference to Figures 11(a) and 11(b)). Then, a pruning process (2412) is applied to remove any candidate that gives the same affine motion compensation (or has the same affine model parameters) as another candidate previously added to the list.
[0558] At the end of this phase, the list of spatial affine merging candidates includes up to five candidates.
[0559] If the number of candidates (Nb_Cand) is strictly less than the maximum number of candidates (Max_Cand) (2426) (here, Max_Cand is a value signaled in the bitstream header and equal to five for the affine merge mode, although it may differ depending on the implementation),
[0560] then constructed affine merge candidates (i.e., additional affine merge candidates generated to provide some diversity and to approach the target number, playing a role similar to that of, for example, the combined bi-predictive merge candidates in HEVC) are generated (2428). These constructed affine merge candidates are based on motion vectors associated with the neighboring spatial and temporal positions of the current block. First, control points (2418, 2419, 2420, 2421) are defined in order to generate the motion information used to generate the affine model. Two of these control points correspond to, for example, v0 and v1 in Figures 11(a) and 11(b). The four control points correspond to the four corners of the current block.
[0561] If a block exists at position B2 (2405) and that block is coded in inter mode (2414), the motion information of the top-left control point (2418) is obtained from the motion information of the block at position B2 (e.g., it is set equal to that motion information). Otherwise, if a block exists at position B3 (2406) (as depicted in Figure 6b) and that block is coded in inter mode (2414), the motion information of the top-left control point (2418) is obtained from the motion information of the block at position B3 (e.g., set equal to it). Failing that, if a block exists at position A2 (2407) (as depicted in Figure 6b) and that block is coded in inter mode (2414), the motion information of the top-left control point (2418) is obtained from the motion information of the block at position A2 (e.g., set equal to it). When no block is available for this control point, the control point is considered unavailable.
[0562] If a block exists at position B1 (2402) and that block is coded in inter mode (2415), the motion information of the top-right control point (2419) is obtained from the motion information of the block at position B1 (e.g., set equal to it). Otherwise, if a block exists at position B0 (2403) and that block is coded in inter mode (2415), the motion information of the top-right control point (2419) is obtained from the motion information of the block at position B0 (e.g., set equal to it). When no block is available for this control point, the control point is considered unavailable.
[0563] If a block exists at position A1 (2401) and that block is coded in inter mode (2416), the motion information of the bottom-left control point (2420) is obtained from the motion information of the block at position A1 (e.g., set equal to it). Otherwise, if a block exists at position A0 (2404) and that block is coded in inter mode (2416), the motion information of the bottom-left control point (2420) is obtained from the motion information of the block at position A0 (e.g., set equal to it). When no block is available for this control point, the control point is considered unavailable.
[0564] If a block exists at position H (2408) (as depicted in Figure 6a) and that block is coded in inter mode (2417), the motion information of the bottom-right control point (2421) is obtained from, for example, the motion information of the temporal candidate of the collocated block at position H (e.g., set equal to it). When no block is available for this control point, the control point is considered unavailable.
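The fallback order for the four control points can be summarized in a short sketch. The dictionary layout, the corner labels, and the function name are hypothetical; the input is assumed to contain only positions where a block exists and is inter coded, so both conditions from the text reduce to a membership test.

```python
# Candidate source positions for each control point, in the order they are tried.
CONTROL_POINT_SOURCES = {
    "top_left": ("B2", "B3", "A2"),   # control point 2418
    "top_right": ("B1", "B0"),        # control point 2419
    "bottom_left": ("A1", "A0"),      # control point 2420
    "bottom_right": ("H",),           # control point 2421 (temporal)
}


def derive_control_points(inter_motion):
    """Map each control point to the motion info of its first usable source.

    `inter_motion` maps a position label to the motion information of an
    existing, inter-coded block at that position. A control point with no
    usable source is reported as None (unavailable).
    """
    control_points = {}
    for corner, positions in CONTROL_POINT_SOURCES.items():
        control_points[corner] = next(
            (inter_motion[pos] for pos in positions if pos in inter_motion),
            None,
        )
    return control_points
```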
[0565] Based on these control points, up to 10 constructed affine merge candidates can be generated (2428). These candidates are generated based on an affine model with 4, 3, or 2 control points. For example, the first constructed affine merge candidate can be generated using the 4 control points. The next 4 constructed affine merge candidates are then the 4 possibilities that can be generated using 4 different sets of 3 control points (i.e., 4 different possible combinations of sets containing 3 of the 4 available control points). The other constructed affine merge candidates are then those generated using 2 different sets of 2 control points (i.e., 2 different possible combinations of sets containing 2 of the 4 control points).
[0566] If, after adding these additional (constructed) affine merge candidates, the number of candidates (Nb_Cand) is still strictly less than the maximum number of candidates (Max_Cand) (2430), other additional virtual motion information candidates, such as zero motion vector candidates (or even combined bi-predictive merge candidates where applicable), are added/generated (2432) until the number of candidates in the affine merge candidate list reaches the target number (e.g., the maximum number of candidates).
[0567] At the end of this process, a list or set of affine merge mode candidates (i.e., a list or set of candidates for the affine merge modes, namely affine merge mode and affine merge skip mode) has been generated/constructed (2434). As illustrated in Figure 24, the list or set of affine merge (motion vector predictor) candidates is constructed/generated (2434) from a subset of spatial candidates (2401 to 2407) and a temporal candidate (2408). It should be understood that, according to embodiments of the invention, other affine merge candidate derivation processes, with a different order for checking availability, a different pruning process, or a different number/type of potential candidates (e.g., an ATMVP candidate may also be added, in a manner similar to the merge candidate list derivation process of Figure 13 or Figure 16), can also be used to generate the list/set of affine merge candidates.
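The overall flow of Figure 24 (spatial candidates with pruning, then constructed candidates, then padding) can be sketched as follows. Candidates are represented by arbitrary comparable values, pruning is modeled as a plain equality test on those values, and the function and parameter names are hypothetical; whether constructed candidates are also pruned is not stated above, so they are not pruned here.

```python
def build_affine_merge_list(spatial, constructed, max_cand=5, zero_candidate="ZERO_MV"):
    """Assemble an affine merge candidate list of exactly max_cand entries.

    `spatial` holds the affine motion information for A1, B1, B0, A0, B2 in
    that order, with None for an unavailable (or non-affine) position.
    `constructed` holds the constructed candidates in generation order.
    """
    candidates = []
    # Steps 2410-2412: keep available spatial candidates, pruning duplicates.
    for cand in spatial:
        if cand is not None and cand not in candidates:
            candidates.append(cand)
    # Steps 2426/2428: add constructed candidates while the list is short.
    for cand in constructed:
        if len(candidates) >= max_cand:
            break
        candidates.append(cand)
    # Steps 2430/2432: pad with virtual (zero motion vector) candidates.
    while len(candidates) < max_cand:
        candidates.append(zero_candidate)
    return candidates[:max_cand]
```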
[0568] The following examples illustrate how a list (or set) of affine merge candidates can be used to signal (e.g., encode or decode) the selected affine merge candidates (signaling can be done using a merge index for the merge mode or a separate affine merge index specifically used with the affine merge mode).
[0569] In the following embodiments: a merge mode (i.e., a merge mode other than the affine merge mode defined below, in other words a classic non-affine merge mode or classic non-affine merge skip mode) is a type of merge mode in which motion information of a spatially adjacent or temporally associated block is obtained for (or derived for, or shared with) the current block; a merge mode predictor candidate (i.e., a merge candidate) is information relating to one or more spatially adjacent or temporally associated blocks from which the current block can obtain/derive motion information in the merge mode; a merge mode predictor is a selected merge mode predictor candidate whose information is used when predicting the motion information of the current block in the merge mode, and for which an index (e.g., a merge index) identifying it in the list (or set) of merge mode predictor candidates is signaled during the signaling (e.g., encoding or decoding) process; an affine merge mode is a type of merge mode in which motion information of a spatially adjacent or temporally associated block is obtained for (derived for, or shared with) the current block so that the motion information and/or affine parameters for the affine mode processing (or affine motion model processing) of the current block can make use of the obtained/derived/shared motion information; an affine merge mode predictor candidate (i.e., an affine merge candidate) is information relating to one or more spatially adjacent or temporally associated blocks from which the current block can obtain/derive motion information in the affine merge mode; and an affine merge mode predictor is a selected affine merge mode predictor candidate whose information can be used in the affine motion model when predicting the motion information of the current block in the affine merge mode, and for which an index (e.g., an affine merge index) identifying it in the list (or set) of affine merge mode predictor candidates is signaled during the signaling (e.g., encoding or decoding) process. It should be understood that, in the following embodiments, the affine merge mode is a merge mode having its own affine merge index (an identifier that is a variable) for identifying one affine merge mode predictor candidate in a list/set of candidates (also referred to as an "affine merge list" or "sub-block merge list"), wherein the affine merge index is signaled to identify that particular affine merge mode predictor candidate.
[0570] It should be understood that, in the following embodiments, "merge mode" refers to either the classic merge mode or the classic merge skip mode in HEVC/JEM/VTM, or any functionally equivalent mode, provided that the obtaining (e.g., derivation or sharing) of motion information and the signaling of the merge index described above are used in that mode. Likewise, "affine merge mode" refers to either the affine merge mode or the affine merge skip mode (if present and using such obtaining/derivation), or any other functionally equivalent mode, provided that the same features are used in that mode.
[0571] Sixteenth Embodiment
[0572] In the sixteenth embodiment, CABAC encoding is used to signal the motion information predictor index that identifies the affine merge mode predictor (candidate) in the affine merge candidate list, wherein one or more bits of the motion information predictor index are bypass CABAC encoded.
[0573] According to a first variation of this embodiment, at the encoder, the motion information predictor index for the affine merge mode is encoded by: generating a list of motion information predictor candidates; selecting one of the motion information predictor candidates in the list as the affine merge mode predictor; and generating a motion information predictor index for the selected motion information predictor candidate using CABAC encoding, wherein one or more bits of the motion information predictor index are bypass CABAC encoded. Data indicating the index of the selected motion information predictor candidate is then included in the bitstream. The decoder then decodes the motion information predictor index for the affine merge mode from the bitstream including this data by: generating a list of motion information predictor candidates; decoding the motion information predictor index using CABAC decoding, wherein one or more bits of the motion information predictor index are bypass CABAC decoded; and, when the affine merge mode is used, using the decoded motion information predictor index to identify one of the motion information predictor candidates in the list as the affine merge mode predictor.
[0574] According to another variation of the first variation, when the merge mode is used, one of the motion information predictor candidates in the list can also be selected as the merge mode predictor, so that, when the merge mode is used, the decoder can use the decoded motion information predictor index (e.g., the merge index) to identify one of the motion information predictor candidates in the list as the merge mode predictor. In this other variation, the affine merge index is used to signal the affine merge mode predictor (candidate), and the affine merge index signaling is implemented using an index signaling analogous to the merge index signaling according to any one of the first to fifteenth embodiments, or to the merge index signaling used in the current VTM or HEVC.
[0575] In this variation, when the merge mode is used, the merge index signaling can be implemented using either the merge index signaling according to any one of the first to fifteenth embodiments or the merge index signaling used in the current VTM or HEVC. In this variation, the affine merge index signaling and the merge index signaling can use different index signaling schemes. The advantage of this variation is that it achieves better coding efficiency by using efficient index coding/signaling for both the affine merge mode and the merge mode. Furthermore, in this variation, separate syntax elements can be used for the merge index (such as "Merge_idx[][]" in HEVC or its functional equivalent) and the affine merge index (such as "A_Merge_idx[][]"). This allows the merge index and the affine merge index to be signaled (encoded/decoded) independently.
[0576] According to another variation, when using the merge mode and one of the motion information predictor candidates in the list can also be selected as the merge mode predictor, the CABAC encoding uses the same context variable for at least one bit of the motion information predictor index (e.g., merge index or affine merge index) for the current block in both modes (i.e., when using affine merge mode and when using merge mode), such that at least one bit of the affine merge index and the merge index share the same context variable. The decoder then uses the decoded motion information predictor index to identify one of the motion information predictor candidates in the list as the merge mode predictor when using the merge mode, wherein the CABAC decoding uses the same context variable for at least one bit of the motion information predictor index for the current block in both modes (i.e., when using affine merge mode and when using merge mode).
[0577] According to a second variation of this embodiment, at the encoder, the motion information predictor sub-index is encoded as follows: a list of motion information predictor candidates is generated; when using an affine merging mode, one of the motion information predictor candidates in the list is selected as the affine merging mode predictor; when using a merging mode, one of the motion information predictor candidates in the list is selected as the merging mode predictor; and a motion information predictor sub-index of the selected motion information predictor candidate is generated using CABAC encoding, wherein one or more bits of the motion information predictor sub-index are bypassed by CABAC encoding. Data indicating the index of the selected motion information predictor candidate is then included in the bit stream. The decoder then decodes the motion information predictor index from the bitstream by: generating a list of motion information predictor candidates; decoding the motion information predictor index using CABAC decoding, wherein one or more bits of the motion information predictor index are bypassed by CABAC decoding; when using affine merging mode, identifying one of the motion information predictor candidates in the list as an affine merging mode predictor using the decoded motion information predictor index; and when using merging mode, identifying one of the motion information predictor candidates in the list as a merging mode predictor using the decoded motion information predictor index.
[0578] According to another variation of the second variation, the signal-notified affine merge index and the signal-notified merge index use the same signal-notified index scheme according to any one of the first to fifteenth embodiments or the signal-notified merge index used in the current VTM or HEVC. This other variation has the advantage of simple design during implementation, which also reduces complexity. In this variation, when using the affine merge mode, the encoder's CABAC encoding includes: at least one bit of the motion information prediction sub-index (affine merge index) for the current block, using a context variable that can be separated from another context variable used for at least one bit of the motion information prediction sub-index (merge index) when using the merge mode; and including data indicating the use of the affine merge mode in the bit stream, so that the context variables for the affine merge mode and the merge mode can be distinguished (clearly identified) for the CABAC decoding process. The decoder then obtains data from the bit stream indicating the use of the affine merge mode in the bit stream; and when using the affine merge mode, the CABAC decoder uses this data to distinguish between the context variables for the affine merge index and the merge index. Furthermore, at the decoder, the data used to indicate the use of the affine merging mode can also be used to generate a list (or set) of affine merging mode prediction sub-candidates if the obtained data indicates the use of the affine merging mode, or to generate a list (or set) of merging mode prediction sub-candidates if the obtained data indicates the use of the merging mode.
[0579] This variant allows merged indexes and affine merged indexes to be notified using the same signaling index scheme, while merged indexes and affine merged indexes are still encoded / decoded independently of each other (e.g., by using separate context variables).
[0580] One way to use the same signaling indexing scheme is to use the same syntax elements for both affine merge indexes and merge indexes. That is, to use the same syntax elements to encode the motion information prediction sub-index of the selected motion information prediction sub-candidate for both affine merge mode and merge mode. Then at the decoder, the motion information prediction sub-index is decoded by parsing the same syntax elements from the bitstream, regardless of whether the current block is encoded (and being decoded) using affine merge mode or merge mode.
[0581] Figure 22 illustrates the partial decoding of some syntax elements related to the coding mode according to a variant of the sixteenth embodiment (i.e., with the same index signaling scheme). The figure shows the signaling of the affine merge index (2255, "Merge idx affine") for the affine merge mode (2257: Yes) and of the merge index (2258, "Merge idx") for the merge mode (2257: No) using the same index signaling scheme. It should be understood that in some variants, the affine merge candidate list may include an ATMVP candidate, as in the merge candidate list of the current VTM. The encoding of the affine merge index is similar to the encoding of the merge index for the merge mode described in Figures 10(a) and 10(b). In some variants, even if the affine merge candidate derivation does not define an ATMVP merge candidate, when ATMVP is enabled for the merge mode with up to 5 other candidates (i.e., a total of 6 candidates), the affine merge index is encoded as described in Figure 10(b), so that the maximum number of candidates in the affine merge candidate list matches the maximum number of candidates in the merge candidate list. Therefore, each bit of the affine merge index has its own context. All context variables used to signal the bits of the merge index are independent of the context variables used to signal the bits of the affine merge index.
[0582] According to a further variant, the same index signaling scheme shared by the merge index signaling and the affine merge index signaling uses CABAC coding only for the first bin, as in the first embodiment. That is, all bits of the motion information prediction sub-index except the first bit are CABAC bypass coded. In this further variant of the sixteenth embodiment, when ATMVP is included as a candidate in either the merge candidate list or the affine merge candidate list (e.g., when ATMVP is enabled at the SPS level), the encoding of each index (i.e., the merge index or the affine merge index) is modified such that, as shown in Figure 14, only the first bit of the index is encoded via CABAC using a single context variable. This single context is set up in the same way as in the current VTM reference software when ATMVP is not enabled at the SPS level. The other bits (from the 2nd to the 5th bit, or to the 4th bit if there are only 5 candidates in the list) are bypass coded. When ATMVP is not included as a candidate in the merge candidate list (e.g., when ATMVP is disabled at the SPS level), there are 5 merge candidates and 5 affine merge candidates available. Only the first bit of the merge index for the merge mode is encoded via CABAC using a first single context variable, and only the first bit of the affine merge index for the affine merge mode is encoded via CABAC using a second single context variable. For both the merge index and the affine merge index, these first and second context variables are set up in the same way as in the current VTM reference software when ATMVP is not enabled at the SPS level. The other bits (from the 2nd to the 4th bit) are bypass coded.
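The bit layout described above can be sketched as follows. This is a minimal illustration, not the reference-software implementation: it assumes the index is binarized with a truncated unary code (as HEVC does for its merge index), and the function names `binarize_index` and `label_bins` are hypothetical.

```python
def binarize_index(index, num_candidates):
    """Truncated unary binarization (assumed): `index` ones followed by a
    terminating zero; the zero is omitted for the last possible index."""
    if not 0 <= index < num_candidates:
        raise ValueError("index out of range for candidate list")
    max_bins = num_candidates - 1  # 5 bins for 6 candidates, 4 bins for 5
    bins = [1] * index
    if index < max_bins:
        bins.append(0)
    return bins


def label_bins(bins):
    """Tag each bin with how it would be coded in this variant:
    the first bin with a context variable, every later bin in bypass mode."""
    return [(b, "context" if pos == 0 else "bypass")
            for pos, b in enumerate(bins)]


if __name__ == "__main__":
    # Six candidates (ATMVP enabled): at most five bins, bins 2..5 bypassed.
    print(label_bins(binarize_index(5, 6)))
    # Five candidates (ATMVP disabled): at most four bins, bins 2..4 bypassed.
    print(label_bins(binarize_index(2, 5)))
```

With six candidates the longest codeword has five bins, and with five candidates four bins, which matches the "2nd to 5th bit, or to the 4th bit" wording of the paragraph.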
[0583] The decoder generates the same merge candidate list and the same affine merge candidate list as the encoder. This is achieved by using the method of Figure 22. Although the same index signaling scheme is used for both the merge mode and the affine merge mode, the affine flag (2256) is used to determine whether the data currently being decoded is for the merge index or the affine merge index, so that the first and second context variables can be separated (or distinguished) from each other for the CABAC decoding process. That is, the affine flag (2256) is used during the index decoding process (i.e., at step 2257) to determine whether "merge idx" (2258) or "merge idx affine" (2255) is to be decoded. When ATMVP is not included as a candidate in the merge candidate list (e.g., when ATMVP is disabled at the SPS level), there are 5 candidates in each of the two candidate lists (for the merge mode and the affine merge mode). Only the first bit of the merge index is decoded via CABAC using the first single context variable, and only the first bit of the affine merge index is decoded via CABAC using the second single context variable. All other bits (from the 2nd to the 4th bit) are bypass decoded. In contrast to the current reference software, when ATMVP is included as a candidate in the merge candidate list (e.g., when ATMVP is enabled at the SPS level), only the first bit of the merge index is decoded via CABAC using the first single context variable, and only the first bit of the affine merge index is decoded via CABAC using the second single context variable. The other bits (from the 2nd to the 5th bit, or to the 4th bit) are bypass decoded. The decoded index is then used to identify the candidate selected by the encoder from the corresponding candidate list (i.e., the merge candidate list or the affine merge candidate list).
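The decoder-side selection between the two context variables can be illustrated with the sketch below. It is a simplified model under stated assumptions: the arithmetic decoder is replaced by a hypothetical `read_bin(ctx)` callback (with `ctx=None` meaning bypass), the index is assumed to be truncated unary coded, and the context names are invented for illustration.

```python
def decode_index(read_bin, affine_flag, num_candidates):
    """Decode a truncated-unary index (assumed binarization).

    The affine flag picks which of the two separate context variables is
    used for the first bin; all remaining bins are read in bypass mode.
    """
    # Hypothetical context names: one per mode, as in this variant.
    ctx = "affine_merge_idx_ctx" if affine_flag else "merge_idx_ctx"
    max_bins = num_candidates - 1
    index = 0
    for pos in range(max_bins):
        bin_val = read_bin(ctx if pos == 0 else None)  # None = bypass
        if bin_val == 0:
            break
        index += 1
    return index


def make_reader(bins, log):
    """Toy stand-in for a CABAC decoder: replays `bins` and records which
    context (or bypass) was requested for each bin."""
    it = iter(bins)

    def read_bin(ctx):
        log.append(ctx)
        return next(it)

    return read_bin


if __name__ == "__main__":
    log = []
    idx = decode_index(make_reader([1, 1, 0], log), affine_flag=True,
                       num_candidates=5)
    print(idx, log)
```

Only the first entry of the log carries a context name; which name it is depends solely on the affine flag, which is the behavior the paragraph attributes to step 2257.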
[0584] The advantage of this variant is that using the same index signaling scheme for both the merge index and the affine merge index reduces the complexity of the index decoding process and of the decoder design (and encoder design) for implementing these two different modes, without significantly impacting coding efficiency. In fact, for this variant, the index signaling requires only two CABAC states (one for each of the first and second single context variables), instead of the nine or ten that would be required if all bits of the merge index and the affine merge index were CABAC encoded / decoded. Furthermore, worst-case complexity is reduced because all bits other than the first are CABAC bypass coded, which reduces the number of operations required during CABAC encoding / decoding compared to coding all bits with CABAC.
[0585] According to a further variant, the CABAC encoding or decoding uses the same context variable for at least one bit of the motion information prediction sub-index of the current block, whether the affine merge mode or the merge mode is used. In this further variant, the context variable used for the first bit of the merge index and for the first bit of the affine merge index does not depend on which index is being encoded or decoded; that is, the first and second single context variables (of the previous variant) are not distinguishable / separable and are one and the same single context variable. Therefore, in contrast to the previous variant, the merge index and the affine merge index share a single context variable during CABAC processing. As shown in Figure 23, the index signaling scheme is the same for both the merge index and the affine merge index; that is, only one type of index, "merge idx" (2308), is encoded or decoded for both modes. For the CABAC decoder, the same syntax elements are used for both the merge index and the affine merge index, and they do not need to be distinguished when considering context variables. Therefore, the affine flag (2306) is not needed to determine whether the current block is encoded (to be decoded) in the affine merge mode, as it is at step 2257 of Figure 22, and there is no branch after step 2306 in Figure 23 because only one index ("merge idx") needs to be decoded. The affine flag is used to perform the motion information prediction with the affine merge mode (i.e., during the prediction process after the CABAC decoder has decoded the index ("merge idx")). Furthermore, only the first bit of this index (i.e., of the merge index and of the affine merge index) is encoded via CABAC using a single context, and the other bits are bypass coded, as described for the first embodiment.
Therefore, in this further variant, one context variable for the first bit of the merge index and of the affine merge index is shared by both the merge index signaling and the affine merge index signaling. If the candidate list sizes are different for the merge index and the affine merge index, the maximum number of bits used to signal the relevant index may also be different in each case, i.e., the two maxima are independent of each other. Therefore, if necessary, the number of bypass coded bits can be adjusted according to the value of the affine flag (2306), for example, to enable parsing of the relevant index data from the bitstream.
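To see why the number of bypass coded bits may need to follow the affine flag even though the context is shared, consider the sketch below. It assumes a truncated unary binarization, models the bitstream as a plain list of `(bin, context)` pairs rather than an arithmetic-coded stream, and uses hypothetical names (`SHARED_CTX`, `write_index`, `parse_index`) and hypothetical list sizes of 6 and 5.

```python
SHARED_CTX = "merge_idx_ctx"  # hypothetical name for the single shared context


def max_bins_for(affine_flag, max_merge_cand, max_affine_cand):
    """Longest codeword length depends on which list the index refers to."""
    return (max_affine_cand if affine_flag else max_merge_cand) - 1


def write_index(index, affine_flag, max_merge_cand, max_affine_cand):
    """Return (bin, ctx) pairs: first bin uses the shared context,
    later bins are bypass (ctx=None)."""
    max_bins = max_bins_for(affine_flag, max_merge_cand, max_affine_cand)
    if not 0 <= index <= max_bins:
        raise ValueError("index out of range")
    bins = [1] * index + ([0] if index < max_bins else [])
    return [(b, SHARED_CTX if i == 0 else None) for i, b in enumerate(bins)]


def parse_index(stream, pos, affine_flag, max_merge_cand, max_affine_cand):
    """Parse one index starting at `pos`; return (index, next position)."""
    max_bins = max_bins_for(affine_flag, max_merge_cand, max_affine_cand)
    index = 0
    while index < max_bins:
        value, _ctx = stream[pos]
        pos += 1
        if value == 0:
            break
        index += 1
    return index, pos


if __name__ == "__main__":
    # Index 4 is the last candidate of a 5-entry list (no terminating zero)
    # but not of a 6-entry list (terminating zero present).
    stream = write_index(4, True, 6, 5) + write_index(4, False, 6, 5)
    first, pos = parse_index(stream, 0, True, 6, 5)
    second, pos = parse_index(stream, pos, False, 6, 5)
    print(first, second, pos == len(stream))
```

In this toy model, parsing the first index with the wrong list size would consume one bin too many and desynchronize the parser, which is one way to read the remark that the bypass bit count can be adjusted based on the affine flag.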
[0586] The advantage of this variant is that it reduces the complexity of the merge index and affine merge index decoding processes and of the decoder design (as well as the encoder design) without significantly impacting coding efficiency. In fact, for this further variant, only one CABAC state is needed to signal both the merge index and the affine merge index, instead of the two CABAC states of the previous variant, or the nine or ten that would be required if all bits of the merge index and the affine merge index were CABAC encoded / decoded. Furthermore, it reduces worst-case complexity because all bits other than the first are CABAC bypass coded, which reduces the number of operations required during CABAC encoding / decoding compared to coding all bits with CABAC.
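The context counts quoted for these variants can be tallied with a short sketch. It rests on two assumptions that are not stated explicitly in the text: each context-coded bin position has its own context variable, and an index over N candidates has at most N - 1 bins (truncated unary). The scheme names and the function `context_count` are hypothetical.

```python
def context_count(scheme, max_merge_cand, max_affine_cand):
    """Number of CABAC context variables (states) needed for index signaling."""
    merge_bins = max_merge_cand - 1    # assumed: one bin per candidate but the last
    affine_bins = max_affine_cand - 1
    if scheme == "all_bins_context_coded":
        # Every bin of both indexes has its own context.
        return merge_bins + affine_bins
    if scheme == "first_bin_separate_contexts":
        # One context for the merge index, one for the affine merge index.
        return 2
    if scheme == "first_bin_shared_context":
        # A single context shared by both indexes.
        return 1
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for sizes in ((6, 5), (6, 6)):
        print(sizes, context_count("all_bins_context_coded", *sizes))
```

Under these assumptions, list sizes of 6 and 5 give nine contexts and list sizes of 6 and 6 give ten, which is one plausible reading of the "nine or ten" figure.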
[0587] In the aforementioned variants of this embodiment, the affine merge index signaling and the merge index signaling may share one or more contexts as described in any of the first to fifteenth embodiments. The advantage of this approach is reduced complexity, owing to the smaller number of contexts required to encode or decode these indexes.
[0588] In the aforementioned variants of this embodiment, the motion information prediction sub-candidate includes information for obtaining (or deriving) one or more of the following: direction, list identifier, reference frame index, and motion vector. Preferably, the motion information prediction sub-candidate includes information for obtaining a motion vector prediction sub-candidate. In a preferred variant, the motion information prediction sub-index (e.g., an affine merge index) is used to signal the affine merge mode prediction sub-candidate, and the affine merge index signaling is implemented using an index signaling similar to the merge index signaling according to any one of the first to fifteenth embodiments or to the merge index signaling used in the current VTM or HEVC (with the motion information prediction sub-candidates of the affine merge mode treated as the merge candidates).
[0589] In the aforementioned variations of this embodiment, such as in the first embodiment or in some of the other second to fifteenth embodiments, the generated motion information prediction sub-candidate list includes ATMVP candidates. Optionally, the generated motion information prediction sub-candidate list does not include ATMVP candidates.
[0590] In the aforementioned variants of this embodiment, the maximum number of candidates that can be included in the candidate lists for the merge index and the affine merge index is fixed. This maximum number may be the same for both lists. The encoder then includes, in the bitstream, data for determining (or indicating) the maximum number (or target number) of motion information prediction sub-candidates that can be included in the generated motion information prediction sub-candidate list, and the decoder obtains this data from the bitstream. This makes it possible to parse, from the bitstream, the data used to decode the merge index or the affine merge index. The data for determining (or indicating) the maximum number (or target number) may, when decoded, be the maximum number (or target number) itself, or it may enable the decoder to determine the maximum / target number in combination with other parameters / syntax elements (e.g., "five_minus_max_num_merge_cand" or "MaxNumMergeCand-1" used in HEVC, or their functional equivalents).
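As an illustration of how such a syntax element yields the maximum number, the sketch below follows the HEVC relationship MaxNumMergeCand = 5 - five_minus_max_num_merge_cand, with MaxNumMergeCand constrained to the range 1 to 5. The helper names are hypothetical, and the link between the maximum number and the longest index codeword assumes a truncated unary binarization.

```python
def max_num_merge_cand(five_minus_max_num_merge_cand):
    """Derive MaxNumMergeCand from the HEVC slice-header syntax element."""
    value = 5 - five_minus_max_num_merge_cand
    if not 1 <= value <= 5:
        raise ValueError("MaxNumMergeCand must be in the range 1..5")
    return value


def max_index_bins(max_cand):
    """Longest index codeword, assuming truncated unary binarization:
    this is the 'MaxNumMergeCand - 1' quantity a parser needs."""
    return max_cand - 1


if __name__ == "__main__":
    for syntax_value in range(5):
        n = max_num_merge_cand(syntax_value)
        print(syntax_value, n, max_index_bins(n))
```

A decoder that knows this bound can stop reading bins once the longest codeword has been consumed, which is why the paragraph ties the signaled maximum to the ability to parse the index.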
[0591] Optionally, if the maximum number (or target number) of candidates in the candidate lists for the merge index and the affine merge index can vary or differ (e.g., because the use of an ATMVP candidate or any other optional candidate may be enabled or disabled for one list but not for the other, or because the lists use different candidate list generation / derivation processes), then the maximum number (or target number) of motion information prediction sub-candidates that can be included in the generated motion information prediction sub-candidate list can be determined separately for the case where the affine merge mode is used and the case where the merge mode is used, and the encoder includes the data used to determine the maximum / target number in the bitstream. The decoder then obtains the data used to determine the maximum / target number from the bitstream and uses the obtained data to parse or decode the motion information prediction sub-index. The affine flag can then be used to switch between parsing or decoding, for example, the merge index and the affine merge index.
[0592] Implementation of the embodiments of the present invention
[0593] One or more of the foregoing embodiments are implemented by the processor 311 of the processing device 300 in Figure 3, or by the corresponding functional modules / units of the encoder 400 in Figure 4, the decoder 60 in Figure 5, or the CABAC encoder in Figure 17 or its corresponding CABAC decoder, which perform one or more of the method steps of the foregoing embodiments.
[0594] Figure 19 is a schematic block diagram of a computing device 1300 for implementing one or more embodiments of the present invention. The computing device 1300 may be a device such as a microcomputer, workstation, or lightweight portable device. The computing device 1300 includes a communication bus connected to:
- a central processing unit (CPU) 2001, such as a microprocessor;
- a random access memory (RAM) 2002 for storing executable code of methods according to embodiments of the present invention, as well as registers adapted to record variables and parameters required for implementing methods for encoding or decoding at least a portion of an image according to embodiments of the present invention, the storage capacity of which may be expanded, for example, by an optional RAM connected to an expansion port;
- a read-only memory (ROM) 2003 for storing a computer program for implementing embodiments of the present invention;
- a network interface (NET) 2004, which is typically connected to a communication network through which digital data to be processed is transmitted or received. The network interface (NET) 2004 may be a single network interface or a set of different network interfaces (e.g., wired and wireless interfaces, or different types of wired or wireless interfaces). Under the control of the software application running on the CPU 2001, data packets are written to the network interface for transmission or read from the network interface for reception;
- a user interface (UI) 2005, which can be used to receive input from the user or display information to the user;
- a hard disk (HD) 2006, which can be provided as a mass storage device;
- an input / output module (IO) 2007, which can be used to receive / send data from / to external devices (such as video sources or displays).
Executable code can be stored in ROM 2003, on HD 2006, or on a removable digital medium such as a disk. According to a variant, the executable code of the program can be received via NET 2004 through a communication network and stored in one of the storage components (such as HD 2006) of the computing device 1300 before execution. CPU 2001 is adapted to control and direct the execution of instructions or portions of the software code of one or more programs according to embodiments of the present invention, the instructions being stored in one of the aforementioned storage components. For example, after power-on, CPU 2001 can execute software application-related instructions from main RAM memory 2002 after the instructions have been loaded from program ROM 2003 or HD 2006. Such a software application, when executed by CPU 2001, causes the steps of the method according to the invention to be performed.
[0595] It should also be understood that, according to other embodiments of the invention, a decoder according to the above embodiments is provided in a user terminal such as a computer, mobile phone (cellular phone), tablet, or any other type of device capable of providing / displaying content to a user (e.g., a display device). According to yet another embodiment, an encoder according to the above embodiments is provided in an image capture device that also includes a camera, video camera, or network camera (e.g., a closed-circuit television or video surveillance camera) which captures and provides content for encoding by the encoder. Two such examples are provided below with reference to Figures 20 and 21.
[0596] Figure 20 is a diagram illustrating a network camera system 2100 including a network camera 2102 and a client device 2104.
[0597] The network camera 2102 includes a camera unit 2106, an encoding unit 2108, a communication unit 2110, and a control unit 2112.
[0598] The network camera 2102 and the client device 2104 are interconnected via network 200 so that they can communicate with each other.
[0599] The camera unit 2106 includes a lens and an image sensor (e.g., a charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS)), and captures an image of an object and generates image data based on the image. The image can be a still image or a video image. The camera unit may also include zoom and / or pan components adapted (optically or digitally) for zooming or panning, respectively.
[0600] Encoding unit 2108 encodes image data using the encoding methods described in the first to sixteenth embodiments. Encoding unit 2108 uses at least one of the encoding methods described in the first to sixteenth embodiments. In other instances, encoding unit 2108 may use a combination of the encoding methods described in the first to sixteenth embodiments.
[0601] The communication unit 2110 of the network camera 2102 transmits the encoded image data encoded by the encoding unit 2108 to the client device 2104.
[0602] In addition, the communication unit 2110 receives commands from the client device 2104. These commands include those for setting parameters for encoding by the encoding unit 2108.
[0603] The control unit 2112 controls other units in the network camera 2102 according to the commands received by the communication unit 2110.
[0604] The client device 2104 includes a communication unit 2114, a decoding unit 2116, and a control unit 2118.
[0605] The communication unit 2114 of the client device 2104 transmits commands to the network camera 2102.
[0606] In addition, the communication unit 2114 of the client device 2104 receives encoded image data from the network camera 2102.
[0607] Decoding unit 2116 decodes the encoded image data using the decoding method described in any of the first to sixteenth embodiments. In other instances, decoding unit 2116 may use a combination of the decoding methods described in the first to sixteenth embodiments.
[0608] The control unit 2118 of the client device 2104 controls other units in the client device 2104 based on user operations or commands received from the communication unit 2114.
[0609] The control unit 2118 of the client device 2104 controls the display device 2120 to display the image decoded by the decoding unit 2116.
[0610] The control unit 2118 of the client device 2104 also controls the display device 2120 to display a GUI for specifying the values of the parameters of the network camera 2102 (including the parameters for encoding by the encoding unit 2108).
[0611] The control unit 2118 of the client device 2104 also controls other units in the client device 2104 based on user operation input to the GUI displayed on the display device 2120.
[0612] The control unit 2118 of the client device 2104 controls the communication unit 2114 of the client device 2104 based on user operation input to the GUI displayed on the display device 2120, so as to transmit commands for specifying the values of parameters of the network camera 2102 to the network camera 2102.
[0613] The network camera system 2100 can determine whether the camera 2102 is using zoom or pan during video recording, and can use such information when encoding the video stream because zoom or pan during recording can benefit from the use of affine modes, which are well-suited for encoding complex movements such as zoom, rotation and / or stretch (which can be a side effect of panning, especially when the lens is a "fisheye" lens).
[0614] Figure 21 is a diagram illustrating a smartphone 2200.
[0615] The smartphone 2200 includes a communication unit 2202, a decoding / encoding unit 2204, a control unit 2206, and a display unit 2208.
[0616] The communication unit 2202 receives encoded image data via a network.
[0617] Decoding unit 2204 decodes the encoded image data received by communication unit 2202.
[0618] Decoding unit 2204 decodes the encoded image data using the decoding methods described in the first to sixteenth embodiments. Decoding unit 2204 may use at least one of the decoding methods described in the first to sixteenth embodiments. In other instances, decoding / encoding unit 2204 may use a combination of the decoding methods described in the first to sixteenth embodiments.
[0619] The control unit 2206 controls other units in the smartphone 2200 based on user operations or commands received from the communication unit 2202.
[0620] For example, the control unit 2206 controls the display unit 2208 to display the image decoded by the decoding unit 2204.
[0621] The smartphone may also include an image recording device 2210 (e.g., a digital camera and associated circuitry) for recording images or videos. Such recorded images or videos can be encoded by a decoding / encoding unit 2204 under the instruction of the control unit 2206.
[0622] The smartphone may also include a sensor 2212 suitable for sensing the orientation of the mobile device. Such a sensor may include an accelerometer, gyroscope, compass, global positioning (GPS) unit, or similar position sensor. This sensor 2212 can determine whether the smartphone has changed orientation, and this information can be used when encoding a video stream, since a change in orientation during recording can benefit from the use of the affine mode, which is well suited to encoding complex motion such as rotation.
[0623] Replacement and modification
[0624] It should be understood that an object of the present invention is to ensure that the affine mode is used in the most efficient manner, and some of the examples discussed above relate to signaling the use of the affine mode based on the perceived likelihood that the affine mode is useful. Further examples of the invention can be applied to encoders when it is known that complex motion is being encoded (where an affine transform may be particularly effective). Examples of such cases include:
[0625] a) Camera zoom in / out
[0626] b) A portable camera (e.g., a mobile phone) changes orientation (i.e., rotates or moves) during shooting.
[0627] c) Panning of a fisheye lens camera (e.g., stretching / distortion of a portion of the image).
[0628] In this way, an indication of complex motion can be raised during the recording process, so that the affine mode can be given a higher likelihood of being used for a slice, a sequence of frames, or indeed the entire video stream.
[0629] In a further example, the affine mode may be more likely to be used depending on the characteristics or functions of the device used to record video. For example, a mobile device may be more likely to change orientation than a fixed security camera, so an affine mode may be more suitable for encoding video from the former. Examples of characteristics or functions include: the presence / use of a zoom component, the presence / use of a position sensor, the presence / use of a pan / tilt component, whether the device is portable, or user selection on the device.
[0630] Although the invention has been described with reference to embodiments, it should be understood that the invention is not limited to the disclosed embodiments. Those skilled in the art will understand that various changes and modifications can be made without departing from the scope of the invention as defined by the appended claims. All features disclosed in this specification (including any appended claims, abstract, and drawings), and / or all steps of any disclosed method or process, can be combined in any combination, except combinations in which at least some of such features and / or steps are mutually exclusive. Unless otherwise expressly stated, each feature disclosed in this specification (including any appended claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose. Therefore, unless otherwise expressly stated, each feature disclosed is merely one example of a generic series of equivalent or similar features.
[0631] It should also be understood that any result of the above comparisons, determinations, evaluations, selections, executions, processes, or considerations (e.g., selections made during encoding or filtering processes) may be indicated in or determined / inferred from data in the bitstream (e.g., flags or data indicating the result), such that the indicated or determined / inferred result may be used for processing rather than actually being compared, determined, evaluated, selected, executed, processed, or considered, for example, during decoding processes.
[0632] In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite articles "a" or "an" do not exclude multiple elements. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be used advantageously.
[0633] The reference numerals appearing in the claims are for illustrative purposes only and should not be construed as limiting the scope of the claims.
[0634] In the foregoing embodiments, the described functions can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions can be stored as one or more instructions or code on or transmitted through a computer-readable medium, and can be executed by a hardware-based processing unit.
[0635] Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media or communication media that include, for example, any medium facilitating the transfer of a computer program from one place to another according to a communication protocol. In this way, computer-readable media may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available medium accessible by one or more computers or one or more processors to retrieve instructions, code, and / or data structures for implementing the techniques described herein. Computer program products may include computer-readable media.
[0636] By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Furthermore, any connection may be appropriately referred to as a computer-readable medium. For example, coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technology (such as infrared, radio, and microwave) are included in the definition of medium if instructions are sent from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technology (such as infrared, radio, and microwave). However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but rather refer to non-transient tangible storage media. The terms disk and disc used herein include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks typically copy data magnetically, while discs optically reproduce data using lasers. The above combinations should also be included within the scope of computer-readable media.
[0637] Instructions can be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate / logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Therefore, the term "processor" as used herein can refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described herein can be provided within dedicated hardware and / or software modules configured for encoding and decoding, or incorporated into combined codecs. Furthermore, the technique can be fully implemented within one or more circuit or logic elements.
Claims
1. A method for encoding a motion information prediction sub-index, the method comprising:
determining, from multiple modes, one of a first mode and a second mode as the mode for predicting motion information of a block to be encoded, wherein the first mode is a sub-block merging mode with sub-block affine prediction, and the second mode is a merging mode without sub-block affine prediction;
when using the first mode, generating a first list of first mode motion information prediction sub-candidates, selecting one of the first mode motion information prediction sub-candidates in the first list, generating a first motion information prediction sub-index for the selected first mode motion information prediction sub-candidate, and encoding the first motion information prediction sub-index using CABAC encoding, wherein all bits of the first motion information prediction sub-index except the first bit are encoded by bypass encoding, and the first bit of the first motion information prediction sub-index is encoded by CABAC encoding using a first context variable; and
when using the second mode, generating a second list of second mode motion information prediction sub-candidates without affine motion information prediction sub-candidates, selecting one of the second mode motion information prediction sub-candidates in the second list, generating a second motion information prediction sub-index for the selected second mode motion information prediction sub-candidate, and encoding the second motion information prediction sub-index using CABAC encoding, wherein the second list includes one or more spatial motion information prediction sub-candidates associated with a block that is in the same frame as, and adjacent to, the block to be encoded, and wherein all bits of the second motion information prediction sub-index except the first bit are encoded by bypass encoding, and the first bit of the second motion information prediction sub-index is encoded by CABAC encoding using a second context variable.
2. The method according to claim 1, further comprising: when using the first mode, including data indicating the use of the first mode in the bitstream.
3. The method according to claim 1, further comprising: including, in the bitstream, data for determining the maximum number of motion information prediction sub-candidates that can be included in the generated list of first mode motion information prediction sub-candidates or second mode motion information prediction sub-candidates.
4. A method for decoding a motion information prediction sub-index, the method comprising:
determining, from multiple modes, one of a first mode and a second mode as the mode for predicting motion information of a block to be decoded, wherein the first mode is a sub-block merging mode with sub-block affine prediction, and the second mode is a merging mode without sub-block affine prediction;
when using the first mode, generating a first list of first mode motion information prediction sub-candidates;
when using the second mode, generating a second list of second mode motion information prediction sub-candidates without affine motion information prediction sub-candidates, wherein the second list includes one or more spatial motion information prediction sub-candidates associated with a block that is in the same frame as, and adjacent to, the block to be decoded;
when using the first mode, decoding the first motion information prediction sub-index using CABAC decoding, wherein all bits of the first motion information prediction sub-index except the first bit are decoded by bypass decoding, and the first bit of the first motion information prediction sub-index is decoded by CABAC decoding using a first context variable;
when using the second mode, decoding the second motion information prediction sub-index using CABAC decoding, wherein all bits of the second motion information prediction sub-index except the first bit are decoded by bypass decoding, and the first bit of the second motion information prediction sub-index is decoded by CABAC decoding using a second context variable;
when using the first mode, using the decoded first motion information prediction sub-index to identify one of the first mode motion information prediction sub-candidates in the first list; and
when using the second mode, using the decoded second motion information prediction sub-index to identify one of the second mode motion information prediction sub-candidates in the second list.
5. The method according to claim 4, further comprising obtaining, from the bitstream, data for determining the maximum number of motion information prediction sub-candidates that can be included in the generated first list or second list.
6. The method according to claim 4, wherein the motion information prediction sub-candidates include information for obtaining motion vectors.
7. The method according to claim 4, wherein the generated list of first-mode motion information prediction sub-candidates or the generated list of second-mode motion information prediction sub-candidates includes candidates for collocated sub-block temporal prediction.
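The decoding steps of claims 4 to 7 can be illustrated with a small sketch. Everything here is an assumption made for illustration: `BinReader` is a stand-in that replays already-decoded bins rather than a real CABAC arithmetic decoder, the truncated unary binarization is assumed rather than recited, `max_candidates_from_syntax` is a hypothetical derivation modelled on the way HEVC signals its maximum merge-candidate count (`five_minus_max_num_merge_cand`), and the candidate names in `first_list` are placeholders.

```python
from typing import List, Sequence


class BinReader:
    """Stand-in for a CABAC decoding engine: replays bins and logs how each was requested."""

    def __init__(self, bins: Sequence[int]):
        self._bins = list(bins)
        self._pos = 0
        self.log: List[str] = []

    def _next(self) -> int:
        value = self._bins[self._pos]
        self._pos += 1
        return value

    def decode_context_bin(self, ctx_name: str) -> int:
        self.log.append(ctx_name)
        return self._next()

    def decode_bypass_bin(self) -> int:
        self.log.append("bypass")
        return self._next()


def max_candidates_from_syntax(offset_value: int, upper_bound: int) -> int:
    """Hypothetical derivation: the bitstream carries `upper_bound - maximum`."""
    return upper_bound - offset_value


def decode_index(reader: BinReader, mode: str, max_candidates: int) -> int:
    """Parse a truncated unary index: first bin via the mode's context, the rest in bypass."""
    ctx_name = {"first": "ctx_first_mode", "second": "ctx_second_mode"}[mode]
    c_max = max_candidates - 1
    index = 0
    while index < c_max:
        if index == 0:
            bin_value = reader.decode_context_bin(ctx_name)
        else:
            bin_value = reader.decode_bypass_bin()
        if bin_value == 0:
            break
        index += 1
    return index


# Placeholder first-mode list containing a collocated sub-block temporal candidate.
first_list = ["ATMVP", "affine_A", "affine_B", "constructed_0", "zero"]
max_cand = max_candidates_from_syntax(offset_value=0, upper_bound=5)
reader = BinReader([1, 1, 0])
idx = decode_index(reader, "first", max_cand)  # three bins are consumed, giving index 2
candidate = first_list[idx]
```

The signalled maximum matters because it fixes where the unary code is truncated: with a maximum of one candidate, the loop reads no bins and the index is inferred to be 0.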
8. An apparatus for encoding a motion information prediction sub-index, the apparatus comprising:
a determining component configured to determine, from a plurality of modes, one of a first mode and a second mode as the mode for predicting motion information of a block to be encoded, wherein the first mode is a sub-block merging mode with sub-block affine prediction, and the second mode is a merging mode without sub-block affine prediction;
a generation component configured to, when using the first mode, generate a first list of first-mode motion information prediction sub-candidates and select one of the first-mode motion information prediction sub-candidates in the first list, and, when using the second mode, generate a second list of second-mode motion information prediction sub-candidates that contains no affine motion information prediction sub-candidates and select one of the second-mode motion information prediction sub-candidates in the second list, wherein the second list includes one or more spatial motion information prediction sub-candidates associated with a block in the same frame and adjacent to the block to be encoded;
an index generation component configured to generate a first motion information prediction sub-index for the selected first-mode motion information prediction sub-candidate when using the first mode, and to generate a second motion information prediction sub-index for the selected second-mode motion information prediction sub-candidate when using the second mode; and
an encoding component configured to encode the first motion information prediction sub-index using CABAC encoding when using the first mode, and to encode the second motion information prediction sub-index using CABAC encoding when using the second mode, wherein all bits of the first motion information prediction sub-index except the first bit are encoded by bypass encoding, and the first bit of the first motion information prediction sub-index is encoded by CABAC encoding using a first context variable, and all bits of the second motion information prediction sub-index except the first bit are encoded by bypass encoding, and the first bit of the second motion information prediction sub-index is encoded by CABAC encoding using a second context variable.
9. An apparatus for decoding a motion information prediction sub-index, the apparatus comprising:
a determining component configured to determine, from a plurality of modes, one of a first mode and a second mode as the mode for predicting motion information of a block to be decoded, wherein the first mode is a sub-block merging mode with sub-block affine prediction, and the second mode is a merging mode without sub-block affine prediction;
a generation component configured to generate a first list of first-mode motion information prediction sub-candidates when using the first mode, and to generate, when using the second mode, a second list of second-mode motion information prediction sub-candidates that contains no affine motion information prediction sub-candidates, wherein the second list includes one or more spatial motion information prediction sub-candidates, if available, that are associated with a block in the same frame and adjacent to the block to be decoded;
a decoding component configured to decode a first motion information prediction sub-index using CABAC decoding when using the first mode, and to decode a second motion information prediction sub-index using CABAC decoding when using the second mode, wherein all bits of the first motion information prediction sub-index except the first bit are decoded by bypass decoding, and the first bit of the first motion information prediction sub-index is decoded by CABAC decoding using a first context variable, and all bits of the second motion information prediction sub-index except the first bit are decoded by bypass decoding, and the first bit of the second motion information prediction sub-index is decoded by CABAC decoding using a second context variable; and
an identification component configured to, when using the first mode, identify one of the first-mode motion information prediction sub-candidates in the first list using the decoded first motion information prediction sub-index, and, when using the second mode, identify one of the second-mode motion information prediction sub-candidates in the second list using the decoded second motion information prediction sub-index.
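The distinction the apparatus claims draw between a context variable and bypass coding can be made concrete with a toy cost model. This is only a sketch under stated assumptions: real CABAC engines use integer state and range arithmetic, whereas `ContextVariable` here is a hypothetical floating-point probability estimate with an assumed adaptation rate, and bypass bins are costed at exactly one bit because they are treated as equiprobable.

```python
import math
from typing import Sequence


class ContextVariable:
    """Toy adaptive model: tracks the estimated probability that the next bin is 1."""

    def __init__(self, p_one: float = 0.5, rate: float = 1 / 16):
        self.p_one = p_one
        self.rate = rate  # assumed adaptation speed, chosen for illustration

    def cost_bits(self, bin_value: int) -> float:
        """Ideal arithmetic-coding cost of a bin under the current estimate."""
        p = self.p_one if bin_value == 1 else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bin_value: int) -> None:
        """Move the estimate a fraction of the way toward the observed bin."""
        self.p_one += self.rate * (bin_value - self.p_one)


BYPASS_COST_BITS = 1.0  # bypass bins are coded as equiprobable, with no model to update


def index_cost_bits(bins: Sequence[int], ctx: ContextVariable) -> float:
    """Cost of one binarized index: first bin via `ctx`, remaining bins in bypass."""
    if not bins:
        return 0.0
    cost = ctx.cost_bits(bins[0])
    ctx.update(bins[0])
    return cost + BYPASS_COST_BITS * (len(bins) - 1)


# One context variable per mode, so the statistics of one mode do not disturb the other.
ctx_first_mode = ContextVariable()
ctx_second_mode = ContextVariable()
for _ in range(50):
    index_cost_bits([0], ctx_first_mode)  # first mode keeps choosing index 0
```

In this toy run the first mode's context variable learns that the first bin is almost always 0 and its cost drops well below one bit, while the second mode's context variable is left untouched. Only one model lookup and update is needed per index, since the remaining bins bypass the model entirely.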
10. A computer-readable storage medium comprising computer-executable instructions that cause a computer to perform a method for encoding a motion information prediction sub-index, the method comprising:
determining, from a plurality of modes, one of a first mode and a second mode as the mode for predicting motion information of a block to be encoded, wherein the first mode is a sub-block merging mode with sub-block affine prediction, and the second mode is a merging mode without sub-block affine prediction;
when using the first mode, generating a first list of first-mode motion information prediction sub-candidates, selecting one of the first-mode motion information prediction sub-candidates in the first list, generating a first motion information prediction sub-index for the selected first-mode motion information prediction sub-candidate, and encoding the first motion information prediction sub-index using CABAC encoding, wherein all bits of the first motion information prediction sub-index except the first bit are encoded by bypass encoding, and the first bit of the first motion information prediction sub-index is encoded by CABAC encoding using a first context variable; and
when using the second mode, generating a second list of second-mode motion information prediction sub-candidates that contains no affine motion information prediction sub-candidates, selecting one of the second-mode motion information prediction sub-candidates in the second list, generating a second motion information prediction sub-index for the selected second-mode motion information prediction sub-candidate, and encoding the second motion information prediction sub-index using CABAC encoding, wherein the second list includes one or more spatial motion information prediction sub-candidates associated with a block in the same frame and adjacent to the block to be encoded, all bits of the second motion information prediction sub-index except the first bit are encoded by bypass encoding, and the first bit of the second motion information prediction sub-index is encoded by CABAC encoding using a second context variable.
11. A computer-readable storage medium comprising computer-executable instructions that cause a computer to perform a method for decoding a motion information prediction sub-index, the method comprising:
determining, from a plurality of modes, one of a first mode and a second mode as the mode for predicting motion information of a block to be decoded, wherein the first mode is a sub-block merging mode with sub-block affine prediction, and the second mode is a merging mode without sub-block affine prediction;
when using the first mode, generating a first list of first-mode motion information prediction sub-candidates;
when using the second mode, generating a second list of second-mode motion information prediction sub-candidates that contains no affine motion information prediction sub-candidates, wherein the second list includes one or more spatial motion information prediction sub-candidates associated with a block in the same frame and adjacent to the block to be decoded;
when using the first mode, decoding a first motion information prediction sub-index using CABAC decoding, wherein all bits of the first motion information prediction sub-index except the first bit are decoded by bypass decoding, and the first bit of the first motion information prediction sub-index is decoded by CABAC decoding using a first context variable;
when using the second mode, decoding a second motion information prediction sub-index using CABAC decoding, wherein all bits of the second motion information prediction sub-index except the first bit are decoded by bypass decoding, and the first bit of the second motion information prediction sub-index is decoded by CABAC decoding using a second context variable;
when using the first mode, using the decoded first motion information prediction sub-index to identify one of the first-mode motion information prediction sub-candidates in the first list; and
when using the second mode, using the decoded second motion information prediction sub-index to identify one of the second-mode motion information prediction sub-candidates in the second list.